Digital Documents Integrity Protection Using Invisible Changeable Watermark



INTRODUCTION
Developments in digital storage, signal processing, and communication infrastructure in recent years have made it possible to distribute digital media widely. Multimedia commerce benefits from the flexible and affordable business model that digital distribution brings. The fact that the information is digital, however, also makes it possible for anybody to edit, copy, or access material outside the restrictions specified for a given transaction. Integrity protection has therefore gained significance in the field of digital technology. Once a model has been deployed into a safety-critical system, it is critical to be able to check its integrity to ensure that it has not been tampered with.
Most relevant studies on cryptography and copyright protection for authentication and integrity protection have neglected the performance impact, even though these techniques have high complexity and considerable redundant implementation overheads. This matters particularly for more straightforward applications where integrity protection and authentication confirmation are the only requirements.
There are other reasons why encryption by itself is not always a perfect answer. For example, in applications where encryption involves resource overheads, sending data without secrecy and without unnecessary expense may be preferable. In other situations, some network management protocols distinguish between integrity and confidentiality, making encryption on its own unacceptable.
To achieve this goal, a watermarking approach is used: a watermark is embedded into the model to be protected, and the system's integrity is verified by checking whether the watermark is intact.
Digital watermarking is an area of computer security that deals with the protection of digital objects in a broad sense. Watermarking differs from digital signatures in that the watermark signal is embedded directly into the digital host object, whereas a digital signature is a set message added to a digital file. This hidden information is later used to determine who owns the data.
Many watermarking algorithms have been designed with various goals in mind, but they all have the same high-level structure; in general, watermarking an object includes two complementary actions [1].
The first is embedding, which is done only once and, in most cases, off-line, so it may use more computing resources than the second operation. The embedding step embeds a watermark sequence into the host object. A secret key can be used to make the entire process safe or to improve its security. The second is validation: it is carried out every time there is a need to check whether an object carries a watermark signal, so a light and quick approach is recommended.
The specific requirements of each watermarking application dictate the features that must be included in the system and influence the methods chosen for embedding and detecting the watermark. Real-world system characteristics that are often discussed include the data capacity of the watermark, visibility of the embedded mark, immunity of the detector to false alarms, security, and various forms of robustness against distortion (caused by routine processing operations or changes in geometry) and attack [2,3]. One feature that isn't as frequently discussed but is crucial for numerous real-world applications is performance, or the speed at which the watermark is detected and embedded.
The main objective of this research paper is to secure the transfer of digital documents. The contributions of this study are as follows:
• The proposed code detects forgery in the text without the need to compare it with the original text and also locates the forgery.
• Its benefit over other methods is that the generated code is spread across the whole test image, not only in the non-interest pixels.
• The proposed algorithm detects the location and the number of tampered words.
• The watermark is dynamic and based on word information; it protects the watermarked item from both unintentional and intentional attacks, and the watermark signal cannot be retrieved by unauthorized entities.
• Because the weight values are modified in the least significant bits (LSB), the distortion caused by the watermark embedding is lower than that introduced by the method for images.
• Light and fast integrity check: since each word has its own watermark, it is easy to discover any tampering.
• It overcomes the difficulty that occurs in spatial domain algorithms, which cannot provide both robustness and protection against intentional attacks.
The paper is structured as follows. Section 2 examines the most recent studies on this topic, along with their findings. Section 3 presents the design of the suggested approach. Section 4 examines the findings and assesses the effectiveness of the suggested algorithm. Lastly, Section 5 concludes the paper.

RELATED WORKS
Day by day, the amount of information and data transferred over the Internet is increasing, especially on social community platforms. Moreover, with the rapid advancement of image processing tools, it has become easier to change data and images as they are being transferred. Transferring exams via the Internet and securing them against deliberate change has become one of the most important problems facing those who work in education.
In addition to standard cryptographic techniques [4][5][6][7], watermarking has been proposed to address the above concern. Watermarking is a technique for achieving image authentication and tamper detection by embedding watermark bits into the original image [25].
The watermarking idea may be common, but there are different methods to perform it. In recent years, researchers have proposed several watermarking approaches in the spatial domain or the frequency domain and, more recently, using neural networks.
An invisible watermark is inserted in the cover text to make the trace imperceptible to readers. In addition, all of the terms in the original text are marked using an instance-based learning method [30].
A wavelet-based copyright certification method was proposed that does not require the original image for watermark verification and uses cryptography technologies like digital signatures and timestamps to make the copyright certification publicly verifiable [31].
An adaptive copyright protection strategy has also been proposed. This technique allows the owner layer to change the watermark's intensity using a threshold, which can improve the watermarking algorithm's robustness [32].
To counteract model stealing attempts, Szyller et al. [33] altered the neural network's classification output. Al-onazi et al. [1] propose a white-box watermarking approach for (deep) neural network integrity protection. Because it inserts a watermark bit string into the network's parameters, it may be used with any type of neural network design (deep or shallow).
To summarize the conclusions from previous research: most known algorithms split the original image into a region of interest and a region of non-interest. The ability of the algorithms to retrieve the watermark from the watermarked image determines their efficiency. The general idea of all these watermark algorithms depends on inserting an external image inside the original image as a watermark. Although some of the works do not refer to the original image for watermark verification [31], they follow the same general idea of inserting an image or logo inside the original image, which differs from the idea on which the proposed work is based.
This means that the watermark must be placed in the region of non-interest; otherwise the image quality deteriorates. In this work, by contrast, the watermark is linked to the word content while maintaining the image quality. In addition, existing methods subject the image to many tests to ensure its authenticity, unlike the proposed model.
One of the most important features of the proposed watermark is that it is smart and strict so that the watermark is created for each word within an image and not depending on the image as a whole.The watermark for each word contains the location of the word and the number of its letters, and the letters are converted to binary code using ASCII.

SDWM ALGORITHM
The encoder and the decoder are the two main components of digital watermarking systems [34]. To create a watermarked image from a digital document, the encoder needs an image, a watermark that contains the watermarking information, a security key, and an encoding method.
Most of the works in the field of watermark depend on separating the image into a region of interest and a region of noninterest and the watermark is created in a region of noninterest.
The suggested watermark is unique in its creation in that it is spread throughout the image words and, at the same time, does not affect the efficiency of the image. The proposed algorithm does not require the original image for watermark verification. For each word in the exam, a watermark is created depending on the word's location and the number of its letters. In the case of forgery, the forged words can be extracted from the watermark. Figure 1 shows the whole proposed watermark procedure. The whole structure of the proposed algorithm in the receiving path is summarized below. Figure 3 illustrates the proposed algorithm in the receiving path. If the calculated and extracted watermarks differ, the image has been tampered with, and in this case the receiver sends a notification to the sender.

IMPLEMENTATION AND EXPERIMENTAL RESULTS
Through the use of a technique called digital watermarking, identifying information can be subtly and imperceptibly encoded into a data carrier without affecting data usage. This technology frequently safeguards text files, databases, and multimedia content from infringement. The evaluation approach here is not the typical one because the process of creating the watermark is entirely different from that of ordinary watermarks.
There are two categories of attacks on conventional watermarks [31]. The median filter attack, JPEG attack, and Gaussian noise attack are examples of conventional attacks. Translation, cropping, rotation, and scaling are examples of geometrical attacks.
The extraction of the watermarked image from the source image is the primary goal of classical algorithms. However, the goal of the suggested approach is to simultaneously maintain the information and the watermark.
The test platform for the experiment is a 2.3 GHz Intel Core i3 processor with 4GB RAM, using MATLAB 2015 Version 8.5.0.197613.
To illustrate the efficiency of the proposed watermark, it was applied in practice to one of the International Baccalaureate exams. Figure 4 shows an example of the International Baccalaureate exams.
After the exam image is read, it is converted to text using OCR. The word orders are calculated, and the bounding boxes are detected. Figure 5 shows the detected boundaries.
Each word is saved in a sub-image. The dynamic digital watermark is created for each word (3 digits for the word order, 2 digits for the word's character count, and 7 bits for each character). The characters are converted to binary using ASCII. The following example illustrates the idea and shows how manipulation is detected. In the first line of Figure 5, which shows a sample of the International Baccalaureate exam, we find the word "and". The word "and" is word number 16 in the text and has three letters. According to the ASCII code in Table 1, the calculated SDWM for "and" is:
110000110001110110110000110011110000111011101100100
If we assume that the word "and" is replaced by the word "or" from the same text, the watermark changes because the number of letters has changed (two letters), as has the ASCII code for the word "or", and the new watermark is:
11000011000111011011000011001111011111110010
If the word "and" is replaced with the word "or" from outside the text, the watermark is either not present or incorrect. In all these cases, manipulation is detected because one of the basics of the code is that the word information is retrieved from the created watermark. The suggested algorithm detects the position and number of tampered words. Figure 6 shows the ability of the suggested method to identify the change of two words in the original exam. The word "medicines" was used in place of "drugs" in the example from the previous exam.
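The construction of the per-word watermark can be sketched as follows. This is a minimal illustration, not the authors' MATLAB code: the function name `build_watermark` is hypothetical, and the sketch assumes that each symbol (digit or letter) is converted to its ASCII code in binary without zero padding, so digits occupy 6 bits and lowercase letters 7 bits. That assumption is made because it reproduces the 51-bit string given above for "and" (ignoring whitespace).

```python
def build_watermark(word: str, order: int) -> str:
    """Concatenate the binary ASCII codes of the order, length, and letters."""
    # 3 decimal digits for the word order, 2 decimal digits for the
    # character count, followed by the characters of the word itself.
    symbols = f"{order:03d}{len(word):02d}{word}"
    # Assumption: each symbol is written without zero padding, so the
    # digits '0'-'9' take 6 bits and lowercase letters take 7 bits.
    return "".join(format(ord(ch), "b") for ch in symbols)


# Word number 16 in the sample exam is "and" (three letters).
print(build_watermark("and", 16))
```

Because both the word order and the letter count are part of the encoded symbols, replacing, inserting, or moving a word changes the recomputed bit string for that word.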
Contrary to typical integrity protection, a high correlation between neighbouring pixels is preserved even though the watermark is embedded in the pixels of every word of the text images. The correlations for the original and watermarked images are displayed in Figures 7 and 8, respectively.
A further feature of the suggested method is seen in the histograms of the original and watermarked images, which are shown in Figures 9 and 10. Our method performs much better than the current state of the art in terms of image evaluation metrics such as PSNR (peak signal-to-noise ratio), SSIM (structural similarity index measure), and UIQI (Universal Image Quality Index) [35].
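One common way to quantify neighbouring-pixel correlation is the Pearson coefficient between each pixel and its adjacent pixel. The sketch below is an assumption about how such a figure could be computed, not the procedure used in the paper: it takes a grayscale image as a list of rows, uses horizontal neighbours only, and the name `adjacent_correlation` is hypothetical.

```python
import math


def adjacent_correlation(image):
    """Pearson correlation between each pixel and its right-hand neighbour."""
    # Pair every pixel with the pixel immediately to its right.
    xs = [row[c] for row in image for c in range(len(row) - 1)]
    ys = [row[c + 1] for row in image for c in range(len(row) - 1)]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant image")
    return cov / math.sqrt(var_x * var_y)


# A smooth gradient: every right-hand neighbour is exactly one level brighter.
sample = [[0, 1, 2, 3], [10, 11, 12, 13]]
print(adjacent_correlation(sample))
```

Computing this value for the original and for the watermarked image allows the two to be compared directly.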
To determine the degree of difference between the original image and the one produced by our proposed model (the watermarked image), we employ the following established metrics: MSE, MAE, PSNR, SSIM, and UIQI. A higher PSNR indicates a closer match to the original image. The value of SSIM increases as image distortion decreases.
Image distortion: To quantify the distortion, the following metrics are used: MSE, MAE, Universal Image Quality Index (UIQI), and Structural Similarity Index (SSIM). The mean squared error (MSE) is computed by averaging the squared intensity differences between the reference image and the watermarked image. The SSIM is computed by normalising the mean structural similarity value between the original and watermarked images [34][35][36][37][38][39]. Any image distortion is modelled by the UIQI. Imperceptibility: It indicates that the watermarked image and the host image are similar [40,41].
Invisibility evaluation: The visual quality is measured by the PSNR between the original and watermarked images [42][43][44]. Figure 11 makes it evident that there is no discernible difference between the exam with the watermark and the original exam.
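The intensity-based metrics can be illustrated with a short sketch using their standard definitions. It assumes 8-bit grayscale images given as lists of rows with a peak value of 255; the function names are illustrative and are not taken from the paper.

```python
import math


def mse(ref, test):
    """Mean of the squared intensity differences between two images."""
    diffs = [(a - b) ** 2 for r, t in zip(ref, test) for a, b in zip(r, t)]
    return sum(diffs) / len(diffs)


def mae(ref, test):
    """Mean of the absolute intensity differences between two images."""
    diffs = [abs(a - b) for r, t in zip(ref, test) for a, b in zip(r, t)]
    return sum(diffs) / len(diffs)


def psnr(ref, test, peak=255):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    err = mse(ref, test)
    if err == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / err)


original = [[100, 100], [100, 100]]
watermarked = [[101, 100], [100, 100]]  # one pixel changed by one level
print(mse(original, watermarked), mae(original, watermarked))
print(round(psnr(original, watermarked), 2))
```

Since an LSB change alters a pixel by at most one intensity level, the MSE of such an embedding cannot exceed 1, which bounds the PSNR from below at roughly 48 dB for 8-bit images.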
Figure 11. The watermarked exam
A high degree of similarity between the original and watermarked image is displayed in Table 2. This illustrates the suggested code's great efficiency without changing the exam's original content.
22.36 Shih and Wu [46] 29.23 Tai et al. [47] 38.0 Yang et al. [48] 29.39 Wang et al. [49] 44.64 Shih et al. [50] 51.24
PSNR is calculated between two images: one is the watermarked image and the other is the original. A high PSNR suggests that the watermarked image is highly similar to the original. The results in Table 2 make it evident that the suggested method produces a high PSNR.
Table 3 shows a significant difference between the PSNR of the suggested model and that of some other publications. PSNR, however, is effective only for comparing intensities. It does not offer any structural details. For this reason, picture quality is also compared using SSIM or UIQI.
Security of the watermark: because the watermark is dynamic and based on the word information, it protects the watermarked item from both unintentional and intentional attacks, and the watermark signal cannot be retrieved by unauthorized entities.

CONCLUSION
In this research paper, a framework has been proposed through which the highest safety standards are achieved in transferring digital images and saving time, cost, and effort.
A crucial aspect of the suggested watermark is its clever and stringent design, which ensures that the watermark is made specifically for each word in an image rather than relying on the image as a whole. Every word has a watermark that includes its location and letter count; ASCII is used to turn the letters into binary information.
The primary advantage of the proposed watermark is that, in the event of manipulation, the original letters can be recovered because the word size is split by the watermark's size. In the recommended method, the original image is not required for watermark verification.
To demonstrate the high similarity between the original image and the watermarked one, MSE, PSNR, MAE, UIQI, and SSIM are used.
The first step in our proposed algorithm is converting the document's image into text using Optical Character Recognition (OCR), which is one of the limitations of our approach.
In future work, different OCR methods will be implemented and used instead of the current traditional OCR; in particular, AI-based OCR will replace the traditional OCR. The work will also be extended from finding the areas that have been subjected to deliberate changes to self-correction of those changes.

Figure 1. The whole proposed watermark procedure
The whole structure of the proposed algorithm in the transmitting path is summarized below. Figure 2 illustrates the proposed algorithm in the transmitting path.
⮚ Step1. Read the input image that contains text.
⮚ Step2. Read the image contents (words) using the MATLAB OCR function.
⮚ Step3. Save the value and location of each word in the image.

Figure 2. The proposed algorithm in the transmitting path
The receiving path consists of the following steps.
⮚ Step1. Read the integrity-protected watermarked image.
⮚ Step2. Verify the integrity-protected watermarked image.
⮚ Step3. Calculate the word numbers, the bounding boxes, and the characters of each word.
⮚ Step4. Segment the image into sub-images based on the words.
⮚ Step5. Separate one layer of the image.
⮚ Step6. Calculate the watermark: merge the word locations and word characters, then convert the merged words to binary form.
⮚ Step7. Extract the embedded watermark.
⮚ Step8. Compare the calculated watermark with the extracted one. If the two watermarks are the same, the image has not been tampered with and is printed.

Figure 3. The proposed algorithm in the receiving path

Figure 4. Sample of the International Baccalaureate exam

Figure 6. Detection of the location of the tampered words

Table 2. The MAE, MSE, PSNR, SSIM, and UIQI of the proposed algorithm

Table 3. Signal to noise ratio of some of the existing approaches