CN104899822A - Watermark embedding and authentication method capable of locating tampering in PDF electronic invoices

Publication number: CN104899822A
Application number: CN201510339156.7A
Authority: CN (China)
Other languages: Chinese (zh)
Other versions: CN104899822B (en)
Inventors: 陈帆, 张旋, 和红杰
Current assignee: Southwest Jiaotong University
Original assignee: Southwest Jiaotong University
Legal status: Granted (CN104899822B); Expired - Fee Related
Classifications: Editing Of Facsimile Originals; Image Processing; Storage Device Security

Abstract

Provided is a watermark embedding and authentication method capable of locating tampering in PDF electronic invoices, comprising: A, watermark generation based on the reader page of the file, which includes partitioning the reader page into blocks, parsing the file structure of the code page, generating authentication information, and generating the watermark from the authentication information; B, watermark embedding based on the file code; C, updating the cross-reference table and the file trailer; and D, generating the watermarked PDF electronic invoice file. For a PDF electronic invoice file under test, the method performs E, authentication information reconstruction and authentication information extraction, and F, tamper authentication, which includes tamper determination and tamper localization and marking. The method keeps the watermark invisible while holding the file size increment within 1%. It detects tampering with high accuracy and a low false detection rate, and effectively resists replacement, insertion, and deletion tampering.

Description

A watermark embedding and authentication method capable of locating tampering in PDF electronic invoices
Technical field
The present invention relates to a watermark embedding and authentication method capable of locating tampering in PDF electronic invoices, and belongs to the field of electronic invoice authentication.
Background
As early as 2012, China began piloting electronic invoices to reduce the paper wasted by issuing paper invoices. The "Measures for the Administration of Invoices" issued by the State Administration of Taxation, in force since April 1, 2013, require every entity and individual that issues invoices to provide an invoice to the consumer. At present, however, an electronic invoice used as a financial accounting voucher must still be printed on "specific paper". In other words, issuers still have to send the printed paper invoice to the consumer by express delivery, which wastes labor and materials. For this reason, in June and December 2013, China launched electronic invoice pilots in Beijing and Shanghai in turn. More than 50 million electronic invoices have been issued successfully so far.
On November 16, 2014, the General Office of the State Council issued "Several Opinions on Promoting the Healthy Development of Domestic Trade Circulation", which explicitly calls for "accelerating the application of electronic invoices and improving supporting measures such as the reimbursement, bookkeeping, and archiving of electronic accounting vouchers". This was the first time electronic invoices were publicly promoted at the State Council level. Qiu Yueming, deputy head of the General Working Group for National E-commerce Standards, stated that the rapid development of e-commerce is bound to drive innovation in the invoice business, and that electronic invoices, which have natural advantages over traditional invoices in low cost and high efficiency, are expected to be rolled out widely across the country in the coming years.
During this year's Two Sessions, promoting electronic invoices and reducing energy waste likewise became a focus of the delegates' attention. Judging from the pilots in several provinces and cities, the main obstacle to promotion is that electronic invoices are "hard to book and hard to reimburse". In response, Zhang Jindong, a Two Sessions delegate and chairman of Suning Holdings Group, suggested in his proposal "Accelerating the Reimbursement and Bookkeeping of Electronic Invoices" that the legal validity of electronic invoices should first be resolved systematically at the level of law, establishing the electronic invoice as a lawful bookkeeping voucher and thereby advancing its adoption nationwide. Once that legal status is confirmed, the problem of "hard to book, hard to reimburse" will have to be solved next. Because the PDF format offers high security, good portability, and small storage size, most existing electronic invoices use PDF.
Most existing protection algorithms for PDF electronic invoices are based on digital signatures. They can determine whether an invoice has been tampered with, but they cannot locate the tampering, so their verdicts are not very convincing.
Authentication methods based on watermark embedding can both detect tampering of a PDF file and locate it. However, most such algorithms embed the watermark in the redundant space of the cross-reference table. Applied to PDF electronic invoices, this approach yields a very small watermark capacity and low security, which limits subsequent tamper localization.
Summary of the invention
The object of the invention is to provide a watermark embedding and authentication method capable of locating tampering in PDF electronic invoices. The method keeps the watermark invisible while keeping the file increment small, within 1%. It detects tampering with high accuracy and a low false detection rate, and effectively resists replacement, insertion, and deletion tampering.
To solve the technical problem, the invention adopts the following technical scheme: a watermark embedding and authentication method capable of locating tampering in PDF electronic invoices, comprising the following steps:
A. Watermark generation based on the reader page of the file
A1. Page partitioning: the reader page $P$ of the original PDF electronic invoice file is divided into $I$ page blocks $P_i$ according to the inherent layout coordinates of the information units, i.e. $P=\{P_i \mid i=1,2,\ldots,I\}$, where $i$ is the index of page block $P_i$ and $I$ is the total number of page blocks on the whole reader page $P$;
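Step A1 can be pictured with a minimal Python sketch. The text does not say how the block boundaries are chosen, so the `Unit` type, the `partition_page` helper, and the fixed y-coordinate boundaries below are all hypothetical illustrations rather than the patented procedure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """One information unit: its text and its page coordinates (hypothetical type)."""
    text: str
    x: float
    y: float


def partition_page(units, boundaries):
    """Assign each unit to a page block by its y coordinate.

    `boundaries` is an ascending list of y values separating the blocks,
    so the result holds len(boundaries) + 1 blocks. The boundary values
    themselves are an assumption of this sketch.
    """
    blocks = [[] for _ in range(len(boundaries) + 1)]
    for unit in units:
        # Count how many boundaries lie at or below the unit's y coordinate.
        index = sum(unit.y >= b for b in boundaries)
        blocks[index].append(unit)
    return blocks


units = [Unit("Invoice code", 40, 700), Unit("Item", 40, 500), Unit("Total", 40, 120)]
blocks = partition_page(units, boundaries=[200.0, 600.0])
```

Here three units land in three separate blocks because their y coordinates fall in different bands; a real invoice layout would need boundaries taken from the invoice template.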
A2. File structure parsing: the code page $S$ of the original PDF electronic invoice file is parsed according to the PDF file structure into four parts: the file header $S_h$, which states the PDF version; the file body $S_o$; the cross-reference table $S_x$; and the file trailer $S_t$; that is, $S=\{S_h, S_o, S_x, S_t\}$;
Here the file body $S_o$ consists of $J$ objects $O_j$ $(j=1,2,\ldots,J)$, i.e. $S_o=\{O_j \mid j=1,2,\ldots,J\}$, where $j$ is the object number;
The cross-reference table $S_x$ consists of $J$ index entries $X_j$ $(j=1,2,\ldots,J)$, where $X_j$ = {object offset address $x_j$, the number of times the object has been modified, the flag stating whether the object is in use};
The file trailer $S_t$ contains the file description, the encryption information, and, for the file body $S_o$, the root object number, the summary object number, and the total number of objects;
A3. Authentication information generation:
For each page block $P_i$ $(i=1,2,\ldots,I)$, the iTextSharp component is first used to extract the text content $T_i=\{t_{i,q} \mid q=1,2,\ldots,z_i\}$ and the text coordinates $D_i=\{d_{i,q} \mid q=1,2,\ldots,z_i\}$ of the block, where $t_{i,q}$ is the $q$-th information unit of the text content $T_i$, $z_i$ is the number of information units in $T_i$, and $d_{i,q}$ is the coordinate information of unit $t_{i,q}$; a line-ending identifier "\r\n" must be inserted between consecutive information units $t_{i,q}$;
Then a digest $H_i$ of the text content $T_i$ and the text coordinates $D_i$ is generated with key $k_1$: $H_i=\mathrm{Hash}(k_1, T_i \,\|\, D_i)$, where $\mathrm{Hash}(\cdot)$ denotes a hash function and $\|$ denotes concatenation;
Finally the text content $T_i$, the text coordinates $D_i$, and the digest $H_i$ of page block $P_i$ are concatenated to form the authentication information $A_i$ of the block: $A_i=T_i \,\|\, D_i \,\|\, H_i$;
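The construction of $A_i$ in step A3 can be sketched in Python as follows. The text only specifies a keyed hash $\mathrm{Hash}(k_1,\cdot)$; using HMAC-SHA256 for it, the comma-separated coordinate encoding, and the helper name `build_auth_info` are assumptions made for illustration.

```python
import hashlib
import hmac

EOL = "\r\n"  # line-ending identifier placed between information units


def build_auth_info(units, coords, key):
    """Return (T_i, D_i, H_i, A_i) as bytes for one page block.

    HMAC-SHA256 stands in for the unspecified keyed hash Hash(k1, .),
    and the "x,y" coordinate encoding is an assumption of this sketch.
    """
    t = EOL.join(units).encode("utf-8")
    d = EOL.join(f"{x:.2f},{y:.2f}" for x, y in coords).encode("utf-8")
    h = hmac.new(key, t + d, hashlib.sha256).digest()  # H_i = Hash(k1, T_i || D_i)
    return t, d, h, t + d + h                          # A_i = T_i || D_i || H_i


t, d, h, a = build_auth_info(
    ["Quantity: 1", "Total: 50.00"],
    [(120.0, 430.5), (120.0, 210.0)],
    b"k1-demo-key",
)
```

Changing any character of a unit or any coordinate changes the digest, which is the property the later comparison in step F1 relies on.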
A4. Watermark information generation: for each page block $P_i$ $(i=1,2,\ldots,I)$, the authentication information $A_i$ generated in step A3 is compressed and then encrypted with key $k_2$ to obtain the binary compressed data stream $B_i$; the PDF information $Y_i$ is added to produce the PDF object to be added, $W_i=\{B_i, Y_i\}$, and its byte length is computed and denoted $l_i$;
The PDF information $Y_i$ comprises: the object number $i+J$, the object modification count, the object keywords, and the line breaks;
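A rough Python sketch of step A4 follows. The text names neither the compression scheme nor the cipher, so zlib and the toy SHA-256 keystream XOR below are stand-ins chosen only because they need no third-party library; the toy cipher is not a vetted encryption scheme. Wrapping $B_i$ in a stream object with a `/Length` entry is likewise an assumption about what $Y_i$ looks like.

```python
import hashlib
import zlib


def keystream_xor(data, key):
    """Toy stream cipher (SHA-256 over key plus byte offset); illustration only."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ k for b, k in zip(chunk, block))
    return bytes(out)


def make_watermark_object(auth_info, key, obj_num, generation=0):
    """Build W_i = {B_i, Y_i}: compress A_i, encrypt it, wrap it as a PDF object."""
    b_i = keystream_xor(zlib.compress(auth_info), key)
    head = (f"{obj_num} {generation} obj\r\n"
            f"<< /Length {len(b_i)} >>\r\nstream\r\n").encode("ascii")
    tail = b"\r\nendstream\r\nendobj\r\n"
    w_i = head + b_i + tail
    return w_i, b_i, len(w_i)  # len(w_i) plays the role of l_i


J, i = 12, 1  # hypothetical: 12 original objects, first page block
w, b, l = make_watermark_object(b"Quantity: 1\r\nTotal: 50.00", b"k2-demo-key", obj_num=i + J)
# Inverting the two operations in reverse order recovers A_i.
recovered = zlib.decompress(keystream_xor(b, b"k2-demo-key"))
```

Because the XOR keystream is its own inverse, applying `keystream_xor` again and then decompressing restores the authentication information.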
B. Watermark embedding based on the file code
First, the $I$ PDF objects to be added, $W_i$, generated in step A4 are taken as the watermark information of the page blocks $P_i$: $W=\{W_i \mid i=1,2,\ldots,I\}$. The watermark information $W$ is merged with the file body $S_o=\{O_j \mid j=1,2,\ldots,J\}$ from step A2 to obtain the merged file body $S'_o=\{O'_n \mid n=1,2,\ldots,N\}$ containing $N=I+J$ merged objects $O'_n$, where $n$ is the object number of merged object $O'_n$ and

$$O'_n=\begin{cases} O_j\ (j=n), & n\le J \\ W_{n-J}, & n>J \end{cases}$$

At the same time, the length $l'_n$ of each merged object $O'_n$ is computed from the offset address information $x_j$:

$$l'_n=\begin{cases} l^{o}_{j}\ (j=n), & n\le J \\ l_{n-J}, & n>J \end{cases}$$

where $l^{o}_{j}$ denotes the byte length of the original object $O_j$, obtained from the offset addresses;
Then a pseudo-random sequence of length $I+J$ is generated with key $k_3$ and sorted in ascending order; the pre-sort positions, arranged in post-sort order, form the object scrambling sequence $V=\{v_n \mid n=1,2,\ldots,I+J\}$, where $v_n$ is the new object number of merged object $O'_n$ and $v_n$ is an integer in $[1, I+J]$. Scrambling is carried out accordingly: the object number of merged object $O'_n$ is changed to $v_n$, which gives the scrambled, watermarked file body $S^{w}_{o}=\{O'_{v_n} \mid v_n=1,2,\ldots,I+J\}$;
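The key-driven scrambling can be sketched in Python as an argsort of a seeded pseudo-random sequence. This is one reading of the description; the choice of `random.Random` as generator, the string key, and the helper names are assumptions, and a real implementation would also have to rewrite the references between objects, which the sketch ignores.

```python
import random


def scrambling_sequence(key, n):
    """Derive V = (v_1, ..., v_n), a permutation of 1..n, from key k3."""
    rng = random.Random(key)  # assumed generator; the text does not name one
    values = [rng.random() for _ in range(n)]
    # Pre-sort positions listed in ascending order of their values (argsort).
    order = sorted(range(n), key=values.__getitem__)
    return [pos + 1 for pos in order]


def scramble(objects, v):
    """Give merged object O'_n the new object number v_n."""
    return {v_n: obj for v_n, obj in zip(v, objects)}


v = scrambling_sequence("k3-demo-key", 6)
objects = [f"O{n}" for n in range(1, 7)]
scrambled = scramble(objects, v)
# Anyone holding k3 can regenerate V and undo the scrambling.
restored = [scrambled[v_n] for v_n in v]
```

Since the sequence depends only on the key and the object count, the verifier can regenerate the same $V$ and invert it, as step E2 requires.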
C. Updating the cross-reference table and the file trailer
From the watermarked file body $S^{w}_{o}$ obtained in step B and the object lengths $l'_n$, the offset addresses after merging are computed as

$$x^{w}_{v_n}=\begin{cases} a, & n=1 \\ x^{w}_{v_{n-1}}+l'_{n-1}, & n\in[2, I+J] \end{cases}$$

where $a$ is the offset address of the first object $O_1$, whose value is the byte length of the file header $S_h$ that states the PDF version. The cross-reference table is regenerated, and the root object number, the summary object number, and the total number of objects in the file trailer $S_t$ are updated to produce the updated file trailer;
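The offset recurrence of step C is a running sum, sketched below in Python. The helper names are hypothetical, the lengths are assumed to be listed in the order in which the objects are written to the file, and the entry layout follows the classic 20-byte cross-reference entry of the PDF format (10-digit offset, 5-digit generation number, `n` or `f`, two-byte line ending).

```python
def offsets_after_merge(header_len, lengths):
    """x_1 = a (header length); x_n = x_{n-1} + l'_{n-1} for n >= 2."""
    offsets = [header_len]
    for length in lengths[:-1]:
        offsets.append(offsets[-1] + length)
    return offsets


def xref_entry(offset, generation=0, in_use=True):
    """Format one classic cross-reference table entry (always 20 bytes)."""
    flag = "n" if in_use else "f"
    return f"{offset:010d} {generation:05d} {flag}\r\n"


# Hypothetical example: a 9-byte header followed by three objects.
offs = offsets_after_merge(9, [120, 300, 58])
entries = [xref_entry(o) for o in offs]
```

With these made-up lengths the three objects start at byte offsets 9, 129, and 429.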
D. Generation of the watermarked PDF electronic invoice file
The watermarked file body from step B is merged with the updated cross-reference table and the updated file trailer from step C and the file is reconstructed, which gives the watermarked invoice file $F^{w}$;
E. Authentication information reconstruction and authentication information extraction
E1. Authentication information reconstruction for the document under test:
The reader page $P^{\#}$ of the PDF electronic invoice file under test, $F^{\#}$, is divided into $I$ page blocks under test $P^{\#}_i$ according to the inherent layout coordinates of the information units, i.e. $P^{\#}=\{P^{\#}_i \mid i=1,2,\ldots,I\}$;
For each page block under test $P^{\#}_i$ $(i=1,2,\ldots,I)$, the iTextSharp component is first used to extract the text content under test $T^{\#}_i=\{t^{\#}_{i,q} \mid q=1,2,\ldots,z_i\}$ and the text coordinates under test $D^{\#}_i=\{d^{\#}_{i,q} \mid q=1,2,\ldots,z_i\}$, where $t^{\#}_{i,q}$ is the $q$-th information unit of $T^{\#}_i$, $z_i$ is the number of information units in $T^{\#}_i$, and $d^{\#}_{i,q}$ is the coordinate information of unit $t^{\#}_{i,q}$; a line-ending identifier "\r\n" must be inserted between consecutive information units $t^{\#}_{i,q}$;
Then the text digest under test $H^{\#}_i$ of the text content $T^{\#}_i$ and the text coordinates $D^{\#}_i$ is generated with key $k_1$: $H^{\#}_i=\mathrm{Hash}(k_1, T^{\#}_i \,\|\, D^{\#}_i)$;
E2. Authentication information extraction: the code page $S^{\#}$ of the PDF electronic invoice file under test $F^{\#}$ is parsed to obtain the file body under test. Using key $k_3$ and the object scrambling sequence $V$, the $I+J$ objects under test are inverse-scrambled, and the $I$ pseudo-objects among them, i.e. the watermark information under test $W^{*}_i$ $(i=1,2,\ldots,I)$, are located. The compressed data under test $B^{*}_i$ in the watermark information $W^{*}_i$ is decompressed and, using key $k_2$, decrypted to obtain the authentication information under test $A^{*}_i$;
According to the lengths of the text content under test $T^{\#}_i$, the text coordinates under test $D^{\#}_i$, and the text digest under test $H^{\#}_i$ from E1, the authentication information $A^{*}_i$ is split from left to right into the extracted text content $T^{*}_i$, the extracted text coordinates $D^{*}_i$, and the extracted text digest $H^{*}_i$ of the corresponding page block $P^{*}_i$. The extracted text content $T^{*}_i$ is then split at the line-ending identifier "\r\n" into the extracted information units $t^{*}_{i,q}$, and the corresponding extracted coordinate units $d^{*}_{i,q}$ are separated from the extracted text coordinates $D^{*}_i$, i.e. $T^{*}_i=\{t^{*}_{i,q} \mid q=1,2,\ldots,z_i\}$ and $D^{*}_i=\{d^{*}_{i,q} \mid q=1,2,\ldots,z_i\}$;
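The left-to-right split of the recovered authentication information can be sketched in Python as plain slicing by the three lengths obtained in E1. The function name and the sample byte strings are hypothetical; the 32-byte digest length simply matches a SHA-256-sized digest and is an assumption.

```python
EOL = b"\r\n"


def split_auth_info(a_star, len_t, len_d, len_h):
    """Cut A*_i into T*_i, D*_i, H*_i by length, then split the units at EOL."""
    t_star = a_star[:len_t]
    d_star = a_star[len_t:len_t + len_d]
    h_star = a_star[len_t + len_d:len_t + len_d + len_h]
    return t_star.split(EOL), d_star.split(EOL), h_star


# Hypothetical recovered authentication information for one page block.
text = b"Quantity: 1\r\nTotal: 50.00"
coords = b"120.00,430.50\r\n120.00,210.00"
digest = bytes(32)  # placeholder digest of SHA-256 size
units, coord_units, h_star = split_auth_info(
    text + coords + digest, len(text), len(coords), len(digest)
)
```

The slicing follows the description literally; it presumes that the lengths measured on the document under test still match the embedded fields.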
F. Tamper authentication
F1. Tamper determination: if the text content under test $T^{\#}_i$ equals the extracted text content $T^{*}_i$, and the text digest under test $H^{\#}_i$ equals the extracted text digest $H^{*}_i$, the PDF electronic invoice file under test $F^{\#}$ is judged not to have been tampered with, a "passed" result is output, and detection ends; otherwise the file under test is judged to have been tampered with and step F2 is carried out;
F2. Tamper localization and marking: in an invoice file $F^{\#}$ judged in step F1 to have been tampered with, every information unit under test $t^{\#}_{i,q}$ that differs from the extracted information unit $t^{*}_{i,q}$ is identified as a tampered information unit, and a "not passed" result is output. According to the corresponding coordinate information $d^{\#}_{i,q}$, a warning mark is added to the tampered information unit $t^{\#}_{i,q}$, so that the tamper indication is shown where the text is displayed on the corresponding page $P^{\#}$.
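Steps F1 and F2 amount to a comparison followed by a unit-by-unit scan, which the following Python sketch illustrates for one page block. The `authenticate` function, its return shape, and the sample values are hypothetical; the sketch returns the coordinates of differing units but does not draw any mark on the page.

```python
import hmac


def authenticate(extracted_units, test_units, test_coords, h_star, h_test):
    """Return (passed, tampered) for one page block.

    `tampered` lists (q, unit, coordinate) with q counted from 1 for every
    unit under test that differs from, or has no counterpart among, the
    extracted units.
    """
    passed = extracted_units == test_units and hmac.compare_digest(h_star, h_test)
    if passed:
        return True, []
    tampered = [
        (q + 1, test_units[q], test_coords[q])
        for q in range(len(test_units))
        if q >= len(extracted_units) or test_units[q] != extracted_units[q]
    ]
    return False, tampered


ok, marks = authenticate(
    extracted_units=["Quantity: 1", "Total: 50.00"],
    test_units=["Quantity: 11", "Total: 50.00"],
    test_coords=[(120.0, 430.5), (120.0, 210.0)],
    h_star=b"\x01" * 32,  # placeholder digests that differ
    h_test=b"\x02" * 32,
)
```

In this example the first unit is reported together with its coordinate, which is where a viewer would place the warning mark. Units deleted from the document under test have no coordinate there, so this simplified scan does not list them.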
Compared with the prior art, the beneficial effects of the invention are as follows:
First, tamper localization and marking: because the method uses the content of each information unit together with its coordinates as watermark information, once an invoice is found to have been tampered with, the system can mark the tampered information unit precisely using its coordinates in the invoice and, by setting the attributes of the marker, highlight the tampered unit in the reader, for example with a colored line or a marking frame.
Second, large watermark capacity and high security: methods that embed the watermark in the redundant space of the cross-reference table have a small watermark capacity, and the watermark is easily discovered, so security is low. This method instead divides the page into 5 blocks according to the invoice content displayed by the PDF reader. For each block, the text content, the text coordinates, the manually added marker information, and the hash value of the text and coordinate information serve as the watermark information. After encryption, compression, and the addition of the header and trailer parts that an object requires, each watermark is disguised as a legitimate PDF object and embedded in the PDF file; all objects in the file body are then scrambled with a key, and the PDF file is reconstructed. An attacker therefore has difficulty finding the embedded pseudo-objects, so security is high. In addition, each object carries far more watermark information than the redundant space of the cross-reference table can hold, so the watermark capacity is ample.
Third, low localization complexity and complete localization information: the method preprocesses all text and coordinate information of the invoice page (encryption, compression, and so on) and embeds it directly in the PDF source file as watermark information, with the file increment kept within 1%. The amount of watermark data required is therefore small, the file increment is small, and the localization information is complete. During localization, once the objects containing the watermark are found, the watermark information can be restored by decompression, decryption, and similar steps, so localization complexity is low.
In step A2 of the invention, the code page $S$ of the original PDF electronic invoice file is parsed according to the PDF file structure into the file header $S_h$, which states the PDF version, the file body $S_o$, the cross-reference table $S_x$, and the file trailer $S_t$. The file header $S_h$ has no practical significance for the invention and is therefore not analyzed, and the information in the file body $S_o$ can be found through the cross-reference table, so only the cross-reference table $S_x$ and the file trailer $S_t$ need to be analyzed. The specific procedure is:
The code page $S$ of the original PDF electronic invoice file is read into a buffer as bytes, and the keyword "trailer" that uniquely identifies the file trailer $S_t$ is located (because the invoice is a linear PDF file, only one "trailer" keyword exists). This locates the file trailer $S_t$. The root object information, the summary object information, the total number of objects given after the "Size" keyword, and similar information in the file trailer $S_t$ are recorded, which yields the information of the file trailer $S_t$;
At the same time, the keyword "startxref", which uniquely locates the cross-reference table $S_x$, is searched for, and each offset address $x_j$ in the cross-reference table $S_x$ is recorded together with the corresponding object number, which yields the information of the cross-reference table $S_x$.
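The keyword search described above can be sketched in Python with byte-level regular expressions. This is a minimal illustration on a made-up trailer fragment, not a general PDF parser: it assumes a single classic `trailer` dictionary, and mapping the "summary object" to the `/Info` entry is an assumption of the sketch.

```python
import re


def parse_trailer(data):
    """Read /Size, /Root, /Info and the startxref value from raw PDF bytes."""
    pos = data.rfind(b"trailer")
    if pos < 0:
        raise ValueError("no 'trailer' keyword found")
    tail = data[pos:]

    def ref(name):
        # Indirect reference of the form "/Name <number> <generation> R".
        m = re.search(rb"/" + name + rb"\s+(\d+)\s+(\d+)\s+R", tail)
        return int(m.group(1)) if m else None

    size = re.search(rb"/Size\s+(\d+)", tail)
    sx = re.search(rb"startxref\s+(\d+)", tail)
    return {
        "size": int(size.group(1)) if size else None,
        "root": ref(b"Root"),
        "info": ref(b"Info"),
        "startxref": int(sx.group(1)) if sx else None,
    }


# Hypothetical file fragment; a real invoice would carry its body before this.
sample = (b"%PDF-1.4\n"
          b"trailer\n<< /Size 13 /Root 1 0 R /Info 2 0 R >>\n"
          b"startxref\n9876\n%%EOF\n")
info = parse_trailer(sample)
```

The number following `startxref` is the byte offset at which the cross-reference table begins, so reading the offsets $x_j$ would continue from that position.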
The invention is described in further detail below with reference to the drawings and a specific embodiment.
Brief description of the drawings
Fig. 1 is the original PDF electronic invoice used to test the method's authentication of replacement tampering of part of the invoice content.
Fig. 2 is the watermarked invoice file obtained by embedding the watermark in the PDF electronic invoice of Fig. 1.
Fig. 3 is the PDF electronic invoice obtained by replacing part of the content of the watermarked invoice file of Fig. 2.
Fig. 4 is the PDF electronic invoice obtained by applying tamper localization and marking to the tampered invoice of Fig. 3.
Fig. 5 is the original PDF electronic invoice used to test the method's authentication of insertion and deletion tampering of the invoice content.
Fig. 6 is the watermarked invoice file obtained by embedding the watermark in the PDF electronic invoice of Fig. 5.
Fig. 7 is the PDF electronic invoice obtained by insertion and deletion tampering of the watermarked invoice file of Fig. 6.
Fig. 8 is the PDF electronic invoice obtained by applying tamper localization and marking to the tampered invoice of Fig. 7.
Fig. 9 is the original PDF electronic invoice used to test the method's authentication of simultaneous insertion, deletion, and replacement tampering.
Fig. 10 is the watermarked invoice file obtained by embedding the watermark in the PDF electronic invoice of Fig. 9.
Fig. 11 is the PDF electronic invoice obtained by insertion, deletion, and replacement tampering of the watermarked invoice file of Fig. 10.
Fig. 12 is the PDF electronic invoice obtained by applying tamper localization and marking to the tampered invoice of Fig. 11.
Embodiment
In one embodiment of the invention, the watermark embedding and authentication method capable of locating tampering in PDF electronic invoices is carried out with steps A through F exactly as set out in the Summary of the invention above.
The effect of the invention can be verified and illustrated by the following performance tests and analysis.
Here, watermark capacity is measured as the number of bytes embedded in a single file, and the invoice increment is measured as the percentage change in the file's byte count.
I. Analysis and statistics of watermark capacity and invoice size increment
For watermarking algorithms of the same kind, the larger the watermark capacity, the larger the file increment it causes.
In the existing literature the file increment is mostly kept within 5%. For the sake of concealment, to avoid arousing an attacker's suspicion, and to reduce storage space, the invention limits the watermark capacity to within 1%. The watermark capacity varies with the size of the content of each object block. Table 1 gives statistics on the source file size, page count, embedding time, watermark capacity, and increment for common PDF electronic invoices processed by the invention. As Table 1 shows, the file increment of the invention, i.e. the watermark capacity, is about 0.5%, which does not hinder file transmission. The embedding time column counts the time needed to embed the watermark in the file and to update the cross-reference table; this time is generally below 10 ms, so embedding is efficient and suitable for practical use.
Table 1. Parameter statistics for watermark embedding in electronic invoices by the method of the invention
Note: all invoices tested in the table come from an invoice generation system developed by the inventors themselves.
II. Tests of tamper detection performance
To verify the detection performance of the algorithm under partial content replacement, insertion, deletion, and mixed tampering, PDF electronic invoice files were tested with the method of the embodiment above.
1. Replacement tampering of part of the content
As shown in Fig. 1, the test object is a PDF electronic invoice with a file size of 98,458 bytes; the watermarked file generated by the invention is shown in Fig. 2. Fig. 1 and Fig. 2 show that the watermark is completely invisible. The watermark capacity is 458 bytes, the embedding time is 5 milliseconds, and the size increment of the watermarked electronic invoice is 0.48%. The following replacements were made to Fig. 2: 1) in the quantity column of item 1, "1" was replaced with "11"; 2) in the total in Renminbi (in words) column, "fifty yuan" was replaced with "ten yuan"; 3) in the invoice code, "111001371071" was replaced with "1110013710715"; 4) in the date column, "2015-02-02" was replaced with "2015-02-06"; 5) in the total (in figures) column, "$50" was replaced with "$10". Fig. 3 is the image after tampering with Fig. 2.
Fig. 4 is the localization image obtained by authenticating Fig. 3 with the method of the invention. The localization precision is the length of the text of a sub-block, and the localization mark is a red line. Because the invention can obtain the coordinates of all text in the invoice, the tampering is detected accurately in Fig. 4. The replacement results show that the invention locates tampering in the invoice accurately, with no missed detections and no false detections.
2. Addition and deletion tampering
As shown in Fig. 5, a PDF electronic invoice with a file size of 100,325 bytes is used as the test object; the watermarked file generated with the present invention is shown in Fig. 6. The watermark capacity is 492 bytes, the embedding time is 6 ms, and the size increment of the watermarked electronic invoice is 0.51%. Fig. 6 is tampered with as follows: 1) an item "fruitlet earphone" is added, with unit price "$20", quantity "1", and amount "20". Fig. 7 is the image obtained after tampering with Fig. 6.
Fig. 8 is the tamper localization image obtained by authenticating the tampered electronic invoice of Fig. 7 with the method of the invention. The localization precision is again the length of a sub-block's text information, and each tampered location is marked with a red line. Because the invention divides the file information into blocks and compares them one by one, obtains the coordinates of the text in the file, and also authenticates block by block, accurate localization is ensured. The test results show that the invention authenticates addition and deletion tampering well.
3. Simultaneous addition, deletion, and replacement tampering
With the original PDF electronic invoice shown in Fig. 9 as the test object, the watermarked file generated with the present invention is shown in Fig. 10. The watermarked electronic invoice file of Fig. 10 is tampered with as follows: 1) an item "hundred sparrow antelope mildy wash" is added, with unit price " ", quantity "1", and amount " "; 2) "business" in the trade classification and all payee information are deleted; 3) the capital-form total is changed from "20 yuan" to "50 yuan", the total in figures from "$20" to "50", the invoice date from "2015-02-03" to "2015-02-17", and the security code from "WbTPJuvt5Sj44bt8DM-N3" to "bPJbubvt5Sj44bt8DMN335". The result, shown in Fig. 11, is an electronic invoice file with multiple tampered regions covering all three tampering modes. Fig. 12 is the image obtained by authenticating Fig. 11 and locating the tampering with the method of the invention; the localization precision is the length of a sub-block's text information, and each tampered location is marked with a red line. Taken together, the above test results show that the algorithm of the invention still locates tampering accurately when addition, deletion, and replacement are all present in the same PDF electronic invoice file.

Claims (1)

1. A watermark embedding and authentication method capable of locating tampering in PDF electronic invoices, comprising the following steps:
A. Watermark generation based on the file's reader page
A1. Blocking: the reader page $P$ of the original PDF electronic invoice file is divided into $I$ page blocks $P_i$ according to the inherent layout coordinates of its information units, i.e. $P=\{P_i \mid i=1,2,\dots,I\}$, where $i$ is the index of page block $P_i$ and $I$ is the total number of page blocks on the whole reader page $P$;
A2. Parsing the file structure: the code page $S$ of the original PDF electronic invoice file is parsed according to the PDF file structure, yielding four parts: the file header $S_h$, which indicates the PDF version, the file body $S_o$, the cross-reference table $S_x$, and the file trailer $S_t$, i.e. $S=\{S_h, S_o, S_x, S_t\}$;
where the file body $S_o$ consists of $J$ objects $O_j$ ($j=1,2,\dots,J$), i.e. $S_o=\{O_j \mid j=1,2,\dots,J\}$, and $j$ is the object number;
the cross-reference table $S_x$ consists of $J$ index entries $X_j$ ($j=1,2,\dots,J$), $X_j=\{$object offset address $x_j$, number of times the object has been modified, whether the object is in use$\}$;
the file trailer $S_t$ contains the file description, the password information, and the root object number, summary object number, and total number of objects of the file body $S_o$;
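The four-part split of step A2 can be sketched as follows. This is only an illustration under stated assumptions: the file uses a single classic cross-reference table (no cross-reference streams or incremental updates), and `split_pdf` and `SAMPLE` are hypothetical names, not part of the claimed method.

```python
# Minimal sketch: split a PDF byte string into S_h, S_o, S_x, S_t.
# Assumption: one classic "xref" table and one "trailer" section.

SAMPLE = (
    b"%PDF-1.4\n"
    b"1 0 obj\n<< /Type /Catalog >>\nendobj\n"
    b"xref\n0 2\n0000000000 65535 f \n0000000009 00000 n \n"
    b"trailer\n<< /Size 2 /Root 1 0 R >>\nstartxref\n45\n%%EOF\n"
)


def split_pdf(data: bytes) -> dict:
    """Return the header, body, cross-reference table and trailer as bytes."""
    header_end = data.index(b"\n") + 1          # header is the first line
    # Search for "xref" at the start of a line so "startxref" is not matched.
    xref_pos = data.rindex(b"\nxref") + 1
    trailer_pos = data.index(b"trailer", xref_pos)
    return {
        "header": data[:header_end],            # S_h
        "body": data[header_end:xref_pos],      # S_o
        "xref": data[xref_pos:trailer_pos],     # S_x
        "trailer": data[trailer_pos:],          # S_t
    }


parts = split_pdf(SAMPLE)
print({name: len(chunk) for name, chunk in parts.items()})
```

Concatenating the four parts in order reproduces the input, which is the property the later merge step relies on.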
A3. Authentication information generation:
For each page block $P_i$ ($i=1,2,\dots,I$), the iTextSharp component is first used to extract the text content $T_i$ of the page block, $T_i=\{t_{i,q} \mid q=1,2,\dots,z_i\}$, and the text coordinates $D_i$, $D_i=\{d_{i,q} \mid q=1,2,\dots,z_i\}$, where $t_{i,q}$ is the $q$-th information unit of the text content $T_i$, $z_i$ is the number of information units in $T_i$, and $d_{i,q}$ is the coordinate information of information unit $t_{i,q}$; the line-end identifier "\r\n" is inserted between the information units $t_{i,q}$;
then, based on the key $k_1$, the digest $H_i$ of the text content $T_i$ and the text coordinates $D_i$ is generated, $H_i=\mathrm{Hash}(k_1, T_i \,\|\, D_i)$, where $\mathrm{Hash}(\cdot)$ denotes a hash function and $\|$ denotes concatenation;
finally, the text content $T_i$, the text coordinates $D_i$, and the digest $H_i$ of page block $P_i$ are concatenated to form the authentication information $A_i$ of page block $P_i$, $A_i=T_i \,\|\, D_i \,\|\, H_i$;
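A minimal sketch of step A3 for one page block is given below. The claim only names a keyed hash function; HMAC-SHA256, the "x,y" coordinate encoding, and the choice to terminate every unit with "\r\n" are assumptions made for illustration, and `build_auth_info` is a hypothetical helper.

```python
import hashlib
import hmac

LINE_END = "\r\n"  # line-end identifier used to delimit information units


def build_auth_info(units, coords, k1: bytes) -> bytes:
    """Return A_i = T_i || D_i || H_i for one page block.

    units  -- list of text information units t_{i,q}
    coords -- list of (x, y) coordinates d_{i,q}, one per unit
    k1     -- secret key for the keyed hash (HMAC-SHA256 is an assumption)
    """
    text = "".join(u + LINE_END for u in units).encode("utf-8")        # T_i
    coord = "".join(f"{x:.2f},{y:.2f}" + LINE_END
                    for x, y in coords).encode("utf-8")                # D_i
    digest = hmac.new(k1, text + coord, hashlib.sha256).digest()       # H_i
    return text + coord + digest


a_i = build_auth_info(["Qty: 1", "Total: 50"],
                      [(10.0, 20.0), (10.0, 40.0)], b"demo-key-k1")
print(len(a_i))
```

Because the digest covers both text and coordinates, changing either a unit or its position changes $H_i$.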
A4. Watermark information generation: for each page block $P_i$ ($i=1,2,\dots,I$), the authentication information $A_i$ generated in step A3 is compressed and then encrypted with the key $k_2$ to obtain the binary compressed data stream $B_i$; the PDF information $Y_i$ is added to form the PDF object to be added $W_i$, $W_i=\{B_i, Y_i\}$, and its byte length, denoted $l_i$, is computed at the same time;
the PDF information $Y_i$ comprises: the object number $i+J$, the object modification count, the object keywords, and newline characters;
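Step A4 can be sketched as compress, encrypt, then wrap in a PDF object. Everything specific here is an assumption: the claim does not name a compressor or cipher, so zlib and a toy SHA-256 keystream XOR stand in for them (the toy cipher is NOT secure and only keeps the example dependency-free), and the stream-object layout is one plausible form of $Y_i$.

```python
import hashlib
import zlib


def keystream_xor(data: bytes, key: bytes) -> bytes:
    """Toy cipher: XOR with SHA-256(key || counter) blocks. Illustration only."""
    out = bytearray()
    for start in range(0, len(data), 32):
        pad = hashlib.sha256(key + (start // 32).to_bytes(8, "big")).digest()
        out.extend(c ^ p for c, p in zip(data[start:start + 32], pad))
    return bytes(out)


def make_watermark_object(auth_info: bytes, k2: bytes, obj_num: int) -> bytes:
    """Return W_i = {B_i, Y_i} serialized as a PDF stream object."""
    b_i = keystream_xor(zlib.compress(auth_info), k2)   # compress, then encrypt
    head = f"{obj_num} 0 obj\n<< /Length {len(b_i)} >>\nstream\n".encode("ascii")
    return head + b_i + b"\nendstream\nendobj\n"


def read_watermark_object(obj: bytes, k2: bytes) -> bytes:
    """Inverse of make_watermark_object: decrypt, then decompress."""
    start = obj.index(b"stream\n") + len(b"stream\n")
    end = obj.rindex(b"\nendstream")
    return zlib.decompress(keystream_xor(obj[start:end], k2))


w_i = make_watermark_object(b"Qty: 1\r\nTotal: 50\r\n", b"demo-key-k2", 7)
l_i = len(w_i)  # byte length l_i used later for the offset update
print(l_i)
```

The object number passed in would be $i+J$ in the notation of the claim.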
B. Watermark embedding based on the file code
First, the $I$ PDF objects to be added $W_i$ generated in step A4 are taken as the watermark information of the page blocks $P_i$, $W=\{W_i \mid i=1,2,\dots,I\}$. This watermark information $W$ is merged with the file body $S_o=\{O_j \mid j=1,2,\dots,J\}$ from step A2 to obtain the merged file body $S'_o=\{O'_n \mid n=1,2,\dots,N\}$ containing $N=I+J$ merged objects $O'_n$, where $n$ is the object number of merged object $O'_n$ and

$$O'_n=\begin{cases}O_j, & n\le J \ (j=n)\\ W_{n-J}, & n>J\end{cases}$$

At the same time, the length $l'_n$ of each merged object $O'_n$ is computed from the offset address information $x_j$:

$$l'_n=\begin{cases}l^{o}_{j}, & n\le J \ (j=n)\\ l_{n-J}, & n>J\end{cases}$$

where $l^{o}_{j}$ is the length of original object $O_j$ and $l_{n-J}$ is the byte length of the added object $W_{n-J}$ from step A4.
Then, based on the key $k_3$, a pseudo-random sequence of length $I+J$ is generated and sorted in ascending order; the positions of the elements before sorting, listed in their order after sorting, form the object scrambling sequence $V=\{v_n \mid n=1,2,\dots,I+J\}$, where $v_n$ is the new object number of merged object $O'_n$ and $v_n$ is an integer in $[1, I+J]$. Scrambling is carried out accordingly: the object number of merged object $O'_n$ is changed to $v_n$, which gives the scrambled, watermarked file body $S_o^{w}=\{O'_{v_n} \mid v_n=1,2,\dots,I+J\}$;
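The key-driven scrambling of step B amounts to an argsort of a keyed pseudo-random sequence. The sketch below is an illustration only: Python's `random.Random` seeded with $k_3$ stands in for the unspecified generator (it is not cryptographically secure), the function names are hypothetical, and the rewriting of indirect references that a real PDF would need after renumbering is left out.

```python
import random


def scramble_sequence(n_objects: int, k3: int) -> list:
    """Return V = [v_1, ..., v_N], a key-dependent permutation of 1..N."""
    rng = random.Random(k3)
    noise = [rng.random() for _ in range(n_objects)]
    # Pre-sort positions listed in the order they take after sorting.
    order = sorted(range(n_objects), key=noise.__getitem__)
    return [pos + 1 for pos in order]


def scramble(objects: list, v: list) -> list:
    """Give merged object O'_n the new number v_n; list objects by new number."""
    out = [None] * len(objects)
    for obj, new_num in zip(objects, v):
        out[new_num - 1] = obj
    return out


def unscramble(scrambled: list, v: list) -> list:
    """Recover the merged order O'_1..O'_N from the scrambled body."""
    return [scrambled[new_num - 1] for new_num in v]


merged = ["O1", "O2", "O3", "W1", "W2"]          # J = 3 originals, I = 2 watermarks
v = scramble_sequence(len(merged), k3=2015)
body_w = scramble(merged, v)
assert unscramble(body_w, v) == merged
```

After unscrambling, the watermark objects are again the last $I$ entries, which is how the verifier can tell them apart from the original objects.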
C. Updating the cross-reference table and the file trailer
From the watermarked file body $S_o^{w}$ obtained in step B and the object length information $l'_n$, the offset addresses after merging are computed:

$$x^{w}_{v_n}=\begin{cases}a, & n=1\\ x^{w}_{v_{n-1}}+l'_{n-1}, & n\in[2, I+J]\end{cases}$$

where $a$ is the offset address of the first object $O_1$, whose value is the byte length of the file header $S_h$ that indicates the PDF version. The cross-reference table is regenerated, and at the same time the root object number, summary object number, and total number of objects in the file trailer $S_t$ are updated to generate the updated file trailer;
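The offset recurrence of step C is a running sum that starts at the header length. The sketch below illustrates it and formats the result as a classic cross-reference section; treating every entry as generation 0 and in use is an assumption, and both function names are hypothetical.

```python
from itertools import accumulate


def offsets(header_len: int, lengths: list) -> list:
    """x_1 = a (header byte length); x_n = x_{n-1} + l'_{n-1} for n >= 2."""
    return list(accumulate(lengths[:-1], initial=header_len))


def xref_table(offs: list) -> bytes:
    """Classic xref section: one free entry, then one 20-byte entry per object."""
    lines = [b"xref\n",
             f"0 {len(offs) + 1}\n".encode("ascii"),
             b"0000000000 65535 f \n"]
    lines += [f"{off:010d} 00000 n \n".encode("ascii") for off in offs]
    return b"".join(lines)


# Header of 9 bytes followed by three objects of 36, 50 and 20 bytes.
offs = offsets(9, [36, 50, 20])
print(offs)
print(xref_table(offs).decode("ascii"))
```

The last object's length is not needed for any object offset; it only determines where the cross-reference table itself begins.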
D. Generating the watermarked PDF electronic invoice file
The watermarked file body from step B and the updated cross-reference table and file trailer from step C are merged and reconstructed to obtain the watermarked invoice file $F^{w}$;
E. Authentication information reconstruction and authentication information extraction
E1. Authentication information reconstruction for the file under test:
The reader page $P^{\#}$ of the PDF electronic invoice file under test $F^{\#}$ is divided into $I$ page blocks under test $P^{\#}_i$ according to the inherent layout coordinates of its information units, i.e. $P^{\#}=\{P^{\#}_i \mid i=1,2,\dots,I\}$;
For each page block under test $P^{\#}_i$ ($i=1,2,\dots,I$), the iTextSharp component is first used to extract the text content under test $T^{\#}_i$, $T^{\#}_i=\{t^{\#}_{i,q} \mid q=1,2,\dots,z_i\}$, and the text coordinates under test $D^{\#}_i$, $D^{\#}_i=\{d^{\#}_{i,q} \mid q=1,2,\dots,z_i\}$, where $t^{\#}_{i,q}$ is the $q$-th information unit under test of the text content $T^{\#}_i$, $z_i$ is the number of information units in $T^{\#}_i$, and $d^{\#}_{i,q}$ is the coordinate information under test of information unit $t^{\#}_{i,q}$; the line-end identifier "\r\n" is inserted between the information units under test $t^{\#}_{i,q}$;
then, based on the key $k_1$, the text digest under test $H^{\#}_i$ of the text content under test $T^{\#}_i$ and the text coordinates under test $D^{\#}_i$ is generated, $H^{\#}_i=\mathrm{Hash}(k_1, T^{\#}_i \,\|\, D^{\#}_i)$;
E2. Authentication information extraction: the code page $S^{\#}$ of the PDF electronic invoice file under test $F^{\#}$ is parsed to obtain the file body under test $S_o^{w*}=\{O^{*}_{v_n} \mid v_n=1,2,\dots,I+J\}$. Using the key $k_3$ and the object scrambling sequence $V$, the $I+J$ objects under test are inverse-scrambled, and the $I$ corresponding pseudo-objects among them, i.e. the watermark information under test $W_i^{*}$ ($i=1,2,\dots,I$), are located. The compressed data under test $B_i^{*}$ in the watermark information under test $W_i^{*}$ is decompressed and decrypted with the key $k_2$ to obtain the authentication information under test $A_i^{*}$;
According to the lengths of the text content under test $T^{\#}_i$, the text coordinates under test $D^{\#}_i$, and the text digest under test $H^{\#}_i$ from step E1, the extracted text content $T_i^{*}$, extracted text coordinates $D_i^{*}$, and extracted text digest $H_i^{*}$ of the corresponding page block $P_i^{*}$ are split off from the authentication information under test $A_i^{*}$ in turn, from left to right. The extracted text content $T_i^{*}$ is then split at the line-end identifier "\r\n" into extracted information units $t^{*}_{i,q}$, and the corresponding extracted coordinate units $d^{*}_{i,q}$ are separated from the extracted text coordinates $D_i^{*}$, i.e. $T_i^{*}=\{t^{*}_{i,q} \mid q=1,2,\dots,z_i\}$, $D_i^{*}=\{d^{*}_{i,q} \mid q=1,2,\dots,z_i\}$;
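The left-to-right segmentation at the end of step E2 can be sketched as plain byte slicing followed by a split at the line-end identifier. The helper names are hypothetical, and the sketch assumes the three lengths are already known from step E1 and that each unit is terminated by "\r\n".

```python
LINE_END = b"\r\n"


def split_units(blob: bytes) -> list:
    """Split a text or coordinate field into its units at the line-end identifier."""
    if blob.endswith(LINE_END):
        blob = blob[:-len(LINE_END)]      # drop the final terminator
    return blob.split(LINE_END) if blob else []


def split_auth_info(a_star: bytes, text_len: int, coord_len: int, digest_len: int):
    """Cut A_i* into T_i*, D_i*, H_i* from left to right, then into units."""
    t_star = a_star[:text_len]
    d_star = a_star[text_len:text_len + coord_len]
    h_star = a_star[text_len + coord_len:text_len + coord_len + digest_len]
    return split_units(t_star), split_units(d_star), h_star


t = b"Qty: 1\r\nTotal: 50\r\n"
d = b"10.00,20.00\r\n10.00,40.00\r\n"
h = bytes(32)                              # placeholder 32-byte digest
units, coords, digest = split_auth_info(t + d + h, len(t), len(d), len(h))
print(units, coords)
```

Since the lengths are taken from the file under test, a tampering that changes the text length shifts the cut points; in that case the mismatch appears in the comparison of step F.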
F. Tamper authentication
F1. Tamper authenticity decision: if the text content under test $T^{\#}_i$ equals the extracted text content $T_i^{*}$, and the text digest under test $H^{\#}_i$ equals the extracted text digest $H_i^{*}$, the PDF electronic invoice file under test $F^{\#}$ is judged not to have been tampered with, the result "detection passed" is output, and detection ends; otherwise, the PDF electronic invoice file under test is judged to have been tampered with, and step F2 is carried out;
F2. Tamper localization and marking: for an invoice file $F^{\#}$ judged in step F1 to have been tampered with, every information unit under test $t^{\#}_{i,q}$ that is not equal to the extracted information unit $t^{*}_{i,q}$ is found; these are the tampered information units $t^{\#}_{i,q}$, and the result "detection failed" is output. According to the corresponding coordinate information $d^{\#}_{i,q}$, a warning mark is added to each tampered information unit $t^{\#}_{i,q}$, so that a tamper indication is displayed at the corresponding text on the page $P^{\#}$.
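Steps F1 and F2 can be sketched as a block-level equality check followed by a unit-by-unit comparison. The example is a simplified illustration: `authenticate_block` is a hypothetical helper, units are compared by position only, and drawing the red line on the rendered page is left out.

```python
import hmac


def authenticate_block(test_units, test_coords, test_digest, ext_units, ext_digest):
    """Return (passed, tampered) for one page block.

    tampered lists (unit, coordinate) pairs of the file under test whose
    text differs from the extracted unit at the same position.
    """
    same_text = test_units == ext_units
    same_digest = hmac.compare_digest(test_digest, ext_digest)
    if same_text and same_digest:
        return True, []                                   # F1: detection passed
    tampered = [
        (unit, coord)
        for idx, (unit, coord) in enumerate(zip(test_units, test_coords))
        if idx >= len(ext_units) or unit != ext_units[idx]
    ]
    return False, tampered                                # F2: units to mark


# Replacement tampering: the quantity "1" was changed to "11".
passed, marks = authenticate_block(
    [b"11", b"50"], [(120.0, 300.0), (400.0, 520.0)], b"a" * 32,
    [b"1", b"50"], b"b" * 32,
)
print(passed, marks)
```

The returned coordinates are what a renderer would use to place the tamper mark on the page.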
CN201510339156.7A 2015-06-17 2015-06-17 A watermark embedding and authentication method capable of locating tampering in PDF electronic invoices Expired - Fee Related CN104899822B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510339156.7A CN104899822B (en) 2015-06-17 2015-06-17 A watermark embedding and authentication method capable of locating tampering in PDF electronic invoices

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510339156.7A CN104899822B (en) 2015-06-17 2015-06-17 A watermark embedding and authentication method capable of locating tampering in PDF electronic invoices

Publications (2)

Publication Number Publication Date
CN104899822A true CN104899822A (en) 2015-09-09
CN104899822B CN104899822B (en) 2018-04-27

Family

ID=54032470

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510339156.7A Expired - Fee Related CN104899822B (en) 2015-06-17 2015-06-17 A watermark embedding and authentication method capable of locating tampering in PDF electronic invoices

Country Status (1)

Country Link
CN (1) CN104899822B (en)



Also Published As

Publication number Publication date
CN104899822B (en) 2018-04-27

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20180427

Termination date: 20200617