CN113656836B - Document processing method, apparatus, device, storage medium, and computer program product - Google Patents

Publication number
CN113656836B
Authority
CN
China
Prior art keywords
file
document
watermark information
recording
data area
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110982776.8A
Other languages
Chinese (zh)
Other versions
CN113656836A (en
Inventor
张琦守
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202110982776.8A
Publication of CN113656836A
Application granted
Publication of CN113656836B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6209 Protecting access to data via a platform, e.g. using keys or access control rules to a single file or object, e.g. in a secure envelope, encrypted and accessed using a key, or with access control rules appended to the object itself
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/64 Protecting data integrity, e.g. using checksums, certificates or signatures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00 Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/21 Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/2107 File encryption

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Bioethics (AREA)
  • Editing Of Facsimile Originals (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The disclosure provides a document processing method, a device, equipment, a storage medium and a computer program product, and relates to the technical field of data processing, in particular to the technical field of document security. The specific implementation scheme is as follows: positioning the tail part of a data area of a document to be processed, wherein the data area is an accessible area based on information recorded in a document head of the document to be processed; recording preset dark watermark information at the tail part of the data area to obtain a new document containing the preset dark watermark information, wherein the preset dark watermark information is a preset character string.

Description

Document processing method, apparatus, device, storage medium, and computer program product
Technical Field
The present disclosure relates to the field of data processing technologies, and in particular, to the field of document security technologies.
Background
With the rapid development of computer technology, many businesses and organizations have migrated their data processing services onto networks. These data processing procedures rely on documents containing sensitive information. Therefore, in data processing, it is becoming increasingly important to protect the confidentiality, privacy, authenticity, and integrity of data.
Disclosure of Invention
The present disclosure provides a document processing method, apparatus, device, storage medium, and computer program product.
According to a first aspect of the present disclosure, there is provided a document processing method including:
positioning the tail part of a data area of a document to be processed, wherein the data area is an accessible area based on information recorded in a document head of the document to be processed;
recording preset dark watermark information at the tail part of the data area to obtain a new document containing the preset dark watermark information, wherein the preset dark watermark information is a preset character string.
According to a second aspect of the present disclosure, there is provided a document processing apparatus comprising:
the positioning module is used for positioning the tail part of a data area of the document to be processed, wherein the data area is an accessible area based on information recorded in the document head of the document to be processed;
and the recording module is used for recording preset dark watermark information at the tail part of the data area to obtain a new document containing the preset dark watermark information, wherein the preset dark watermark information is a preset character string.
According to a third aspect of the present disclosure, there is provided an electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the document processing method described above.
According to a fourth aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to execute the document processing method described above.
According to a fifth aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements a document processing method according to the above.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic diagram of a first flow of a document processing method according to an embodiment of the disclosure;
FIG. 2 is a schematic diagram of watermarking provided in an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a second flow chart of a document processing method according to an embodiment of the disclosure;
FIG. 4 is a first schematic view of a document structure provided by an embodiment of the present disclosure;
FIG. 5 is a third flow chart of a document processing method according to an embodiment of the present disclosure;
FIG. 6 is a fourth flowchart of a document processing method according to an embodiment of the present disclosure;
FIG. 7a and FIG. 7b are schematic diagrams of watermarking using embodiments of the present disclosure;
FIG. 8 is a fifth flowchart of a document processing method according to an embodiment of the present disclosure;
FIG. 9 is a sixth flowchart of a document processing method according to an embodiment of the present disclosure;
FIG. 10 is a second schematic view of a document structure provided by an embodiment of the present disclosure;
FIG. 11 is a seventh flowchart of a document processing method according to an embodiment of the present disclosure;
FIG. 12 is a schematic view of an eighth flowchart of a document processing method according to an embodiment of the present disclosure;
FIG. 13 is a ninth flowchart of a document processing method according to an embodiment of the present disclosure;
FIG. 14 is a schematic view of a document processing apparatus according to an embodiment of the present disclosure;
FIG. 15 is a first block diagram of an electronic device for implementing a document processing method of an embodiment of the present disclosure;
FIG. 16 is a second block diagram of an electronic device for implementing a document processing method of an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Data processing relies on documents containing sensitive information. This makes it increasingly important to protect the confidentiality, privacy, authenticity and integrity of data in data processing.
In the related art, the confidentiality, privacy, authenticity and integrity of data are protected by adding a watermark in a document. Wherein, according to the visibility of the watermark, the watermark is divided into: visible watermarks (which may also be referred to as explicit watermarks or bright watermarks) and invisible watermarks (which may also be referred to as hidden watermarks or dark watermarks).
At present, in the field of data processing, there are many schemes for adding watermarks to media resources such as pictures or videos, but few schemes for adding watermarks to documents, and fewer still for adding dark watermarks to documents.
In the related art, in the scheme of adding a dark watermark to a document, a data area of the document needs to be modified, for example, a line of text, a group of words, a group of characters, or the like in the original document is migrated, or character features in the original document are modified, or the size of characters in the original document is adjusted, so that the sizes of the watermark and the characters in the original document are different, and the like.
Therefore, the scheme of adding the dark watermark can influence the content of the original document, and the algorithm for adding the watermark is complex, so that the realization cost is high.
In addition, each scheme for adding a dark watermark in the related art targets a specific type of document, so its generality is poor.
To solve the above-described problems, an embodiment of the present disclosure provides a document processing method, as shown in fig. 1. The method comprises the following steps:
step S11, locating the tail part of a data area of the document to be processed, wherein the data area is an area accessible based on information recorded in the document head of the document to be processed.
In the embodiment of the present disclosure, the structure type of the document to be processed may be a ZIP structure type, an OLE2 structure type, or a character string structure type. The document to be processed may also have another structure type; this is not limited.
Documents of the ZIP structure type may include, but are not limited to: docx, pptx, xlsx, epub, etc. Documents of the OLE2 structure type may include, but are not limited to: doc, ppt, xls, wps, etc. Documents of the character string structure type may include, but are not limited to: pdf documents, etc.
The accessible area in the document to be processed is controlled by the information recorded in the header of the document. Wherein the accessible area in the document to be processed is called a data area. The data area is used for recording information for viewing by a user.
In the embodiment of the disclosure, after the document to be processed is acquired, the document to be processed is analyzed, and the tail part of the data area of the document to be processed is determined.
Step S12, recording preset dark watermark information at the tail of the data area to obtain a new document containing the preset dark watermark information, wherein the preset dark watermark information is a preset character string.
In the embodiment of the disclosure, after determining the tail of the data area, recording preset dark watermark information at the tail of the data area to obtain a new document containing the preset dark watermark information.
For example, consider the watermarking scheme shown in FIG. 2. In FIG. 2, dark watermark information is added at the tail of the data area of document 1 (i.e., the document to be processed), resulting in a new document 2 that includes the dark watermark information.
The preset dark watermark information is a preset character string. The preset character string is recorded at the tail of the data area, that is, in an extension area outside the data area. The data area is the accessible area of the document to be processed; because the preset character string is located in an inaccessible area, it is invisible to anyone viewing the document. Based on the definition of the watermark, the preset character string may be referred to as a dark watermark.
In the technical scheme provided by the embodiment of the disclosure, the preset dark watermark information is recorded at the tail of the data area, that is, it is added in the extension area outside the data area, so the data area of the original document does not need to be modified. The scheme provides traceability evidence, deters malicious behavior, reduces document loss, and reduces the influence on the data area of the original document.
In addition, in the technical scheme provided by the embodiment of the disclosure, only the preset dark watermark information needs to be directly recorded at the tail part of the data area, and any complex operation such as text, word or character migration or character feature modification is not needed. Thus, the cost of adding the dark watermark information is reduced.
Furthermore, in the technical scheme provided by the embodiment of the disclosure, only the preset dark watermark information needs to be directly recorded at the tail part of the data area, the structure type of the document does not need to be distinguished, and the algorithm for adding the dark watermark information does not need to be adjusted based on the structure type of the document. Therefore, the universality of the scheme for adding the dark watermark information is improved.
A watermark information extraction method corresponds to the document processing method provided by the embodiment of the disclosure. After the new document is obtained, when the preset dark watermark information is to be extracted from it, the length of the preset dark watermark information can be determined, and a character string of that length is extracted from the tail of the data area of the new document and used as the preset dark watermark information. Compared with extraction methods that must reverse complex operations such as migrating text, words or characters or modifying character features, this extraction method is simpler and more convenient, and it reduces the cost of extracting the dark watermark information.
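The recording and extraction steps described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the data area ends at the last byte of the original document, so that recording at the tail amounts to appending bytes; the function names `embed_at_tail` and `extract_from_tail` and the sample bytes are hypothetical and do not come from the disclosure.

```python
def embed_at_tail(document: bytes, watermark: bytes) -> bytes:
    """Return a new document with the watermark recorded after the data area.

    Assumption: the data area ends at the last byte of `document`.
    """
    return document + watermark


def extract_from_tail(new_document: bytes, watermark_length: int) -> bytes:
    """Read back `watermark_length` bytes from the tail of the new document."""
    if watermark_length <= 0 or watermark_length > len(new_document):
        raise ValueError("watermark length out of range")
    return new_document[-watermark_length:]


# Hypothetical sample data, used only to exercise the two helpers.
original = b"hypothetical document bytes"
mark = b"watermark_test"

marked = embed_at_tail(original, mark)
recovered = extract_from_tail(marked, len(mark))
```

In this sketch the original bytes are left untouched at the front of the new document, which mirrors the point that the data area itself is not modified.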
In the embodiment of the disclosure, the data area of the document can be accessed based on the information recorded in the document header, and the information in the extension area outside the data area cannot be accessed. Therefore, the preset dark watermark information added in the extension area does not affect the content of the document.
In one example, after a new document containing the preset dark watermark information is generated, the attribute of the new document may be set to prohibit editing or "save as".
In practical applications, when a document (e.g., a document of the OLE2 structure type) is edited or re-saved, the document is regenerated, and the preset dark watermark information added in the extension area is lost. In the embodiment of the disclosure, the attribute of the new document is set to prohibit editing or "save as", so that the watermark is retained and the confidentiality, privacy, authenticity and integrity of data can be effectively ensured.
In one embodiment of the present disclosure, the embodiment of the present disclosure further provides a document processing method, as shown in fig. 3, in which a central directory record is included at the tail of the data area. The method may comprise the steps of:
step S31, locating the tail part of a data area of the document to be processed, wherein the data area is an area accessible based on information recorded in the document head of the document to be processed. Step S31 is the same as step S11.
Step S32, recording preset dark watermark information in the custom attribute file contained in the central directory record.
Documents of the ZIP structure type include: a data area and an extension area located outside the data area (i.e., at the tail of the data area). The extension area includes: the central directory record and the end data block of the central directory record.
The size and address of the central directory record are recorded in the end data block of the central directory record. The central directory record holds information on various data blocks. The information of the data blocks may include a first type file, a second type file, and a third type file. The first type file is used to record the names and types of the files, for example, the [Content_Types].xml file. The second type file is used to record the relationships between the contained files, for example, the .rels file under the _rels directory. The third type file is used to set document property content, i.e., it is the custom attribute file, for example, custom.xml under the docProps directory.
In the embodiment of the disclosure, a document to be processed is analyzed, a user-defined attribute file contained in a central directory record is determined, and preset dark watermark information is recorded in the user-defined attribute file.
Step S33, the custom attribute file is moved to the tail of the data area.
In the embodiment of the disclosure, after the preset dark watermark information is recorded in the custom attribute file, the custom attribute file is moved to the tail of the data area.
Steps S32 to S33 are refinements to step S12.
The central directory record has recorded therein information of various data blocks. The custom property file is only one of the various data blocks of the central directory record. In the embodiment of the disclosure, the custom attribute file is moved to the tail part of the data area, so that the custom attribute file is close to the data area, and the dark watermark information is conveniently extracted.
For example, consider the document structure shown in FIG. 4. In FIG. 4, for document 1, the preset dark watermark information is recorded in custom.xml under the docProps directory, and the custom.xml in which the preset dark watermark information is recorded is moved to the tail of the data area, so as to obtain document 2.
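One way to approximate steps S32 and S33 with Python's standard `zipfile` module is sketched below. The sketch rewrites the archive, copies every member except docProps/custom.xml, and then writes a new custom.xml last, so that its entry sits at the end of the file data and at the end of the central directory. The helper name `record_and_move_to_tail` and the `<watermark>` element are hypothetical, the payload is not valid Office custom-property markup, and member metadata such as timestamps is not preserved.

```python
import io
import zipfile

CUSTOM_NAME = "docProps/custom.xml"


def record_and_move_to_tail(zip_bytes: bytes, watermark: str) -> bytes:
    """Rewrite a ZIP-type document so that custom.xml carries the watermark
    and is the last member written (hypothetical helper)."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as src:
        with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
            for name in src.namelist():
                if name != CUSTOM_NAME:  # drop any existing custom.xml
                    dst.writestr(name, src.read(name))
            # Assumed payload format: a single illustrative element.
            payload = "<watermark>" + watermark + "</watermark>"
            dst.writestr(CUSTOM_NAME, payload)
    return out.getvalue()


def build_sample() -> bytes:
    """Build a tiny stand-in for a docx-like archive (sample data only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("[Content_Types].xml", "<Types/>")
        zf.writestr(CUSTOM_NAME, "<Properties/>")
        zf.writestr("word/document.xml", "<document/>")
    return buf.getvalue()


marked = record_and_move_to_tail(build_sample(), "watermark_test")
with zipfile.ZipFile(io.BytesIO(marked)) as zf:
    names = zf.namelist()
    custom = zf.read(CUSTOM_NAME)
```

Rewriting the whole archive is a simplification chosen for brevity; the disclosure describes modifying the central directory record rather than regenerating the document.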
In the embodiment of the disclosure, the addition of the preset dark watermark information can be completed by simply modifying the custom attribute file contained in the central directory record. While the central directory record is located outside the data area. Therefore, the preset dark watermark information is added through the technical scheme provided by the embodiment of the disclosure, and the data area of the original document is not affected.
In addition, in the technical scheme provided by the embodiment of the disclosure, the addition of the preset dark watermark information can be completed only by simply modifying the custom attribute file contained in the central directory record, so that the cost of adding the dark watermark information is further reduced.
The technical scheme provided by the embodiment of the disclosure can be suitable for adding the dark watermark information in batches due to lower cost of adding the dark watermark information, and improves the efficiency of adding the dark watermark information.
In one embodiment of the present disclosure, the embodiment of the present disclosure further provides a document processing method, as shown in fig. 5, which may include the following steps.
Step S51, locating the tail part of a data area of the document to be processed, wherein the data area is an area accessible based on information recorded in the document head of the document to be processed. Step S51 is the same as step S11.
Step S52, analyzing the end data block of the central directory record of the document to be processed to obtain the size and address of the central directory record.
Step S53, the size and address of the central directory record are analyzed, and the directory address of the custom attribute file is determined.
As described above for step S32 in FIG. 3, the information recorded in the central directory record includes: a first type file, a second type file, and a third type file. The first type file is used for recording the names and types of the files, and the second type file is used for recording the relationships among the files. The third type file, located at its directory address, is used for setting the document attribute content; the third type file is the custom attribute file.
In the embodiment of the disclosure, the starting position and the ending position of the central directory record can be determined according to the size and the address of the central directory record. The first type file and the second type file are read from a storage area between a start position and an end position of the central directory record. And determining the directory address of the third type file according to the file name and the type recorded in the first type file and the relation between the files recorded in the second type file.
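Steps S52 and S53 can be sketched with the standard `struct` module, assuming the usual ZIP layout in which the end data block (the end of central directory record) is 22 bytes plus an optional comment and each central directory entry starts with a 46-byte fixed header. The sketch locates the custom attribute file simply by walking the entry names, which is a simplification of the lookup through the first and second type files described above. The function names are hypothetical, and ZIP64 archives and signatures embedded in comments are not handled.

```python
import io
import struct
import zipfile

EOCD_SIG = b"PK\x05\x06"   # end of central directory record
CDFH_SIG = b"PK\x01\x02"   # central directory file header


def parse_end_record(data: bytes):
    """Return (address, size, entry count) of the central directory."""
    pos = data.rfind(EOCD_SIG)
    if pos < 0:
        raise ValueError("end data block not found")
    fields = struct.unpack_from("<4s4H2LH", data, pos)
    n_total, cd_size, cd_offset = fields[4], fields[5], fields[6]
    return cd_offset, cd_size, n_total


def list_directory_names(data: bytes):
    """Walk the central directory between its start and end positions."""
    cd_offset, cd_size, _ = parse_end_record(data)
    names, pos, end = [], cd_offset, cd_offset + cd_size
    while pos < end:
        fields = struct.unpack_from("<4s6H3L5H2L", data, pos)
        if fields[0] != CDFH_SIG:
            raise ValueError("unexpected central directory entry")
        name_len, extra_len, comment_len = fields[10], fields[11], fields[12]
        names.append(data[pos + 46:pos + 46 + name_len].decode("utf-8"))
        pos += 46 + name_len + extra_len + comment_len
    return names


# Sample archive built only to exercise the parser.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("[Content_Types].xml", "<Types/>")
    zf.writestr("_rels/.rels", "<Relationships/>")
    zf.writestr("docProps/custom.xml", "<Properties/>")
data = buf.getvalue()

cd_offset, cd_size, n_total = parse_end_record(data)
names = list_directory_names(data)
has_custom = "docProps/custom.xml" in names
```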
Step S54, recording preset dark watermark information in the custom attribute file under the directory address of the custom attribute file.
The custom property file is contained under the directory address of the custom property file. After the directory address is determined, a custom attribute file under the directory address can be determined, and then preset dark watermark information is recorded in the custom attribute file.
Steps S52, S53 and S54 are refinements to step S32.
Step S55, the custom attribute file is moved to the tail of the data area. Step S55 is the same as step S33.
Through the steps S52-S54, the user-defined attribute files outside the data area are modified, so that the dark watermark information is added into the document to be processed, the influence on the data area of the original document is reduced, and the cost for adding the dark watermark information is reduced.
In one embodiment of the present disclosure, the embodiment of the present disclosure further provides a document processing method, as shown in fig. 6, which may include the following steps.
Step S61, locating the tail of a data area of the document to be processed, wherein the data area is an area accessible based on information recorded in the document header of the document to be processed. Step S61 is the same as step S11.
Step S62, analyzing the end data block of the central directory record of the document to be processed to obtain the size and address of the central directory record. Step S62 is the same as step S52.
Step S63, analyzing the size and address of the central directory record, and determining the directory address of the custom attribute file. Step S63 is the same as step S53.
Step S64, the length of the custom attribute file under the directory address of the custom attribute file is modified to the length of the preset dark watermark information.
The length of the custom attribute file under the directory address of the custom attribute file is simply called the record length. After determining the custom property file, the recording length may be modified to the length of the preset dark watermark information.
Step S65, based on the length of the modified custom attribute file, the preset dark watermark information is recorded in the custom attribute file.
After the record length is modified, based on the modified record length, the preset dark watermark information is recorded in the custom attribute file.
In one example, the custom attribute file is a comment field. The comment field includes a comment length and comment content. Take the documents shown in FIG. 7a and FIG. 7b as an example. FIG. 7a shows the document before the preset dark watermark information is added, and FIG. 7b shows the document after the preset dark watermark information is added. The preset dark watermark information "77 61 74 65 72 6D 61 72 6B 5F 74 65 73 74" is 14 bytes in length. After the comment field is determined, the comment length is modified to 14 bytes, shown as "0E 00" in FIG. 7b. Thereafter, "77 61 74 65 72 6D 61 72 6B 5F 74 65 73 74" is added to the comment field.
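The byte values in this example can be checked with a short sketch. It assumes that the comment field in question is the archive comment stored in the end of central directory record, which Python's `zipfile` module exposes as `ZipFile.comment`; the disclosure does not state which comment field FIG. 7b shows, and the helper names below are hypothetical.

```python
import io
import struct
import zipfile

# The 14 bytes quoted in the text; they decode to ASCII "watermark_test".
WATERMARK = bytes.fromhex("77 61 74 65 72 6D 61 72 6B 5F 74 65 73 74")


def add_comment_watermark(zip_bytes: bytes, watermark: bytes) -> bytes:
    """Record the watermark in the archive comment (assumed comment field)."""
    buf = io.BytesIO(zip_bytes)
    with zipfile.ZipFile(buf, "a") as zf:
        zf.comment = watermark  # the end record is rewritten on close
    return buf.getvalue()


def build_sample() -> bytes:
    """Build a one-member archive used only as sample input."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", "<document/>")
    return buf.getvalue()


marked = add_comment_watermark(build_sample(), WATERMARK)

# The last 2 + 14 bytes are the little-endian comment length and the comment.
tail = marked[-(2 + len(WATERMARK)):]
comment_length = struct.unpack("<H", tail[:2])[0]
```

The two length bytes are stored little-endian, which is why a length of 14 appears as "0E 00".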
Steps S64 and S65 are refinements to step S54.
Step S66, the custom attribute file is moved to the tail of the data area. Step S66 is the same as step S33.
In the embodiment of the disclosure, when preset dark watermark information is recorded in the custom attribute file, the recording length is modified to the length of the preset dark watermark information. Therefore, the preset dark watermark information is completely written into the custom attribute file, and the waste of storage space can be avoided.
In another embodiment of the present disclosure, after determining the custom attribute file, if the recording length is greater than or equal to the length of the preset dark watermark information, the preset dark watermark information is directly recorded in the custom attribute file; if the recording length is smaller than the length of the preset dark watermark information, the information with the appointed length can be extracted from the preset dark watermark information according to the preset rule, and the information is used as the compressed preset dark watermark information, and the compressed preset dark watermark information is recorded in the custom attribute file. Wherein the specified length is less than or equal to the recording length.
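The length check in this embodiment can be sketched as follows. The disclosure does not specify the preset rule, so the sketch assumes the simplest one, keeping the leading bytes, and takes the specified length to be equal to the recording length; `fit_watermark` is a hypothetical name.

```python
def fit_watermark(watermark: bytes, record_length: int) -> bytes:
    """Return the bytes to record for a given recording length.

    Assumed preset rule: when the watermark is too long, keep its
    leading `record_length` bytes.
    """
    if record_length < 0:
        raise ValueError("record length must be non-negative")
    if record_length >= len(watermark):
        return watermark               # fits: record it directly
    return watermark[:record_length]   # too long: compressed form


full = fit_watermark(b"watermark_test", 20)
short = fit_watermark(b"watermark_test", 9)
```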
After determining the custom attribute file, other ways may be used to record the preset dark watermark information in the custom attribute file.
In one embodiment of the present disclosure, the embodiment of the present disclosure further provides a document processing method, as shown in fig. 8, which may include the following steps.
Step S81, the tail part of a data area of the document to be processed is positioned, wherein the data area is an area accessible based on information recorded in the document head of the document to be processed. Step S81 is the same as step S11.
Step S82, adding a custom attribute file at the end of the central directory record of the document to be processed, wherein the length of the custom attribute file is the length of the preset dark watermark information.
In the disclosed embodiment, the central directory record of the document to be processed does not contain custom property files. In this case, the custom attribute file may be constructed, where the length of the custom attribute file is the length of the preset dark watermark information, and the custom attribute file is added at the end of the central directory record of the document to be processed.
In one embodiment of the present disclosure, the central directory record of the document to be processed contains a custom attribute file. Before the preset dark watermark information is recorded, the existing custom attribute file can be removed. This reduces the influence of the original custom attribute file on the writing of the preset dark watermark information and improves the accuracy of adding the preset dark watermark information.
After the custom property file is purged, step S82 may be performed to add the custom property file at the end of the central directory record of the document to be processed.
Step S83, recording preset dark watermark information in the custom attribute file.
Steps S82 and S83 are refinements to step S32.
Step S84, the custom property file is moved to the tail of the data area. Step S84 is the same as step S33.
In the embodiment of the disclosure, for a document that does not contain a custom attribute file, a custom attribute file is added at the end of the central directory record of the document, and the preset dark watermark information is then recorded in the custom attribute file. This adds the dark watermark information outside the data area of a document that does not contain a custom attribute file, reduces the influence on the data area of the original document, and reduces the cost of adding the dark watermark information.
In one embodiment of the present disclosure, the embodiment of the present disclosure further provides a document processing method, as shown in fig. 9, which may include the following steps.
Step S91, locating the tail of a data area of the document to be processed, wherein the data area is an area accessible based on information recorded in the document header of the document to be processed. Step S91 is the same as step S11.
Step S92, recording preset dark watermark information in the custom attribute file contained in the central directory record. Step S92 is the same as step S32.
Step S93, the custom property file is moved to the tail of the data area. Step S93 is the same as step S33.
In step S94, the first type file and the second type file contained in the central directory record are moved to the tail of the data area.
The central directory record contains a first type of file and a second type of file. The first type file is used for recording the names and types of the files, and the second type file is used for recording the relations among the files. That is, according to the first type file and the second type file, the directory address of the custom attribute file can be determined, so that the addition of the dark watermark information is realized.
In the embodiment of the disclosure, the first type file and the second type file may be located in front of the custom property file or may be located behind the custom property file. This is not limited.
In the embodiment of the disclosure, besides moving the custom attribute file to the tail part of the data area, the first type file and the second type file can also be moved to the tail part of the data area, so that the integral structure of the central directory record outside the data area is adjusted, and the security and confidentiality of the document can be effectively improved.
For example, consider the document structure shown in fig. 10. In document 1, [Content_Types].xml and the .rels file under the _rels directory are constructed. [Content_Types].xml and .rels indicate that custom.xml under the docProps directory is located at the end of the central directory record. The custom.xml under the docProps directory is then constructed, the preset dark watermark information is recorded in custom.xml, and [Content_Types].xml, .rels, and the custom.xml in which the preset dark watermark information has been recorded are moved to the tail of the data area, so that document 2 is obtained.
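The reordering in this example can be sketched as follows. This is an illustration under stated assumptions rather than the disclosed implementation: the document is treated as an Office Open XML ZIP package, the package is rebuilt with Python's `zipfile` module instead of being patched in place, and `[Content_Types].xml` and `_rels/.rels` are assumed to exist and to already reference `docProps/custom.xml`. The function name `move_parts_to_tail` and the `custom_xml` payload are hypothetical.

```python
import io
import zipfile

# Parts that are written last, so that they sit at the tail of the data area.
TAIL_PARTS = ("[Content_Types].xml", "_rels/.rels", "docProps/custom.xml")


def move_parts_to_tail(document: bytes, custom_xml: bytes) -> bytes:
    """Rebuild the package with the three parts above at the tail.

    `custom_xml` is the custom.xml content that already carries the
    preset dark watermark information (supplied by the caller).
    """
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(document)) as zin, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as zout:
        # 1. Copy every other part in its original order.
        for info in zin.infolist():
            if info.filename not in TAIL_PARTS:
                zout.writestr(info.filename, zin.read(info.filename))
        # 2. Append the first type file and the second type file.
        for name in TAIL_PARTS[:2]:
            zout.writestr(name, zin.read(name))
        # 3. Append the watermarked custom attribute file last.
        zout.writestr(TAIL_PARTS[2], custom_xml)
    return out.getvalue()
```

Because the archive is rewritten, `zipfile` regenerates the central directory record and its end data block with the new sizes and addresses; an in-place implementation would have to update those fields itself.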
In one embodiment, the present disclosure further provides a document processing method which, as shown in fig. 11, may include the following steps.
Step S111, locating the tail of a data area of the document to be processed, wherein the data area is an area accessible based on information recorded in the document header of the document to be processed. Step S111 is the same as step S11.
Step S112, recording preset dark watermark information in the custom attribute file contained in the central directory record. Step S112 is the same as step S32.
Step S113, the custom property file is moved to the tail of the data area. Step S113 is the same as step S33.
In step S114, the first type file and the second type file contained in the central directory record are moved to the tail of the data area. Step S114 is the same as step S94.
Step S115, according to the size and address of the moved custom attribute file, and the size and address of the moved first type file and second type file, updating the information recorded in the end data block of the central directory record and the information recorded in the central directory record.
The directory address of the custom property file is determined according to the first type file and the second type file, and the size and address of each of the custom property file, the first type file and the second type file are contained in the central directory record.
After the preset dark watermark information is recorded in the custom attribute file, the size of the custom attribute file is changed. After the custom property file, the first type file, and the second type file are moved, addresses of the custom property file, the first type file, and the second type file are changed. Thus, the size and address of the central directory record will change, as will the addresses of the first type of file and the second type of file.
In the embodiment of the disclosure, after the custom attribute file, the first type file and the second type file are moved, the information recorded in the end data block of the central directory record and the information recorded in the central directory record are updated, so that the follow-up accurate extraction of the preset dark watermark information from the new document can be ensured.
In embodiments of the present disclosure, the data updated in the end data block of the central directory record may include the size and address of the central directory record. The data updated in the central directory record may include the offset information of each data block, such as the offset information of the custom attribute file, the first type file, and the second type file described above.
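To make the fields involved concrete, the following minimal sketch reads them from a ZIP container. It assumes a single-disk archive without ZIP64 extensions and with no data before the first entry; the function name `read_central_directory` is hypothetical. It returns the size and address of the central directory record taken from the end data block, together with the name, local header offset, and compressed size of each entry taken from the central directory record, which are the values that have to be rewritten after entries are moved.

```python
import struct

EOCD_SIG = b"PK\x05\x06"   # end data block of the central directory record
CDFH_SIG = b"PK\x01\x02"   # one entry of the central directory record


def read_central_directory(data: bytes):
    """Return (cd_size, cd_offset, [(name, local_header_offset, compressed_size)])."""
    # Simplification: take the last occurrence of the signature as the end block.
    pos = data.rfind(EOCD_SIG)
    if pos < 0:
        raise ValueError("end data block of the central directory record not found")
    # signature, 4 x uint16 (disk fields and entry counts), cd size, cd offset,
    # comment length
    _, _, _, _, total, cd_size, cd_offset, _ = struct.unpack(
        "<4s4H2LH", data[pos:pos + 22])

    entries = []
    p = cd_offset
    for _ in range(total):
        fields = struct.unpack("<4s6H3L5H2L", data[p:p + 46])
        if fields[0] != CDFH_SIG:
            raise ValueError("corrupt central directory record")
        flags, comp_size = fields[3], fields[8]
        name_len, extra_len, comment_len = fields[10], fields[11], fields[12]
        local_offset = fields[16]
        raw_name = data[p + 46:p + 46 + name_len]
        # Bit 11 of the flags marks a UTF-8 encoded name; otherwise CP437.
        name = raw_name.decode("utf-8" if flags & 0x800 else "cp437")
        entries.append((name, local_offset, comp_size))
        p += 46 + name_len + extra_len + comment_len
    return cd_size, cd_offset, entries
```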
The following describes the document processing flow shown in fig. 12 in detail, taking the case where the custom attribute file is a comment field as an example.
Step S121 clears the comment field in the end data block of the central directory record.
See the description of step S82.
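A minimal sketch of clearing the comment field, assuming the document is a ZIP container whose end data block of the central directory record is the last such block in the file; the function name `clear_zip_comment` is hypothetical. The comment length stored in the last two bytes of the fixed 22-byte block is set to zero and the comment bytes that follow are dropped.

```python
import struct

EOCD_SIG = b"PK\x05\x06"
EOCD_FIXED_SIZE = 22  # size of the end data block without the comment


def clear_zip_comment(data: bytes) -> bytes:
    """Return a copy of the document with an empty comment field."""
    # Simplification: assumes the signature does not also occur inside
    # the comment itself.
    pos = data.rfind(EOCD_SIG)
    if pos < 0:
        raise ValueError("end data block of the central directory record not found")
    fixed = data[pos:pos + EOCD_FIXED_SIZE]
    # Keep the first 20 bytes, then write a zero comment length.
    return data[:pos] + fixed[:-2] + struct.pack("<H", 0)
```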
Step S122, the information recorded in the end data block of the central directory record is analyzed to obtain the size and address of the central directory record.
Step S123, the information recorded in the central directory record is analyzed to obtain a first type file, a second type file and a third type file.
Step S124, reconstruct the document.
Wherein the reconstruction of the document can be seen in the schematic diagrams shown in fig. 4 and 10.
Steps S121-S124 are described only briefly here; for details, refer to the descriptions of fig. 3-11, which are not repeated.
To facilitate accurate determination of the tail of the data area, in one embodiment the present disclosure also provides a document processing method which, as shown in fig. 13, may include the following steps.
Step S131, reading the end identification of the data area in the document to be processed; wherein the end identifier is used for indicating the tail of the data area.
In the embodiment of the disclosure, after the document to be processed is obtained, the document to be processed is analyzed, and the end identification of the data area in the document to be processed is read. The position of the end mark is the tail of the data area.
The end identifiers of documents with different structure types may be the same or different. For example, "%%EOF" is used as the end identifier in a document of the character-constructed structure type. In the embodiments of the present disclosure, the end identifier is not limited.
Step S131 is a refinement of step S11.
Step S132, recording preset dark watermark information at the tail of the data area to obtain a new document containing the preset dark watermark information, wherein the preset dark watermark information is a preset character string.
In the embodiment of the disclosure, after the end mark is read, the position where the end mark is located can be taken as the tail part of the data area, and the preset dark watermark information is added after the end mark, so that the efficiency of adding the preset dark watermark information is improved.
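The end-identifier approach can be sketched as follows, assuming a document whose data area ends with the "%%EOF" identifier. The function names are hypothetical, and the newline written before the watermark is an assumption of this sketch, not something the disclosure specifies.

```python
def append_after_end_marker(data: bytes, watermark: bytes,
                            marker: bytes = b"%%EOF") -> bytes:
    """Insert `watermark` immediately after the last end identifier."""
    pos = data.rfind(marker)
    if pos < 0:
        raise ValueError("end identifier not found")
    end = pos + len(marker)  # tail of the data area
    return data[:end] + b"\n" + watermark + data[end:]


def read_after_end_marker(data: bytes, marker: bytes = b"%%EOF") -> bytes:
    """Return whatever follows the last end identifier, stripped of whitespace."""
    pos = data.rfind(marker)
    if pos < 0:
        raise ValueError("end identifier not found")
    return data[pos + len(marker):].strip()
```

Both helpers rely on the watermark string not containing the end identifier itself; a watermark that does would need to be escaped or rejected.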
Corresponding to the above document processing method, the embodiment of the present disclosure further provides a document processing apparatus, as shown in fig. 14, including:
the positioning module 141 is configured to position a tail portion of a data area of the document to be processed, where the data area is an area accessible based on information recorded in a document header of the document to be processed;
the recording module 142 is configured to record preset dark watermark information at the tail of the data area, and obtain a new document containing the preset dark watermark information, where the preset dark watermark information is a preset character string.
Wherein, the tail part of the data area comprises a central directory record; recording module 142, may include:
the recording sub-module is used for recording preset dark watermark information in the custom attribute file contained in the central directory record;
and the moving submodule is used for moving the custom attribute file to the tail part of the data area.
The recording submodule is specifically configured to:
analyzing the end data block of the central directory record of the document to be processed to obtain the size and address of the central directory record;
Analyzing the size and the address of the central directory record, and determining the directory address of the custom attribute file;
and recording preset dark watermark information in the custom attribute file under the directory address.
The recording submodule is specifically configured to:
modifying the length of the custom attribute file under the directory address to the length of preset dark watermark information;
and recording preset dark watermark information in the custom attribute file based on the length of the modified custom attribute file.
The recording submodule is specifically configured to:
adding a custom attribute file at the end of a central directory record of a document to be processed, wherein the length of the custom attribute file is the length of preset dark watermark information;
and recording preset dark watermark information in the custom attribute file.
The central directory record comprises a first type file and a second type file, wherein the first type file is used for recording the names and types of the files, and the second type file is used for recording the relation among the contained files; recording module, further operable to:
and moving the first type file and the second type file contained in the central directory record to the tail part of the data area.
Wherein, the above-mentioned document processing apparatus may further include:
And the updating module is used for updating the information recorded in the end data block of the central directory record and the information recorded in the central directory record according to the size and the address of the moved custom attribute file and the sizes and the addresses of the moved first type file and the moved second type file after moving the custom attribute file to the tail part of the data area.
The custom attribute file is a comment field.
The positioning module 141 may specifically be configured to read an end identifier of a data area in the document to be processed, where the end identifier is used to indicate a tail of the data area.
In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, and disclosure of users' personal information all comply with relevant laws and regulations and do not violate public order and good customs.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium, and a computer program product.
Fig. 15 illustrates a schematic block diagram of an example electronic device 1500 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 15, the device 1500 includes a computing unit 1501, which can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 1502 or a computer program loaded from a storage unit 1508 into a Random Access Memory (RAM) 1503. In the RAM 1503, various programs and data required for the operation of the device 1500 may also be stored. The computing unit 1501, the ROM 1502, and the RAM 1503 are connected to each other through a bus 1504. An input/output (I/O) interface 1505 is also connected to bus 1504.
Various components in device 1500 are connected to I/O interface 1505, including: an input unit 1506 such as a keyboard, mouse, etc.; an output unit 1507 such as various types of displays, speakers, and the like; a storage unit 1508 such as a magnetic disk, an optical disk, or the like; and a communication unit 1509 such as a network card, modem, wireless communication transceiver, etc. The communication unit 1509 allows the device 1500 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunications networks.
The computing unit 1501 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 1501 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, Digital Signal Processors (DSPs), and any suitable processor, controller, microcontroller, etc. The computing unit 1501 performs the respective methods and processes described above, for example, the document processing method. For example, in some embodiments, the document processing method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as storage unit 1508. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 1500 via the ROM 1502 and/or the communication unit 1509. When a computer program is loaded into the RAM 1503 and executed by the computing unit 1501, one or more steps of the document processing method described above may be performed. Alternatively, in other embodiments, the computing unit 1501 may be configured to perform the document processing method by any other suitable means (e.g., by means of firmware).
According to an embodiment of the present disclosure, there is further provided an electronic device, as shown in fig. 16, including:
at least one processor 1601; and
a memory 1602 communicatively coupled to the at least one processor 1601; wherein,
the memory 1602 stores instructions executable by the at least one processor 1601 to enable the at least one processor 1601 to perform any of the document processing methods described above.
According to an embodiment of the present disclosure, there is also provided a non-transitory computer-readable storage medium storing computer instructions for causing the computer to execute the document processing method according to any one of the above.
According to an embodiment of the present disclosure, there is also provided a computer program product comprising a computer program which, when executed by a processor, implements a document processing method according to any of the above.
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems On Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved, which is not limited herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (13)

1. A document processing method, comprising:
positioning the tail part of a data area of a document to be processed, wherein the data area is an accessible area based on information recorded in a document head of the document to be processed;
recording preset dark watermark information at the tail part of the data area to obtain a new document containing the preset dark watermark information, wherein the preset dark watermark information is a preset character string;
a central directory record is included at the tail of the data area; the step of recording preset dark watermark information at the tail of the data area comprises the following steps:
recording preset dark watermark information in a custom attribute file contained in the central directory record;
moving the custom attribute file to the tail part of the data area;
the central directory record comprises a first type file and a second type file, the first type file is used for recording the names and types of the files, and the second type file is used for recording the relation among the contained files; the method further comprises the steps of:
moving the first type file and the second type file contained in the central directory record to the tail part of a data area;
after moving the custom property file to the end of the data area, further comprising:
And updating the information recorded in the end data block of the central directory record and the information recorded in the central directory record according to the size and the address of the moved custom attribute file and the sizes and the addresses of the moved first type file and the second type file.
2. The method of claim 1, wherein the step of recording preset dark watermark information in a custom property file contained in the central directory record comprises:
analyzing the end data block of the central directory record of the document to be processed to obtain the size and address of the central directory record;
analyzing the size and the address of the central directory record, and determining the directory address of the custom attribute file;
and recording preset dark watermark information in the custom attribute file under the directory address.
3. The method of claim 2, wherein the step of recording preset dark watermark information in the custom property file under the directory address comprises:
modifying the length of the custom attribute file under the directory address to be the length of preset dark watermark information;
and recording the preset dark watermark information in the custom attribute file based on the length of the custom attribute file after modification.
4. The method of claim 1, wherein the step of recording preset dark watermark information in a custom property file contained in the central directory record comprises:
adding a custom attribute file at the end of the central directory record, wherein the length of the custom attribute file is the length of preset dark watermark information;
and recording the preset dark watermark information in the custom attribute file.
5. The method of any of claims 2-4, wherein the custom attribute file is a comment field.
6. The method of claim 1, wherein the step of locating the tail of the data area of the document to be processed comprises:
and reading an end identifier of the data area in the document to be processed, wherein the end identifier is used for indicating the tail part of the data area.
7. A document processing apparatus comprising:
the positioning module is used for positioning the tail part of a data area of the document to be processed, wherein the data area is an accessible area based on information recorded in the document head of the document to be processed;
the recording module is used for recording preset dark watermark information at the tail part of the data area to obtain a new document containing the preset dark watermark information, wherein the preset dark watermark information is a preset character string;
A central directory record is included at the tail of the data area; the recording module comprises:
the recording sub-module is used for recording preset dark watermark information in the custom attribute file contained in the central directory record;
a moving sub-module, configured to move the custom attribute file to the tail of the data area;
the central directory record comprises a first type file and a second type file, the first type file is used for recording the names and types of the files, and the second type file is used for recording the relation among the contained files; the recording module is further configured to:
moving the first type file and the second type file contained in the central directory record to the tail part of a data area;
the apparatus further comprises:
and the updating module is used for updating the information recorded in the end data block of the central directory record and the information recorded in the central directory record according to the size and the address of the moved custom attribute file and the sizes and the addresses of the moved first type file and the second type file after moving the custom attribute file to the tail part of the data area.
8. The apparatus of claim 7, wherein the recording sub-module is specifically configured to:
Analyzing the end data block of the central directory record of the document to be processed to obtain the size and address of the central directory record;
analyzing the size and the address of the central directory record, and determining the directory address of the custom attribute file;
and recording preset dark watermark information in the custom attribute file under the directory address.
9. The apparatus of claim 8, wherein the recording sub-module is specifically configured to:
modifying the length of the custom attribute file under the directory address to be the length of preset dark watermark information;
and recording the preset dark watermark information in the custom attribute file based on the length of the custom attribute file after modification.
10. The apparatus of claim 7, wherein the recording sub-module is specifically configured to:
adding a custom attribute file at the end of the central directory record, wherein the length of the custom attribute file is the length of preset dark watermark information;
and recording the preset dark watermark information in the custom attribute file.
11. The apparatus of any of claims 7-10, wherein the custom attribute file is a comment field.
12. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-6.
13. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-6.
CN202110982776.8A 2021-08-25 2021-08-25 Document processing method, apparatus, device, storage medium, and computer program product Active CN113656836B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110982776.8A CN113656836B (en) 2021-08-25 2021-08-25 Document processing method, apparatus, device, storage medium, and computer program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110982776.8A CN113656836B (en) 2021-08-25 2021-08-25 Document processing method, apparatus, device, storage medium, and computer program product

Publications (2)

Publication Number Publication Date
CN113656836A CN113656836A (en) 2021-11-16
CN113656836B true CN113656836B (en) 2023-10-27

Family

ID=78492830

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110982776.8A Active CN113656836B (en) 2021-08-25 2021-08-25 Document processing method, apparatus, device, storage medium, and computer program product

Country Status (1)

Country Link
CN (1) CN113656836B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106775465A (en) * 2016-12-03 2017-05-31 乐视控股(北京)有限公司 Date storage method, device and electronic equipment
CN110674477A (en) * 2019-09-24 2020-01-10 北京溯斐科技有限公司 Document source tracing method and device based on electronic file security identification
CN112529759A (en) * 2020-12-22 2021-03-19 北京百度网讯科技有限公司 Document processing method, device, equipment, storage medium and computer program product

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2010124451A (en) * 2008-10-24 2010-06-03 Canon Inc Document processing apparatus, and document processing method
US10169552B2 (en) * 2015-07-17 2019-01-01 Box, Inc. Event-driven generation of watermarked previews of an object in a collaboration environment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
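A dark watermarking method resistant to printing and scanning; Sun Fang et al.; The 30th China (Tianjin) 2016 IT, Network, Information Technology, Electronics, and Instrumentation Innovation Academic Conference; full text *
Digital watermarking algorithm based on PDF document structure; Zhong Zhengyan; Guo Yanhui; Xu Guoai; Journal of Computer Applications (No. 10); full text *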

Also Published As

Publication number Publication date
CN113656836A (en) 2021-11-16

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant