CN114357524B - Electronic document processing method and device - Google Patents



Publication number
CN114357524B
Authority
CN
China
Prior art keywords
electronic document
tracing
information
target electronic
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210218193.2A
Other languages
Chinese (zh)
Other versions
CN114357524A (en
Inventor
李继国
章勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Eetrust Technology Co ltd
Original Assignee
Eetrust Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Eetrust Technology Co ltd filed Critical Eetrust Technology Co ltd
Priority to CN202210218193.2A priority Critical patent/CN114357524B/en
Publication of CN114357524A publication Critical patent/CN114357524A/en
Application granted granted Critical
Publication of CN114357524B publication Critical patent/CN114357524B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The application discloses a method and a device for processing an electronic document. The method comprises the following steps: acquiring host information of terminal equipment and object information of a target object corresponding to the terminal equipment, wherein the terminal equipment is used for running an application program for operating an electronic document; detecting whether a source tracing element having an association relation with a target object exists in a first target electronic document, wherein the first target electronic document is an electronic document operated by an application program at the current moment, and the source tracing element is used for tracing the leakage process of the first target electronic document; under the condition that the tracing element does not exist in the first target electronic document, generating the tracing element by utilizing the host information and the object information; adding the traceability element to the first target electronic document. The method and the device solve the technical problems that the existing tracing method for the electronic document leakage influences the content display and reading of the electronic document, the document processing is complex, the document editing is influenced, and the success rate of the implicit information extraction is low.

Description

Electronic document processing method and device
Technical Field
The application relates to the field of electronic document processing and tracing, in particular to a method and a device for processing an electronic document.
Background
The informatization and business applications of enterprises and public institutions generate a large number of electronic documents (mainly stream-format and fixed-layout electronic documents such as doc/docx, xls/xlsx, ppt/pptx, wps, et, dps, pdf and ofd). These documents are important digital assets, and some of them involve sensitive information such as trade secrets, so they are exposed to the security risk of information leakage caused by illegal outflow during storage, processing, circulation and use. In order to trace the leakage of an electronic document and assign responsibility, visible or invisible information related to the current user is usually added to the electronic document.
Current ways of adding visible information to an electronic document include the following: in one, when the computer opens the electronic document, text information related to the current user is displayed floating over the computer system desktop or the document processing program interface; in the other, visible information related to the current user is added to the electronic document itself in the form of shading, two-dimensional codes, visible character watermarks and the like. Tracing by adding visible information is intuitive and helps trace leaks made by photographing or screen capture, but it interferes with reading the electronic document. In addition, with the floating watermark approach, no traceable information remains once the electronic document has been copied or sent outside the enterprise or public institution; and with the approach of embedding visible information in the electronic document, the embedded information is relatively easy to remove or destroy, after which the document can no longer be traced.
At present, invisible information is mainly added to an electronic document by embedding hidden information through slight changes to character size and spacing, changes to character fonts and the like; the hidden information is then extracted by image processing.
In view of the above problems, no effective solution has been proposed.
Disclosure of Invention
The embodiment of the application provides a method and a device for processing an electronic document, which are used for at least solving the technical problems that the existing tracing method for leaking the electronic document influences the content display and reading of the electronic document, the document processing is complex, the document editing is influenced, and the success rate of extracting implicit information is low.
According to an aspect of an embodiment of the present application, there is provided a method for processing an electronic document, including: acquiring host information of terminal equipment and object information of a target object corresponding to the terminal equipment, wherein the terminal equipment is used for running an application program for operating an electronic document; detecting whether a source tracing element having an association relation with a target object exists in a first target electronic document, wherein the first target electronic document is an electronic document operated by an application program at the current moment, and the source tracing element is used for tracing the leakage process of the first target electronic document; under the condition that the tracing element does not exist in the first target electronic document, generating the tracing element by utilizing the host information and the object information; adding the traceability element to the first target electronic document.
Optionally, detecting whether a traceability element having an association relationship with the target object exists in the first target electronic document includes: reading the content of a first target electronic document; analyzing the content according to the type of the first target electronic document to obtain a format structure corresponding to the first target electronic document; searching the tracing element in the format structure according to the name prefix of the tracing element; under the condition that the tracing element is found in the format structure, detecting whether the tracing element is a tracing element having an association relation with the target object; and under the condition that the traceability elements are not found in the format structure, determining that the traceability elements do not exist in the first target electronic document.
Optionally, detecting whether the tracing element is a tracing element having an association relationship with the target object includes: comparing the feature code corresponding to the target object with a first character string positioned behind the name prefix in the tracing element; if the feature code is the same as the first character string, determining that the tracing element is a tracing element having an association relation with the target object; and if the feature code is different from the first character string, determining that the tracing element is not the tracing element having the association relation with the target object.
Optionally, the tracing element further includes a second character string, and the second character string is generated by: storing the host information, the object information and the time information at the current moment according to a preset file format to obtain traceability information; encrypting the tracing information by using a preset algorithm to obtain the encrypted tracing information; and coding the encrypted tracing information according to a preset coding mode to obtain a second character string, wherein the second character string is an attribute value of the tracing element.
Optionally, adding the traceability element to the first target electronic document comprises: determining a tracing element adding method corresponding to the type of the first target electronic document, and acquiring a name prefix of a tracing element, a first character string corresponding to a feature code of a current target object and a second character string corresponding to tracing information; determining a format structure corresponding to the type of the first target electronic document; if the first target electronic document does not have the traceability element, adding a name prefix of the traceability element, a first character string corresponding to the feature code of the current target object and a second character string corresponding to the traceability information in a format structure of the first target electronic document according to a traceability element adding method; and if the tracing element in the first target electronic document is not the tracing element having the incidence relation with the target object, replacing the name prefix, the feature code and the current tracing information of the current tracing element in the first target electronic document by using the name prefix of the tracing element, the first character string corresponding to the feature code of the current target object and the second character string corresponding to the tracing information respectively.
Optionally, before detecting whether a tracing element having an association relationship with the target object exists in the first target electronic document, the method further includes: running a file system kernel driver by using the terminal equipment, wherein the file system kernel driver is automatically started along with an operating system run by the terminal equipment; acquiring an opening request of an application program to a first target electronic document by using a file system kernel driver; and acquiring path information of the first target electronic document in the opening request, and preventing the first target electronic document from being opened, wherein the path information is used for acquiring the content of the first target electronic document.
Optionally, the host information includes at least one of: host name, IP address, MAC address, operating system type, operating system version number and operating system user name; the object information includes at least one of: the name of the user account, the name of the user, and information of the organization to which the user belongs.
According to another aspect of the embodiments of the present application, there is provided another method for processing an electronic document, including: acquiring a second target electronic document, wherein the second target electronic document is an electronic document with information leakage; reading the content of the second target electronic document, and analyzing the content according to the type of the second target electronic document to obtain a format structure corresponding to the second target electronic document; searching a tracing element in a format structure according to the name prefix of the tracing element, wherein the tracing element is used for tracing the leakage process of the electronic document; acquiring a character string in the tracing element, wherein the character string is an attribute value of the tracing element; and determining object information of a target object for operating the second target electronic document and host information of the terminal equipment according to the character string.
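The extraction side described above can be sketched in Python as follows. This is only an illustration under stated assumptions: it assumes the attribute value is a base64 string over an encrypted XML payload with the `infoList`/`info` layout given in the detailed description, the function name `extract_tracing_records` is hypothetical, and the cipher is passed in as a `decrypt` callable because the patent only names SM4 as an example algorithm and the Python standard library does not provide it.

```python
import base64
import xml.etree.ElementTree as ET


def _children_as_dict(parent, tag):
    """Return the child tags and texts of parent/<tag>, or {} if <tag> is absent."""
    node = parent.find(tag)
    return {} if node is None else {child.tag: child.text for child in node}


def extract_tracing_records(second_string, decrypt):
    """Decode, decrypt and parse the attribute value of a tracing element.

    `decrypt` is a caller-supplied callable (bytes -> bytes); it is an
    assumption of this sketch, standing in for the real cipher (e.g. SM4).
    """
    xml_bytes = decrypt(base64.b64decode(second_string))
    root = ET.fromstring(xml_bytes)
    records = []
    for info in root.findall("info"):
        records.append({
            "user": _children_as_dict(info, "user"),        # object information
            "machine": _children_as_dict(info, "machine"),  # host information
            "timestamp": info.findtext("timestamp"),
        })
    return records
```

For a quick check without a real cipher, an identity function can be passed as `decrypt` and the payload can be a base64-encoded plain XML string.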
According to another aspect of the embodiments of the present application, there is also provided an apparatus for processing an electronic document, including: the system comprises an acquisition module, a processing module and a display module, wherein the acquisition module is used for acquiring host information of terminal equipment and object information of a target object corresponding to the terminal equipment, and the terminal equipment is used for running an application program for operating an electronic document; the detection module is used for detecting whether a traceability element having an association relation with a target object exists in a first target electronic document, wherein the first target electronic document is an electronic document operated by an application program at the current moment, and the traceability element is used for tracing the leakage process of the first target electronic document; the generating module is used for generating the traceability element by utilizing the host information and the object information under the condition that the traceability element does not exist in the first target electronic document; and the adding module is used for adding the source tracing element to the first target electronic document.
According to still another aspect of the embodiments of the present application, there is also provided a nonvolatile storage medium including a stored program, wherein a device in which the nonvolatile storage medium is located is controlled to execute the above processing method of an electronic document when the program is executed.
According to yet another aspect of the embodiments of the present application, there is also provided a processor configured to execute a program stored in a memory, wherein the program executes the above processing method for an electronic document.
In the embodiments of the application, host information of terminal equipment and object information of a target object corresponding to the terminal equipment are acquired, wherein the terminal equipment is used for running an application program for operating an electronic document; whether a tracing element having an association relation with the target object exists in a first target electronic document is detected, wherein the first target electronic document is an electronic document operated by the application program at the current moment, and the tracing element is used for tracing the leakage process of the first target electronic document; under the condition that the tracing element does not exist in the first target electronic document, the tracing element is generated by utilizing the host information and the object information; and the tracing element is added to the first target electronic document. By parsing the format structure of the electronic document, invisible tracing elements are detected, added and updated in the document format structure according to a uniform naming specification, and the tracing information related to the current user identity and the user's host is encrypted, encoded and then assigned to the tracing element. This realizes the embedding, detection and extraction of tracing information in the electronic document, and achieves the technical effects that a large amount of tracing information can be embedded, that tracing information can be detected quickly based on the naming specification and extracted without error, that no complex graphic image processing is needed, and that normal use and operation of the electronic document are not affected. The technical problems that the existing tracing method for electronic document leakage affects the content display and reading of the electronic document, that the document processing is complex, that document editing is affected, and that the success rate of extracting implicit information is low are thereby solved.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
fig. 1 shows a hardware configuration block diagram of a computer terminal (or mobile device) for implementing a processing method of an electronic document;
FIG. 2 is a flow chart of a method of processing an electronic document according to an embodiment of the present application;
FIG. 3 is a flow chart of another method of processing an electronic document according to an embodiment of the present application;
fig. 4 is a block diagram of a processing apparatus of an electronic document according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
According to an embodiment of the present application, there is provided an embodiment of a method for processing an electronic document, it should be noted that the steps shown in the flowchart of the drawings may be executed in a computer system such as a set of computer executable instructions, and that although a logical order is shown in the flowchart, in some cases, the steps shown or described may be executed in an order different from that here.
The method provided by the first embodiment of the present application may be executed in a mobile terminal, a computer terminal, or a similar computing device. Fig. 1 shows a hardware configuration block diagram of a computer terminal (or mobile device) for implementing a processing method of an electronic document. As shown in fig. 1, the computer terminal 10 (or mobile device 10) may include one or more processors 102 (shown as 102a, 102b, ..., 102n; the processors 102 may include, but are not limited to, a processing device such as a microprocessor MCU or a programmable logic device FPGA), a memory 104 for storing data, and a transmission device 106 for communication functions. In addition, the computer terminal may also include: a display, an input/output interface (I/O interface), a Universal Serial BUS (USB) port (which may be included as one of the ports of the BUS), a network interface, a power source, and/or a camera. It will be understood by those skilled in the art that the structure shown in fig. 1 is only an illustration and is not intended to limit the structure of the electronic device. For example, the computer terminal 10 may also include more or fewer components than shown in FIG. 1, or have a different configuration than shown in FIG. 1.
It should be noted that the one or more processors 102 and/or other data processing circuitry described above may be referred to generally herein as "data processing circuitry". The data processing circuitry may be embodied in whole or in part in software, hardware, firmware, or any combination thereof. Further, the data processing circuit may be a single stand-alone processing module, or incorporated in whole or in part into any of the other elements in the computer terminal 10 (or mobile device). As referred to in the embodiments of the application, the data processing circuit acts as a processor control (e.g. selection of a variable resistance termination path connected to the interface).
The memory 104 may be used to store software programs and modules of application software, such as program instructions/data storage devices corresponding to the processing method of the electronic document in the embodiment of the present application, and the processor 102 executes various functional applications and data processing by running the software programs and modules stored in the memory 104, so as to implement the processing method of the electronic document. The memory 104 may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the computer terminal 10 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used for receiving or transmitting data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the computer terminal 10. In one example, the transmission device 106 includes a Network adapter (NIC) that can be connected to other Network devices through a base station to communicate with the internet. In one example, the transmission device 106 can be a Radio Frequency (RF) module, which is used to communicate with the internet in a wireless manner.
The display may be, for example, a touch screen type Liquid Crystal Display (LCD) that may enable a user to interact with the user interface of the computer terminal 10 (or mobile device).
Fig. 2 is a flowchart of a processing method of an electronic document according to an embodiment of the present application, and as shown in fig. 2, the method includes the steps of:
step S202, acquiring host information of terminal equipment and object information of a target object corresponding to the terminal equipment, wherein the terminal equipment is used for running an application program for operating an electronic document;
The terminal device may be a PC terminal, such as a computer device, or a mobile terminal device such as a tablet computer or a mobile phone.
In this step, host information of the current terminal device and object information of a user of the terminal device are acquired through a client installed in the terminal device.
According to an optional embodiment of the present application, the host information includes at least one of: host name, IP address, MAC address, operating system type, operating system version number and operating system user name; the object information includes at least one of: the name of the user account, the name of the user, and information of the organization to which the user belongs.
As an alternative embodiment, when step S202 is executed, a client program is installed on the user's computer terminal, and the user registers with and logs in to the client (syclient.exe). The client collects host information such as the host name, IP address, MAC address, operating system type and version, and operating system user, as well as information about the current user of the computer terminal such as the user account, name and affiliated institution.
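As a rough illustration of the host-information part of this step, the following Python sketch gathers the listed fields with the standard library only. It is not the client described in the application: the function name `collect_host_info` and the dictionary keys are assumptions chosen to mirror the fields above, and the user account, name and institution would come from the client login rather than from the operating system.

```python
import getpass
import platform
import socket
import uuid


def collect_host_info():
    """Gather host name, IP, MAC, OS type/version and OS user (best effort)."""
    hostname = socket.gethostname()
    try:
        ip = socket.gethostbyname(hostname)
    except (OSError, UnicodeError):
        ip = ""  # name resolution can fail on hosts without a matching entry
    try:
        os_user = getpass.getuser()
    except Exception:
        os_user = ""  # no login name available in this environment
    # uuid.getnode() returns a 48-bit hardware address; Python may fall back
    # to a random value if no interface address can be read.
    node = uuid.getnode()
    mac = ":".join(f"{(node >> shift) & 0xFF:02X}" for shift in range(40, -8, -8))
    return {
        "hostname": hostname,
        "ip": ip,
        "mac": mac,
        "os": platform.system(),
        "os_version": platform.version(),
        "account": os_user,
    }
```

On a multi-homed host the resolved IP address may not be the one in use, so a real client would likely query the network interfaces directly.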
Step S204, detecting whether a source tracing element having an association relation with a target object exists in a first target electronic document, wherein the first target electronic document is an electronic document operated by an application program at the current moment, and the source tracing element is used for tracing the leakage process of the first target electronic document;
An application program for operating the electronic document, such as Word, Excel, PowerPoint, wps, et, wpp, Acrobat Reader, SuwellReader and the like, can run on the terminal device.
In this step, the content of the electronic document is read through the client, and whether the tracing element related to the current user is added in the electronic document is detected.
Step S206, under the condition that the source tracing element does not exist in the first target electronic document, generating the source tracing element by utilizing the host information and the object information;
step S208, adding the source tracing element to the first target electronic document.
Through the above steps, the format structure of the electronic document is parsed, invisible tracing elements are detected, added and updated in the document format structure according to a unified naming specification, and the tracing information related to the current user identity and the user's host is encrypted, encoded and then assigned to the tracing elements. The embedding, detection and extraction of the tracing information of the electronic document are thereby realized, achieving the technical effects that a large amount of tracing information can be embedded, that tracing information can be detected quickly based on the naming specification and extracted without error, that no complex graphic image processing is needed, and that normal use and operation of the electronic document are not affected.
According to an optional embodiment of the present application, step S204 is executed to detect whether a tracing element having an association relationship with a target object exists in a first target electronic document, and the method is implemented by: reading the content of a first target electronic document; analyzing the content according to the type of the first target electronic document to obtain a format structure corresponding to the first target electronic document; searching the tracing element in the format structure according to the name prefix of the tracing element; under the condition that the tracing element is found in the format structure, detecting whether the tracing element is a tracing element having an incidence relation with the target object; and under the condition that the source tracing element is not found in the format structure, determining that the source tracing element does not exist in the first target electronic document.
The client reads the content of the electronic document and, according to the type of the electronic document (doc/docx/wps, xls/xlsx/et, ppt/pptx/dps, pdf and ofd), parses the format structure of the electronic document, as shown in Table 1:
TABLE 1
[Table 1 appears as an image (DEST_PATH_IMAGE001) in the original publication and is not reproduced here.]
Next, the client searches the format structure of the electronic document for tracing elements according to the name prefix of the tracing elements (e.g., ETS_SY_). If none is found, no tracing element has yet been added to the electronic document. If a tracing element is found, the client further detects whether the found tracing element is the one related to the current user.
By the method, the technical effect of detecting whether the source tracing element exists in the electronic document can be achieved.
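The prefix search can be sketched as below. Because Table 1 is not reproduced here, the sketch does not attempt to parse any real file format; it assumes the format structure has already been parsed into a plain mapping from element names to attribute values, and the name `find_tracing_elements` is hypothetical.

```python
PREFIX = "ETS_SY_"  # example name prefix given in the description


def find_tracing_elements(format_structure):
    """Return the entries of a parsed format structure whose name starts with PREFIX.

    `format_structure` is assumed to be a dict of element name -> attribute
    value; how it is obtained depends on the document type.
    """
    return {
        name: value
        for name, value in format_structure.items()
        if name.startswith(PREFIX)
    }


# Usage: an empty result means no tracing element has been added yet.
parsed = {"Title": "Quarterly report", "ETS_SY_0123ABCD": "c29tZSBwYXlsb2Fk"}
found = find_tracing_elements(parsed)
```

Keeping the lookup keyed on a fixed prefix is what allows detection without decrypting or decoding any attribute value.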
According to another optional embodiment of the present application, detecting whether a tracing element is a tracing element having an association relationship with a target object includes the following steps: comparing the feature code corresponding to the target object with a first character string positioned behind the name prefix in the tracing element; if the feature code is the same as the first character string, determining that the tracing element is a tracing element having an association relation with the target object; and if the feature code is different from the first character string, determining that the tracing element is not the tracing element having the association relation with the target object.
If a tracing element is found, the MD5 value of the current user account (represented as an upper-case hexadecimal character string) is compared with the feature-code character string (i.e., the first character string above) that follows the name prefix (ETS_SY_) of the tracing element. If the two are the same, the tracing information related to the current user has already been added, and the client directly opens the current electronic document and the process ends. If they are different, the tracing element and the tracing information need to be updated for the current user.
By the method, the technical effect of detecting whether the source tracing elements related to the user exist in the electronic document can be achieved.
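A minimal Python sketch of this comparison follows. The function names are hypothetical, and encoding the account as UTF-8 before hashing is an assumption, since the description does not state the byte encoding.

```python
import hashlib

PREFIX = "ETS_SY_"  # example name prefix given in the description


def feature_code(user_account):
    """Upper-case hexadecimal MD5 of the user account (UTF-8 is assumed)."""
    return hashlib.md5(user_account.encode("utf-8")).hexdigest().upper()


def belongs_to_user(element_name, user_account):
    """True if the string after the name prefix equals the user's feature code."""
    if not element_name.startswith(PREFIX):
        return False
    return element_name[len(PREFIX):] == feature_code(user_account)


# Usage: build an element name for one account and check it against two accounts.
name = PREFIX + feature_code("zhangs")
same_user = belongs_to_user(name, "zhangs")
other_user = belongs_to_user(name, "lisi")
```

Here MD5 serves only as a fixed-length identifier in the element name; the confidentiality of the tracing information rests on the encryption step, not on the hash.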
In some optional embodiments of the present application, the source tracing element further includes a second character string, and the second character string is generated by: storing the host information, the object information and the time information of the current moment according to a preset file format to obtain traceability information; encrypting the tracing information by using a preset algorithm to obtain the encrypted tracing information; and coding the encrypted tracing information according to a preset coding mode to obtain a second character string, wherein the second character string is an attribute value of the tracing element.
In this step, the client organizes the acquired information such as the host information, the object information, the current time and the like according to an xml format, performs base64 encoding after encryption processing, assigns the obtained encoded information string to a tracing element for subsequent tracing information extraction, saves the electronic document content and opens the electronic document. Mainly comprises the following steps:
The first step: the object information of the current user, the host information and the operation time are organized into tracing information in xml form. Depending on requirements, either only the person who last operated the document and that person's information are retained, or all persons who previously operated the document and their information are retained as a list in reverse chronological order. The format is as follows:
<infoList>
  <info>
    <user>
      <userid>zhangs</userid>
      <username>Zhang San</username>
      <organization>office-XX Corp</organization>
    </user>
    <machine>
      <ip>192.168.1.10</ip>
      <mac>74:E5:0B:01:10:12</mac>
      <hostname>zhangs-PC</hostname>
      <os>windows 7</os>
      <account>zhangs</account>
    </machine>
    <timestamp>2021-11-01 10:03:32</timestamp>
  </info>
  ......
</infoList>
The second step is that: the tracing information in the xml format is encrypted by using a specified key and an algorithm (such as SM 4);
the third step: for the information which is output by encryption, base64 encoding is carried out to generate a corresponding text string (namely, the second string in the above);
the fourth step: and assigning the encrypted and coded text character string as an attribute value to a corresponding tracing element, storing the electronic document and opening the electronic document, thereby completing the addition/update of the tracing information of the electronic document.
The tracing information generated in this way is more confidential and harder to remove or destroy, so a leaked electronic document can be traced more reliably.
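The generation of the second character string can be sketched in Python as follows. This is only an illustration under stated assumptions: the Python standard library has no SM4 implementation, so the encryption step is passed in as a callable, and `placeholder_encrypt` is a hypothetical stand-in that merely reverses the bytes and provides no confidentiality. The function names are likewise hypothetical.

```python
import base64
import xml.etree.ElementTree as ET
from typing import Callable, Dict


def build_trace_xml(user: Dict[str, str], machine: Dict[str, str], timestamp: str) -> bytes:
    """Assemble one <info> record inside <infoList>, following the format shown above."""
    root = ET.Element("infoList")
    info = ET.SubElement(root, "info")
    user_el = ET.SubElement(info, "user")
    for tag, value in user.items():
        ET.SubElement(user_el, tag).text = value
    machine_el = ET.SubElement(info, "machine")
    for tag, value in machine.items():
        ET.SubElement(machine_el, tag).text = value
    ET.SubElement(info, "timestamp").text = timestamp
    return ET.tostring(root, encoding="utf-8")


def make_second_string(xml_bytes: bytes, encrypt: Callable[[bytes], bytes]) -> str:
    """Encrypt the XML, then Base64-encode the ciphertext into a text string."""
    return base64.b64encode(encrypt(xml_bytes)).decode("ascii")


def placeholder_encrypt(data: bytes) -> bytes:
    """NOT real encryption: reverses the bytes so the sketch runs without a crypto library.

    A real deployment would substitute a cipher such as SM4 with a managed key.
    """
    return data[::-1]
```

Passing the cipher as a parameter keeps the XML assembly and the Base64 encoding independent of whichever algorithm and key are actually configured.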
In other optional embodiments of the present application, step S208 of adding the tracing element to the first target electronic document includes the following steps: determining the tracing-element adding method corresponding to the type of the first target electronic document, and obtaining the name prefix of the tracing element, the first character string corresponding to the feature code of the current target object, and the second character string corresponding to the tracing information; determining the format structure corresponding to the type of the first target electronic document; if no tracing element exists in the first target electronic document, adding the name prefix of the tracing element, the first character string and the second character string to the format structure of the first target electronic document according to the tracing-element adding method; and if the tracing element in the first target electronic document is not a tracing element having an association relation with the target object, replacing the name prefix, the feature code and the current tracing information of the existing tracing element in the first target electronic document with the name prefix of the tracing element, the first character string and the second character string, respectively.
The client parses the format structure of the electronic document according to its type and adds or updates the tracing element in that structure, as follows:
Step 1: Based on the file extension of the electronic document, the client calls the corresponding tracing-element adding method and passes in the tracing element name and the specific tracing information (i.e., the text string generated above). The tracing element name consists of the uniform name prefix (ETS_SY_) followed by the feature code, which is the MD5 value of the current user account (expressed as an uppercase hexadecimal string), as shown in Table 2:
Table 2 (provided as an image in the original publication; not reproduced here)
Step 2: The format structure of the electronic document is parsed according to its type to obtain the corresponding information, such as the section and paragraph collection (ParagraphCollection), the shape collection (ShapeCollection), the layer (PdfLayer) and the entity (OFDENTITY);
Step 3: If no tracing element has been added to the format structure, a tracing element such as a shape (shape), a layer (PdfLayer) or an entity (entity) is added, set to hidden and invisible, and named with the tracing element name passed in, and its attribute value is set to the tracing information passed in, as shown in Table 3:
Table 3 (provided as an image in the original publication; not reproduced here)
If a tracing element has already been added to the format structure, its name is replaced with the tracing element name passed in, and its attribute value is appended to or replaced with the tracing information passed in, depending on the scenario.
In this way, a large amount of confidential tracing information can be embedded in the electronic document in a simple and concealed manner.
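The add-or-update logic can be sketched independently of any particular document format. The Python example below uses a hypothetical in-memory model (`FormatStructure`, `HiddenElement`) in place of a real parsed Word, PDF or OFD structure; a real client would rely on a format-specific library instead, and the replace-only update shown here is just one of the two update policies the description allows.

```python
from dataclasses import dataclass, field
from typing import List

NAME_PREFIX = "ETS_SY_"  # uniform name prefix given in the description


@dataclass
class HiddenElement:
    """Hypothetical stand-in for a shape, layer or entity in a document."""
    name: str
    value: str
    hidden: bool = True


@dataclass
class FormatStructure:
    """Hypothetical stand-in for the parsed format structure of a document."""
    elements: List[HiddenElement] = field(default_factory=list)


def add_or_update(structure: FormatStructure, feature_code: str, second_string: str) -> str:
    """Add a hidden tracing element, or overwrite the existing one in place."""
    new_name = NAME_PREFIX + feature_code
    for element in structure.elements:
        if element.name.startswith(NAME_PREFIX):
            # An element from an earlier operator exists: replace name and value.
            element.name = new_name
            element.value = second_string
            element.hidden = True
            return "updated"
    # No tracing element yet: append a new hidden one.
    structure.elements.append(HiddenElement(new_name, second_string))
    return "added"
```

Because the lookup relies only on the name prefix, the same routine serves both the first embedding and later updates by other users.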
According to an optional embodiment of the present application, before step S204 of detecting whether a tracing element having an association relation with the target object exists in the first target electronic document is executed, the following is further required: running a file system kernel driver on the terminal device, where the file system kernel driver starts automatically with the operating system run by the terminal device; capturing, by means of the file system kernel driver, the request of the application program to open the first target electronic document; and obtaining the path information of the first target electronic document from the open request and blocking the opening of the first target electronic document, where the path information is used to obtain the content of the first target electronic document.
As an optional embodiment, a file system kernel driver (syfilter.sys) is installed on the terminal device and starts automatically with the operating system. At the kernel layer, according to policy, it captures and filters the I/O request packets (IRP_MJ_CREATE) with which upper-layer applications (such as Word, Excel, PowerPoint, wps, et, wpp, Acrobat Reader and SuwellReader) open the relevant electronic documents, obtains from the request the full path of the electronic document to be opened, notifies the client of that path, and at the same time blocks the opening of the current electronic document.
Fig. 3 is a flowchart of another electronic document processing method according to an embodiment of the application, and as shown in fig. 3, the method includes the following steps:
step S302, a second target electronic document is obtained, wherein the second target electronic document is an electronic document with information leakage;
step S304, reading the content of the second target electronic document, and analyzing the content according to the type of the second target electronic document to obtain a format structure corresponding to the second target electronic document;
step S306, searching the traceability elements in the format structure according to the name prefixes of the traceability elements, wherein the traceability elements are used for tracing the leakage process of the second target electronic document;
step S308, character strings in the tracing elements are obtained, wherein the character strings are attribute values of the tracing elements;
step S310, determining object information of a target object for operating the second target electronic document and host information of the terminal device according to the character string.
Steps S302 to S310 provide a method for extracting the tracing information of an electronic document. Once an electronic document has been illegally transferred outside the organization or uploaded to the internet (e.g., to a network disk or an online document library), the leaked electronic document can be collected and its tracing information extracted with the electronic document tracing client tool (sytrace.exe), as follows:
Step 1: The collected electronic document is read with the tracing client tool;
Step 2: The format structure of the electronic document is parsed according to its type to obtain the corresponding information, such as the section and paragraph collection (ParagraphCollection), the shape collection (ShapeCollection), the layer (PdfLayer) and the entity (OFDENTITY);
Step 3: Whether a tracing element has been added to the document format structure is detected according to the name prefix (ETS_SY_) of the tracing element;
Step 4: The attribute value of the tracing element is read to obtain the tracing information;
Step 5: The tracing information is Base64-decoded, decrypted and parsed as XML, in that order, to obtain and display the information of the person who last operated the document and of that person's host, thereby tracing and locating the specific person and host that leaked the electronic document.
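The decoding chain in the final step (Base64 decoding, decryption, XML parsing) can be sketched in Python as below. The decryption routine is passed in as a callable because the actual cipher and key handling (for example SM4) are outside this sketch; the function name and the flat `section.tag` key layout of the returned records are assumptions made for illustration.

```python
import base64
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List


def extract_trace_records(
    attribute_value: str, decrypt: Callable[[bytes], bytes]
) -> List[Dict[str, str]]:
    """Decode, decrypt and parse the attribute value of a tracing element."""
    xml_bytes = decrypt(base64.b64decode(attribute_value))
    root = ET.fromstring(xml_bytes)
    records = []
    for info in root.findall("info"):
        record: Dict[str, str] = {}
        for section in ("user", "machine"):
            node = info.find(section)
            if node is not None:
                for child in node:
                    # Flatten e.g. <user><userid> into the key "user.userid".
                    record[f"{section}.{child.tag}"] = child.text or ""
        record["timestamp"] = info.findtext("timestamp", default="")
        records.append(record)
    return records
```

When the list is stored in reverse chronological order, as described for the embedding side, the first record corresponds to the person who last operated the document.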
It should be noted that, for preferred implementations of the embodiment shown in Fig. 3, reference may be made to the description of the embodiment shown in Fig. 2; details are not repeated here.
In the method described above, at the file system kernel layer of the computer terminal, the requests by which upper-layer applications open electronic documents are monitored in real time, and the electronic document path is obtained and passed to the client. The client reads and parses the format structure of the electronic document, and detects, adds and updates invisible tracing elements in that structure according to a uniform naming convention. Tracing information about the identity of the current user and the user's host is encrypted, encoded and assigned to the tracing element, thereby achieving embedding, detection and extraction of the tracing information of the electronic document. Parsing the electronic document format and adding tracing elements allows a large amount of required tracing information to be embedded, detected quickly on the basis of the naming convention, and extracted without error; no complex graphic or image processing is needed, and normal use of the electronic document is not affected.
Fig. 4 is a block diagram of a processing apparatus for an electronic document according to an embodiment of the present application, as shown in fig. 4, the apparatus including:
an obtaining module 40, configured to obtain host information of a terminal device and object information of a target object corresponding to the terminal device, where the terminal device is configured to run an application program that operates an electronic document;
a detection module 42, configured to detect whether a source tracing element having an association relationship with the target object exists in a first target electronic document, where the first target electronic document is an electronic document operated by using the application program at the current time, and the source tracing element is used to trace the leakage process of the electronic document;
a generating module 44, configured to generate a source tracing element by using the host information and the object information when the source tracing element does not exist in the first target electronic document;
an adding module 46, configured to add the traceability element to the first target electronic document.
It should be noted that, for preferred implementations of the embodiment shown in Fig. 4, reference may be made to the description of the embodiment shown in Fig. 2; details are not repeated here.
The embodiment of the application also provides a nonvolatile storage medium, which comprises a stored program, wherein when the program runs, the device where the nonvolatile storage medium is located is controlled to execute the processing method of the electronic document.
The nonvolatile storage medium stores a program for executing the following functions: acquiring host information of terminal equipment and object information of a target object corresponding to the terminal equipment, wherein the terminal equipment is used for running an application program for operating an electronic document; detecting whether a source tracing element having an association relation with a target object exists in a first target electronic document, wherein the first target electronic document is an electronic document operated by an application program at the current moment, and the source tracing element is used for tracing the leakage process of the first target electronic document; under the condition that the tracing element does not exist in the first target electronic document, generating the tracing element by utilizing the host information and the object information; adding the traceability element to the first target electronic document. Or
Acquiring a second target electronic document, wherein the second target electronic document is an electronic document with information leakage; reading the content of the second target electronic document, and analyzing the content according to the type of the second target electronic document to obtain a format structure corresponding to the second target electronic document; searching a tracing element in a format structure according to the name prefix of the tracing element, wherein the tracing element is used for tracing the leakage process of the electronic document; acquiring a character string in the tracing element, wherein the character string is an attribute value of the tracing element; and determining object information of a target object for operating the second target electronic document and host information of the terminal equipment according to the character string.
The embodiment of the application also provides a processor which is used for running the program stored in the memory, wherein the program runs to execute the processing method of the electronic document.
The processor is used for running a program for executing the following functions: acquiring host information of terminal equipment and object information of a target object corresponding to the terminal equipment, wherein the terminal equipment is used for running an application program for operating an electronic document; detecting whether a source tracing element having an association relation with a target object exists in a first target electronic document, wherein the first target electronic document is an electronic document operated by an application program at the current moment, and the source tracing element is used for tracing the leakage process of the first target electronic document; under the condition that the tracing element does not exist in the first target electronic document, generating the tracing element by utilizing the host information and the object information; adding the traceability element to the first target electronic document. Or
Acquiring a second target electronic document, wherein the second target electronic document is an electronic document with information leakage; reading the content of the second target electronic document, and analyzing the content according to the type of the second target electronic document to obtain a format structure corresponding to the second target electronic document; searching a tracing element in a format structure according to the name prefix of the tracing element, wherein the tracing element is used for tracing the leakage process of the electronic document; acquiring a character string in the tracing element, wherein the character string is an attribute value of the tracing element; and determining object information of a target object for operating the second target electronic document and host information of the terminal equipment according to the character string.
The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
In the embodiments of the present application, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to the related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technical content can be implemented in other manners. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units may be a logical division, and in actual implementation, there may be another division, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed coupling or direct coupling or communication connection between each other may be an indirect coupling or communication connection through some interfaces, units or modules, and may be electrical or in other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit may be implemented in the form of hardware, or may also be implemented in the form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disc, and other media capable of storing program code.
The foregoing is only a preferred embodiment of the present application. It should be noted that those skilled in the art can make several improvements and modifications without departing from the principle of the present application, and these improvements and modifications should also be regarded as falling within the protection scope of the present application.

Claims (10)

1. A method for processing an electronic document, comprising:
acquiring host information of terminal equipment and object information of a target object corresponding to the terminal equipment, wherein the terminal equipment is used for running an application program for operating an electronic document;
detecting whether a source tracing element having an association relation with the target object exists in a first target electronic document, wherein the first target electronic document is an electronic document operated by using the application program at the current moment, and the source tracing element is used for tracing the leakage process of the first target electronic document;
generating the tracing element by using the host information and the object information under the condition that the tracing element does not exist in the first target electronic document;
adding the traceability element to the first target electronic document;
detecting whether a traceability element having an association relation with the target object exists in a first target electronic document, wherein the detecting comprises the following steps: reading the content of the first target electronic document; analyzing the content according to the type of the first target electronic document to obtain a format structure corresponding to the first target electronic document; searching the tracing element in the format structure according to the name prefix of the tracing element; under the condition that the tracing element is found in the format structure, detecting whether the tracing element is a tracing element having an association relation with the target object; and determining that the traceability element does not exist in the first target electronic document under the condition that the traceability element is not found in the format structure.
2. The method according to claim 1, wherein detecting whether the tracing element is a tracing element having an association relationship with the target object comprises:
comparing the feature code corresponding to the target object with a first character string positioned behind the name prefix in the tracing element;
if the feature code is the same as the first character string, determining that the tracing element is a tracing element having an association relation with the target object;
and if the feature code is different from the first character string, determining that the tracing element is not a tracing element having an association relation with the target object.
3. The method of claim 2, wherein the traceback element further comprises a second string, the second string generated by:
storing the host information, the object information and the time information of the current moment according to a preset file format to obtain traceability information;
encrypting the tracing information by using a preset algorithm to obtain encrypted tracing information;
and coding the encrypted tracing information according to a preset coding mode to obtain the second character string, wherein the second character string is an attribute value of the tracing element.
4. The method of claim 3, wherein adding the traceable element to the first target electronic document comprises:
determining a tracing element adding method corresponding to the type of the first target electronic document, and acquiring a name prefix of the tracing element, a first character string corresponding to a feature code of a current target object and the second character string corresponding to the tracing information;
determining a format structure corresponding to the type of the first target electronic document;
if the tracing element does not exist in the first target electronic document, adding a name prefix of the tracing element, the first character string corresponding to the feature code of the current target object and the second character string corresponding to the tracing information in a format structure of the first target electronic document according to the tracing element adding method;
if the tracing element in the first target electronic document is not a tracing element having an association relationship with the target object, replacing the name prefix, the feature code and the current tracing information of the current tracing element in the first target electronic document with the name prefix of the tracing element, the first character string corresponding to the feature code of the current target object and the second character string corresponding to the tracing information respectively.
5. The method according to claim 1, wherein before detecting whether a source element having an association relationship with the target object exists in the first target electronic document, the method further comprises:
running a file system kernel driver by using the terminal equipment, wherein the file system kernel driver is automatically started along with an operating system run by the terminal equipment;
acquiring an opening request of the application program for the first target electronic document by utilizing the file system kernel driver;
and acquiring path information of the first target electronic document in the opening request, and preventing the first target electronic document from being opened, wherein the path information is used for acquiring the content of the first target electronic document.
6. The method of claim 1,
the host information includes at least one of: host name, IP address, MAC address, operating system type, operating system version number and operating system user name;
the object information includes at least one of: the name of the user account, the name of the user, and information of the organization to which the user belongs.
7. A method for processing an electronic document, comprising:
acquiring a second target electronic document, wherein the second target electronic document is an electronic document with information leakage;
reading the content of the second target electronic document, and analyzing the content according to the type of the second target electronic document to obtain a format structure corresponding to the second target electronic document;
searching the tracing element in the format structure according to the name prefix of the tracing element, wherein the tracing element is used for tracing the leakage process of the second target electronic document;
acquiring a character string in the tracing element, wherein the character string is an attribute value of the tracing element;
and determining object information of a target object for operating the second target electronic document and host information of the terminal equipment according to the character string.
8. An apparatus for processing an electronic document, comprising:
an obtaining module, configured to obtain host information of a terminal device and object information of a target object corresponding to the terminal device, wherein the terminal device is configured to run an application program for operating an electronic document;
a detection module, configured to detect whether a source tracing element having an association relation with the target object exists in a first target electronic document, wherein the first target electronic document is an electronic document operated by using the application program at the current moment, and the source tracing element is used for tracing the leakage process of the electronic document;
a generating module, configured to generate the tracing element by using the host information and the object information when the tracing element does not exist in the first target electronic document;
an adding module, configured to add the source tracing element to the first target electronic document;
the detection module is further used for reading the content of the first target electronic document; analyzing the content according to the type of the first target electronic document to obtain a format structure corresponding to the first target electronic document; searching the tracing element in the format structure according to the name prefix of the tracing element; under the condition that the tracing element is found in the format structure, detecting whether the tracing element is a tracing element having an association relation with the target object; and determining that the traceability element does not exist in the first target electronic document under the condition that the traceability element is not found in the format structure.
9. A non-volatile storage medium, comprising a stored program, wherein a device in which the non-volatile storage medium is located is controlled to execute the processing method of an electronic document according to any one of claims 1 to 7 when the program runs.
10. A processor for executing a program stored in a memory, wherein the program executes to perform the method for processing an electronic document according to any one of claims 1 to 7.
CN202210218193.2A 2022-03-08 2022-03-08 Electronic document processing method and device Active CN114357524B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210218193.2A CN114357524B (en) 2022-03-08 2022-03-08 Electronic document processing method and device

Publications (2)

Publication Number Publication Date
CN114357524A CN114357524A (en) 2022-04-15
CN114357524B true CN114357524B (en) 2022-06-10

Family

ID=81094478

Country Status (1)

Country Link
CN (1) CN114357524B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant