CN114611471A - Electronic document reading method and device, electronic equipment and storage medium - Google Patents

Electronic document reading method and device, electronic equipment and storage medium

Info

Publication number
CN114611471A
Authority
CN
China
Prior art keywords
document
signature
data
electronic
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210294109.5A
Other languages
Chinese (zh)
Inventor
祝红瑞
杨振燕
王志辉
马广伟
李一帆
曾祥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Digital Certificate Authority Center Co ltd
Original Assignee
Shenzhen Digital Certificate Authority Center Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Digital Certificate Authority Center Co ltd
Priority to CN202210294109.5A
Publication of CN114611471A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/186 Templates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/194 Calculation of difference between files

Abstract

The application relates to the technical field of data processing and provides a method and an apparatus for reading an electronic document, an electronic device, and a storage medium. The method comprises: if a reading instruction for a target document to which an electronic signature has been added is received, acquiring document data of the target document; reading tail sub-data of the document data and acquiring an incremental index table from the tail sub-data; reading the incremental positioning data in the incremental index table one by one, in the reverse of the order in which they were added, and determining the byte segment that each electronic signature in the target document occupies in the document data; and deleting the signature data of the electronic signature corresponding to each byte segment from the document data in turn, so as to read the document content of the target document. The method prevents the document content from becoming unreadable because of garbled text and thereby improves the readability of the electronic document.

Description

Method and device for reading electronic document, electronic equipment and storage medium
Technical Field
The present application belongs to the field of data processing technologies, and in particular, to a method and an apparatus for reading an electronic document, an electronic device, and a storage medium.
Background
As digitization advances, more and more documents are stored in electronic form. Electronic signature technology emerged to give electronic documents legal validity, authenticity, and non-repudiation, which has further broadened the range of applications of electronic documents.
In existing electronic signature technology, signing an electronic document requires inserting a signature object into the document, which shifts the positions of all objects in the document. The shift accumulates each time the document is signed again, so parsing the document is prone to errors and the readability of the document is reduced.
Disclosure of Invention
The embodiments of the present application provide a method and an apparatus for reading an electronic document, an electronic device, and a storage medium. They address the following problem in existing reading techniques: signing an electronic document requires inserting a signature object into the document, which shifts the positions of all objects in it; the shift accumulates over multiple signatures, so parsing the document is prone to errors and its readability is reduced.
In a first aspect, an embodiment of the present application provides a method for reading an electronic document, including:
if a reading instruction of a target document added with an electronic signature is received, acquiring document data of the target document;
reading tail sub-data of the document data, and acquiring an incremental index table from the tail sub-data; the incremental index table comprises at least one item of incremental positioning data;
reading the incremental positioning data in the incremental index table one by one, in the reverse of the order in which they were added, and determining the byte segment that each electronic signature in the target document occupies in the document data;
and deleting, from the document data and in turn, the signature data of the electronic signature corresponding to each byte segment, so as to read the document content of the target document.
In a possible implementation manner of the first aspect, before the obtaining the document data of the target document to which the electronic signature is added if the reading instruction of the target document is received, the method further includes:
receiving a document read request regarding a target signature;
calculating the matching degree between the candidate document and the target signature according to the signature template of the target signature and the candidate document data of each candidate document in the database; the matching degree is specifically as follows:
Figure BDA0003562585830000021 (formula image)
wherein MatchLv is the matching degree between the candidate document and the signature template; TargetDoc_i is the i-th data segment framed in the candidate document by a division window determined based on the signature template; IDF(frame) is the inverse text coefficient of the signature template in the candidate document; TargetDoc is the data corresponding to the candidate document; Framework is the data of the signature template; count is a character counting function; same is a similar-character recognition function;
and if the matching degree between any candidate document and the signature template is greater than a preset matching threshold, identifying that candidate document as the target document, and generating the reading instruction for the target document.
In a possible implementation manner of the first aspect, after the calculating, according to the signature template of the target signature and candidate document data of each candidate document in the database, a matching degree between the candidate document and the target signature further includes:
if the matching degrees between all the candidate documents and the signature template are smaller than or equal to the matching threshold, outputting prompt information of search failure; the prompt message contains an adjustment area of the signature template;
receiving adjustment data input by a user in the adjustment area, and updating the signature template of the target signature based on the adjustment data;
and returning and executing the operation of calculating the matching degree between the candidate document and the target signature according to the signature template of the target signature and the candidate document data of each candidate document in the database respectively based on the updated signature template.
In a possible implementation manner of the first aspect, after deleting, in the document data, each piece of signature data in sequence according to the signature data of the electronic signature corresponding to each byte segment to read a document content of the target document, the method further includes:
if a signature instruction about a target document is received, acquiring the electronic signature corresponding to the signature instruction;
determining a signature area of the electronic signature in the target document, and adding the electronic signature to document subdata in the signature area to generate signature data;
determining a byte offset generated based on a signature operation according to the signature data added in the signature area;
generating the incremental positioning data based on the byte offset of the signature operation and the historical offset of the target document; the historical offset is determined from the delta index table;
and updating the increment index table with the increment positioning data.
In a possible implementation manner of the first aspect, the determining a signature area of the electronic signature in the target document, adding the electronic signature to document subdata in the signature area, and generating signature data includes:
determining a document data range needing to be signed based on the signature area and the document content of the target document;
calculating a hash value of the document data range according to the document data contained in the document data range and the signature type of the electronic signature;
and performing data conversion on the hash value and the document data through a signature algorithm associated with the signature type to obtain the signature data.
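The three steps above can be sketched as follows. This is only an illustration under stated assumptions: the mapping from signature type to hash function (`HASH_BY_SIGNATURE_TYPE`), the function name `make_signature_data`, and the use of a keyed HMAC as a stand-in for the signature algorithm are all hypothetical; the application does not name a hash function or a signature algorithm.

```python
import hashlib
import hmac

# Hypothetical mapping from signature type to hash function (not from the application).
HASH_BY_SIGNATURE_TYPE = {"type-a": "sha256", "type-b": "sha512"}


def make_signature_data(document_data: bytes, start: int, end: int,
                        signature_type: str, key: bytes) -> bytes:
    """Hash the document data range, then convert the hash into signature data."""
    algorithm = HASH_BY_SIGNATURE_TYPE[signature_type]
    # Step 1: the document data range that needs to be signed.
    data_range = document_data[start:end]
    # Step 2: hash value of the range, chosen according to the signature type.
    digest = hashlib.new(algorithm, data_range).digest()
    # Step 3: HMAC stands in here for the real signature algorithm (assumption).
    return hmac.new(key, digest, algorithm).digest()


signature = make_signature_data(b"Contract text to sign.", 0, 13, "type-a", b"demo-key")
```

A production system would replace the HMAC step with an asymmetric signature bound to the signer's certificate; the sketch only shows how the range, the hash, and the signature type fit together.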
In a possible implementation manner of the first aspect, the deleting, from the document data and in turn, of the signature data of the electronic signature corresponding to each byte segment so as to read the document content of the target document includes:
performing semantic understanding on the document content to obtain a semantic recognition result of the document content;
if the semantic recognition result contains abnormal characters, marking abnormal areas where the abnormal characters appear; the abnormal characters are characters which are irrelevant to the semantics in the sentence where the abnormal characters are located;
determining abnormal positioning data from all the incremental positioning data according to the abnormal area;
adjusting the anomaly location data based on the anomaly characters.
In a possible implementation manner of the first aspect, the reading tail sub data of the document data and obtaining an incremental index table from the tail sub data includes:
if the document data contains the initial keyword of the tail subdata, taking the position of the initial keyword of the tail subdata as the initial position of the tail subdata;
determining the tail subdata based on the starting position;
and if the document data does not contain the initial key words, reading the document content based on the document data.
In a second aspect, an embodiment of the present application provides an apparatus for reading an electronic document, including:
a document data acquisition unit configured to acquire document data of a target document to which an electronic signature has been added, if a reading instruction of the target document is received;
an increment index table obtaining unit, configured to read tail sub data of the document data, and obtain an increment index table from the tail sub data; the increment index table comprises at least one increment positioning data;
the incremental positioning data reading unit is used for sequentially reading the incremental positioning data according to the reverse order of the adding order of the incremental positioning data in the incremental index table and determining the byte sections of the electronic signatures in the target document, which correspond to the document data;
and the document content reading unit is used for deleting the signature data in the document data in sequence according to the signature data of the electronic signature corresponding to the byte sections so as to read the document content of the target document.
In a third aspect, an embodiment of the present application provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the method according to any one of the first aspect is implemented.
In a fourth aspect, the present application provides a computer-readable storage medium, which stores a computer program, and when the computer program is executed by a processor, the computer program implements the method according to any one of the above first aspects.
In a fifth aspect, embodiments of the present application provide a computer program product, which when run on a server, causes the server to perform the method of any one of the first aspect.
Compared with the prior art, the embodiments of the present application have the following advantages: when a reading instruction for a target document is received, the document data of the target document is acquired, the incremental index table recording the incremental positioning data of each electronic signature is extracted from the tail sub-data of the document data, and the signature data of each electronic signature added to the document data is deleted in turn, in the reverse of the order in which the incremental positioning data were added to the incremental index table. The document content of the target document is thereby restored, and garbled text caused by the added electronic signatures is avoided. Unlike existing techniques for reading electronic documents, incremental positioning data is generated for each electronic signature added to the target document and appended to the incremental index table, so that at reading time the signature data can be deleted in turn based on that table. This prevents the document content from becoming unreadable because of garbled text and improves the readability of the electronic document.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments or of the prior art are briefly introduced below. The drawings described below show only some embodiments of the present application; those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a flowchart illustrating an implementation of a method for reading an electronic document according to an embodiment of the present application;
FIG. 2 is a schematic view of reading document contents provided by an embodiment of the present application;
FIG. 3 is a flowchart illustrating an implementation of a method for reading an electronic document according to an embodiment of the present application;
FIG. 4 is a schematic diagram illustrating an implementation manner of a reading method for an electronic document according to an embodiment of the present application;
FIG. 5 is a schematic diagram of an implementation manner of S104 of a method for reading an electronic document according to an embodiment of the present application;
FIG. 6 is a schematic diagram illustrating an implementation manner of S102 of a method for reading an electronic document according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a reading apparatus for an electronic document provided in an embodiment of the present application;
FIG. 8 is a schematic structural diagram of an electronic device provided in an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
Furthermore, in the description of the present application and the appended claims, the terms "first," "second," "third," and the like are used for distinguishing between descriptions and not necessarily for describing or implying relative importance.
The method for reading an electronic document provided by the embodiments of the present application can be applied to any electronic device capable of reading electronic documents, such as a smartphone, a server, a tablet computer, a notebook computer, an ultra-mobile personal computer (UMPC), or a netbook. The embodiments of the present application place no limit on the specific type of the electronic device.
Referring to fig. 1, fig. 1 is a schematic diagram illustrating an implementation of a method for reading an electronic document according to an embodiment of the present application, where the method includes the following steps:
In S101, if a reading instruction of a target document to which an electronic signature has been added is received, document data of the target document is acquired.
In this embodiment, when a user needs to read one of the electronic documents, the user may initiate a reading request to the electronic device, where the reading request carries the document identifier of the electronic document to be read (i.e., the target document). The electronic device may analyze the reading request, extract a document identifier of the target document, query a storage path corresponding to the document identifier from a local storage, and obtain document data corresponding to the target document based on the storage path.
In this embodiment, the target document is an electronic document to which an electronic signature has been added; that is, its document data includes signature data. When the electronic device adds an electronic signature to the target document, the data inside the document is shifted. To support later reading, the device records the amount by which the signature shifts the document data, obtains the corresponding incremental positioning data, and adds it to the incremental index table of the electronic document. The document data of the target document therefore consists of two parts: main document data, which records the document content and the signature data, and tail sub-data, which stores the incremental positioning data and the incremental index table.
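The bookkeeping described above can be sketched roughly as follows. The class `SignedDocument`, the function `add_signature`, and the entry fields `start`, `length`, and `cumulative_offset` are hypothetical names chosen for illustration; the application does not specify the layout of an item of incremental positioning data.

```python
from dataclasses import dataclass, field


@dataclass
class SignedDocument:
    main_data: bytes  # document content plus any inserted signature data
    index_table: list = field(default_factory=list)  # positioning data, in adding order


def add_signature(doc: SignedDocument, position: int, signature_data: bytes) -> None:
    """Insert signature_data at `position` and record the shift it causes."""
    if not 0 <= position <= len(doc.main_data):
        raise ValueError("position lies outside the document data")
    doc.main_data = doc.main_data[:position] + signature_data + doc.main_data[position:]
    # Historical offset: bytes already added by earlier signatures.
    historical_offset = sum(entry["length"] for entry in doc.index_table)
    doc.index_table.append({
        "start": position,              # where this signature begins, in the document state at signing time
        "length": len(signature_data),  # byte offset caused by this signing operation
        "cumulative_offset": historical_offset + len(signature_data),
    })


doc = SignedDocument(b"Hello, world.")
add_signature(doc, 5, b"<SIG-A>")
add_signature(doc, 0, b"<SIG-B>")
```

Each entry records positions relative to the document state at the moment of signing, which is why the entries can later be undone in reverse order.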
In a possible implementation manner, the target document may be stored in a cloud server, in which case, the reading instruction may carry a download link of the target document, and the electronic device may obtain document data of the target document from the cloud server according to the download link.
In S102, tail sub-data of the document data is read, and an incremental index table is obtained from the tail sub-data; the increment index table contains at least one increment positioning data.
In this embodiment, the electronic device may extract the tail sub-data of the document data, that is, the data of the trailer part, which contains the incremental positioning data and the incremental index table used to index the signature data.
In this embodiment, the number of items of incremental positioning data in the incremental index table equals the number of electronic signatures in the target document: if the target document contains two or more electronic signatures, the incremental index table contains the same number of items of incremental positioning data.
In a possible implementation manner, the tail sub-data is configured with a corresponding tail keyword, and the electronic device may locate the tail sub-data from the document data according to the tail keyword, and then may extract the corresponding incremental index table from the tail sub-data.
In a possible implementation manner, if some target document does not include the tail sub data, or the content of the tail sub data is empty, it indicates that the target document is not signed electronically, and in this case, the electronic device may directly restore the corresponding document content based on the document data of the target document.
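A minimal sketch of locating the tail sub-data follows. The keyword `TAIL_KEYWORD` and the JSON encoding of the index table are assumptions made for the example; the application states only that the tail sub-data has a start keyword and holds the incremental index table.

```python
import json

# Hypothetical start keyword of the tail sub-data (not from the application).
TAIL_KEYWORD = b"\n%%INCR-INDEX\n"


def split_tail(document_data: bytes):
    """Return (main_data, index_table); the table is empty for unsigned documents."""
    # Search from the end so that the trailer, not the body, is matched.
    start = document_data.rfind(TAIL_KEYWORD)
    if start == -1:
        # No start keyword: the document was never signed, so read it directly.
        return document_data, []
    tail = document_data[start + len(TAIL_KEYWORD):]
    # Empty tail content is treated the same way as a missing tail.
    index_table = json.loads(tail.decode("utf-8")) if tail.strip() else []
    return document_data[:start], index_table


raw = b"abc" + TAIL_KEYWORD + json.dumps([{"start": 1, "length": 2}]).encode("utf-8")
main_data, index_table = split_tail(raw)
```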
In S103, according to the reverse order of the adding order of each incremental positioning data in the incremental index table, each incremental positioning data is sequentially read, and a byte segment corresponding to each electronic signature in the document data in the target document is determined.
In S104, according to the signature data of the electronic signature corresponding to each of the byte sections, each of the signature data is deleted in the document data in sequence to read the document content of the target document.
In this embodiment, the electronic device may obtain the incremental positioning data corresponding to each electronic signature from the incremental index table. Based on an item of incremental positioning data, it determines the byte position at which the electronic signature was added in the document data of the target document, that is, the byte segment mentioned above, and deletes the signature data of the associated electronic signature from that byte segment, obtaining document data with one electronic signature removed. It then uses the incremental positioning data next in the reverse adding order to determine the byte segment of the next electronic signature in that document data and deletes the corresponding signature data, and so on until the signature data of all electronic signatures has been deleted. The document data with all electronic signatures removed is identified as the content data of the target document, and the document content is displayed according to it. This avoids garbled text when reading the document content after signature data has been inserted and improves the accuracy with which the document content is read.
In this embodiment, the electronic device may sequentially acquire each incremental positioning data according to the reverse order of the adding order, so as to determine the corresponding byte segment according to each incremental positioning data, delete the signature data of the electronic signature from the byte segment, and avoid data offset caused by the signature data to the original content data of the document content.
Illustratively, FIG. 2 shows a schematic diagram of reading document content provided by an embodiment of the present application. Referring to (a) in FIG. 2, the incremental index table of the target document contains two items of incremental positioning data, data A and data B, where data A was added before data B. The electronic device therefore first determines, according to data B, the byte segment in which electronic signature B was added to the document data and deletes the signature data of electronic signature B from that byte segment, as shown in (b) in FIG. 2. At this point the position of the byte segment of electronic signature A matches the document state at the time electronic signature A was added, that is, it is the same as the byte segment recorded in data A. The electronic device then obtains data A from the incremental index table, determines the byte segment in which electronic signature A was added to the document data, and deletes the signature data of electronic signature A from that byte segment. The result is content data with all electronic signatures removed, as shown in (c) in FIG. 2, from which the document content of the electronic document is displayed.
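The two-signature example can be traced with a short sketch. The entry fields `start` and `length` and the function name `strip_signatures` are hypothetical; the sketch assumes each item of incremental positioning data records where its signature began in the document state at signing time and how many bytes it added.

```python
def strip_signatures(main_data: bytes, index_table: list) -> bytes:
    """Delete signature data in the reverse of the adding order."""
    data = main_data
    for entry in reversed(index_table):
        start, length = entry["start"], entry["length"]
        if start < 0 or start + length > len(data):
            raise ValueError("positioning data does not fit the document data")
        # Removing the most recent signature restores the document state that
        # the previous entry's positions refer to.
        data = data[:start] + data[start + length:]
    return data


# Data A (added first) and data B (added second), as in the example above.
signed = b"<SIG-B>Hello<SIG-A>, world."
table = [{"start": 5, "length": 7}, {"start": 0, "length": 7}]
content = strip_signatures(signed, table)
```

Processing the same table in forward order would cut at positions that no longer match the document state, which is the kind of corruption the reverse order avoids.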
As can be seen from the above, in the method for reading an electronic document according to the embodiment of the present application, when a reading instruction for a target document is received, the document data of the target document is acquired, the incremental index table recording the incremental positioning data of each electronic signature is extracted from the tail sub-data of the document data, and the signature data of each electronic signature added to the document data is deleted in turn, in the reverse of the order in which the incremental positioning data were added to the incremental index table. The document content of the target document is thereby restored, and garbled text caused by the added electronic signatures is avoided. Unlike existing techniques for reading electronic documents, incremental positioning data is generated for each electronic signature added to the target document and appended to the incremental index table, so that at reading time the signature data can be deleted in turn based on that table. This prevents the document content from becoming unreadable because of garbled text and improves the readability of the electronic document.
FIG. 3 is a flowchart illustrating a specific implementation of a method for reading an electronic document according to a second embodiment of the present invention. Referring to FIG. 3, relative to the embodiment shown in FIG. 1, the method provided by this embodiment further includes S301 to S305 before S101, described as follows:
Further, before the acquiring of the document data of the target document if a reading instruction for the target document to which an electronic signature has been added is received, the method further includes:
In S301, a document reading request regarding a target signature is received.
In the embodiment, the electronic device can receive a reading instruction initiated by a user for a specific document, and can also extract the electronic document added with a specified electronic signature from the database so as to achieve the purpose of batch reading of related documents. Based on this, the user can input the signature identifier of the electronic signature (i.e., the target signature) added to the electronic document to be read in batch, and generate a corresponding document reading request based on the signature identifier.
In one possible implementation, the electronic device may generate a search page configured with an input control and a search type selector, where the search types include a document type and a signature type. If the user selects the document type, the content the user enters in the input control is identified as the document identifier of a target document; if the user selects the signature type, the content entered is identified as the signature identifier of the target signature. The electronic device thus determines the kind of identifier entered in the input control from the search type the user selected. When it detects that the user has selected the signature type, it generates a corresponding document reading request based on the signature identifier entered in the input control and performs the operation of S301.
In S302, calculating a matching degree between the candidate document and the target signature according to the signature template of the target signature and the candidate document data of each candidate document in the database, respectively; the matching degree is specifically as follows:
Figure BDA0003562585830000071 (formula image)
wherein MatchLv is the matching degree between the candidate document and the signature template; TargetDoc_i is the i-th data segment framed in the candidate document by a division window determined based on the signature template; IDF(frame) is the inverse text coefficient of the signature template in the candidate document; TargetDoc is the data corresponding to the candidate document; Framework is the data of the signature template; count is a character counting function; same is a similar-character recognition function.
In this embodiment, each electronic signature is configured with a corresponding signature template, which is generated based on the signature data of the electronic signature. If an electronic signature has been added to an electronic document, the document data of that document contains the signature data of the electronic signature, so some data segment in the document matches the signature template. Based on this, the electronic device may generate a sliding window according to the signature template, slide the window over each candidate document, compare the data segment framed by the window at each position with the signature template, and count the number of identical characters. The matching degree between the candidate document and the signature template is taken from the data segment with the largest number of identical characters, and whether the candidate document contains the electronic signature is determined from that matching degree.
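The sliding-window comparison can be sketched as below. This does not reproduce the MatchLv formula, which is available only as an image: the IDF weighting is omitted, and normalising the best count by the window length is an assumption made so that the result falls between 0 and 1. The function name `match_degree` is hypothetical.

```python
def match_degree(candidate: bytes, template: bytes) -> float:
    """Best fraction of identical characters over all window positions."""
    window = len(template)  # window size is determined by the signature template
    if window == 0 or len(candidate) < window:
        return 0.0
    best = 0
    for i in range(len(candidate) - window + 1):
        segment = candidate[i:i + window]  # i-th data segment framed by the window
        same = sum(1 for a, b in zip(segment, template) if a == b)
        best = max(best, same)
    # Normalisation by window length is an assumption, not part of the source text.
    return best / window


exact = match_degree(b"xxSIGNxx", b"SIGN")
partial = match_degree(b"xxSIGXxx", b"SIGN")
```

A candidate would then be treated as containing the signature when the returned value exceeds the preset matching threshold.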
In S303, if the matching degree between any candidate document and the signature template is greater than a preset matching threshold, that candidate document is identified as the target document, and the reading instruction for the target document is generated.
In this embodiment, if the matching degree between any candidate document in the database and the electronic signature is greater than the matching threshold, the candidate document is identified as being added with the electronic signature, in this case, the candidate document may be identified as the target document, and the data reading operation may be performed on the target document, that is, the reading instruction of the target document may be generated.
In the embodiment of the application, by designating the target signature to be searched, identifying all target documents matched with the signature template of the target signature from the database, and generating the reading instruction of all the target documents, the purpose of reading the documents in batches is achieved, the diversity of file searching is improved, all the documents added with the designated signature can be read, and the document reading efficiency is further improved.
In S304, if the matching degrees between all the candidate documents and the signature template are less than or equal to the matching threshold, outputting a prompt message indicating that the search fails; the prompt message contains the adjustment area of the signature template.
In this embodiment, if the matching degrees between all candidate documents in the database and the signature template of the target signature are less than or equal to the matching threshold, there are two possible situations. In the first case, no electronic document in the database has the corresponding electronic signature added. In the second case, the signature template corresponding to the target signature is abnormal, for example because it has not been updated to the latest version, so matching documents cannot be found. In either case the corresponding electronic document cannot be found, and the electronic device may output a prompt message indicating that the search failed, to notify the user of the search result. To correct the second case, the prompt message may include an adjustment area for adjusting the signature template.
In S305, adjustment data input by a user in the adjustment region is received, and the signature template of the target signature is updated based on the adjustment data.
In this embodiment, if the signature template of the target signature is abnormal, the user may input the adjustment data corresponding to the signature template in the adjustment area in the prompt message, and the electronic device adjusts the signature template according to the adjustment data input in the adjustment area, so as to update the signature template.
For example, the user may input the version number of the signature template for the target signature, and the electronic device may download the signature template of that version from the server corresponding to the target signature to replace the currently associated signature template.
In S306, based on the updated signature template, the method returns to the operation of calculating the matching degree between the candidate document and the target signature according to the signature template of the target signature and the candidate document data of each candidate document in the database.
In this embodiment, after the signature template is updated, the matching degree with each candidate document may be calculated again based on the updated signature template, so as to select the target document.
In the embodiment of the application, the corresponding adjusting area is configured in the output prompt message of search failure, so that a user can conveniently and quickly input corresponding adjusting data, the signature template can be adjusted conveniently, the condition that the signature template is overdue or invalid can be quickly repaired, unnecessary operations of the user are reduced, and the adjusting efficiency is improved.
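The search, failure prompt, and retry flow of S303 to S306 can be sketched as a loop. This is an illustrative sketch under assumptions: `search_with_adjustment` is a hypothetical name, the matching function is passed in as a parameter, and the user's adjustment input is modeled as an iterable of replacement templates rather than an interactive adjustment area.

```python
from typing import Callable, Dict, Iterable, List


def search_with_adjustment(
    candidates: Dict[str, bytes],
    template: bytes,
    threshold: float,
    match: Callable[[bytes, bytes], float],
    adjusted_templates: Iterable[bytes],
) -> List[str]:
    """Return the names of all candidate documents whose matching degree
    exceeds the threshold, retrying with an updated template on failure."""
    pending = iter(adjusted_templates)
    while True:
        # S303: every candidate above the threshold becomes a target document.
        hits = [name for name, data in candidates.items()
                if match(data, template) > threshold]
        if hits:
            return hits
        # S304/S305: search failed; take the next adjusted template, if any.
        updated = next(pending, None)
        if updated is None:
            return []
        # S306: recompute the matching degrees with the updated template.
        template = updated
```

Returning an empty list when no further adjustment is supplied corresponds to the first failure case, where the database simply holds no document carrying the signature.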
Fig. 4 is a flowchart illustrating a specific implementation of a method for reading an electronic document according to a third embodiment of the present invention. Referring to fig. 4, with respect to the embodiment shown in fig. 1, the present embodiment provides a method for reading an electronic document, after the signature data of the electronic signature corresponding to each byte section is deleted in the document data in sequence to read the document content of the target document, further including: s401 to S405 are described in detail as follows:
further, after the sequentially deleting the signature data in the document data according to the electronic signature corresponding to each byte segment to read the document content of the target document, the method further includes:
in S401, if a signature instruction about a target document is received, the electronic signature corresponding to the signature instruction is acquired.
In this embodiment, when a user needs to sign an electronic document, a signature instruction may be generated. The signature instruction specifies the identifier of the electronic signature to be applied, and the electronic device may obtain the signature template and the signature data of the corresponding electronic signature based on that identifier.
In S402, a signature area of the electronic signature in the target document is determined, and the electronic signature is added to the document sub-data in the signature area to generate signature data.
In this embodiment, the signature instruction may include the signature area to be signed, and the electronic device may add the electronic signature to the document sub-data corresponding to that signature area, so as to generate the corresponding signature data.
Further, as another embodiment of the present application, the step S402 may specifically include the following three steps:
in S402.1, a document data range of a required signature is determined based on the signature region and the document content of the target document.
In S402.2, a hash value of the document data range is calculated according to the document data included in the document data range and the signature type of the electronic signature.
In S402.3, data conversion is performed on the hash value and the document data through a signature algorithm associated with the signature type, so as to obtain the signature data.
In this embodiment, a signature may apply only to a specific range of the designated document rather than to the whole document. In this case, the electronic device may determine the text range designated by the user by parsing the signature instruction, and extract the corresponding document content based on that range, so as to obtain the document data range that needs to be electronically signed.
In this embodiment, after determining the document data range, the electronic device may extract the text data within that range and feed it into the hash conversion algorithm corresponding to the signature type, so as to obtain the corresponding signature data. The hash conversion algorithm may be generated specifically for the electronic signature based on the signature type; converting the text data with it binds the text data in the document data range to the corresponding electronic signature.
In the embodiment of the application, the text data range is automatically determined, and the electronic signature is automatically added to the local data, so that the operation required by a user can be reduced, and the operation of the electronic signature is more flexible.
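Steps S402.1 to S402.3 can be illustrated with Python's standard `hashlib` and `hmac` modules. This is a sketch under stated assumptions, not the patent's algorithm: the signature type names in `HASHES` are hypothetical, the document data range is modeled as a byte slice, and HMAC stands in for the unspecified "signature algorithm associated with the signature type".

```python
import hashlib
import hmac

# Hypothetical mapping from signature type to hash function.
HASHES = {"sha256-demo": hashlib.sha256, "sha512-demo": hashlib.sha512}


def sign_range(document: bytes, start: int, end: int,
               signature_type: str, key: bytes) -> bytes:
    """Produce signature data for document[start:end]."""
    hash_fn = HASHES[signature_type]
    # S402.1: the document data range that needs to be signed.
    data_range = document[start:end]
    # S402.2: hash of the range, chosen by signature type.
    digest = hash_fn(data_range).digest()
    # S402.3: convert the hash value and the document data into signature
    # data; HMAC is only a stand-in for the real signature algorithm.
    return hmac.new(key, digest + data_range, hash_fn).digest()
```

A production electronic signature would normally use an asymmetric scheme with a certificate; the keyed hash here only demonstrates the data flow of the three steps.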
In S403, a byte offset generated based on a signature operation is determined according to the signature data added by the signature region.
In S404, generating the incremental positioning data based on the byte offset of the signature operation and the historical offset of the target document; the historical offset is determined from the delta index table.
In S405, the incremental positioning data is updated to the incremental index table.
In this embodiment, adding the electronic signature to the original document data generates the corresponding signature data, which shifts the document data by a certain byte offset. If other electronic signatures have already been added to the target document, the historical offset of the document may be determined from the most recently added incremental positioning data. The incremental positioning data for the current signature is then generated from the current byte offset and the historical offset and added to the incremental index table, so that the document content of the target document can be restored based on the incremental index table during subsequent reading.
In the embodiment of the application, the incremental positioning data corresponding to the electronic signature can be determined when the electronic signature is added, and the incremental index table is updated based on the incremental positioning data, so that the subsequent document reading operation can be conveniently completed, and the readability of the electronic document after the electronic signature is added is improved.
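Steps S403 to S405 can be sketched with a small in-memory model. The layout of an index entry is an assumption made for illustration: the `Increment` fields, the `add_signature` helper, and the idea that signature data is inserted inline at a byte position are all hypothetical, since the patent does not specify the record format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Increment:
    start: int       # byte position where the signature data begins
    length: int      # byte offset introduced by this signature operation
    cumulative: int  # historical offset plus this operation's offset


def add_signature(document: bytes, table: List[Increment],
                  position: int, signature_data: bytes) -> bytes:
    """Insert signature data and record its incremental positioning data."""
    # Historical offset: taken from the most recently added entry, if any.
    history = table[-1].cumulative if table else 0
    signed = document[:position] + signature_data + document[position:]
    # S403/S404: byte offset of this operation combined with the history.
    offset = len(signature_data)
    # S405: update the incremental index table.
    table.append(Increment(position, offset, history + offset))
    return signed
```

Because each entry records its position in the document as it existed right after that insertion, the entries are only meaningful when undone in the reverse of the order in which they were added.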
Fig. 5 is a flowchart illustrating a specific implementation of the method S104 for reading an electronic document according to a fourth embodiment of the present invention. Referring to fig. 5, with respect to any one of the embodiments shown in fig. 1 to 4, the method S104 for reading an electronic document provided by this embodiment includes: s1041 to S1044, detailed description is as follows:
further, deleting each signature data in the document data in sequence according to the signature data of the electronic signature corresponding to each byte segment so as to read the document content of the target document, including:
in S1041, performing semantic understanding on the document content to obtain a semantic recognition result of the document content.
In S1042, if the semantic recognition result includes an abnormal character, marking an abnormal area where the abnormal character appears; the abnormal character is a character that is semantically irrelevant to the sentence in which it is located.
In S1043, according to the abnormal area, the abnormal positioning data is determined from all the incremental positioning data.
In S1044, the abnormal positioning data is adjusted based on the abnormal characters.
In this embodiment, after reading the document content of the electronic document, the electronic device may check again whether the content contains garbled characters. To do so, it determines a semantic recognition result for the document content through a semantic understanding algorithm. The algorithm recognizes the semantics of each sentence in the electronic document and judges whether the sentence reads smoothly, thereby determining whether any sentence contains abnormal characters unrelated to its semantics; if so, the corresponding characters are marked in the document content with a preset abnormal mark. If the electronic device detects that the semantic recognition result contains abnormal characters, the electronic document contains garbled characters that need to be repaired. The electronic device locates the sentence in which the abnormal characters appear, that is, the abnormal area, and determines whether any of the incremental positioning data corresponding to the electronic signatures is associated with the abnormal area. If so, that incremental positioning data is identified as abnormal positioning data and is adjusted to repair the garbled characters in the sentence.
In the embodiment of the application, the sentence is subjected to semantic understanding to judge whether the sentence contains abnormal characters, and abnormal positioning data associated with the abnormal characters is repaired, so that the purpose of automatically repairing the abnormal condition is achieved.
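The repair flow of S1041 to S1044 might look like the sketch below. It rests on several assumptions that are not in the source: a real semantic understanding model is replaced by a trivial detector that flags the Unicode replacement character U+FFFD, index entries are hypothetical dictionaries with `position` and `length` keys measured in the same units as the text, and the adjustment simply lengthens the entry by the size of the abnormal area (the case where the recorded signature length was too short and leftover signature bytes remained).

```python
from typing import Dict, List, Optional, Tuple


def mark_abnormal_area(text: str) -> Optional[Tuple[int, int]]:
    """Stand-in for semantic recognition: return the half-open span covering
    all U+FFFD characters, or None if the text contains none."""
    bad = [i for i, ch in enumerate(text) if ch == "\ufffd"]
    if not bad:
        return None
    return bad[0], bad[-1] + 1


def adjust_increments(table: List[Dict[str, int]],
                      area: Tuple[int, int]) -> List[Dict[str, int]]:
    """Lengthen every entry whose position falls inside the abnormal area."""
    lo, hi = area
    fixed = []
    for entry in table:
        if lo <= entry["position"] < hi:
            # Abnormal positioning data: extend it to cover the leftovers.
            entry = {**entry, "length": entry["length"] + (hi - lo)}
        fixed.append(entry)
    return fixed
```

After the adjustment, the document would be read again with the corrected table so that the leftover bytes are deleted along with the rest of the signature data.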
Fig. 6 shows a flowchart of a specific implementation of the method S102 for reading an electronic document according to the fifth embodiment of the present invention. Referring to fig. 6, with respect to any one of the embodiments in fig. 1 to 4, in the reading method of an electronic document provided by this embodiment, S102 includes: s1021 to S1023 are described in detail as follows:
further, the reading tail sub-data of the document data and obtaining an increment index table from the tail sub-data includes:
in S501, if the document data includes the start keyword of the tail sub data, the position of the start keyword of the tail sub data is used as the start position of the tail sub data.
In S502, the tail sub data is determined based on the start position.
In S503, if the document data does not include the start keyword, the document content is read based on the document data.
In this embodiment, the electronic device may locate the tail sub data from the document data by using the start keyword of the tail sub data, and if the tail sub data of the target document is not empty, it indicates that an electronic signature is added to the target document.
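Locating the tail sub-data by its start keyword can be sketched as follows. The keyword value used in the usage note and the helper name `split_tail` are hypothetical, and searching from the end of the data with `rfind` is a design assumption, chosen because the tail is expected to be the last part of the document data.

```python
from typing import Optional, Tuple


def split_tail(document: bytes,
               start_keyword: bytes) -> Tuple[bytes, Optional[bytes]]:
    """Split document data into (body, tail sub-data).

    If the start keyword is absent, the document carries no incremental
    index table and the body can be read directly (tail is None).
    """
    pos = document.rfind(start_keyword)
    if pos == -1:
        return document, None
    return document[:pos], document[pos + len(start_keyword):]
```

For example, with a made-up keyword `b"%%TAIL%%"`, the data `b"body%%TAIL%%idx"` splits into the body `b"body"` and the tail `b"idx"`, while data without the keyword is returned unchanged with a tail of `None`.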
Fig. 7 is a block diagram illustrating an apparatus for reading an electronic document according to an embodiment of the present invention, where the apparatus includes units for executing the steps in the embodiment corresponding to fig. 1. Please refer to fig. 1 and the description of the embodiment corresponding to fig. 1. For convenience of explanation, only the portions related to the present embodiment are shown.
Referring to fig. 7, the reading apparatus of the electronic document includes:
a document data acquisition unit 71 for acquiring document data of a target document to which an electronic signature has been added, if a reading instruction of the target document is received;
an increment index table obtaining unit 72, configured to read tail sub data of the document data, and obtain an increment index table from the tail sub data; the increment index table comprises at least one increment positioning data;
an incremental positioning data reading unit 73, configured to sequentially read each incremental positioning data according to a reverse order of an adding order of each incremental positioning data in the incremental index table, and determine a byte segment corresponding to each electronic signature in the document data in the target document;
a document content reading unit 74 configured to delete each signature data in the document data in sequence according to the signature data of the electronic signature corresponding to each byte segment, so as to read the document content of the target document.
Optionally, the reading apparatus of the electronic document further includes:
a document read request unit for receiving a document read request regarding a target signature;
a matching degree calculation unit, configured to calculate a matching degree between the candidate document and the target signature according to the signature template of the target signature and the candidate document data of each candidate document in the database, respectively; the matching degree is specifically as follows:
Figure BDA0003562585830000111
wherein MatchLv is the matching degree between the candidate document and the signature template; TargetDoc_i is the i-th data segment framed in the candidate document by a division window determined based on the signature template; IDF (frame) is the inverse text coefficient of the signature template in the candidate document; TargetDoc is the data corresponding to the candidate document; Framework is the data of the signature template; Count is a character counting function; and Same is a similar character recognition function;
and the target document selecting unit is used for identifying a candidate document as the target document and generating the reading instruction of the target document if the matching degree between that candidate document and the signature template is greater than a preset matching threshold value.
Optionally, the reading apparatus of the electronic document further includes:
a prompt information display unit, configured to output prompt information of search failure if the matching degrees between all the candidate documents and the signature template are less than or equal to the matching threshold; the prompt message contains an adjustment area of the signature template;
an adjustment data receiving unit, configured to receive adjustment data input by a user in the adjustment area, and update the signature template of the target signature based on the adjustment data;
and the adjustment searching unit is used for returning and executing the operation of calculating the matching degree between the candidate document and the target signature according to the signature template of the target signature and the candidate document data of each candidate document in the database respectively based on the updated signature template.
Optionally, the reading apparatus of the electronic document further includes:
the electronic signature acquisition unit is used for acquiring the electronic signature corresponding to a signature instruction if the signature instruction about a target document is received;
the signature data acquisition unit is used for determining a signature area of the electronic signature in the target document, adding the electronic signature to document subdata in the signature area and generating signature data;
a byte offset determination unit configured to determine a byte offset generated based on a signature operation from the signature data added to the signature region;
an incremental positioning data generating unit, configured to generate the incremental positioning data based on the byte offset of the signature operation and the historical offset of the target document; the historical offset is determined from the delta index table;
and the incremental positioning data adding unit is used for updating the incremental positioning data to the incremental index table.
Optionally, the signature data obtaining unit includes:
a document data range determining unit, configured to determine a document data range to be signed based on the signature region and the document content of the target document;
a hash value determination unit, configured to calculate a hash value of the document data range according to the document data included in the document data range and the signature type of the electronic signature;
and the hash value conversion unit is used for performing data conversion on the hash value and the document data through a signature algorithm associated with the signature type to obtain the signature data.
Optionally, the document content reading unit includes:
the semantic recognition unit is used for carrying out semantic understanding on the document content to obtain a semantic recognition result of the document content;
the abnormal character recognition unit is used for marking an abnormal area where the abnormal character appears if the semantic recognition result contains the abnormal character; the abnormal characters are characters which are irrelevant to the semantics in the sentence where the abnormal characters are located;
the abnormal area identification unit is used for determining abnormal positioning data from all the incremental positioning data according to the abnormal area;
and the abnormal character adjusting unit is used for adjusting the abnormal positioning data based on the abnormal characters.
Optionally, the increment index table obtaining unit includes:
a start keyword positioning unit, configured to, if the document data includes a start keyword of the tail sub data, use a position where the start keyword of the tail sub data is located as a start position of the tail sub data;
the tail subdata positioning unit is used for determining the tail subdata based on the initial position;
and the document content reading unit is used for reading the document content based on the document data if the document data does not contain the initial key word.
Therefore, the apparatus for reading an electronic document provided by the embodiment of the present invention can likewise obtain the document data of the target document when a reading instruction for the target document is received, extract from the tail sub-data of the document data the incremental index table in which the incremental positioning data of each electronic signature is recorded, and sequentially delete the signature data of each electronic signature added to the document data in the reverse order of the adding order of the incremental positioning data in the incremental index table, thereby restoring the document content of the target document and avoiding garbled reads caused by the added electronic signatures. Compared with existing technology for reading electronic documents, incremental positioning data is generated for each electronic signature added to the target document and recorded in the incremental index table, so that during reading the signature data added to the electronic document can be deleted in sequence based on the incremental index table. This avoids document content becoming unreadable due to garbled characters and improves the readability of the electronic document.
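The core read path summarized above, deleting signature data in the reverse of the order in which it was added, can be sketched in a few lines. The representation is an assumption for illustration: each incremental positioning entry is modeled as a hypothetical `(start, length)` pair giving the byte segment of one signature in the document as it existed right after that signature was added.

```python
from typing import List, Tuple


def read_content(document: bytes, table: List[Tuple[int, int]]) -> bytes:
    """Restore the document content by removing signature byte segments.

    `table` lists (start, length) pairs in the order the signatures were
    added; they are undone last-in, first-out.
    """
    for start, length in reversed(table):
        # Delete the byte segment occupied by this signature's data.
        document = document[:start] + document[start + length:]
    return document
```

The reverse order matters because a later signature may shift the bytes of an earlier one. In the test data for this sketch, the second signature is inserted at position 0 and pushes the first one from offset 5 to offset 9, so the first entry's recorded position is valid again only after the second signature has been removed.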
It should be understood that, in the structural block diagram of the device for reading an electronic document shown in fig. 7, each module is used to execute the steps in the embodiments corresponding to fig. 1 to 6. These steps have been explained in detail in the above embodiments; for details, refer to fig. 1 to 6 and the related description of the corresponding embodiments, which is not repeated herein.
Fig. 8 is a block diagram of an electronic device according to another embodiment of the present application. As shown in fig. 8, the electronic device 800 of this embodiment includes: a processor 88, a memory 820 and a computer program 830, such as a program for a reading method of an electronic document, stored in the memory 820 and executable on the processor 88. When executing the computer program 830, the processor 88 implements the steps in the respective embodiments of the reading method of the electronic document described above, such as S101 to S104 shown in fig. 1. Alternatively, when executing the computer program 830, the processor 88 implements the functions of the modules in the embodiment corresponding to fig. 7, for example, the functions of the units 71 to 74 shown in fig. 7; for details, refer to the related description in the embodiment corresponding to fig. 7.
Illustratively, the computer program 830 may be partitioned into one or more modules, which are stored in the memory 820 and executed by the processor 88 to accomplish the present application. One or more of the modules may be a series of computer program instruction segments capable of performing certain functions that are used to describe the execution of the computer program 830 in the electronic device 800. For example, the computer program 830 may be divided into unit modules, each of which functions as described above.
The electronic device 800 may include, but is not limited to, the processor 88, the memory 820. Those skilled in the art will appreciate that fig. 8 is merely an example of an electronic device 800 and does not constitute a limitation of electronic device 800, and may include more or fewer components than shown, or some components in combination, or different components, e.g., an electronic device may also include input-output devices, network access devices, buses, etc.
The processor 88 may be a central processing unit, but may also be another general purpose processor, a digital signal processor, an application specific integrated circuit, a field-programmable gate array or other programmable logic device, discrete hardware components, or the like. The general purpose processor may be a microprocessor or any conventional processor or the like.
The memory 820 may be an internal storage unit of the electronic device 800, such as a hard disk or a memory of the electronic device 800. The memory 820 may also be an external storage device of the electronic device 800, such as a plug-in hard disk, a smart card, a flash memory card, etc. provided on the electronic device 800. Further, the memory 820 may also include both internal storage units and external storage devices of the electronic device 800.
The above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (10)

1. A method of reading an electronic document, comprising:
if a reading instruction of a target document added with an electronic signature is received, acquiring document data of the target document;
reading tail subdata of the document data, and acquiring an increment index table from the tail subdata; the increment index table comprises at least one increment positioning data;
sequentially reading the incremental positioning data according to the reverse order of the adding order of the incremental positioning data in the incremental index table, and determining the byte sections of the electronic signatures in the target document, which correspond to the document data;
and deleting the signature data in the document data in sequence according to the signature data of the electronic signature corresponding to each byte section so as to read the document content of the target document.
2. The reading method according to claim 1, before the acquiring the document data of the target document to which the electronic signature has been added if the reading instruction of the target document is received, further comprising:
receiving a document read request regarding a target signature;
calculating the matching degree between the candidate document and the target signature according to the signature template of the target signature and the candidate document data of each candidate document in the database; the matching degree is specifically as follows:
Figure FDA0003562585820000011
wherein MatchLv is the matching degree between the candidate document and the signature template; TargetDoc_i is the i-th data segment framed in the candidate document by a division window determined based on the signature template; IDF (frame) is the inverse text coefficient of the signature template in the candidate document; TargetDoc is the data corresponding to the candidate document; Framework is the data of the signature template; Count is a character counting function; and Same is a similar character recognition function;
and if the matching degree between any candidate document and the signature template is greater than a preset matching threshold, identifying that candidate document as the target document, and generating the reading instruction of the target document.
3. The reading method according to claim 2, further comprising, after the calculating the degree of matching between the candidate document and the target signature based on the signature template of the target signature and the candidate document data of each candidate document in the database, respectively:
if the matching degrees between all the candidate documents and the signature template are smaller than or equal to the matching threshold, outputting prompt information of search failure; the prompt message contains an adjustment area of the signature template;
receiving adjustment data input by a user in the adjustment area, and updating the signature template of the target signature based on the adjustment data;
and returning and executing the operation of calculating the matching degree between the candidate document and the target signature according to the signature template of the target signature and the candidate document data of each candidate document in the database respectively based on the updated signature template.
4. The reading method according to claim 1, further comprising, after the deleting of each of the signature data in the document data in sequence according to the signature data of the electronic signature corresponding to each of the byte sections so as to read the document content of the target document:
if a signature instruction about a target document is received, acquiring the electronic signature corresponding to the signature instruction;
determining a signature area of the electronic signature in the target document, and adding the electronic signature to document subdata in the signature area to generate signature data;
determining a byte offset generated based on a signature operation according to the signature data added by the signature area;
generating the incremental positioning data based on the byte offset of the signature operation and the historical offset of the target document; the historical offset is determined according to the increment index table;
and updating the increment index table with the increment positioning data.
5. The reading method according to claim 4, wherein the determining a signature area of the electronic signature in the target document, adding the electronic signature to document sub-data in the signature area, and generating signature data includes:
determining a document data range needing to be signed based on the signature area and the document content of the target document;
calculating a hash value of the document data range according to the document data contained in the document data range and the signature type of the electronic signature;
and performing data conversion on the hash value and the document data through a signature algorithm associated with the signature type to obtain the signature data.
6. The reading method according to any one of claims 1 to 5, wherein said deleting, in the document data, each of the signature data in turn in accordance with the signature data of the electronic signature corresponding to each of the byte sections to read the document content of the target document includes:
performing semantic understanding on the document content to obtain a semantic recognition result of the document content;
if the semantic recognition result contains abnormal characters, marking abnormal areas where the abnormal characters appear; the abnormal characters are characters which are irrelevant to the semantics in the sentence where the abnormal characters are located;
determining abnormal positioning data from all the incremental positioning data according to the abnormal area;
adjusting the abnormal positioning data based on the abnormal characters.
7. The reading method according to any one of claims 1 to 5, wherein the reading tail sub data of the document data and obtaining an incremental index table from the tail sub data includes:
if the document data contains the initial keyword of the tail subdata, taking the position of the initial keyword of the tail subdata as the initial position of the tail subdata;
determining the tail sub data based on the starting position;
and if the document data does not contain the initial key words, reading the document content based on the document data.
8. An apparatus for reading an electronic document, comprising:
a document data acquisition unit, configured to acquire document data of a target document to which an electronic signature has been added, if a reading instruction for the target document is received;
an incremental index table obtaining unit, configured to read tail sub-data of the document data and obtain an incremental index table from the tail sub-data, the incremental index table comprising at least one item of incremental positioning data;
an incremental positioning data reading unit, configured to read the incremental positioning data sequentially, in the reverse of the order in which the incremental positioning data was added to the incremental index table, and to determine the byte section in the document data that corresponds to each electronic signature in the target document;
and a document content reading unit, configured to delete the signature data from the document data in sequence, according to the signature data of the electronic signature corresponding to each byte section, so as to read the document content of the target document.
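The reverse-order reading and deletion performed by the last two units can be sketched as below. This is an illustrative sketch only: the function name `strip_signatures` and the representation of each item of incremental positioning data as an `(offset, length)` pair are assumptions, as is the convention that each offset refers to the document data as it stood immediately after that signature was added.

```python
from typing import List, Tuple


def strip_signatures(document: bytes,
                     index_table: List[Tuple[int, int]]) -> bytes:
    """Delete signature byte sections, newest first, to recover the content.

    index_table holds (offset, length) pairs in the order the signatures
    were added (hypothetical layout).
    """
    data = bytearray(document)
    # Undo the most recent signature first so that earlier offsets
    # become valid again before they are used.
    for offset, length in reversed(index_table):
        if offset < 0 or length < 0 or offset + length > len(data):
            raise ValueError("positioning data lies outside the document data")
        del data[offset:offset + length]
    return bytes(data)


# Signature "[S1]" was inserted at offset 5 of b"Hello World", then
# "[S2]" was inserted at offset 0 of the result.
signed = b"[S2]Hello[S1] World"
print(strip_signatures(signed, [(5, 4), (0, 4)]))  # b'Hello World'
```

Under this convention, processing the entries in their original order would delete the wrong bytes, because the offset recorded for the first signature no longer matches the document once the second signature has shifted the data.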
9. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the method of any of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 7.
CN202210294109.5A 2022-03-24 2022-03-24 Electronic document reading method and device, electronic equipment and storage medium Pending CN114611471A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210294109.5A CN114611471A (en) 2022-03-24 2022-03-24 Electronic document reading method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210294109.5A CN114611471A (en) 2022-03-24 2022-03-24 Electronic document reading method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114611471A true CN114611471A (en) 2022-06-10

Family

ID=81864672

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210294109.5A Pending CN114611471A (en) 2022-03-24 2022-03-24 Electronic document reading method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114611471A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115481445A (en) * 2022-08-16 2022-12-16 北京矩阵分解科技有限公司 Portable document format file signature checking method, device, equipment and storage medium
CN115481445B (en) * 2022-08-16 2023-08-18 北京矩阵分解科技有限公司 Signature verification method, device and equipment for portable document format file and storage medium

Similar Documents

Publication Publication Date Title
CN108874928B (en) Resume data information analysis processing method, device, equipment and storage medium
CN111581976B (en) Medical term standardization method, device, computer equipment and storage medium
EP2657884B1 (en) Identifying multimedia objects based on multimedia fingerprint
US10929125B2 (en) Determining provenance of files in source code projects
CN102713834A (en) Managing record format information
CN112836484B (en) Text alignment method and device, electronic equipment and computer readable storage medium
CN110019640B (en) Secret-related file checking method and device
CN113297238B (en) Method and device for mining information based on history change record
US8655075B2 (en) Optical character recognition verification and correction system
US10331717B2 (en) Method and apparatus for determining similar document set to target document from a plurality of documents
US20210264556A1 (en) Automatically attaching optical character recognition data to images
CN111832264B (en) Signature position determining method, device and equipment based on PDF (portable document format) file
CN114611471A (en) Electronic document reading method and device, electronic equipment and storage medium
CN111666087A (en) Operation rule updating method and device, computer system and readable storage medium
CN117216239A (en) Text deduplication method, text deduplication device, computer equipment and storage medium
CN113177407A (en) Data dictionary construction method and device, computer equipment and storage medium
US11182375B2 (en) Metadata validation tool
CN110196952B (en) Program code search processing method, device, equipment and storage medium
CN116860747A (en) Training sample generation method and device, electronic equipment and storage medium
CN114416847A (en) Data conversion method, device, server and storage medium
CN110909112B (en) Data extraction method, device, terminal equipment and medium
CN114528824A (en) Text error correction method and device, electronic equipment and storage medium
Milon Islam et al. A novel approach towards tamper detection of digital holy quran generation
US9251253B2 (en) Expeditious citation indexing
CN111967240B (en) Text parsing method, text parsing device, terminal equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination