CN114821623A - Document processing method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN114821623A
Authority
CN
China
Prior art keywords
character
target document
document image
information
image
Legal status
Pending
Application number
CN202210502303.8A
Other languages
Chinese (zh)
Inventor
程龙
梁鼎
侯朝晖
Current Assignee
Shenzhen Sensetime Technology Co Ltd
Original Assignee
Shenzhen Sensetime Technology Co Ltd
Application filed by Shenzhen Sensetime Technology Co Ltd
Priority to CN202210502303.8A
Publication of CN114821623A
Legal status: Pending (current)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features

Abstract

The disclosure provides a document processing method and apparatus, an electronic device, and a storage medium. The method includes: acquiring a target document image to be processed; performing character feature extraction on the target document image to obtain first character feature information of each character in the target document image; matching the first character feature information with second character feature information of each character extracted from a preset template image to obtain a feature matching degree; and determining a text processing result for the target document image based on the feature matching degree. Because the target document image is recognized by character-level feature matching, fields of varying length can be recognized flexibly by reference to the preset template image whenever the feature matching degree is sufficiently high, and the recognition accuracy is high.

Description

Document processing method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to a document processing method and apparatus, an electronic device, and a storage medium.
Background
With the development of information technology, large amounts of physical data (e.g., paper documents and image documents) are converted into electronic data, which is then extracted and analyzed to obtain the structured information it contains.
Structured information extraction refers to extracting, from such data, the specific key information that is of interest to a user. Optical character recognition (OCR) serves as the basis for data structuring: it converts an image document into text by means of a pre-stored recognition template, and it is widely used in this data conversion process.
In general, a different recognition template must be designed for each document layout. In practice, however, document layouts are complex and changeable, so constructing rule templates is difficult; this can lead to problems such as recognition errors and, in turn, low accuracy of data structuring.
Disclosure of Invention
Embodiments of the present disclosure provide at least a document processing method and apparatus, an electronic device, and a storage medium.
In a first aspect, an embodiment of the present disclosure provides a document processing method, including:
acquiring a target document image to be processed;
extracting character features based on the target document image to obtain first character feature information of each character in the target document image;
matching the first character feature information with second character feature information of each character extracted from a preset template image to obtain a feature matching degree;
and determining a text processing result for the target document image based on the feature matching degree.
With this document processing method, once the target document image to be processed is obtained, character feature extraction is performed; the extracted first character feature information of each character is then matched with the second character feature information of each character extracted from the preset template image, and the text processing result for the target document image is determined based on the resulting feature matching degree. Because the target document image is recognized by character-level feature matching, fields of varying length can be recognized flexibly by reference to the preset template image whenever the feature matching degree is sufficiently high, and the recognition accuracy is high.
In a possible implementation, performing character feature extraction on the target document image to obtain the first character feature information of each character in the target document image includes:
performing character detection on the target document image to obtain content information of each character in the target document image and coordinate information of each character in the target document image;
and performing character feature extraction on each character based on the content information of each character and the coordinate information of each character in the target document image, to obtain the first character feature information of each character.
By combining the content information and the coordinate information of the characters, more integrated character features can be extracted: the features capture not only the characters themselves but also their spatial coordinate relationships within the whole image. The extracted features are therefore more accurate, which facilitates subsequent character recognition.
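As an illustration of combining content and coordinates, the following is a minimal sketch, not the disclosed feature extractor. It assumes a hypothetical hash-based vector as a stand-in for a learned character embedding, and that coordinates are pixel boxes `(x0, y0, x1, y1)` normalized by image size; the names `DetectedChar`, `content_embedding`, and `char_feature` are illustrative.

```python
import hashlib
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DetectedChar:
    text: str                                # recognized content of one character
    box: Tuple[float, float, float, float]   # (x0, y0, x1, y1) in pixels


def content_embedding(ch: str, dim: int = 8) -> List[float]:
    # Hypothetical stand-in for a learned character embedding: a deterministic,
    # unit-length vector derived from a SHA-256 digest (dim must be <= 32).
    digest = hashlib.sha256(ch.encode("utf-8")).digest()
    vec = [b / 255.0 * 2.0 - 1.0 for b in digest[:dim]]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def char_feature(c: DetectedChar, img_w: float, img_h: float, dim: int = 8) -> List[float]:
    x0, y0, x1, y1 = c.box
    # Normalize coordinates by image size so features are comparable across resolutions.
    geometry = [
        x0 / img_w, y0 / img_h, x1 / img_w, y1 / img_h,
        (x0 + x1) / (2 * img_w), (y0 + y1) / (2 * img_h),   # box center
    ]
    # Fused feature: content part followed by spatial part.
    return content_embedding(c.text, dim) + geometry
```

For example, `char_feature(DetectedChar("金", (10, 20, 30, 40)), 200, 400)` yields a 14-dimensional vector whose last two entries are the normalized box center.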
In a possible implementation, performing character feature extraction on each character based on the content information of each character and the coordinate information of each character in the target document image includes:
for a first character among the characters, selecting, from the characters, other characters associated with the first character based on the coordinate information of each character in the target document image;
and determining the first character feature information of the first character based on the content information of the selected other characters, the content information of the first character, and the association relationship between the first character and the other characters.
Here, by taking the association relationships between characters into account, the fused first character feature better reflects the characteristics of the character itself, which helps achieve highly accurate character recognition.
In a possible implementation, performing character feature extraction on each character based on the content information of each character and the coordinate information of each character in the target document image includes:
acquiring a trained character feature extraction network, where the network has learned in advance the association relationships among the characters in document image samples;
and, for each of the characters, performing character feature extraction based on the association relationships learned in advance by the network, the content information of each character, and the coordinate information of each character in the target document image, to obtain the first character feature information of each character output by the network.
Performing character feature extraction with the character feature extraction network allows features to be extracted more quickly and deeper character features to be mined, which significantly improves the accuracy of the extracted features.
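The disclosure does not specify the network architecture. One common way for a network to model associations among all characters is self-attention; the sketch below assumes a single-head scaled dot-product self-attention layer with random (untrained) weights, purely to show how each character's output feature can mix in information from every other character. `CharSelfAttention` and its parameters are hypothetical.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # (n x d) @ (d x m) -> (n x m)
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]


class CharSelfAttention:
    """Single-head scaled dot-product self-attention over character features.

    Weights are random here; in a real system they would be learned from
    document image samples.
    """

    def __init__(self, dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.dim = dim
        self.wq = self._init(rng)
        self.wk = self._init(rng)
        self.wv = self._init(rng)

    def _init(self, rng: random.Random) -> Matrix:
        std = 1.0 / math.sqrt(self.dim)
        return [[rng.gauss(0.0, std) for _ in range(self.dim)] for _ in range(self.dim)]

    def __call__(self, x: Matrix):
        # x: one row per character (e.g. content embedding + normalized coordinates)
        q, k, v = matmul(x, self.wq), matmul(x, self.wk), matmul(x, self.wv)
        k_t = [list(col) for col in zip(*k)]
        scale = math.sqrt(self.dim)
        # attn[i][j]: how strongly character i attends to character j
        attn = [softmax([s / scale for s in row]) for row in matmul(q, k_t)]
        context = matmul(attn, v)
        # Residual connection keeps each character's own information.
        out = [[a + b for a, b in zip(xr, cr)] for xr, cr in zip(x, context)]
        return out, attn
```

Each row of `attn` is a probability distribution over all characters, so the output feature of a character is its input plus a weighted blend of every character's projected feature.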
In a possible implementation, performing character detection on the target document image includes:
converting the target document image into document text by means of optical character recognition (OCR);
and performing character division on the document text based on a character feature template library to obtain content information of each character included in the document text.
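A common way to match characters against a character feature template library is a two-stage search: a coarse classification picks a group of candidate templates, and fine matching picks the nearest template within that group. The sketch below assumes a tiny hypothetical library (`LIBRARY`), Euclidean distance, and that the first two feature dimensions drive the coarse step; none of these details come from the disclosure.

```python
import math
from typing import Dict, List, Sequence


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical character feature template library: each coarse group has a
# centroid (over the leading feature dims) and fine templates (label -> vector).
LIBRARY: Dict[str, dict] = {
    "round": {"centroid": [0.9, 0.1],
              "templates": {"O": [0.95, 0.05, 0.9], "0": [0.85, 0.15, 0.2]}},
    "straight": {"centroid": [0.1, 0.9],
                 "templates": {"I": [0.05, 0.95, 0.9], "1": [0.15, 0.85, 0.1]}},
}


def classify(feature: Sequence[float]) -> str:
    # Coarse step: group whose centroid is nearest to the leading dimensions.
    group = min(LIBRARY.values(), key=lambda g: dist(feature[:2], g["centroid"]))
    # Fine step: nearest template inside that group, using the full vector.
    label, _ = min(group["templates"].items(), key=lambda kv: dist(feature, kv[1]))
    return label


def divide(char_features: List[Sequence[float]]) -> str:
    # One feature vector per segmented character -> recognized content string.
    return "".join(classify(f) for f in char_features)
```

The coarse step limits fine matching to a small candidate set, which is the usual reason for splitting the search into two stages.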
In a possible implementation, matching the first character feature information with the second character feature information of each character extracted from the preset template image to obtain the feature matching degree includes:
for a first character in the target document image, matching the first character feature information of the first character with each piece of second character feature information extracted from the preset template image, to obtain a feature matching degree between the first character and each second character in the preset template image;
and determining the text processing result for the target document image based on the feature matching degree includes:
for the first character in the target document image, selecting, from all second characters in the preset template image, a second character whose feature matching degree meets a preset requirement;
determining a text processing result of the first character based on the preset label of the selected second character;
and determining the text processing result for the target document image based on the text processing result of the first character.
In this way, a feature matching degree can be determined between each character in the target document image and each character in the preset template image. To some extent, a higher feature matching degree indicates a greater likelihood that two characters belong to the same recognition dimension, and a lower degree indicates a smaller likelihood. Combining the matching degrees with the preset labels of the relevant characters in the preset template image then guides the text processing of the target document image, making the processing result more accurate.
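The matching and label-selection steps leave the similarity measure and the "preset requirement" open. A minimal sketch, assuming cosine similarity as the feature matching degree and a fixed threshold (0.8, chosen arbitrarily) as the preset requirement; `match_labels` and its parameters are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def match_labels(
    target_feats: List[Sequence[float]],
    template_feats: List[Sequence[float]],
    template_labels: List[Optional[str]],
    threshold: float = 0.8,
) -> List[Optional[str]]:
    """Give each target character the preset label of its best-matching template character."""
    results = []
    for f in target_feats:
        # Feature matching degree against every template character.
        scores = [cosine(f, g) for g in template_feats]
        best = max(range(len(scores)), key=scores.__getitem__)
        # Assumed preset requirement: best score must reach the threshold.
        results.append(template_labels[best] if scores[best] >= threshold else None)
    return results
```

Characters whose best match falls below the threshold get `None`, i.e. they are treated as not belonging to any labeled template field.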
In a possible implementation, at least one field of interest is marked in advance in the preset template image, and determining the text processing result for the target document image based on the text processing result of the first character includes:
for each field of interest in the preset template image, determining the text processing result of the first characters corresponding to that field of interest based on the preset label indicated by the field of interest;
and taking the text processing results determined for the at least one field of interest as the text processing result for the target document image.
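Once each target character carries a field label, the per-field results still have to be assembled into field strings. A minimal sketch, assuming reading order is simply top-to-bottom then left-to-right by box corner (a simplification that can misorder characters on the same line whose boxes are vertically jittered); `assemble_fields` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def assemble_fields(chars: List[Tuple[str, Box]], labels: List[Optional[str]]) -> Dict[str, str]:
    """Group labeled characters by field of interest and join them in reading order."""
    grouped = defaultdict(list)
    for (text, box), label in zip(chars, labels):
        if label is not None:            # unlabeled characters belong to no field
            grouped[label].append((box[1], box[0], text))
    # Assumed reading order: by top edge, then by left edge.
    return {field: "".join(t for _, _, t in sorted(items)) for field, items in grouped.items()}
```

Because fields are built from individual characters rather than fixed boxes, a field in the target image may contain more or fewer characters than the same field in the template.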
In a possible implementation, acquiring the target document image to be processed includes:
acquiring the target document image to be processed in response to an image acquisition instruction from a user side;
and after determining the text processing result for the target document image based on the feature matching degree, the method further includes:
returning the text processing result to the user side, where the user side is configured to display a comparison of the text processing results between the target document image and the preset template image.
In a second aspect, an embodiment of the present disclosure further provides a document processing apparatus, including:
an acquisition module, configured to acquire a target document image to be processed;
an extraction module, configured to perform character feature extraction on the target document image to obtain first character feature information of each character in the target document image;
a matching module, configured to match the first character feature information with second character feature information of each character extracted from a preset template image to obtain a feature matching degree;
and a processing module, configured to determine a text processing result for the target document image based on the feature matching degree.
In a third aspect, an embodiment of the present disclosure further provides an electronic device including: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating via the bus when the electronic device is operating, the machine-readable instructions when executed by the processor performing the steps of the document processing method according to the first aspect and any of its various embodiments.
In a fourth aspect, the disclosed embodiments also provide a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, performs the steps of the document processing method according to the first aspect and any of its various implementation manners.
For the description of the effects of the document processing apparatus, the electronic device, and the computer-readable storage medium, reference is made to the description of the document processing method, which is not repeated here.
In order to make the aforementioned objects, features and advantages of the present disclosure more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
To more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings required in the embodiments are briefly described below. The drawings, which are incorporated in and form a part of the specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain its technical solutions. It should be understood that the following drawings depict only certain embodiments of the disclosure and should therefore not be regarded as limiting its scope; those skilled in the art can derive other related drawings from them without inventive effort.
FIG. 1 illustrates a flow chart of a document processing method provided by an embodiment of the disclosure;
FIG. 2 shows a schematic diagram of a document processing apparatus provided by an embodiment of the disclosure;
FIG. 3 shows a schematic diagram of an electronic device provided by an embodiment of the present disclosure.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present disclosure clearer, the technical solutions of the embodiments will be described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present disclosure. The components of the embodiments of the present disclosure, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments presented in the figures is not intended to limit the scope of the claimed disclosure, but merely represents selected embodiments of the disclosure. All other embodiments that a person skilled in the art can derive from the embodiments of the disclosure without creative effort shall fall within the protection scope of the disclosure.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
The term "and/or" herein merely describes an association relationship, meaning that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality; for example, including at least one of A, B, and C may mean including any one or more elements selected from the set consisting of A, B, and C.
Research shows that, in general, a different recognition template must be designed for each document layout. In practice, however, document layouts are complex and varied, and recognition templates are difficult to construct, so structured information extraction based on one-shot learning is urgently needed in practical applications. Structured information extraction based on one-shot learning refers to extracting, from a document, the specific field information corresponding to a template using only a single template or a very small number of specific templates; it is also called template matching. A template is usually represented as the bounding boxes of several specific fields, so the user only needs to draw boxes around the fields of interest to generate the corresponding template.
However, because the template image and the actual image are not perfectly aligned, their text does not correspond one-to-one. Compared with the template, many test images exhibit a noticeable global offset or local random offsets, so direct coordinate mapping rarely works well.
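With a template represented as user-drawn field boxes, each template character can be given a preset label from the box it falls in. A minimal sketch, assuming a character belongs to a field when its box center lies inside the field box (first matching box wins); `label_template_chars` and the box format are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def label_template_chars(chars: List[Tuple[str, Box]], field_boxes: Dict[str, Box]) -> List[Optional[str]]:
    """Assign each template character the name of the field box containing its center."""
    labels: List[Optional[str]] = []
    for _, (x0, y0, x1, y1) in chars:
        cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
        label = None
        for name, (fx0, fy0, fx1, fy1) in field_boxes.items():
            if fx0 <= cx <= fx1 and fy0 <= cy <= fy1:
                label = name
                break
        labels.append(label)   # None: character lies outside every field of interest
    return labels
```

For example, with `{"name": (0, 0, 50, 30)}` as the only field, characters centered inside that rectangle are labeled `"name"` and all others `None`.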
There has been some research progress on template matching, for example:
(1) Template matching based on traditional rule methods. The template is divided into fixed fields and fields to be extracted. A fixed field is a field of the template whose position is close to, and whose content is the same as, a field in the test image. The fixed fields of the template are matched with those of the test image to construct a mapping matrix between the template and the test image; the coordinates of the fields to be extracted in the template are then mapped directly onto the test image using this matrix; finally, the field on the test image closest to the mapped coordinates is taken as the extracted field. This method has several problems: it cannot handle random offsets between the template and the test image; matching fixed fields makes both template creation and template matching more complicated; and it struggles when a field has different lengths in the template and in the test image, for example, when the address in the template differs in length from the address in the test image.
(2) Intelligent template matching based on text lines. This method treats each text line as a node and divides the template fields into fixed fields and fields to be extracted. A graph is constructed from the coordinate relationships between the fixed fields and the fields to be extracted, and the feature of each field to be extracted is expressed as its position in this relationship graph. Finally, the similarity between the text-line features of the test image and those of the template is computed to obtain the same information to be extracted as in the template. The problem with this approach is that it also requires fixed fields, which adds complexity to template matching.
Moreover, because text lines are used as the matching units, it cannot cope with text lines being split or merged during text detection.
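For comparison, the rule-based prior-art approach (1) can be sketched as follows. This is an illustrative reconstruction, not code from any cited method: it assumes exactly three non-collinear fixed-field anchors, an affine mapping matrix solved by Cramer's rule (with more anchors one would typically fit by least squares), and fields represented by their center points. All function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def det3(a: List[List[float]]) -> float:
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))


def solve3(m: List[List[float]], v: Sequence[float]) -> List[float]:
    # Cramer's rule for a 3x3 linear system m @ x = v.
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("fixed-field anchors are collinear")
    out = []
    for i in range(3):
        mi = [row[:] for row in m]
        for r in range(3):
            mi[r][i] = v[r]
        out.append(det3(mi) / d)
    return out


def fit_affine(src: List[Point], dst: List[Point]):
    # x' = a*x + b*y + c and y' = d*x + e*y + f, from three anchor pairs.
    m = [[x, y, 1.0] for x, y in src]
    return solve3(m, [p[0] for p in dst]), solve3(m, [p[1] for p in dst])


def apply_affine(t, pt: Point) -> Point:
    (a, b, c), (d, e, f) = t
    x, y = pt
    return (a * x + b * y + c, d * x + e * y + f)


def nearest_field(pt: Point, fields: Dict[str, Point]) -> str:
    # Field on the test image whose center is closest to the mapped point.
    return min(fields, key=lambda k: (fields[k][0] - pt[0]) ** 2 + (fields[k][1] - pt[1]) ** 2)
```

A single global matrix like this captures a uniform shift, scale, or skew, but not the local random offsets or field-length differences noted above, which is the weakness the character-level matching scheme targets.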
Thus, although some methods have attempted to solve the template matching problem, they have obvious shortcomings and struggle to achieve good recognition results.
Based on the above research, the present disclosure provides at least one scheme for processing documents by means of character feature matching, so as to improve the accuracy of text recognition results.
To facilitate understanding of the present embodiments, the document processing method disclosed in the embodiments of the present disclosure is first described in detail. The method is generally executed by an electronic device with certain computing capability, for example a terminal device, a server, or another processing device; the terminal device may be user equipment (UE), a mobile device, a user terminal, a cellular phone, a cordless phone, a personal digital assistant (PDA), a vehicle-mounted device, or the like. In some possible implementations, the document processing method may be implemented by a processor invoking computer-readable instructions stored in a memory.
Referring to FIG. 1, which is a flowchart of a document processing method provided by an embodiment of the present disclosure, the method includes steps S101 to S104:
S101: acquiring a target document image to be processed;
S102: performing character feature extraction on the target document image to obtain first character feature information of each character in the target document image;
S103: matching the first character feature information with second character feature information of each character extracted from a preset template image to obtain a feature matching degree;
S104: determining a text processing result for the target document image based on the feature matching degree.
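Steps S101 to S104 can be read as four composable stages. The sketch below is a hypothetical skeleton, not the disclosed implementation: each stage is passed in as a callable, and the names `process_document`, `acquire`, `extract`, `match`, and `decide` are illustrative assumptions.

```python
from typing import Any, Callable, List, Sequence

Features = List[Sequence[float]]   # one feature vector per character
Degrees = List[List[float]]        # rows: target characters, columns: template characters


def process_document(
    acquire: Callable[[], Any],               # S101: obtain the target document image
    extract: Callable[[Any], Features],       # S102: first character feature information
    match: Callable[[Features], Degrees],     # S103: matching degrees vs. template features
    decide: Callable[[Degrees], Any],         # S104: text processing result
) -> Any:
    image = acquire()
    first_feats = extract(image)
    degrees = match(first_feats)
    return decide(degrees)
```

Keeping the stages separate makes it straightforward to swap, for example, an association-based extractor for a trained network in S102 without touching S103 or S104.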
To facilitate understanding of the document processing method provided by the embodiments of the present disclosure, an application scenario of the method is first described. The method is mainly applicable to any scenario in which characters in an image need to be recognized. In practice, it can be widely used to recognize bills such as invoices and checks in industries such as government, taxation, insurance, healthcare, finance, and industrial and mining enterprises.
To achieve more accurate character recognition, the embodiments of the present disclosure provide a document processing method based on character feature matching, which processes a document image based on the feature matching degree between the first character feature information of each character in the target document image to be processed and the second character feature information of each character extracted from the preset template image.
To some extent, a higher feature matching degree indicates that the relevant characters of the preset template image are a more reliable reference, and a lower feature matching degree indicates that they are a less reliable reference. In other words, the embodiments of the present disclosure perform document recognition at character granularity: a character in the target document image may match a single character of the preset template image or may correspond to multiple characters of the preset template image. Matching against the preset template image is thus freed from the constraint of preset bounding boxes, so the recognition result better fits the actual business scenario.
The target document image differs across recognition tasks; for example, it may be an image containing an invoice, an image containing an identity document, or another document image, which is not specifically limited here. In practice, bank bills, identity cards, and the like can be photographed or scanned with a camera, a scanner, or a similar device to obtain the relevant images.
For a target document image, the embodiments of the present disclosure may first determine the first character feature information of each character. This information represents the features of the corresponding character and, to some extent, its characteristics within the target document image, such as the character content and its coordinates, so that it can uniquely represent the corresponding character in the target document image.
To recognize the characters in the target document image, the embodiments of the present disclosure may perform text processing using the feature matching degree determined from the first character feature information and the second character feature information of each character in the preset template image.
For example, for a template image containing contract information, the fields of interest may include the names of the contracting parties, the contract effective date, and so on, which are not specifically limited here.
Second character feature information may likewise be extracted for each character in the preset template image. Like the first character feature information, it represents the characteristics of the corresponding character; see the description above for details, which are not repeated here.
In practice, for both the first and the second character feature information, the embodiments of the present disclosure may use a character feature extraction network to extract deeper character features, or may extract character features in combination with other image processing methods, which is not limited here.
The text processing result in the embodiments of the present disclosure may be the recognition result for all characters in the entire target document image, or the recognition result for the characters in the target document image that correspond to the fields of interest boxed in the template image.
In addition, in practice, the text processing result may be returned to the user side to display a comparison of the text processing results between the target document image and the preset template image. In the embodiments of the present disclosure, the target document image to be processed may be acquired in response to an image acquisition instruction from the user side. For example, the image acquisition instruction may be generated when the user taps a relevant button in an application (APP) installed on the user side, so that the target document image currently captured by the user side can be processed.
In practice, a user interface for the above document processing may also be provided. The user can upload the target document image to be processed through this interface; the back end then determines the text processing result for the target document image through feature extraction and feature matching and displays the result on the user interface.
It should be noted that the display here may be a comparison between the preset template image and the target document image; for example, the fields in the target document image that correspond to the fields of interest boxed in the preset template image may be boxed as well, so that the accuracy of the recognition result can be assessed more easily.
Given the key role of character feature extraction in computing the feature matching degree, the extraction of the first character feature information is described in detail below. In the embodiments of the present disclosure, character feature extraction may be performed through the following steps:
Step one: performing character detection on the target document image to obtain the content information of each character in the target document image and the coordinate information of each character in the target document image;
Step two: performing character feature extraction on each character based on the content information of each character and the coordinate information of each character in the target document image, to obtain the first character feature information of each character.
To achieve more accurate character recognition, the content information of the characters and their coordinate information in the target document image can be combined during character feature extraction, yielding more strongly fused character features. Such features fuse the content features of a character with its spatial position relative to all the other characters and can, to some extent, uniquely represent the corresponding character, which makes them more useful for subsequent character recognition.
Before extracting character features, the embodiments of the present disclosure may identify the content information of each character in the target document image based on optical character recognition (OCR), specifically through the following steps:
Step one: converting the target document image into document text by means of OCR;
Step two: performing character division on the document text based on a character feature template library to obtain the content information of each character included in the document text.
Here, the target document image may be converted into document text by OCR. In practice, the target document image may first undergo preprocessing such as grayscale conversion, binarization, noise removal, and tilt correction. After tilt correction, the feature vectors extracted from the characters scanned in each part of the image can be subjected to coarse template classification and fine template matching against the character feature template library, so that the content information of each character in the document text can be recognized.
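The grayscale conversion and binarization mentioned in the preprocessing step can be sketched as follows. The disclosure does not name specific algorithms; this sketch assumes ITU-R BT.601 luma weights for grayscale conversion and Otsu's method for choosing the binarization threshold, on images stored as nested lists of 8-bit values. Function names are illustrative.

```python
from typing import List, Tuple

Gray = List[List[int]]


def to_gray(rgb: List[List[Tuple[int, int, int]]]) -> Gray:
    # ITU-R BT.601 luma weights, a common RGB -> grayscale conversion.
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row] for row in rgb]


def otsu_threshold(gray: Gray) -> int:
    """Threshold t maximizing between-class variance of {<= t} vs {> t}."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0
    for t in range(256):
        w0 += hist[t]              # pixel count of the dark class
        if w0 == 0:
            continue
        w1 = total - w0            # pixel count of the bright class
        if w1 == 0:
            break
        sum0 += t * hist[t]
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        var_between = w0 * w1 * (m0 - m1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(gray: Gray, t: int) -> List[List[int]]:
    # Dark pixels (assumed to be ink) -> 1, bright background -> 0.
    return [[1 if v <= t else 0 for v in row] for row in gray]
```

Otsu's method picks the threshold automatically from the image histogram, which suits scanned documents whose brightness varies between captures.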
In the case that the content information of each character in the target document image and the coordinate information of each character in the target document image are identified, the extraction of character features can be realized.
In the embodiment of the present disclosure, character features may be extracted either through the association relationships between characters or directly through a trained character feature extraction network. The two approaches are described below.
In the first aspect, the embodiment of the present disclosure may extract character features through the following steps:
step one, for a first character among the characters, selecting, based on the coordinate information of each character in the target document image, other characters associated with the first character;
and step two, determining first character feature information of the first character based on the content information of the selected other characters, the content information of the first character, and the association relationship between the first character and the other characters.
The first character here may be each of the characters, any one or more of the characters, or a specific character designated among them; this is not specifically limited here. For more comprehensive character recognition, every character may be processed; that is, for each character, other characters associated with it may be selected. For example, one or more characters adjacent on its left side, or one or more characters adjacent on its right side, may be selected as associated characters. Other selections are also possible, such as all characters within a preset association range of the character, or, under certain conditions, all characters in the entire document layout other than the character itself; this is not specifically limited here.
For each character, its first character feature information can thus be determined by referring to its own content information, the content information of the other characters associated with it, and the association relationship between the characters.
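The first aspect can be sketched as follows under several assumptions that are not from the text: the hypothetical `embed` function stands in for real content features, associated characters are those whose centres lie within a fixed radius (measured in character heights), and the association relationship is modelled as the distance-weighted relative offset to those neighbours. The sketch only shows how content information, coordinates, and association can be combined into one feature vector.

```python
import hashlib
import math
from dataclasses import dataclass
from typing import List

DIM = 8  # size of the hypothetical content embedding


@dataclass
class Char:
    content: str  # recognised character
    x: float      # box centre x in the target document image
    y: float      # box centre y
    h: float      # box height, used to normalise distances


def embed(ch: str) -> List[float]:
    """Hypothetical content feature: first DIM bytes of SHA-256, scaled to [0, 1]."""
    digest = hashlib.sha256(ch.encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:DIM]]


def associated(chars: List[Char], i: int, radius: float = 3.0) -> List[int]:
    """Indices of other characters within `radius` character heights of chars[i]."""
    c = chars[i]
    return [j for j, o in enumerate(chars)
            if j != i and math.hypot(o.x - c.x, o.y - c.y) / c.h <= radius]


def char_feature(chars: List[Char], i: int, radius: float = 3.0) -> List[float]:
    """Own content + distance-weighted neighbour content + mean relative offset."""
    c = chars[i]
    ctx = [0.0] * DIM
    dx_mean = dy_mean = total_w = 0.0
    for j in associated(chars, i, radius):
        o = chars[j]
        dx, dy = (o.x - c.x) / c.h, (o.y - c.y) / c.h
        w = 1.0 / (1.0 + math.hypot(dx, dy))  # closer characters weigh more
        ctx = [a + w * b for a, b in zip(ctx, embed(o.content))]
        dx_mean += w * dx
        dy_mean += w * dy
        total_w += w
    if total_w > 0:
        ctx = [v / total_w for v in ctx]
        dx_mean, dy_mean = dx_mean / total_w, dy_mean / total_w
    return embed(c.content) + ctx + [dx_mean, dy_mean]
```

A character with no neighbours inside the radius keeps only its own content features, with zeros in the context and offset slots.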
In the second aspect, the embodiment of the present disclosure may extract character features through the following steps:
step one, acquiring a trained character feature extraction network, where the network has learned in advance the association relationships among the characters in document image samples;
and step two, for each character, performing character feature extraction based on the association relationships learned in advance by the network, the content information of each character, and the coordinate information of each character in the target document image, to obtain the first character feature information of each character output by the network.
The content information of each character and its coordinate information in the target document image are input into the character feature extraction network, which extracts features of the associated characters based on the association relationships learned in advance from the document image samples. For example, a Transformer model based on the attention mechanism may compute the correlation between every pair of characters, so that the features of all associated characters across the whole document layout are fused into each character, yielding the fused first character feature information.
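A minimal sketch of the attention step, assuming each character's input is its content vector concatenated with its page-normalized centre coordinates. For brevity it uses single-head scaled dot-product self-attention with queries, keys, and values all equal to the input. A trained network of the kind described would apply learned projections, multiple heads, and several layers, so the function names and the input encoding here are illustrative only.

```python
import math
from typing import List, Sequence, Tuple


def encode(content_vecs: Sequence[Sequence[float]],
           centres: Sequence[Tuple[float, float]],
           page_w: float, page_h: float) -> List[List[float]]:
    """Input token = content vector + (x, y) centre normalised by page size."""
    return [list(cv) + [x / page_w, y / page_h]
            for cv, (x, y) in zip(content_vecs, centres)]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in xs]
    s = sum(exps)
    return [e / s for e in exps]


def self_attention(x: List[List[float]]) -> List[List[float]]:
    """Single-head scaled dot-product attention with Q = K = V = x."""
    d = len(x[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in x:
        # correlation between this character and every character on the page
        weights = softmax([scale * sum(a * b for a, b in zip(q, k)) for k in x])
        # fuse the features of all characters, weighted by correlation
        out.append([sum(w * row[c] for w, row in zip(weights, x)) for c in range(d)])
    return out
```

Each output row is a convex combination of all input rows, which is what "fusing the features of the whole layout into each character" amounts to in this simplified form.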
It should be noted that the second character feature information of each character in the preset template image may be extracted in the same way as the first character feature information of each character in the target document image described above; details are not repeated here.
It follows that the first character feature information determined for each character in the target document image may be fused feature information that combines the features of the characters across the entire target document layout. Likewise, the second character feature information determined for each character in the preset template image may be fused feature information that combines the features of the characters across the entire template layout.
Based on the similarity between the fused feature information determined for the two images, the feature matching degree can be determined, and the word processing result for the target document image can then be obtained. Specifically, this may be implemented by the following steps:
step one, for a first character in the target document image, matching the first character feature information of the first character with the second character feature information of each character extracted from the preset template image, to obtain a feature matching degree between the first character and each second character in the preset template image;
step two, for the first character in the target document image, selecting, from all second characters in the preset template image, a second character whose feature matching degree meets a preset requirement;
step three, determining a word processing result of the first character based on the preset annotation label of the selected second character;
and step four, determining a word processing result for the target document image based on the word processing result of the first character.
Here, the feature matching degree is computed between each character in the target document image and each character in the preset template image. A higher feature matching degree indicates, to some extent, a higher likelihood that the two characters belong to the same recognition dimension; a lower feature matching degree indicates a lower likelihood.
Because the character feature information is fused feature information, the feature matching degree can, to a certain extent, reflect how the two characters are expressed within their respective images. The higher the feature matching degree, the more strongly the preset annotation label carried by the corresponding second character in the preset template image can be considered to apply to the first character of the target document image. The annotation label of each first character can therefore be determined from the preset annotation label of the matched second character, and the corresponding word processing result determined accordingly.
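The matching and label-transfer steps can be sketched as below. Using cosine similarity as the feature matching degree, and "best match whose similarity is at least a threshold" as the preset requirement, are assumptions; the text leaves both open. The `threshold` default of 0.8 is an arbitrary illustrative value.

```python
import math
from typing import List, Optional, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    na = math.sqrt(sum(v * v for v in a))
    nb = math.sqrt(sum(v * v for v in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def match_labels(first_feats: Sequence[Sequence[float]],
                 second_feats: Sequence[Sequence[float]],
                 second_labels: Sequence[str],
                 threshold: float = 0.8) -> List[Tuple[Optional[str], float]]:
    """For each first character, return (label of best template match or None, score)."""
    if not second_feats:
        return [(None, 0.0) for _ in first_feats]
    results = []
    for f in first_feats:
        # feature matching degree against every second character in the template
        scores = [cosine(f, s) for s in second_feats]
        best = max(range(len(scores)), key=scores.__getitem__)
        # transfer the preset annotation label only if the match is good enough
        label = second_labels[best] if scores[best] >= threshold else None
        results.append((label, scores[best]))
    return results
```

Returning the score alongside the label leaves room for a caller to apply a stricter requirement later, for example rejecting ambiguous matches.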
Here, first characters that share the same annotation label may be grouped into the same recognition box, and the word processing result for the target document image may then be determined. Specifically, this may be implemented by the following steps:
step one, for each field of interest in the preset template image, determining the word processing result of the first characters corresponding to the field of interest based on the preset annotation label indicated by the field of interest;
and step two, determining the word processing results determined for the at least one field of interest as the word processing result for the target document image.
Here, based on the fields of interest set in the preset template image, the word processing result of the first characters corresponding to each field of interest may be determined, and these results are then aggregated into the word processing result for the target document image. In other words, only the word processing results for the fields of interest annotated by the user are returned, so that the user can analyze the recognition result in a more targeted way, which effectively reduces the manual effort of searching through every character for the information of interest.
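Aggregating per-field results might look like the following sketch. It assumes each recognized first character already carries the annotation label transferred from the template (or `None`), and that characters on the same text line share the same y coordinate, so that sorting by (y, x) gives reading order; real layouts would need proper line grouping. `collect_fields` is a hypothetical helper.

```python
from typing import Dict, Optional, Sequence, Tuple

# (content, x, y, transferred annotation label or None)
Recognised = Tuple[str, float, float, Optional[str]]


def collect_fields(chars: Sequence[Recognised],
                   fields_of_interest: Sequence[str]) -> Dict[str, str]:
    """Concatenate, in reading order, the characters labelled with each field of interest."""
    result = {field: "" for field in fields_of_interest}
    for content, _x, _y, label in sorted(chars, key=lambda c: (c[2], c[1])):
        if label in result:  # characters of other fields and unmatched ones are dropped
            result[label] += content
    return result
```

Only the fields the user annotated appear as keys, so characters matched to other template fields never reach the returned result.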
In practical applications, none of the characters in the target document image is labeled in advance. Therefore, the character features of the target document image are matched against the character features of the preset template image, the feature matching degree is computed, and the character recognition result can then be determined from the feature matching degree of each character in the target document image.
It will be understood by those skilled in the art that, in the above method, the order in which the steps are written does not imply a strict order of execution or impose any limitation on the implementation; the specific order of execution should be determined by the function and possible internal logic of the steps.
Based on the same inventive concept, an embodiment of the present disclosure further provides a document processing apparatus corresponding to the document processing method. Since the principle by which the apparatus solves the problem is similar to that of the document processing method described above, the implementation of the apparatus may refer to the implementation of the method, and repeated details are omitted.
Referring to FIG. 2, which is a schematic diagram of a document processing apparatus provided in an embodiment of the present disclosure, the apparatus includes an acquisition module 201, an extraction module 202, a matching module 203, and a processing module 204, wherein:
the acquisition module 201 is configured to acquire a target document image to be processed;
the extraction module 202 is configured to perform character feature extraction based on the target document image to obtain first character feature information of each character in the target document image;
the matching module 203 is configured to match the first character feature information with second character feature information of each character extracted from a preset template image to obtain a feature matching degree;
and the processing module 204 is configured to determine a word processing result for the target document image based on the feature matching degree.
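One way to picture how the four modules cooperate is the hypothetical Python class below, which chains the acquisition, extraction, matching, and processing steps. Each module is passed in as a callable, so any feature, matching, or labelling strategy could be plugged in. The class and field names are illustrative, not taken from the disclosure.

```python
from dataclasses import dataclass
from typing import Any, Callable, List

Matrix = List[List[float]]


@dataclass
class DocumentProcessingApparatus:
    acquisition: Callable[[], Any]        # acquisition module 201: returns the image
    extraction: Callable[[Any], Matrix]   # extraction module 202: first character features
    matching: Callable[[Matrix], Matrix]  # matching module 203: matching degrees vs template
    processing: Callable[[Matrix], Any]   # processing module 204: word processing result

    def run(self) -> Any:
        image = self.acquisition()
        first_features = self.extraction(image)
        degrees = self.matching(first_features)
        return self.processing(degrees)
```

For example, with stub callables returning a fixed image path, two feature vectors, a 2x2 matching-degree matrix, and an argmax over template labels, `run()` returns one label per character.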
With the above document processing apparatus, once the target document image to be processed is acquired, character feature extraction is performed, the extracted first character feature information of each character is matched with the second character feature information of each character extracted from the preset template image, and the word processing result for the target document image is determined based on the resulting feature matching degree. Because the target document image is recognized through character feature matching, fields of various lengths can be recognized flexibly by reference to the preset template image whenever the feature matching degree is sufficiently high, with high recognition accuracy.
In a possible implementation, the extraction module 202 is configured to perform character feature extraction based on the target document image to obtain the first character feature information of each character in the target document image according to the following steps:
performing character detection on the target document image to obtain the content information of each character in the target document image and the coordinate information of each character in the target document image;
and performing character feature extraction on each character based on the content information of each character and the coordinate information of each character in the target document image, to obtain first character feature information for each character.
In a possible implementation, the extraction module 202 is configured to perform character feature extraction on each character based on the content information of each character and the coordinate information of each character in the target document image according to the following steps:
for a first character among the characters, selecting, based on the coordinate information of each character in the target document image, other characters associated with the first character;
and determining the first character feature information of the first character based on the content information of the selected other characters, the content information of the first character, and the association relationship between the first character and the other characters.
In another possible implementation, the extraction module 202 is configured to perform character feature extraction on each character based on the content information of each character and the coordinate information of each character in the target document image according to the following steps:
acquiring a trained character feature extraction network, where the network has learned in advance the association relationships among the characters in document image samples;
and for each character, performing character feature extraction based on the association relationships learned in advance by the network, the content information of each character, and the coordinate information of each character in the target document image, to obtain the first character feature information of each character output by the network.
In a possible implementation, the extraction module 202 is configured to perform character detection on the target document image according to the following steps:
converting the target document image into document text by means of Optical Character Recognition (OCR);
and performing character division on the document text based on the character feature template library to obtain the content information of each character included in the document text.
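The two-stage template matching used for character division (rough classification followed by fine matching against the character feature template library) can be sketched as follows. Ink density as the rough feature, Hamming distance on small binary bitmaps as the fine match, and the tolerance `tol` are all assumptions chosen for illustration; production OCR engines use much richer features.

```python
from typing import Dict, List, Tuple

Bitmap = Tuple[Tuple[int, ...], ...]  # binarised glyph, 1 = ink


def density(b: Bitmap) -> float:
    cells = [v for row in b for v in row]
    return sum(cells) / len(cells)


def hamming(a: Bitmap, b: Bitmap) -> int:
    return sum(x != y for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def rough_candidates(glyph: Bitmap, library: Dict[str, Bitmap],
                     tol: float = 0.15) -> List[str]:
    """Rough classification: keep templates whose ink density is close to the glyph's."""
    d = density(glyph)
    return [ch for ch, tpl in library.items() if abs(density(tpl) - d) <= tol]


def recognise(glyph: Bitmap, library: Dict[str, Bitmap], tol: float = 0.15) -> str:
    """Fine matching: nearest template by Hamming distance among the rough candidates."""
    candidates = rough_candidates(glyph, library, tol) or list(library)  # fall back to all
    return min(candidates, key=lambda ch: hamming(glyph, library[ch]))
```

The rough stage cheaply prunes templates that cannot match, so the more expensive pixel-level comparison runs only on a few candidates.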
In a possible implementation, the matching module 203 is configured to match the first character feature information with the second character feature information of each character extracted from the preset template image to obtain the feature matching degree according to the following step:
for a first character in the target document image, matching the first character feature information of the first character with each piece of second character feature information extracted from the preset template image, to obtain the feature matching degree between the first character and each second character in the preset template image;
and the processing module 204 is configured to determine the word processing result for the target document image based on the feature matching degree according to the following steps:
for the first character in the target document image, selecting, from all second characters in the preset template image, a second character whose feature matching degree meets a preset requirement;
determining a word processing result of the first character based on the preset annotation label of the selected second character;
and determining the word processing result for the target document image based on the word processing result of the first character.
In a possible implementation, at least one field of interest is annotated in advance in the preset template image, and the processing module 204 is configured to determine the word processing result for the target document image based on the word processing result of the first character according to the following steps:
for each field of interest in the preset template image, determining the word processing result of the first characters corresponding to the field of interest based on the preset annotation label indicated by the field of interest;
and determining the word processing results determined for the at least one field of interest as the word processing result for the target document image.
In a possible implementation, the acquisition module 201 is configured to acquire the target document image to be processed according to the following step:
acquiring the target document image to be processed in response to an image acquisition instruction from a user side;
and the processing module 204 is further configured to return the word processing result to the user side after determining the word processing result for the target document image based on the feature matching degree, where the user side is configured to display a word processing comparison result between the target document image and the preset template image.
For the processing flow of each module in the apparatus and the interaction flow between the modules, reference may be made to the related description in the above method embodiments; details are not repeated here.
An embodiment of the present disclosure further provides an electronic device. As shown in FIG. 3, which is a schematic structural diagram of the electronic device provided in the embodiment of the present disclosure, the electronic device includes a processor 301, a memory 302, and a bus 303. The memory 302 stores machine-readable instructions executable by the processor 301 (for example, execution instructions corresponding to the acquisition module 201, the extraction module 202, the matching module 203, and the processing module 204 of the apparatus in FIG. 2). When the electronic device runs, the processor 301 and the memory 302 communicate through the bus 303, and the machine-readable instructions, when executed by the processor 301, perform the following processing:
acquiring a target document image to be processed;
extracting character features based on the target document image to obtain first character feature information of each character in the target document image;
matching the first character feature information with second character feature information of each character extracted from a preset template image to obtain a feature matching degree;
and determining a word processing result for the target document image based on the feature matching degree.
The disclosed embodiments also provide a computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, performs the steps of the document processing method described in the above method embodiments. The storage medium may be a volatile or non-volatile computer-readable storage medium.
An embodiment of the present disclosure further provides a computer program product carrying program code, where the instructions included in the program code may be used to execute the steps of the document processing method in the foregoing method embodiments; for details, refer to the foregoing method embodiments, which are not repeated here.
The computer program product may be implemented by hardware, software, or a combination thereof. In an optional embodiment, the computer program product is embodied as a computer storage medium; in another optional embodiment, it is embodied as a software product, such as a Software Development Kit (SDK).
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and the apparatus described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again. In the several embodiments provided in the present disclosure, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as a stand-alone product, they may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such an understanding, the technical solution of the present disclosure may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing an electronic device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the method according to the embodiments of the present disclosure. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
Finally, it should be noted that the above embodiments are merely specific embodiments of the present disclosure, used to illustrate rather than limit its technical solutions, and the protection scope of the present disclosure is not limited thereto. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that anyone familiar with the technical field may, within the technical scope disclosed herein, modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions of some of their technical features; such modifications, changes, or substitutions do not depart from the spirit and scope of the embodiments of the present disclosure and shall be covered within the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (11)

1. A method of document processing, comprising:
acquiring a target document image to be processed;
extracting character features based on the target document image to obtain first character feature information of each character in the target document image;
matching the first character feature information with second character feature information of each character extracted from a preset template image to obtain a feature matching degree;
and determining a word processing result for the target document image based on the feature matching degree.
2. The method of claim 1, wherein the extracting character features based on the target document image to obtain first character feature information of each character in the target document image comprises:
performing character detection on the target document image to obtain content information of each character in the target document image and coordinate information of each character in the target document image;
and performing character feature extraction on each character based on the content information of each character and the coordinate information of each character in the target document image to obtain first character feature information for each character.
3. The method of claim 2, wherein the performing character feature extraction on each character based on the content information of each character and the coordinate information of each character in the target document image comprises:
for a first character among the characters, selecting, based on the coordinate information of each character in the target document image, other characters associated with the first character from the characters;
and determining first character feature information of the first character based on the content information of the selected other characters, the content information of the first character, and the association relationship between the first character and the other characters.
4. The method of claim 2, wherein the performing character feature extraction on each character based on the content information of each character and the coordinate information of each character in the target document image comprises:
acquiring a trained character feature extraction network, wherein the character feature extraction network has learned in advance the association relationships among the characters in document image samples;
and for each character, performing character feature extraction on the character based on the association relationships learned in advance by the character feature extraction network, the content information of each character, and the coordinate information of each character in the target document image, to obtain first character feature information for each character output by the character feature extraction network.
5. The method according to any one of claims 2 to 4, wherein the performing character detection on the target document image comprises:
converting the target document image into document text by means of Optical Character Recognition (OCR);
and performing character division on the document text based on a character feature template library to obtain content information of each character included in the document text.
6. The method according to any one of claims 2 to 5, wherein the matching the first character feature information with second character feature information of each character extracted from a preset template image to obtain a feature matching degree comprises:
aiming at a first character in the target document image, matching first character feature information of the first character with second character feature information of each character extracted from a preset template image to obtain a feature matching degree between the first character and each second character in the preset template image;
the determining a word processing result for the target document image based on the feature matching degree comprises:
for the first character in the target document image, selecting, from all second characters in the preset template image, a second character whose feature matching degree meets a preset requirement;
determining a word processing result of the first character based on a preset annotation label of the selected second character;
and determining a word processing result for the target document image based on the word processing result of the first character.
7. The method according to claim 6, wherein the preset template image is pre-marked with at least one field of interest; the determining a word processing result for the target document image based on the word processing result of the first word comprises:
for each field of interest in the preset template image, determining a word processing result of the first characters corresponding to the field of interest based on a preset annotation label indicated by the field of interest;
and determining the word processing results determined for the at least one field of interest as the word processing result for the target document image.
8. The method according to any one of claims 1 to 7, wherein the acquiring the target document image to be processed comprises:
acquiring the target document image to be processed in response to an image acquisition instruction from a user side;
after the determining a word processing result for the target document image based on the feature matching degree, the method further comprises:
returning the word processing result to the user side; the user side is used for displaying a word processing comparison result between the target document image and the preset template image.
9. A document processing apparatus, comprising:
the acquisition module is used for acquiring a target document image to be processed;
the extraction module is used for extracting character features based on the target document image to obtain first character feature information of each character in the target document image;
the matching module is used for matching the first character feature information with second character feature information of each character extracted from a preset template image to obtain a feature matching degree;
and the processing module is used for determining a word processing result for the target document image based on the feature matching degree.
10. An electronic device, comprising: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating over the bus when the electronic device is operating, the machine-readable instructions when executed by the processor performing the steps of the document processing method of any of claims 1 to 8.
11. A computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium, and the computer program, when executed by a processor, performs the steps of the document processing method according to any one of claims 1 to 8.
CN202210502303.8A 2022-05-09 2022-05-09 Document processing method and device, electronic equipment and storage medium Pending CN114821623A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210502303.8A CN114821623A (en) 2022-05-09 2022-05-09 Document processing method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114821623A true CN114821623A (en) 2022-07-29

Family

ID=82513375

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210502303.8A Pending CN114821623A (en) 2022-05-09 2022-05-09 Document processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114821623A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116403203A (en) * 2023-06-06 2023-07-07 武汉精臣智慧标识科技有限公司 Label generation method, system, electronic equipment and storage medium
CN116403203B (en) * 2023-06-06 2023-08-29 武汉精臣智慧标识科技有限公司 Label generation method, system, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
US10140511B2 (en) Building classification and extraction models based on electronic forms
US10943105B2 (en) Document field detection and parsing
CN111476227B (en) Target field identification method and device based on OCR and storage medium
US20190385054A1 (en) Text field detection using neural networks
RU2679209C2 (en) Processing of electronic documents for invoices recognition
CN111406262A (en) Cognitive document image digitization
CN114092938B (en) Image recognition processing method and device, electronic equipment and storage medium
CN113780229A (en) Text recognition method and device
KR20150091948A (en) A system for recognizing a font and providing its information and the method thereof
CN112668580A (en) Text recognition method, text recognition device and terminal equipment
Akinbade et al. An adaptive thresholding algorithm-based optical character recognition system for information extraction in complex images
CN115937887A (en) Method and device for extracting document structured information, electronic equipment and storage medium
CN112149680A (en) Wrong word detection and identification method and device, electronic equipment and storage medium
CN113673528B (en) Text processing method, text processing device, electronic equipment and readable storage medium
CN114821623A (en) Document processing method and device, electronic equipment and storage medium
CN113642569A (en) Unstructured data document processing method and related equipment
CN112801099A (en) Image processing method, device, terminal equipment and medium
CN116052195A (en) Document parsing method, device, terminal equipment and computer readable storage medium
CN115034177A (en) Presentation file conversion method, device, equipment and storage medium
CN113128496B (en) Method, device and equipment for extracting structured data from image
CN114529933A (en) Contract data difference comparison method, device, equipment and medium
Pattnaik et al. A Framework to Detect Digital Text Using Android Based Smartphone
Karambelkar et al. Automated Text Extraction from Images using Optical Character Recognition.
CN112101356A (en) Method and device for positioning specific text in picture and storage medium
CN116304163B (en) Image retrieval method, device, computer equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination