WO2022057707A1 - Text recognition method, image recognition classification method and document recognition processing method - Google Patents

Text recognition method, image recognition classification method and document recognition processing method

Info

Publication number
WO2022057707A1
Authority
WO
WIPO (PCT)
Prior art keywords
text
image
recognition
document
textual
Prior art date
Application number
PCT/CN2021/117222
Other languages
English (en)
Chinese (zh)
Inventor
徐青松
李青
Original Assignee
杭州睿琪软件有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 杭州睿琪软件有限公司 filed Critical 杭州睿琪软件有限公司
Publication of WO2022057707A1 publication Critical patent/WO2022057707A1/fr

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 - Character recognition
    • G06V30/14 - Image acquisition
    • G06V30/148 - Segmentation of character regions
    • G06V30/153 - Segmentation of character regions using recognition of characters or words
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 - Clustering; Classification
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 - Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/55 - Clustering; Classification
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 - Document-oriented image-based pattern recognition
    • G06V30/41 - Analysis of document content
    • G06V30/412 - Layout analysis of documents structured with printed lines or input boxes, e.g. business forms or tables
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 - Character recognition

Definitions

  • the present invention relates to the technical field of machine learning, and in particular, to a text recognition method, an image recognition classification method, a document recognition processing method, an electronic device, and a computer-readable storage medium.
  • OCR: Optical Character Recognition.
  • OCR refers to the process in which an electronic device (such as a scanner or digital camera) examines characters printed on paper, determines their shapes by detecting patterns of dark and light, and then translates those shapes into computer text using character recognition methods. That is, for printed characters, the text in a paper document is optically converted into a black-and-white dot-matrix image file, and recognition software then converts the text in the image into a text format that word-processing software can further edit.
  • the recognition model can usually be used to recognize the characters in the document.
  • However, the same model cannot be used to recognize documents in different languages: the language of a document must be known before the corresponding recognition model can be called, and documents that mix several languages are even more difficult to recognize. It can be seen that the existing OCR recognition technology suffers from low text recognition accuracy for documents in different languages.
  • the purpose of the present invention is to provide a text recognition method, an image recognition classification method, a document recognition processing method, an electronic device, and a computer-readable storage medium.
  • the specific technical solutions are as follows:
  • the present invention provides a text recognition method, comprising:
  • using a character recognition model to recognize the characters in each of the text lines, so as to obtain a preliminary recognition result of the text to be recognized;
  • the corresponding language recognition model is called according to the language type, and the corresponding character part is recognized to obtain the target recognition result of the text to be recognized.
  • the method further includes: recognizing the direction of the text to be recognized in the text image, and if the direction does not meet a preset condition, correcting the direction of the text to be recognized;
  • wherein identifying the direction of the text to be recognized in the text image includes:
  • a direction recognition model is used to recognize the direction of the text to be recognized in the text image, and the direction recognition model is a CNN-based neural network model.
  • the character recognition model is a neural network model based on the CTC (Connectionist Temporal Classification) technique and the Attention mechanism.
  • the character recognition model is obtained by training on a training sample set that includes the CJK character set and the ISO 8859-1 to ISO 8859-16 character sets.
  • the language classification model is a fastText <N-Gram> language classification model based on the wiki data set.
  • the present invention also provides an image recognition and classification method, including:
  • a keyword is determined according to the text recognition result; a first subdivision type of the content of the textual image, or a second subdivision type of the content of the non-textual image, is determined according to the keyword; the textual image is classified into a folder corresponding to the first subdivision type, and the non-textual image is classified into a folder corresponding to the second subdivision type.
  • the method further includes:
  • the textual image or the non-textual image is automatically named by using the keyword.
  • in the image recognition and classification method, after a textual image or a non-textual image is recognized, the method further includes:
  • classifying the textual image into a folder corresponding to the first subdivision type, and classifying the non-textual image into a folder corresponding to the second subdivision type include:
  • the first subdivision type includes one or more of: notes, credentials, receipts, screenshots, documents, and certificates.
  • the image recognition model identifies the content in the non-text images
  • the image recognition and classification method further includes:
  • the second subdivision type is determined according to the content of the non-text image, and the non-text image is classified into a folder corresponding to the second subdivision type.
  • the method further includes:
  • the non-text images are automatically named according to the content in the non-text images.
  • the method further includes:
  • the method further includes:
  • before printing is executed, the method further includes:
  • the signature is performed in the preset signature area in the text image that needs to be signed;
  • the present invention also provides a document identification processing method, including:
  • the character recognition results of the original document are arranged to obtain a recognized document.
  • the character recognition results of the original document are arranged to obtain a recognized document, including:
  • the original text in the original document is replaced by the character recognition result of the original document to obtain a recognized document.
  • the method further includes:
  • the original document is compared with the identification document to determine whether there is a difference between the identification document and the original document, and if there is, the difference is corrected in the identification document.
  • before the input image is recognized, the method further includes:
  • a correction model is used to identify the curvature of the original document in the input image, and if the curvature satisfies a preset correction condition, a correction process is performed on the original document in the input image to remove the curvature of the original document.
  • the method further includes:
  • the character recognition result corresponding to the marked content is typeset into a format consistent with the original document.
  • the present invention also provides an electronic device, comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with each other through the communication bus;
  • the memory for storing computer programs
  • the processor, when executing the program stored in the memory, implements the steps in the text recognition method described above, or the steps in the image recognition and classification method described above, or the steps in the document recognition processing method described above.
  • the present invention also provides a computer-readable storage medium on which instructions are stored; when the instructions are executed, they implement the steps in the text recognition method described above, or the steps in the image recognition and classification method described above, or the steps in the document recognition processing method described above.
  • the text recognition method, image recognition classification method, document recognition processing method, electronic device, and computer-readable storage medium provided by the present invention have the following advantages:
  • With the text recognition method and the corresponding electronic device and computer-readable storage medium provided by the present invention, when text recognition is performed, the text lines in the text to be recognized are first marked with general text line boxes; a character recognition model is then used to recognize each text line and obtain a preliminary recognition result of the text to be recognized; the language types are then identified from the preliminary recognition result, and the corresponding language recognition model is called according to each recognized language type to further recognize the character part corresponding to that language type, yielding an optimized character recognition result.
  • a separate language recognition model is used for accurate recognition according to the language type involved, thereby improving the accuracy of text recognition.
  • With the image recognition and classification method, the corresponding electronic device, and the computer-readable storage medium provided by the present invention, the above OCR text recognition method can be used to recognize text in both textual images and non-textual images, obtaining the text recognition results of the textual images and the non-textual images, and keywords are determined from the text recognition results to classify the textual images and the non-textual images. Because the images are classified according to the text content in the images, the classification results are more accurate; at the same time, the determined keywords make it convenient to subsequently search for images by keyword, enabling fast image search.
  • non-text images can also be classified by image content, which also improves the accuracy of the classification results.
  • With the document recognition processing method and the corresponding electronic device and computer-readable storage medium provided by the present invention, the OCR text recognition method is used to recognize the document to be recognized in the input image so as to obtain a recognized document. Because the uneditable document is converted into an editable document, it becomes convenient to subsequently retrieve the document by searching for keywords in it, enabling fast document search.
  • In addition, by correcting the curvature of the input image and by recognizing and adjusting the fonts of the annotated and quoted content in the document, errors in the process of converting the document to be recognized in the input image into editable electronic text are reduced, and the conversion accuracy is improved.
  • FIG. 1 is a schematic flowchart of a text recognition method provided by an embodiment of the present invention.
  • FIG. 2 is a schematic flowchart of an image recognition and classification method provided by an embodiment of the present invention.
  • Fig. 3 is an example diagram of image recognition classification display
  • FIG. 4 is a schematic flowchart of a document identification processing method provided by an embodiment of the present invention.
  • Figure 5a is an example diagram of an input image containing an original document
  • Figure 5b is an example diagram of a recognized document obtained after the method of the present invention is used to recognize the input image shown in Figure 5a;
  • Figure 6a is another example diagram of an input image containing an original document
  • Figure 6b is an example diagram of a recognized document obtained after the input image shown in Figure 6a is recognized by an existing method
  • Figure 6c is an example diagram of a recognized document obtained after the method of the present invention is used to recognize the input image shown in Figure 6a;
  • FIG. 7 is a schematic structural diagram of an electronic device provided by an embodiment of the present invention.
  • FIG. 1 shows a flowchart of a text recognition method according to an exemplary embodiment of the present invention.
  • the method can be implemented in an application program (app) installed on a smart terminal such as a mobile phone and a tablet computer.
  • the method may include:
  • Step S101 identifying text lines in the text to be identified in the text image, and marking each of the text lines with a general text line box.
  • In this embodiment, a text image refers to an image whose content is mainly text, such as a business card image, a document image, a credential image, a certificate image, or a note image; it may be an image obtained by photographing the text or an image obtained by scanning the text.
  • the note image may be an image obtained by taking pictures of the handwritten font text content on the paper.
  • the text to be recognized in the text image includes one or more text lines.
  • The present invention uses an OCR text recognition method. During recognition, each text line is recognized separately, and the recognition result of the entire text to be recognized is finally obtained by combining the recognition results of all text lines. Therefore, during recognition, each text line in the text to be recognized in the text image needs to be identified, and each text line is marked with a general text line box at the same time.
  • The language within a text line is not restricted; the text is processed only line by line. That is, even when the characters in a text line belong to multiple language types, as long as these characters are located in the same text line, they are marked in the same general text line box.
  • For example, a text image may contain both the front and back sides of an ID card, and these two documents need to be recognized separately. Therefore, before step S101 is executed, the document area in the text image (that is, the area where the text to be recognized is located) may first be identified and sliced out. For example, it can be sliced out using an annotation box, or the edges of the document area can be detected with an edge recognition method and the area then sliced out along those edges.
  • the method further includes: recognizing the direction of the text to be recognized in the text image, and if the direction does not meet a preset condition, correcting the direction of the text to be recognized.
  • a direction recognition model may be used to recognize the direction of the text to be recognized in the text image, and the direction recognition model may be a CNN-based neural network model.
  • the reference direction may be set as a positive direction along the horizontal direction.
  • The direction recognition model can identify the angle between the arrangement direction of the characters in the text lines and the positive horizontal direction in the text image. If the angle is 0, no correction is required; if the angle is not 0, the text image is corrected.
  • One correction method is to rotate the text image so that the angle between the characters in the text lines of the text to be recognized and the positive horizontal direction in the text image becomes 0. In this embodiment, the direction pointing to the right along the horizontal axis may be regarded as the positive horizontal direction; in other embodiments, other directions may also be set as the positive direction, which is not limited in the present invention.
  • The correction processing may also use the average slope of a plurality of text lines as the correction reference, or use other correction methods, which are not limited in the present invention.
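  • As an illustration only, the following is a minimal sketch of this direction-correction step. It assumes a hypothetical `direction_model` whose `predict_angle()` returns the angle, in degrees, between the text lines and the positive horizontal direction (the sign convention is assumed), and uses Pillow for the rotation itself; it is not the patent's own implementation.

```python
# Minimal sketch of direction correction; `direction_model` is a hypothetical
# CNN-based classifier, and the angle sign convention is assumed.
from PIL import Image

def correct_text_direction(image_path, direction_model):
    image = Image.open(image_path)
    angle = direction_model.predict_angle(image)   # degrees from the horizontal
    if angle != 0:
        # Rotate back by the detected angle so the text lines become horizontal.
        image = image.rotate(-angle, expand=True, fillcolor="white")
    return image
```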
  • step S102 a character recognition model is used to recognize characters in each of the text lines, and a preliminary recognition result of the text to be recognized is obtained.
  • In this embodiment, the character recognition model is an all-in-one model obtained by training with multiple character sets, such as the CJK character set and the ISO 8859-1 to ISO 8859-16 character sets; the character recognition model can therefore support recognition of both CJK and Latin scripts.
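  • For illustration, a minimal sketch of assembling such a combined character set is given below. It approximates the CJK portion with the CJK Unified Ideographs block and the Latin portion with the ISO 8859 encodings shipped with Python (part 12 was never published and is skipped); the patent itself does not specify how its training character set is built.

```python
# Minimal sketch of building a combined CJK + ISO 8859 character set
# (approximation for illustration; not the patent's actual training set).
def build_charset():
    charset = set()
    # CJK Unified Ideographs block as a stand-in for the CJK character set.
    charset.update(chr(cp) for cp in range(0x4E00, 0xA000))
    # Printable characters of ISO 8859-1 .. ISO 8859-16 (part 12 does not exist).
    for part in [p for p in range(1, 17) if p != 12]:
        codec = f"iso8859-{part}"
        for byte in list(range(0x20, 0x7F)) + list(range(0xA0, 0x100)):
            try:
                charset.add(bytes([byte]).decode(codec))
            except UnicodeDecodeError:   # unassigned position in this part
                pass
    return sorted(charset)
```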
  • In this embodiment, the character recognition model is a neural network model based on CTC (Connectionist Temporal Classification) and the Attention mechanism. Each text line is input into the character recognition model separately, and the model outputs the character recognition result of that text line; the character recognition result of the text to be recognized is then obtained by combining the character recognition results of all the text lines, and is used as the preliminary recognition result.
  • Connectionist Temporal Classification (CTC) is a sequence classification algorithm for cases in which there is no strict alignment between data units and label units; it is currently widely used in optical character recognition (OCR) and speech recognition.
  • The main function of CTC is to construct a loss function over the output sequence; during backpropagation, the gradient determined from this loss function is propagated back to the preceding layers to complete the training of the CTC-based model.
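  • As a hedged illustration of this idea, the sketch below shows how a CTC loss can drive the training of a line-recognition network using PyTorch's nn.CTCLoss; `crnn` is a hypothetical model that maps a batch of line images to per-timestep scores over the character set, and the details (shapes, optimizer) are assumptions rather than the patent's implementation.

```python
# Minimal sketch of one training step with a CTC loss (PyTorch).
import torch
import torch.nn as nn

ctc_loss = nn.CTCLoss(blank=0, zero_infinity=True)

def training_step(crnn, optimizer, images, targets, target_lengths):
    # images: (N, 1, H, W) line images; targets: concatenated label indices.
    log_probs = crnn(images).log_softmax(2)        # expected shape (T, N, C)
    input_lengths = torch.full(
        (log_probs.size(1),), log_probs.size(0), dtype=torch.long
    )
    loss = ctc_loss(log_probs, targets, input_lengths, target_lengths)
    optimizer.zero_grad()
    loss.backward()    # gradient of the CTC loss flows back through the network
    optimizer.step()
    return loss.item()
```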
  • The Attention mechanism brings a substantial improvement to sequence-learning tasks.
  • In an encoder-decoder framework, adding an Attention model on the encoder side to apply a weighted transformation to the source data sequence, or introducing an Attention model on the decoder side to apply weighted changes to the target data, can effectively improve system performance in a natural sequence-to-sequence manner.
  • The present invention combines CTC (Connectionist Temporal Classification) and the Attention mechanism to construct the character recognition model, which can improve the accuracy of character recognition.
  • Step S103: a language classification model is used to perform language identification on the preliminary recognition result, the language types involved in the preliminary recognition result are obtained, and the preliminary recognition result is divided into a plurality of different character parts according to language type.
  • Since the character recognition model used in step S102 is obtained by training on character sets of multiple different languages, the accuracy of its character recognition results for the text lines is not high. The preliminary recognition result therefore needs to be optimized by further recognizing the characters of each language, so as to improve the accuracy of character recognition.
  • Specifically, a language classification model is used to perform language recognition on the preliminary recognition result, and the language types involved in the preliminary recognition result are obtained. The langid technique can be used to identify the language types, and the language classification model may be a fastText <N-Gram> language classification model based on the wiki dataset.
  • fastText is a word-vector and text-classification tool. Its typical application scenario is supervised text classification: it provides a simple and efficient method for text classification and representation learning, and it is much faster than deep-learning approaches.
  • N-Gram is a language model commonly used in large-vocabulary continuous speech recognition.
  • CLM: Chinese Language Model.
  • In this way, the preliminary recognition result can be divided into a plurality of different character parts, that is, the characters of each language type are grouped into the same character part.
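  • For illustration, a minimal sketch of this language-identification and splitting step is shown below. It uses the publicly available pretrained fastText language-identification model file `lid.176.bin` as a stand-in for the wiki-based fastText <N-Gram> classifier described above; the file name and grouping granularity are assumptions.

```python
# Minimal sketch of splitting the preliminary result into per-language parts.
import fasttext

lid_model = fasttext.load_model("lid.176.bin")   # pretrained language-ID model

def split_by_language(preliminary_lines):
    """Group preliminary recognition results into per-language character parts."""
    parts = {}
    for line in preliminary_lines:
        labels, _ = lid_model.predict(line.replace("\n", " "))
        lang = labels[0].replace("__label__", "")   # e.g. "zh", "en", "fr"
        parts.setdefault(lang, []).append(line)
    return parts
```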
  • Step S104 calling a corresponding language recognition model according to the language type to recognize the corresponding character part, and obtain the target recognition result of the text to be recognized.
  • In this embodiment, each language type has a corresponding language recognition model. After the language types involved in the text to be recognized and the character parts corresponding to each language type are obtained in step S103, the corresponding language recognition model is called to recognize each corresponding character part; more accurate character recognition results for each character part are thereby obtained, and the target recognition result of the text to be recognized is then obtained from them.
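  • A minimal sketch of this dispatching step, under the assumption of hypothetical per-language recognition models kept in a dictionary (and the `split_by_language` helper sketched above), could look like the following.

```python
# Minimal sketch of step S104: call the per-language model for each part.
def refine_recognition(parts, language_models):
    refined = {}
    for lang, char_part in parts.items():
        model = language_models.get(lang)
        # Fall back to the preliminary result when no dedicated model exists.
        refined[lang] = model.recognize(char_part) if model else char_part
    return refined
```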
  • To summarize, the text lines in the text to be recognized are first marked with general text line boxes, and the character recognition model is then used to recognize each text line to obtain the preliminary recognition result of the text to be recognized. The language types are then identified from the preliminary recognition result, and the corresponding language recognition model is called according to each recognized language type to further recognize the character part corresponding to that language type, yielding the optimized character recognition result.
  • a separate language recognition model is used for accurate recognition according to the language type involved, thereby improving the accuracy of text recognition.
  • The present invention also proposes an image recognition and classification method, which is used to classify and organize a large number of images, placing images with similar content into the same folder so that users can conveniently browse and search them.
  • the image recognition and classification method includes the following steps:
  • step S201 an image recognition model is used to recognize the image to be classified, and a textual image or a non-textual image is recognized.
  • the image to be classified may be a newly captured image, or may be an image that has been captured and saved in a folder, such as an image saved in a mobile phone album.
  • Textual images refer to images whose content is mainly text, such as business card images, document images, credential images, certificate images, and note images; they can be images obtained by photographing text or images obtained by scanning text.
  • the note image may be an image obtained by taking pictures of the handwritten font text content on the paper.
  • Non-text images refer to images whose content is mainly non-text, such as photos of people's lives, landscapes, and photos of animals and plants.
  • By recognizing the image to be classified with the image recognition model, it can be determined whether the image to be classified is a textual image or a non-textual image, so that textual images and non-textual images can be separated.
  • the images are automatically classified and stored in different preset folders. That is, after recognizing the textual image, the textual image is classified into the textual image folder, and after recognizing the non-textual image, the non-textual image is classified into the non-textual image folder.
  • Step S202 Recognize the text in the textual image or the non-textual image to obtain a text recognition result of the textual image or the non-textual image.
  • the text recognition method shown in FIG. 1 may be used to recognize the text in the textual image or the non-textual image.
  • the specific identification process is not repeated here.
  • Different pictures can also be classified according to the language type of the text recognition result.
  • Step S203: Determine keywords according to the text recognition result, determine a first subdivision type of the content of the textual image or a second subdivision type of the content of the non-textual image according to the keywords, classify the textual image into a folder corresponding to the first subdivision type, and classify the non-textual image into a folder corresponding to the second subdivision type.
  • In this step, a keyword classification model may be used to obtain keywords from the text recognition result; the first subdivision type of the content of the textual image, or the second subdivision type of the content of the non-textual image, is then determined according to the keywords, and the textual image is classified into the folder corresponding to the first subdivision type while the non-textual image is classified into the folder corresponding to the second subdivision type.
  • The first subdivision type includes, but is not limited to, one or more of: notes, credentials, receipts, screenshots, documents, and certificates.
  • For example, the keyword classification model can obtain the keyword "ID card" from the text recognition result; it can thus be determined from the keyword that the first subdivision type of the content of the textual image is "document image", and the textual image can then be classified into the folder corresponding to the subdivision type "document image".
  • Further, the subdivision type "document image" can be divided further, for example into various specific types including ID card, driver's license, passport, military officer's certificate, work permit, birth certificate, household registration booklet, and so on. Therefore, the specific type of the textual image can also be determined according to the keyword, and the textual image can be further classified into a subfolder of that specific type under the folder corresponding to the first subdivision type.
  • For example, if the keyword classification model obtains the keyword "ID card" from the text recognition result, the textual image can be further classified into the "ID card" subfolder of this specific type under the "document image" folder. It can be understood that several specific-type subfolders can be set under the "document image" folder, such as ID card, driver's license, passport, military officer's certificate, work permit, birth certificate, and household registration booklet.
  • In practice, the classified textual images can be organized into a file tree in which the folders are named level by level, so that each textual image to be classified can be automatically classified into the corresponding folder.
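  • As an illustration, a minimal sketch of filing a classified image into such a nested folder tree is shown below; the folder names ("document image", "ID card") are examples, and the helper itself is an assumption rather than the patent's implementation.

```python
# Minimal sketch of moving a classified image into root/<subdivision>/<specific>/.
import shutil
from pathlib import Path

def file_into_tree(image_path, root, subdivision, specific=None):
    target = Path(root) / subdivision
    if specific:
        target = target / specific            # e.g. root/document image/ID card
    target.mkdir(parents=True, exist_ok=True)
    shutil.move(image_path, str(target / Path(image_path).name))
```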
  • the keyword may also be used to automatically name the textual image.
  • the classified images can be sorted according to the modified chronological order, or can be sorted according to the shooting chronological order, or the sorting method can also be set as required.
  • In addition, the user can search the classified textual images by keyword so as to quickly find the target file. Specifically, in response to the user entering a search term, the system searches for a keyword matching the search term, and if such a keyword exists, the textual image corresponding to that keyword is output. For example, when the search term entered by the user is "identity", the system searches for a matching keyword; if the matching keyword "identity card" exists, the textual image corresponding to the keyword "identity card" is output and displayed to the user.
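  • For illustration only, the keyword search described above might be sketched as follows, assuming a small index built at classification time that maps each image path to the keywords extracted for it.

```python
# Minimal sketch of searching classified images by keyword.
def search_images(index, search_term):
    """index: dict mapping image path -> list of extracted keywords."""
    hits = []
    for path, keywords in index.items():
        # A search term such as "identity" matches the stored keyword "identity card".
        if any(search_term in keyword for keyword in keywords):
            hits.append(path)
    return hits
```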
  • the second subdivision type may include: life photos of people, landscape photos, animal photos, plant photos, and the like.
  • For example, suppose a non-text image is a photo of Leifeng Pagoda and the image contains the three characters "Leifeng Pagoda". The text recognition result of the non-text image is then "Leifeng Pagoda", and the keyword is "Leifeng Pagoda"; according to this keyword, the second subdivision type of the content of the non-text image can be determined to be "landscape photo", and the non-text image is classified into the folder corresponding to the subdivision type "landscape photo".
  • Further, the subdivision type "landscape photo" can be divided further, for example according to the names of scenic spots. Therefore, the specific type of the non-text image can also be determined according to the recognized name of the scenic spot, and the non-text image can be further classified into a subfolder of that specific type under the folder corresponding to the second subdivision type. For example, since the non-text image in the preceding example is identified as a photo of Leifeng Pagoda, it can be further classified into the specific subfolder "Leifeng Pagoda photos" under the "landscape photos" folder. It can be understood that specific-type subfolders corresponding to different scenic spots can be set under the "landscape photos" folder.
  • In addition, the image recognition model can also recognize the content of the non-text image, so the second subdivision type can also be determined according to the content of the non-text image, and the non-text image classified into the folder corresponding to the second subdivision type. For example, if the image recognition model recognizes that the content shown in the non-text image is Leifeng Pagoda, it can be determined that the second subdivision type of the content of the non-text image is "landscape photo", and the non-text image can then be classified into the "landscape photo" sub-category.
  • the classified non-text images can be set into a file tree, in which each folder is named progressively, so that each to-be-classified non-text image can be automatically classified into a corresponding folder.
  • the non-text images may be automatically named according to the content of the non-text images.
  • the content of the non-text image may include the recognized names of animals and plants, the names of scenic spots, etc., so the non-text images can be automatically named according to the recognized names of animals and plants, the names of scenic spots, and the like.
  • the non-text images are automatically named according to the keywords obtained by the keyword classification model. Through the automatic naming, the finding of the non-text images can be facilitated.
  • Non-text images can also be classified according to shooting time, shooting location, the people they involve, names, and so on.
  • encryption processing can be performed to ensure the security of the files, for example, encryption processing is performed for important documents such as certificates, or encryption processing is performed for private life photos of people.
  • For the encryption processing, a single file can be encrypted, or the corresponding folder can be encrypted.
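  • One possible way to implement such encryption, given here only as a hedged sketch, is the Fernet recipe from the `cryptography` package; key management is left to the caller and is not addressed by the patent text.

```python
# Minimal sketch of encrypting a single classified file with Fernet.
from cryptography.fernet import Fernet

def encrypt_file(path, key=None):
    key = key or Fernet.generate_key()      # caller must store this key securely
    fernet = Fernet(key)
    with open(path, "rb") as f:
        token = fernet.encrypt(f.read())
    with open(path + ".enc", "wb") as f:
        f.write(token)
    return key
```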
  • The document to be printed can be imported with one click, and related processing of the document can be performed according to the classification result; for example, the pictures to be printed can be found and imported by keyword search, and the printing function is thus realized.
  • the method further includes: if there is a text image that needs to be signed in all the imported text images, signing in a preset signature area in the text image that needs to be signed; and/or, if If there are defective text images in all imported text images, filter the defective text images.
  • a signature area is set for some documents to be signed, and the signature can be directly performed on the image, and the signed document is then printed.
  • the image can also be binarized.
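  • A minimal sketch of such binarization using Pillow is shown below; the threshold value 128 is an arbitrary assumption.

```python
# Minimal sketch of binarizing an image before printing.
from PIL import Image

def binarize(image_path, threshold=128):
    gray = Image.open(image_path).convert("L")          # grayscale
    return gray.point(lambda p: 255 if p > threshold else 0, mode="1")
```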
  • In summary, in the image recognition and classification method provided by the present invention, the OCR text recognition method described above can be used to recognize text in both textual images and non-textual images to obtain their text recognition results, and keywords are determined from the text recognition results to classify the textual images and the non-textual images. Since the images are classified according to the text content in the images, the classification results are more accurate; at the same time, the determined keywords make it convenient to subsequently search for images by keyword, enabling fast image search.
  • non-text images can also be classified by image content, which also improves the accuracy of the classification results.
  • the present invention also proposes a document recognition processing method for converting different types of files, such as scanned files, PDF files or pictures, into texts that can be searched or edited at any time.
  • The search can be carried out on the document content or on the text in a picture; that is, the user only needs to enter keywords in the search box, and the title, content, remarks, and text in pictures can all be searched intelligently.
  • The document recognition processing method includes the following steps:
  • Step S301 acquiring an input image, where the input image contains the original document to be recognized.
  • the type of the original document may be a paper document
  • the input image may be formed by taking a photo or scanning
  • the type of the original document may also be an electronic document, such as a PDF document or a picture document with uneditable text, in this case the input image can be obtained directly.
  • Step S302 Recognize the original document in the input image to obtain a character recognition result of the original document.
  • the text recognition method shown in FIG. 1 may be used to recognize the original document in the input image.
  • the specific identification process is not repeated here.
  • Step S303 Arrange the character recognition results of the original document according to the position information of each character of the original document in the input image to obtain a recognized document.
  • the character recognition results of the original document are arranged to obtain a recognized document, including:
  • the original text in the original document is replaced by the character recognition result of the original document to obtain a recognized document.
  • Figure 5a shows the input image containing the original document
  • Figure 5b shows the finally obtained recognized document
  • During processing, the coordinate information of each character of the original document in the input image can be obtained, so that after the character recognition result of the original document is obtained, each recognized character is placed at the corresponding position in the input image according to its coordinate information to replace the character in the original document, thereby obtaining the recognized document.
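  • For illustration, a minimal sketch of arranging character recognition results by their position information is given below. It assumes recognition yields (character, x, y) tuples giving each character's position in the input image, and groups characters into lines by their y coordinate (the grouping tolerance is an assumption).

```python
# Minimal sketch of step S303: rebuild text lines from per-character coordinates.
def arrange_characters(chars, line_tolerance=10):
    lines = {}
    for ch, x, y in chars:
        key = round(y / line_tolerance)     # characters on roughly the same baseline
        lines.setdefault(key, []).append((x, ch))
    return "\n".join(
        "".join(ch for _, ch in sorted(line)) for _, line in sorted(lines.items())
    )
```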
  • Using OCR, the characters in the input image can be converted into editable characters through intelligent recognition, without manual typing; PPT files, PDF files, pictures, business cards, test papers, and the like can be instantly converted into recognized documents, that is, electronic manuscripts that can be edited and modified.
  • Further, the original document may also be compared with the recognized document to determine whether there is any difference between the recognized document and the original document, and if so, the difference is corrected in the recognized document.
  • a manual verification method can be used to compare the original document with the output editable electronic text of the identification document, and find out the difference between the editable electronic text and the original document during the conversion process.
  • In some cases, the original document in the acquired input image cannot be recognized because of problems such as curvature, or the recognized document is output as garbled text.
  • For this reason, a correction model may be used to identify the curvature of the original document in the input image; if the curvature satisfies a preset correction condition, a correction process is performed on the original document in the input image to remove its curvature. In practical applications, the curvature of the original document can also be corrected manually, or other correction methods can be used.
  • To this end, the present invention uses an annotation recognition model to recognize the input image and identify the annotation content in the original document, and in the recognized document the character recognition result corresponding to the annotation content is typeset into a format consistent with the original document.
  • Specifically, the invention recognizes the input image through the annotation recognition model, distinguishes the annotation content from the other characters of the original document, and outputs the annotation content not in the same text form as the other character content but in a form consistent with the original document.
  • Fig. 6c shows the recognized document processed by the method of the present invention. It can be seen from Figs. 6a and 6c that during OCR recognition the annotations are automatically identified by the annotation recognition model, and the recognized document is then, according to the recognition result, automatically typeset into a format consistent with the original text, so that the OCR-recognized text is consistent with the original image and no manual proofreading is required.
  • In summary, the document recognition processing method provided by the present invention uses the OCR text recognition method to recognize the document to be recognized in the input image, thereby obtaining the recognized document. Because the uneditable document is converted into an editable one, it becomes convenient to retrieve the document by searching for keywords in it, enabling fast document search.
  • In addition, by correcting the curvature of the input image and by recognizing and adjusting the fonts of the annotated and quoted content in the document, errors in the process of converting the document to be recognized in the input image into editable electronic text are reduced, and the conversion accuracy is improved.
  • the present invention also provides an electronic device.
  • the electronic device includes a processor 301 , a communication interface 302 , a memory 303 and a communication bus 304 , wherein the processor 301 , the communication interface 302 , and the memory 303 pass through the communication bus 304 complete communication with each other;
  • the memory 303 is used to store computer programs
  • When the processor 301 executes the program stored in the memory 303, it can implement the steps in the text recognition method described above, or the steps in the image recognition and classification method described above, or the steps in the document recognition processing method described above.
  • the communication bus 304 mentioned in the above electronic device may be a peripheral component interconnect standard (Peripheral Component Interconnect, PCI) bus or an Extended Industry Standard Architecture (Extended Industry Standard Architecture, EISA) bus or the like.
  • PCI peripheral component interconnect standard
  • EISA Extended Industry Standard Architecture
  • the communication bus 304 can be divided into an address bus, a data bus, a control bus, and the like. For ease of presentation, only one thick line is used in the figure, but it does not mean that there is only one bus or one type of bus.
  • the communication interface 302 is used for communication between the above-mentioned electronic device and other devices.
  • the so-called processor 301 may be a central processing unit (Central Processing Unit, CPU), or other general-purpose processors, digital signal processors (Digital Signal Processors, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), Off-the-shelf programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc.
  • the general-purpose processor can be a microprocessor or the processor can also be any conventional processor, etc.
  • the processor 301 is the control center of the electronic device, and uses various interfaces and lines to connect various parts of the entire electronic device.
  • The memory 303 can be used to store the computer program, and the processor 301 implements the various functions of the electronic device by running or executing the computer program stored in the memory 303 and calling the data stored in the memory 303.
  • the memory 303 may include non-volatile and/or volatile memory.
  • Nonvolatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in various forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), etc.
  • the present invention also provides a computer-readable storage medium, where instructions are stored on the computer-readable storage medium, and when the instructions are executed, the steps in the text recognition method described above can be implemented, or the The steps in the image recognition classification method as described above, or the steps in the document recognition processing method as described above are implemented.
  • computer-readable storage media in embodiments of the present invention may be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory. It should be noted that computer-readable storage media described herein are intended to include, but not be limited to, these and any other suitable types of memory.
  • Each block in the flowchart or block diagrams may represent a module, segment, or portion of code that contains one or more executable instructions for implementing the specified logical function.
  • the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
  • each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations can be implemented in dedicated hardware-based systems that perform the specified functions or operations , or can be implemented in a combination of dedicated hardware and computer instructions.
  • The various example embodiments of the invention may be implemented in hardware or special-purpose circuits, software, firmware, logic, or any combination thereof. Certain aspects may be implemented in hardware, while other aspects may be implemented in firmware or software that may be executed by a controller, microprocessor, or other computing device. While aspects of the embodiments of the invention are illustrated or described as block diagrams, flowcharts, or some other graphical representation, it is to be understood that the blocks, apparatus, systems, techniques, or methods described herein may be implemented, as non-limiting examples, in hardware, software, firmware, special-purpose circuits or logic, general-purpose hardware or controllers or other computing devices, or some combination thereof.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Character Discrimination (AREA)

Abstract

A text recognition method, an image recognition classification method, and a document recognition processing method. The text recognition method comprises: during text recognition, first marking, with general text line boxes, the text lines in the text to be recognized in a text image; then recognizing each text line using a character recognition model so as to obtain a preliminary recognition result of said text; then recognizing the language types of the preliminary recognition result, and calling a corresponding language recognition model according to the recognized language types in order to further recognize the character part corresponding to each language type, so as to obtain an optimized character recognition result. With this method, after the preliminary recognition result of said text is obtained, a separate language recognition model is further used for accurate recognition according to the language types involved in the preliminary recognition result, so that the accuracy of text recognition is improved.
PCT/CN2021/117222 2020-09-15 2021-09-08 Text recognition method, image recognition classification method and document recognition processing method WO2022057707A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010968750.3 2020-09-15
CN202010968750.3A CN112101367A (zh) 2020-09-15 2020-09-15 文本识别方法、图像识别分类方法、文档识别处理方法

Publications (1)

Publication Number Publication Date
WO2022057707A1 true WO2022057707A1 (fr) 2022-03-24

Family

ID=73759143

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/117222 WO2022057707A1 (fr) 2020-09-15 2021-09-08 Procédé de reconnaissance de texte, procédé de classification de reconnaissance d'image et procédé de traitement de reconnaissance de document

Country Status (2)

Country Link
CN (1) CN112101367A (fr)
WO (1) WO2022057707A1 (fr)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112101367A (zh) * 2020-09-15 2020-12-18 杭州睿琪软件有限公司 文本识别方法、图像识别分类方法、文档识别处理方法
CN113420622A (zh) * 2021-06-09 2021-09-21 四川百川四维信息技术有限公司 基于机器深度学习的智能扫描识别归档系统
CN113254595B (zh) * 2021-06-22 2021-10-22 北京沃丰时代数据科技有限公司 闲聊识别方法、装置、电子设备及存储介质
CN113792659B (zh) * 2021-09-15 2024-04-05 上海金仕达软件科技股份有限公司 文档识别方法、装置及电子设备
CN114173019B (zh) * 2021-12-23 2023-12-01 青岛黄海学院 一种多功能档案扫描装置及其工作方法
CN114267046A (zh) * 2021-12-31 2022-04-01 上海合合信息科技股份有限公司 一种文档图像的方向校正方法与装置
CN114419636A (zh) * 2022-01-10 2022-04-29 北京百度网讯科技有限公司 文本识别方法、装置、设备以及存储介质
CN114596566B (zh) * 2022-04-18 2022-08-02 腾讯科技(深圳)有限公司 文本识别方法及相关装置
CN115205868B (zh) * 2022-06-24 2023-05-05 荣耀终端有限公司 一种图像校验方法
CN117932657B (zh) * 2023-12-18 2024-08-20 深圳安科百腾科技有限公司 一种相册智能整理和隐私保护方法
CN117593752B (zh) * 2024-01-18 2024-04-09 星云海数字科技股份有限公司 一种pdf文档录入方法、系统、存储介质及电子设备

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7106905B2 (en) * 2002-08-23 2006-09-12 Hewlett-Packard Development Company, L.P. Systems and methods for processing text-based electronic documents
US8588528B2 (en) * 2009-06-23 2013-11-19 K-Nfb Reading Technology, Inc. Systems and methods for displaying scanned images with overlaid text

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150278658A1 (en) * 2014-03-31 2015-10-01 Kyocera Document Solutions Inc. Image Forming Apparatus Capable of Changing Image Data into Document Data, an Image Forming System, and an Image Forming Method
US20160034559A1 (en) * 2014-07-31 2016-02-04 Samsung Electronics Co., Ltd. Method and device for classifying content
US20160092730A1 (en) * 2014-09-30 2016-03-31 Abbyy Development Llc Content-based document image classification
US20160189008A1 (en) * 2014-12-31 2016-06-30 Xiaomi Inc. Methods and deivces for classifying pictures
WO2019012570A1 (fr) * 2017-07-08 2019-01-17 ファーストアカウンティング株式会社 Système et procédé de classification de documents, et système et procédé de comptabilité
CN110569830A (zh) * 2019-08-01 2019-12-13 平安科技(深圳)有限公司 多语言文本识别方法、装置、计算机设备及存储介质
CN110766020A (zh) * 2019-10-30 2020-02-07 哈尔滨工业大学 一种面向多语种自然场景文本检测与识别的系统及方法
CN112101367A (zh) * 2020-09-15 2020-12-18 杭州睿琪软件有限公司 文本识别方法、图像识别分类方法、文档识别处理方法

Also Published As

Publication number Publication date
CN112101367A (zh) 2020-12-18

Similar Documents

Publication Publication Date Title
WO2022057707A1 (fr) Procédé de reconnaissance de texte, procédé de classification de reconnaissance d'image et procédé de traitement de reconnaissance de document
US11645826B2 (en) Generating searchable text for documents portrayed in a repository of digital images utilizing orientation and text prediction neural networks
CN109766438B (zh) 简历信息提取方法、装置、计算机设备和存储介质
US9626555B2 (en) Content-based document image classification
US8340425B2 (en) Optical character recognition with two-pass zoning
CN112508011A (zh) 一种基于神经网络的ocr识别方法及设备
CN110909123B (zh) 一种数据提取方法、装置、终端设备及存储介质
WO2014086277A1 (fr) Ordinateur portable professionnel commode pour une électronisation et procédé pour identifier automatiquement un numéro de page de celui-ci
CN111914597B (zh) 一种文档对照识别方法、装置、电子设备和可读存储介质
US11379690B2 (en) System to extract information from documents
US8953228B1 (en) Automatic assignment of note attributes using partial image recognition results
Isheawy et al. Optical character recognition (OCR) system
CN114021543B (zh) 基于表格结构解析的文档比对分析方法及系统
WO2022161293A1 (fr) Procédé et appareil de traitement d'image, ainsi que dispositif électronique et support de stockage
CN112132710A (zh) 法律要素处理方法、装置、电子设备及存储介质
CN111783710A (zh) 医药影印件的信息提取方法和系统
CN110889341A (zh) 基于ai的表单图像识别方法、装置、计算机设备和存储介质
CN111357015B (zh) 文本转换方法、装置、计算机设备和计算机可读存储介质
US20220269898A1 (en) Information processing device, information processing system, information processing method, and non-transitory computer readable medium
US11881041B2 (en) Automated categorization and processing of document images of varying degrees of quality
US20220343663A1 (en) Methods and systems for performing on-device image to text conversion
CN115116079A (zh) 一种基于图像的公文要素信息抽取方法及装置
Tang The Field of Intelligent Recognition that be Advance by Machine Learning
TWI648685B (zh) 自動化辨識表單並建立動態表單之系統及其方法
CN113362026B (zh) 文本处理方法及装置

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21868532

Country of ref document: EP

Kind code of ref document: A1