WO2022057707A1 - Text recognition method, image recognition classification method, and document recognition processing method

Info

Publication number: WO2022057707A1
Application number: PCT/CN2021/117222
Authority: WO
Inventors: 徐青松; 李青
Original assignee: 杭州睿琪软件有限公司
Priority date: 2020-09-15
Filing date: 2021-09-08
Publication date: 2022-03-24
Also published as: CN112101367A

Abstract

A text recognition method, an image recognition classification method, and a document recognition processing method. The text recognition method comprises: during text recognition, first labeling, with universal text line boxes, text lines in text, which is to be recognized, in a text image; next, recognizing each text line by using a character recognition model, so as to obtain a preliminary recognition result of said text; and then recognizing language types of the preliminary recognition result, and calling a corresponding language recognition model according to the recognized language types, to further recognize a character part corresponding to the language type, so as to obtain an optimized character recognition result. By means of the method, after the preliminary recognition result of said text is obtained, a separate language recognition model is further used for precision recognition according to the language types involved in the preliminary recognition result, such that the accuracy of text recognition is improved.

Description

Text recognition method, image recognition classification method, document recognition processing method

Technical field

The present invention relates to the technical field of machine learning, and in particular, to a text recognition method, an image recognition classification method, a document recognition processing method, an electronic device, and a computer-readable storage medium.

Background art

OCR (Optical Character Recognition) refers to the process in which an electronic device (such as a scanner or digital camera) examines characters printed on paper, determines their shapes by detecting patterns of dark and light, and then uses character recognition methods to translate those shapes into computer text. That is, for printed characters, the text in a paper document is optically converted into a black-and-white dot-matrix image file, and recognition software converts the text in the image into a text format that word processing software can further edit.

In OCR, a recognition model is usually used to recognize the characters in a document. However, the same model cannot be used for documents in different languages: the language of the document must be known before the corresponding recognition model can be called, and documents that mix several languages are even more difficult to recognize. Existing OCR technology therefore suffers from low text recognition accuracy for documents in different languages.

In addition, recognized documents cannot be classified effectively, so they are managed in a disorderly way and are inconvenient to find. Moreover, when the document to be recognized is curved or bent, the typesetting after recognition may be inconsistent with the original document, and garbled characters may even appear.

SUMMARY OF THE INVENTION

The purpose of the present invention is to provide a text recognition method, an image recognition classification method, a document recognition processing method, an electronic device, and a computer-readable storage medium. The specific technical solutions are as follows:

In order to achieve the above object, the present invention provides a text recognition method, comprising:

recognizing text lines in the text to be recognized in a text image, and marking each of the text lines with a general text line box;

using a character recognition model to recognize the characters in each of the text lines, to obtain a preliminary recognition result of the text to be recognized;

using a language classification model to perform language recognition on the preliminary recognition result, to obtain the language types involved in the preliminary recognition result, and dividing the preliminary recognition result into a plurality of different character parts according to the language types; and

calling a corresponding language recognition model according to each language type to recognize the corresponding character part, to obtain the target recognition result of the text to be recognized.

Optionally, in the above text recognition method, the method further includes: recognizing the direction of the text to be recognized in the text image, and if the direction does not meet a preset condition, correcting the direction of the text to be recognized;

wherein recognizing the direction of the text to be recognized in the text image includes:

using a direction recognition model to recognize the direction of the text to be recognized in the text image, the direction recognition model being a CNN-based neural network model.

Optionally, in the above text recognition method, the character recognition model is a neural network model based on Connectionist Temporal Classification (CTC) and the attention mechanism.

Optionally, in the above text recognition method, the character recognition model is obtained by training on a training sample set that includes the CJK character set and the ISO 8859-1 to ISO 8859-16 character sets.

Optionally, in the above text recognition method, the language classification model is a fasttext<N-Gram> language classification model based on the wiki data set.

Based on the same inventive concept, the present invention also provides an image recognition and classification method, including:

using an image recognition model to recognize images to be classified, and identifying each as a textual image or a non-textual image;

using the text recognition method as described above to recognize the text in the textual image or the non-textual image, to obtain a text recognition result of the textual image or the non-textual image; and

determining a keyword according to the text recognition result, determining a first subdivision type of the content of the textual image or a second subdivision type of the content of the non-textual image according to the keyword, and classifying the textual images into folders corresponding to the first subdivision type and the non-textual images into folders corresponding to the second subdivision type.

Optionally, in the above-mentioned image recognition and classification method, after determining the keyword, the method further includes:

The textual image or the non-textual image is automatically named by using the keyword.

Optionally, in the above-mentioned image recognition classification method, after recognizing a textual image or a non-textual image, it also includes:

classifying the textual images into a textual image folder, and classifying the non-textual images into a non-textual image folder;

Correspondingly, classifying the textual image into a folder corresponding to the first subdivision type, and classifying the non-textual image into a folder corresponding to the second subdivision type, include:

Classifying the textual images in the textual image folder into a folder corresponding to the first subdivision type, and classifying the non-textual images in the non-textual image folder into the folder corresponding to the second subdivision type.

Optionally, in the above image recognition classification method, the first subdivision type includes: one or more of notes, certificates, receipts, screenshots, documents, and certificates.

Optionally, in the above image recognition classification method, for the identified non-text images, the image recognition model identifies the content in the non-text images;

The image recognition and classification method further includes:

The second subdivision type is determined according to the content of the non-text image, and the non-text image is classified into a folder corresponding to the second subdivision type.

Optionally, in the above image recognition and classification method, after recognizing the content in the non-text image, the method further includes:

The non-text images are automatically named according to the content in the non-text images.

Optionally, in the above image recognition and classification method, after classifying the textual images in the textual image folder into a folder corresponding to the first subdivision type, the method further includes:

In response to a user inputting a search term, searching for a keyword that matches the search term and, if one exists, outputting the textual image corresponding to the matched keyword.

In response to a user's printing operation, importing, through a pre-configured one-key import function, all textual images in the folder corresponding to the first subdivision type for printing.

Optionally, in the above image recognition and classification method, before executing printing, the method further includes:

If there is a text image that needs to be signed in all the imported text images, the signature is performed in the preset signature area in the text image that needs to be signed;

And/or, if there is a defective text-based image in all the imported text-based images, filter processing is performed on the defective text-based image.

Based on the same inventive concept, the present invention also provides a document identification processing method, including:

obtaining an input image, which contains the original document to be identified;

Using the text recognition method as described above to recognize the original document in the input image to obtain a character recognition result of the original document;

According to the position information of each character of the original document in the input image, the character recognition results of the original document are arranged to obtain a recognized document.

Optionally, in the above document recognition processing method, according to the position information of each character of the original document in the input image, the character recognition results of the original document are arranged to obtain a recognized document, including:

According to the position information of each character of the original document in the input image, the original text in the original document is replaced by the character recognition result of the original document to obtain a recognized document.
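The arrangement step above can be illustrated with a small sketch. It assumes each recognized character comes with a bounding box in the input image; the `Char` record, the `arrange` function, and the `line_tolerance` parameter are hypothetical names introduced here for illustration and do not appear in the source.

```python
from dataclasses import dataclass


@dataclass
class Char:
    text: str
    x: float  # left edge in the input image (pixels)
    y: float  # top edge in the input image (pixels)
    h: float  # glyph height (pixels)


def arrange(chars, line_tolerance=0.5):
    """Group characters into lines by vertical position, then order each line left to right."""
    lines = []  # each entry: [reference_y, reference_height, members]
    for ch in sorted(chars, key=lambda c: c.y):
        # A character joins the current line when its top edge is close to the
        # top edge of the first character placed on that line.
        if lines and abs(ch.y - lines[-1][0]) <= line_tolerance * lines[-1][1]:
            lines[-1][2].append(ch)
        else:
            lines.append([ch.y, ch.h, [ch]])
    return "\n".join(
        "".join(c.text for c in sorted(members, key=lambda c: c.x))
        for _, _, members in lines
    )


sample = [Char("b", 12, 1, 10), Char("a", 0, 0, 10), Char("c", 0, 20, 10)]
print(arrange(sample))
```

A real layout engine would also have to preserve spacing, fonts, and columns; this sketch only shows how position information can drive reading order.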

Optionally, in the above document recognition processing method, after obtaining the recognized document, the method further includes:

The original document is compared with the recognized document to determine whether the recognized document differs from the original document, and if so, the difference is corrected in the recognized document.

Optionally, in the above document recognition processing method, before the input image is recognized, the method further includes:

A correction model is used to identify the curvature of the original document in the input image, and if the curvature satisfies a preset correction condition, correction processing is performed on the original document in the input image to remove the curvature.

An annotation recognition model is used to recognize the input image, so as to identify the annotated content in the original document;

In the recognized document, the character recognition result corresponding to the annotated content is typeset in a format consistent with the original document.

The present invention also provides an electronic device, comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with each other through the communication bus;

the memory is configured to store a computer program;

The processor, when executing the program stored in the memory, implements the steps in the text recognition method as described above, or the steps in the image recognition and classification method as described above, or the steps in the document recognition processing method as described above.

The present invention also provides a computer-readable storage medium on which instructions are stored; when executed, the instructions implement the steps in the text recognition method as described above, or the steps in the image recognition classification method as described above, or the steps in the document recognition processing method as described above.

Compared with the prior art, the text recognition method, image recognition classification method, document recognition processing method, electronic device, and computer-readable storage medium provided by the present invention have the following advantages:

In the text recognition method and the corresponding electronic device and computer-readable storage medium provided by the present invention, text recognition proceeds as follows: the text lines in the text to be recognized are first marked with general text line boxes; a character recognition model then recognizes each text line to obtain a preliminary recognition result of the text to be recognized; the language types of the preliminary recognition result are then recognized, and the corresponding language recognition model is called according to each recognized language type to further recognize the character part corresponding to that language type, yielding an optimized character recognition result. Because a separate language recognition model is used for precise recognition according to the language types involved after the preliminary recognition result is obtained, the accuracy of text recognition is improved.

The image recognition and classification method, the corresponding electronic device, and the computer-readable storage medium provided by the present invention can apply the above OCR text recognition method to both textual images and non-textual images to obtain their text recognition results, and determine keywords from those results in order to classify the images. Because the images are classified according to the text content they contain, the classification result is more accurate. The determined keywords also make it convenient to search for images by keyword later, enabling fast image search. In addition, non-textual images can also be classified by image content, which further improves the accuracy of the classification results.

The document recognition processing method and the corresponding electronic device and computer-readable storage medium provided by the present invention use the OCR text recognition method to recognize the document to be recognized in the input image, so as to obtain a recognized document. Because an uneditable document is converted into an editable one, the document can subsequently be retrieved using keywords it contains, enabling fast document search. In addition, by correcting the curvature in the input image and by recognizing and adjusting the marked and referenced fonts in the document, errors in converting the document to be recognized in the input image into editable electronic text are reduced and the conversion accuracy is improved.

Description of drawings

In order to explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the accompanying drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.

FIG. 1 is a schematic flowchart of a text recognition method provided by an embodiment of the present invention;

FIG. 2 is a schematic flowchart of an image recognition and classification method provided by an embodiment of the present invention;

Fig. 3 is an example diagram of image recognition classification display;

FIG. 4 is a schematic flowchart of a document recognition processing method provided by an embodiment of the present invention;

Figure 5a is an example diagram of an input image containing an original document;

Figure 5b is an example diagram of a recognized document obtained after the method of the present invention is used to recognize the input image shown in Figure 5a;

Figure 6a is another example diagram of an input image containing an original document;

Figure 6b is an example diagram of a recognized document obtained after the input image shown in Figure 6a is recognized by an existing method;

Figure 6c is an example diagram of a recognized document obtained after the method of the present invention is used to recognize the input image shown in Figure 6a;

FIG. 7 is a schematic structural diagram of an electronic device provided by an embodiment of the present invention.

Detailed description

The text recognition method, image recognition classification method, document recognition processing method, electronic device, and computer-readable storage medium proposed by the present invention are described in further detail below with reference to the accompanying drawings and specific embodiments. The advantages and features of the present invention will become more apparent from the following description. It should be noted that the accompanying drawings are in a highly simplified form and are not drawn to scale; they serve only to explain the embodiments of the present invention conveniently and clearly. The structures, proportions, sizes, and the like shown in the drawings are intended only to accompany the content disclosed in the specification so that those familiar with the technology can read and understand it; they do not limit the conditions under which the present invention can be implemented and therefore have no substantive technical significance. Any structural modification, change in proportional relationship, or adjustment of size that does not affect the effects and purposes achievable by the present invention still falls within the scope covered by the technical content disclosed herein.

In order to solve the problems in the prior art, the present invention provides a text recognition method. FIG. 1 shows a flowchart of a text recognition method according to an exemplary embodiment of the present invention. The method can be implemented in an application program (app) installed on a smart terminal such as a mobile phone and a tablet computer. As shown in Figure 1, the method may include:

Step S101, recognizing text lines in the text to be recognized in the text image, and marking each of the text lines with a general text line box.

In the present invention, a text image refers to an image whose content is mainly text, such as a business card image, a document image, a certificate image, or a note image. It may be an image obtained by photographing the text or an image obtained by scanning the text. For example, a note image may be an image obtained by photographing handwritten text on paper.

Generally speaking, the text to be recognized in the text image includes one or more text lines. The present invention uses the text OCR recognition method for text recognition. During recognition, each text line is recognized separately. Finally, the recognition results of the entire text to be recognized are obtained by combining the recognition results of all text lines. Therefore, during recognition, each text line in the text to be recognized in the text image needs to be recognized, and at the same time, each text line is marked with a general text line box.

It should be noted that, when text lines are recognized, the language within a text line is not restricted; processing is purely line by line. That is, when the characters in a text line belong to several language types, they are all marked with the same general text line box as long as they are located in the same text line.

It should be noted that one picture may contain multiple documents; for example, a text image may show both the front and the back of an ID card, and these two documents need to be recognized separately. Therefore, before step S101 is performed, the document areas in the text image (that is, the areas where the text to be recognized is located) may also be identified and the image sliced by document area. For example, the image can be sliced along annotation boxes, or the edges of each document area can be identified by an edge recognition method and the image sliced along those edges.

Preferably, before step S101 is performed, the method further includes: recognizing the direction of the text to be recognized in the text image, and if the direction does not meet a preset condition, correcting the direction of the text to be recognized.

It is understandable that, before the text to be recognized in the text image is recognized, the direction of that text must satisfy a preset condition, for example that the characters in its text lines are arranged along a certain reference direction in the text image. Therefore, the direction of the text to be recognized in the text image must first be checked and, if necessary, corrected. Specifically, a direction recognition model may be used to recognize the direction of the text to be recognized in the text image, and the direction recognition model may be a CNN-based neural network model.

The reference direction may be set as the positive horizontal direction. The direction recognition model can identify the angle between the arrangement direction of the characters in a text line and the positive horizontal direction in the text image. If the angle is 0, no correction is required; if the angle is not 0, the text image is corrected. The correction rotates the text image so that the angle between the characters in the text lines of the text to be recognized and the positive horizontal direction in the text image becomes 0. In this embodiment, the rightward horizontal direction is taken as the positive horizontal direction. In other embodiments, other directions may be set as the positive direction, which is not limited in the present invention.

The correction may also use the average slope of a plurality of text lines as the correction reference, or use other correction methods, which are not limited in the present invention.
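As a rough illustration of using several text lines as the correction reference, the sketch below averages the inclination of the detected lines. It assumes each line is described by the two endpoints of its baseline, and it averages angles rather than raw slopes so that vertical lines do not cause a division by zero; the function names and the one-degree tolerance are hypothetical.

```python
import math


def average_skew_degrees(baselines):
    """Mean inclination of text lines, each given as ((x1, y1), (x2, y2))."""
    angles = [
        math.degrees(math.atan2(y2 - y1, x2 - x1))
        for (x1, y1), (x2, y2) in baselines
    ]
    return sum(angles) / len(angles)


def needs_correction(angle, tolerance=1.0):
    """Treat the text as misaligned when it deviates from horizontal by more than tolerance."""
    return abs(angle) > tolerance


baselines = [((0, 0), (100, 5)), ((0, 40), (100, 46))]
skew = average_skew_degrees(baselines)
if needs_correction(skew):
    # Rotating the image by the opposite angle brings the lines back to horizontal.
    print(f"rotate by {-skew:.2f} degrees")
```

Averaging angles breaks down when lines point in nearly opposite directions (for example 179 and -179 degrees), so a production implementation would need a circular mean or a prior coarse orientation step.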

In step S102, a character recognition model is used to recognize characters in each of the text lines, and a preliminary recognition result of the text to be recognized is obtained.

In this embodiment, the character recognition model is an all-in-one model trained on multiple character sets, such as the CJK character set and the ISO 8859-1 to ISO 8859-16 character sets, so it supports recognition of both CJK and Latin scripts. The character recognition model is a neural network model based on Connectionist Temporal Classification (CTC) and the attention mechanism. Each text line is input into the character recognition model, which outputs the character recognition result of that text line; the results of all text lines are then combined into the character recognition result of the text to be recognized, which serves as the preliminary recognition result.

Connectionist Temporal Classification (CTC) is a sequence classification algorithm that does not require strict alignment between data units and label units. It is widely used in optical character recognition (OCR) and speech recognition. The main function of CTC is to construct a loss function over sequences; during backpropagation, the gradient determined from the loss function is passed back to the preceding layers to complete the training of the model.

The attention mechanism greatly improves sequence learning tasks. In an encoder-decoder framework, an attention model (A model) can be added at the encoder side to apply a weighted transformation to the source data sequence, or introduced at the decoder side to apply a weighted transformation to the target data, which can effectively improve system performance in a natural sequence-to-sequence manner.

The present invention combines CTC and the attention mechanism to construct the character recognition model, which can improve the accuracy of character recognition.
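The reason CTC needs no strict alignment can be seen from its standard decoding rule: the network emits one label per time step, including a special blank, and the final string is obtained by merging repeated labels and dropping blanks. The sketch below shows only this greedy decoding rule, not the loss computation; the blank symbol `-` and the function name are choices made here for illustration.

```python
BLANK = "-"


def ctc_greedy_decode(frame_labels, blank=BLANK):
    """Collapse consecutive duplicates, then remove blanks."""
    decoded = []
    previous = None
    for label in frame_labels:
        # Emit a label only when it differs from the previous frame and is not blank.
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return "".join(decoded)


print(ctc_greedy_decode("hh-e-ll-lo-"))  # prints "hello"
```

Many different frame sequences collapse to the same string, which is why the training data only needs the target text and not a per-frame alignment. The blank between the two `l` runs is what allows a genuine double letter to survive the merge.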
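The weighted transformation described above can be illustrated with dot-product attention, one common formulation. The passage does not state which scoring function the model uses, so the dot product, the function names `softmax` and `attend`, and the toy vectors below are assumptions.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtracting the maximum keeps exp() numerically stable
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, encoder_states):
    """Weight each encoder state by its similarity to the query and sum them."""
    scores = [sum(q * k for q, k in zip(query, state)) for state in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [
        sum(w * state[i] for w, state in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context


weights, context = attend([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
print(weights, context)
```

The state most similar to the query receives the largest weight, so the decoder can focus on the part of the text line that matters for the character it is currently producing.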

Step S103, using a language classification model to perform language identification on the preliminary identification result, obtain the language types involved in the preliminary identification result, and divide the preliminary identification result into a plurality of different character parts according to the language type.

Since the character recognition model used in step S102 is trained on character sets of multiple different languages, its accuracy on the characters in a text line is limited. The preliminary recognition result therefore needs to be optimized by further recognizing the characters of each language separately, so as to improve the accuracy of character recognition.

First, a language classification model performs language recognition on the preliminary recognition result to obtain the language types involved. The langid technique can be used to identify language types, and the language classification model is a fasttext<N-Gram> language classification model based on the wiki data set.

fasttext is a word vector and text classification tool whose typical application scenario is supervised text classification. It provides a simple and efficient method for text classification and representation learning, and it is faster than deep learning models.

N-Gram is a language model commonly used in large-vocabulary continuous speech recognition. For Chinese, it can be called a Chinese Language Model (CLM). It uses the collocation information between adjacent words in the context to convert input into Chinese characters automatically. Specifically, when continuous, unspaced pinyin, strokes, or digits representing letters or strokes need to be converted into a Chinese character string (that is, a sentence), the collocation information between adjacent words is used to compute the sentence with the maximum probability. The conversion to Chinese characters is thus automatic, requires no manual selection by the user, and avoids the ambiguity that arises when many Chinese characters correspond to the same pinyin (or stroke string, or digit string).
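The idea of choosing the maximum-probability sentence from adjacent-word collocations can be sketched with a bigram model (N = 2). The tiny English corpus, the add-one smoothing, and the function names below are assumptions made for illustration; they are not taken from the source.

```python
import math
from collections import Counter


def train_bigrams(corpus):
    """Count how often each word follows each context word."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split()  # <s> marks the sentence start
        vocab.update(tokens)
        contexts.update(tokens[:-1])
        bigrams.update(zip(tokens, tokens[1:]))
    return contexts, bigrams, len(vocab)


def log_prob(sentence, contexts, bigrams, vocab_size):
    """Log-probability of a sentence under the bigram model with add-one smoothing."""
    tokens = ["<s>"] + sentence.split()
    return sum(
        math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size))
        for prev, cur in zip(tokens, tokens[1:])
    )


corpus = ["the cat sat", "the cat ran", "a dog sat"]
contexts, bigrams, vocab_size = train_bigrams(corpus)
candidates = ["the cat sat", "cat the sat"]
best = max(candidates, key=lambda s: log_prob(s, contexts, bigrams, vocab_size))
print(best)
```

For pinyin input, the candidates would be the different Chinese character strings that share the same pinyin, and the same scoring would pick the most plausible one.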

Using the above-mentioned language classification model to perform language recognition on the preliminary recognition results can more accurately obtain the language types involved in the preliminary recognition results. After the language type is recognized, the preliminary recognition result can be divided into a plurality of different character parts, that is, the characters of each language type are divided into the same character part.
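A minimal sketch of dividing a preliminary recognition result into character parts follows. It does not use the fasttext model described above; as a stand-in, it labels each character by its Unicode script name, which only separates scripts (for example CJK from Latin) and cannot tell apart languages that share a script. The labels `cjk`, `latin`, and `other` and both function names are hypothetical.

```python
import unicodedata
from itertools import groupby


def script_of(ch):
    """Return a coarse script label, or None for digits, spaces, and punctuation."""
    if not ch.isalpha():
        return None
    name = unicodedata.name(ch, "")
    if name.startswith("CJK"):
        return "cjk"
    if name.startswith("LATIN"):
        return "latin"
    return "other"


def split_by_script(text):
    """Split text into consecutive runs that share the same script label."""
    labels, current = [], "other"
    for ch in text:
        # Characters without a script of their own stay with the preceding run.
        current = script_of(ch) or current
        labels.append(current)
    return [
        (label, "".join(ch for _, ch in group))
        for label, group in groupby(zip(labels, text), key=lambda pair: pair[0])
    ]


print(split_by_script("Hello 世界 OCR"))
```

Each resulting part can then be handed to the recognition model for its language type.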

Step S104, calling a corresponding language recognition model according to the language type to recognize the corresponding character part, and obtaining the target recognition result of the text to be recognized.

In this embodiment, each language type has a corresponding language recognition model. After the language types involved in the text to be recognized and the character part corresponding to each language type are obtained in step S103, the corresponding language recognition model is called to recognize each character part. This yields a more accurate character recognition result for each character part, from which the target recognition result of the text to be recognized is obtained.

To sum up, in the text recognition method provided by the present invention, the text lines in the text to be recognized are first marked with general text line boxes, and a character recognition model then recognizes each text line to obtain a preliminary recognition result of the text to be recognized. The language types of the preliminary recognition result are then recognized, and the corresponding language recognition model is called according to each recognized language type to further recognize the character part corresponding to that language type, yielding an optimized character recognition result. Because a separate language recognition model is used for precise recognition according to the language types involved after the preliminary recognition result is obtained, the accuracy of text recognition is improved.

On the basis of the above text recognition method, the present invention also proposes an image recognition and classification method, which classifies and organizes a large number of images by placing images with similar content in the same folder, making them easier for users to browse and search.
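The flow of steps S102 to S104 can be sketched as a dispatch over per-language models. All models here are stand-in functions, and the passage does not specify whether a language recognition model consumes the character part as text or as the corresponding image region; this sketch assumes text for simplicity. Every name below is hypothetical.

```python
def recognize_text(line_images, char_model, classify_language, language_models):
    """Run general recognition, split by language, then refine each part."""
    # S102: the general character recognition model handles every text line.
    preliminary = [char_model(image) for image in line_images]
    refined_lines = []
    for line in preliminary:
        refined = []
        # S103: the classifier yields (language type, character part) pairs.
        for language, part in classify_language(line):
            # S104: call the model registered for that language type, if any.
            model = language_models.get(language)
            refined.append(model(part) if model else part)
        refined_lines.append("".join(refined))
    return "\n".join(refined_lines)


def fake_char_model(image):
    return image  # stand-in: pretend the "image" already is its text


def fake_classifier(line):
    return [("en", line)]  # stand-in: treat every line as a single English part


fake_models = {"en": str.upper}  # stand-in refinement that is easy to observe
print(recognize_text(["abc", "de"], fake_char_model, fake_classifier, fake_models))
```

Keeping the models in a dictionary keyed by language type means that support for a new language only requires registering one more entry.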

As shown in Figure 2, the image recognition and classification method includes the following steps:

In step S201, an image recognition model is used to recognize the image to be classified, and a textual image or a non-textual image is recognized.

In this embodiment, the image to be classified may be a newly captured image, or an image that has already been captured and saved in a folder, such as an image saved in a mobile phone album. Textual images are images whose content is mainly text, such as business card images, document images, certificate images, and note images; they may be obtained by photographing or by scanning the text. For example, a note image may be an image obtained by photographing handwritten text on paper. Non-textual images are images whose content is mainly not text, such as photos of daily life, landscapes, animals, and plants.

By recognizing the image to be classified by the image recognition model, it can be identified whether the image to be classified belongs to a textual image or a non-textual image, so that a textual image and a non-textual image can be classified.

After identifying and classifying textual images and non-textual images, the images are automatically classified and stored in different preset folders. That is, after recognizing the textual image, the textual image is classified into the textual image folder, and after recognizing the non-textual image, the non-textual image is classified into the non-textual image folder.

Step S202: Recognize the text in the textual image or the non-textual image to obtain a text recognition result of the textual image or the non-textual image.

Specifically, the text recognition method shown in FIG. 1 may be used to recognize the text in the textual image or the non-textual image. The specific identification process is not repeated here. Different pictures can also be classified according to the language type of the text recognition result.

Step S203: Determine a keyword according to the text recognition result, determine a first subdivision type of the content of the textual image or a second subdivision type of the content of the non-textual image according to the keyword, and classify the textual image into a folder corresponding to the first subdivision type and the non-textual image into a folder corresponding to the second subdivision type.

Specifically, a keyword classification model may be used to obtain keywords from the text recognition result. The first subdivision type of the content of the textual image or the second subdivision type of the content of the non-textual image is then determined according to the keywords, the textual images are classified into the folder corresponding to the first subdivision type, and the non-textual images are classified into the folder corresponding to the second subdivision type.

The first subdivision type includes, but is not limited to, one or more of: notes, certificates, receipts, screenshots, documents, and certificates.

For example, if the textual image is an ID card image and the text recognition result contains the characters "People's Republic of China Resident Identity Card", the keyword classification model can obtain the keyword "ID card" from the text recognition result. It can thus be determined according to the keyword that the first subdivision type of the content of the textual image is a document image, and the textual image can then be classified into the folder of the subdivision type "document image".

In addition, the subdivision type "document image" can be further divided, for example into specific types including ID card, driver's license, passport, military officer's photo, work permit, birth certificate, household registration book, and so on. Therefore, the specific type of the textual image can also be determined according to the keyword, and the textual image is further classified into a subfolder of the specific type under the folder corresponding to the first subdivision type. For example, for the textual image in the foregoing example, the keyword classification model can obtain the keyword "ID card" from the text recognition result, and the textual image can be further classified into the subfolder of the specific type "ID card" under the folder of the subdivision type "document image". It can be understood that several subfolders of specific types, such as ID card, driver's license, passport, military officer's photo, work permit, birth certificate, and household registration book, can be set under the "document image" folder.
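The mapping from recognized text to a keyword, a subdivision type, and a specific subfolder described above can be illustrated with the following minimal sketch. The trigger table, the folder names, and the function names are all hypothetical; a plain substring lookup stands in here for the keyword classification model, which the source does not specify in detail.

```python
from pathlib import PurePosixPath
from typing import List, Optional, Tuple

# Hypothetical table: (trigger phrase, keyword, first subdivision type, specific type or None).
KEYWORD_RULES: List[Tuple[str, str, str, Optional[str]]] = [
    ("identity card", "ID card", "Document images", "ID card"),
    ("passport", "passport", "Document images", "Passport"),
    ("driver's license", "driver's license", "Document images", "Driver's license"),
    ("receipt", "receipt", "Receipts", None),
]


def find_rule(text: str) -> Optional[Tuple[str, str, str, Optional[str]]]:
    """Return the first rule whose trigger phrase occurs in the recognized text."""
    lowered = text.lower()
    for rule in KEYWORD_RULES:
        if rule[0] in lowered:
            return rule
    return None


def target_folder(text: str, root: str = "Text images") -> PurePosixPath:
    """Build the folder path: root / subdivision type / optional specific type."""
    rule = find_rule(text)
    if rule is None:
        return PurePosixPath(root) / "Other"
    _, _, subdivision, specific = rule
    path = PurePosixPath(root) / subdivision
    return path / specific if specific else path
```

With this table, text containing "Resident Identity Card" resolves to the path `Text images/Document images/ID card`, mirroring the ID card example in the preceding paragraphs.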

Through the above method, the classified text images can be set into a file tree, in which each folder is named progressively layer by layer, so that each text image to be classified can be automatically classified into the corresponding folder. In addition, in order to facilitate the search of the textual image, the keyword may also be used to automatically name the textual image.

For example, all the images in the album can be classified as shown in Figure 3: All Documents (all files) is displayed first, followed by Handwritten notes (note images), ID Card&Passport (document images), Receipt (receipt images), Screens (screenshot images), Certificate (certificate images), Other Card (other images), and so on. Of course, this is just an example, and other classification schemes can be used in practical applications. The classified images can be sorted by modification time or by shooting time, or the sorting method can be set as required.

In practical applications, the user can search the classified textual images according to keywords, so as to quickly find the target file. Specifically, in response to an operation of the user inputting a search term, it is searched whether there is a keyword matching the search term, and if so, the textual image corresponding to the keyword is output. For example, when the search term entered by the user is "identity", it is searched whether there is a keyword matching the search term; if there is a matching keyword "identity card", the textual image corresponding to the keyword "identity card" is output and displayed to the user.
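A minimal sketch of the keyword search just described is given below. It assumes a hypothetical in-memory index from keyword to image file names and treats "matching" as case-insensitive substring matching, which is one possible reading of the "identity" / "identity card" example rather than a rule stated in the source.

```python
from typing import Dict, List


def search_images(term: str, keyword_index: Dict[str, List[str]]) -> List[str]:
    """Return images whose keyword contains the search term (case-insensitive)."""
    needle = term.strip().lower()
    if not needle:
        return []
    results: List[str] = []
    for keyword, images in keyword_index.items():
        if needle in keyword.lower():
            for image in images:
                if image not in results:  # avoid listing the same image twice
                    results.append(image)
    return results
```

Here `keyword_index` would be filled when the keywords are determined in step S203, so that a later search does not need to run text recognition again.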

The second subdivision type may include: life photos of people, landscape photos, animal photos, plant photos, and the like.

For example, if the non-text image is a photo of Leifeng Pagoda and the image contains the three characters "Leifeng Pagoda", the text recognition result of the non-text image is "Leifeng Pagoda", and the keyword can be determined to be Leifeng Pagoda. It can thus be determined according to the keyword that the second subdivision type of the content of the non-text image is landscape photos, and the non-text image can then be classified into the folder of the subdivision type "landscape photos".

In addition, the subdivision type "landscape photos" can be further divided, for example according to the name of the scenic spot. Therefore, the specific type of the non-text image can also be determined according to the identified name of the scenic spot, and the non-text image can be further classified into a subfolder of the specific type under the folder corresponding to the second subdivision type. For example, since the non-text image in the preceding example is identified as a photo of Leifeng Pagoda, it can be further classified into the specific subfolder "Photos of Leifeng Pagoda" under the folder of the subdivision type "landscape photos". It can be understood that subfolders of specific types corresponding to different scenic spots can be set under the "landscape photos" folder.

In other embodiments, when the image recognition model is used for recognition in step S201, the image recognition model can also recognize the content of a recognized non-text image. The second subdivision type can therefore also be determined according to the content of the non-text image, and the non-text image is classified into a folder corresponding to the second subdivision type. For example, if the image recognition model recognizes that the content displayed by the non-text image is Leifeng Pagoda, it can be determined that the second subdivision type of the content of the non-text image is landscape photos, and the non-text image can then be classified into the folder of the subdivision type "landscape photos".

Through the above method, the classified non-text images can be set into a file tree, in which each folder is named progressively, so that each to-be-classified non-text image can be automatically classified into a corresponding folder.

The non-text images may be automatically named according to the content of the non-text images. For example, the content of the non-text image may include the recognized names of animals and plants, the names of scenic spots, etc., so the non-text images can be automatically named according to the recognized names of animals and plants, the names of scenic spots, and the like. Or the non-text images are automatically named according to the keywords obtained by the keyword classification model. Through the automatic naming, the finding of the non-text images can be facilitated.
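Automatic naming from a recognized keyword or content name might look like the following sketch. The sanitizing rules and the numeric suffix used for duplicates are assumptions added for illustration; the source only states that images are named automatically.

```python
import re
from typing import Set


def auto_name(keyword: str, extension: str, taken: Set[str]) -> str:
    """Derive a file name from a keyword, avoiding names already in use."""
    # Replace characters that are commonly invalid in file names with spaces.
    base = re.sub(r'[\\/:*?"<>|]+', " ", keyword)
    base = re.sub(r"\s+", "_", base.strip()) or "image"
    candidate = f"{base}{extension}"
    counter = 2
    while candidate in taken:  # append a counter when the name already exists
        candidate = f"{base}_{counter}{extension}"
        counter += 1
    taken.add(candidate)
    return candidate
```

Calling the function twice with the keyword "Leifeng Pagoda" and the same `taken` set yields `Leifeng_Pagoda.jpg` and then `Leifeng_Pagoda_2.jpg`.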

In addition, non-text images can also be classified according to shooting time, shooting location, the people involved, and names.

Preferably, for the classified text images and non-text images, encryption processing can be performed to ensure the security of the files, for example, encryption processing is performed for important documents such as certificates, or encryption processing is performed for private life photos of people. When encrypting, you can encrypt a single file, or you can encrypt the corresponding folder.

Preferably, in order to facilitate the user's operation, when the user needs to print, the documents to be printed can be imported with one key and related processing of the documents can be performed according to the classification result. For example, pictures can be searched by keyword to import the pictures to be printed and realize the printing function.

In addition, before printing is performed, the method further includes: if any of the imported text images needs to be signed, signing in a preset signature area of that text image; and/or, if any of the imported text images is defective, filtering the defective text image.

Specifically, a signature area is set for some documents to be signed, and the signature can be directly performed on the image, and the signed document is then printed.

Filtering of defective images includes, for example, the following:

a) Some text images have shadows due to light and other problems when they are shot. To ensure the effect, the shadows can be removed when printing;

b) Completion can be done for old and distorted photos;

c) For the handwritten characters in the text, smears, oil stains, etc. can be automatically removed during printing;

d) In order to save the ink consumption during printing, the image can also be binarized.
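Item d) above can be illustrated with a simple global-threshold binarization. This is a minimal sketch that assumes the image is already available as rows of 8-bit grayscale values and uses the mean intensity as the default threshold; the source does not say which binarization technique is used, and a production system would more likely rely on an image library and an adaptive threshold.

```python
from typing import List, Optional


def binarize(gray: List[List[int]], threshold: Optional[float] = None) -> List[List[int]]:
    """Map each grayscale pixel (0-255) to pure black (0) or pure white (255)."""
    if threshold is None:
        pixels = [px for row in gray for px in row]
        # Assumed default: the mean intensity of the whole image.
        threshold = sum(pixels) / len(pixels) if pixels else 128
    return [[255 if px >= threshold else 0 for px in row] for row in gray]
```

After binarization, light gray background areas become pure white and are therefore not printed, which is how ink consumption is reduced.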

To sum up, in the image recognition and classification method provided by the present invention, for both textual images and non-textual images, the OCR text recognition method described above can be used to obtain the text recognition results of the textual images and the non-textual images, and keywords are determined according to the text recognition results to classify the textual images and the non-textual images. Since the images are classified according to the text content in the images, the classification result is more accurate; at the same time, the determined keywords make it convenient to subsequently search for images by keyword, thereby realizing fast image search. In addition, non-text images can also be classified by image content, which also improves the accuracy of the classification results.

On the basis of the above text recognition method, the present invention also proposes a document recognition processing method for converting different types of files, such as scanned files, PDF files, or pictures, into text that can be searched or edited at any time. When a user wants to find a file or picture but does not remember its title, the user may only be able to recall a few words from the document. However, because the document is in an uneditable format, it cannot be searched for based on the words it contains. With the document recognition processing method provided by the present invention, an uneditable document is converted into an editable document, so a search can be carried out according to the document content or the text on a picture. That is, the user only needs to enter keywords in the search box, and the title, content, remarks, and text on pictures can all be searched intelligently.

As shown in Figure 4, the document recognition processing method includes the following steps:

Step S301: Acquire an input image, where the input image contains the original document to be recognized.

The original document may be a paper document, in which case the input image may be formed by taking a photo or scanning. The original document may also be an electronic document, such as a PDF document or a picture document with uneditable text, in which case the input image can be obtained directly.

Step S302: Recognize the original document in the input image to obtain a character recognition result of the original document.

Specifically, the text recognition method shown in FIG. 1 may be used to recognize the original document in the input image. The specific identification process is not repeated here.

Step S303: Arrange the character recognition results of the original document according to the position information of each character of the original document in the input image to obtain a recognized document.

Specifically, arranging the character recognition results of the original document according to the position information of each character of the original document in the input image to obtain a recognized document includes the following.

Figure 5a shows the input image containing the original document, and Figure 5b shows the finally obtained recognized document. As can be seen from Figures 5a and 5b, the coordinate information of each character of the original document in the input image can be obtained during processing, so that after the character recognition result of the original document is obtained, each character is placed at the corresponding position in the input image according to its coordinate information to replace the characters in the original document, thereby obtaining the recognized document.
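The arrangement of recognized characters by their coordinates can be sketched in simplified form as below. The sketch only restores reading order (lines from top to bottom, characters from left to right) rather than drawing each character back at its pixel position, and the `(character, x, y)` tuple layout and the `line_tolerance` parameter are assumptions made for illustration.

```python
from typing import List, Tuple

Char = Tuple[str, float, float]  # (character, x, y) of its position in the input image


def arrange(chars: List[Char], line_tolerance: float = 10.0) -> str:
    """Group characters into lines by y coordinate, then order each line by x."""
    lines: List[List[Char]] = []
    for ch in sorted(chars, key=lambda c: c[2]):
        # A character joins the current line if its y is close to the line's first character.
        if lines and abs(ch[2] - lines[-1][0][2]) <= line_tolerance:
            lines[-1].append(ch)
        else:
            lines.append([ch])
    return "\n".join(
        "".join(c[0] for c in sorted(line, key=lambda c: c[1])) for line in lines
    )
```

A fuller implementation would keep the coordinates and render each character at its original position, as Figures 5a and 5b suggest.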

As can be seen from the above, OCR can convert the characters in the input image into editable characters through intelligent recognition, without manual typing, and can instantly convert PPT files, PDF files, pictures, business cards, test papers, and the like into recognized documents, that is, electronic manuscripts that can be edited and modified. In order to ensure the accuracy of the converted characters, the original document may also be compared with the recognized document to determine whether there is a difference between them, and if so, the difference is corrected in the recognized document. For example, manual verification can be used to compare the original document with the editable electronic text of the recognized document and to find the differences introduced between the editable electronic text and the original document during the conversion.
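Where a reference transcription of the original document happens to be available (an assumption; the source describes manual verification), locating the differences can be supported by a standard sequence comparison. The sketch below uses `difflib.SequenceMatcher` from the Python standard library; the function name is illustrative.

```python
import difflib
from typing import List, Tuple


def find_differences(reference: str, recognized: str) -> List[Tuple[str, str, str]]:
    """List (operation, recognized fragment, reference fragment) for each mismatch."""
    matcher = difflib.SequenceMatcher(None, recognized, reference)
    return [
        (tag, recognized[i1:i2], reference[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
```

Each returned entry tells the reviewer which fragment of the recognized document should be replaced, deleted, or inserted to match the reference.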

Preferably, when a thicker book is scanned, the pages are curved when shot, so the original document in the acquired input image may fail to be recognized because of the curvature, or the recognized document that is output may be garbled. In this case, the curvature of the original document in the input image needs to be corrected, and text recognition and output are then performed on the corrected input image from which the curvature has been removed, so as to avoid garbled characters. Specifically, a correction model may be used to identify the curve radian of the original document in the input image, and if the curve radian satisfies a preset correction condition, the original document in the input image is corrected to remove the curve radian of the original document. In practical applications, the correction can also be performed manually, or other correction methods can be used.
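One way to picture the "preset correction condition" is as a threshold on how far a text baseline bows away from a straight line. The sketch below is a geometric stand-in, not the correction model of the method: it assumes baseline points have already been detected, and the `max_ratio` threshold is a hypothetical parameter.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def max_baseline_deviation(baseline: List[Point]) -> float:
    """Largest distance from any baseline point to the chord joining its end points."""
    if len(baseline) < 2:
        return 0.0
    (x0, y0), (x1, y1) = baseline[0], baseline[-1]
    length = math.hypot(x1 - x0, y1 - y0)
    if length == 0:
        return 0.0
    return max(
        abs((x1 - x0) * (y0 - y) - (x0 - x) * (y1 - y0)) / length for x, y in baseline
    )


def needs_correction(baseline: List[Point], max_ratio: float = 0.02) -> bool:
    """Assumed condition: deviation relative to the chord length exceeds max_ratio."""
    if len(baseline) < 2:
        return False
    (x0, y0), (x1, y1) = baseline[0], baseline[-1]
    length = math.hypot(x1 - x0, y1 - y0)
    return length > 0 and max_baseline_deviation(baseline) / length > max_ratio
```

A flat baseline gives a deviation of zero and is left alone, while a baseline that bows noticeably near the book spine exceeds the ratio and would be passed on for correction.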

As shown in Figures 6a and 6b, for text with annotation references in the original document (whose font size is generally smaller than that of the body text), after recognition by existing document recognition processing methods the annotation content becomes inconsistent with the original document, as in the several boxed places shown in Figures 6a and 6b. This requires the user to check them one by one and modify them manually, which greatly reduces efficiency. In this case, the present invention uses an annotation recognition model to recognize the input image so as to identify the annotation content in the original document, and in the recognized document, the character recognition result corresponding to the annotation content is typeset into a format consistent with the original document. That is, the invention recognizes the input image through the annotation recognition model, distinguishes the annotation content from the other characters of the original document, and outputs the annotation content not in the same text form as the other character content, but in a form consistent with the original document. Fig. 6c shows the recognized document processed by the method of the present invention. It can be seen from Figs. 6a and 6c that during OCR recognition, the annotations are automatically identified by the annotation recognition model, the recognized document is then checked according to the identification result, and the annotations are automatically typeset into a format consistent with the original text, so that the OCR-recognized text is consistent with the original image and no manual proofreading is required.

To sum up, the document recognition processing method provided by the present invention adopts the OCR text recognition method to recognize the document to be recognized in the input image, thereby obtaining the recognized document, which makes it convenient to retrieve the document by searching for keywords in it and realizes fast document search. In addition, by correcting the curvature of the input image and by recognizing and adjusting the annotation reference text in the document, errors in the process of converting the document to be recognized in the input image into editable electronic text are reduced, and the conversion accuracy is improved.

Based on the same inventive concept, the present invention also provides an electronic device. As shown in FIG. 7, the electronic device includes a processor 301, a communication interface 302, a memory 303, and a communication bus 304, wherein the processor 301, the communication interface 302, and the memory 303 communicate with each other through the communication bus 304;

The memory 303 is used to store computer programs;

The processor 301 is configured to implement, when executing the program stored in the memory 303, the steps in the text recognition method described above, the steps in the image recognition classification method described above, or the steps in the document recognition processing method described above.

The communication bus 304 mentioned in the above electronic device may be a peripheral component interconnect standard (Peripheral Component Interconnect, PCI) bus or an Extended Industry Standard Architecture (Extended Industry Standard Architecture, EISA) bus or the like. The communication bus 304 can be divided into an address bus, a data bus, a control bus, and the like. For ease of presentation, only one thick line is used in the figure, but it does not mean that there is only one bus or one type of bus.

The communication interface 302 is used for communication between the above-mentioned electronic device and other devices.

The processor 301 may be a central processing unit (Central Processing Unit, CPU), or another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. The general-purpose processor may be a microprocessor or any conventional processor. The processor 301 is the control center of the electronic device, and uses various interfaces and lines to connect the various parts of the entire electronic device.

The memory 303 can be used to store the computer program, and the processor 301 implements various functions of the electronic device by running or executing the computer program stored in the memory 303 and calling the data stored in the memory 303.

The memory 303 may include non-volatile and/or volatile memory. Non-volatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in various forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

Based on the same inventive concept, the present invention also provides a computer-readable storage medium on which instructions are stored. When the instructions are executed, the steps in the text recognition method described above, the steps in the image recognition classification method described above, or the steps in the document recognition processing method described above can be implemented.

Similarly, computer-readable storage media in embodiments of the present invention may be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory. It should be noted that computer-readable storage media described herein are intended to include, but not be limited to, these and any other suitable types of memory.

It should be noted that the flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functions, and operations of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in the flowcharts or block diagrams may represent a module, segment, or portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It is also noted that each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented in dedicated hardware-based systems that perform the specified functions or operations, or can be implemented in a combination of dedicated hardware and computer instructions.

In general, the various example embodiments of the invention may be implemented in hardware or special purpose circuits, software, firmware, logic, or any combination thereof. Certain aspects may be implemented in hardware, while other aspects may be implemented in firmware or software that may be executed by a controller, microprocessor, or other computing device. While aspects of the embodiments of the invention are illustrated or described as block diagrams, flowcharts, or using some other graphical representation, it is to be understood that the blocks, apparatus, systems, techniques, or methods described herein may be implemented, as non-limiting examples, in hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controllers or other computing devices, or some combination thereof.

It should be noted that the embodiments in this specification are described in a related manner, the same and similar parts of the various embodiments can be referred to each other, and each embodiment focuses on its differences from the other embodiments. In particular, since the electronic device and the computer-readable storage medium are basically similar to the method embodiments, their description is relatively simple, and reference may be made to the description of the method embodiments for the related parts.

In this document, relational terms such as first and second are used only to distinguish one entity or operation from another entity or operation, and do not necessarily require or imply any actual relationship or sequence between these entities or operations. Moreover, the terms "comprising", "including", or any other variation thereof are intended to encompass non-exclusive inclusion, such that a process, method, article, or device comprising a list of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element qualified by the phrase "comprising a..." does not preclude the presence of additional identical elements in the process, method, article, or device that includes the element.

The above description is only a description of the preferred embodiments of the present invention, and is not intended to limit the scope of the present invention. Any changes and modifications made by those of ordinary skill in the field of the present invention based on the above disclosure all belong to the protection scope of the claims.

Claims

A text recognition method, comprising:

Recognizing text lines in the text to be recognized in the text image, and marking each of the text lines with a general text line frame;

A character recognition model is used to recognize characters in each of the text lines, and a preliminary recognition result of the text to be recognized is obtained;

Use a language classification model to perform language recognition on the preliminary recognition results, obtain the language types involved in the preliminary recognition results, and divide the preliminary recognition results into a plurality of different character parts according to the language types;

The corresponding language recognition model is called according to the language type, and the corresponding character part is recognized to obtain the target recognition result of the text to be recognized.
The text recognition method according to claim 1, further comprising: recognizing the direction of the text to be recognized in the text image, and if the direction does not meet a preset condition, correcting the direction of the text to be recognized;

Wherein, the identifying the direction of the text to be identified in the text image includes:

A direction recognition model is used to recognize the direction of the text to be recognized in the text image, and the direction recognition model is a CNN-based neural network model.
The text recognition method according to claim 1, wherein the character recognition model is a neural network model based on connectionist temporal classification (CTC) and the Attention mechanism.
The text recognition method of claim 1, wherein the character recognition model is obtained by training on a training sample set comprising a CJK character set and the ISO 8859-1 to ISO 8859-16 character sets.
The text recognition method according to claim 1, wherein the language classification model is a fasttext<N-Gram> language classification model based on the wiki data set.
A method for image recognition and classification, comprising:

Use the image recognition model to recognize the images to be classified, and identify text images or non-text images;

Using the text recognition method according to any one of claims 1 to 5 to recognize the text in the textual image or the non-textual image, to obtain a text recognition result of the textual image or the non-textual image;

A keyword is determined according to the text recognition result, a first subdivision type of the content of the textual image or a second subdivision type of the content of the non-textual image is determined according to the keyword, the textual images are classified into folders corresponding to the first subdivision type, and the non-textual images are classified into folders corresponding to the second subdivision type.
The image recognition and classification method according to claim 6, wherein after determining the keyword, it further comprises:

The textual image or the non-textual image is automatically named by using the keyword.
The image recognition and classification method according to claim 6, wherein after recognizing the textual image or the non-textual image, the method further comprises:

classifying the textual images into a textual image folder, and classifying the non-textual images into a non-textual image folder;

Wherein, classifying the textual images into a folder corresponding to the first subdivision type, and classifying the non-textual images into a folder corresponding to the second subdivision type includes:

Classifying the textual images in the textual image folder into a folder corresponding to the first subdivision type, and classifying the non-textual images in the non-textual image folder into the folder corresponding to the second subdivision type.
The image recognition classification method according to claim 6, wherein the first subdivision type comprises: one or more of notes, certificates, receipts, screenshots, documents, and certificates.
The image recognition and classification method according to claim 6, wherein, for the recognized non-text image, the image recognition model recognizes the content in the non-text image;

The image recognition and classification method further includes:

The second subdivision type is determined according to the content of the non-text image, and the non-text image is classified into a folder corresponding to the second subdivision type.
The image recognition and classification method according to claim 10, wherein after recognizing the content in the non-text image, the method further comprises:

The non-text images are automatically named according to the content in the non-text images.
The image recognition and classification method according to claim 8, wherein after classifying the textual images in the textual image folder into the folder corresponding to the first subdivision type, the method further comprises:

In response to an operation of the user inputting a search term, it is searched whether there is a keyword matching the search term, and if so, a text image corresponding to the matched keyword is output.
The image recognition and classification method according to claim 8, wherein after classifying the textual images in the textual image folder into the folder corresponding to the first subdivision type, the method further comprises:

In response to a user's printing operation, according to a pre-configured one-key import function, import all text-based images in the folder corresponding to the first subdivision type for printing.
The image recognition and classification method according to claim 13, characterized in that, before performing printing, further comprising:

If there is a text image that needs to be signed in all the imported text images, the signature is performed in the preset signature area in the text image that needs to be signed;

And/or, if there is a defective text-based image in all the imported text-based images, filter processing is performed on the defective text-based image.
A document recognition processing method, comprising:

obtaining an input image, which contains the original document to be recognized;

recognizing the original document in the input image by using the text recognition method according to any one of claims 1-5 to obtain a character recognition result of the original document;

arranging the character recognition result of the original document according to the position information of each character of the original document in the input image, to obtain a recognized document.
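The arrangement step can be sketched as follows. This is a minimal illustration, assuming each recognized character carries the top-left coordinates and height of its box in the input image; the `RecognizedChar` type and the half-height rule for grouping characters into lines are assumptions, not details stated in the claims.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RecognizedChar:
    text: str      # recognized character
    x: float       # left edge of the character box in the input image
    y: float       # top edge of the character box in the input image
    height: float  # height of the character box


def arrange_characters(chars: List[RecognizedChar]) -> str:
    """Group characters into lines by vertical position, then order each line left to right."""
    lines: List[List[RecognizedChar]] = []
    for ch in sorted(chars, key=lambda c: (c.y, c.x)):
        # Assumed heuristic: a character belongs to the current line if its top edge
        # lies within half a character height of the first character of that line.
        if lines and abs(ch.y - lines[-1][0].y) <= 0.5 * lines[-1][0].height:
            lines[-1].append(ch)
        else:
            lines.append([ch])
    return "\n".join(
        "".join(c.text for c in sorted(line, key=lambda c: c.x)) for line in lines
    )


sample = [
    RecognizedChar("b", 12, 1, 10),
    RecognizedChar("a", 0, 0, 10),
    RecognizedChar("c", 0, 20, 10),
]
print(arrange_characters(sample))
```

The sketch only restores reading order; reproducing spacing, fonts, or page layout would need the full box geometry.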
The document recognition processing method according to claim 15, wherein arranging the character recognition result of the original document according to the position information of each character of the original document in the input image to obtain the recognized document comprises:

According to the position information of each character of the original document in the input image, the original text in the original document is replaced by the character recognition result of the original document to obtain a recognized document.
The document recognition processing method according to claim 15, wherein after obtaining the recognized document, the method further comprises:

Comparing the original document with the recognized document to determine whether the recognized document differs from the original document, and if so, correcting the difference in the recognized document.
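One way to locate such differences is sketched below. The claims do not say how the comparison is performed; this sketch assumes a trusted reference transcription of the original document is available as text (for example from manual proofreading) and uses Python's standard `difflib` module to list the spans where the recognized document departs from it. The function name `find_differences` is hypothetical.

```python
import difflib
from typing import List, Tuple


def find_differences(reference: str, recognized: str) -> List[Tuple[str, str, str]]:
    """List (operation, recognized_span, reference_span) for every mismatch.

    operation is one of 'replace', 'delete', or 'insert', describing how the
    recognized text must change to agree with the reference text.
    """
    matcher = difflib.SequenceMatcher(a=recognized, b=reference, autojunk=False)
    return [
        (tag, recognized[i1:i2], reference[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# A typical OCR confusion: lowercase "l" recognized in place of the digit "1".
print(find_differences("total 100", "total l00"))
```

Each reported span marks a place where the recognized document would be corrected; an empty list means no difference was found.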
The document recognition processing method according to claim 15, characterized in that before recognizing the original document in the input image, the method further comprises:

A correction model is used to identify the curvature of the original document in the input image, and if the curvature satisfies a preset correction condition, correction processing is performed on the original document in the input image to remove the curvature of the original document.
The document recognition processing method according to claim 15, wherein after obtaining the recognized document, the method further comprises:

Recognizing the input image by using an annotation recognition model to identify annotation content in the original document;

In the recognized document, typesetting the character recognition result corresponding to the annotation content into a format consistent with the original document.
An electronic device, characterized in that it comprises a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface, and the memory communicate with each other through the communication bus;

the memory is configured to store a computer program;

the processor, when executing the program stored in the memory, implements the steps of the text recognition method according to any one of claims 1 to 5, or the steps of the image recognition and classification method according to any one of claims 6 to 14, or the steps of the document recognition processing method according to any one of claims 15 to 19.