WO2021184692A1

WO2021184692A1 - Document classification collaborative robot and image character recognition method based thereon

Info

Publication number: WO2021184692A1
Application number: PCT/CN2020/112598
Authority: WO
Inventors: 邓辅秦; 李伟科; 林淮荣; 黄永深; 冯华; 岳洪伟; 丁毅; 龙佳乐; 张建民; 王栋; 钟东洲; 李澄非; 习江涛
Original assignee: 五邑大学
Priority date: 2020-03-16
Filing date: 2020-08-31
Publication date: 2021-09-23
Also published as: CN111428710A

Abstract

A document classification collaborative robot and an image character recognition method based thereon, wherein the document classification collaborative robot comprises: a camera device for capturing an image of a document; a robot main body for placing the document at a location specified by a user; and an upper computer for recognizing and outputting characters in a text picture and correspondingly performing overall control on the robot main body to make same move, the upper computer being connected to the camera device and the robot main body, respectively. The image character recognition method for the robot comprises: converting an acquired text line picture into a standard text line picture (S100); constructing a picture character recognition model according to the standard text line picture (S200); on the basis of the picture character recognition model, outputting characters in a text picture to be subjected to recognition (S300). The method is cleverly designed, realizes intelligent and efficient classification and recognition, and can classify and sort documents in a non-manual manner, thereby facilitating the reduction in the workload of office workers.

Description

Document classification collaborative robot and image character recognition method based on it

Technical field

The invention relates to the field of file management, in particular to a file classification collaborative robot and an image character recognition method based thereon.

Background art

Nowadays, urban workers are under great pressure and must process large numbers of paper documents. When efficiency is paramount, the time spent sorting paper documents is costly. Most existing document classification methods rely on manual sorting, which is inefficient and prone to classification errors. Equipment has been launched on the market to perform machine classification and reduce the labor burden, but it cannot eliminate manual work completely. For example, Chinese patent CN110369296A proposes an office file sorting system and a prompting method thereof, which use RFID electronic tags to store and classify paper documents; however, the files still have to be preliminarily identified by hand, and electronic tags still have to be attached to them. When the number of files to be classified is large, a large number of electronic tags must still be attached manually, so efficiency remains low and a certain error rate persists.

Summary of the invention

In order to solve the above-mentioned problems, the purpose of the present invention is to provide a document classification collaborative robot and an image character recognition method based thereon, which can sort and organize documents without manual intervention, thereby helping to reduce the workload of office workers.

In order to make up for the deficiencies of the prior art, the technical solutions adopted in the embodiments of the present invention are:

A collaborative robot for document classification, including:

The camera device is used to take images of files;

The main body of the robot, which is used to put the file in the position designated by the user;

The upper computer is used for recognizing and outputting the characters in the text picture and correspondingly overall controlling the movement of the robot main body, which is respectively connected with the camera device and the robot main body.
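The three components above can be pictured with a minimal sketch. The class names, the `recognize` callable, and the `location_for` callback are all hypothetical stand-ins chosen for illustration; the source does not specify any software interface, and images are represented here as plain strings.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class CameraDevice:
    """Stand-in camera: hands out pre-loaded 'images' (plain strings here)."""
    frames: List[str]

    def capture(self) -> str:
        return self.frames.pop(0)


@dataclass
class RobotBody:
    """Stand-in robot body: records where each document was placed."""
    placed: List[Tuple[str, str]] = field(default_factory=list)

    def place(self, document: str, location: str) -> None:
        self.placed.append((document, location))


@dataclass
class UpperComputer:
    """Connected to both devices: recognizes text, then drives the robot."""
    camera: CameraDevice
    robot: RobotBody
    recognize: Callable[[str], str]  # hypothetical recognition callable

    def handle_one(self, location_for: Callable[[str], str]) -> str:
        image = self.camera.capture()
        text = self.recognize(image)
        self.robot.place(image, location_for(text))
        return text
```

In this sketch the upper computer is the only component that talks to both the camera and the robot body, mirroring the connection described above.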

An image and text recognition method of a document classification collaborative robot, including:

Convert the obtained text line picture into a standard text line picture;

Constructing a picture character recognition model according to the standard text line picture;

Based on the picture character recognition model, the characters in the text picture to be recognized are output.

The one or more technical solutions provided in the embodiments of the present invention have at least the following beneficial effects. Through the cooperation of the camera device, the main body of the robot, and the host computer, a document can be photographed, the text in it recognized, and the document then classified and managed according to the recognized text. No manual assistance is needed at any point in the process, and the degree of intelligent control is high. In particular, constructing a recognition model for the captured pictures allows the text in the document to be recognized systematically; the recognition effect is better, misrecognition is less likely, and the recognition error rate is greatly reduced. Therefore, the present invention is ingenious in design, intelligent and efficient in classification and recognition, and can sort and organize files without manual intervention, which helps to reduce the work burden of office workers.

Further, said converting the obtained text line picture into a standard text line picture includes:

Form a preset text line binarization picture according to the obtained text line picture;

Determining a preset background picture according to the text picture to be recognized;

The text line binarized picture and the background picture are synthesized to obtain a standard text line picture.

Further, said forming a preset text line binarization picture according to the obtained text line picture includes:

Extract a number of relevant text content from the obtained text line picture;

Processing the text content to generate a corresponding text image;

A preset text line binary picture is formed according to the text image.

Further, said determining the preset background picture according to the text picture to be recognized includes:

Determine the relevant standard template picture according to the text picture to be recognized;

Obtain a background area without text from the standard template picture;

A preset background picture is formed according to the background area without text.

Further, the construction of a picture character recognition model based on the standard text line picture includes:

Obtain a corresponding sample picture according to the standard text line picture;

Integrating the sample picture and the text content in the sample picture to form a training sample set;

A picture text recognition model is constructed through a deep neural network based on the training sample set, wherein the deep neural network is designed to use the sample picture as training data and the text content of the sample picture as a label.
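As a rough illustration of how sample pictures and their text content might be integrated into a training sample set, the sketch below pairs each picture with its label. The `Sample` type, the function names, and the reserved blank symbol at index 0 are assumptions made for illustration; the blank symbol anticipates a CTC-style transcription layer and is not stated at this point in the source.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Image = List[List[int]]  # grayscale picture as rows of 0-255 values


@dataclass
class Sample:
    picture: Image  # used as training data
    label: str      # text content of the picture, used as the label


def build_training_set(pictures: Sequence[Image], texts: Sequence[str]) -> List[Sample]:
    """Pair every sample picture with the text it contains."""
    if len(pictures) != len(texts):
        raise ValueError("each sample picture needs exactly one text label")
    return [Sample(p, t) for p, t in zip(pictures, texts)]


def build_charset(samples: Sequence[Sample]) -> Tuple[str, ...]:
    """Collect the label alphabet; index 0 is reserved for a blank symbol."""
    chars = sorted({ch for s in samples for ch in s.label})
    return ("<blank>", *chars)
```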

Further, the obtaining the corresponding sample picture according to the standard text line picture includes:

Perform expansion and change processing on the standard text line picture to obtain a corresponding sample picture, where the expansion and change processing includes one or more of perspective transformation, tone transformation, adding shadow effects, adding highlight effects, adding noise, cropping, scaling, and compression.

Further, the output of the text in the text picture to be recognized based on the picture text recognition model includes:

Acquiring a number of related entries from the text in the text picture to be recognized;

Split and combine a number of said entries to generate new entries;

The new entry is converted into corresponding text content according to the preset font type.

Further, after the output of the text in the to-be-recognized text image based on the image text recognition model, the method further includes:

Compare the output text in the text picture to be recognized with the user classification rules determined in the host computer. If the two are consistent, the host computer sends an instruction to the robot body to put the file in the position designated by the user; otherwise, return to obtaining the text line picture and execute the image character recognition method again.

The additional aspects and advantages of the present invention will be partly given in the following description, and partly will become obvious from the following description, or be understood through the practice of the present invention.

Description of the drawings

Hereinafter, preferred embodiments of the present invention are given in conjunction with the drawings to illustrate the implementation of the present invention in detail.

FIG. 1 is a schematic block diagram of the structure of a file classification collaborative robot according to an embodiment of the present invention;

FIG. 2 is a schematic block diagram of the overall steps of an image character recognition method according to an embodiment of the present invention;

FIG. 3 is a schematic flowchart of the step of "converting the obtained text line picture into a standard text line picture" in the image character recognition method of the embodiment of the present invention;

FIG. 4 is a schematic flowchart of the step of "building a picture character recognition model based on the standard text line picture" in the image character recognition method of the embodiment of the present invention;

FIG. 5 is a schematic flowchart of the step of "outputting text in a text picture to be recognized based on the picture text recognition model" in the image text recognition method according to an embodiment of the present invention;

FIG. 6 is a schematic flowchart of steps of an image character recognition method according to an embodiment of the present invention.

Detailed description

In order to make the objectives, technical solutions, and advantages of the present invention clearer, the following further describes the present invention in detail with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention, but not used to limit the present invention.

It should be noted that, provided there is no conflict, the various features in the embodiments of the present invention can be combined with each other, and all such combinations fall within the protection scope of the present invention. In addition, although functional modules are divided in the system schematic diagram and a logical sequence is shown in the flowchart, in some cases the steps shown or described may be executed with a module division different from that in the system, or in a sequence different from that in the flowchart.

The embodiments of the present invention will be further described below in conjunction with the accompanying drawings.

Referring to FIG. 1, a document classification collaborative robot according to an embodiment of the present invention includes:

The camera device is used to take images of files;

Referring to FIGS. 2 and 6, an image character recognition method of a document classification collaborative robot according to an embodiment of the present invention includes:

Convert the obtained text line picture into a standard text line picture;

Specifically, through the cooperation of the camera device, the main body of the robot, and the host computer, a document can be photographed, the text in it recognized, and the document then classified and managed according to the recognized text, without manual assistance at any point in the process and with a high degree of intelligent control. In particular, constructing a recognition model for the captured pictures allows the text in the document to be recognized systematically; the recognition effect is better, misrecognition is less likely, and the recognition error rate is greatly reduced. Therefore, the present invention is ingenious in design, intelligent and efficient in classification and recognition, and can sort and organize files without manual intervention, which helps to reduce the work burden of office workers.

The function of the robot body is not limited to the above. When the view of the camera device is blocked, this can be fed back to the host computer, which then controls the robot body to move the obstacle, so the robot body also assists the shooting. In addition, the upper computer stores a database of standard text line pictures, so a corresponding picture character recognition model can readily be constructed from any standard text line picture. Furthermore, when the camera device performs recognition, it follows the classification rules for paper documents input by the user; documents are generally photographed and identified from near to far, that is, one by one, and each document is identified separately to avoid errors and omissions. Referring to FIG. 5, in the step of outputting the text in the text picture to be recognized based on the picture text recognition model, it is only necessary to obtain the text picture to be recognized and input it into the picture text recognition model, which then outputs the text in that picture; this is convenient and reliable.

Furthermore, referring to FIG. 3, the conversion of the obtained text line picture into a standard text line picture includes:

Specifically, the obtained text line picture is converted into a standard text line picture so that it can be matched with the database of standard text line pictures in the host computer, which facilitates construction of the relevant picture character recognition model. Binarizing the obtained text line picture makes its light and dark areas more clearly distinguishable and improves its recognizability. In addition, extracting the background of the text picture to be recognized determines the application range of that picture. Synthesizing the text line binarized picture with the background picture then unifies the standard and yields the standard text line picture used to construct the recognition model.
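The synthesis step can be sketched as a per-pixel overlay. This is only an illustration under stated assumptions: pictures are nested lists of grayscale values, the binarized text line uses 1 for text pixels and 0 elsewhere, and the `synthesize` function and its `ink` parameter are hypothetical names not found in the source.

```python
from typing import List

Image = List[List[int]]  # rows of grayscale values, 0 (black) to 255 (white)


def synthesize(text_mask: Image, background: Image, ink: int = 0) -> Image:
    """Overlay a binarized text line on a background picture.

    text_mask uses 1 for text pixels and 0 elsewhere (assumed convention).
    """
    if len(text_mask) != len(background) or any(
        len(m) != len(b) for m, b in zip(text_mask, background)
    ):
        raise ValueError("mask and background must have the same size")
    # Text pixels take the ink value; all other pixels keep the background.
    return [
        [ink if m else b for m, b in zip(mask_row, bg_row)]
        for mask_row, bg_row in zip(text_mask, background)
    ]
```

Keeping the background untouched wherever the mask is 0 is what lets the synthesized picture resemble the documents the robot will actually photograph.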

Furthermore, said forming a preset text line binarization picture according to the obtained text line picture includes:

Extract a number of relevant text content from the obtained text line picture;

Processing the text content to generate a corresponding text image;

A preset text line binary picture is formed according to the text image.

In this embodiment, extracting the text content of the obtained text line picture captures the text characteristics of that picture. These characteristics are then output in the form of a text image, which is finally converted into a text line binarized picture. Through these steps, the text characteristics of the acquired text line picture are extracted and presented in the text line binarized picture, which benefits the subsequent text recognition processing.
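A toy version of "text content to text image to binarized picture" might look like the following. The tiny 3x3 glyph table, the fixed threshold of 128, and the function names are all assumptions for illustration; the source does not say how the text image is rendered or how the threshold is chosen.

```python
from typing import Dict, List

Image = List[List[int]]

# Hypothetical 3x3 grayscale glyphs (low values = dark ink, 255 = white paper).
GLYPHS: Dict[str, Image] = {
    "I": [[255, 40, 255], [255, 30, 255], [255, 50, 255]],
    "L": [[20, 255, 255], [35, 255, 255], [25, 45, 60]],
}


def render_text(text: str, glyphs: Dict[str, Image] = GLYPHS) -> Image:
    """Concatenate glyph bitmaps left to right to form a text image."""
    height = len(next(iter(glyphs.values())))
    rows: Image = [[] for _ in range(height)]
    for ch in text:
        for r, glyph_row in enumerate(glyphs[ch]):
            rows[r].extend(glyph_row)
    return rows


def binarize(gray: Image, threshold: int = 128) -> Image:
    """Map pixels darker than the threshold to 1 (text), all others to 0."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]
```

For example, `binarize(render_text("IL"))` yields a 3x6 mask in which only the ink pixels of the two glyphs are set.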

Furthermore, the said determining the preset background picture according to the text picture to be recognized includes:

Obtain a background area without text from the standard template picture;

Specifically, the standard template picture is matched against the text picture to be recognized, so that the text picture to be recognized can be mapped onto the standard template picture. Only the background area without text in the standard template picture is then obtained, which is equivalent to extracting the corresponding background area from the text picture to be recognized. This conversion conveniently and effectively generates a background picture for the text picture to be recognized.
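One simple way to obtain a text-free background area is to look for the tallest horizontal band of the template that contains no dark pixels. The sketch below does exactly that; treating "dark pixel" as "text", the threshold of 128, and the function names are assumptions for illustration rather than the method stated in the source.

```python
from typing import List, Tuple

Image = List[List[int]]


def textless_band(template: Image, threshold: int = 128) -> Tuple[int, int]:
    """Return (start, end) rows of the tallest band with no dark pixels.

    end is exclusive; (0, 0) means no text-free row was found.
    """
    best = (0, 0)
    start = None
    for r in range(len(template) + 1):
        # The extra iteration past the last row closes any open band.
        has_text = r == len(template) or any(px < threshold for px in template[r])
        if not has_text and start is None:
            start = r
        elif has_text and start is not None:
            if r - start > best[1] - best[0]:
                best = (start, r)
            start = None
    return best


def crop_background(template: Image, threshold: int = 128) -> Image:
    """Copy the text-free band out of the template as a background picture."""
    start, end = textless_band(template, threshold)
    return [row[:] for row in template[start:end]]
```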

Furthermore, referring to FIG. 4, the construction of a picture character recognition model based on the standard text line picture includes:

In this embodiment, training on the sample pictures and their text content lets the training set reflect the characteristics of the standard text line pictures, so that the constructed picture text recognition model recognizes more accurately. Constructing the model with a deep neural network allows the image data to be integrated and processed more stably, making the construction of the recognition model more convenient and effective. Specifically, a CRNN model is used as the deep neural network model. The CRNN model includes a convolutional layer using a CNN, a recurrent layer using a BiLSTM, and a transcription layer using CTC. The formula is expressed as

where $\chi = \{I_i, L_i\}_i$ denotes the training sample set, $I_i$ is the $i$-th sample picture, $L_i$ is the text content in the $i$-th sample picture, $Y_i$ is the predicted text content corresponding to the $i$-th sample picture, and the subscript $i$ is the sequence number of the training data in the training sample set.
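To make the role of the CTC transcription layer more concrete, the sketch below shows greedy (best-path) CTC decoding: take the highest-scoring class at each time step, merge consecutive repeats, and drop blanks. It is a generic illustration of CTC decoding, not the source's implementation; the `scores` layout (one row of class scores per time step), the `charset` list, and the blank index 0 are assumptions.

```python
from typing import List, Sequence


def ctc_greedy_decode(
    scores: Sequence[Sequence[float]],
    charset: Sequence[str],
    blank: int = 0,
) -> str:
    """Pick the best class per time step, merge repeats, then drop blanks."""
    best_path = [max(range(len(step)), key=step.__getitem__) for step in scores]
    decoded: List[str] = []
    previous = blank
    for idx in best_path:
        # Emit a character only when it differs from the previous step
        # and is not the blank symbol.
        if idx != previous and idx != blank:
            decoded.append(charset[idx])
        previous = idx
    return "".join(decoded)
```

The blank symbol is what allows a genuinely doubled character to survive: the path a, a, blank, a decodes to "aa", whereas a, a, a decodes to "a".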

Furthermore, said obtaining the corresponding sample picture according to the standard text line picture includes:

The standard text line picture is subjected to expansion and change processing comprising one or more of perspective transformation, tone transformation, adding shadow effects, adding highlight effects, adding noise, cropping, scaling, and compression. Specifically, the expansion and change processing of the standard text line picture can eliminate or reduce defects in the picture itself, so as to obtain more stable and richer sample pictures. The above expansion and change processing was arrived at by the inventors through experiments and experience.
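Three of the listed operations (adding noise, cropping, and scaling) can be sketched in a few lines. The function names, the noise amplitude, the integer scale factor, and the particular combination applied in `expand` are assumptions for illustration; perspective, tone, shadow, highlight, and compression effects are left out of this sketch.

```python
import random
from typing import List

Image = List[List[int]]


def add_noise(img: Image, rng: random.Random, amplitude: int = 20) -> Image:
    """Add uniform noise to every pixel and clamp to the 0-255 range."""
    return [
        [min(255, max(0, px + rng.randint(-amplitude, amplitude))) for px in row]
        for row in img
    ]


def crop(img: Image, top: int, left: int, height: int, width: int) -> Image:
    """Cut out a rectangular region of the picture."""
    return [row[left:left + width] for row in img[top:top + height]]


def scale_nearest(img: Image, factor: int) -> Image:
    """Enlarge by an integer factor using nearest-neighbour repetition."""
    return [
        [px for px in row for _ in range(factor)]
        for row in img
        for _ in range(factor)
    ]


def expand(img: Image, copies: int, seed: int = 0) -> List[Image]:
    """Produce several varied sample pictures from one standard picture."""
    rng = random.Random(seed)
    return [add_noise(scale_nearest(img, 2), rng) for _ in range(copies)]
```

Seeding the random generator keeps the expanded sample set reproducible between training runs.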

Split and combine a number of said entries to generate new entries;

In this embodiment, splitting and recombining the text in the text picture generates more new entries, which are matched with the text content. This effectively widens the range of application beyond the original entries, so that the picture character recognition model can be adapted to output the characters in the text picture to be recognized.
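Splitting and recombining entries can be illustrated as follows. The source does not say how entries are split or combined, so whitespace splitting, ordered pairs of parts, and the function names here are assumptions chosen only to show the idea of generating entries that were not in the original list.

```python
from itertools import permutations
from typing import List, Sequence


def split_entries(entries: Sequence[str]) -> List[str]:
    """Split each entry into whitespace-separated parts, without duplicates."""
    parts: List[str] = []
    for entry in entries:
        for part in entry.split():
            if part not in parts:
                parts.append(part)
    return parts


def combine_entries(entries: Sequence[str], size: int = 2) -> List[str]:
    """Recombine parts into new entries that were not in the original list."""
    original = set(entries)
    candidates = (
        " ".join(combo) for combo in permutations(split_entries(entries), size)
    )
    return [c for c in candidates if c not in original]
```

With the two entries "annual report" and "sales invoice", this produces combinations such as "sales report" and "annual invoice" while leaving out the two original entries.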

Specifically, the user classification rule determined in the host computer is the classification method decided by the user, for example by mathematical characters or by subject, which lets the user take part in formulating the classification. Because more than one document is likely to be processed in practice, the above steps can be repeated, each time judging whether the current file is the last one: if it is, classification ends; otherwise classification continues according to the above steps. Through the comparison, it can be verified whether the text in the text picture to be recognized meets the user's classification requirements, so that errors are found and the error rate is reduced.
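The compare-and-retry loop over several documents can be sketched like this. Representing a classification rule as a keyword-to-location mapping, the `recognize` callable, and the `max_attempts` bound are all assumptions for illustration; the source describes returning to recognition when the comparison fails but does not specify a retry limit.

```python
from typing import Callable, Dict, List, Optional, Tuple


def match_rule(text: str, rules: Dict[str, str]) -> Optional[str]:
    """Return the location of the first rule keyword found in the text."""
    for keyword, location in rules.items():
        if keyword in text:
            return location
    return None


def classify_all(
    documents: List[str],
    recognize: Callable[[str, int], str],  # hypothetical: (document, attempt) -> text
    rules: Dict[str, str],
    max_attempts: int = 3,  # assumed bound so that the retry loop terminates
) -> List[Tuple[str, Optional[str]]]:
    """Process documents one by one; re-run recognition when no rule matches."""
    placements: List[Tuple[str, Optional[str]]] = []
    for doc in documents:
        location = None
        for attempt in range(max_attempts):
            location = match_rule(recognize(doc, attempt), rules)
            if location is not None:
                break
        placements.append((doc, location))
    return placements
```

A location of `None` marks a document that never matched any rule, which is where a real system would need some fallback such as a separate tray for unclassified files.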

The preferred embodiments and basic principles of the present invention have been described in detail above, but the present invention is not limited to these embodiments. Those skilled in the art should understand that various equivalent modifications and replacements can be made without departing from the spirit of the present invention, and all such equivalent modifications and replacements fall within the scope of the claimed invention.

Claims

A document classification collaborative robot, which is characterized in that it includes:

The camera device is used to take images of files;

The main body of the robot, which is used to put the file in the position designated by the user;

The upper computer is used for recognizing and outputting the characters in the text picture and correspondingly overall controlling the movement of the robot main body, which is respectively connected with the camera device and the robot main body.
An image character recognition method based on a document classification collaborative robot according to claim 1, characterized in that it comprises:

Convert the obtained text line picture into a standard text line picture;

Constructing a picture character recognition model according to the standard text line picture;

Based on the picture character recognition model, the characters in the text picture to be recognized are output.
The image text recognition method based on a document classification collaborative robot according to claim 2, wherein said converting the obtained text line picture into a standard text line picture comprises:

Form a preset text line binarization picture according to the obtained text line picture;

Determining a preset background picture according to the text picture to be recognized;

The text line binarized picture and the background picture are synthesized to obtain a standard text line picture.
The method for image text recognition based on a document classification collaborative robot according to claim 3, wherein said forming a preset text line binarization picture according to the obtained text line picture comprises:

Extract a number of relevant text content from the obtained text line picture;

Processing the text content to generate a corresponding text image;

A preset text line binary picture is formed according to the text image.
The image and text recognition method based on a document classification collaborative robot according to claim 3, wherein said determining a preset background picture according to a text picture to be recognized comprises:

Determine the relevant standard template picture according to the text picture to be recognized;

Obtain a background area without text from the standard template picture;

A preset background picture is formed according to the background area without text.
The image character recognition method based on a document classification collaborative robot according to claim 2, wherein said constructing an image character recognition model based on the standard text line image comprises:

Obtain a corresponding sample picture according to the standard text line picture;

Integrating the sample picture and the text content in the sample picture to form a training sample set;

A picture text recognition model is constructed through a deep neural network based on the training sample set, wherein the deep neural network is designed to use the sample picture as training data and the text content of the sample picture as a label.
The method for image character recognition based on a document classification collaborative robot according to claim 6, wherein said obtaining corresponding sample pictures according to said standard text line pictures comprises:

Perform expansion and change processing on the standard text line picture to obtain a corresponding sample picture, where the expansion and change processing includes one or more of perspective transformation, tone transformation, adding shadow effects, adding highlight effects, adding noise, cropping, scaling, and compression.
The image character recognition method based on a document classification collaborative robot according to claim 2, wherein said outputting the characters in the text picture to be recognized based on the picture character recognition model comprises:

Acquiring a number of related entries from the text in the text picture to be recognized;

Split and combine a number of said entries to generate new entries;

The new entry is converted into corresponding text content according to the preset font type.
The image text recognition method based on a document classification collaborative robot according to claim 2 or 8, wherein after the text in the text picture to be recognized is output based on the picture text recognition model, the method further comprises:

Compare the output text in the text picture to be recognized with the user classification rule determined in the host computer. If the two are consistent, the host computer sends an instruction to the robot body to put the file in the position designated by the user; otherwise, return to obtaining the text line picture and execute the image text recognition method again.