CN112396047B - Training sample generation method and device, computer equipment and storage medium

Info

Publication number: CN112396047B
Application number: CN202011185686.8A
Authority: CN
Inventors: 周进洋; 刘洋; 刘渊; 张科; 梁扩战
Original assignee: Zhongdian Jinxin Software Co Ltd
Current assignee: Zhongdian Jinxin Software Co Ltd
Priority date: 2020-10-30
Filing date: 2020-10-30
Publication date: 2022-03-08
Anticipated expiration: 2040-10-30
Also published as: CN112396047A

Abstract

The application relates to a training sample generation method and apparatus, a computer device, and a storage medium. The method comprises: acquiring a document to be detected and a manually marked position; inputting the document to be detected into a text position detection model to obtain a document text position output by the text position detection model; and, if the document text position does not match the manually marked position, generating a detection model training sample according to the document to be detected, the detection model training sample being used for training the text position detection model. The method can improve the efficiency of training sample generation.

Description

Training sample generation method and device, computer equipment and storage medium

Technical Field

The present application relates to the field of character recognition technologies, and in particular, to a training sample generation method, an apparatus, a computer device, and a storage medium.

Background

With the development of character recognition technology, character recognition models have emerged. When such a model is used, characters printed on paper or appearing in photographs are typically recognized sequentially by line-by-line scanning.

When identifying a document, because characters are usually located at specific positions on the document, the conventional line-by-line scanning method is inefficient. In order to improve the recognition efficiency and reduce the recognition difficulty, the character position can be detected on the document through the position detection model, the document is split into a plurality of pictures containing character contents according to the character position, and then the pictures are recognized through the character recognition model.

However, training samples for a position detection model are currently obtained by collecting documents, manually identifying and labeling the character positions on each document, and using the labeled documents as training samples. When the detection accuracy of the position detection model needs to be improved, a large number of documents must be identified and labeled manually, so the generation efficiency of training samples is low.

Therefore, the current training sample generation technology for character recognition has the problem of low efficiency.

Disclosure of Invention

In view of the above, it is necessary to provide a training sample generation method, apparatus, computer device and storage medium capable of improving efficiency.

The embodiment of the invention provides a training sample generation method, which comprises the following steps:

acquiring a document to be detected and a manually marked position;

inputting the document to be detected into a text position detection model to obtain a document text position output by the text position detection model;

if the document text position is not matched with the manual marking position, generating a detection model training sample according to the document to be detected; the detection model training sample is used for training the text position detection model.

In one embodiment, the generating a detection model training sample according to the to-be-detected document includes:

acquiring a first original training sample of the text position detection model;

obtaining a first enhanced training sample by performing image enhancement on the document to be detected; the image enhancement comprises at least one of image warping, image stretching, and image tilting;

and obtaining the detection model training sample according to the first original training sample and the first enhanced training sample.

In one embodiment, the obtaining a first enhanced training sample by performing image enhancement on the document to be detected includes:

obtaining an enhanced image of the document to be detected by performing image enhancement on the document to be detected;

obtaining the text position of the enhanced image of the document to be detected according to the manually marked position;

and obtaining the first enhanced training sample according to the enhanced image of the document to be detected and the text position.

In one embodiment, the method further comprises:

training the text position detection model according to the detection model training sample to obtain a target text position detection model;

acquiring a document to be identified;

inputting the document to be recognized into the target text position detection model to obtain a target text position output by the target text position detection model;

cropping a target text picture from the document to be recognized according to the target text position;

inputting the target text picture into a text recognition model to obtain text content output by the text recognition model;

counting the recognition condition of the text recognition model according to the text content;

generating a recognition model training sample according to the recognition condition; the recognition model training sample is used for training the text recognition model.

In one embodiment, the generating a recognition model training sample according to the recognition condition includes:

if the recognition condition meets a preset sample enhancement condition, acquiring a second original training sample of the text recognition model;

obtaining a second enhanced training sample by performing image enhancement on the target text picture; the image enhancement comprises at least one of image warping, image stretching, and image tilting;

and obtaining the recognition model training sample according to the second original training sample and the second enhanced training sample.

In one embodiment, the recognition condition comprises an accuracy level; the counting the recognition condition of the text recognition model according to the text content comprises the following steps:

inputting the target text picture into a cross validation model to obtain cross validation text content output by the cross validation model;

counting the matching condition between the text content and the cross validation text content; the matching condition comprises at least one of complete matching, partial matching and non-matching;

and obtaining the accuracy grade according to the matching condition.

In one embodiment, the recognition condition further comprises a recognition confidence; the counting the recognition condition of the text recognition model according to the text content further comprises:

counting a first confidence of the text content and a second confidence of the cross-validation text content;

and weighting the first confidence and the second confidence according to a preset first confidence weight and a preset second confidence weight to obtain the recognition confidence.

The embodiment of the invention provides a training sample generating device, which comprises:

the acquisition module is used for acquiring a document to be detected and a manually marked position;

the text position detection module is used for inputting the document to be detected into the text position detection model to obtain a document text position output by the text position detection model;

the training sample generation module is used for generating a detection model training sample according to the document to be detected if the document text position does not match the manually marked position; the detection model training sample is used for training the text position detection model.

The embodiment of the invention provides computer equipment, which comprises a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to realize the following steps:

acquiring a document to be detected and a manually marked position;

An embodiment of the present invention provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the following steps:

acquiring a document to be detected and a manually marked position;

In the training sample generation method and apparatus, computer device and storage medium of the embodiments of the invention, a document to be detected and a manually marked position are acquired, and the document to be detected is input into the text position detection model to obtain the document text position output by the model, so that the detected document text position can be compared with the manually marked position. If the document text position does not match the manually marked position, it can be determined that the model detects the document to be detected inaccurately, and a training sample of the text position detection model is generated according to the document to be detected. This increases the proportion of inaccurately detected samples among all training samples and improves the detection accuracy of the model on such samples. Because the training sample is generated directly from the inaccurately detected document, the generation efficiency of training samples is improved while the detection accuracy of the text position is improved.

Drawings

FIG. 1 is a schematic flow chart diagram of a training sample generation method in one embodiment;

FIG. 2 is a schematic flow chart of a training sample generation method according to another embodiment;

FIG. 3 is a block diagram of a training sample generation apparatus according to an embodiment;

FIG. 4 is a diagram illustrating an internal structure of a computer device according to an embodiment.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.

In one embodiment, as shown in fig. 1, a training sample generation method is provided, which may be applied to a terminal or a server, where the terminal may be, but is not limited to, various personal computers, laptops, smartphones, tablets and portable wearable devices, and the server may be implemented by an independent server or a server cluster composed of multiple servers. Taking the application of the method to the server as an example for explanation, the method comprises the following steps:

Step S110, acquiring a document to be detected and a manually marked position.

The document to be detected can be a document needing to detect the text position.

The manually marked position can be a text position on the document obtained through manual identification and marking.

In a specific implementation, the server can acquire a real document and use it as the document to be detected. The text positions on the document to be detected are identified manually, and the identified text positions are stored on the server as labels in correspondence with the document to be detected.

For example, after a business day ends, a server of a bank can acquire all documents transacted on that day as documents to be detected. A model trainer can identify the vertex coordinates of each text box on the documents and input the vertex coordinates into the server as labels, and the server stores the labels in correspondence with the documents to be detected.

Step S120, inputting the document to be detected into the text position detection model to obtain the document text position output by the text position detection model.

The text position detection model can be a model for identifying the text position on the document. The document text position may be the position of each text region on the document.

The text position detection model can be obtained by training in advance on manually labeled document text positions, and can be a DBNet (Differentiable Binarization Network) model or a CRAFT (Character Region Awareness for Text Detection) model.

In specific implementation, the server can input all the obtained real documents into the text position detection model, and the text position detection model can output the positions of the text regions on the documents.

For example, the text position detection model may identify a rectangular text region on the document and output the four vertices of the rectangular text region as the document text position, or output the coordinates of the top-left vertex of the rectangular text region together with the length and width of the rectangle as the document text position.
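The two output forms described above carry the same information for an axis-aligned rectangle. The following is a minimal sketch of converting between them; the function names and the clockwise vertex order are assumptions made for illustration and are not specified in the source.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def xywh_to_vertices(x: float, y: float, width: float, height: float) -> List[Point]:
    """Convert a top-left vertex plus size into four vertices.

    Vertices are listed clockwise starting at the top-left corner
    (an illustrative convention, not one stated in the source).
    """
    return [(x, y), (x + width, y), (x + width, y + height), (x, y + height)]


def vertices_to_xywh(vertices: List[Point]) -> Tuple[float, float, float, float]:
    """Convert the vertices of an axis-aligned rectangle into (x, y, width, height)."""
    xs = [p[0] for p in vertices]
    ys = [p[1] for p in vertices]
    return (min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys))
```

Normalizing both the model output and the manual labels to one form before comparing them avoids spurious mismatches caused only by representation.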

Step S130, if the document text position is not matched with the manual marking position, generating a detection model training sample according to the document to be detected; and the detection model training sample is used for training the text position detection model.

The detection model training sample can be a training sample of a text position detection model.

In a specific implementation, the server can compare the document text position with the manually marked position. If the two are the same, the document text position is determined to match the manually marked position; the document to be detected then needs no further processing and is used directly as an element of the training sample set of the text position detection model. If the two differ, the document text position is determined not to match the manually marked position; image enhancement, including image warping, image stretching and image tilting, can then be applied to the document to be detected to generate a plurality of copies, and the document to be detected together with its copies is used as elements of the training sample set of the text position detection model.

It should be noted that, when the document text position does not match the manually marked position, the plurality of copies are obtained by image enhancement of the document to be detected, and the manually marked position is an accurate label of the text position in the document to be detected. The text positions of the document to be detected and of its copies can therefore be labeled using the manually marked position to obtain the sample labels.

In practical application, the document text position can be corrected manually. If the document text position needs no manual correction, the document to be detected can be used directly as a training sample. If the document text position is corrected manually, image enhancement can be applied to the document to be detected to generate a plurality of copies, the document to be detected and its copies are used as training samples, and the manually corrected text position can be used as the training sample label. Processing all documents to be detected in this way yields the training sample set, with which the text position detection model can be trained.

For example, for an original document transacted on the same day, if the text position detection model detects the top-left vertex of a text box on the original document as (100, 200) while the manually identified coordinate is (80, 190), the two do not match. A plurality of document copies can then be produced by applying deformation such as warping, stretching and tilting to the original document, and the original document and the document copies can be used as training samples of the text position detection model, with the manually identified coordinate (80, 190) used to label the original document and the document copies.
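The comparison in step S130 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `positions_match`, the `tolerance` parameter, and the assumption that detected and manual boxes are listed in the same order are all hypothetical. With the default tolerance of 0 the check reduces to the "same position" test described in the text.

```python
from typing import List, Tuple

Point = Tuple[int, int]
Box = Tuple[Point, ...]  # vertices of one text box


def positions_match(detected: List[Box], manual: List[Box], tolerance: int = 0) -> bool:
    """Return True if every detected vertex lies within `tolerance` pixels
    of the corresponding manually marked vertex.

    Assumes both lists describe the text boxes in the same order
    (an illustrative simplification).
    """
    if len(detected) != len(manual):
        return False
    for det_box, man_box in zip(detected, manual):
        if len(det_box) != len(man_box):
            return False
        for (dx, dy), (mx, my) in zip(det_box, man_box):
            if abs(dx - mx) > tolerance or abs(dy - my) > tolerance:
                return False
    return True


# Values from the example above: detected (100, 200) vs. manual (80, 190).
detected = [((100, 200),)]
manual = [((80, 190),)]
needs_enhancement = not positions_match(detected, manual)
```

In this sketch a document for which `needs_enhancement` is true would be passed on to image enhancement, while a matching document would be used as a training sample unchanged.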

According to the training sample generation method, the document to be detected and the manually marked position are acquired, and the document to be detected is input into the text position detection model to obtain the document text position output by the model, so that the detected document text position can be compared with the manually marked position. If the two do not match, it can be determined that the model detects the document to be detected inaccurately, and a training sample of the text position detection model is generated according to that document. This increases the proportion of inaccurately detected samples among all training samples and improves the detection accuracy of the model on such samples, so that the generation efficiency of training samples is improved while the detection accuracy of the text position is improved.

In an embodiment, the step S130 may specifically include:

step S132, acquiring a first original training sample of the text position detection model;

step S134, performing image enhancement on the document to be detected to obtain a first enhanced training sample; the image enhancement comprises at least one of image warping, image stretching and image tilting;

step S136, obtaining a detection model training sample according to the first original training sample and the first enhanced training sample.

The first original training sample can be each document to be detected that is input into the text position detection model.

The first enhanced training sample can be a training sample obtained by performing image enhancement on a document to be detected that was input into the text position detection model, when the document text position output by the text position detection model does not match the manually marked position.

In a specific implementation, all documents to be detected that are input into the text position detection model can be used as first original training samples. After a document to be detected is input into the text position detection model, if the detected document text position differs from the manually marked position, image enhancement, including image warping, image stretching and image tilting, can be applied to the document to be detected to generate a plurality of image-enhanced copies, which are used as first enhanced training samples. The set of detection model training samples can consist of a plurality of first original training samples and a plurality of first enhanced training samples.
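The assembly of the detection model training sample set described above can be sketched as follows. The `Document` and `Sample` types, the `enhance` callback and the `copies` count are hypothetical names introduced for illustration; the source does not specify how many copies are generated or how the enhancement is implemented.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class Document:
    image: Any
    manual_positions: list    # manually marked text positions
    detected_positions: list  # output of the text position detection model


@dataclass
class Sample:
    image: Any
    label: list
    kind: str  # "original" or "enhanced"


def build_detection_training_set(
    documents: List[Document],
    enhance: Callable[[Any, int], Any],
    copies: int = 3,
) -> List[Sample]:
    """Every document becomes a first original training sample; mismatched
    documents additionally contribute `copies` first enhanced samples."""
    samples: List[Sample] = []
    for doc in documents:
        samples.append(Sample(doc.image, doc.manual_positions, "original"))
        if doc.detected_positions != doc.manual_positions:
            for i in range(copies):
                # `enhance` stands in for warping, stretching or tilting.
                samples.append(Sample(enhance(doc.image, i), doc.manual_positions, "enhanced"))
    return samples
```

Because only mismatched documents are multiplied, their share of the resulting set grows relative to correctly detected documents, which is the effect the embodiment aims for.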

In practical application, if the document text position detected by the text position detection model does not need to be corrected, the corresponding document to be detected can be used as a first original training sample. If the detected document text position needs to be corrected, the corresponding document to be detected can be used as a first original training sample and its image-enhanced copies can be used as first enhanced training samples. Processing all documents transacted on the same day in this way yields a detection model training sample set comprising a plurality of first original training samples and a plurality of first enhanced training samples.

In this embodiment, acquiring the first original training sample of the text position detection model allows the original document to be detected to be used directly as a training sample, which improves the generation efficiency of training samples. Performing image enhancement on the document to be detected to obtain the first enhanced training sample allows multiple copies of the document to be detected to be used as training samples when the text position detection model detects incorrectly. Obtaining the detection model training sample from the first original training sample and the first enhanced training sample therefore increases the proportion of incorrectly detected samples among all samples while improving generation efficiency, which improves the detection accuracy of the text position detection model.

In an embodiment, the step S134 may specifically include: performing image enhancement on the document to be detected to obtain an enhanced image of the document to be detected; obtaining the text position of the enhanced image of the document to be detected according to the manually marked position; and obtaining a first enhanced training sample according to the enhanced image of the document to be detected and the text position.

The enhanced image of the document to be detected can be an image obtained by performing image enhancement on the document to be detected.

In a specific implementation, if the detected document text position differs from the manually marked position, image enhancement, including image warping, image stretching and image tilting, can be applied to the document to be detected that was input into the text position detection model to obtain the enhanced image of the document to be detected. The text position in the enhanced image can be labeled using the manually marked position to obtain the label of the enhanced image, and the first enhanced training sample is obtained from the enhanced image and its label.
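The source states only that the text position of the enhanced image is obtained according to the manually marked position. One plausible reading, shown here purely as an assumption, is that the same geometric transform applied to the image is also applied to the manually marked vertices. The sketch below covers stretching and tilting as affine maps; general warping is not affine and would need a different mapping. All names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
# Affine map (a, b, c, d, tx, ty): x' = a*x + b*y + tx, y' = c*x + d*y + ty
Affine = Tuple[float, float, float, float, float, float]


def stretch(sx: float, sy: float) -> Affine:
    """Scale x by sx and y by sy about the origin."""
    return (sx, 0.0, 0.0, sy, 0.0, 0.0)


def tilt(degrees: float) -> Affine:
    """Rotate about the origin by the given angle."""
    r = math.radians(degrees)
    return (math.cos(r), -math.sin(r), math.sin(r), math.cos(r), 0.0, 0.0)


def transform_points(points: List[Point], m: Affine) -> List[Point]:
    """Apply the affine map to each manually marked vertex."""
    a, b, c, d, tx, ty = m
    return [(a * x + b * y + tx, c * x + d * y + ty) for x, y in points]


# Label for a copy stretched to twice its width, starting from the
# manually marked vertex (80, 190) used in the earlier example.
stretched_label = transform_points([(80, 190)], stretch(2.0, 1.0))
```

Under this assumption no additional manual labeling is needed for the copies, which is consistent with the efficiency benefit the embodiment describes.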

In this embodiment, the enhanced image of the document to be detected is obtained by performing image enhancement on the document to be detected, and the text position of the enhanced image is obtained according to the manually marked position, so that both the enhanced image and its text position can be generated quickly. Obtaining the first enhanced training sample from the enhanced image and the text position therefore improves the generation efficiency of training samples.

In an embodiment, the training sample generating method may further include:

step S140, training the text position detection model according to the detection model training sample to obtain a target text position detection model;

step S141, acquiring a document to be recognized;

step S142, inputting the document to be recognized into the target text position detection model to obtain the target text position output by the target text position detection model;

step S143, cropping a target text picture from the document to be recognized according to the target text position;

step S144, inputting the target text picture into a text recognition model to obtain text content output by the text recognition model;

step S145, counting the recognition condition of the text recognition model according to the text content;

step S146, generating a recognition model training sample according to the recognition condition; and the recognition model training sample is used for training the text recognition model.

The target text position detection model may be the model obtained by training the text position detection model with the detection model training samples. The document to be recognized can be a document whose text content needs to be recognized. The target text position may be a text position detected by the target text position detection model. The target text picture may be a picture cropped at the target text position on the document to be recognized, and the picture may contain one or more pieces of text. The text recognition model can be a model for recognizing the text content in a text picture, can be obtained by training in advance on manually labeled text pictures, and can be an EfficientNet model. The recognition condition of the text recognition model can be the recognition accuracy level and the recognition confidence of the text recognition model. The recognition model training samples may be training samples of the text recognition model.

In a specific implementation, after the detection model training sample is generated in step S130, it may be used to train the text position detection model to obtain a target text position detection model. The document to be recognized is input into the target text position detection model to detect the text positions in the document to be recognized. The server may crop one or more text pictures from the document to be recognized according to the text positions, use them as target text pictures, and input them into the text recognition model, which recognizes the text content in the target text pictures. The target text picture can also be recognized by a cross-validation model to obtain cross-validation text content, or recognized manually to obtain manually labeled text content. The text content recognized by the text recognition model is compared with the cross-validation text content or the manually labeled text content to obtain the recognition condition of the text recognition model, a recognition model training sample is generated according to the recognition condition, and the text recognition model is trained with the recognition model training sample.
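Steps S142 to S144 can be sketched as a small pipeline. In this illustration the image is a plain list of pixel rows, text positions are `(x, y, width, height)` rectangles, and `detect` and `recognize` are placeholder callables standing in for the target text position detection model and the text recognition model; none of these representations are prescribed by the source.

```python
from typing import Callable, List, Tuple

Image = List[List[int]]           # rows of pixel values
Rect = Tuple[int, int, int, int]  # (x, y, width, height)


def crop(image: Image, rect: Rect) -> Image:
    """Cut the rectangular region out of the image (step S143)."""
    x, y, w, h = rect
    return [row[x:x + w] for row in image[y:y + h]]


def recognize_document(
    image: Image,
    detect: Callable[[Image], List[Rect]],
    recognize: Callable[[Image], str],
) -> List[Tuple[Rect, str]]:
    """Detect text positions, crop each target text picture, and recognize it."""
    results = []
    for rect in detect(image):               # step S142
        picture = crop(image, rect)          # step S143
        results.append((rect, recognize(picture)))  # step S144
    return results
```

The per-rectangle results returned here are what a later step would compare against cross-validation or manually labeled text content.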

In this embodiment, training the text position detection model with the detection model training sample yields a target text position detection model with high position detection accuracy. Inputting the document to be recognized into the target text position detection model to obtain the target text position, and cropping the target text picture from the document to be recognized according to that position, allows the text picture to be cropped accurately because the text position has been determined accurately. Inputting the target text picture into the text recognition model to obtain the text content then avoids the interference that an inaccurate text position would cause to text recognition. Counting the recognition condition of the text recognition model according to the text content, and generating the recognition model training sample according to the recognition condition, allows the training samples to be adjusted according to the recognition condition, which improves the recognition accuracy of the text recognition model.

In an embodiment, the step S146 may specifically include: if the recognition condition meets a preset sample enhancement condition, acquiring a second original training sample of the text recognition model; obtaining a second enhanced training sample by performing image enhancement on the target text picture; the image enhancement comprises at least one of image warping, image stretching and image tilting; and obtaining a recognition model training sample according to the second original training sample and the second enhanced training sample.

The second original training sample may be each target text picture that is input into the text recognition model.

The second enhanced training sample may be a training sample obtained by performing image enhancement on a target text picture input into the text recognition model, when the text content output by the text recognition model is inaccurate or has low accuracy.

In a specific implementation, all target text pictures input into the text recognition model can be used as second original training samples. After the target text pictures are input into the text recognition model, the recognition condition of the text recognition model, such as the recognition accuracy, can be counted according to the text content output by the model. If the recognition accuracy is lower than a preset accuracy threshold, image enhancement, including image warping, image stretching and image tilting, can be applied to the target text pictures to generate a plurality of image-enhanced copies, which serve as second enhanced training samples. The set of recognition model training samples can consist of a plurality of second original training samples and a plurality of second enhanced training samples.
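The threshold test described above can be sketched as follows. The source does not define how recognition accuracy is computed; this sketch assumes it is the fraction of target text pictures whose recognized text equals a reference text (cross-validation or manually labeled content). The function names and the 0.9 default threshold are illustrative assumptions.

```python
from typing import List


def recognition_accuracy(predicted: List[str], reference: List[str]) -> float:
    """Fraction of pictures whose recognized text equals the reference text."""
    if not reference or len(predicted) != len(reference):
        raise ValueError("predicted and reference must be non-empty and equal in length")
    correct = sum(1 for p, r in zip(predicted, reference) if p == r)
    return correct / len(reference)


def should_enhance(accuracy: float, threshold: float = 0.9) -> bool:
    """True when accuracy falls below the preset threshold, i.e. when
    second enhanced training samples should be generated."""
    return accuracy < threshold


accuracy = recognition_accuracy(["A", "B", "C", "D"], ["A", "B", "X", "D"])
```

With the sample values shown, three of four texts agree, so `accuracy` is 0.75 and enhancement would be triggered under the assumed threshold.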

In this embodiment, if the recognition condition meets the preset sample enhancement condition, a second original training sample of the text recognition model is acquired and image enhancement is performed on the target text picture to obtain a second enhanced training sample, so that the target text picture and its image-enhanced copies can be used directly as training samples, which improves the generation efficiency of training samples. Obtaining the recognition model training sample from the second original training sample and the second enhanced training sample improves the recognition accuracy of the text recognition model while improving the generation efficiency of training samples.

In an embodiment, the recognition condition includes an accuracy level, and the step S145 may specifically include: inputting the target text picture into the cross-validation model to obtain cross-validation text content output by the cross-validation model; counting the matching condition between the text content and the cross-validation text content; the matching condition comprises at least one of complete matching, partial matching and non-matching; and obtaining the accuracy level according to the matching condition.

The cross-validation text content may be text content recognized by the cross-validation model.

In a specific implementation, the text picture can be recognized through a third-party payment interface to obtain cross-validation text content, or through an open-source recognition engine to obtain cross-validation text content. The text content and the cross-validation text content can be compared to obtain their matching condition, which can be complete matching, partial matching or non-matching. A correspondence between the matching condition and the accuracy level can be preset, and the accuracy level of the text content can be determined from the matching condition based on this correspondence. If the accuracy level is lower than a preset level threshold, image enhancement needs to be performed on the target text picture.

For example, the EfficientNet model can be used as the text recognition model, and the third party payment interface and the open-source recognition engine can be used as cross validation models. Each of them recognizes the target text picture, giving three recognition results. When the three recognition results are completely consistent, they can be marked in green; when they are partially consistent, they can be marked in yellow; and when all three differ, they can be marked in red. A correspondence between the color marks and the accuracy levels may also be set; for example, green, yellow, and red may correspond to high, medium, and low accuracy levels, respectively. Through color marking, a user can focus on the recognition results with lower accuracy levels.

In this embodiment, the target text picture is input into the cross validation model to obtain the cross validation text content output by the cross validation model, the matching condition between the text content and the cross validation text content is counted, and the accuracy level is obtained according to the matching condition. Cross-validating the text content recognized by the text recognition model allows the recognition accuracy level of the text recognition model to be determined accurately. Whether a training sample of the text recognition model needs to be generated through image enhancement is then decided according to the accuracy level, which improves the generation efficiency of training samples.
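The mapping from matching condition to accuracy level can be sketched as follows. This is a minimal illustration rather than the implementation of the application: the function names `matching_condition`, `accuracy_level` and `needs_enhancement`, the string labels, the numeric ranking, and the default threshold are all assumptions introduced here. Only the green/yellow/red and high/medium/low correspondence is taken from the example above.

```python
from typing import Sequence

# Correspondence from the example: complete -> green/high, partial -> yellow/medium,
# non-matching -> red/low. The dictionary keys themselves are hypothetical labels.
LEVELS = {
    "complete": ("green", "high"),
    "partial": ("yellow", "medium"),
    "none": ("red", "low"),
}
RANK = {"low": 0, "medium": 1, "high": 2}  # assumed ordering for threshold checks


def matching_condition(results: Sequence[str]) -> str:
    """Classify agreement among the text content and the cross validation text contents."""
    if len(results) < 2:
        raise ValueError("need the text content and at least one cross validation result")
    distinct = len(set(results))
    if distinct == 1:
        return "complete"   # every result is identical
    if distinct == len(results):
        return "none"       # every result differs from every other
    return "partial"        # some, but not all, results agree


def accuracy_level(results: Sequence[str]) -> str:
    """Look up the accuracy level that corresponds to the matching condition."""
    return LEVELS[matching_condition(results)][1]


def needs_enhancement(results: Sequence[str], threshold: str = "high") -> bool:
    """True when the accuracy level is lower than the (assumed) preset level threshold."""
    return RANK[accuracy_level(results)] < RANK[threshold]
```

With three recognizers, `needs_enhancement(["hello", "hello", "any"])` flags the picture because the results only partially agree; where to set the threshold is a design choice the description leaves to the preset configuration.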

In an embodiment, the recognition condition further includes a recognition confidence, and the step S145 may further include: inputting the target text picture into the cross validation model to obtain cross validation text content output by the cross validation model; counting a first confidence coefficient of the text content and a second confidence coefficient of the cross validation text content; and weighting the first confidence coefficient and the second confidence coefficient according to the preset first confidence coefficient weight and second confidence coefficient weight to obtain the recognition confidence coefficient.

In the specific implementation, the target text picture can be identified through the text identification model and the cross validation model, the text content and the cross validation text content are respectively output, and the first confidence coefficient of the text content and the second confidence coefficient of the cross validation text content are counted. The confidence weights of the text content and the cross-validation text content may also be preset, for example, the weight of the text content may be set as a first confidence weight, the weight of the cross-validation text content may be set as a second confidence weight, and the first confidence and the second confidence are weighted according to the first confidence weight and the second confidence weight, so as to obtain the recognition confidence. And if the recognition confidence is lower than a preset confidence threshold, image enhancement needs to be carried out on the target text picture.

For example, the EfficientNet model is used as a text recognition model, the third party payment interface and the open source recognition engine are used as cross-validation models, the target text picture is recognized, and recognition results "hello", "hello" and "any" are respectively obtained, and accordingly, the confidence degrees can be respectively determined to be 1, 1 and 0. The confidence weighting coefficients are set to 0.5, 0.3, and 0.2 in advance, and weighting is performed to obtain a recognition confidence of 0.5 × 1 + 0.3 × 1 + 0.2 × 0 = 0.8, that is, a recognition result confidence of 0.8 × 100% = 80%. By counting the confidence, the user can pay more attention to the recognition result with lower confidence.

In this embodiment, the target text picture is input into the cross validation model to obtain the cross validation text content output by the cross validation model, the first confidence coefficient of the text content and the second confidence coefficient of the cross validation text content are counted, and the two are weighted according to the preset first confidence coefficient weight and the preset second confidence coefficient weight to obtain the recognition confidence coefficient. Cross-validating the text content recognized by the text recognition model allows the recognition confidence coefficient of the text recognition model to be determined accurately. Whether a training sample of the text recognition model needs to be generated through image enhancement is then decided according to the recognition confidence coefficient, which improves the generation efficiency of training samples.
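The weighted confidence in the "hello" example can be reproduced with a few lines of Python. This is only a sketch: the helper names are hypothetical, and the rule that a result scores 1 when it equals the most common result and 0 otherwise is an assumption inferred from the example, since the description does not state how the individual confidences are derived.

```python
from collections import Counter
from typing import List, Sequence


def agreement_confidences(results: Sequence[str]) -> List[float]:
    """Assumed rule: a result scores 1 if it equals the most common result, else 0."""
    most_common, _ = Counter(results).most_common(1)[0]
    return [1.0 if r == most_common else 0.0 for r in results]


def recognition_confidence(confidences: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of the per-model confidences using the preset weights."""
    if len(confidences) != len(weights):
        raise ValueError("one weight is required per confidence")
    return sum(w * c for w, c in zip(weights, confidences))


# Values from the example: results "hello", "hello", "any"; weights 0.5, 0.3, 0.2.
confs = agreement_confidences(["hello", "hello", "any"])
score = recognition_confidence(confs, [0.5, 0.3, 0.2])
print(f"{score:.0%}")  # 80%
```

A picture whose `score` falls below the preset confidence threshold would then be passed to image enhancement.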

In one embodiment, as shown in fig. 2, a training sample generation method is provided, which is described by taking its application to a server as an example, and includes the following steps:

step S201, acquiring a document to be detected and a manual marking position;

step S202, inputting the document to be detected into a text position detection model to obtain a document text position output by the text position detection model;

step S203, if the document text position is not matched with the manual marking position, generating a detection model training sample according to the document to be detected;

step S204, training the text position detection model according to the detection model training sample to obtain a target text position detection model;

step S205, obtaining a document to be identified;

step S206, inputting the document to be identified into the target text position detection model to obtain a target text position output by the target text position detection model;

step S207, intercepting a target text picture in the document to be identified according to the target text position;

step S208, inputting the target text picture into a text recognition model to obtain text content output by the text recognition model;

step S209, counting the recognition condition of the text recognition model according to the text content;

step S210, generating a recognition model training sample according to the recognition condition; the recognition model training sample is used for training the text recognition model.
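The control flow of steps S201 to S203 and S205 to S210 can be sketched with the models passed in as plain callables. Everything named here is hypothetical: the `Box` format, the function names, the exact-equality match test in step S203, and the `should_sample` predicate standing in for the recognition condition. Step S204 (training) is omitted, and the returned text is the model output, which would still be subject to manual correction.

```python
from typing import Any, Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # hypothetical (left, top, right, bottom) format


def detection_training_samples(
    document: Any,
    manual_boxes: Sequence[Box],
    detect: Callable[[Any], Sequence[Box]],
) -> List[Tuple[Any, List[Box]]]:
    """Steps S201-S203: emit a sample only when detection disagrees with the manual labels."""
    predicted = detect(document)                    # S202
    if sorted(predicted) == sorted(manual_boxes):   # assumed match test: exact equality
        return []
    return [(document, list(manual_boxes))]         # S203


def recognition_training_samples(
    document: Any,
    detect: Callable[[Any], Sequence[Box]],
    crop: Callable[[Any, Box], Any],
    recognize: Callable[[Any], str],
    should_sample: Callable[[Any, str], bool],
) -> List[Tuple[Any, str]]:
    """Steps S205-S210: crop each detected region, recognize it, keep the flagged ones."""
    samples = []
    for box in detect(document):                    # S206
        picture = crop(document, box)               # S207
        text = recognize(picture)                   # S208
        if should_sample(picture, text):            # S209-S210
            samples.append((picture, text))
    return samples
```

Injecting the models as callables keeps the sketch independent of any particular detection or recognition library; a tolerance-based comparison could replace the exact-equality test without changing the surrounding flow.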

To facilitate a thorough understanding of the embodiments of the present application by those skilled in the art, the following description will be given with reference to a specific example.

In the document recognition process, the character positions in the document need to be detected and marked by the detection model, so that when the character recognition model recognizes the character content of the document, the document can be split into a plurality of small pictures containing character content according to the marked character positions, thereby reducing the recognition difficulty and improving the recognition efficiency. However, when the detection model is trained, every piece of training data is labeled manually, that is, the text positions in each document picture used for training are marked by hand, which is inefficient.

By automatically performing text detection and recognition in batches at night, the manual labeling process is converted into a manual correction process, which can greatly improve labeling efficiency. Specifically, the text position and the character recognition may be labeled separately:

(a) Text position labeling: text position detection is run in batches at night and then corrected manually; during manual correction, labels that needed no correction and labels that were corrected are dynamically counted and analyzed separately. Data whose text positions were detected incorrectly is expanded by data enhancement and used for training; that is, data enhancement is applied to documents with a high text detection error rate, for example by producing multiple pictures through various transformations to increase the amount of training data. Because training and labeling alternate continuously, the amount of training data required is greatly reduced while the training effect is maintained.

(b) Character recognition training data labeling: for the manually corrected character areas, statistics are dynamically accumulated, and data enhancement and intensive training are performed on those with a high error rate.

All labeling is performed on real documents.

According to the training sample generation method, unlabeled document pictures are detected and labeled directly by the detection model, the detection results are corrected manually, and the corrected results are used as training data to train the detection model. The original manual labeling of document pictures is thereby converted into manual error correction. Because training and labeling alternate continuously, the amount of training data required is greatly reduced while the training effect is maintained.
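The dynamic statistics over corrected and uncorrected labels can be illustrated with a small tally. This is a sketch under stated assumptions: grouping by a document type string, the function names `error_rates` and `types_to_enhance`, and the 0.2 threshold are all hypothetical, since the description only says that data with a high error rate is selected for enhancement.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def error_rates(reviews: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """reviews: (document_type, was_corrected) pairs gathered during manual correction."""
    totals: Dict[str, int] = defaultdict(int)
    corrected: Dict[str, int] = defaultdict(int)
    for doc_type, was_corrected in reviews:
        totals[doc_type] += 1
        if was_corrected:
            corrected[doc_type] += 1
    # Error rate = share of labels that a reviewer had to correct.
    return {t: corrected[t] / totals[t] for t in totals}


def types_to_enhance(reviews: Iterable[Tuple[str, bool]], threshold: float = 0.2) -> List[str]:
    """Document types whose error rate exceeds the (assumed) threshold."""
    rates = error_rates(reviews)
    return sorted(t for t, r in rates.items() if r > threshold)
```

For instance, if half of the "invoice" labels were corrected and none of the "receipt" labels were, only "invoice" documents would be queued for data enhancement.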

It should be understood that although the steps in the flow charts of fig. 1-2 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in fig. 1-2 may include multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

In one embodiment, as shown in fig. 3, there is provided a training sample generating apparatus including: an acquisition module 310, a text position detection module 320, and a training sample generation module 330, wherein:

the acquiring module 310 is used for acquiring a document to be detected and a manual marking position;

the text position detection module 320 is configured to input the document to be detected to the text position detection model, so as to obtain a document text position output by the text position detection model;

the training sample generation module 330 is configured to generate a detection model training sample according to the to-be-detected document if the document text position does not match the manually-labeled position; the detection model training sample is used for training the text position detection model.

In an embodiment, the training sample generating module 330 is further configured to obtain a first original training sample of the text position detection model; obtaining a first enhancement training sample by carrying out image enhancement on the document to be detected; the image enhancement comprises at least one of image warping, image stretching, and image tilting; and obtaining the detection model training sample according to the first original training sample and the first enhanced training sample.

In an embodiment, the training sample generating module 330 is further configured to perform image enhancement on the to-be-detected document to obtain an enhanced image of the to-be-detected document; obtaining the text position of the document enhancement image to be detected according to the manual marking position; and obtaining the first enhanced training sample according to the to-be-detected receipt enhanced image and the text position.
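Obtaining the text position of the enhanced image from the manual marking position amounts to applying the same geometric transformation to the labeled corner points as to the image. The sketch below assumes the enhancement can be written as an affine map, which covers stretching and tilting but not general warping; the names `stretch`, `tilt` and `transform_points` are hypothetical, modeling tilt as a rotation about the origin is one possible interpretation, and the image transformation itself is not shown.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Affine = Tuple[float, float, float, float, float, float]  # (a, b, c, d, tx, ty)


def stretch(sx: float, sy: float) -> Affine:
    """Scale x by sx and y by sy about the origin."""
    return (sx, 0.0, 0.0, sy, 0.0, 0.0)


def tilt(degrees: float) -> Affine:
    """Rotate by the given angle about the origin (one way to model image tilting)."""
    r = math.radians(degrees)
    return (math.cos(r), -math.sin(r), math.sin(r), math.cos(r), 0.0, 0.0)


def transform_points(points: Sequence[Point], m: Affine) -> List[Point]:
    """Map each labeled corner (x, y) to (a*x + b*y + tx, c*x + d*y + ty)."""
    a, b, c, d, tx, ty = m
    return [(a * x + b * y + tx, c * x + d * y + ty) for x, y in points]


# Manually labeled text box, given as four corners in the original image.
box = [(10.0, 20.0), (110.0, 20.0), (110.0, 50.0), (10.0, 50.0)]
stretched_box = transform_points(box, stretch(2.0, 1.0))
```

Because the labels are transformed rather than re-annotated, each enhanced copy of the document comes with its text positions at no additional labeling cost.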

In one embodiment, the training sample generating apparatus further includes:

the text position detection model training module is used for training the text position detection model according to the detection model training sample to obtain a target text position detection model;

the to-be-identified document acquisition module is used for acquiring a to-be-identified document;

the text position detection module is used for inputting the document to be identified to the target text position detection model to obtain a target text position output by the target text position detection model;

the intercepting module is used for intercepting a target text picture in the document to be identified according to the target text position;

the text recognition module is used for inputting the target text picture into a text recognition model to obtain text content output by the text recognition model;

the recognition condition counting module is used for counting the recognition condition of the text recognition model according to the text content;

the recognition model training sample generation module is used for generating a recognition model training sample according to the recognition condition; the recognition model training sample is used for training the text recognition model.

In an embodiment, the recognition model training sample generating module is further configured to obtain a second original training sample of the text recognition model if the recognition condition meets a preset sample enhancement condition; obtaining a second enhancement training sample by carrying out image enhancement on the target text picture; the image enhancement comprises at least one of image warping, image stretching, and image tilting; and obtaining the recognition model training sample according to the second original training sample and the second enhanced training sample.

In an embodiment, the recognition condition counting module is further configured to input the target text picture into a cross validation model, so as to obtain cross validation text content output by the cross validation model; counting the matching condition between the text content and the cross validation text content; the matching condition comprises at least one of complete matching, partial matching and non-matching; and obtaining the accuracy level according to the matching condition.

In an embodiment, the recognition condition counting module is further configured to input the target text picture into a cross validation model, so as to obtain cross validation text content output by the cross validation model; counting a first confidence coefficient of the text content and a second confidence coefficient of the cross-validation text content; and weighting the first confidence coefficient and the second confidence coefficient according to a preset first confidence coefficient weight and a preset second confidence coefficient weight to obtain the recognition confidence coefficient.

For specific limitations of the training sample generation device, reference may be made to the above limitations of the training sample generation method, which are not repeated here. The modules in the training sample generation device can be implemented wholly or partially by software, hardware, or a combination thereof. The modules can be embedded in, or independent of, a processor in the computer device in hardware form, or can be stored in a memory in the computer device in software form, so that the processor can call and execute the operations corresponding to the modules.

In one embodiment, a computer device is provided, which may be a server, the internal structure of which may be as shown in fig. 4. The computer device includes a processor, a memory, and a network interface connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The database of the computer device is used to store training sample generation data. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a training sample generation method.

Those skilled in the art will appreciate that the architecture shown in fig. 4 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or fewer components than those shown, or may combine certain components, or have a different arrangement of components.

In one embodiment, a computer device is provided, comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of a training sample generation method as described above. Here, the steps of a training sample generation method may be steps in a training sample generation method of the above embodiments.

In one embodiment, a computer-readable storage medium is provided, storing a computer program that, when executed by a processor, causes the processor to perform the steps of a training sample generation method as described above. Here, the steps of a training sample generation method may be steps in a training sample generation method of the above embodiments.

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing relevant hardware; the computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database or other medium used in the embodiments provided herein can include at least one of non-volatile and volatile memory. Non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical storage, or the like. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM), among others.

The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.

The above-mentioned embodiments express only several embodiments of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims

1. A method for generating training samples, the method comprising:

acquiring a document to be detected and a manual marking position;

if the document text position is not matched with the manual marking position, performing image enhancement processing on the document to be detected to generate a detection model training sample;

inputting the document to be identified into the target text position detection model to obtain a target text position output by the target text position detection model; intercepting a target text picture in the document to be identified according to the target text position;

inputting the target text picture into a text recognition model to obtain text content output by the text recognition model, and counting the recognition condition of the text recognition model;

if the recognition condition meets a preset sample enhancement condition, generating a recognition model training sample by carrying out image enhancement on the target text picture; the recognition model training sample is used for training the text recognition model.

2. The method according to claim 1, wherein the image enhancement processing of the document to be detected to generate multiple copies of the document to be detected, and the taking of the document to be detected and the multiple copies of the document to be detected as a detection model training sample comprises:

3. The method according to claim 2, wherein the obtaining a first enhanced training sample by image enhancement of the document to be detected comprises:

4. The method of claim 1, further comprising:

acquiring a document to be identified;

5. The method of claim 4, wherein generating recognition model training samples according to the recognition condition comprises:

6. The method of claim 4, wherein the recognition condition comprises an accuracy level; the counting the recognition condition of the text recognition model according to the text content comprises the following steps:

and obtaining the accuracy level according to the matching condition.

7. The method of claim 4, wherein the recognition condition further comprises a recognition confidence; the counting the recognition condition of the text recognition model according to the text content further comprises:

8. A training sample generation apparatus, the apparatus comprising:

the training sample generation module is used for carrying out image enhancement processing on the document to be detected if the document text position is not matched with the manual marking position, and generating a detection model training sample;

the training sample generation device is also used for training the text position detection model according to the detection model training sample to obtain a target text position detection model; inputting the document to be identified into the target text position detection model to obtain a target text position output by the target text position detection model; intercepting a target text picture in the document to be identified according to the target text position; inputting the target text picture into a text recognition model to obtain text content output by the text recognition model, and counting the recognition condition of the text recognition model; if the recognition condition meets a preset sample enhancement condition, generating a recognition model training sample by carrying out image enhancement on the target text picture; the recognition model training sample is used for training the text recognition model.

9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 7.

10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.