CN113159212A - OCR recognition model training method, device and computer readable storage medium - Google Patents


Info

Publication number
CN113159212A
Authority
CN
China
Prior art keywords
ocr recognition
image
recognition model
image sample
model training
Legal status
Pending
Application number
CN202110485412.9A
Other languages
Chinese (zh)
Inventor
邹锦富
杨皓
Current Assignee
Shanghai Yuncong Enterprise Development Co ltd
Original Assignee
Shanghai Yuncong Enterprise Development Co ltd
Application filed by Shanghai Yuncong Enterprise Development Co ltd
Priority to CN202110485412.9A
Publication of CN113159212A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of machine learning, and in particular to an OCR recognition model training method, an OCR recognition model training device and a computer-readable storage medium. It aims to solve the technical problem of how to label image samples conveniently and efficiently so that model training of an OCR recognition model can be completed quickly. To this end, the OCR recognition model training method of the embodiments of the present invention includes: acquiring first-type image samples with label data; training a preset OCR recognition model with the first-type image samples to obtain an initial OCR recognition model; using the initial OCR recognition model to recognize business data in second-type image samples that have no label data; generating label data from the recognition results and labeling the second-type image samples with it; and training the initial OCR recognition model with the first-type image samples and the labeled second-type image samples to obtain a final OCR recognition model. With this implementation, model training of the OCR recognition model can be completed quickly, and the labeling accuracy of the image samples is improved.

Description

OCR recognition model training method, device and computer readable storage medium
Technical Field
The invention relates to the technical field of machine learning, and in particular to an OCR recognition model training method, an OCR recognition model training device and a computer-readable storage medium.
Background
With the advent of the information age, more and more information is presented in the form of images. To convert the text contained in such images accurately into information that can be edited by a computer or other device, an OCR (Optical Character Recognition) model built on OCR technology may be used to detect text regions in an image, recognize the text in those regions, and convert the recognized text into editable information. Building such a model requires training on a large number of image samples annotated with correct label data (including but not limited to the text regions in the images and the text within those regions) so that the model achieves high OCR recognition capability.
However, as OCR application scenarios multiply and images from different scenarios differ more and more (for example, in the format of the text they contain), a single OCR recognition model cannot meet the recognition requirements of all scenarios at once. Building a dedicated OCR recognition model for each scenario instead requires labeling a large number of image samples for every model, and to ensure label accuracy this labeling is usually done manually, which is time-consuming, labor-intensive and error-prone. As a result, the labeling work cannot be completed conveniently and efficiently, and a usable dedicated OCR recognition model cannot be built quickly for each application scenario.
Accordingly, there is a need in the art for a new OCR recognition model training scheme that solves the above problems.
Disclosure of Invention
To overcome the above drawbacks, the present invention is proposed to solve, or at least partially solve, the technical problem of how to label image samples conveniently and efficiently so that model training of an OCR recognition model can be completed quickly.
In a first aspect, an OCR recognition model training method is provided, which includes:
acquiring first-type image samples with label data;
performing model training on a preset OCR recognition model using the first-type image samples to obtain an initial OCR recognition model;
performing OCR recognition on second-type image samples without label data using the initial OCR recognition model;
generating label data for the second-type image samples according to the OCR recognition results, and labeling the second-type image samples with the generated label data;
and performing model training on the initial OCR recognition model using the first-type image samples and the labeled second-type image samples to obtain a final OCR recognition model.
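The five steps above amount to a pseudo-labeling loop. The following is a minimal sketch under stated assumptions: the `OCRModel` protocol with `fit`/`predict`, the function `train_with_pseudo_labels`, and the `ToyModel` stand-in are all hypothetical names introduced for illustration, and `fit` is assumed to continue training from the model's current weights. The patent does not specify the model structure or training API.

```python
from typing import Any, List, Protocol, Sequence, Tuple

# (image, label data); both left abstract because the patent fixes no format.
Sample = Tuple[Any, Any]


class OCRModel(Protocol):
    """Hypothetical minimal model interface assumed for this sketch."""

    def fit(self, samples: Sequence[Sample]) -> None: ...

    def predict(self, image: Any) -> Any: ...


def train_with_pseudo_labels(
    model: OCRModel,
    labeled: Sequence[Sample],
    unlabeled: Sequence[Any],
) -> Tuple[OCRModel, List[Sample]]:
    # First-type samples: manually labeled (image, label data) pairs.
    first_type = list(labeled)
    # Train the preset model; afterwards it serves as the initial OCR model.
    model.fit(first_type)
    # Recognize each unlabeled second-type sample and keep the result as its label data.
    second_type = [(image, model.predict(image)) for image in unlabeled]
    # Continue training the initial model on both sample types (fit is assumed
    # to update the existing weights rather than start from scratch).
    model.fit(first_type + second_type)
    return model, second_type


class ToyModel:
    """Stand-in for a real OCR network: 'recognizes' text by upper-casing it."""

    def __init__(self) -> None:
        self.fit_sizes: List[int] = []

    def fit(self, samples: Sequence[Sample]) -> None:
        self.fit_sizes.append(len(samples))

    def predict(self, image: Any) -> Any:
        return str(image).upper()


model, pseudo = train_with_pseudo_labels(ToyModel(), [("ab", "AB"), ("cd", "CD")], ["ef", "gh"])
print(pseudo)           # [('ef', 'EF'), ('gh', 'GH')]
print(model.fit_sizes)  # [2, 4]
```

The second `fit` call sees both sample types at once, which mirrors the final step; in a real system the quality of the pseudo-labels depends entirely on how good the initial model is.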
In one technical solution of the OCR recognition model training method, the label data of both the first-type and the second-type image samples includes the position of each image recognition area, the business data recorded in each image recognition area, and the data type of that business data;
the first-type image samples with label data are obtained as follows:
in response to a received annotation instruction, acquiring annotation information of the image sample to be annotated that is specified in the annotation instruction, where the annotation information includes the position of each image recognition area in the image to be annotated and the business data recorded in each image recognition area together with its data type;
generating label data for the image sample to be annotated according to the annotation information, and labeling the image sample with the generated label data to obtain a first-type image sample with label data;
where the annotation information is determined from information that a user annotates on the image sample to be annotated through a visual interface.
In one technical solution of the OCR recognition model training method, the position of each image recognition area in the annotation information is determined from the area that the user frame-selects on the image sample to be annotated in the visual interface, and the business data and its data type in the annotation information are determined from the business data and data type that the user enters in the visual interface for each image recognition area.
In one technical solution of the OCR recognition model training method, after the step of "performing model training on the initial OCR recognition model to obtain a final OCR recognition model", the method further includes:
generating a download path for the final OCR recognition model according to its storage location;
generating and displaying release information of the final OCR recognition model according to the download path;
and/or, when first-type and second-type image samples from different business scenarios are used to train a separate initial OCR recognition model for each scenario, the step of "performing model training on the initial OCR recognition model" specifically includes:
generating a model training queue according to the training completion time of each initial OCR recognition model;
training the initial OCR recognition models one after another in the order they occupy in the model training queue;
and/or the step of "performing model training on the initial OCR recognition model" specifically includes:
displaying the model training progress of the initial OCR recognition model through a visual interface.
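The release step (deriving a download path from the model's storage location, then generating release information from that path) can be sketched as follows. This is a hypothetical illustration only: the base URL `https://models.example.com`, the `ReleaseInfo` record, and the functions `make_release_info` and `format_release_info` are assumptions, since the patent does not say how storage locations are mapped to download paths.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath

# Hypothetical download host; the patent does not say how paths are exposed.
BASE_URL = "https://models.example.com"


@dataclass
class ReleaseInfo:
    model_name: str
    storage_location: str
    download_path: str


def make_release_info(model_name: str, storage_location: str,
                      base_url: str = BASE_URL) -> ReleaseInfo:
    # Derive the download path from where the final model file is stored.
    relative = PurePosixPath(storage_location).as_posix().lstrip("/")
    download_path = f"{base_url.rstrip('/')}/{relative}"
    return ReleaseInfo(model_name, storage_location, download_path)


def format_release_info(info: ReleaseInfo) -> str:
    # Text that a visual interface could display as the release notice.
    return f"Model '{info.model_name}' released. Download: {info.download_path}"


info = make_release_info("bank_card_ocr", "/models/bank_card/final.pt")
print(format_release_info(info))
# Model 'bank_card_ocr' released. Download: https://models.example.com/models/bank_card/final.pt
```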
In a second aspect, an OCR recognition model training apparatus is provided, the OCR recognition model training apparatus comprising:
a sample acquisition module configured to acquire a first type of image sample with label data;
a first model training module configured to perform model training on a preset OCR recognition model by using the first type of image sample to obtain an initial OCR recognition model;
an attribute category prediction module configured to perform OCR recognition on second-type image samples without label data using the initial OCR recognition model;
a label labeling module configured to generate label data for the second-type image samples according to the OCR recognition results and to label the second-type image samples with the generated label data;
and a second model training module configured to perform model training on the initial OCR recognition model using the first-type image samples and the labeled second-type image samples to obtain a final OCR recognition model.
In one technical solution of the OCR recognition model training apparatus, the label data of both the first-type and the second-type image samples includes the position of each image recognition area, the business data recorded in each image recognition area, and the data type of that business data;
the sample acquisition module is further configured to perform the following operations:
in response to a received annotation instruction, acquiring annotation information of the image sample to be annotated that is specified in the annotation instruction, where the annotation information includes the position of each image recognition area in the image to be annotated and the business data recorded in each image recognition area together with its data type;
generating label data for the image sample to be annotated according to the annotation information, and labeling the image sample with the generated label data to obtain a first-type image sample with label data;
where the annotation information is determined from information that a user annotates on the image sample to be annotated through a visual interface.
In one technical solution of the OCR recognition model training apparatus, the position of each image recognition area in the annotation information is determined from the area that the user frame-selects on the image sample to be annotated in the visual interface, and the business data and its data type in the annotation information are determined from the business data and data type that the user enters in the visual interface for each image recognition area.
In one technical solution of the OCR recognition model training apparatus, the apparatus includes a model release module configured to perform the following operations:
generating a download path for the final OCR recognition model according to its storage location;
generating and displaying release information of the final OCR recognition model according to the download path;
and/or the second model training module includes a first model training unit and/or a second model training unit;
the first model training unit is configured, when first-type and second-type image samples from different business scenarios are used to train a separate initial OCR recognition model for each scenario, to train each initial OCR recognition model by performing the following operations:
generating a model training queue according to the training completion time of each initial OCR recognition model;
training the initial OCR recognition models one after another in the order they occupy in the model training queue;
the second model training unit is configured to display the model training progress of the initial OCR recognition model through a visual interface.
In a third aspect, a control device is provided, which includes a processor and a storage device. The storage device is adapted to store a plurality of program codes, and the program codes are adapted to be loaded and run by the processor to execute the OCR recognition model training method according to any one of the above technical solutions.
In a fourth aspect, a computer-readable storage medium is provided, which stores a plurality of program codes adapted to be loaded and run by a processor to execute the OCR recognition model training method according to any one of the above technical solutions.
One or more of the technical solutions of the invention have at least one or more of the following beneficial effects:
In the technical solutions of the invention, a preset OCR recognition model is first trained with first-type image samples that have label data, yielding an initial OCR recognition model. The initial OCR recognition model then recognizes second-type image samples that have no label data, and the second-type samples are labeled according to the recognition results, which determines their label data. Because the first-type samples carry accurate label data, the initial model trained on them has relatively strong OCR recognition capability, so its recognition results on the second-type samples (including but not limited to the positions of one or more image recognition areas in each sample, the business data recorded in each area and its data type) are also relatively accurate, and so is the label data generated from those results. In other words, with the OCR recognition model training method of the embodiments of the present invention, the initial OCR recognition model not only labels the second-type image samples automatically but also gives the resulting label data relatively high accuracy.
In practice, to ensure the accuracy of the first-type samples' label data, a small number of first-type image samples can be labeled manually; the training method of the embodiments then uses this small set to label a large number of second-type image samples automatically, which greatly reduces the manual labeling workload while keeping the second-type labels relatively accurate. Further, once the second-type samples have been labeled, the initial OCR recognition model can be retrained on the first-type and second-type samples together, which further improves its OCR recognition capability and yields the final OCR recognition model.
Drawings
The disclosure of the present invention will be more readily understood with reference to the accompanying drawings. Those skilled in the art will readily understand that these drawings are for illustrative purposes only and are not intended to limit the scope of protection of the present invention. In the drawings:
FIG. 1 is a flow diagram illustrating the main steps of a method for training OCR recognition models, according to one embodiment of the present invention;
FIG. 2 is a flow chart illustrating the main steps of an OCR recognition model training method according to another embodiment of the present invention;
FIG. 3 is a flow chart illustrating the main steps of a first type of image sample acquisition method according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a visual interface for real-time monitoring of the model training progress of an OCR recognition model according to one embodiment of the present invention;
FIG. 5 is a schematic diagram of a visual interface for real-time monitoring of the model training progress of an OCR recognition model according to another embodiment of the present invention;
FIG. 6 is a block diagram of the main structure of a model training apparatus for OCR recognition models according to an embodiment of the present invention;
FIG. 7 is a block diagram of the main structure of a model training apparatus for OCR recognition models according to another embodiment of the present invention;
List of reference numerals:
61: a sample acquisition module; 62: a first model training module; 63: an attribute category prediction module; 64: a label labeling module; 65: a second model training module; 71: a data processing module; 72: a model training module; 73: a model deployment verification module; 74: a module for configuring the model algorithm and outputting a recognition engine.
Detailed Description
Some embodiments of the invention are described below with reference to the accompanying drawings. It should be understood by those skilled in the art that these embodiments are only for explaining the technical principle of the present invention, and are not intended to limit the scope of the present invention.
In the description of the present invention, a "module" or "processor" may include hardware, software, or a combination of both. A module may comprise hardware circuitry, various suitable sensors, communication ports and memory, may comprise software components such as program code, or may be a combination of software and hardware. The processor may be a central processing unit, a microprocessor, an image processor, a digital signal processor, or any other suitable processor, and has data and/or signal processing functions. The processor may be implemented in software, hardware, or a combination thereof. Non-transitory computer-readable storage media include any suitable medium that can store program code, such as magnetic disks, hard disks, optical disks, flash memory, read-only memory and random-access memory. The term "A and/or B" denotes all possible combinations of A and B, namely A alone, B alone, or both A and B. The terms "at least one of A or B" and "at least one of A and B" have a meaning similar to "A and/or B" and may cover only A, only B, or both A and B. The singular forms "a", "an" and "the" may also include the plural forms.
A conventional OCR recognition model is usually a general-purpose recognition model. As more and more images from different application scenarios need to be recognized, such a model cannot recognize the images of every scenario in detail and accurately at the same time; that is, a general-purpose recognition model cannot adapt to the variety of specific images to be recognized, and its recognition performance is poor.
In the embodiments of the invention, a preset OCR recognition model can first be trained with first-type image samples that have label data to obtain an initial OCR recognition model; the initial model then recognizes second-type image samples that have no label data, and the second-type samples are labeled according to the recognition results, which determines their label data. Because the first-type samples carry accurate label data, the initial model trained on them has relatively strong OCR recognition capability, so its recognition results on the second-type samples (including but not limited to the positions of one or more image recognition areas in each sample, the business data recorded in each area and its data type) are also relatively accurate, and so is the label data generated from those results. In other words, with the OCR recognition model training method of the embodiments of the present invention, the initial OCR recognition model not only labels the second-type image samples automatically but also gives the resulting label data relatively high accuracy.
In practice, to ensure the accuracy of the first-type samples' label data, a small number of first-type image samples can be labeled manually; the training method of the embodiments then uses this small set to label a large number of second-type image samples automatically, which greatly reduces the manual labeling workload while keeping the second-type labels relatively accurate. Further, once the second-type samples have been labeled, the initial OCR recognition model can be retrained on the first-type and second-type samples together, which further improves the recognition capability of the model for the required scenario and yields the final OCR recognition model.
Referring to FIG. 1, which is a flow chart illustrating the main steps of an OCR recognition model training method according to an embodiment of the present invention, the method mainly includes the following steps:
Step S101: acquire first-type image samples with label data.
An image recognition area is an image region that contains information to be recognized, i.e., a target region for OCR recognition on a first-type image sample. Business data is the data recorded in an image recognition area, i.e., the target data of OCR recognition on the first-type image sample. For example, if a first-type image sample is a bank card image and the card number on it needs to be recognized, the region of the image containing the card number can be selected as an image recognition area, its position obtained, and the data type of the card number set to "card number". The label data of the bank card image then includes "the position of the region containing the card number", "the card number", and the data type "card number".
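The bank card example can be written out as a concrete label record. The sketch below is a hypothetical in-memory layout, not a format defined by the patent: the class names `RecognitionArea` and `LabelData`, the `(x_min, y_min, x_max, y_max)` box convention, the file name, the coordinates and the card number are all placeholders.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Tuple

# (x_min, y_min, x_max, y_max) in pixels: an assumed box convention.
Box = Tuple[int, int, int, int]


@dataclass
class RecognitionArea:
    box: Box            # position of the image recognition area
    business_data: str  # data recorded in the area
    data_type: str      # data type of the business data


@dataclass
class LabelData:
    image_id: str
    areas: List[RecognitionArea] = field(default_factory=list)


label = LabelData(
    image_id="bank_card_0001.jpg",  # hypothetical file name
    areas=[RecognitionArea(box=(40, 180, 560, 220),            # placeholder coordinates
                           business_data="6222000000000000",  # placeholder card number
                           data_type="card number")],
)
print(asdict(label))
```

One image may hold several recognition areas (card number, holder name, expiry date), which is why `areas` is a list.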
In one implementation of the embodiment of the present invention, the label data of a first-type image sample may include the position of each image recognition area in the sample, the business data recorded in each image recognition area and its data type. In this implementation, first-type image samples with label data may be obtained through steps S301 to S302 shown in FIG. 3.
Step S301: in response to a received annotation instruction, acquire annotation information of the image sample to be annotated that is specified in the annotation instruction. The annotation information may include the position of each image recognition area in the image to be annotated, the business data recorded in each image recognition area and its data type.
Step S302: generate label data for the image sample to be annotated according to the annotation information, and label the image sample with the generated label data to obtain a first-type image sample with label data.
The annotation information can be determined from information that a user annotates on the image sample to be annotated through a visual interface. Specifically, in this embodiment, the position of each image recognition area in the annotation information is determined from the area that the user frame-selects on the image sample in the visual interface: the position of the selected area may be used directly as the position of the image recognition area, or a position obtained by scaling the selected area may be used instead. The business data and its data type in the annotation information are determined from the business data and data type that the user enters in the visual interface for each image recognition area.
The user annotates the first-type image samples manually in the visual interface, selecting each item to be recognized, such as a name, date, number or letters, and recording its position and true content. Samples manually annotated in this way, which contain the true information, are used as the first-type image samples.
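Converting a frame selection into an image recognition area position, either directly or after scaling, might look like the sketch below. Two assumptions are flagged: "scaling" is read here as resizing the box about its centre by a factor (the patent gives no formula), and the optional clamping to the image bounds is an addition for safety. The function names `area_from_selection` and `annotation_to_label` are hypothetical.

```python
from typing import Optional, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), assumed convention


def area_from_selection(selection: Box, scale: float = 1.0,
                        image_size: Optional[Tuple[float, float]] = None) -> Box:
    """Turn a frame-selected box into an image recognition area position.

    scale == 1.0 uses the selection directly; other values resize it about its
    centre (one possible reading of "scaling"; the patent gives no formula).
    """
    x0, y0, x1, y1 = selection
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    half_w, half_h = (x1 - x0) * scale / 2, (y1 - y0) * scale / 2
    box = (cx - half_w, cy - half_h, cx + half_w, cy + half_h)
    if image_size is not None:
        # Keep a resized box inside the image.
        width, height = image_size
        box = (max(0.0, box[0]), max(0.0, box[1]),
               min(float(width), box[2]), min(float(height), box[3]))
    return box


def annotation_to_label(selection: Box, business_data: str, data_type: str,
                        scale: float = 1.0,
                        image_size: Optional[Tuple[float, float]] = None) -> dict:
    # One label entry per image recognition area the user annotated.
    return {"box": area_from_selection(selection, scale, image_size),
            "business_data": business_data,
            "data_type": data_type}


print(annotation_to_label((10, 20, 30, 40), "Zhang San", "name"))
print(area_from_selection((10, 20, 30, 40), scale=2.0, image_size=(35, 45)))  # (0.0, 10.0, 35.0, 45.0)
```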
Step S102: perform model training on a preset OCR recognition model using the first-type image samples to obtain an initial OCR recognition model.
After the first-type image samples are prepared, they are used for a first round of training of the preset OCR recognition model so that it acquires a certain OCR recognition capability. It should be noted that, in the embodiments of the present invention, the preset OCR recognition model may be built with a model structure that is conventional in the OCR field, and it may be trained on the first-type image samples with a conventional model training method. For brevity, the model structure of the preset OCR recognition model and the applicable training methods are not described further here.
Step S103: perform OCR recognition on second-type image samples without label data using the initial OCR recognition model.
The trained initial OCR recognition model can recognize the second-type image samples to a certain degree, so it is used to recognize the second-type samples without label data and produce recognition results. As described in step S101, the label data of a first-type image sample may include the position of each image recognition area in the sample, the business data recorded in each area and its data type. An initial model trained on such samples is therefore able to locate the image recognition areas in an image to be recognized and to recognize the business data recorded in each area and its data type. That is, in this embodiment, the result of recognizing a second-type image sample with the initial OCR recognition model may include the positions of one or more image recognition areas in the sample, the business data recorded in each area and its data type. The terms "position of the image recognition area", "business data recorded in the image recognition area" and "data type" have the same meanings for second-type samples as in step S101 and are not repeated here.
Step S104: generate label data for the second-type image samples according to the OCR recognition results, and label the second-type image samples with the generated label data. It should be noted that, in the embodiments of the present invention, a conventional label data generation method from the data processing field may be used to generate the label data of a second-type image sample from "the positions of one or more image recognition areas in the sample, the business data recorded in each area and its data type"; for brevity, it is not described further here.
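Since the patent leaves label generation to conventional methods, here is one hedged way to turn the initial model's output into a serializable label record. The tuple layout of a recognition result, the function name `recognition_to_label`, the `label_source` marker and the sample values are all assumptions made for illustration.

```python
import json
from typing import Any, Dict, Iterable, Sequence, Tuple

# One recognized area: (box, business_data, data_type). Format is assumed.
Recognized = Tuple[Sequence[float], str, str]


def recognition_to_label(image_id: str, results: Iterable[Recognized]) -> Dict[str, Any]:
    """Build label data for a second-type sample from the initial model's output."""
    return {
        "image_id": image_id,
        "areas": [
            {"box": list(box), "business_data": text, "data_type": dtype}
            for box, text, dtype in results
        ],
        # Hypothetical marker so machine-generated labels can be told apart later.
        "label_source": "initial_model",
    }


label = recognition_to_label(
    "id_card_0042.jpg",  # hypothetical file name
    [((120, 310, 420, 350), "1990-01-01", "date of birth")],
)
print(json.dumps(label, ensure_ascii=False))
```

Keeping a source marker is a design choice, not something the patent requires; it makes it easy to audit or re-weight machine-generated labels when the first-type and second-type samples are mixed for retraining.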
Step S105: perform model training on the initial OCR recognition model using the first-type image samples and the labeled second-type image samples to obtain a final OCR recognition model.
The initial OCR recognition model is trained on both the image samples carrying manually annotated true information and the image samples labeled with the initial model's recognition results. A recognition result of the initial OCR recognition model may include the recognized location, category, specific content and so on. For example, if the true category of a sample is an identity card but the recognition result says bank card, this is an obvious category recognition error; if the recognized location is the date-of-birth field of the identity card and the true location is also that field, the recognized location is correct.
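The category and location checks in the example above can be expressed in code. This sketch swaps in intersection over union (IoU) with a 0.5 threshold as the location criterion; that metric and threshold are assumptions, because the patent does not define when a recognized location counts as correct. The names `iou` and `check_result` and the coordinates are illustrative.

```python
from typing import Dict, Sequence

Box = Sequence[float]  # (x_min, y_min, x_max, y_max), assumed convention


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def check_result(pred: Dict, truth: Dict, iou_threshold: float = 0.5) -> Dict[str, bool]:
    # The 0.5 threshold is an assumption; the patent does not define "correct location".
    return {
        "category_correct": pred["category"] == truth["category"],
        "location_correct": iou(pred["box"], truth["box"]) >= iou_threshold,
    }


truth = {"category": "identity card", "box": (100, 200, 300, 240)}
pred = {"category": "bank card", "box": (102, 201, 298, 241)}
print(check_result(pred, truth))  # {'category_correct': False, 'location_correct': True}
```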
In one implementation of this embodiment, the trained final OCR recognition model may, in addition to performing OCR recognition on images, be given the ability to evaluate how difficult it is to acquire information from a sample. For example, if the noise level of a sample is too high, or the sample is damaged or its surface is heavily stained, recognition by the OCR recognition model is affected; in such cases the model may also output an evaluation of the acquisition difficulty, such as a confidence for the acquired information, and when this confidence is too low, the OCR recognition model may issue a prompt to the user.
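A minimal Python sketch of the low-confidence prompt described above, assuming the model reports a confidence in the range [0, 1]. The 0.6 threshold, the message text, and the names `RecognitionOutput` and `acquisition_prompt` are hypothetical; the patent does not fix any of them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RecognitionOutput:
    text: str
    confidence: float  # assumed to lie in [0, 1]


def acquisition_prompt(output: RecognitionOutput, threshold: float = 0.6) -> Optional[str]:
    """Return a prompt for the user when the confidence is too low, otherwise None."""
    if not 0.0 <= output.confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    if output.confidence < threshold:
        return (f"Low recognition confidence ({output.confidence:.2f}); "
                "the image may be noisy, damaged or stained. Please capture it again.")
    return None


print(acquisition_prompt(RecognitionOutput("6222 0000", 0.3)))
```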
Application scenarios of the OCR recognition model in this embodiment include, but are not limited to, card recognition and bill recognition. Card recognition may include bank card, driver's license, and identity card recognition. In this embodiment, an OCR recognition model dedicated to each application scenario can be obtained by training on image samples from that scenario. Further, in one implementation, when first-type and second-type image samples from different business scenarios are used to train a separate initial OCR recognition model for each scenario, step S105 may train the initial OCR recognition models through the following steps 1 and 2 to obtain the final OCR recognition models:
Step 1: Generate a model training queue according to the training completion time of each initial OCR recognition model.
Step 2: Train the initial OCR recognition models one after another, in the order in which they appear in the model training queue.
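The patent does not say how the training completion time orders the queue. The following Python sketch assumes that a model whose initial training finished earlier is retrained earlier; `InitialModel`, `build_training_queue`, `run_training_queue`, and the `retrain` callback are hypothetical names, and the timestamps are purely illustrative.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, List


@dataclass
class InitialModel:
    scenario: str           # business scenario, e.g. "bank_card"
    completed_at: datetime  # when training of this initial model finished


def build_training_queue(models: List[InitialModel]) -> List[InitialModel]:
    # Assumption: earlier completion time means an earlier place in the queue.
    # sorted() is stable, so models with equal times keep their original order.
    return sorted(models, key=lambda m: m.completed_at)


def run_training_queue(queue: List[InitialModel],
                       retrain: Callable[[InitialModel], None]) -> List[str]:
    trained = []
    for model in queue:  # strictly sequential: one model at a time
        retrain(model)
        trained.append(model.scenario)
    return trained


models = [
    InitialModel("driver_license", datetime(2024, 1, 1, 12, 0)),
    InitialModel("bank_card", datetime(2024, 1, 1, 9, 30)),
    InitialModel("id_card", datetime(2024, 1, 1, 10, 15)),
]
order = run_training_queue(build_training_queue(models), retrain=lambda m: None)
print(order)  # ['bank_card', 'id_card', 'driver_license']
```

Running the queue strictly one model at a time is what lets a single training resource serve several business scenarios without contention.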
In addition, in this embodiment, the model training progress of the initial OCR recognition model can be displayed on a visual interface, so that the user can monitor and control the training in real time and achieve the desired training effect.
By managing model training as a queue, this embodiment can complete the training of multiple initial OCR recognition models, so that each of them meets the recognition requirements of a different type of sample.
With the OCR recognition model training method of steps S101 to S105 above, label data can be generated automatically for second-type image samples that have none, and the generated label data can have relatively high accuracy. This greatly reduces the manual labeling workload while still giving the second-type image samples relatively accurate label data. Once the second-type image samples are labeled, the initial OCR recognition model can be retrained on the labeled first-type image samples together with the automatically labeled second-type image samples, further improving its OCR recognition capability so that an OCR recognition model meeting the requirements is obtained.
Further, in another embodiment of the OCR recognition model training method according to the present invention, the method may include, in addition to steps S101 to S105 of the foregoing embodiment, steps S206 and S207 shown in fig. 2.
Step S206: Generate a download path for the final OCR recognition model according to its storage location.
Step S207: Generate and display release information for the final OCR recognition model according to the download path.
In this embodiment, the trained OCR recognition model is stored at a preset location and a download path is generated, so that in any scenario where it is needed, a user can download the trained model to an electronic device or computer through that path and recognize image samples of the specific type without additional training, which saves time.
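Steps S206 and S207 can be sketched in Python as follows. The base URL `https://models.example.com`, the storage-key layout, and the wording of the release text are hypothetical; the patent only says that a download path is derived from the storage location and then used to build release information.

```python
from urllib.parse import quote


def make_download_path(base_url: str, storage_key: str) -> str:
    """S206 (sketch): build a download URL from the final model's storage location.

    quote() keeps "/" but percent-encodes characters such as spaces.
    """
    return base_url.rstrip("/") + "/" + quote(storage_key.lstrip("/"))


def make_release_info(name: str, version: str, download_path: str) -> str:
    """S207 (sketch): text shown to the user when the model is released."""
    return f"{name} {version} released; download: {download_path}"


path = make_download_path("https://models.example.com/", "/ocr/bank card/v1/model.bin")
print(path)  # https://models.example.com/ocr/bank%20card/v1/model.bin
print(make_release_info("bank-card-ocr", "v1", path))
```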
In one implementation of this embodiment, a user may save and release the final OCR recognition model to obtain an OCR recognition model product adapted to a particular usage scenario. In one usage scenario, the user may view the training progress of an initial OCR recognition model through the visual real-time training-progress monitoring interfaces shown in figs. 4 and 5. In fig. 4, the abscissa of the curve is the number of training iterations and the ordinate is the value of the loss function used for training; in fig. 5, the abscissa is the number of training iterations and the ordinate is the accuracy of the model's recognition results. From these two curves the user can clearly see the current training progress of the OCR recognition model and whether the loss value and the recognition accuracy have reached the required standard, and can choose to continue or stop training according to the real-time progress.
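A minimal Python sketch of the data behind the two monitoring curves and a user-defined "reached the standard" check. The class name `ProgressMonitor` and the target values are hypothetical, and the loss and accuracy numbers in the usage loop are made up for illustration, not measured results.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ProgressMonitor:
    target_loss: float      # user-defined standard for the loss value (assumed)
    target_accuracy: float  # user-defined standard for recognition accuracy (assumed)
    history: List[Tuple[int, float, float]] = field(default_factory=list)

    def record(self, iteration: int, loss: float, accuracy: float) -> None:
        # One point on each curve: x = training iteration, y = loss / accuracy
        self.history.append((iteration, loss, accuracy))

    def curves(self) -> Tuple[List[int], List[float], List[float]]:
        xs = [it for it, _, _ in self.history]
        losses = [loss for _, loss, _ in self.history]
        accs = [acc for _, _, acc in self.history]
        return xs, losses, accs

    def meets_standard(self) -> bool:
        if not self.history:
            return False
        _, loss, accuracy = self.history[-1]
        return loss <= self.target_loss and accuracy >= self.target_accuracy


monitor = ProgressMonitor(target_loss=0.1, target_accuracy=0.95)
# Illustrative values only
for step, (loss, acc) in enumerate([(0.9, 0.41), (0.35, 0.82), (0.08, 0.96)], start=1):
    monitor.record(step, loss, acc)
    if monitor.meets_standard():
        print(f"standard reached at iteration {step}")  # iteration 3
        break
```

The `curves()` output is the kind of data a plotting front end would draw as figs. 4 and 5; stopping is left to the caller, matching the user-controlled stop described above.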
It should be noted that, although the foregoing embodiments describe each step in a specific sequence, those skilled in the art will understand that, in order to achieve the effect of the present invention, different steps do not necessarily need to be executed in such a sequence, and they may be executed simultaneously (in parallel) or in other sequences, and these changes are all within the protection scope of the present invention.
Furthermore, the invention also provides an OCR recognition model training apparatus.
Referring to fig. 6, fig. 6 is a block diagram of the main structure of an OCR recognition model training apparatus according to an embodiment of the present invention. As shown in fig. 6, the apparatus mainly includes a sample acquisition module 61, a first model training module 62, an attribute category prediction module 63, a label labeling module 64, and a second model training module 65. In some embodiments, one or more of these modules may be combined into a single module. In some embodiments, the sample acquisition module 61 may be configured to acquire first-type image samples with label data. The first model training module 62 may be configured to train a preset OCR recognition model with the first-type image samples to obtain an initial OCR recognition model. The attribute category prediction module 63 may be configured to perform OCR recognition on unlabeled second-type image samples using the initial OCR recognition model. The label labeling module 64 may be configured to generate label data of the second-type image samples according to the OCR recognition result and to label the second-type image samples with the generated label data. The second model training module 65 may be configured to train the initial OCR recognition model with the first-type image samples and the labeled second-type image samples to obtain a final OCR recognition model. In one embodiment, for a description of the specific functions, refer to steps S101 to S105.
In one embodiment, the label data may include the position of each image recognition area in the first-type image sample, the business data recorded in each image recognition area, and its data category, and the sample acquisition module 61 may be further configured to perform the following operations:
in response to a received annotation instruction, acquiring annotation information of the image sample to be annotated that is specified in the instruction, where the annotation information may include the position of each image recognition area in the image sample to be annotated, the business data recorded in each image recognition area, and its data category; generating label data of the image sample to be annotated according to the annotation information, and labeling the image sample with the generated label data to obtain a first-type image sample with label data. The annotation information may be determined from the information that a user annotates on the image sample through a visual interface. In one embodiment, for a description of the specific functions, refer to step S101.
In one embodiment, the position of each image recognition area in the annotation information is determined from the area that the user selects by drawing a box on the image sample in the visual interface, and the business data and its category in the annotation information are determined from the business data and category that the user enters in the visual interface for each image recognition area. In one embodiment, for a description of the specific functions, refer to step S101.
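The box-selection annotation above can be sketched in Python as follows, assuming the visual interface reports the drag start and end points in image pixel coordinates. The names `normalize_box` and `build_manual_label` and the dictionary label layout are hypothetical; clipping the box to the image bounds is an added assumption, not something the patent specifies.

```python
from typing import List, Tuple

Point = Tuple[int, int]
Box = Tuple[int, int, int, int]


def normalize_box(start: Point, end: Point, width: int, height: int) -> Box:
    """Turn a drag from `start` to `end` (any direction) into
    (x_min, y_min, x_max, y_max), clipped to the image bounds."""
    x_min, x_max = sorted((start[0], end[0]))
    y_min, y_max = sorted((start[1], end[1]))
    x_min, y_min = max(0, x_min), max(0, y_min)
    x_max, y_max = min(width, x_max), min(height, y_max)
    if x_max <= x_min or y_max <= y_min:
        raise ValueError("the selected box is empty")
    return (x_min, y_min, x_max, y_max)


def build_manual_label(image_id: str, width: int, height: int,
                       annotations: List[Tuple[Point, Point, str, str]]) -> dict:
    """Each annotation: drag start, drag end, business data typed by the user,
    and the data category chosen by the user."""
    regions = [
        {"box": normalize_box(start, end, width, height), "text": text, "category": category}
        for start, end, text, category in annotations
    ]
    return {"image_id": image_id, "regions": regions, "source": "manual"}


label = build_manual_label(
    "id_card_0001.jpg", 640, 400,
    [((300, 150), (40, 120), "1990-01-01", "date_of_birth")],
)
print(label["regions"][0]["box"])  # (40, 120, 300, 150)
```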
In one embodiment, the OCR recognition model training apparatus shown in fig. 6 may further include a model issuing module, and in this embodiment, the model issuing module may be configured to perform the following operations:
generating a download path for the final OCR recognition model according to its storage location; and generating and displaying release information for the final OCR recognition model according to the download path.
In one embodiment, the second model training module may include a first model training unit and/or a second model training unit.
The first model training unit may be configured, when first-type and second-type image samples from different business scenarios are used to train a separate initial OCR recognition model for each scenario, to train each initial OCR recognition model by performing the following operations: generating a model training queue according to the training completion time of each initial OCR recognition model; and training the initial OCR recognition models one after another in the order given by the model training queue. In one embodiment, for a description of the specific functions, refer to steps 1 and 2 of step S105.
In one embodiment, the second model training unit may be configured to display the model training progress of the initial OCR recognition model through the visual interface. In one embodiment, for a description of the specific functions, refer to the description of step S105.
The OCR recognition model training apparatus is used to execute the OCR recognition model training method embodiment shown in fig. 1, and the two are similar in technical principle, in the technical problems solved, and in the technical effects produced. Those skilled in the art will understand that, for convenience and brevity of description, the specific working process of the apparatus and the related descriptions can be found in the method embodiment and are not repeated here.
Furthermore, the invention also provides an OCR recognition model training apparatus according to another embodiment.
Referring to fig. 7, fig. 7 is a main block diagram of an OCR recognition model training apparatus according to another embodiment of the present invention. As shown in fig. 7, the OCR recognition model training apparatus in the embodiment of the present invention mainly includes:
a data processing module 71, a model training module 72, a model deployment verification module 73, and a model algorithm configuration and recognition engine output module 74.
In some embodiments, the data processing module 71 provides part of the functions of the sample acquisition module 61 in fig. 6 and can acquire annotation data and annotate images; the model training module 72 provides part of the functions of the first model training module 62, the attribute category prediction module 63, and the label labeling module 64, and can complete the training of the initial OCR recognition model; the model deployment verification module 73 provides part of the functions of the second model training module 65, and can retrain the initial OCR recognition model to improve the recognition accuracy of the OCR recognition model;
in addition, the functions performed by the model algorithm configuration and recognition engine output module 74 in the configuration mode correspond to steps S206 and S207, and are not described again here for brevity.
Those skilled in the art will understand that all or part of the flow of the method in the above embodiment may be implemented by a computer program, which may be stored in a computer-readable storage medium and which, when executed by a processor, implements the steps of the above method embodiments. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include any entity or device capable of carrying the computer program code, such as a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, computer memory, read-only memory, random access memory, electrical carrier signals, telecommunication signals, and software distribution media. It should be noted that what the computer-readable medium may contain can be appropriately expanded or restricted according to the requirements of legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, under legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunication signals.
Furthermore, the invention also provides a control device. In an embodiment of the control device according to the present invention, the control device includes a processor and a storage device. The storage device may be configured to store a program for executing the OCR recognition model training method of the above method embodiment, and the processor may be configured to execute the programs in the storage device, which include but are not limited to that program. For convenience of explanation, only the parts related to the embodiments of the present invention are shown, and specific technical details are not disclosed. The control device may be a device formed of various electronic devices.
Further, the invention also provides a computer-readable storage medium. In an embodiment of the computer-readable storage medium according to the present invention, the storage medium may be configured to store a program for executing the OCR recognition model training method of the above method embodiment, and the program may be loaded and executed by a processor to implement that method. For convenience of explanation, only the parts related to the embodiments of the present invention are shown, and specific technical details are not disclosed. The computer-readable storage medium may be a storage device formed of various electronic devices; optionally, in this embodiment it is a non-transitory computer-readable storage medium.
Further, it should be understood that, since the modules are set up only to illustrate the functional units of the apparatus of the present invention, the physical device corresponding to a module may be the processor itself, or part of the software, part of the hardware, or part of a combination of software and hardware in the processor. The number of modules shown in the figures is therefore merely illustrative.
Those skilled in the art will appreciate that the various modules in the apparatus may be adaptively split or combined. Such splitting or combining of specific modules does not cause the technical solutions to deviate from the principle of the present invention, and therefore, the technical solutions after splitting or combining will fall within the protection scope of the present invention.
So far, the technical solutions of the present invention have been described with reference to the embodiments shown in the drawings; however, those skilled in the art will readily understand that the scope of protection of the present invention is clearly not limited to these specific embodiments. Without departing from the principle of the invention, those skilled in the art can make equivalent changes or substitutions to the related technical features, and the technical solutions after such changes or substitutions will fall within the scope of protection of the invention.

Claims (10)

1. An OCR recognition model training method, the method comprising:
acquiring a first type of image sample with label data;
performing model training on a preset OCR (optical character recognition) recognition model by using the first type of image sample to obtain an initial OCR recognition model;
performing OCR recognition on a second type of image sample without label data by using the initial OCR recognition model;
generating label data of the second type image sample according to the OCR recognition result, and labeling the second type image sample according to the generated label data;
and performing model training on the initial OCR recognition model by using the first type of image sample and the labeled second type of image sample to obtain a final OCR recognition model.
2. An OCR recognition model training method according to claim 1, wherein the label data of the first type image sample and the second type image sample each include a position of an image recognition area, business data recorded in each image recognition area, and a data category thereof;
the first type of image sample with the label data is obtained by the following method:
responding to a received annotation instruction, and acquiring annotation information of an image sample to be annotated specified in the annotation instruction, wherein the annotation information comprises the position of each image recognition area in the image sample to be annotated, and the business data recorded in each image recognition area and its data category;
generating label data of the image sample to be labeled according to the labeling information and labeling the image sample to be labeled according to the generated label data to obtain a first type of image sample with the label data;
the annotation information is determined according to information which is annotated on the image sample to be annotated by a user through a visual interface.
3. An OCR recognition model training method according to claim 2, wherein the position of the image recognition area in the annotation information is determined according to the position of the area selected by the user on the image sample to be annotated by means of frame selection on the visual interface, and the business data and the category thereof in the annotation information are determined according to the business data and the category thereof entered by the user for each image recognition area on the visual interface.
4. An OCR recognition model training method according to any one of claims 1 to 3, further comprising, after the step of "performing model training on the initial OCR recognition model to obtain a final OCR recognition model":
generating a download path of the final OCR recognition model according to the storage position of the final OCR recognition model;
generating and displaying release information of the final OCR recognition model according to the download path;
and/or,
when first-type image samples and second-type image samples from different business scenarios are used to respectively train initial OCR recognition models corresponding to the business scenarios, the step of "performing model training on the initial OCR recognition model" specifically comprises:
generating a model training queue according to the training completion time corresponding to each initial OCR recognition model;
sequentially carrying out model training on each initial OCR recognition model according to the training sequence corresponding to each initial OCR recognition model in the model training queue;
and/or,
the step of "performing model training on the initial OCR recognition model" specifically includes:
displaying the model training progress of the initial OCR recognition model through a visual interface.
5. An OCR recognition model training apparatus, the apparatus comprising:
a sample acquisition module configured to acquire a first type of image sample with label data, wherein the label data comprises the position of each image recognition area in the first type of image sample, the business data recorded in each image recognition area, and the data category of the business data;
a first model training module configured to perform model training on a preset OCR recognition model by using the first type of image sample to obtain an initial OCR recognition model;
an attribute category prediction module configured to perform OCR recognition on a second type of image sample of unlabeled data using the initial OCR recognition model;
a label labeling module configured to generate label data of the second type image sample according to a result of OCR recognition, and label the second type image sample according to the generated label data, wherein the result of OCR recognition includes a position of one or more image recognition areas in the second type image sample, business data recorded in each image recognition area, and a data category thereof;
and a second model training module configured to perform model training on the initial OCR recognition model by using the first type of image sample and the labeled second type of image sample to obtain a final OCR recognition model.
6. An OCR recognition model training apparatus as claimed in claim 5, wherein the label data of the first type image sample and the second type image sample each comprise the location of an image recognition area, the business data recorded in each image recognition area and its data category;
the sample acquisition module is further configured to perform the following operations:
responding to a received annotation instruction, and acquiring annotation information of an image sample to be annotated specified in the annotation instruction, wherein the annotation information comprises the position of each image recognition area in the image sample to be annotated, and the business data recorded in each image recognition area and its data category;
generating label data of the image sample to be labeled according to the labeling information and labeling the image sample to be labeled according to the generated label data to obtain a first type of image sample with the label data;
the annotation information is determined according to information which is annotated on the image sample to be annotated by a user through a visual interface.
7. An OCR recognition model training apparatus according to claim 6, wherein the position of the image recognition area in the annotation information is determined according to the position of the area selected by the user on the image sample to be annotated by means of frame selection on the visual interface, and the business data and the category thereof in the annotation information are determined according to the business data and the category thereof entered by the user for each image recognition area on the visual interface.
8. An OCR recognition model training apparatus according to any one of claims 5 to 7, wherein the apparatus further comprises a model issuing module configured to perform:
generating a download path of the final OCR recognition model according to the storage position of the final OCR recognition model;
generating and displaying release information of the final OCR recognition model according to the download path;
and/or,
the second model training module comprises a first model training unit and/or a second model training unit;
the first model training unit is configured to, when initial OCR recognition models corresponding to the business scenarios are obtained by respectively training first-type image samples and second-type image samples from different business scenarios, perform model training on each initial OCR recognition model by performing the following operations:
generating a model training queue according to the training completion time corresponding to each initial OCR recognition model;
sequentially carrying out model training on each initial OCR recognition model according to the training sequence corresponding to each initial OCR recognition model in the model training queue;
the second model training unit is configured to display a model training progress of the initial OCR recognition model through a visualization interface.
9. A control apparatus comprising a processor and a storage device, the storage device being adapted to store a plurality of program codes, wherein the program codes are adapted to be loaded and run by the processor to perform the OCR recognition model training method of any of claims 1 to 4.
10. A computer readable storage medium having stored therein a plurality of program codes, characterized in that the program codes are adapted to be loaded and executed by a processor to perform the OCR recognition model training method according to any one of claims 1 to 4.
CN202110485412.9A 2021-04-30 2021-04-30 OCR recognition model training method, device and computer readable storage medium Pending CN113159212A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110485412.9A CN113159212A (en) 2021-04-30 2021-04-30 OCR recognition model training method, device and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN113159212A true CN113159212A (en) 2021-07-23

Family

ID=76873109

Country Status (1)

Country Link
CN (1) CN113159212A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108960232A (en) * 2018-06-08 2018-12-07 Oppo广东移动通信有限公司 Model training method, device, electronic equipment and computer readable storage medium
CN110796143A (en) * 2019-10-31 2020-02-14 天津大学 Scene text recognition method based on man-machine cooperation
CN110909780A (en) * 2019-11-14 2020-03-24 腾讯科技(深圳)有限公司 Image recognition model training and image recognition method, device and system
CN111539309A (en) * 2020-04-21 2020-08-14 广州云从鼎望科技有限公司 Data processing method, system, platform, equipment and medium based on OCR
CN111783993A (en) * 2019-05-23 2020-10-16 北京京东尚科信息技术有限公司 Intelligent labeling method and device, intelligent platform and storage medium
CN112418304A (en) * 2020-11-19 2021-02-26 北京云从科技有限公司 OCR (optical character recognition) model training method, system and device
CN112434794A (en) * 2020-11-30 2021-03-02 国电南瑞科技股份有限公司 Computer vision data set semi-automatic labeling method and system based on deep learning

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023015922A1 (en) * 2021-08-13 2023-02-16 北京百度网讯科技有限公司 Image recognition model training method and apparatus, device, and storage medium
CN113673501A (en) * 2021-08-23 2021-11-19 广东电网有限责任公司 OCR classification method, system, electronic device and storage medium
CN113673501B (en) * 2021-08-23 2023-01-13 广东电网有限责任公司 OCR classification method, system, electronic device and storage medium
CN113781607A (en) * 2021-09-17 2021-12-10 平安科技(深圳)有限公司 Method, device and equipment for processing annotation data of OCR (optical character recognition) image and storage medium
CN113781607B (en) * 2021-09-17 2023-09-19 平安科技(深圳)有限公司 Processing method, device, equipment and storage medium for labeling data of OCR (optical character recognition) image
CN114092759A (en) * 2021-10-27 2022-02-25 北京百度网讯科技有限公司 Training method and device of image recognition model, electronic equipment and storage medium
CN115512348A (en) * 2022-11-08 2022-12-23 浪潮金融信息技术有限公司 Object identification method, system, equipment and medium based on double identification technology
CN116543392A (en) * 2023-04-19 2023-08-04 Taimake (Beijing) Industrial Technology Co., Ltd. Labeling method for deep learning character recognition
CN116543392B (en) * 2023-04-19 2024-03-12 Taimake (Beijing) Industrial Technology Co., Ltd. Labeling method for deep learning character recognition

Similar Documents

Publication Publication Date Title
CN113159212A (en) OCR recognition model training method, device and computer readable storage medium
US11321583B2 (en) Image annotating method and electronic device
EP3432197B1 (en) Method and device for identifying characters of claim settlement bill, server and storage medium
US11004234B2 (en) Method and apparatus for annotating point cloud data
CN109034159A (en) Image information extraction method and device
CN109034069B (en) Method and apparatus for generating information
CN109886928A (en) Target cell labeling method and device, storage medium and terminal device
US20210090266A1 (en) Method and device for labeling point of interest
CN110533940B (en) Method, device and equipment for identifying abnormal traffic signal lamp in automatic driving
CN109657675B (en) Image annotation method and device, computer equipment and readable storage medium
CN112380981A (en) Face key point detection method and device, storage medium and electronic equipment
EP3588325A1 (en) Method, device and system for processing image tagging information
CN110348463A (en) Method and apparatus for identifying vehicles
CN109885929B (en) Automatic driving decision planning data reproduction method and device
CN110188303A (en) Page fault recognition method and device
CN112488222B (en) Crowdsourcing data labeling method, system, server and storage medium
CN111259980B (en) Method and device for processing annotation data
WO2020103462A1 (en) Video search method and apparatus, computer device, and storage medium
CN113033297B (en) Method, device, equipment and storage medium for programming with physical objects
CN114419493A (en) Image annotation method and device, electronic equipment and storage medium
CN110874554A (en) Action recognition method, terminal device, server, system and storage medium
CN112434585A (en) Method, system, electronic device and storage medium for identifying whether a lane line is dashed or solid
CN112241749A (en) Character recognition model training method, device and equipment
CN106373121A (en) Blurred image recognition method and device
CN111914863A (en) Target detection method and device, terminal equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination