CN113642619B - Training method, device, equipment and readable storage medium for a character recognition model - Google Patents


Info

Publication number
CN113642619B
CN113642619B (application CN202110861484.9A)
Authority
CN
China
Prior art keywords
character
training
model
image
recognition model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110861484.9A
Other languages
Chinese (zh)
Other versions
CN113642619A (en)
Inventor
杜吉祥
郑剑锋
张洪博
翟传敏
Current Assignee
Huaqiao University
Original Assignee
Huaqiao University
Priority date
Filing date
Publication date
Application filed by Huaqiao University filed Critical Huaqiao University
Priority to CN202110861484.9A priority Critical patent/CN113642619B/en
Publication of CN113642619A publication Critical patent/CN113642619A/en
Application granted granted Critical
Publication of CN113642619B publication Critical patent/CN113642619B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Abstract

The invention provides a training method, device, equipment, and readable storage medium for a character recognition model, comprising the following steps: invoking a synthetic data set to train an initial learning model to obtain an intermediate model; and invoking a real data set to train the intermediate model to obtain a character recognition model, wherein the character recognition model is used for receiving an image acquired by an image acquisition device and generating, from the image, recognition information of a character object and its corresponding position information. The method solves the problem that irregularly distributed or curved characters cannot be recognized in the prior art.

Description

Training method, device, equipment, and readable storage medium for a character recognition model
Technical Field
The present invention relates to the field of artificial intelligence, and in particular to a training method, apparatus, device, and readable storage medium for a character recognition model.
Background
Text recognition methods for open scenes exist in the prior art, but because the objects they recognize are relatively regular, only a few corrections, such as rotating by an angle, are needed for successful recognition. These methods perform poorly, or fail outright, on the large amount of curved and irregularly arranged text found in open scenes. At the same time, text such as that on an instrument dashboard is often widely spaced, and recognizing it means recognizing individual characters, which current methods cannot do. Furthermore, current text recognition methods split the task into two separate steps, detection and recognition, which degrades the recognition result; for deep-learning-based text recognition, it also complicates the implementation.
In view of this, the present application is presented.
Disclosure of Invention
The invention discloses a training method, device, and equipment for a character recognition model, and a readable storage medium, aiming to solve the problem that irregularly distributed or curved characters cannot be recognized in the prior art.
The first embodiment of the invention provides a training method of a character recognition model, which comprises the following steps:
invoking the synthetic data set to train the initial learning model to obtain an intermediate model;
and invoking a real data set to train the intermediate model to obtain a character recognition model, wherein the character recognition model is used for receiving an image acquired by an image acquisition device and generating recognition information of a character object and corresponding position information of the character object according to the image.
Preferably, invoking the synthetic data set to train the initial learning model is specifically:
inputting the image information in the synthetic data set into the initial learning model, generating text boxes, and filling each text box with a Gaussian map;
and expanding the Gaussian-filled text boxes into a plurality of channels in one-hot form so as to recognize different characters.
Preferably, invoking the real data set to train the intermediate model is specifically:
inputting image information of the real data set into the intermediate model;
receiving an output result of the intermediate model, applying a watershed algorithm to the output result, and generating character labels;
judging whether the area of the high-response region of each character label and the number of segmented characters satisfy preset conditions;
if yes, retaining the character label;
and if not, deleting the character label.
Preferably, the character recognition model being used for receiving an image acquired by an image acquisition device and generating, from the image, recognition information of a character object and its corresponding position information specifically comprises:
inputting the image acquired by the image acquisition device into the character recognition model and generating a plurality of category images;
acquiring, for each region, the first image having the maximum position value among the plurality of category images;
and recognizing each first image to generate the recognition information of the character object and its corresponding position information.
A second embodiment of the invention provides a training apparatus for a character recognition model, comprising:
a first training unit, used for invoking a synthetic data set to train an initial learning model so as to obtain an intermediate model;
and a second training unit, used for invoking a real data set to train the intermediate model so as to obtain a character recognition model, wherein the character recognition model is used for receiving an image acquired by an image acquisition device and generating, from the image, recognition information of a character object and its corresponding position information.
Preferably, the first training unit is specifically configured to:
input the image information in the synthetic data set into the initial learning model, generate text boxes, and fill each text box with a Gaussian map;
and expand the Gaussian-filled text boxes into a plurality of channels in one-hot form so as to recognize different characters.
Preferably, the second training unit is specifically configured to:
input image information of the real data set into the intermediate model;
receive an output result of the intermediate model, apply a watershed algorithm to the output result, and generate character labels;
judge whether the area of the high-response region of each character label and the number of segmented characters satisfy preset conditions;
if yes, retain the character label;
and if not, delete the character label.
Preferably, the second training unit is further configured to:
input the image acquired by the image acquisition device into the character recognition model and generate a plurality of category images;
acquire, for each region, the first image having the maximum position value among the plurality of category images;
and recognize each first image to generate the recognition information of the character object and its corresponding position information.
A third embodiment of the present invention provides a training device for a character recognition model, comprising a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, the processor executing the computer program to implement the training method for a character recognition model described in any one of the above.
A fourth embodiment of the present invention provides a readable storage medium storing a computer program, where the computer program can be executed by a processor of a device in which the readable storage medium is located to implement the training method for a character recognition model described in any one of the above.
According to the training method, device, equipment, and readable storage medium for a character recognition model described above, an initial learning model is first trained by invoking a large open-source synthetic data set so as to adjust the weights of the initial learning model and generate an intermediate model, where the intermediate model can distinguish the position information of characters and tell different characters apart. A real data set is then invoked to train the intermediate model; this training uses data lacking character-level annotation, and an optimal label-retention strategy is adopted during training, which reduces oscillation of the model and improves its precision, thereby solving the problem that irregularly distributed or curved characters cannot be recognized in the prior art.
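The two-stage schedule summarized above, including the stage-two rejection of low-quality pseudo-labels, can be sketched as follows. This is a minimal illustration only: DummyModel, run_stage, and the toy datasets are hypothetical stand-ins, since the patent does not fix a concrete network architecture or training loop.

```python
# Minimal sketch of the two-stage schedule described above. DummyModel,
# run_stage, and the toy datasets are hypothetical stand-ins: the patent
# does not fix a concrete network architecture or training loop.
class DummyModel:
    """Stands in for the character-recognition network."""
    def __init__(self):
        self.updates = 0

    def fit_step(self, image, target):
        self.updates += 1  # a real model would take a gradient step here


def run_stage(model, dataset, make_target):
    """One training stage; make_target builds supervision per sample."""
    for image, annotation in dataset:
        target = make_target(image, annotation)
        if target is not None:  # stage 2 drops low-quality pseudo-labels
            model.fit_step(image, target)
    return model


# Stage 1: synthetic data already carries character-level labels.
model = run_stage(DummyModel(), [("img", "chars")] * 4, lambda i, a: a)
# Stage 2: real data yields pseudo-labels; None marks a rejected label.
model = run_stage(model, [("img", None)] * 3, lambda i, a: a)
print(model.updates)  # 4: only stage 1 produced retained labels here
```

The same loop serves both stages; only the label source changes, which is why the patent describes stage two as replacing ground-truth character labels with filtered pseudo-labels rather than as a separate algorithm.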
Drawings
FIG. 1 is a schematic flow chart of a training method of a character recognition model according to a first embodiment of the present invention;
FIG. 2 is a schematic diagram of an identification channel provided by the present invention;
FIG. 3 is a schematic diagram of label generation provided by the present invention;
FIG. 4 is a schematic block diagram of a training apparatus for a character recognition model according to a second embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. Apparently, the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
For a better understanding of the technical solution of the present invention, the following detailed description of the embodiments of the present invention refers to the accompanying drawings.
It should be understood that the described embodiments are merely some, but not all, embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The terminology used in the embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be understood that the term "and/or" as used herein merely describes an association relationship between associated objects, indicating that three relationships may exist; for example, "A and/or B" may represent three cases: A exists alone, A and B both exist, or B exists alone. In addition, the character "/" herein generally indicates that the associated objects before and after it are in an "or" relationship.
Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", "in response to determining", or "in response to detecting". Similarly, the phrase "if it is determined" or "if (a stated condition or event) is detected" may be interpreted as "when it is determined", "in response to determining", "when (the stated condition or event) is detected", or "in response to detecting (the stated condition or event)", depending on the context.
References to "first/second" in the embodiments merely distinguish similar objects and do not represent a particular ordering of the objects. It should be understood that "first/second" may be interchanged in a specific order or sequence where permitted, so that the embodiments described herein can be implemented in sequences other than those illustrated or described herein.
Specific embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
The invention discloses a training method, device, and equipment for a character recognition model, and a readable storage medium, aiming to solve the problem that irregularly distributed or curved characters cannot be recognized in the prior art.
Referring to fig. 1, a first embodiment of the present invention provides a training method for a character recognition model, which may be executed by a training device of the character recognition model (hereinafter referred to as the training device), and in particular by one or more processors in the training device, so as to implement the following steps:
s101, invoking a synthetic data set to train an initial learning model so as to obtain an intermediate model;
in this embodiment, the training device may be a server or a terminal device (such as a smart phone, a smart printer, or other smart devices) located at a cloud end, where data for training an initial learning model may be stored in the training device, so as to implement training the initial model into a text recognition model capable of recognizing and recognizing irregularly distributed or curved text.
Specifically, in this embodiment, the image information in the synthetic data set is input into the initial learning model, text boxes are generated, and each text box is filled with a Gaussian map;
it should be noted that the synthetic data set may be the large synthetic data set synth80k, in which the images carry character-level annotation, so text boxes can be generated quickly; once filled with a Gaussian map, a text box can represent the position information of the characters.
As shown in fig. 2, the Gaussian-filled text boxes are expanded into a plurality of channels in one-hot form to recognize different characters.
The number of channels into which a Gaussian-filled text box is expanded is determined by the number of character classes to be recognized; after expansion, characters of those classes can be recognized.
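The label construction just described, a Gaussian map painted into each character box and placed into the one-hot channel of that character's class, can be sketched in numpy as follows. The box coordinates, the ten-class alphabet, and the sigma scale are illustrative assumptions of this sketch, not values taken from the patent.

```python
import numpy as np

def gaussian_map(h, w, sigma_scale=0.4):
    """2-D Gaussian filling an h x w character box, peaking near the center.
    sigma_scale is an assumed choice, not a value from the patent."""
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    sy, sx = max(h * sigma_scale, 1e-6), max(w * sigma_scale, 1e-6)
    return np.exp(-(((ys - cy) / sy) ** 2 + ((xs - cx) / sx) ** 2) / 2.0)

def render_char_label(canvas_hw, boxes, classes, num_classes):
    """Paint each character's Gaussian into the one-hot channel of its class
    (the channel expansion described above)."""
    H, W = canvas_hw
    label = np.zeros((num_classes, H, W), dtype=np.float32)
    for (x0, y0, x1, y1), cls in zip(boxes, classes):
        g = gaussian_map(y1 - y0, x1 - x0)
        label[cls, y0:y1, x0:x1] = np.maximum(label[cls, y0:y1, x0:x1], g)
    return label

# Two hypothetical character boxes of classes 0 and 5 on a 32x64 canvas.
label = render_char_label((32, 64), [(2, 2, 12, 12), (20, 2, 30, 12)],
                          classes=[0, 5], num_classes=10)
```

The resulting tensor has one channel per character class, so a pixel's channel index directly encodes which character it belongs to, while the Gaussian value encodes how central that pixel is within the character.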
S102, invoking a real data set to train the intermediate model to obtain a character recognition model, wherein the character recognition model is used for receiving an image acquired by an image acquisition device and generating, from the image, recognition information of a character object and its corresponding position information.
In this embodiment, invoking the real data set to train the intermediate model is specifically:
inputting image information of the real data set into the intermediate model;
receiving an output result of the intermediate model, applying a watershed algorithm to the output result, and generating character labels;
judging whether the area of the high-response region of each character label and the number of segmented characters satisfy preset conditions;
if yes, retaining the character label;
and if not, deleting the character label.
It should be noted that, since the synthetic data set still differs from real data, further learning on real picture data is necessary; unlike the synthetic data, however, real data sets often lack character-level annotation, because annotating characters consumes a great deal of manpower and material resources. In this embodiment, weakly supervised training of the model is therefore performed on the basis of text-line annotation only. Specifically, character labels are produced during training: a watershed algorithm is applied to the output result to obtain character boxes, from which the Gaussian label map used for training is produced, as shown in fig. 3. Meanwhile, an optimal label-retention strategy is adopted in the process of producing character labels, so that only high-quality labels participate in subsequent model training. Specifically, a label is retained and training continues only when the area of its high-response regions and the number of segmented characters meet preset requirements, for example, when the high-response regions are as large as possible and their number matches the number of characters in the annotation. Screening out high-quality labels in this way greatly reduces oscillation of the model and improves its precision.
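The pseudo-label production and the retention check can be sketched as follows. For brevity, a 4-connected component pass stands in for the watershed algorithm, and the score threshold and minimum region area are illustrative assumptions of this sketch, not values from the patent.

```python
import numpy as np
from collections import deque

def character_regions(score_map, thresh=0.6):
    """Split a predicted character score map into candidate character regions.
    A 4-connected component pass stands in for the watershed step here;
    thresh is an assumed value."""
    mask = score_map > thresh
    labels = np.zeros(mask.shape, dtype=int)
    regions, next_id = [], 1
    for sy, sx in zip(*np.nonzero(mask)):
        if labels[sy, sx]:
            continue  # pixel already assigned to a region
        queue, pixels = deque([(sy, sx)]), []
        labels[sy, sx] = next_id
        while queue:
            y, x = queue.popleft()
            pixels.append((y, x))
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                if (0 <= ny < mask.shape[0] and 0 <= nx < mask.shape[1]
                        and mask[ny, nx] and not labels[ny, nx]):
                    labels[ny, nx] = next_id
                    queue.append((ny, nx))
        regions.append(pixels)
        next_id += 1
    return regions

def keep_pseudo_label(regions, expected_chars, min_area=4):
    """Retain a pseudo-label only if the segmented character count matches
    the annotation and each high-response region is large enough."""
    return (len(regions) == expected_chars
            and all(len(r) >= min_area for r in regions))
```

Rejected labels are simply skipped for that sample, which is how the retention strategy keeps low-quality pseudo-labels out of the gradient updates.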
In this embodiment, the character recognition model being used for receiving an image acquired by an image acquisition device and generating, from the image, recognition information of a character object and its corresponding position information specifically comprises:
inputting the image acquired by the image acquisition device into the character recognition model and generating a plurality of category images;
acquiring, for each region, the first image having the maximum position value among the plurality of category images;
and recognizing each first image to generate the recognition information of the character object and its corresponding position information.
After the image collected by the image acquisition device is input into the character recognition model, the background and the foreground are first separated. Specifically, this can be done by thresholding: positions whose score exceeds a threshold of 0.6 are set to 1 and the rest to 0, thereby distinguishing foreground from background. For the foreground part, the channel with the highest score at each position among the recognition channels is taken as the character category. It can be understood that an original image is predicted into dozens of category images; for characters such as 0123456 in the original image, the region value corresponding to the character 0 is largest in the first category image, higher than the value at 0's position in any other category image.
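The decoding step above (binarizing the foreground at the 0.6 threshold, then taking, per foreground position, the recognition channel with the highest score) can be sketched as follows; the ten-digit channel set is an illustrative assumption.

```python
import numpy as np

CLASSES = "0123456789"  # illustrative set of recognition channels

def decode(score_maps, fg_thresh=0.6):
    """Binarize foreground at the 0.6 threshold, then take, per foreground
    position, the class channel with the highest score."""
    fg = score_maps.max(axis=0) > fg_thresh  # foreground/background split
    cls = score_maps.argmax(axis=0)          # winning channel per position
    out = np.full(fg.shape, -1, dtype=int)   # -1 marks background
    out[fg] = cls[fg]
    return out

maps = np.zeros((len(CLASSES), 4, 8))
maps[3, 1:3, 1:3] = 0.9          # one strong response region for '3'
decoded = decode(maps)
print(CLASSES[decoded[1, 1]])    # '3'
```

Because each foreground position carries its own class decision, connected regions of the decoded map give both the character identity and its location in one pass.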
Compared with the prior art, this embodiment has a number of advantages and beneficial effects, embodied in the following aspects:
1. The position of each character can be identified using only line-level or word-level labels.
2. Through accurate positioning of single characters, the embodiment can predict single characters directly, which distinguishes this recognition scheme from character-string-based ones and greatly improves its flexibility.
3. The end-to-end combination of the detection and recognition networks is easier to implement, and the two parts promote each other to form a closed-loop recognition scheme, greatly improving recognition accuracy.
4. By detecting and recognizing single characters, irregularly distributed or curved text can be recognized, the number of characters recalled in a picture is increased, and characters that character-string recognition methods cannot recognize are identified.
Referring to fig. 4, a second embodiment of the present invention provides a training apparatus for a character recognition model, comprising:
a first training unit 201, configured to invoke a synthetic data set to train the initial learning model so as to obtain an intermediate model;
and a second training unit 202, configured to invoke a real data set to train the intermediate model so as to obtain a character recognition model, wherein the character recognition model is configured to receive an image acquired by an image acquisition device and generate, from the image, recognition information of a character object and its corresponding position information.
Preferably, the first training unit is specifically configured to:
input the image information in the synthetic data set into the initial learning model, generate text boxes, and fill each text box with a Gaussian map;
and expand the Gaussian-filled text boxes into a plurality of channels in one-hot form so as to recognize different characters.
Preferably, the second training unit is specifically configured to:
input image information of the real data set into the intermediate model;
receive an output result of the intermediate model, apply a watershed algorithm to the output result, and generate character labels;
judge whether the area of the high-response region of each character label and the number of segmented characters satisfy preset conditions;
if yes, retain the character label;
and if not, delete the character label.
Preferably, the second training unit is further configured to:
input the image acquired by the image acquisition device into the character recognition model and generate a plurality of category images;
acquire, for each region, the first image having the maximum position value among the plurality of category images;
and recognize each first image to generate the recognition information of the character object and its corresponding position information.
A third embodiment of the present invention provides a training device for a character recognition model, comprising a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, the processor executing the computer program to implement the training method for a character recognition model described in any one of the above.
A fourth embodiment of the present invention provides a readable storage medium storing a computer program, where the computer program can be executed by a processor of a device in which the readable storage medium is located to implement the training method for a character recognition model described in any one of the above.
According to the training method, device, equipment, and readable storage medium for a character recognition model described above, an initial learning model is first trained by invoking a large open-source synthetic data set so as to adjust the weights of the initial learning model and generate an intermediate model, where the intermediate model can distinguish the position information of characters and tell different characters apart. A real data set is then invoked to train the intermediate model; this training uses data lacking character-level annotation, and an optimal label-retention strategy is adopted during training, which reduces oscillation of the model and improves its precision, thereby solving the problem that irregularly distributed or curved characters cannot be recognized in the prior art.
Illustratively, the computer programs described in the third and fourth embodiments of the present invention may be divided into one or more modules, which are stored in the memory and executed by the processor to complete the present invention. The one or more modules may be a series of computer program instruction segments capable of performing particular functions for describing the execution of the computer program in the training device implementing a word recognition model. For example, the device described in the second embodiment of the present invention.
The processor may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor; it is the control center of the training device for the character recognition model and connects the various parts of the device using various interfaces and lines.
The memory may be used to store the computer program and/or the modules, and the processor implements the various functions of the training method for a character recognition model by running or executing the computer program and/or the modules stored in the memory and invoking the data stored in the memory. The memory may mainly include a program storage area and a data storage area, where the program storage area may store an operating system and an application program required for at least one function (such as a sound playing function or a text conversion function), and the data storage area may store data created according to the use of the device (such as audio data or text message data). In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a Flash Card, at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
The modules, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on this understanding, the present invention may implement all or part of the flow of the method of the above embodiment through a computer program instructing related hardware; the computer program may be stored in a computer-readable storage medium, and when executed by a processor, implements the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the content contained in the computer-readable medium may be adjusted appropriately according to the requirements of legislation and patent practice in the relevant jurisdiction; for example, in certain jurisdictions, according to legislation and patent practice, the computer-readable medium does not include electrical carrier signals and telecommunications signals.
It should be noted that the above-described apparatus embodiments are merely illustrative; units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units, that is, they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. In addition, in the drawings of the apparatus embodiments provided by the invention, the connection relationship between modules indicates that they have a communication connection, which may be specifically implemented as one or more communication buses or signal lines. Those of ordinary skill in the art can understand and implement the present invention without creative effort.
The present invention is not limited to the above-mentioned embodiments, and any changes or substitutions that can be easily understood by those skilled in the art within the technical scope of the present invention are intended to be included in the scope of the present invention. Therefore, the protection scope of the present invention should be subject to the protection scope of the claims.

Claims (4)

1. A training method for a character recognition model, comprising:
invoking a synthetic data set to train an initial learning model, specifically: inputting image information in the synthetic data set into the initial learning model, generating text boxes, and filling each text box with a Gaussian map; and expanding the Gaussian-filled text boxes into a plurality of channels in one-hot form to recognize different characters, so as to obtain an intermediate model;
invoking a real data set to train the intermediate model, specifically: inputting image information of the real data set into the intermediate model; receiving an output result of the intermediate model, applying a watershed algorithm to the output result, and generating character labels; judging whether the area of the high-response region of each character label and the number of segmented characters satisfy preset conditions; if yes, retaining the character label, and if not, deleting the character label, so as to obtain a character recognition model;
wherein the character recognition model is used for receiving an image acquired by an image acquisition device and generating, from the image, recognition information of a character object and its corresponding position information, specifically: inputting the image acquired by the image acquisition device into the character recognition model and generating a plurality of category images; acquiring, for each region, the first image having the maximum position value among the plurality of category images; and recognizing each first image to generate the recognition information of the character object and its corresponding position information.
2. A training apparatus for a character recognition model, characterized by comprising:
a first training unit, configured to invoke a synthetic data set to train an initial learning model, and specifically configured to: input image information in the synthetic data set into the initial learning model, generate a text box, and fill the text box with a Gaussian map;
and expand the text box filled with the Gaussian map into a plurality of channels in a one-hot manner so as to distinguish different characters, thereby obtaining an intermediate model;
a second training unit, configured to invoke a real data set to train the intermediate model, and specifically configured to: input image information of the real data set into the intermediate model; receive an output result of the intermediate model, and apply a watershed algorithm to the output result to generate a character label; judge whether the area of the high-score region of the character label and the number of characters obtained by segmentation satisfy preset conditions; if yes, retain the character label; if not, delete the character label, thereby obtaining a character recognition model,
wherein the character recognition model is used for receiving an image acquired by an image acquisition device and generating recognition information of a character object and corresponding position information thereof according to the image, specifically: inputting the image acquired by the image acquisition device into the character recognition model to generate a plurality of category images; for each region, acquiring a first image whose value at that position is the maximum among the plurality of category images; and identifying each first image to generate the recognition information of the character object and the corresponding position information thereof.
3. A training device for a character recognition model, comprising a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, the processor implementing, when executing the computer program, the training method for a character recognition model according to claim 1.
4. A readable storage medium storing a computer program, the computer program being executable by a processor of a device in which the readable storage medium is located to implement the training method for a character recognition model according to claim 1.
CN202110861484.9A 2021-07-29 2021-07-29 Training method, training device, training equipment and training readable storage medium for character recognition model Active CN113642619B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110861484.9A CN113642619B (en) 2021-07-29 2021-07-29 Training method, training device, training equipment and training readable storage medium for character recognition model

Publications (2)

Publication Number Publication Date
CN113642619A (en) 2021-11-12
CN113642619B (en) 2023-12-26

Family

ID=78418933

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110861484.9A Active CN113642619B (en) 2021-07-29 2021-07-29 Training method, training device, training equipment and training readable storage medium for character recognition model

Country Status (1)

Country Link
CN (1) CN113642619B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH06203201A (en) * 1993-01-06 1994-07-22 Toshiba Corp Method and device for recognizing optical handwritten character
CN105046254A (en) * 2015-07-17 2015-11-11 腾讯科技(深圳)有限公司 Character recognition method and apparatus
WO2018021163A1 (en) * 2016-07-27 2018-02-01 日本電気株式会社 Signature creation device, signature creation method, recording medium in which signature creation program is recorded, and software determination system
CN111353501A (en) * 2020-02-25 2020-06-30 暗物智能科技(广州)有限公司 Book point-reading method and system based on deep learning
CN111860348A (en) * 2020-07-21 2020-10-30 国网山东省电力公司青岛供电公司 Deep learning-based weak supervision power drawing OCR recognition method


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant