CN111612081B - Training method, device, equipment and storage medium for recognition model - Google Patents

Training method, device, equipment and storage medium for recognition model

Info

Publication number
CN111612081B
Authority
CN
China
Prior art keywords: image, training, trained, model, simulation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010453100.5A
Other languages
Chinese (zh)
Other versions
CN111612081A (en)
Inventor
张杰
邹雨晗
徐倩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
WeBank Co Ltd
Original Assignee
WeBank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by WeBank Co Ltd filed Critical WeBank Co Ltd
Priority to CN202010453100.5A
Publication of CN111612081A
Application granted
Publication of CN111612081B
Status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)
  • Character Discrimination (AREA)

Abstract

The invention discloses a training method, apparatus, device, and storage medium for a recognition model, relating to the field of financial technology. The training method for the recognition model comprises the following steps: acquiring an image to be trained, and constructing a simulation image from the image to be trained; determining a training data set from the image to be trained and the simulation image; and training a recognition model, based on the training data set, through a generative adversarial network and a recognition network in a preset neural network model. By constructing simulation images from the acquired images to obtain the training data set, the invention avoids the low recognition accuracy that results from the training data set having insufficient samples; that is, the embodiments improve the recognition accuracy of the trained recognition model.

Description

Training method, device, equipment and storage medium for recognition model
Technical Field
The present invention relates to the field of artificial intelligence in financial technology (Fintech), and in particular to a training method, apparatus, and device for a recognition model, and a storage medium.
Background
With the development of computer technology, more and more technologies are being applied in the financial field, and the traditional financial industry is gradually shifting toward financial technology (Fintech). Artificial intelligence is no exception, but the security and real-time requirements of the financial industry also place higher demands on the technology.
Deep-learning-based optical character recognition (OCR) relies on a large amount of annotated data. In practical applications, on the one hand, real data in a specific field (such as financial certificate data) is expensive and scarce, and is sometimes subject to privacy protection, so a large amount of real data is often difficult to obtain; on the other hand, annotating data is time-consuming, labor-intensive, and costly, and the annotations are sometimes erroneous. According to Zipf's law, the frequency of a word in a natural-language corpus is inversely proportional to its rank in the frequency table, so even with a large amount of annotated data, the proportion of rare characters is still far from sufficient. Some rare characters encountered once an OCR system actually goes online do not appear in the training set at all. The sparsity of rare-character data affects the accuracy of an OCR system mainly in two ways: (a) rare characters that do not appear in the training set cannot be recognized; (b) rare characters that do appear in the training set are insufficiently trained because of the extreme imbalance in sample numbers, so their recognition results are inaccurate. In particular, a rare character with very few samples is easily recognized as a visually similar common character with many samples.
Therefore, the recognition accuracy of existing recognition models is low.
Disclosure of Invention
The main purpose of the present invention is to provide a training method, apparatus, and device for a recognition model, and a storage medium, aiming to solve the technical problem that existing recognition models have low recognition accuracy.
To achieve the above object, the present invention provides a training method for a recognition model, the training method comprising the following steps:
acquiring an image to be trained, and constructing a simulation image from the image to be trained;
determining a training data set from the image to be trained and the simulation image;
training a recognition model, based on the training data set, through a generative adversarial network and a recognition network in a preset neural network model.
Optionally, the recognition model is a text recognition model, and the step of constructing a simulation image from the image to be trained comprises:
acquiring the label text corresponding to the image to be trained, and generating a target corpus containing rare characters from the label text;
obtaining a background image corresponding to the image to be trained, and constructing a simulation image from the target corpus and the background image.
Optionally, the step of acquiring the label text corresponding to the image to be trained and generating a target corpus containing rare characters from the label text comprises:
acquiring the label text corresponding to the image to be trained, and constructing an original corpus from the label text;
determining the rare characters corresponding to the original corpus, and acquiring the context corresponding to each rare character;
adding the rare characters and their contexts to the original corpus to obtain the target corpus.
Optionally, after the step of determining the rare characters corresponding to the original corpus, the method further comprises:
inserting the rare characters into the label text to update the label text in the original corpus and obtain the target corpus.
Optionally, the step of obtaining the background image corresponding to the image to be trained and constructing a simulation image from the target corpus and the background image comprises:
obtaining a text-free background image corresponding to the image to be trained, and obtaining from the target corpus a target text string corresponding to the background image;
determining the simulation font of the target text in the simulation image according to the font corresponding to the target text string;
embedding the target text string, rendered in the simulation font, into the corresponding background image to construct the simulation image.
Optionally, the step of embedding the target text string, rendered in the simulation font, into the corresponding background image to construct the simulation image comprises:
embedding the target text string, rendered in the simulation font, into the background image to obtain an initial image;
adding noise to the initial image to construct the simulation image.
Optionally, the step of training a recognition model through the generative adversarial network and the recognition network in the preset neural network model based on the training data set comprises:
fixing the generative adversarial network in the preset neural network model, and, based on the training data set, optimizing the recognition network in the neural network model with a gradient descent algorithm so that the generative adversarial network judges the hidden-layer data, obtained after the simulation images in the training data set pass through the generative adversarial network, as real data, wherein the generative adversarial network is a branch of the recognition network;
fixing the recognition network, and using a gradient descent algorithm so that the generative adversarial network judges the simulation images in the training data set as simulated data and judges the images to be trained as real data, thereby training the recognition model.
Optionally, the recognition model is an optical character recognition (OCR) model, and after the step of training a recognition model through the generative adversarial network and the recognition network in the preset neural network model based on the training data set, the method further comprises:
upon receiving an image to be recognized, inputting the image to be recognized into the OCR recognition model;
determining the text in the image to be recognized according to the output of the OCR recognition model.
In addition, to achieve the above object, the present invention also provides a training apparatus for a recognition model, the training apparatus comprising:
an acquisition module, configured to acquire an image to be trained;
a construction module, configured to construct a simulation image from the image to be trained;
a determination module, configured to determine a training data set from the image to be trained and the simulation image;
a training module, configured to train a recognition model through the generative adversarial network and the recognition network in a preset neural network model based on the training data set.
In addition, to achieve the above object, the present invention also provides a training device for a recognition model, the training device comprising a memory, a processor, and a training program for a recognition model that is stored in the memory and executable on the processor, wherein the training program, when executed by the processor, implements the steps of the training method for a recognition model corresponding to a federated learning server as described above.
In addition, to achieve the above object, the present invention also provides a computer-readable storage medium on which a training program for a recognition model is stored, the training program, when executed by a processor, implementing the steps of the training method for a recognition model as described above.
According to the present invention, a training data set is determined from the image to be trained and the simulation image, and a recognition model is trained through the generative adversarial network and the recognition network in a preset neural network model based on the training data set. Obtaining the training data set by constructing simulation images from the acquired images avoids the low recognition accuracy that results from insufficient training samples; that is, it improves the recognition accuracy of the trained recognition model.
Drawings
FIG. 1 is a flowchart of a first embodiment of the training method for a recognition model of the present invention;
FIG. 2 is a schematic flowchart of obtaining a recognition model in an embodiment of the present invention;
FIG. 3 is a flowchart of a third embodiment of the training method for a recognition model of the present invention;
FIG. 4 is a functional block diagram of a preferred embodiment of the training apparatus for a recognition model of the present invention;
FIG. 5 is a schematic diagram of a hardware operating environment according to an embodiment of the present invention.
The objects, functional features, and advantages of the present invention will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
The present invention provides a training method for a recognition model. Referring to FIG. 1, FIG. 1 is a flowchart of a first embodiment of the training method for a recognition model of the present invention.
Embodiments of the present invention provide embodiments of the training method for a recognition model. It should be noted that although a logical order is shown in the flowchart, in some cases the steps shown or described may be performed in a different order.
The training method for the recognition model is applied to a server or a terminal. The terminal may be a mobile terminal such as a mobile phone, a tablet computer, a notebook computer, a palmtop computer, or a personal digital assistant (PDA), or a fixed terminal such as a digital TV or a desktop computer. For convenience of description, the execution subject is omitted in the embodiments below. The training method for the recognition model comprises the following steps:
Step S10: acquiring an image to be trained, and constructing a simulation image from the image to be trained.
An image to be trained is acquired, and a simulation image is constructed from it. Relative to the simulation image, the image to be trained is a real image. Images to be trained may be stored in advance or acquired from other terminal devices during training. There is at least one image to be trained; this embodiment does not specifically limit the number, and each image to be trained has at least one corresponding simulation image. The kind of image to be trained depends on the recognition model to be trained: if the recognition model is an OCR recognition model, the image to be trained is a text image containing characters, and the simulation image is likewise a text image; if the recognition model is a face recognition model, both the image to be trained and the simulation image are face images.
Further, as noted above, according to Zipf's law the frequency of a word in a natural-language corpus is inversely proportional to its rank in the frequency table, so even with a large amount of annotated data the proportion of rare characters is still far from sufficient, and some rare characters encountered once an OCR recognition model is actually online do not appear in the training set at all. The sparsity of rare-character data affects the accuracy of the OCR recognition model mainly in two ways: (a) rare characters that do not appear in the training set cannot be recognized; (b) rare characters that do appear in the training set are insufficiently trained because of the extreme imbalance in sample numbers, so their recognition results are inaccurate; in particular, a rare character with very few samples is easily recognized as a visually similar common character with many samples. The current text recognition model therefore has low recognition accuracy for rare characters. Accordingly, to improve the recognition accuracy of the text recognition model for rare characters and its domain adaptability, so that the text recognition model can be applied to more scenarios, when the recognition model is a text recognition model the step of constructing a simulation image from the image to be trained comprises:
Step a: acquiring the label text corresponding to the image to be trained, and generating a target corpus containing rare characters from the label text.
Further, if the recognition model is a text recognition model, the process of constructing a simulation image from the image to be trained first acquires the label text corresponding to the image to be trained, i.e., the text in the image to be trained, and generates a target corpus containing rare characters from the acquired label text. It should be noted that the label text corresponding to an image to be trained is represented as a character string; each image to be trained has corresponding label text, and the label text contains at least one character. A rare character may be one that exists in the label text or one that does not. Specifically, after the label text in the image to be trained is obtained, the language of the label text is determined, including but not limited to Chinese, English, and Japanese. After the language is determined, text data in that language is crawled from a preset database to find the rare characters corresponding to the label text; the preset database may be Wikipedia or another database containing a large amount of text.
Further, step a comprises:
Step a1: acquiring the label text corresponding to the image to be trained, and constructing an original corpus from the label text.
Step a2: determining the rare characters corresponding to the original corpus, and acquiring the context corresponding to each rare character.
Step a3: adding the rare characters and their contexts to the original corpus to obtain the target corpus.
Further, the label text corresponding to the images to be trained is acquired, and all of the acquired label text is used as the corpus entries of the original corpus, thereby constructing the original corpus. The rare characters corresponding to the original corpus, i.e., the rare characters corresponding to the label text in the original corpus, are determined, and the context corresponding to each rare character is acquired. It should be noted that the context is obtained in the same way as the rare characters themselves, so the details are not repeated here. This embodiment does not limit the number of characters in the context of a rare character: the context may be, for example, the 5 characters before and the 5 characters after the rare character, or the 3 characters before and the 4 characters after it. After the context of each rare character is determined, the rare character and its context are added to the original corpus, updating the corpus to obtain the target corpus; in this process, a rare character and its context may be added to the original corpus as a single character string.
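To make steps a1 to a3 concrete, the following Python sketch identifies rare characters by a simple frequency threshold over the label corpus and collects fixed-size context windows from external text. It is illustrative only: the patent specifies neither how rarity is decided nor the exact context size, so the threshold, the window, and all names here (build_target_corpus, external_text, and so on) are assumptions.

```python
from collections import Counter

def build_target_corpus(label_texts, external_text, freq_threshold=5, window=5):
    """Extend an original corpus of label strings with rare characters
    and their surrounding context (steps a1-a3). Rarity is decided by
    an assumed frequency threshold; the context is an assumed fixed
    window of `window` characters on each side."""
    corpus = list(label_texts)  # the original corpus built from label text
    freq = Counter(ch for text in label_texts for ch in text)

    # Characters absent from or underrepresented in the label text.
    rare_chars = {ch for ch in set(external_text)
                  if freq[ch] < freq_threshold and not ch.isspace()}

    # Add each rare character together with its context, taken from the
    # external text (e.g. crawled from an encyclopedia-style database).
    for i, ch in enumerate(external_text):
        if ch in rare_chars:
            left = external_text[max(0, i - window):i]
            right = external_text[i + 1:i + 1 + window]
            corpus.append(left + ch + right)  # one context string per hit
    return corpus
```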
Further, the training method for the recognition model further comprises:
Step a4: inserting the rare characters into the label text to update the label text in the original corpus and obtain the target corpus.
Further, after the rare characters are determined, they are inserted into the label text of the original corpus, thereby updating the label text in the original corpus and obtaining the target corpus. It should be noted that, in the target corpus, each piece of label text contains at least one rare character. During insertion, the rare characters may be inserted into the label text at random positions without restriction, and the rare characters inserted into different label texts may be the same or different.
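Step a4 admits an equally small sketch. The number of insertions per label and the uniformly random positions are assumptions, since the patent leaves both unconstrained:

```python
import random

def insert_rare_chars(label_texts, rare_chars, per_label=1):
    """Insert rare characters into each label string at random positions
    (step a4). `rare_chars` is expected as a sequence, e.g. a list."""
    updated = []
    for text in label_texts:
        for _ in range(per_label):
            pos = random.randint(0, len(text))  # any position, ends included
            text = text[:pos] + random.choice(rare_chars) + text[pos:]
        updated.append(text)
    return updated
```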
Step b: obtaining a background image corresponding to the image to be trained, and constructing a simulation image from the target corpus and the background image.
When the images to be trained are acquired, the background image corresponding to each image to be trained is also acquired; it can be understood that each image to be trained has a corresponding background image. After the target corpus is obtained, simulation images are constructed from the target corpus and the background images: a corpus entry is selected from the target corpus and embedded into a background image to obtain a simulation image. There is at least one simulation image for each image to be trained.
Step S20: determining a training data set from the image to be trained and the simulation image.
After the simulation images are obtained, the training data set is determined from the images to be trained and the simulation images, and it contains both. This embodiment does not specifically limit the numbers of images to be trained and simulation images in the training data set: there may be, for example, 80 images to be trained and 20 simulation images, or 45 images to be trained and 55 simulation images.
Step S30: based on the training data set, training a recognition model through the generative adversarial network and the recognition network in a preset neural network model.
After the training data set is obtained, the generative adversarial network and the recognition network in the preset neural network model are trained based on the training data set to obtain the recognition model. The neural network model may be a deep feed-forward network (DFF), a recurrent neural network (RNN), a long short-term memory network (LSTM), or the like.
Further, step S30 comprises:
Step c: fixing the generative adversarial network in the preset neural network model, and, based on the training data set, optimizing the recognition network in the neural network model with a gradient descent algorithm so that the generative adversarial network judges the hidden-layer data, obtained after the simulation images in the training data set pass through the generative adversarial network, as real data, wherein the generative adversarial network is a branch of the recognition network.
Further, after the training data set is obtained, the images to be trained and the simulation images in the training data set are mixed and input into the neural network model. Specifically, they may be input in batches, i.e., several images to be trained and simulation images at a time; this embodiment does not specifically limit the number of images input each time. It should be noted that, before being input into the neural network model, the images to be trained and the simulation images carry corresponding type labels, which can be used to determine which of the input data are images to be trained and which are simulation images. This embodiment does not specifically limit the representation of the type labels.
After the images to be trained and the simulation images are input into the neural network model, they pass through the feature extraction layer of the neural network model to obtain corresponding feature data, and the feature data pass through the fully connected layers corresponding to the hidden layers to obtain corresponding hidden-layer data. The feature extraction layer may be a ResNet (residual neural network) structure or a VGG (Oxford Visual Geometry Group) network structure, among others. In this embodiment there are specifically three kinds of hidden-layer data; in other embodiments there may be two or four. For example, for a face image, the first hidden-layer data may relate to the eyes, the second to the nose, and the third to the mouth. It will be appreciated that different hidden-layer data represent feature data of the simulation image and the image to be trained at different dimensions, such as different positions or different scales; in this embodiment the feature data are represented in matrix form.
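One way such an architecture could be wired up is sketched below in PyTorch: a small convolutional feature extractor standing in for ResNet/VGG, a chain of fully connected layers whose intermediate activations serve as the three kinds of hidden-layer data, and one discriminator per hidden layer forming the embedded GAN branch. All layer sizes, the grayscale input, and the reduction of the CTC head to a single linear layer are simplifying assumptions.

```python
import torch
import torch.nn as nn

class RecognitionWithGANBranch(nn.Module):
    """Recognition network exposing three hidden-layer representations,
    with one binary discriminator per representation (the GAN branch)."""

    def __init__(self, num_classes, hidden_dims=(256, 128, 64)):
        super().__init__()
        # Stand-in for a ResNet/VGG feature extraction layer.
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 16)), nn.Flatten(),
        )
        dims = [32 * 4 * 16] + list(hidden_dims)
        self.fcs = nn.ModuleList(
            [nn.Linear(a, b) for a, b in zip(dims, dims[1:])])
        self.discriminators = nn.ModuleList([
            nn.Sequential(nn.Linear(d, 64), nn.ReLU(), nn.Linear(64, 1))
            for d in hidden_dims])
        self.classifier = nn.Linear(hidden_dims[-1], num_classes)

    def forward(self, x):
        h = self.features(x)
        hiddens = []
        for fc in self.fcs:
            h = torch.relu(fc(h))
            hiddens.append(h)  # hidden-layer data at increasing depth
        return self.classifier(h), hiddens
```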
This embodiment focuses on the hidden-layer data obtained after the simulation image and the image to be trained pass through the feature extraction layer of the neural network model, so that the distributions of the image to be trained and the simulation image are distinguished at different scales and different levels.
After the hidden-layer data are obtained, the generative adversarial network in the neural network model is fixed, and the recognition network is optimized based on the training data set so that the fixed generative adversarial network judges the hidden-layer data of the simulation images as real data, i.e., judges the simulation images as images to be trained; the generative adversarial network is a branch of the recognition network. Gradient descent is an iterative method that can be used to solve least squares problems (both linear and nonlinear); the gradient descent algorithm in this embodiment is stochastic gradient descent. Stochastic gradient descent is used to reduce the CTC (Connectionist Temporal Classification) recognition loss of the images to be trained and the simulation images, and to reduce the MMD (maximum mean discrepancy) between the hidden-layer representations of the images to be trained and the simulation images within one batch, while the generative adversarial network is trained to judge the simulation images in the training data set as real data by reducing the binary cross-entropy. It should be noted that the MMD represents not the difference between one image to be trained and one simulation image, but the difference between all images to be trained and all simulation images, i.e., between the whole sample of images to be trained and the whole sample of simulation images.
Specifically, the MMD calculation formula can be expressed as:

$$\mathrm{MMD}^2[F, X, Y] = \frac{1}{m^2}\sum_{i=1}^{m}\sum_{j=1}^{m} k(x_i, x_j) - \frac{2}{mn}\sum_{i=1}^{m}\sum_{j=1}^{n} k(x_i, y_j) + \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n} k(y_i, y_j)$$

where $F$ denotes a function space; $X$ denotes the images to be trained in a batch, i.e., the real data in the batch; $Y$ denotes the simulation images in the batch, i.e., the simulated data in the batch; $k$ denotes a kernel function, which in this embodiment may specifically be a Gaussian kernel; $m$ is the number of images to be trained in one batch; $n$ is the number of simulation images in one batch; $x_i$ and $x_j$ denote the optimized hidden-layer data corresponding to the $i$-th and $j$-th images to be trained; and $y_i$ and $y_j$ denote the optimized hidden-layer data corresponding to the $i$-th and $j$-th simulation images.
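Under these definitions, a batch-level implementation of the squared MMD with a Gaussian kernel might look as follows; the bandwidth sigma is an assumed hyperparameter:

```python
import torch

def gaussian_kernel(a, b, sigma=1.0):
    """k(a, b) = exp(-||a - b||^2 / (2 sigma^2)) for all row pairs."""
    dist2 = torch.cdist(a, b).pow(2)
    return torch.exp(-dist2 / (2 * sigma ** 2))

def mmd2(x, y, sigma=1.0):
    """Squared MMD between real hidden-layer data x of shape (m, d) and
    simulated hidden-layer data y of shape (n, d) within one batch,
    following the formula above (each .mean() is the 1/m^2, 1/mn, or
    1/n^2 normalised double sum)."""
    kxx = gaussian_kernel(x, x, sigma).mean()
    kyy = gaussian_kernel(y, y, sigma).mean()
    kxy = gaussian_kernel(x, y, sigma).mean()
    return kxx - 2 * kxy + kyy
```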
It should be noted that, through the MMD, the distributions of the real data and the simulated data in the neural network model can be made as consistent as possible; and by having the fixed generative adversarial network judge the simulated data as real data, i.e., judge the simulation images as images to be trained, the two distributions are pushed further toward consistency, so that the classifier in the neural network model cannot distinguish the real data from the simulated data. In this embodiment, the generative adversarial network is a branch of the recognition network, and the recognition network is the backbone relative to that branch. It will be appreciated that the parameters of the generative adversarial network remain unchanged while it is fixed.
Step d: fixing the recognition network, and using a gradient descent algorithm so that the generative adversarial network judges the simulation images in the training data set as simulated data and judges the images to be trained as real data, thereby training the recognition model.
When the generative adversarial network can judge the simulation images in the training data set as real data, it is no longer fixed; instead, the recognition network is fixed, and a gradient descent algorithm is used so that the generative adversarial network judges the simulation images in the training data set as simulated data and the images to be trained as real data, yielding discriminators with stronger discrimination ability in the branches of the generative adversarial network. The recognition model is thus trained through the above steps. It should be noted that the gradient descent algorithm used after the recognition network is fixed is a stochastic gradient descent algorithm, which may be the same as or different from the gradient descent algorithm in step c. Compared with step c, step d improves the discrimination ability of the classifier in the neural network model; if the simulated data in steps c and d can still be identified as real data in each hidden-layer representation, the distributions of the images to be trained and the simulation images in the neural network model are becoming increasingly similar, i.e., the distributions of the simulated data and the real data in the neural network model are becoming increasingly similar.
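Steps c and d together amount to an alternating update, sketched below using the model and mmd2 sketches above. Here compute_ctc_loss is a hypothetical helper (a full implementation would use torch.nn.CTCLoss over sequence outputs), opt_rec holds the recognition-network parameters, opt_disc the discriminator parameters, and the loss weight lam and the use of BCEWithLogitsLoss are assumptions:

```python
import torch

def train_step(model, batch, bce, opt_rec, opt_disc, lam=0.1):
    """One alternating update over a mixed batch; `is_real` is the
    boolean type-label mask distinguishing images to be trained (real)
    from simulation images."""
    images, targets, is_real = batch

    # Step c: fix the GAN branch, optimise the recognition network so
    # that CTC loss and MMD shrink and simulated hidden data fool it.
    for p in model.discriminators.parameters():
        p.requires_grad_(False)
    logits, hiddens = model(images)
    loss = compute_ctc_loss(logits, targets)  # hypothetical helper
    for h, disc in zip(hiddens, model.discriminators):
        real_h, sim_h = h[is_real], h[~is_real]
        loss = loss + lam * mmd2(real_h, sim_h)
        loss = loss + lam * bce(disc(sim_h),
                                torch.ones(sim_h.size(0), 1))  # judged "real"
    opt_rec.zero_grad(); loss.backward(); opt_rec.step()

    # Step d: fix the recognition network (detach), train the
    # discriminators to separate simulated from real hidden data.
    for p in model.discriminators.parameters():
        p.requires_grad_(True)
    _, hiddens = model(images)
    d_loss = torch.zeros(())
    for h, disc in zip(hiddens, model.discriminators):
        h = h.detach()  # no gradients into the recognition network
        d_loss = d_loss + bce(disc(h[is_real]),
                              torch.ones(int(is_real.sum()), 1))
        d_loss = d_loss + bce(disc(h[~is_real]),
                              torch.zeros(int((~is_real).sum()), 1))
    opt_disc.zero_grad(); d_loss.backward(); opt_disc.step()
```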
It can be understood that reducing the distribution difference between the simulation images and the images to be trained in the neural network model allows the knowledge learned from the simulation images to be transferred, which better improves the recognition accuracy of the recognition model. In addition, the method of this embodiment embeds the generative adversarial network as a branch into the recognition network for adversarial learning; compared with the conventional two-stage method of first generating data with a generative adversarial network and then inputting the generated data into a new recognition network for retraining, the method of this embodiment is simpler, i.e., it reduces the training difficulty of the recognition model.
Further, because the distributions of the real data and the simulated data in the neural network model are very close, the difference between them is reduced, which mitigates the drop in recognition accuracy that occurs when the recognition model is trained with simulated data. Furthermore, because rare characters are present in the simulation images, the trained text recognition model acquires the ability to recognize rare characters. Although these rare characters are simulated, the difference between the simulation images and the images to be trained within the neural network model is small, i.e., the simulated rare characters function in the neural network model almost the same as real rare characters.
Specifically, referring to FIG. 2, FIG. 2 is a schematic flowchart of obtaining a recognition model in an embodiment of the present invention. First, a simulation picture A (simulation image) is generated, based on specific rules, from an acquired text label containing rare characters. The simulation picture A and a real picture B (image to be trained) then pass through the feature extraction layer of the recognition network in the neural network model to obtain feature data, and the feature data pass through the fully connected layers (fc1, fc2) of the corresponding hidden layers to obtain corresponding hidden-layer data. FIG. 2 shows two groups of hidden-layer data: hidden layer A and hidden layer B correspond to the same fully connected hidden layer, which also yields hidden layer A1 and hidden layer B1; only the subsequent processing differs. The hidden-layer data of hidden layer A1 is input into the corresponding discriminator 1 to obtain the corresponding loss value, while hidden layer B1 yields the corresponding MMD. Hidden layers A2 and A3 are processed similarly to hidden layer A1, and hidden layers B2 and B3 similarly to hidden layer B1, so the details are not repeated. The CTC loss is also obtained from the hidden-layer data. To obtain the recognition model, the losses corresponding to the discriminators, the CTC loss, and the MMDs must all be smaller than their corresponding preset values; if any of them is not, iterative training of the neural network model continues until they all are. This embodiment does not limit the magnitudes of the preset values.
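The stopping test of FIG. 2 can be stated compactly; the preset thresholds are parameters here because the patent does not fix their magnitudes:

```python
def training_converged(disc_losses, ctc_loss, mmds, thresholds):
    """True when every discriminator loss, the CTC loss and every MMD
    fall below their corresponding preset values (the same ordering is
    assumed for `thresholds`)."""
    values = list(disc_losses) + [ctc_loss] + list(mmds)
    return all(v < t for v, t in zip(values, thresholds))
```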
In this embodiment, an image to be trained is acquired, a simulation image is constructed from the image to be trained, a training data set is determined from the image to be trained and the simulation image, and a recognition model is trained through the generative adversarial network and the recognition network in a preset neural network model based on the training data set. Obtaining the training data set by constructing simulation images from the acquired images avoids the low recognition accuracy that results from the training data set having insufficient samples; that is, the recognition accuracy of the trained recognition model is improved.
Further, a second embodiment of the training method for a recognition model of the present invention is presented. The second embodiment differs from the first embodiment in that step b comprises:
Step b1: obtaining a text-free background image corresponding to the image to be trained, and obtaining from the target corpus a target text string corresponding to the background image.
After the image to be trained is obtained, a background image corresponding to it and containing no text is obtained, and a target text string corresponding to the background image is obtained from the target corpus; the target text string is a corpus entry in the target corpus, and the obtained target text string contains rare characters. It should be noted that, to ensure the similarity between the constructed simulation image and the image to be trained, the obtained target text string corresponds to the image to be trained, i.e., it contains the label text corresponding to the image to be trained.
Step b2: determining the simulation font of the target text in the simulation image according to the font corresponding to the target text string.
After the target text string is determined, the simulation font of the label text in the simulation image is determined according to the font corresponding to the target text string, which is the font of the label text in the target text string. Specifically, if the font of the label text corresponding to the image to be trained is regular script, the simulation font is determined to be regular script. If a font identical to the font corresponding to the target text string is available, the simulation font is that same font; if not, the available font with the greatest similarity to the font corresponding to the target text string is selected and determined as the simulation font. Alternatively, if no identical font is available, a font similar to the font corresponding to the target text string may be selected at random and determined as the simulation font, where a similar font is one whose similarity to the font corresponding to the target text string is greater than a specific threshold; the threshold may be set as needed, and this embodiment does not specifically limit it.
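The font-selection rule of step b2 reduces to a short fallback, sketched below. How font similarity is measured is not defined by the patent, so the similarity scoring function is an assumption:

```python
def choose_simulation_font(target_font, available_fonts, similarity):
    """Return the simulation font for a target text string: the exact
    font if available, otherwise the available font most similar to it.
    `similarity` is an assumed scoring function over font pairs."""
    if target_font in available_fonts:
        return target_font
    return max(available_fonts, key=lambda f: similarity(target_font, f))
```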
Step b3: embedding the target text string, rendered in the simulation font, into the corresponding background image to construct the simulation image.
After the simulation font, the target text string, and the background image are determined, the target text string is embedded, rendered in the simulation font, into the corresponding background image, thereby constructing the simulation image. Further, the size of the simulation image is the same as that of the corresponding image to be trained, and during embedding, the position of the target text string in the simulation image is kept as consistent as possible with the position of the target text in the image to be trained. It will be appreciated that the target text string is the label text of the simulation image.
Further, the step of embedding the target text string, rendered in the simulation font, into the background image to construct the simulation image comprises:
Step b31: embedding the target text string, rendered in the simulation font, into the background image to obtain an initial image.
Step b32: adding noise to the initial image to construct the simulation image.
Further, the target text string is embedded, rendered in the simulation font, into the background image to obtain an initial image, and noise is then added to the initial image to blur it, thereby constructing the simulation image. Specifically, during the noise addition, the strength of the added noise is determined by the sharpness of the corresponding image to be trained, so that after the noise is added, the sharpness of the simulation image is similar or identical to that of the corresponding image to be trained.
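Steps b31 and b32 could be realised with PIL and NumPy as below. The file path, text position, font size, and the additive Gaussian noise model are illustrative assumptions; the patent only requires that the noise strength track the sharpness of the corresponding image to be trained:

```python
import numpy as np
from PIL import Image, ImageDraw, ImageFont

def render_simulation_image(background_path, text, font_path,
                            position=(10, 10), font_size=24, noise_std=8.0):
    """Embed a target text string into a background image in the chosen
    simulation font (b31), then blur the result with pixel noise (b32)."""
    img = Image.open(background_path).convert("L")
    draw = ImageDraw.Draw(img)
    font = ImageFont.truetype(font_path, font_size)
    draw.text(position, text, font=font, fill=0)  # the initial image

    arr = np.array(img, dtype=np.float32)
    arr += np.random.normal(0.0, noise_std, arr.shape)  # additive noise
    return Image.fromarray(np.clip(arr, 0, 255).astype(np.uint8))
```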
In this embodiment, during the construction of the simulation image, the font of the label text in the simulation image is the same as or similar to the font of the label text in the corresponding image to be trained, and/or the noise conditions of the simulation image and the image to be trained are the same or similar. This keeps the simulation image and the image to be trained as alike as possible during processing by the neural network model, thereby improving the recognition accuracy of the resulting recognition model.
Further, a third embodiment of the training method for a recognition model of the present invention is provided. The third embodiment differs from the first and/or second embodiments in that the recognition model is an optical character recognition (OCR) model, which is one kind of text recognition model. Referring to FIG. 3, the training method for the recognition model further comprises:
Step S40: after receiving an image to be recognized, inputting the image to be recognized into the OCR recognition model.
Step S50: determining the text in the image to be recognized according to the output of the OCR recognition model.
In this embodiment, the recognition model is an OCR recognition model. After the image to be recognized is received, it is input into the OCR recognition model to obtain the model's output, and the text in the image to be recognized is determined from that output. Because rare characters are present in the constructed simulation images during training, i.e., in the label text of the simulation images, the OCR recognition model can recognize rare characters when they are present in the image to be recognized. The output of the OCR recognition model is the specific text in the image to be recognized.
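Inference in steps S40 and S50 is a forward pass plus decoding. The sketch below assumes a recognition model that emits per-timestep class scores of shape (1, T, C) and applies greedy CTC decoding; the idx_to_char mapping and the blank index are assumptions:

```python
import torch

def recognize(model, image_tensor, idx_to_char, blank=0):
    """Determine the text in an image to be recognized from the OCR
    model's output, collapsing repeats and blanks (greedy CTC)."""
    model.eval()
    with torch.no_grad():
        logits, _ = model(image_tensor.unsqueeze(0))  # (1, T, C) assumed
    best = logits.argmax(dim=-1).squeeze(0).tolist()
    chars, prev = [], blank
    for idx in best:
        if idx != blank and idx != prev:
            chars.append(idx_to_char[idx])
        prev = idx
    return "".join(chars)
```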
In this embodiment, after the image to be recognized is received, it is input into the OCR recognition model, so that the text in the image to be recognized can be accurately recognized through the OCR recognition model.
In addition, the present invention also provides a training apparatus for a recognition model. Referring to FIG. 4, the training apparatus comprises:
an acquisition module 10, configured to acquire an image to be trained;
a construction module 20, configured to construct a simulation image from the image to be trained;
a determination module 30, configured to determine a training data set from the image to be trained and the simulation image;
a training module 40, configured to train a recognition model through the generative adversarial network and the recognition network in a preset neural network model based on the training data set.
Further, the recognition model is a text recognition model, and the construction module 20 comprises:
an acquisition unit, configured to acquire the label text corresponding to the image to be trained;
a generation unit, configured to generate a target corpus containing rare characters from the label text;
the acquisition unit being further configured to acquire the background image corresponding to the image to be trained;
a construction unit, configured to construct a simulation image from the target corpus and the background image.
Further, the generation unit comprises:
a construction subunit, configured to construct an original corpus from the label text;
a first determination subunit, configured to determine the rare characters corresponding to the original corpus;
a first acquisition subunit, configured to acquire the context corresponding to each rare character;
an addition subunit, configured to add the rare characters and their contexts to the original corpus to obtain the target corpus.
Further, the construction module 20 further comprises:
an insertion unit, configured to insert the rare characters into the label text to update the label text in the original corpus and obtain the target corpus.
Further, the acquisition unit is further configured to obtain a text-free background image corresponding to the image to be trained;
and the construction unit comprises:
a second acquisition subunit, configured to obtain from the target corpus a target text string corresponding to the background image;
a second determination subunit, configured to determine the simulation font of the target text in the simulation image according to the font corresponding to the target text string;
an embedding subunit, configured to embed the target text string, rendered in the simulation font, into the corresponding background image to construct the simulation image.
Further, the embedding subunit is further configured to embed the target text string, rendered in the simulation font, into the background image to obtain an initial image, and to add noise to the initial image to construct the simulation image.
Further, the training module 40 comprises:
a fixing unit, configured to fix the generative adversarial network in the preset neural network model;
an optimization unit, configured to optimize the recognition network in the neural network model with a gradient descent algorithm based on the training data set, so that the generative adversarial network judges the hidden-layer data, obtained after the simulation images in the training data set pass through the generative adversarial network, as real data, wherein the generative adversarial network is a branch of the recognition network;
the fixing unit being further configured to fix the recognition network;
a judgment unit, configured to use a gradient descent algorithm so that the generative adversarial network judges the simulation images in the training data set as simulated data and judges the images to be trained as real data, thereby training the recognition model.
Further, the training apparatus for the recognition model further comprises:
an input module, configured to input an image to be recognized into the OCR recognition model after the image to be recognized is received;
the determination module 30 being further configured to determine the text in the image to be recognized according to the output of the OCR recognition model.
The specific implementation of the training apparatus for the recognition model is substantially the same as that of the embodiments of the training method for the recognition model described above, and is not repeated here.
In addition, the present invention also provides a training device for a recognition model. As shown in FIG. 5, FIG. 5 is a schematic structural diagram of a hardware operating environment according to an embodiment of the present invention.
It should be noted that FIG. 5 is a schematic structural diagram of the hardware operating environment of the training device for the recognition model. The training device of the embodiment of the present invention may be a terminal device such as a PC or a portable computer.
As shown in FIG. 5, the training device for the recognition model may include a processor 1001 (such as a CPU), a memory 1005, a user interface 1003, a network interface 1004, and a communication bus 1002. The communication bus 1002 enables communication between these components. The user interface 1003 may include a display and an input unit such as a keyboard, and may optionally further include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a Wi-Fi interface). The memory 1005 may be a high-speed RAM memory or a stable non-volatile memory such as a disk memory, and may optionally be a storage device separate from the processor 1001.
Those skilled in the art will appreciate that the training device configuration shown in FIG. 5 does not constitute a limitation of the training device: it may include more or fewer components than illustrated, combine certain components, or arrange the components differently.
As shown in FIG. 5, the memory 1005, as a computer storage medium, may include an operating system, a network communication module, a user interface module, and a training program for the recognition model. The operating system is a program that manages and controls the hardware and software resources of the training device and supports the running of the training program for the recognition model and of other software or programs.
In the training device shown in FIG. 5, the user interface 1003 is mainly used to connect to a terminal device and exchange data with it, for example to receive an image to be recognized or an image to be trained sent by the terminal device; the network interface 1004 is mainly used to connect to a background server and exchange data with it; and the processor 1001 may be configured to invoke the training program for the recognition model stored in the memory 1005 and perform the steps of the training method for the recognition model described above.
The specific implementation of the training device for the recognition model is substantially the same as that of the embodiments of the training method for the recognition model described above, and is not repeated here.
In addition, an embodiment of the present invention also provides a computer-readable storage medium storing a training program for a recognition model which, when executed by a processor, implements the steps of the training method for a recognition model described above.
The specific implementation of the computer-readable storage medium of the present invention is substantially the same as that of the embodiments of the training method for the recognition model described above, and is not repeated here.
It should be noted that, in this document, the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus comprising a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus comprising that element.
The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
From the above description of the embodiments, it will be clear to those skilled in the art that the methods of the above embodiments may be implemented by software plus a necessary general-purpose hardware platform, or of course by hardware, although the former is in many cases the preferable implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disc) and comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to perform the methods of the embodiments of the present invention.
The foregoing description covers only preferred embodiments of the present invention and does not limit its patent scope. Any equivalent structural or process transformation made using the contents of the description and drawings of the present invention, whether applied directly or indirectly in other related technical fields, is likewise included within the patent protection scope of the present invention.

Claims (10)

1. A training method of an identification model, characterized in that the training method of an identification model comprises the following steps:
acquiring an image to be trained, and constructing a simulation image according to the image to be trained;
determining a training data set according to the image to be trained and the simulation image;
training a generative adversarial network and a recognition network in a preset neural network model based on the training data set, to obtain a recognition model;
wherein the step of training the generative adversarial network and the recognition network in the preset neural network model based on the training data set to obtain the recognition model comprises:
fixing the generative adversarial network in the preset neural network model, and optimizing the recognition network in the neural network model with a gradient descent algorithm based on the training data set, so that the generative adversarial network judges the hidden-layer data, obtained after a simulation image in the training data set passes through the generative adversarial network, as real data, wherein the generative adversarial network is a branch network embedded in the recognition network;
and fixing the recognition network, and using a gradient descent algorithm so that the generative adversarial network judges the simulation images in the training data set as simulated data and judges the images to be trained as real data, thereby training the recognition model.
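For illustration only, the alternating scheme recited in claim 1 can be sketched in a few lines of PyTorch. Everything concrete below — the module sizes, the dummy 32×32 tensors, the SGD learning rates, and the equal loss weighting — is an assumption made for the sketch, not the architecture disclosed in the patent.

import torch
import torch.nn as nn

class Recognizer(nn.Module):
    # Recognition network whose hidden layer feeds the embedded GAN branch.
    def __init__(self, num_classes=10):
        super().__init__()
        self.encoder = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32, 128), nn.ReLU())
        self.classifier = nn.Linear(128, num_classes)

    def forward(self, x):
        hidden = self.encoder(x)  # hidden-layer data passed to the GAN branch
        return self.classifier(hidden), hidden

recognizer = Recognizer()
# The generative adversarial branch: judges hidden-layer data as real or simulated.
discriminator = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 1))

opt_r = torch.optim.SGD(recognizer.parameters(), lr=1e-2)  # gradient descent
opt_d = torch.optim.SGD(discriminator.parameters(), lr=1e-2)
bce, ce = nn.BCEWithLogitsLoss(), nn.CrossEntropyLoss()

real_imgs = torch.randn(8, 1, 32, 32)  # stand-ins for images to be trained
sim_imgs = torch.randn(8, 1, 32, 32)   # stand-ins for simulation images
labels = torch.randint(0, 10, (16,))   # stand-in labels for all 16 images

for step in range(100):
    # Step 1: fix the GAN branch and optimize the recognition network so the
    # branch judges the hidden-layer data of simulation images as real data.
    for p in discriminator.parameters():
        p.requires_grad_(False)
    logits, hidden = recognizer(torch.cat([real_imgs, sim_imgs]))
    loss_r = ce(logits, labels) + bce(discriminator(hidden[8:]), torch.ones(8, 1))
    opt_r.zero_grad(); loss_r.backward(); opt_r.step()
    for p in discriminator.parameters():
        p.requires_grad_(True)

    # Step 2: fix the recognition network and train the GAN branch to judge
    # simulation images as simulated data and images to be trained as real data.
    with torch.no_grad():
        _, hidden = recognizer(torch.cat([real_imgs, sim_imgs]))
    loss_d = bce(discriminator(hidden[:8]), torch.ones(8, 1)) \
           + bce(discriminator(hidden[8:]), torch.zeros(8, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

Note that SGD merely stands in for the claimed gradient descent algorithm; any gradient-based optimizer would fit the same alternating structure.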
2. The training method according to claim 1, wherein the recognition model is a character recognition model, and the step of constructing a simulation image according to the image to be trained comprises:
acquiring the label text corresponding to the image to be trained, and generating a target corpus containing rare words according to the label text;
and obtaining a background image corresponding to the image to be trained, and constructing a simulation image according to the target corpus and the background image.
3. The training method according to claim 2, wherein the step of acquiring the label text corresponding to the image to be trained and generating the target corpus containing rare words according to the label text comprises:
acquiring the label text corresponding to the image to be trained, and constructing an original corpus according to the label text;
determining the rare words corresponding to the original corpus, and acquiring the contexts corresponding to the rare words;
and adding the rare words and their corresponding contexts to the original corpus to obtain the target corpus.
4. The training method according to claim 3, wherein after the step of determining the rare words corresponding to the original corpus, the method further comprises:
inserting the rare words into the label text to update the label text in the original corpus, thereby obtaining the target corpus.
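By way of a hedged illustration of claims 2 to 4, the corpus construction can be sketched as follows in Python; the rare-word list, the context lookup table, and the random insertion position are assumptions made for the example, as the claims leave these choices open.

import random

def build_target_corpus(label_texts, rare_words, context_lookup):
    # label_texts: label text of the images to be trained (the original corpus).
    # rare_words: rare words to be covered by the target corpus.
    # context_lookup: maps each rare word to example contexts containing it.
    original_corpus = list(label_texts)
    target_corpus = list(original_corpus)
    for word in rare_words:
        # Claim 3: add the rare word and its contexts to the original corpus.
        target_corpus.append(word)
        target_corpus.extend(context_lookup.get(word, []))
        # Claim 4: insert the rare word into a label string to update the corpus.
        host = random.choice(original_corpus)
        pos = random.randrange(len(host) + 1)
        target_corpus.append(host[:pos] + word + host[pos:])
    return target_corpus

# Hypothetical usage with a single rare character and one context sentence:
corpus = build_target_corpus(
    label_texts=["invoice total", "bank of example"],
    rare_words=["龠"],
    context_lookup={"龠": ["the character 龠 denotes an ancient flute"]},
)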
5. The training method according to claim 2, wherein the step of obtaining the background image corresponding to the image to be trained and constructing a simulation image according to the target corpus and the background image comprises:
obtaining a text-free background image corresponding to the image to be trained, and obtaining the target text string corresponding to the background image from the target corpus;
determining the simulated font of the target text in the simulation image according to the font of the text corresponding to the target text string;
and embedding the target text string into the corresponding background image in the simulated font to construct the simulation image.
6. The training method according to claim 5, wherein the step of embedding the target text string into the corresponding background image in the simulated font to construct the simulation image comprises:
embedding the target text string into the background image in the simulated font to obtain an initial image;
and performing noise-adding processing on the initial image to construct the simulation image.
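A minimal sketch of claims 5 and 6 using Pillow and NumPy is given below; the fixed draw position, the Gaussian pixel noise, and the font path are illustrative assumptions — the claims do not prescribe a particular noise model or rendering library.

import numpy as np
from PIL import Image, ImageDraw, ImageFont

def build_simulation_image(background, text, font_path, font_size=24, noise_std=8.0):
    # Embed the target text string into the background image in the simulated
    # font to obtain an initial image (claim 6, first step).
    initial = background.convert("RGB").copy()
    draw = ImageDraw.Draw(initial)
    font = ImageFont.truetype(font_path, font_size)  # the simulated font
    draw.text((10, 10), text, font=font, fill=(0, 0, 0))
    # Noise-adding processing on the initial image (claim 6, second step);
    # Gaussian pixel noise is just one possible choice.
    arr = np.asarray(initial).astype(np.float32)
    noisy = arr + np.random.normal(0.0, noise_std, arr.shape)
    return Image.fromarray(np.clip(noisy, 0, 255).astype(np.uint8))

# Hypothetical usage: a blank text-free background and a placeholder font path.
bg = Image.new("RGB", (320, 48), color=(240, 240, 240))
sim = build_simulation_image(bg, "sample text", "/path/to/simulated_font.ttf")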
7. The method according to any one of claims 1 to 6, wherein the recognition model is an optical character recognition (OCR) model, and after the step of training the generative adversarial network and the recognition network in the preset neural network model based on the training data set to obtain the recognition model, the method further comprises:
inputting an image to be recognized into the OCR model when the image to be recognized is received;
and determining the text in the image to be recognized according to the output of the OCR model.
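As a brief usage sketch of claim 7: once trained, the OCR model maps an image to character predictions. The per-timestep output shape and the greedy decoding below are assumptions; the claims do not fix a decoding scheme.

import torch

def recognize(ocr_model, image_tensor, alphabet):
    # image_tensor: the image to be recognized, shaped (1, C, H, W).
    ocr_model.eval()
    with torch.no_grad():
        logits = ocr_model(image_tensor)  # assumed shape: (T, len(alphabet))
    indices = logits.argmax(dim=-1).tolist()
    return "".join(alphabet[i] for i in indices)  # greedy per-step decoding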
8. A training device for a recognition model, characterized in that the training device comprises:
an acquisition module, configured to acquire an image to be trained;
a construction module, configured to construct a simulation image according to the image to be trained;
a determination module, configured to determine a training data set according to the image to be trained and the simulation image;
a training module, configured to train a generative adversarial network and a recognition network in a preset neural network model based on the training data set, to obtain a recognition model;
wherein the training module comprises:
a fixing unit, configured to fix the generative adversarial network in the preset neural network model, and further configured to fix the recognition network;
an optimization unit, configured to optimize the recognition network in the neural network model with a gradient descent algorithm based on the training data set, so that the generative adversarial network judges the hidden-layer data, obtained after a simulation image in the training data set passes through the generative adversarial network, as real data, wherein the generative adversarial network is a branch network embedded in the recognition network; the optimization unit is further configured to use a gradient descent algorithm so that the generative adversarial network judges the simulation images in the training data set as simulated data and judges the images to be trained as real data, thereby training the recognition model.
9. A training device for a recognition model, characterized in that the training device comprises a memory, a processor, and a training program of the recognition model stored on the memory and executable on the processor, wherein the training program of the recognition model, when executed by the processor, implements the steps of the training method for a recognition model according to any one of claims 1 to 7.
10. A computer-readable storage medium, on which a training program of a recognition model is stored, wherein the training program of the recognition model, when executed by a processor, implements the steps of the training method for a recognition model according to any one of claims 1 to 7.
CN202010453100.5A 2020-05-25 2020-05-25 Training method, device, equipment and storage medium for recognition model Active CN111612081B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010453100.5A CN111612081B (en) 2020-05-25 2020-05-25 Training method, device, equipment and storage medium for recognition model

Publications (2)

Publication Number Publication Date
CN111612081A CN111612081A (en) 2020-09-01
CN111612081B (en) 2024-04-02

Family

ID=72200615

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010453100.5A Active CN111612081B (en) 2020-05-25 2020-05-25 Training method, device, equipment and storage medium for recognition model

Country Status (1)

Country Link
CN (1) CN111612081B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112232293B (en) * 2020-11-09 2022-08-26 腾讯科技(深圳)有限公司 Image processing model training method, image processing method and related equipment
CN112418297A (en) * 2020-11-19 2021-02-26 北京云从科技有限公司 OCR model training method, system and device based on image expansion
CN113657396B (en) * 2021-08-17 2024-02-09 北京百度网讯科技有限公司 Training method, translation display method, device, electronic equipment and storage medium
CN114722208B (en) * 2022-06-08 2022-11-01 成都健康医联信息产业有限公司 Automatic classification and safety level grading method for health medical texts

Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107563385A (en) * 2017-09-02 2018-01-09 西安电子科技大学 License plate character recognition method based on a deep convolutional generative adversarial network
WO2018028255A1 (en) * 2016-08-11 2018-02-15 深圳市未来媒体技术研究院 Image saliency detection method based on adversarial network
CN108230332A (en) * 2017-10-30 2018-06-29 北京市商汤科技开发有限公司 Character image processing method and apparatus, electronic device, and computer storage medium
CN108416326A (en) * 2018-03-27 2018-08-17 百度在线网络技术(北京)有限公司 Face recognition method and device
CN109190665A (en) * 2018-07-30 2019-01-11 国网上海市电力公司 General image classification method and device based on a semi-supervised generative adversarial network
CN109492099A (en) * 2018-10-28 2019-03-19 北京工业大学 Cross-domain text sentiment classification method based on domain-adversarial adaptation
CN110009057A (en) * 2019-04-16 2019-07-12 四川大学 Graphical verification code recognition method based on deep learning
CN110096964A (en) * 2019-04-08 2019-08-06 厦门美图之家科技有限公司 Method of generating an image recognition model
CN110097103A (en) * 2019-04-22 2019-08-06 西安电子科技大学 Semi-supervised image classification method based on generative adversarial networks
WO2019192397A1 (en) * 2018-04-04 2019-10-10 华中科技大学 End-to-end recognition method for scene text of arbitrary shape
CN110378350A (en) * 2019-07-23 2019-10-25 中国工商银行股份有限公司 Text recognition method, apparatus and system
WO2019232843A1 (en) * 2018-06-04 2019-12-12 平安科技(深圳)有限公司 Handwriting model training method and apparatus, handwritten image recognition method and apparatus, device and medium
CN110852359A (en) * 2019-07-24 2020-02-28 上海交通大学 Family tree recognition method and system based on deep learning
CN111144305A (en) * 2019-12-26 2020-05-12 京东数字科技控股有限公司 Training data generation method and device, electronic device and medium


Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
Generative Adversarial Network with Multi-Branch Discriminator for Cross-Species Image-to-Image Translation; Zheng Z, Yu Z, Zheng H, et al.; arXiv; 2019-01-24; pp. 1-6 *
Rob-GAN: Generator, Discriminator, and Adversarial Attacker; Xuanqing Liu, Cho-Jui Hsieh; 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2020; pp. 11226-11235 *
A text-to-image generative adversarial network based on a self-attention mechanism; Huang Hongyu, Gu Zifeng; Journal of Chongqing University; 2020-04-24; Vol. 43, No. 3; pp. 55-61 *
A verification code recognition method based on adversarial networks; Cao Tingrong, Lu Ling, Gong Yanhong, et al.; Computer Engineering and Applications; Vol. 56, No. 8; pp. 199-204 *
A data augmentation method based on conditional generative adversarial networks; Chen Wenbing, Guan Zhengxiong, Chen Yunjie; Journal of Computer Applications; 2018-11-10; No. 11; pp. 259-265 *
An improved image recognition method based on generative adversarial networks; Li Kai, Peng Yigong; Computer Engineering and Design; Vol. 40, No. 1; abstract, sections 2-3 *

Also Published As

Publication number Publication date
CN111612081A (en) 2020-09-01

Similar Documents

Publication Publication Date Title
CN111612081B (en) Training method, device, equipment and storage medium for recognition model
CN108717406B (en) Text emotion analysis method and device and storage medium
CN110287479B (en) Named entity recognition method, electronic device and storage medium
US20150350151A1 (en) Generating a Conversation in a Social Network Based on Mixed Media Object Context
US20200004815A1 (en) Text entity detection and recognition from images
CN110781460A (en) Copyright authentication method, device, equipment, system and computer readable storage medium
CN108491866B (en) Pornographic picture identification method, electronic device and readable storage medium
CN112508011A (en) OCR (optical character recognition) method and device based on neural network
CN112613293B (en) Digest generation method, digest generation device, electronic equipment and storage medium
CN110750984B (en) Command line character string processing method, terminal, device and readable storage medium
CN113158656B (en) Ironic content recognition method, ironic content recognition device, electronic device, and storage medium
JP6882362B2 (en) Systems and methods for identifying images, including identification documents
CN107291774B (en) Error sample identification method and device
CN112801099B (en) Image processing method, device, terminal equipment and medium
CN114155529A (en) Illegal advertisement identification method combining character visual features and character content features
CN115357699A (en) Text extraction method, device, equipment and storage medium
CN115130613A (en) False news identification model construction method, false news identification method and device
CN114708595A (en) Image document structured analysis method, system, electronic device, and storage medium
CN108090044B (en) Contact information identification method and device
US11080808B2 (en) Automatically attaching optical character recognition data to images
WO2022126917A1 (en) Deep learning-based face image evaluation method and apparatus, device, and medium
CN114003725A (en) Information annotation model construction method and information annotation generation method
CN114399782B (en) Text image processing method, apparatus, device, storage medium, and program product
CN115034177A (en) Presentation file conversion method, device, equipment and storage medium
CN114625872A (en) Risk auditing method, system and equipment based on global pointer and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant