CN112270316A - Character recognition method, character recognition model training method, character recognition device, and electronic equipment - Google Patents

Character recognition method, character recognition model training method, character recognition device, and electronic equipment

Info

Publication number
CN112270316A
Authority
CN
China
Prior art keywords
recurrent neural network
character recognition
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011012497.0A
Other languages
Chinese (zh)
Other versions
CN112270316B (en)
Inventor
张婕蕾
万昭祎
姚聪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Kuangshi Technology Co Ltd
Original Assignee
Beijing Kuangshi Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Kuangshi Technology Co Ltd filed Critical Beijing Kuangshi Technology Co Ltd
Priority to CN202011012497.0A priority Critical patent/CN112270316B/en
Publication of CN112270316A publication Critical patent/CN112270316A/en
Application granted granted Critical
Publication of CN112270316B publication Critical patent/CN112270316B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/22 Character recognition characterised by the type of writing
    • G06V30/224 Character recognition characterised by the type of writing of printed characters having additional code marks or containing code marks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Mathematical Physics (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Character Discrimination (AREA)

Abstract

The invention provides a character recognition method, a character recognition model training method, a character recognition device, and electronic equipment, relating to the technical field of image processing. The method comprises the following steps: processing the feature vector of an image to be recognized through an attention model to obtain an attention weight value for each recurrent neural network; determining target input parameters for each recurrent neural network, where the target input parameters comprise either the feature vector of the image to be recognized alone, or the feature vector of the image to be recognized together with the character recognition result output by the recurrent neural network preceding the current one; and inputting the target input parameters and the attention weight values into each recurrent neural network for processing to obtain character recognition results, the character recognition result output by the last recurrent neural network being determined as the character recognition result of the image to be recognized. This alleviates the technical problem that existing scene character recognition models, being easily affected by the training-set corpus, have low recognition accuracy.

Description

Character recognition method, character recognition model training method, character recognition device, and electronic equipment
Technical Field
The invention relates to the technical field of image processing, and in particular to a character recognition method, a training method for a character recognition model, a character recognition device, and electronic equipment.
Background
In recent years, scene character recognition has found increasingly wide application in the field of pattern recognition, including image retrieval, intelligent transportation, and human-computer interaction.
Scene character recognition has been widely researched over recent decades; methods have multiplied and their accuracy has continuously improved. However, existing scene character recognition methods exhibit vocabulary dependency, that is, the output of the scene character recognition model is often influenced by the training-set corpus. For example, as shown in fig. 1, the two left images are training-set corpora and the two right images are pictures to be recognized. As the right images show, the model recognizes "UNIVERSITI" as "UNIVERSITY", indicating that the model is affected by the training-set corpus, resulting in recognition errors.
Disclosure of Invention
In view of the above, the present invention provides a character recognition method, a training method for a character recognition model, a character recognition device, and electronic equipment, to alleviate the technical problem that existing scene character recognition models, being easily affected by the training-set corpus, have low recognition accuracy.
In a first aspect, an embodiment of the present invention provides a character recognition method applied to a character recognition model. The character recognition model comprises: an attention model and a plurality of recurrent neural networks connected in series, the attention model being connected to each recurrent neural network, wherein the input data of some or all of the recurrent neural networks does not contain the output data of the preceding recurrent neural network connected to it. The method comprises the following steps: processing the feature vector of an image to be recognized through the attention model to obtain an attention weight value for each recurrent neural network; determining target input parameters for each recurrent neural network, where the target input parameters comprise: the feature vector of the image to be recognized, or the feature vector of the image to be recognized together with the character recognition result output by the recurrent neural network preceding the current recurrent neural network; and inputting the target input parameters and the attention weight values into each recurrent neural network for processing to obtain character recognition results, the character recognition result output by the last recurrent neural network being determined as the character recognition result of the image to be recognized, where a character recognition result represents the probability that the character to be recognized belongs to each preset character.
Further, determining the target input parameters for each recurrent neural network includes: if it is determined that a corresponding target probability has been preset for each recurrent neural network, judging whether the target probability is greater than or equal to a preset probability threshold, where the target probability is used to determine whether the target input parameters contain the character recognition result output by the preceding recurrent neural network; and if the target probability is greater than or equal to the preset probability threshold, determining that the target input parameters of the recurrent neural network comprise the character recognition result output by the preceding recurrent neural network and the feature vector of the image to be recognized.
Further, determining the target probability corresponding to each recurrent neural network includes: randomly generating the target probability for each recurrent neural network through a probability generator; or randomly generating the target probability for each recurrent neural network through a target neural network, where the input parameters of the target neural network include: the position information of each recurrent neural network among the plurality of recurrent neural networks, the attention weight value of each recurrent neural network, and the feature vector of the image to be recognized.
Further, if the input data of none of the first recurrent neural networks contains the output data of the preceding first recurrent neural network connected to it, the character recognition model further includes a target language model. The target language model includes a plurality of second recurrent neural networks, where the input data of every second recurrent neural network includes the output data of the second recurrent neural network connected to it, and the plurality of second recurrent neural networks are connected to the plurality of first recurrent neural networks in one-to-one correspondence.
In a second aspect, an embodiment of the present invention provides a method for training a character recognition model. The character recognition model includes: an attention model and a plurality of first recurrent neural networks connected in series, the attention model being connected to each first recurrent neural network, wherein the input data of some or all of the first recurrent neural networks does not contain the output data of the preceding first recurrent neural network connected to it. The method includes: processing the feature vectors of the training-set corpus through the attention model to obtain an attention weight value for each first recurrent neural network; determining target input parameters for each first recurrent neural network, where the target input parameters include: the feature vector of the training-set corpus, or the feature vector of the training-set corpus together with the character recognition result output by the first recurrent neural network preceding the target first recurrent neural network; and training the character recognition model by using the target input parameters, the attention weight values, and target label information to obtain the trained character recognition model, where the target label information is the actual character sequence contained in the training-set corpus.
Further, determining the target input parameters of each first recurrent neural network comprises: if it is determined that a corresponding target probability has been preset for each first recurrent neural network, judging whether the target probability is greater than or equal to a preset probability threshold, where the target probability is used to determine whether the target input parameters contain the character recognition result output by the preceding first recurrent neural network; and if the target probability is greater than or equal to the preset probability threshold, determining that the target input parameters of the target first recurrent neural network comprise the character recognition result output by the preceding first recurrent neural network and the feature vector of the training-set corpus.
Further, the method further comprises: randomly generating the target probability for each first recurrent neural network through a probability generator; or randomly generating the target probability for each first recurrent neural network through a target neural network, where the input parameters of the target neural network include: the position information of the target first recurrent neural network among the plurality of first recurrent neural networks, the attention weight value of the target first recurrent neural network, and the feature vector of the training-set corpus.
Further, if the input data of none of the first recurrent neural networks in the plurality of first recurrent neural networks includes the output data of the preceding first recurrent neural network connected to it, the character recognition model further includes a target language model. The target language model includes a plurality of second recurrent neural networks, where the input data of every second recurrent neural network includes the output data of the second recurrent neural network connected to it, and the plurality of second recurrent neural networks are connected to the plurality of first recurrent neural networks in one-to-one correspondence.
Further, the method further comprises: acquiring a character recognition result output by the last first recurrent neural network in the plurality of first recurrent neural networks to obtain a first output result; acquiring a character recognition result output by the last second recurrent neural network in the plurality of second recurrent neural networks to obtain a second output result; calculating a target loss value by using the first output result and the second output result; and training the character recognition model through the target loss value.
Further, the recurrent neural network is a long short-term memory (LSTM) network.
In a third aspect, an embodiment of the present invention provides a character recognition apparatus applied to a character recognition model. The character recognition model comprises: an attention model and a plurality of recurrent neural networks connected in series, the attention model being connected to each recurrent neural network, wherein the input data of some or all of the recurrent neural networks does not contain the output data of the preceding recurrent neural network connected to it. The apparatus comprises: a first processing unit, configured to process the feature vector of the image to be recognized through the attention model to obtain an attention weight value for each recurrent neural network; a first determining unit, configured to determine target input parameters of each recurrent neural network, where the target input parameters include: the feature vector of the image to be recognized, or the feature vector of the image to be recognized together with the character recognition result output by the recurrent neural network preceding the current recurrent neural network; and a second processing unit, configured to input the target input parameters and the attention weight values into each recurrent neural network for processing to obtain character recognition results, and to determine the character recognition result output by the last recurrent neural network as the character recognition result of the image to be recognized, where a character recognition result represents the probability that the character to be recognized belongs to each preset character.
In a fourth aspect, an embodiment of the present invention provides a training apparatus for a character recognition model. The character recognition model includes: an attention model and a plurality of first recurrent neural networks connected in series, the attention model being connected to each first recurrent neural network, wherein the input data of some or all of the first recurrent neural networks does not contain the output data of the preceding first recurrent neural network connected to it. The apparatus comprises: a third processing unit, configured to process the feature vectors of the training-set corpus through the attention model to obtain the attention weight value of each first recurrent neural network; a second determining unit, configured to determine target input parameters of each first recurrent neural network, where the target input parameters include: the feature vector of the training-set corpus, or the feature vector of the training-set corpus together with the character recognition result output by the first recurrent neural network preceding the target first recurrent neural network; and a training unit, configured to train the character recognition model by using the target input parameters, the attention weight values, and target label information to obtain the trained character recognition model, where the target label information is the actual character sequence contained in the training-set corpus.
In a fifth aspect, an embodiment of the present invention provides an electronic device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the steps of the method according to any one of the first aspect or the second aspect when executing the computer program.
In a sixth aspect, an embodiment of the present invention provides a computer-readable medium having non-volatile program code executable by a processor, where the program code causes the processor to perform the steps of the method of any one of the first or second aspects.
The inventor has found that, because the output data of each recognition step of a scene character recognition model is fed as input to the next step, the existing scene character recognition model performs a degree of sequence modeling, that is, it implicitly builds a language model. This language-model structure within the existing scene character recognition model causes the model to exhibit strong vocabulary dependency. On this basis, the present application proposes a character recognition method.
In the character recognition method provided by the embodiment of the invention, a character recognition model is used to perform character recognition on the image to be recognized. Because the input data of some or all of the recurrent neural networks in the character recognition model does not contain the output data of the preceding recurrent neural network connected to it, the vocabulary dependency of the recurrent neural networks during character recognition is reduced, which alleviates the technical problem that existing scene character recognition models, being easily influenced by the training-set corpus, have low recognition accuracy.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 is a diagram illustrating a character recognition result in the prior art;
FIG. 2 is a schematic structural diagram of an electronic device according to an embodiment of the invention;
FIG. 3 is a flow chart of a method of text recognition according to an embodiment of the present invention;
FIG. 4 is a diagram of a first text recognition model according to an embodiment of the present invention;
FIG. 5 is a diagram of a second text recognition model according to an embodiment of the present invention;
FIG. 6 is a flow chart of a method for training a character recognition model according to an embodiment of the present invention;
FIG. 7 is a diagram of a third text recognition model according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of a text recognition apparatus according to an embodiment of the present invention;
FIG. 9 is a diagram illustrating an apparatus for training a character recognition model according to an embodiment of the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1:
first, an electronic device 100 for implementing an embodiment of the present invention is described with reference to fig. 2; the device may be used to run the character recognition method or the training method of the character recognition model according to embodiments of the present invention.
As shown in fig. 2, the electronic device 100 includes one or more processors 102 and one or more memories 104. Optionally, the electronic device 100 may further include an input device 106, an output device 108, and an image capture device 110, which may be interconnected via a bus system 112 and/or another form of connection mechanism (not shown). It should be noted that the components and structure of the electronic device 100 shown in fig. 2 are exemplary only, not limiting; as desired, the electronic device may have only some of the components shown in fig. 2, or other components and structures not shown there.
The processor 102 may be implemented in at least one of the following hardware forms: a digital signal processor (DSP), a field-programmable gate array (FPGA), a programmable logic array (PLA), or an application-specific integrated circuit (ASIC). The processor 102 may be a central processing unit (CPU) or another form of processing unit having data processing capability and/or instruction execution capability, and may control other components in the electronic device 100 to perform desired functions.
The memory 104 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory. The non-volatile memory may include, for example, read-only memory (ROM), a hard disk, or flash memory. One or more computer program instructions may be stored on the computer-readable storage medium, and the processor 102 may execute them to implement the client-side functionality (implemented by the processor) and/or other desired functionality in the embodiments of the invention described below. Various applications and various data, such as data used and/or generated by the applications, may also be stored in the computer-readable storage medium.
The input device 106 may be a device used by a user to input instructions and may include one or more of a keyboard, a mouse, a microphone, a touch screen, and the like.
The output device 108 may output various information (e.g., images or sounds) to the outside (e.g., a user), and may include one or more of a display, a speaker, and the like.
The image acquisition device 110 is configured to acquire an image to be recognized, where the image acquired by the image acquisition device is subjected to the character recognition method to obtain a character recognition result.
Example 2:
in accordance with an embodiment of the present invention, an embodiment of a character recognition method is provided. It should be noted that the steps illustrated in the flowchart of the drawings may be performed in a computer system, such as with a set of computer-executable instructions, and that although a logical order is illustrated in the flowchart, in some cases the steps illustrated or described may be performed in a different order.
It should be noted that, in the present application, the method may be applied to a character recognition model, where the character recognition model includes: an attention model and a plurality of recurrent neural networks connected in series, the attention model being connected to each recurrent neural network, wherein the input data of some or all of the recurrent neural networks does not contain the output data of the preceding recurrent neural network connected to it.
Fig. 4 shows a structure diagram of the character recognition model. As can be seen from fig. 4, the character recognition model includes an attention model Attend and a plurality of recurrent neural networks, each a long short-term memory (LSTM) network, connected in series. In fig. 4, ht is the feature vector of the image to be recognized, αt is the attention weight value output by the attention model Attend, st-1 is the output of the recurrent neural network LSTM, and gt is the result of multiplying the feature vector of the image to be recognized by the attention weight value.
The inventor has found that, in the character recognition model shown in fig. 4, if the output of each recurrent neural network except the last one is used as the input of the next recurrent neural network, the recurrent neural networks perform a degree of sequence modeling, that is, they build a language model. Connecting the recurrent neural networks in this way causes the character recognition model to exhibit strong vocabulary dependency. Therefore, in the present application, the input data of some or all of the recurrent neural networks is set so as not to include the output data of the preceding recurrent neural network. That is, for some or all of the recurrent neural networks, the output result of the preceding recurrent neural network connected to them is discarded.
Fig. 3 is a flow chart of a text recognition method according to an embodiment of the invention. As shown in fig. 3, the method comprises the steps of:
step S302, processing the feature vector of the image to be recognized through the attention model to obtain the attention weight value of each recurrent neural network.
As shown in fig. 4, for the recurrent neural network LSTM2, the attention weight value is generated by the attention model Attend; the specific generation process may be described as follows:
First, the output result st-1 of LSTM1, the recurrent neural network preceding LSTM2, is obtained; the feature vector ht of the image to be recognized is then obtained, and the output result st-1 and the feature vector ht are processed together to obtain the attention weight value αt of LSTM2.
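The patent does not disclose the internal form of the attention model Attend. The sketch below, in PyTorch, shows one conventional possibility: an additive attention that scores the image feature vectors ht against the previous output st-1 and returns both the attention weight value αt and the weighted feature gt. The class name, layer names, dimensions, and the additive form itself are assumptions for illustration, not taken from the patent.

```python
import torch
import torch.nn as nn

class Attend(nn.Module):
    """Additive attention (assumed form): scores each image feature h_i
    against the previous decoder output s_{t-1} and returns the attention
    weights alpha_t and the weighted feature g_t."""
    def __init__(self, feat_dim, state_dim, attn_dim):
        super().__init__()
        self.w_h = nn.Linear(feat_dim, attn_dim)
        self.w_s = nn.Linear(state_dim, attn_dim)
        self.v = nn.Linear(attn_dim, 1)

    def forward(self, h, s_prev):
        # h: (batch, num_positions, feat_dim) image feature vectors
        # s_prev: (batch, state_dim) output of the preceding LSTM step
        scores = self.v(torch.tanh(self.w_h(h) + self.w_s(s_prev).unsqueeze(1)))
        alpha = torch.softmax(scores, dim=1)   # (batch, num_positions, 1)
        g = (alpha * h).sum(dim=1)             # g_t = sum_i alpha_i * h_i
        return alpha.squeeze(-1), g
```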
Step S304, determining target input parameters of each recurrent neural network, wherein the target input parameters comprise: the feature vector of the image to be recognized, or the feature vector of the image to be recognized and a character recognition result output by a last recurrent neural network of the current recurrent neural network.
It should be noted that, in the present application, some or all of the recurrent neural networks are set so as to no longer receive the output result of the preceding recurrent neural network. Thus, the target input parameters of each recurrent neural network need to be determined. For example, the target input parameter of LSTM1 is determined to be the feature vector of the image to be recognized; the target input parameter of LSTM2 is likewise the feature vector of the image to be recognized; the target input parameters of LSTM3 are the feature vector of the image to be recognized together with the character recognition result output by the recurrent neural network LSTM2; and so on.
Step S306, inputting the target input parameter and the attention weight value into each recurrent neural network for processing, obtaining a character recognition result, and determining a character recognition result output by the last recurrent neural network as a character recognition result of the image to be recognized, where the character recognition result represents a probability that the character to be recognized belongs to each preset character.
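As a hedged illustration of steps S302 to S306, the following PyTorch sketch shows a single decoding step in which the weighted feature gt is concatenated with either an embedding of the previous character recognition result or a zero vector, depending on whether this recurrent neural network receives the preceding network's output. The embedding layer, the zero-vector substitute, and all names and dimensions are assumptions, not the patent's specification.

```python
import torch
import torch.nn as nn

class DecodeStep(nn.Module):
    """One recurrent step (assumed realization): an LSTM cell followed by a
    classifier giving the probability of each preset character."""
    def __init__(self, feat_dim, emb_dim, hidden_dim, num_chars):
        super().__init__()
        self.embed = nn.Embedding(num_chars, emb_dim)
        self.cell = nn.LSTMCell(feat_dim + emb_dim, hidden_dim)
        self.classifier = nn.Linear(hidden_dim, num_chars)

    def forward(self, g, prev_char, state, use_feedback):
        # g: (batch, feat_dim) attention-weighted feature g_t
        # prev_char: (batch,) character recognized at the previous step
        # use_feedback: False for networks whose input no longer contains
        # the preceding network's output
        if use_feedback:
            prev = self.embed(prev_char)
        else:
            prev = torch.zeros(g.size(0), self.embed.embedding_dim, device=g.device)
        h, c = self.cell(torch.cat([g, prev], dim=1), state)
        probs = torch.softmax(self.classifier(h), dim=-1)  # per preset character
        return probs, (h, c)
```

Running such a step once per series position, with the last step's probs taken as the recognition result of the image, matches the flow of fig. 3 under these assumptions.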
The inventor has found that, because the output data of each recognition step of a scene character recognition model is fed as input to the next step, the existing scene character recognition model performs a degree of sequence modeling, that is, it implicitly builds a language model. This language-model structure within the existing scene character recognition model causes the model to exhibit strong vocabulary dependency. On this basis, the present application proposes a character recognition method.
In the character recognition method provided by the embodiment of the invention, a character recognition model is used to perform character recognition on the image to be recognized. Because the input data of some or all of the recurrent neural networks in the character recognition model does not contain the output data of the preceding recurrent neural network connected to it, the vocabulary dependency of the recurrent neural networks during character recognition is reduced, which alleviates the technical problem that existing scene character recognition models, being easily influenced by the training-set corpus, have low recognition accuracy.
In an alternative embodiment of the present application, the step S304 of determining the target input parameters of each recurrent neural network includes the following process:
First, it may be judged whether a corresponding target probability has been set in advance for each recurrent neural network.
If it is determined that a corresponding target probability has been set in advance for each recurrent neural network, it is further judged whether the target probability is greater than or equal to a preset probability threshold; the target probability is used to determine whether the target input parameters contain the character recognition result output by the preceding recurrent neural network. If it is determined that no corresponding target probability has been set in advance, it is determined that the target input parameters do not contain the character recognition result output by the preceding recurrent neural network.
Second, if the target probability is greater than or equal to the preset probability threshold, it is determined that the target input parameters of the recurrent neural network comprise the character recognition result output by the preceding recurrent neural network and the feature vector of the image to be recognized.
In the present application, the preset probability threshold may be set to 0.2; other thresholds may also be set. The present application does not specifically limit this, and the user may select a threshold according to actual needs.
In the present application, as shown in fig. 5, for each recurrent neural network other than the first one, a corresponding target sequence rand(1, 0) may be set at its input.
Assume the preset probability threshold is 0.2 and, in fig. 5, the target probability corresponding to LSTM2 is 0.5. The comparison shows that the target probability corresponding to LSTM2 is greater than the preset probability threshold; in this case the output result of LSTM1 is multiplied by the 1 in the target sequence rand, and the product is transmitted to the input of LSTM2.
Now assume the preset probability threshold is 0.2 and, in fig. 5, the target probability corresponding to LSTM2 is 0.1. The comparison shows that the target probability corresponding to LSTM2 is smaller than the preset probability threshold; in this case the output result of LSTM1 may be multiplied by the 0 in the target sequence rand, so that the output result of LSTM1 is not transmitted to the input of LSTM2.
As can be seen from the above description, in the present application a corresponding target probability is preset for each recurrent neural network, so that each output result (e.g., an output character) of a recurrent neural network is passed on as input data to the next recurrent neural network only with a certain probability. This amounts to discarding characters of the training-set corpus, and the method can therefore relieve the dependence of the plurality of recurrent neural networks on the training-set corpus.
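The comparison in the two examples above amounts to a simple 0/1 gate on the preceding network's output. Below is a minimal sketch using the 0.2 threshold from the example; the function name is illustrative, not from the patent.

```python
def gate_previous_output(prev_output, target_prob, threshold=0.2):
    """Multiply the preceding network's output by the 1 or 0 of the target
    sequence rand(1, 0), chosen by comparing the target probability with
    the preset probability threshold."""
    keep = 1.0 if target_prob >= threshold else 0.0
    return prev_output * keep
```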
In an optional embodiment, the target probability corresponding to each recurrent neural network may be determined in several ways, including:
The first way:
The target probability is randomly generated for each recurrent neural network by a probability generator.
In one approach, a probability generator may be preset, and the probability generator may randomly generate a corresponding target probability for each recurrent neural network in advance.
In another mode, a probability generator may be preset, and during the training phase of the character recognition model the probability generator randomly generates a corresponding initial probability for each recurrent neural network in advance. During training of the character recognition model, the value of the initial probability can then be adjusted so that the precision of the character recognition model meets a preset requirement, and the initial probability that meets the preset requirement is determined as the target probability.
The second way:
Randomly generating the target probability for each recurrent neural network through a target neural network, wherein input parameters of the target neural network include: position information of each recurrent neural network in the plurality of recurrent neural networks, attention weight value of each recurrent neural network, and feature vector of the image to be recognized.
In another alternative embodiment, a target neural network may be preset, the output data of the target neural network is the target probability of a plurality of recurrent neural networks, and the input data of the target neural network may be one or more of the following data: the position information of each recurrent neural network in the plurality of recurrent neural networks, the attention weight value of each recurrent neural network and the feature vector of the image to be identified.
As shown in fig. 4, the plurality of recurrent neural networks are connected in sequence, and the position information of each recurrent neural network can be understood as the position at which it sits in this sequence. For example, the position information of LSTM1 among the plurality of recurrent neural networks is "1", the position information of LSTM2 is "2", and so on. It should be noted that recurrent neural networks located at different positions differ in their data processing procedures and in their degrees of importance; therefore, the position information can be used as an input of the target neural network.
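For the second way, the patent specifies the inputs and output of the target neural network but not its architecture. The sketch below assumes a small feed-forward network with a sigmoid output so the generated target probability lies in [0, 1]; the position embedding, the pooling of the attention weights, and all dimensions are assumptions.

```python
import torch
import torch.nn as nn

class TargetProbNet(nn.Module):
    """Maps (position information, attention weight values, image feature
    vectors) to one target probability. Architecture is assumed."""
    def __init__(self, feat_dim, pos_dim=16, hidden=64, max_steps=64):
        super().__init__()
        self.pos_embed = nn.Embedding(max_steps, pos_dim)  # assumed step limit
        self.mlp = nn.Sequential(
            nn.Linear(pos_dim + 1 + feat_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
            nn.Sigmoid(),  # keeps the target probability in [0, 1]
        )

    def forward(self, position, alpha, h):
        # position: (batch,) index of the network in the series
        # alpha: (batch, num_positions) its attention weight values
        # h: (batch, num_positions, feat_dim) image feature vectors
        pooled = (alpha.unsqueeze(-1) * h).sum(dim=1)        # attention-pooled feature
        peak = alpha.max(dim=1, keepdim=True).values          # assumed summary of alpha
        x = torch.cat([self.pos_embed(position), peak, pooled], dim=1)
        return self.mlp(x).squeeze(-1)
```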
In another optional embodiment of the present application, if the input data of none of the first recurrent neural networks includes the output data of the preceding first recurrent neural network connected to it, the character recognition model further includes a target language model. As shown in fig. 7, the target language model includes a plurality of second recurrent neural networks, where the input data of every second recurrent neural network includes the output data of the second recurrent neural network connected to it, and the plurality of second recurrent neural networks are connected to the plurality of first recurrent neural networks in one-to-one correspondence.
In the present application, the output result of each first recurrent neural network is not sent to the next first recurrent neural network. On this basis, each first recurrent neural network is connected to a second recurrent neural network, so that the plurality of second recurrent neural networks can assist in training the plurality of first recurrent neural networks; the specific training process is described in the following embodiments. A sketch of one step of this two-branch structure is given below.
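The following PyTorch sketch illustrates one decoding step of the two-branch structure of fig. 7 under stated assumptions: a first LSTM whose input carries no feedback from the preceding first LSTM, paired one-to-one with a second LSTM that does consume the previous step's output and thus acts as the target language model. The layer sizes and the concatenation scheme are assumptions, not the patent's specification.

```python
import torch
import torch.nn as nn

class DualBranchStep(nn.Module):
    """One step of the assumed two-branch decoder of fig. 7."""
    def __init__(self, feat_dim, hidden_dim, num_chars):
        super().__init__()
        self.first_cell = nn.LSTMCell(feat_dim, hidden_dim)
        self.second_cell = nn.LSTMCell(2 * hidden_dim, hidden_dim)
        self.first_head = nn.Linear(hidden_dim, num_chars)
        self.second_head = nn.Linear(hidden_dim, num_chars)

    def forward(self, g, first_state, second_state, prev_second_out):
        # First branch: input is only the attention-weighted feature g_t,
        # with no feedback from the preceding first LSTM's output.
        h1, c1 = self.first_cell(g, first_state)
        # Second branch (target language model): consumes the first branch's
        # output together with the preceding second LSTM's output.
        h2, c2 = self.second_cell(torch.cat([h1, prev_second_out], dim=1),
                                  second_state)
        return self.first_head(h1), self.second_head(h2), (h1, c1), (h2, c2)
```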
Example 3:
according to the embodiment of the invention, the embodiment of the training method of the character recognition model is provided.
In the present application, the character recognition model includes: an attention model and a plurality of first recurrent neural networks connected in series, the attention model being connected to each first recurrent neural network, wherein the input data of some or all of the first recurrent neural networks does not contain the output data of the preceding first recurrent neural network connected to it. In the present application, the recurrent neural network may be a long short-term memory (LSTM) network.
FIG. 6 is a flowchart of a method for training a character recognition model according to an embodiment of the invention. As shown in fig. 6, the method includes the steps of:
step S602, processing the feature vectors of the training-set corpus through the attention model to obtain the attention weight value of each first recurrent neural network.
As shown in fig. 4 or fig. 7, for the recurrent neural network LSTM2, the attention weight value is generated by the attention model Attend; the specific generation process may be described as follows:
First, the output result st-1 of LSTM1, the first recurrent neural network preceding LSTM2, is obtained; the feature vector ht of the training-set corpus is then obtained, and the output result st-1 and the feature vector ht are processed together to obtain the attention weight value αt of LSTM2.
Step S604, determining target input parameters of each first recurrent neural network, where the target input parameters include: the feature vector of the training-set corpus, or the feature vector of the training-set corpus together with the character recognition result output by the first recurrent neural network preceding the target first recurrent neural network.
It should be noted that, in the present application, some or all of the first recurrent neural networks are configured so as to no longer receive the output result of the preceding first recurrent neural network. Thus, the target input parameters of each first recurrent neural network need to be determined. For example, the target input parameter of LSTM1 is determined to be the feature vector of the training-set corpus; the target input parameter of LSTM2 is likewise the feature vector of the training-set corpus; the target input parameters of LSTM3 are the feature vector of the training-set corpus together with the character recognition result output by LSTM2; and so on.
Step S606, training the character recognition model by using the target input parameters, the attention weight values, and target label information to obtain the trained character recognition model, where the target label information is the actual character sequence contained in the training-set corpus. A sketch of one such training step follows.
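The patent does not name a loss function or optimizer for step S606. The sketch below assumes a per-character cross-entropy loss against the target label information; the model's calling signature and all names here are hypothetical.

```python
import torch.nn.functional as F

def train_step(model, optimizer, corpus_features, label_seq):
    # corpus_features: (batch, num_positions, feat_dim) training-set corpus features
    # label_seq: (batch, seq_len) actual character sequence (target label information)
    optimizer.zero_grad()
    logits = model(corpus_features, labels=label_seq)  # assumed signature,
    # returning (batch, seq_len, num_chars) recognition scores
    loss = F.cross_entropy(logits.flatten(0, 1), label_seq.flatten())
    loss.backward()
    optimizer.step()
    return loss.item()
```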
In the invention, a character recognition model is used to perform character recognition, and the input data of some or all of the recurrent neural networks in the character recognition model no longer includes the output data of the preceding recurrent neural network connected to it. This reduces the vocabulary dependency of the recurrent neural networks during character recognition and alleviates the technical problem that existing scene character recognition models, being easily influenced by the training-set corpus, have low recognition accuracy.
In an alternative embodiment of the present application, the step S604 of determining the target input parameter of each first recurrent neural network includes the following processes:
in the present application, it may be first determined whether a corresponding target probability is set for each first recurrent neural network in advance. If the fact that the corresponding target probability is set for each first cyclic neural network in advance is determined, whether the target probability is larger than or equal to a preset probability threshold value or not is continuously judged; wherein, the target probability is used for determining whether the target input parameter contains the character recognition result output by the last first-cycle neural network. And if the fact that the corresponding target probability is not set for each first cyclic neural network in advance is determined, determining that the target input parameters do not contain the character recognition result output by the last first cyclic neural network.
And if the target probability is greater than or equal to a preset probability threshold, determining that the target input parameters of the target first cyclic neural network comprise the character recognition result output by the last first cyclic neural network and the feature vector of the training set corpus.
In the present application, the preset probability threshold may be set to 0.2; other thresholds may also be set. The present application does not specifically limit this, and the user may select a threshold according to actual needs.
In the present application, as shown in fig. 5, for each first recurrent neural network other than the first one, a corresponding target sequence rand(1, 0) may be set at its input.
Assume the preset probability threshold is 0.2 and, in fig. 5, the target probability corresponding to LSTM2 is 0.5. The comparison shows that the target probability corresponding to LSTM2 is greater than the preset probability threshold; in this case the output result of LSTM1 is multiplied by the 1 in the target sequence rand, and the product is transmitted to the input of LSTM2.
Now assume the preset probability threshold is 0.2 and, in fig. 5, the target probability corresponding to LSTM2 is 0.1. The comparison shows that the target probability corresponding to LSTM2 is smaller than the preset probability threshold; in this case the output result of LSTM1 may be multiplied by the 0 in the target sequence rand, so that the output result of LSTM1 is not transmitted to the input of LSTM2.
In the present application, after the target input parameters of each first recurrent neural network have been determined in the manner described above, the character recognition model may be trained using the target input parameters, the attention weight values, and the target label information, to obtain the trained character recognition model. It should be noted that, in the present application, the target label information may be understood as the actual character sequence contained in the training-set corpus.
As can be seen from the above description, in the present application a corresponding target probability is preset for each first recurrent neural network, so that each output result (e.g., an output character) of a first recurrent neural network is passed on as input data to the next first recurrent neural network only with a certain probability. In this way, characters of the training-set corpus are discarded, and the method can therefore reduce the dependence of the plurality of first recurrent neural networks on the training-set corpus.
In an optional embodiment, the target probability corresponding to each first recurrent neural network may be determined in several ways, including:
the first method is as follows:
the target probability is randomly generated for each first recurrent neural network by a probability generator.
In one approach, a probability generator may be preset, and the probability generator randomly generates a corresponding target probability for each first recurrent neural network in advance.
The second method comprises the following steps:
randomly generating the target probability for each first recurrent neural network through a target neural network, wherein input parameters of the target neural network include: the position information of the target first recurrent neural network in the plurality of first recurrent neural networks, the attention weight value of the target first recurrent neural network, and the feature vector of the training corpus.
In another alternative embodiment, a target neural network may be preset. The output data of the target neural network are the target probabilities of the plurality of first recurrent neural networks, and its input data may be one or more of the following: the position information of each first recurrent neural network among the plurality of first recurrent neural networks, the attention weight value of each first recurrent neural network, and the feature vector of the training-set corpus.
As shown in fig. 4, the plurality of first recurrent neural networks are connected in sequence, and the position information of each first recurrent neural network can be understood as the position at which it sits in this sequence. For example, the position information of LSTM1 among the plurality of first recurrent neural networks is "1", the position information of LSTM2 is "2", and so on. It should be noted that recurrent neural networks located at different positions differ in their data processing procedures and in their degrees of importance; therefore, the position information can be used as an input of the target neural network.
In another optional embodiment of the present application, if the input data of none of the first recurrent neural networks includes the output data of the preceding first recurrent neural network connected to it, the character recognition model further includes a target language model. As shown in fig. 7, the target language model includes a plurality of second recurrent neural networks, where the input data of every second recurrent neural network includes the output data of the second recurrent neural network connected to it, and the plurality of second recurrent neural networks are connected to the plurality of first recurrent neural networks in one-to-one correspondence.
In the present application, the output result of each first recurrent neural network is not sent to the next first recurrent neural network. On this basis, each first recurrent neural network is connected to a second recurrent neural network, so that the plurality of second recurrent neural networks assist in training the plurality of first recurrent neural networks.
The processing flow of the character recognition model shown in fig. 7 may be specifically described as follows:
for each of the plurality of first recurrent neural networks, the process is as follows:
firstly, the attention model obtains the feature vector of the training-set corpus, then obtains the output result of the preceding first recurrent neural network, and determines this output result and the feature vector of the training-set corpus as the input data of the current first recurrent neural network. Next, the current first recurrent neural network processes the input data to obtain an output result (e.g., a character recognition result), which it then inputs into the attention model and into the corresponding second recurrent neural network for processing.
For each of the plurality of second recurrent neural networks, the process is as follows:
the current second recurrent neural network obtains the output result of the preceding second recurrent neural network and the output result of the first recurrent neural network connected to it, and processes them; at the same time, the output result of the current second recurrent neural network is transmitted to the next second recurrent neural network for processing.
According to the above-described process, in the present application, the character recognition result output by the last of the plurality of first recurrent neural networks may first be obtained as the first output result. Then, the character recognition result output by the last of the plurality of second recurrent neural networks is obtained as the second output result. Next, a target loss value is calculated using the first output result and the second output result; finally, the character recognition model is trained through the target loss value.
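The patent does not state how the first and second output results are combined into the target loss value. A common choice, shown below purely as an assumption, is to compute a cross-entropy loss for each branch against the same actual character sequence and sum the two.

```python
import torch.nn.functional as F

def target_loss(first_output, second_output, labels):
    # first_output: (batch, seq_len, num_chars) scores from the last first
    # recurrent neural network; second_output: same shape, from the last
    # second recurrent neural network (the target language model);
    # labels: (batch, seq_len) actual character sequence
    loss_first = F.cross_entropy(first_output.flatten(0, 1), labels.flatten())
    loss_second = F.cross_entropy(second_output.flatten(0, 1), labels.flatten())
    return loss_first + loss_second  # assumed equal weighting of the branches
```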
As can be seen from the above description, the present application separately creates a sequence model (i.e., the plurality of first recurrent neural networks) that focuses on image features, while using the target language model as an aid to that sequence model. This weakens the language-modeling capacity of the outputs of the first branch (i.e., the plurality of first recurrent neural networks) and effectively relieves the model's dependency on vocabulary.
Example 4:
the embodiment of the present invention further provides a character recognition apparatus, which is mainly used to execute the character recognition method provided in the foregoing embodiments of the present invention. The character recognition apparatus provided by the embodiment of the present invention is described in detail below.
Fig. 8 is a schematic diagram of a character recognition apparatus according to an embodiment of the invention. The apparatus is applied to a character recognition model, and the character recognition model comprises: an attention model and a plurality of recurrent neural networks connected in series, the attention model being connected to each recurrent neural network, wherein the input data of some or all of the recurrent neural networks does not contain the output data of the preceding recurrent neural network connected to it.
As shown in fig. 8, the character recognition apparatus mainly includes a first processing unit 81, a first determining unit 82, and a second processing unit 83, wherein:
the first processing unit 81 is configured to process the feature vector of the image to be recognized through an attention model to obtain an attention weight value of each recurrent neural network;
a first determining unit 82, configured to determine a target input parameter of each recurrent neural network, where the target input parameter includes: the feature vector of the image to be recognized, or the feature vector of the image to be recognized and a character recognition result output by a last recurrent neural network of the current recurrent neural network;
the second processing unit 83 is configured to input the target input parameter and the attention weight value to each recurrent neural network for processing, obtain a character recognition result, and determine the character recognition result output by the last recurrent neural network as the character recognition result of the image to be recognized, where the character recognition result indicates a probability that the character to be recognized belongs to each preset character.
In the invention, a character recognition model is used to perform character recognition on the image to be recognized, and the input data of some or all of the recurrent neural networks in the character recognition model no longer includes the output data of the preceding recurrent neural network connected to it. This reduces the vocabulary dependency of the recurrent neural networks during character recognition and alleviates the technical problem that existing scene character recognition models, being easily influenced by the training-set corpus, have low recognition accuracy.
Optionally, the first determining unit is configured to: if it is determined that a corresponding target probability has been preset for each recurrent neural network, judge whether the target probability is greater than or equal to a preset probability threshold, where the target probability is used to determine whether the target input parameters contain the character recognition result output by the preceding recurrent neural network; and if the target probability is greater than or equal to the preset probability threshold, determine that the target input parameters of the recurrent neural network comprise the character recognition result output by the preceding recurrent neural network and the feature vector of the image to be recognized.
Optionally, the first determining unit is further configured to: randomly generate the target probability for each recurrent neural network by a probability generator; or generate the target probability for each recurrent neural network through a target neural network, where the input parameters of the target neural network include: the position information of each recurrent neural network in the plurality of recurrent neural networks, the attention weight value of each recurrent neural network, and the feature vector of the image to be recognized.
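As a rough illustration of the two options above, the sketch below draws the target probability either from a random probability generator or from a small target network conditioned on the step's position, its attention weight value, and the image feature vector, and compares it with the preset threshold. The names feed_previous and prob_net and the 0.5 default threshold are assumptions for illustration, not values fixed by the patent.

    import torch

    def feed_previous(step_idx, attn_weight, feats, threshold=0.5, prob_net=None):
        # Returns True when the current recurrent step should receive the
        # character recognition result of the preceding step as input.
        if prob_net is None:
            # Option 1: a probability generator draws the target probability at random.
            target_prob = torch.rand(1).item()
        else:
            # Option 2: a target neural network predicts the probability from the
            # step position, the attention weight value, and the feature vector.
            x = torch.cat([torch.tensor([float(step_idx)]),
                           attn_weight.flatten(), feats.flatten()])
            target_prob = torch.sigmoid(prob_net(x)).item()
        return target_prob >= threshold

Here prob_net would be any small module (for example, a linear layer) whose input size matches the concatenated vector; that choice is likewise an assumption.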
Optionally, if the input data of all the first recurrent neural networks does not include the output data of the preceding first recurrent neural network connected to them, the character recognition model further includes a target language model. The target language model includes a plurality of second recurrent neural networks connected in series, where the input data of every second recurrent neural network in the plurality of second recurrent neural networks includes the output data of the preceding second recurrent neural network connected to it, and the plurality of second recurrent neural networks are connected with the plurality of first recurrent neural networks in a one-to-one correspondence.
Example 5:
FIG. 9 is a schematic diagram of an apparatus for training a character recognition model according to an embodiment of the present invention. The character recognition model comprises an attention model and a plurality of first recurrent neural networks, where the attention model is connected with each first recurrent neural network and the first recurrent neural networks are connected in series; the input data of some or all of the first recurrent neural networks does not contain the output data of the preceding first recurrent neural network connected to them.
As shown in fig. 9, the apparatus for training the character recognition model mainly includes a third processing unit 91, a second determining unit 92, and a training unit 93, wherein:
the third processing unit 91 is configured to process the feature vectors of the corpus of the training set through the attention model to obtain an attention weight value of each first recurrent neural network;
a second determining unit 92, configured to determine a target input parameter of each first recurrent neural network, where the target input parameter includes: the feature vector of the training set corpus, or the feature vector of the training set corpus and the character recognition result output by the first recurrent neural network preceding the target first recurrent neural network;
a training unit 93, configured to train the character recognition model by using the target input parameter, the attention weight value, and target tag information to obtain the trained character recognition model, where the target tag information is an actual character sequence included in the corpus of the training set.
In the invention, a character recognition model is adopted to perform character recognition on an image to be recognized. Because the input data of some or all of the recurrent neural networks in the character recognition model no longer includes the output data of the preceding recurrent neural network connected to them, the vocabulary dependency of the recurrent neural networks during character recognition is reduced, which solves the technical problem that existing scene text recognition models have low recognition accuracy because they are easily influenced by the corpus of the training set.
Optionally, the second determining unit is configured to: if it is determined that a corresponding target probability is preset for each first recurrent neural network, determine whether the target probability is greater than or equal to a preset probability threshold, where the target probability is used for determining whether the target input parameters contain the character recognition result output by the preceding first recurrent neural network; and if the target probability is greater than or equal to the preset probability threshold, determine that the target input parameters of the target first recurrent neural network comprise the character recognition result output by the preceding first recurrent neural network and the feature vector of the training set corpus.
Optionally, the second determining unit is further configured to: randomly generate the target probability for each first recurrent neural network by a probability generator; or generate the target probability for each first recurrent neural network through a target neural network, where the input parameters of the target neural network include: the position information of the target first recurrent neural network in the plurality of first recurrent neural networks, the attention weight value of the target first recurrent neural network, and the feature vector of the training set corpus.
Optionally, if the input data of all the first recurrent neural networks in the plurality of first recurrent neural networks does not include the output data of the preceding first recurrent neural network connected to them, the character recognition model further includes a target language model. The target language model includes a plurality of second recurrent neural networks connected in series, where the input data of every second recurrent neural network in the plurality of second recurrent neural networks includes the output data of the preceding second recurrent neural network connected to it, and the plurality of second recurrent neural networks are connected with the plurality of first recurrent neural networks in a one-to-one correspondence.
Optionally, the apparatus is further configured to: acquire the character recognition result output by the last first recurrent neural network in the plurality of first recurrent neural networks to obtain a first output result; acquire the character recognition result output by the last second recurrent neural network in the plurality of second recurrent neural networks to obtain a second output result; calculate a target loss value by using the first output result and the second output result; and train the character recognition model through the target loss value.
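The joint training step above can be sketched as follows. This is a hypothetical rendering in which both branches emit per-step character probabilities, cross-entropy is assumed as the criterion, and lm_weight is an assumed balancing coefficient; the patent does not fix how the two output results are combined into the target loss value.

    import torch
    import torch.nn.functional as F

    def target_loss(first_out, second_out, labels, lm_weight=1.0):
        # first_out:  (B, S, C) probabilities from the last first recurrent network
        # second_out: (B, S, C) probabilities from the last second recurrent network
        # labels:     (B, S) indices of the actual character sequence (target labels)
        B, S, C = first_out.shape
        rec_loss = F.nll_loss(first_out.clamp_min(1e-8).log().view(B * S, C),
                              labels.view(B * S))
        lm_loss = F.nll_loss(second_out.clamp_min(1e-8).log().view(B * S, C),
                             labels.view(B * S))
        # the combined value is then back-propagated to train the whole model
        return rec_loss + lm_weight * lm_loss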
Optionally, the recurrent neural network is a long short-term memory (LSTM) network.
The device provided by the embodiment of the present invention has the same implementation principle and produces the same technical effects as the foregoing method embodiments; for the sake of brevity, where the device embodiments are silent, reference may be made to the corresponding contents in the method embodiments.
In addition, in the description of the embodiments of the present invention, unless otherwise explicitly specified or limited, the terms "mounted," "connected," and "coupled" are to be construed broadly, e.g., as a fixed connection, a detachable connection, or an integral connection; as a mechanical connection or an electrical connection; as a direct connection, an indirect connection through an intermediate medium, or internal communication between two elements. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to the specific circumstances.
In the description of the present invention, it should be noted that the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", etc., indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, and are only for convenience of description and simplicity of description, but do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus, should not be construed as limiting the present invention. Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
Finally, it should be noted that the above-mentioned embodiments are only specific embodiments of the present invention, used to illustrate the technical solutions of the present invention rather than to limit them, and the protection scope of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that any person familiar with the technical field can still modify the technical solutions described in the foregoing embodiments, or readily conceive of changes, or make equivalent substitutions for some of the technical features within the technical scope of the present disclosure; such modifications, changes, or substitutions do not make the essence of the corresponding technical solutions depart from the spirit and scope of the embodiments of the present invention, and shall all be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (14)

1. A character recognition method, applied to a character recognition model, the character recognition model comprising: an attention model and a plurality of recurrent neural networks, wherein the attention model is connected with each recurrent neural network, the recurrent neural networks are connected in series, and the input data of some or all of the recurrent neural networks does not contain the output data of the preceding recurrent neural network connected thereto; the method comprises the following steps:
processing the feature vector of the image to be recognized through an attention model to obtain an attention weight value of each recurrent neural network;
determining target input parameters of each recurrent neural network, wherein the target input parameters comprise: the feature vector of the image to be recognized, or the feature vector of the image to be recognized and the character recognition result output by the recurrent neural network preceding the current recurrent neural network;
and inputting the target input parameters and the attention weight values into each recurrent neural network for processing to obtain character recognition results, and determining the character recognition result output by the last recurrent neural network as the character recognition result of the image to be recognized, wherein the character recognition result represents the probability that the character to be recognized belongs to each preset character.
2. The method of claim 1, wherein determining the target input parameters for each recurrent neural network comprises:
determining, if a corresponding target probability is preset for each recurrent neural network, whether the target probability is greater than or equal to a preset probability threshold, wherein the target probability is used for determining whether the target input parameters contain the character recognition result output by the preceding recurrent neural network;
and determining, if the target probability is greater than or equal to the preset probability threshold, that the target input parameters of the current recurrent neural network comprise the character recognition result output by the preceding recurrent neural network and the feature vector of the image to be recognized.
3. The method of claim 2, wherein determining the target probability for each recurrent neural network comprises:
randomly generating the target probability for each recurrent neural network by a probability generator;
or
randomly generating the target probability for each recurrent neural network through a target neural network, wherein input parameters of the target neural network include: position information of each recurrent neural network in the plurality of recurrent neural networks, attention weight value of each recurrent neural network, and feature vector of the image to be recognized.
4. The method of claim 1, wherein, if the input data of all the recurrent neural networks (referred to as first recurrent neural networks) does not include the output data of the preceding first recurrent neural network connected thereto, the character recognition model further comprises: a target language model; the target language model comprises: a plurality of second recurrent neural networks connected in series, wherein the input data of every second recurrent neural network in the plurality of second recurrent neural networks includes the output data of the preceding second recurrent neural network connected thereto, and the plurality of second recurrent neural networks are connected with the plurality of first recurrent neural networks in a one-to-one correspondence.
5. A method for training a character recognition model, wherein the character recognition model comprises: an attention model and a plurality of first recurrent neural networks, wherein the attention model is connected with each first recurrent neural network, the first recurrent neural networks are connected in series, and the input data of some or all of the first recurrent neural networks does not contain the output data of the preceding first recurrent neural network connected thereto; the method comprises the following steps:
processing the feature vectors of the corpus of the training set through an attention model to obtain an attention weight value of each first recurrent neural network;
determining target input parameters of each first recurrent neural network, wherein the target input parameters include: the feature vector of the training set corpus, or the feature vector of the training set corpus and the character recognition result output by the first recurrent neural network preceding the target first recurrent neural network;
and training the character recognition model by using the target input parameters, the attention weight values and target label information to obtain the trained character recognition model, wherein the target label information is an actual character sequence contained in the corpus of the training set.
6. The method of claim 5, wherein determining the target input parameters for each first recurrent neural network comprises:
determining, if a corresponding target probability is preset for each first recurrent neural network, whether the target probability is greater than or equal to a preset probability threshold, wherein the target probability is used for determining whether the target input parameters contain the character recognition result output by the preceding first recurrent neural network;
and determining, if the target probability is greater than or equal to the preset probability threshold, that the target input parameters of the target first recurrent neural network comprise the character recognition result output by the preceding first recurrent neural network and the feature vector of the training set corpus.
7. The method of claim 6, further comprising:
randomly generating the target probability for each first recurrent neural network by a probability generator;
or
randomly generating the target probability for each first recurrent neural network through a target neural network, wherein the input parameters of the target neural network include: the position information of the target first recurrent neural network in the plurality of first recurrent neural networks, the attention weight value of the target first recurrent neural network, and the feature vector of the training set corpus.
8. The method of claim 5, wherein, if the input data of all the first recurrent neural networks in the plurality of first recurrent neural networks does not include the output data of the preceding first recurrent neural network connected thereto, the character recognition model further comprises: a target language model; the target language model comprises: a plurality of second recurrent neural networks connected in series, wherein the input data of every second recurrent neural network in the plurality of second recurrent neural networks includes the output data of the preceding second recurrent neural network connected thereto, and the plurality of second recurrent neural networks are connected with the plurality of first recurrent neural networks in a one-to-one correspondence.
9. The method of claim 8, further comprising:
acquiring a character recognition result output by the last first recurrent neural network in the plurality of first recurrent neural networks to obtain a first output result;
acquiring a character recognition result output by the last second recurrent neural network in the plurality of second recurrent neural networks to obtain a second output result;
calculating a target loss value by using the first output result and the second output result;
and training the character recognition model through the target loss value.
10. The method according to any one of claims 5 to 9, wherein the recurrent neural network is a long short-term memory (LSTM) network.
11. A character recognition apparatus, applied to a character recognition model, the character recognition model comprising: an attention model and a plurality of recurrent neural networks, wherein the attention model is connected with each recurrent neural network, the recurrent neural networks are connected in series, and the input data of some or all of the recurrent neural networks does not contain the output data of the preceding recurrent neural network connected thereto; the apparatus comprises:
the first processing unit is used for processing the feature vectors of the image to be recognized through the attention model to obtain an attention weight value of each recurrent neural network;
a first determining unit, configured to determine a target input parameter of each recurrent neural network, wherein the target input parameter includes: the feature vector of the image to be recognized, or the feature vector of the image to be recognized and the character recognition result output by the recurrent neural network preceding the current recurrent neural network;
and the second processing unit is used for inputting the target input parameters and the attention weight values into each recurrent neural network for processing to obtain character recognition results, and determining the character recognition result output by the last recurrent neural network as the character recognition result of the image to be recognized, wherein the character recognition result represents the probability that the character to be recognized belongs to each preset character.
12. An apparatus for training a character recognition model, the character recognition model comprising: an attention model and a plurality of first recurrent neural networks, wherein the attention model is connected with each first recurrent neural network, the first recurrent neural networks are connected in series, and the input data of some or all of the first recurrent neural networks does not contain the output data of the preceding first recurrent neural network connected thereto; the apparatus comprises:
the third processing unit is used for processing the feature vectors of the corpus of the training set through the attention model to obtain the attention weight value of each first recurrent neural network;
a second determining unit, configured to determine a target input parameter of each first recurrent neural network, where the target input parameter includes: the feature vector of the training set corpus, or the feature vector of the training set corpus and the character recognition result output by the first recurrent neural network preceding the target first recurrent neural network;
and the training unit is used for training the character recognition model by using the target input parameters, the attention weight values and target label information to obtain the trained character recognition model, wherein the target label information is an actual character sequence contained in the corpus of the training set.
13. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the character recognition method according to any one of claims 1 to 4 or the method for training a character recognition model according to any one of claims 5 to 10.
14. A computer-readable medium having non-volatile program code executable by a processor, the program code causing the processor to perform the steps of the character recognition method according to any one of claims 1 to 4 or the method for training a character recognition model according to any one of claims 5 to 10.
CN202011012497.0A 2020-09-23 2020-09-23 Character recognition, training method and device of character recognition model and electronic equipment Active CN112270316B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011012497.0A CN112270316B (en) 2020-09-23 2020-09-23 Character recognition, training method and device of character recognition model and electronic equipment

Publications (2)

Publication Number Publication Date
CN112270316A true CN112270316A (en) 2021-01-26
CN112270316B CN112270316B (en) 2023-06-20

Family

ID=74349208

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011012497.0A Active CN112270316B (en) 2020-09-23 2020-09-23 Character recognition, training method and device of character recognition model and electronic equipment

Country Status (1)

Country Link
CN (1) CN112270316B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109389091A (en) * 2018-10-22 2019-02-26 重庆邮电大学 The character identification system and method combined based on neural network and attention mechanism
EP3493119A1 (en) * 2017-12-04 2019-06-05 Samsung Electronics Co., Ltd. Language processing method and apparatus
CN110210480A (en) * 2019-06-05 2019-09-06 北京旷视科技有限公司 Character recognition method, device, electronic equipment and computer readable storage medium
US20190384970A1 (en) * 2018-06-13 2019-12-19 Sap Se Image data extraction using neural networks
CN111667066A (en) * 2020-04-23 2020-09-15 北京旷视科技有限公司 Network model training and character recognition method and device and electronic equipment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
THEODORE BLUCHE: "Joint Line Segmentation and Transcription for End-to-End Handwritten Paragraph Recognition", 《ARXIV:1604.08352V1》 *
汤鹏杰等: "LSTM逐层多目标优化及多层概率融合的图像描述", 《自动化学报》 *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant