WO2023273516A1 - Character recognition method and apparatus, neural network training method and apparatus, and neural network, storage medium and electronic device - Google Patents


Info

Publication number
WO2023273516A1
WO2023273516A1 · PCT/CN2022/086989 · CN2022086989W
Authority
WO
WIPO (PCT)
Prior art keywords
neural network
candidate
classifier
initial
convolutional neural
Prior art date
Application number
PCT/CN2022/086989
Other languages
French (fr)
Chinese (zh)
Inventor
张正夫
梁鼎
Original Assignee
上海商汤智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海商汤智能科技有限公司
Publication of WO2023273516A1 publication Critical patent/WO2023273516A1/en


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/60 Type of objects
    • G06V 20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 Character recognition

Definitions

  • The present disclosure relates to the technical field of character recognition, and in particular to character recognition and neural network training methods and apparatuses, a neural network, a computer-readable storage medium and an electronic device.
  • Character recognition is an important research direction in the field of computer vision and has a wide range of application scenarios. Taking characters arranged in the horizontal direction as an example, character recognition methods in the related art need a recurrent neural network to capture the relationships between image features in the horizontal direction, so that the extracted features contain more effective information and the accuracy of character recognition improves. However, the recurrent neural network takes a long time to process, resulting in low character recognition efficiency.
  • The present disclosure provides a character recognition and neural network training method and apparatus, a computer-readable storage medium and an electronic device.
  • A method for character recognition includes: performing feature extraction on a target image through a convolutional neural network of a target neural network to obtain a feature sequence of the target image; classifying, through a classifier of the target neural network, the characters corresponding to the feature sequence to obtain the categories of the characters corresponding to the feature sequence; and performing character recognition on the target image based on the categories of the characters; wherein the convolutional neural network is determined from multiple groups of candidate hyperparameters by selecting the hyperparameters of the convolutional neural network based on the constraints of the target neural network.
  • Classifying the characters corresponding to the feature sequence to obtain the categories of the characters includes: performing a first classification on the characters corresponding to the feature sequence to determine the non-blank characters among them; and performing a second classification on the non-blank characters to determine the categories of the non-blank characters.
  • Performing the first classification on the characters corresponding to the feature sequence to determine the non-blank characters includes: determining the probability that each character corresponding to the feature sequence is a non-blank character, and determining characters whose probability is greater than a preset probability threshold as non-blank characters.
  • The method further includes: searching the multiple sets of candidate hyperparameters from the search space of the hyperparameters; for each set of candidate hyperparameters, establishing an initial candidate convolutional neural network based on that set of candidate hyperparameters; training the initial candidate neural network comprising the initial candidate convolutional neural network and an initial classifier to obtain a candidate neural network comprising a candidate convolutional neural network and a candidate classifier; and, when the candidate neural network satisfies the constraints, determining the candidate convolutional neural network as the convolutional neural network of the target neural network and the candidate classifier as the classifier of the target neural network.
  • The classifier includes a first sub-classifier and a second sub-classifier. The first sub-classifier is used to perform a first classification on the characters corresponding to the feature sequence to determine the non-blank characters among them; the second sub-classifier is used to perform a second classification on the non-blank characters to determine their categories. Training the initial candidate neural network comprising the initial candidate convolutional neural network and the initial classifier includes: performing first training on the initial candidate convolutional neural network and the initial second sub-classifier based on a first sample image to obtain an intermediate candidate neural network comprising the candidate convolutional neural network, the initial first sub-classifier and the candidate second sub-classifier; fixing the network parameters of the candidate convolutional neural network and the candidate second sub-classifier in the intermediate candidate neural network, using the output of the candidate second sub-classifier as supervision information, and performing second training on the initial first sub-classifier based on a second sample image to obtain the candidate neural network comprising the candidate convolutional neural network, the candidate first sub-classifier and the candidate second sub-classifier.
  • the constraints include an upper limit of the character recognition time of the target neural network
  • The method further includes: acquiring the duration taken by the candidate neural network to perform character recognition on a test image; when the duration is less than the upper limit, determining that the candidate neural network satisfies the constraint condition.
  • The constraints include a lower limit on the character recognition accuracy of the target neural network. The method further includes: acquiring the recognition accuracy of the candidate neural network on a test image; when the recognition accuracy is higher than the lower limit, determining that the candidate neural network satisfies the constraint condition.
  • The hyperparameters include at least one of the following: the depth of the convolutional neural network, the number of channels, the size of the convolution kernel, the number of downsampling operations, the downsampling method, the position of the network layer in the convolutional neural network at which downsampling is performed, and the resolution of the target image input to the convolutional neural network.
  • A neural network training method includes: using sample images to train each of a plurality of initial candidate neural networks to obtain a plurality of candidate neural networks, wherein each initial candidate neural network includes: an initial candidate convolutional neural network for performing feature extraction on the sample image to obtain a feature sequence of the sample image, the candidate hyperparameters of each initial candidate convolutional neural network being at least partially different; and an initial classifier configured to classify the characters corresponding to the feature sequence to obtain the categories of the characters, the categories being used to perform character recognition on the sample image; and screening out, from the plurality of candidate neural networks, a target neural network that satisfies the constraints.
  • The initial classifier includes a first initial sub-classifier and a second initial sub-classifier. The first initial sub-classifier is used to perform a first classification on the characters corresponding to the feature sequence to determine the non-blank characters among the characters corresponding to the feature sequence of the sample image; the second initial sub-classifier is used to perform a second classification on the non-blank characters to determine their categories. Using the sample images to train each of the plurality of initial candidate neural networks includes: performing first training on the initial candidate convolutional neural network and the initial second sub-classifier based on a first sample image to obtain an intermediate candidate neural network comprising the candidate convolutional neural network, the initial first sub-classifier and the candidate second sub-classifier; fixing the network parameters of the candidate convolutional neural network and the candidate second sub-classifier in the intermediate candidate neural network, using the output of the candidate second sub-classifier as supervision information, and performing second training on the initial first sub-classifier based on a second sample image to obtain the candidate neural network comprising the candidate convolutional neural network, the candidate first sub-classifier and the candidate second sub-classifier.
  • A neural network includes: a convolutional neural network configured to extract features of a target image to obtain a feature sequence of the target image; and a classifier configured to classify the characters corresponding to the feature sequence to obtain the categories of the characters, the categories being used to perform character recognition on the target image; wherein the convolutional neural network is determined from multiple sets of candidate hyperparameters by selecting its hyperparameters based on the constraints of the neural network.
  • A character recognition apparatus includes: a feature extraction module configured to perform feature extraction on a target image through a convolutional neural network of a target neural network to obtain a feature sequence of the target image; a classification module configured to classify, through a classifier of the target neural network, the characters corresponding to the feature sequence to obtain the categories of the characters; and a recognition module configured to perform character recognition on the target image based on the categories of the characters; wherein the convolutional neural network is determined from multiple groups of candidate hyperparameters by selecting its hyperparameters based on the constraints of the target neural network.
  • The classification module includes: a first classification unit configured to perform a first classification on the characters corresponding to the feature sequence to determine the non-blank characters among them; and a second classification unit configured to perform a second classification on the non-blank characters to determine their categories.
  • The first classification unit includes: a probability determination subunit configured to determine the probability that a character corresponding to the feature sequence is a non-blank character; and a classification subunit configured to determine characters whose probability is greater than a preset probability threshold as non-blank characters.
  • The apparatus further includes: a search module configured to search the multiple sets of candidate hyperparameters from the hyperparameter search space; a network building module configured to, for each set of candidate hyperparameters, establish an initial candidate convolutional neural network based on that set; a training module configured to train the initial candidate neural network comprising the initial candidate convolutional neural network and an initial classifier to obtain a candidate neural network comprising a candidate convolutional neural network and a candidate classifier; and a first determination module configured to, when the candidate neural network satisfies the constraint condition, determine the candidate convolutional neural network as the convolutional neural network of the target neural network and the candidate classifier as the classifier of the target neural network.
  • The classifier includes a first sub-classifier and a second sub-classifier. The first sub-classifier is used to perform a first classification on the characters corresponding to the feature sequence to determine the non-blank characters among them; the second sub-classifier is used to perform a second classification on the non-blank characters to determine their categories. The training module includes: a first training unit configured to perform first training on the initial candidate convolutional neural network and the initial second sub-classifier based on a first sample image, to obtain an intermediate candidate neural network comprising the candidate convolutional neural network, the initial first sub-classifier and the candidate second sub-classifier; and a second training unit configured to fix the network parameters of the candidate convolutional neural network and the candidate second sub-classifier in the intermediate candidate neural network, use the output of the candidate second sub-classifier as supervision information, and perform second training on the initial first sub-classifier based on a second sample image, to obtain the candidate neural network comprising the candidate convolutional neural network, the candidate first sub-classifier and the candidate second sub-classifier.
  • the constraint condition includes an upper limit of the character recognition time of the target neural network
  • The apparatus further includes: a first acquisition module configured to acquire the duration taken by the candidate neural network to perform character recognition on a test image; and a second determination module configured to determine that the candidate neural network satisfies the constraint condition when the duration is less than the upper limit.
  • The constraints include a lower limit on the character recognition accuracy of the target neural network. The apparatus further includes: a second acquisition module configured to acquire the recognition accuracy of the candidate neural network on a test image; and a third determination module configured to determine that the candidate neural network satisfies the constraints when the recognition accuracy is higher than the lower limit.
  • The hyperparameters include at least one of the following: the depth of the convolutional neural network, the number of channels, the size of the convolution kernel, the number of downsampling operations, the downsampling method, the position of the network layer in the convolutional neural network at which downsampling is performed, and the resolution of the target image input to the convolutional neural network.
  • A neural network training apparatus includes: a training module configured to use sample images to train each of a plurality of initial candidate neural networks to obtain a plurality of candidate neural networks, wherein each initial candidate neural network includes: an initial candidate convolutional neural network for performing feature extraction on the sample image to obtain a feature sequence of the sample image, the candidate hyperparameters of each initial candidate convolutional neural network being at least partially different; and an initial classifier configured to classify the characters corresponding to the feature sequence to obtain the categories of the characters, the categories being used to perform character recognition on the sample image; and a screening module configured to screen out a target neural network satisfying the constraints from the plurality of candidate neural networks.
  • The initial classifier includes a first initial sub-classifier and a second initial sub-classifier. The first initial sub-classifier is used to perform a first classification on the characters corresponding to the feature sequence to determine the non-blank characters among the characters corresponding to the feature sequence of the sample image; the second initial sub-classifier is used to perform a second classification on the non-blank characters to determine their categories. The training module includes: a first training unit configured to perform first training on the initial candidate convolutional neural network and the initial second sub-classifier based on a first sample image, to obtain an intermediate candidate neural network comprising the candidate convolutional neural network, the initial first sub-classifier and the candidate second sub-classifier; and a second training unit configured to fix the network parameters of the candidate convolutional neural network and the candidate second sub-classifier in the intermediate candidate neural network, use the output of the candidate second sub-classifier as supervision information, and perform second training on the initial first sub-classifier based on a second sample image, to obtain the candidate neural network comprising the candidate convolutional neural network, the candidate first sub-classifier and the candidate second sub-classifier.
  • a computer-readable storage medium on which a computer program is stored, and when the program is executed by a processor, the method described in any embodiment is implemented.
  • An electronic device includes a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the program, the method described in any embodiment is implemented.
  • a computer program includes computer readable code, and when the computer readable code is executed by a processor, the method described in any embodiment is implemented.
  • Embodiments of the present disclosure use a convolutional neural network and a classifier to identify characters from target images.
  • Since no recurrent neural network is used, the efficiency of character recognition is improved; by selecting better target hyperparameters to build the convolutional neural network, the receptive field of the convolutional neural network is ensured, so that the extracted features can contain more effective information. Therefore, the target neural network including the convolutional neural network and the classifier can generally meet the preset constraint conditions and achieve higher character recognition accuracy.
  • FIG. 1 is a schematic diagram of a neural network used for character recognition in the related art.
  • FIG. 2 is a flowchart of a character recognition method according to an embodiment of the present disclosure.
  • FIG. 3 is a schematic diagram of an overall process of character recognition in the related art.
  • FIG. 4 is a schematic diagram of a classification method of a classifier according to an embodiment of the present disclosure.
  • FIG. 5 is a schematic diagram of a two-stage training method of an embodiment of the present disclosure.
  • FIG. 6 is a flowchart of a neural network training method according to an embodiment of the present disclosure.
  • FIG. 7 is a schematic diagram of a neural network of an embodiment of the present disclosure.
  • FIG. 8 is a block diagram of a character recognition device according to an embodiment of the present disclosure.
  • FIG. 9 is a block diagram of a neural network training device according to an embodiment of the present disclosure.
  • FIG. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
  • first, second, third, etc. may be used in the present disclosure to describe various information, the information should not be limited to these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of the present disclosure, first information may also be called second information, and similarly, second information may also be called first information. Depending on the context, the word “if” as used herein may be interpreted as “at” or “when” or “in response to a determination.”
  • Character recognition generally refers to recognizing characters from images, and the characters may include text (for example, numbers, Chinese characters, English letters, etc.) and symbols (for example, arithmetic operation symbols, logical operation symbols, etc.). In related technologies, character recognition is generally performed through a neural network.
  • a neural network for character recognition is shown in FIG. 1 , which may include a convolutional neural network 101 , a recurrent neural network 102 , a classifier 103 and a decoder 104 .
  • the convolutional neural network 101 is used to extract features from the image to obtain a feature sequence
  • the recurrent neural network 102 is used to encode the feature sequence
  • the classifier 103 is used to classify the encoded feature sequence
  • the decoder 104 is used to decode the classification result of the classifier 103 to recognize the characters in the image.
  • the processing by the recurrent neural network takes a long time, resulting in low character recognition efficiency.
  • an embodiment of the present disclosure provides a character recognition method, as shown in FIG. 2 , the method includes:
  • Step 201: Perform feature extraction on the target image through the convolutional neural network of the target neural network to obtain a feature sequence of the target image;
  • Step 202: Classify the characters corresponding to the feature sequence of the target image through the classifier of the target neural network to obtain the categories of the characters corresponding to the feature sequence of the target image;
  • Step 203: Perform character recognition on the target image based on the character categories corresponding to the feature sequence of the target image;
  • the convolutional neural network is determined from multiple groups of candidate hyperparameters based on constraints of the target neural network.
  • The target image may be an image including characters, for example, an image of an object such as a billboard or certificate captured in a real scene, an image generated by screen recording, an image generated by format conversion, or an image generated in another way.
  • the characters in the target image may be characters in various fonts such as handwriting and printing, which is not limited in the present disclosure.
  • the characters may include at least one of numerals, Chinese characters, kana, symbols, and the like.
  • the target image may include one or more characters, and the multiple characters may be arranged on the target image regularly or irregularly, for example, may be arranged on the target image along a horizontal direction.
  • Feature extraction can be performed on the target image to obtain a feature map F1.
  • the feature map F1 can be down-sampled first, for example, the size of the down-sampled feature map F2 is 4 ⁇ 64 ⁇ 128.
  • the downsampled feature map F2 can be pooled in the vertical direction (for example, by maximum pooling or average pooling) to obtain a feature map F3 of 1 × 64 × 128.
  • the features of each channel of each horizontal pixel position in the feature map F3 are determined as a feature sequence, and a total of 64 128-dimensional feature sequences t are obtained.
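The pooling-and-splitting step above can be sketched in plain Python. This is a simplified stand-in for the tensor operations a deep-learning framework would perform; the H × W × C layout and the 4 × 64 × 128 sizes follow the example in the text:

```python
import random

def pool_vertical_max(feature_map):
    """Max-pool an H x W x C feature map over the vertical (H) axis,
    then split the resulting 1 x W x C map into W feature vectors of
    dimension C, one per horizontal pixel position."""
    H, W, C = len(feature_map), len(feature_map[0]), len(feature_map[0][0])
    return [
        [max(feature_map[h][w][c] for h in range(H)) for c in range(C)]
        for w in range(W)
    ]

# Toy feature map F2 with H=4, W=64, C=128, matching the example sizes
random.seed(0)
H, W, C = 4, 64, 128
F2 = [[[random.random() for _ in range(C)] for _ in range(W)] for _ in range(H)]
seq = pool_vertical_max(F2)  # 64 feature vectors, each 128-dimensional
```

The result corresponds to the 64 feature sequences t of dimension 128 described above.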
  • Target hyperparameters can be determined for the convolutional neural network based on the constraints of the entire target neural network including the convolutional neural network and the classifier, thereby enabling the convolutional neural network to take over the function of the recurrent neural network; the recurrent neural network can then be removed from the target neural network while the performance of the entire target neural network remains on par with its performance before the removal.
  • The hyperparameters of the convolutional neural network are related to its receptive field, so by reasonably selecting these hyperparameters, the features extracted by the convolutional neural network can contain enough effective information.
  • The hyperparameters may include at least one of the following: the depth of the convolutional neural network, the number of channels, the size of the convolution kernel, the number of downsampling operations, the downsampling method, the position of the network layer in the convolutional neural network at which downsampling is performed, and the resolution of the target image input to the convolutional neural network. Here, the depth of the convolutional neural network is its number of layers.
  • the downsampling manner may include a downsampling manner with parameters (for example, performing downsampling through convolution processing) and a downsampling manner without parameters (for example, performing downsampling through pooling processing).
  • The position of the downsampling network layer in the convolutional neural network indicates which layers of the convolutional neural network perform downsampling, that is, the timing of downsampling.
  • Earlier downsampling reduces the amount of computation in the character recognition process but may lead to insufficient feature extraction; conversely, later downsampling extracts features more fully but increases the amount of computation.
  • The higher the resolution of the target image input to the convolutional neural network, the more effective information is obtained in one feature extraction.
  • the constraints may include at least one of the following: an upper limit of the character recognition time of the target neural network; a lower limit of the character recognition accuracy of the target neural network.
  • the target hyperparameters may be selected from multiple groups of candidate hyperparameters, for example, may be obtained by searching in a grid search manner.
  • the search space and search step size of each of the above hyperparameters are first defined.
  • the search space defining the depth of the convolutional neural network is N1 to N2
  • the search space defining the number of channels of the convolutional neural network is C1 to C2.
  • the search space that defines the number of downsampling is from 3 to 8 times.
  • combination scheme 1 is {depth of the convolutional neural network N1, number of channels C1, 3 downsampling operations}
  • combination scheme 2 is {depth of the convolutional neural network N1, number of channels C1, 4 downsampling operations}, and so on.
  • Multiple combination schemes can be determined first, and then the combination scheme that makes the entire target neural network meet the constraint conditions is selected from the multiple combination schemes as the target hyperparameter.
  • one combination scheme can be selected each time, and then it is determined whether the combination scheme can make the entire target neural network meet the constraint conditions, if not, continue the search, and if yes, stop the search.
  • all combination schemes can be traversed, and the optimal combination scheme that makes the entire target neural network meet the constraint conditions can be selected as the target hyperparameter.
  • the hyperparameters corresponding to a combined scheme are called a set of candidate hyperparameters.
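The enumeration of combination schemes can be sketched with `itertools.product`. The concrete bounds and step sizes below are illustrative placeholders for N1/N2, C1/C2 and the downsampling range mentioned above, not values from the disclosure:

```python
from itertools import product

# Hypothetical search space for three of the hyperparameters
search_space = {
    "depth":        range(4, 9),          # N1=4 .. N2=8, step 1
    "channels":     range(64, 257, 64),   # C1=64 .. C2=256, step 64
    "downsampling": range(3, 9),          # 3 .. 8 downsampling operations
}

# Each combination scheme is one "set of candidate hyperparameters"
names = list(search_space)
candidates = [dict(zip(names, values))
              for values in product(*search_space.values())]
```

Each entry of `candidates` can then be used to build one initial candidate convolutional neural network.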
  • An initial candidate convolutional neural network can be established based on the group of candidate hyperparameters, and the initial candidate neural network including the initial candidate convolutional neural network and the initial classifier is trained to obtain a candidate neural network including a candidate convolutional neural network and a candidate classifier; it is then determined whether the candidate neural network meets the constraints.
  • The candidate neural network may be tested on a test image to determine whether it satisfies the constraint conditions. For example, the time the candidate neural network takes to perform character recognition on the test image can be measured; if this duration is less than the upper limit, the candidate neural network is determined to satisfy the constraint condition.
  • Similarly, if the recognition accuracy of the candidate neural network on the test image is higher than the lower limit of accuracy, the candidate neural network is determined to satisfy the constraint condition.
  • the duration is less than the upper limit of the duration and the recognition accuracy is higher than the lower limit of accuracy, it is determined that the candidate neural network satisfies the constraint condition.
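The search-then-check loop described above can be sketched as follows. Everything here is a hypothetical stand-in: the search-space values, the `train_candidate` placeholder, and the toy `measure` proxies are illustrative only and not part of the disclosure; a real system would train each candidate network and time it on test images.

```python
import itertools

# Hypothetical search space over the hyperparameters discussed above.
SEARCH_SPACE = {
    "depth": [8, 12, 16],
    "channels": [32, 64, 128],
    "downsample_times": [3, 4],
}

def train_candidate(scheme):
    # Placeholder for training an initial candidate neural network built
    # from this combination scheme.
    return scheme

def measure(candidate):
    # Toy proxies: deeper/wider networks are slower but more accurate.
    latency = candidate["depth"] * candidate["channels"] / 100.0
    accuracy = 0.80 + 0.001 * candidate["depth"] * candidate["downsample_times"]
    return latency, accuracy

def search(max_latency=15.0, min_accuracy=0.82):
    # "Select one scheme at a time, stop once the constraints are met"
    # (the second strategy listed above).
    for values in itertools.product(*SEARCH_SPACE.values()):
        scheme = dict(zip(SEARCH_SPACE, values))
        latency, accuracy = measure(train_candidate(scheme))
        if latency < max_latency and accuracy > min_accuracy:
            return scheme
    return None

best = search()
```

The other two strategies map onto the same loop: collect all feasible schemes instead of returning the first, or traverse the whole space and keep the best scheme under some score.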
  • A classifier composed of a fully connected layer may be used to classify the character corresponding to each feature in the feature sequence t.
  • the type of the classifier may also be other types, which is not limited in the present disclosure.
  • Each character corresponds to a category; for example, the Chinese character "you" corresponds to category 1, "good" corresponds to category 2, the number "1" corresponds to category 3, the number "2" corresponds to category 4, the symbol "+" corresponds to category 5, and so on.
  • Since there is no explicit alignment relationship between the character sequence obtained after decoding and the feature sequence before decoding (for example, assuming the dimension of the feature sequence t is 5, comprising the 5 features t0 to t4, and the decoded character sequence is "al", the length of the decoded character sequence is 2, which cannot be aligned with the feature sequence), the features at multiple pixel positions corresponding to the same character will be recognized as the same character; for example, the above feature sequence may be recognized as a character sequence including multiple consecutive repeated characters, such as "aaall" or "aalll".
  • a blank character can be inserted between characters.
  • A blank character is a special character inserted between ordinary characters. Assuming that the symbol "-" represents a blank character, several blank characters can be inserted before "a", after "l", and/or between "a" and "l" of the character sequence "al", for example, to obtain the character sequence "--aaa---ll-".
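The decoding rule implied by this example (merge consecutive repeated labels, then drop blanks) is the standard CTC collapse; a minimal sketch:

```python
def ctc_collapse(seq, blank="-"):
    """Merge consecutive repeated labels, then drop blank labels."""
    out = []
    prev = None
    for ch in seq:
        # Keep a label only when it differs from its predecessor and is
        # not the blank symbol.
        if ch != prev and ch != blank:
            out.append(ch)
        prev = ch
    return "".join(out)

print(ctc_collapse("--aaa---ll-"))  # -> "al"
print(ctc_collapse("aalll"))        # -> "al"
print(ctc_collapse("l-l"))          # -> "ll": the blank preserves the repeat
```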
  • In view of this, the embodiment of the present disclosure first performs a first classification on the characters corresponding to the feature sequence of the target image to determine the non-blank characters among them, and then performs a second classification on the non-blank characters to determine the category of the non-blank characters.
  • The above process first performs a binary classification to distinguish blank characters from non-blank characters, and then classifies the non-blank characters among, for example, 20,000 Chinese character categories, while blank characters need no further classification. Since blank characters do not need to be classified, the efficiency of the classification process is effectively improved.
  • Specifically, the probability of each character corresponding to the feature sequence of the target image being a non-blank character can be determined, and characters with a probability greater than a preset probability threshold are determined as non-blank characters. Since the time spent on the binary classification is relatively short, the above process can effectively improve the efficiency of character classification and save time in the character classification process.
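A minimal sketch of that thresholding step, with illustrative probabilities and a hypothetical threshold of 0.5:

```python
import numpy as np

# Illustrative non-blank probabilities for a length-5 feature sequence
# (not values from the disclosure).
p_non_blank = np.array([0.05, 0.91, 0.88, 0.10, 0.95])
threshold = 0.5  # hypothetical preset probability threshold

# Positions whose probability exceeds the threshold are treated as
# non-blank characters.
non_blank_positions = np.flatnonzero(p_non_blank > threshold)
print(non_blank_positions)
```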
  • In this way, the convolutional neural network can adopt better parameters (for example, a greater depth and a larger number of channels) to ensure its feature extraction ability, thereby further improving the accuracy of character recognition.
  • the above process can be realized by using two sub-classifiers, wherein the first sub-classifier is used for the first classification, and the second sub-classifier is used for the second classification.
  • the initial candidate neural network including the initial candidate convolutional neural network and the initial classifier may be trained in a two-stage training manner.
  • The initial candidate neural network before training includes an initial candidate convolutional neural network, an initial classifier, and a CTC decoder, where the initial classifier includes an initial first sub-classifier and an initial second sub-classifier, and the CTC decoder is a parameter-free decoder that requires no optimization.
  • During training, the network parameters of the initial first sub-classifier can be fixed, and the initial candidate convolutional neural network and the initial second sub-classifier are first trained to obtain an intermediate candidate neural network comprising the candidate convolutional neural network, the initial first sub-classifier, and a candidate second sub-classifier. Then, the network parameters of the candidate convolutional neural network and the candidate second sub-classifier in the intermediate candidate neural network are fixed, the output result of the candidate second sub-classifier is used as supervision information, and the initial first sub-classifier undergoes a second training to obtain the candidate neural network comprising the candidate convolutional neural network, a candidate first sub-classifier, and the candidate second sub-classifier.
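The two-stage schedule can be sketched as control flow alone, with the gradient updates stubbed out; the module names and `frozen` flags below are illustrative stand-ins, not an actual training implementation:

```python
# Placeholder modules with freeze flags; a real implementation would hold
# network parameters (e.g. with requires_grad-style switches).
modules = {
    "conv": {"frozen": False},
    "first_sub": {"frozen": True},   # stage 1: first sub-classifier fixed
    "second_sub": {"frozen": False},
}

def train_step(modules):
    # Stub: report which modules would receive parameter updates.
    return [name for name, m in modules.items() if not m["frozen"]]

stage1_updated = train_step(modules)          # first training

# Second training: freeze what was just trained, unfreeze the first
# sub-classifier, and supervise it with the second sub-classifier's output.
modules["conv"]["frozen"] = True
modules["second_sub"]["frozen"] = True
modules["first_sub"]["frozen"] = False
stage2_updated = train_step(modules)

print(stage1_updated)  # ['conv', 'second_sub']
print(stage2_updated)  # ['first_sub']
```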
  • Each initial candidate neural network can be trained in the above manner to obtain the corresponding candidate neural network, and then the target neural network satisfying the constraint conditions is determined from the candidate neural networks; the candidate convolutional neural network, the candidate first sub-classifier, and the candidate second sub-classifier in the selected candidate neural network serve respectively as the convolutional neural network, the first sub-classifier, and the second sub-classifier of the target neural network.
  • During training, sample images with annotation information can be used. The annotation information can be calibrated manually or in other ways in advance and is used to determine the ground truth of the characters in the sample image, including annotation information indicating blank characters and annotation information indicating the character categories of non-blank characters.
  • After the first training, the second sub-classifier can classify blank characters and the specific categories of non-blank characters. During the second training, the output result of the second sub-classifier (the blank/non-blank probability) is used as supervisory information to train the first sub-classifier.
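One plausible way to derive that supervisory signal, assuming (purely for illustration) that the second sub-classifier outputs a distribution over a vocabulary whose index 0 is the blank class:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Toy logits of the trained second sub-classifier, one row per sequence
# position; column 0 is assumed to be the blank class.
logits = np.array([
    [4.0, 0.5, 0.2],   # confidently blank
    [0.1, 3.0, 0.4],   # confidently a non-blank character
])
probs = softmax(logits)

# Binary supervision target for the first sub-classifier:
# P(non-blank) = 1 - P(blank).
target_non_blank = 1.0 - probs[:, 0]
```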
  • Convolutional neural network: input the target image to be recognized (such as a text image); the design is based on lightweight convolutional neural networks such as MobileNet, and hyperparameters such as the depth, width, and downsampling strategy of the convolutional neural network are searched carefully to obtain the optimal trade-off between speed and accuracy; output the feature sequence of the target image.
  • First sub-classifier: input the feature sequence; output the probability of a blank character at each position (e.g., each character) in the sequence.
  • Second sub-classifier: input the feature sequence and the probability that each position in the sequence is a blank character; the feature vectors at the non-blank positions in the sequence are taken out, and a fully connected layer is used for character classification; output the character classification confidence of the non-blank characters in the sequence.
  • Decoder: input the probability of a blank character at each position in the sequence and the character classification confidence of the non-blank characters in the sequence; the character classification confidence of the feature sequence is restored and decoded using CTC.
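The stages above can be strung together in a toy sketch; the blank probabilities and class predictions are illustrative stand-ins for the two sub-classifiers' outputs, and the restoration step follows the CTC rule of merging identical predictions at adjacent positions:

```python
import numpy as np

# First sub-classifier output: per-position blank probability for a
# length-6 feature sequence (illustrative values).
p_blank = np.array([0.9, 0.1, 0.1, 0.9, 0.2, 0.9])
non_blank = np.flatnonzero(p_blank < 0.5)   # positions 1, 2, 4

# Second sub-classifier output at the non-blank positions (illustrative:
# positions 1 and 2 both predict class 7, position 4 predicts class 3).
pred = [7, 7, 3]

# CTC-style restoration: identical classes at adjacent positions are one
# character; repeats separated by a blank position are kept.
decoded = []
prev_pos = prev_cls = None
for pos, cls in zip(non_blank, pred):
    if not (prev_pos is not None and pos == prev_pos + 1 and cls == prev_cls):
        decoded.append(cls)
    prev_pos, prev_cls = pos, cls

print(decoded)  # -> [7, 3]
```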
  • the embodiment of the present disclosure also provides a neural network training method, the method comprising:
  • Step 601 Using sample images, train each of the multiple initial candidate neural networks respectively to obtain multiple candidate neural networks, where each initial candidate neural network includes:
  • the initial candidate convolutional neural network is used to perform feature extraction on the sample image to obtain a feature sequence of the sample image; the candidate hyperparameters of each initial candidate convolutional neural network are at least partially different;
  • the initial classifier is used to classify the characters corresponding to the feature sequence of the sample image to obtain the category of the character corresponding to the feature sequence of the sample image, where the category is used to perform character recognition on the sample image;
  • Step 602 Screen out a target neural network satisfying the constraints from the plurality of candidate neural networks.
  • In some embodiments, the initial classifier includes an initial first sub-classifier, configured to perform a first classification on the characters corresponding to the feature sequence of the sample image so as to determine the non-blank characters among them, and an initial second sub-classifier, configured to perform a second classification on the non-blank characters to determine their categories. Training each of the multiple initial candidate neural networks using sample images includes: performing a first training on the initial candidate convolutional neural network and the initial second sub-classifier based on a first sample image, to obtain an intermediate candidate neural network comprising the candidate convolutional neural network, the initial first sub-classifier, and a candidate second sub-classifier; fixing the network parameters of the candidate convolutional neural network and the candidate second sub-classifier in the intermediate candidate neural network, using the output result of the candidate second sub-classifier as supervision information, and performing a second training on the initial first sub-classifier based on a second sample image, to obtain the candidate neural network.
  • It should be noted that the writing order of the steps does not imply a strict execution order or constitute any limitation on the implementation process; the specific execution order of each step should be determined by its function and possible internal logic.
  • an embodiment of the present disclosure also provides a neural network, and the neural network includes:
  • a convolutional neural network 701 configured to extract features from the target image to obtain a feature sequence of the target image
  • a classifier 702, configured to classify the characters corresponding to the feature sequence of the target image to obtain the category of the character corresponding to the feature sequence of the target image, where the category is used to perform character recognition on the target image;
  • the convolutional neural network 701 is determined from multiple groups of candidate hyperparameters based on constraints of the neural network.
  • In some embodiments, the classifier 702 includes a first sub-classifier 7021, configured to perform a first classification on the characters corresponding to the feature sequence of the target image so as to determine the non-blank characters among them, and a second sub-classifier 7022, configured to perform a second classification on the non-blank characters to determine the category of the non-blank characters.
  • the neural network in the embodiments of the present disclosure can be trained by using the neural network training method described in any of the above embodiments.
  • the trained neural network can be used to implement the character recognition method described in any of the foregoing embodiments.
  • In some embodiments, the neural network may further include a decoder 703, which is a functional module without parameters and may perform decoding in a CTC decoding manner.
  • an embodiment of the present disclosure also provides a character recognition device, which includes:
  • the feature extraction module 801 is used to extract the features of the target image through the convolutional neural network of the target neural network to obtain the feature sequence of the target image;
  • a classification module 802 configured to classify the character corresponding to the feature sequence of the target image through the classifier of the target neural network, to obtain the category of the character corresponding to the feature sequence of the target image;
  • a recognition module 803, configured to perform character recognition on the target image based on the character category corresponding to the feature sequence of the target image
  • the convolutional neural network is determined from multiple groups of candidate hyperparameters based on constraints of the target neural network.
  • In some embodiments, the classification module includes: a first classification unit, configured to perform a first classification on the characters corresponding to the feature sequence of the target image so as to determine the non-blank characters among them; and a second classification unit, configured to perform a second classification on the non-blank characters to determine the category of the non-blank characters.
  • In some embodiments, the first classification unit includes: a probability determination subunit, configured to determine the probability that a character corresponding to the feature sequence of the target image is a non-blank character; and a classification subunit, configured to determine characters with a probability greater than a preset probability threshold as non-blank characters.
  • In some embodiments, the apparatus further includes: a search module, configured to search multiple groups of candidate hyperparameters from the hyperparameter search space; a network building module, configured to establish an initial candidate convolutional neural network based on each group of candidate hyperparameters; a training module, configured to train the initial candidate neural network comprising the initial candidate convolutional neural network and an initial classifier to obtain a candidate neural network comprising a candidate convolutional neural network and a candidate classifier; and a first determination module, configured to, when the candidate neural network satisfies the constraint conditions, determine the candidate convolutional neural network as the convolutional neural network of the target neural network and determine the candidate classifier as the classifier of the target neural network.
  • In some embodiments, the classifier includes a first sub-classifier, configured to perform a first classification on the characters corresponding to the feature sequence of the target image so as to determine the non-blank characters among them, and a second sub-classifier, configured to perform a second classification on the non-blank characters to determine the category of the non-blank characters;
  • the training module includes: a first training unit, configured to fix the network parameters of the initial first sub-classifier and perform a first training on the initial candidate convolutional neural network and the initial second sub-classifier based on a first sample image, to obtain an intermediate candidate neural network comprising the candidate convolutional neural network, the initial first sub-classifier, and a candidate second sub-classifier; and a second training unit, configured to fix the network parameters of the candidate convolutional neural network and the candidate second sub-classifier in the intermediate candidate neural network, use the output result of the candidate second sub-classifier as supervision information, and perform a second training on the initial first sub-classifier based on a second sample image, to obtain the candidate neural network comprising the candidate convolutional neural network, a candidate first sub-classifier, and the candidate second sub-classifier.
  • In some embodiments, the constraint conditions include an upper limit of the character recognition duration of the target neural network; the apparatus further includes: a first acquisition module, configured to acquire the duration taken by the candidate neural network to perform character recognition on a test image; and a second determination module, configured to determine that the candidate neural network satisfies the constraint conditions when the duration is less than the upper limit of the duration.
  • In some embodiments, the constraint conditions include a lower limit of the character recognition accuracy of the target neural network; the apparatus further includes: a second acquisition module, configured to acquire the recognition accuracy of the candidate neural network on the test image; and a third determination module, configured to determine that the candidate neural network satisfies the constraint conditions when the recognition accuracy is higher than the lower limit of accuracy.
  • In some embodiments, the hyperparameters include at least one of the following: the depth of the convolutional neural network, the number of channels, the size of the convolution kernel, the number of downsampling times, the downsampling method, the positions of the network layers in the convolutional neural network that perform downsampling, and the resolution of the target image input to the convolutional neural network.
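A hypothetical group of candidate hyperparameters covering these fields might look as follows; every value here is illustrative only, not taken from the disclosure:

```python
# One illustrative group of candidate hyperparameters.
candidate_hyperparams = {
    "depth": 12,                      # number of convolutional layers
    "channels": 64,                   # channel width
    "kernel_size": 3,                 # convolution kernel size
    "downsample_times": 3,            # number of downsampling operations
    "downsample_method": "stride",    # e.g. strided conv vs. pooling
    "downsample_layers": [2, 5, 8],   # which layers perform downsampling
    "input_resolution": (32, 320),    # H x W of the input target image
}
print(sorted(candidate_hyperparams))
```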
  • the embodiment of the present disclosure also provides a neural network training device, the device includes:
  • the training module 901 is configured to use sample images to train each of the multiple initial candidate neural networks respectively to obtain multiple candidate neural networks, wherein each initial candidate neural network includes:
  • the initial candidate convolutional neural network is used to perform feature extraction on the sample image to obtain a feature sequence of the sample image; the candidate hyperparameters of each initial candidate convolutional neural network are at least partially different;
  • the initial classifier is used to classify the characters corresponding to the feature sequence of the sample image to obtain the category of the character corresponding to the feature sequence of the sample image, where the category is used to perform character recognition on the sample image;
  • a screening module 902 configured to screen out a target neural network satisfying constraints from the plurality of candidate neural networks.
  • In some embodiments, the initial classifier includes an initial first sub-classifier, configured to perform a first classification on the characters corresponding to the feature sequence of the sample image so as to determine the non-blank characters among them, and an initial second sub-classifier, configured to perform a second classification on the non-blank characters to determine the category of the non-blank characters; the training module includes: a first training unit, configured to perform a first training on the initial candidate convolutional neural network and the initial second sub-classifier based on a first sample image, to obtain an intermediate candidate neural network comprising the candidate convolutional neural network, the initial first sub-classifier, and a candidate second sub-classifier; and a second training unit, configured to fix the network parameters of the candidate convolutional neural network and the candidate second sub-classifier in the intermediate candidate neural network, use the output result of the candidate second sub-classifier as supervision information, and perform a second training on the initial first sub-classifier, to obtain the candidate neural network comprising the candidate convolutional neural network, a candidate first sub-classifier, and the candidate second sub-classifier.
  • The functions or modules included in the apparatus provided by the embodiments of the present disclosure can be used to execute the methods described in the method embodiments above; for specific implementation, reference may be made to the description of the method embodiments above, and for brevity, details are not repeated here.
  • The embodiment of this specification also provides an electronic device, which at least includes a memory, a processor, and a computer program stored in the memory and operable on the processor, wherein the processor, when executing the program, implements the method described in any of the preceding embodiments.
  • FIG. 10 shows a schematic diagram of a more specific hardware structure of an electronic device provided by the embodiment of this specification.
  • the device may include: a processor 1001 , a memory 1002 , an input/output interface 1003 , a communication interface 1004 and a bus 1005 .
  • the processor 1001 , the memory 1002 , the input/output interface 1003 and the communication interface 1004 are connected to each other within the device through the bus 1005 .
  • The processor 1001 can be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits, and is configured to execute relevant programs to implement the character recognition method or the neural network training method provided in the embodiments of this specification.
  • the processor 1001 may also include a graphics card, and the graphics card may be an Nvidia titan X graphics card or a 1080Ti graphics card.
  • the memory 1002 can be implemented in the form of ROM (Read Only Memory, read-only memory), RAM (Random Access Memory, random access memory), static storage device, dynamic storage device, etc.
  • the memory 1002 can store an operating system and other application programs. When implementing the technical solutions provided by the embodiments of this specification through software or firmware, the relevant program codes are stored in the memory 1002 and invoked by the processor 1001 for execution.
  • the input/output interface 1003 is used to connect the input/output module to realize information input and output.
  • the input/output module can be configured in the device as a component (not shown in the figure), or can be externally connected to the device to provide corresponding functions.
  • the input module may include a keyboard, mouse, touch screen, microphone, various sensors, etc.
  • the output module may include a display, a speaker, a vibrator, an indicator light, and the like.
  • the communication interface 1004 is used to connect with a communication module (not shown in the figure), so as to send the information of the own device to the communication module of other devices, or receive the information sent by the communication modules of other devices.
  • the communication module can realize communication through wired methods (such as USB, network cable, etc.), and can also realize communication through wireless methods (such as mobile network, WIFI, Bluetooth, etc.).
  • Bus 1005 includes a path for transferring information between various components of the device (eg, processor 1001, memory 1002, input/output interface 1003, and communication interface 1004).
  • It should be noted that although the above device only shows the processor 1001, the memory 1002, the input/output interface 1003, the communication interface 1004, and the bus 1005, in a specific implementation process the device may also include other components necessary for normal operation. In addition, those skilled in the art can understand that the above-mentioned device may include only the components necessary to implement the solutions of the embodiments of this specification, and does not necessarily include all the components shown in the figure.
  • An embodiment of the present disclosure further provides a computer-readable storage medium, on which a computer program is stored, and when the program is executed by a processor, the method described in any one of the foregoing embodiments is implemented.
  • Computer-readable media include permanent and non-permanent, removable and non-removable media; information storage may be implemented by any method or technology.
  • Information may be computer readable instructions, data structures, modules of a program, or other data.
  • Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Flash memory or other memory technology, Compact Disc Read-Only Memory (CD-ROM), Digital Versatile Disc (DVD) or other optical storage, Magnetic tape cartridge, tape magnetic disk storage or other magnetic storage device or any other non-transmission medium that can be used to store information that can be accessed by a computing device.
  • computer-readable media excludes transitory computer-readable media, such as modulated data signals and carrier waves.
  • A typical implementing device is a computer, which may take the form of a personal computer, laptop computer, cellular phone, camera phone, smartphone, personal digital assistant, media player, navigation device, e-mail device, game console, tablet computer, wearable device, or any combination of these devices.
  • Each embodiment in this specification is described in a progressive manner; the same or similar parts of the embodiments can be referred to each other, and each embodiment focuses on its differences from the other embodiments. In particular, since the apparatus embodiments basically correspond to the method embodiments, their description is relatively simple, and for relevant parts, reference may be made to the description of the method embodiments. The apparatus embodiments described above are only illustrative; the modules described as separate components may or may not be physically separated, and the functions of the modules may be implemented in one or more pieces of software and/or hardware. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this embodiment, which can be understood and implemented by those skilled in the art without creative effort.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Character Discrimination (AREA)

Abstract

Provided in the embodiments of the present disclosure are a character recognition method and apparatus, a neural network training method and apparatus, and a neural network, a storage medium and an electronic device. A character is recognized from a target image by using a convolutional neural network and a classifier. Since there is no need to use a recurrent neural network, the character recognition efficiency is improved; in addition, a convolutional neural network is established by means of selecting a relatively good target hyper-parameter from a plurality of groups of candidate hyper-parameters, so as to ensure a receptive field of the convolutional neural network, such that an extracted feature can include more effective information. Therefore, a target neural network comprising a convolutional neural network and a classifier can generally meet a preset constraint condition, such that the target neural network can obtain a relatively high character recognition accuracy.

Description

Character recognition and neural network training method and device, neural network, storage medium and electronic equipment

Cross-Reference Statement

This application claims the priority of the Chinese Patent Application No. 202110719983.4 filed with the China Patent Office on June 28, 2021, the entire contents of which are incorporated herein by reference.

Technical Field

The present disclosure relates to the technical field of character recognition, and in particular to a character recognition and neural network training method and device, a neural network, a computer-readable storage medium, and electronic equipment.

Background

Character recognition is an important research direction in the field of computer vision and has a wide range of application scenarios. Taking the recognition of horizontally arranged characters as an example, character recognition methods in the related art need to use a recurrent neural network to extract the horizontal relationships between image features, so that the extracted features contain more effective information and the accuracy of character recognition is improved. However, the recurrent neural network takes a long time to process, resulting in low character recognition efficiency.

Summary

The present disclosure provides a character recognition and neural network training method and device, a computer-readable storage medium, and electronic equipment.
根据本公开实施例的第一方面,提供一种字符识别方法,所述方法包括:通过目标神经网络的卷积神经网络对目标图像进行特征提取,得到所述目标图像的特征序列;通过所述目标神经网络的分类器对所述特征序列对应的字符进行分类,得到所述特征序列对应的字符的类别;基于所述字符的类别对所述目标图像进行字符识别;其中,通过选取所述卷积神经网络的超参数,基于所述目标神经网络的约束条件,从多组候选超参数中确定所述卷积神经网络。According to the first aspect of an embodiment of the present disclosure, there is provided a method for character recognition, the method comprising: performing feature extraction on a target image through a convolutional neural network of the target neural network to obtain a feature sequence of the target image; through the The classifier of the target neural network classifies the characters corresponding to the feature sequence to obtain the category of the character corresponding to the feature sequence; character recognition is performed on the target image based on the category of the character; wherein, by selecting the volume hyperparameters of the convolutional neural network, and determine the convolutional neural network from multiple groups of candidate hyperparameters based on the constraints of the target neural network.
在一些实施例中,所述对所述特征序列对应的字符进行分类,得到所述特征序列对应的字符的类别,包括:对所述特征序列对应的字符进行第一分类,以确定所述特征序列对应的字符中的非空白符;对所述非空白符进行第二分类,以确定所述非空白符的类别。In some embodiments, the classifying the character corresponding to the feature sequence to obtain the category of the character corresponding to the feature sequence includes: first classifying the character corresponding to the feature sequence to determine the feature A non-blank character in the characters corresponding to the sequence; performing a second classification on the non-blank character to determine the category of the non-blank character.
In some embodiments, performing the first classification on the characters corresponding to the feature sequence to determine the non-blank characters includes: determining the probability that each character corresponding to the feature sequence is a non-blank character; and determining the characters whose probability is greater than a preset probability threshold as non-blank characters.
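As an illustrative sketch only (not part of the claimed method), the first classification described above can be viewed as thresholding a per-timestep non-blank probability; the threshold value of 0.5 here is a hypothetical example, not one the disclosure specifies:

```python
import numpy as np

def first_classification(non_blank_probs, threshold=0.5):
    """Return the indices of timesteps whose non-blank probability
    exceeds the preset threshold (hypothetical value)."""
    probs = np.asarray(non_blank_probs)
    return np.nonzero(probs > threshold)[0]

# e.g. six timesteps of the feature sequence
probs = [0.1, 0.9, 0.8, 0.2, 0.95, 0.05]
print(first_classification(probs).tolist())  # [1, 2, 4]
```

The timesteps kept by this step would then be passed to the second classification, which assigns them concrete character categories.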
In some embodiments, the method further includes: searching the search space of the hyperparameters for the multiple sets of candidate hyperparameters; for each set of candidate hyperparameters, building an initial candidate convolutional neural network based on the set; training an initial candidate neural network including the initial candidate convolutional neural network and an initial classifier to obtain a candidate neural network, the candidate neural network including a candidate convolutional neural network and a candidate classifier; and, when the candidate neural network satisfies the constraint conditions, determining the candidate convolutional neural network as the convolutional neural network of the target neural network and determining the candidate classifier as the classifier of the target neural network.
In some embodiments, the classifier includes a first sub-classifier and a second sub-classifier. The first sub-classifier is configured to perform a first classification on the characters corresponding to the feature sequence to determine the non-blank characters among them; the second sub-classifier is configured to perform a second classification on the non-blank characters to determine their categories. Training the initial candidate neural network, which includes the initial candidate convolutional neural network and the initial classifier, includes: performing first training on the initial candidate convolutional neural network and an initial second sub-classifier based on first sample images to obtain an intermediate candidate neural network including the candidate convolutional neural network, an initial first sub-classifier and a candidate second sub-classifier; and fixing the network parameters of the candidate convolutional neural network and the candidate second sub-classifier in the intermediate candidate neural network, taking the output of the candidate second sub-classifier as supervision information, and performing second training on the initial first sub-classifier based on second sample images to obtain the candidate neural network including the candidate convolutional neural network, a candidate first sub-classifier and the candidate second sub-classifier.
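The second training stage described above can be sketched numerically. The following toy example is purely illustrative: the "backbone features", the linear stand-ins for the two sub-classifiers, the class count of 5, blank class id 0, and all dimensions are hypothetical choices, and a simple logistic regression replaces the real first sub-classifier. It only shows the control flow of stage two: the backbone and second sub-classifier are frozen, and the first sub-classifier is fitted against blank/non-blank labels derived from the second sub-classifier's output.

```python
import numpy as np

rng = np.random.default_rng(0)

T, D = 64, 8                       # timesteps per image, feature dim (hypothetical)
feats = rng.normal(size=(T, D))    # backbone output, frozen in stage two
W2 = rng.normal(size=(D, 5))       # frozen second sub-classifier (5 classes, id 0 = blank)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Supervision for stage two: a timestep is "non-blank" when the frozen
# second sub-classifier does not predict the blank class.
teacher = (softmax(feats @ W2).argmax(axis=-1) != 0).astype(float)

# Stage two: train only the first sub-classifier (here a logistic
# regression with a bias column); feats and W2 are never updated.
X = np.concatenate([feats, np.ones((T, 1))], axis=1)
w1 = np.zeros(D + 1)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w1)))
    grad = X.T @ (p - teacher) / T     # cross-entropy gradient
    w1 -= 0.5 * grad

pred = (1.0 / (1.0 + np.exp(-(X @ w1))) > 0.5).astype(float)
print("agreement with teacher labels:", (pred == teacher).mean())
```

In the actual scheme, this second stage would follow a first stage that jointly trains the convolutional backbone and the second sub-classifier on the first sample images.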
In some embodiments, the constraint conditions include an upper limit on the duration of character recognition performed by the target neural network, and the method further includes: acquiring the duration taken by the candidate neural network to perform character recognition on test images; and determining that the candidate neural network satisfies the constraint conditions when the duration is less than the upper limit.
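A minimal sketch of such a latency check, assuming a candidate network exposed as a `recognize` callable (the helper names and the averaging over test images are illustrative choices, not part of the disclosure):

```python
import time

def recognition_latency(recognize, test_images):
    """Average wall-clock seconds per image for a candidate network's
    `recognize` callable (hypothetical helper)."""
    start = time.perf_counter()
    for img in test_images:
        recognize(img)
    return (time.perf_counter() - start) / len(test_images)

def meets_latency_constraint(recognize, test_images, max_seconds):
    # The constraint is satisfied when the measured duration is below
    # the preset upper limit.
    return recognition_latency(recognize, test_images) < max_seconds

# Stand-in "network": a trivial function instead of a real model.
fake_recognize = lambda img: "ok"
print(meets_latency_constraint(fake_recognize, [None] * 100, 0.01))
```

An accuracy constraint (next embodiment) would be checked the same way, comparing measured recognition accuracy on test images against the preset lower limit.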
In some embodiments, the constraint conditions include a lower limit on the accuracy of character recognition performed by the target neural network, and the method further includes: acquiring the recognition accuracy of the candidate neural network on test images; and determining that the candidate neural network satisfies the constraint conditions when the recognition accuracy is higher than the lower limit.
In some embodiments, the hyperparameters include at least one of the following: the depth of the convolutional neural network, the number of channels, the size of the convolution kernels, the number of downsampling operations, the downsampling method, the positions of the network layers in the convolutional neural network that perform downsampling, and the resolution of the target image input to the convolutional neural network.
According to a second aspect of the embodiments of the present disclosure, a neural network training method is provided. The method includes: training each of multiple initial candidate neural networks using sample images to obtain multiple candidate neural networks, where each initial candidate neural network includes: an initial candidate convolutional neural network configured to perform feature extraction on the sample images to obtain feature sequences of the sample images, the candidate hyperparameters of the initial candidate convolutional neural networks being at least partially different from one another; and an initial classifier configured to classify the characters corresponding to the feature sequences to obtain the categories of the characters, the categories being used to perform character recognition on the sample images; and screening out, from the multiple candidate neural networks, a target neural network that satisfies constraint conditions.
In some embodiments, the initial classifier includes a first initial sub-classifier and a second initial sub-classifier. The first initial sub-classifier is configured to perform a first classification on the characters corresponding to the feature sequence of the sample image to determine the non-blank characters among them; the second initial sub-classifier is configured to perform a second classification on the non-blank characters to determine their categories. Training each of the multiple initial candidate neural networks using the sample images includes: performing first training on the initial candidate convolutional neural network and the initial second sub-classifier based on first sample images to obtain an intermediate candidate neural network including a candidate convolutional neural network, an initial first sub-classifier and a candidate second sub-classifier; and fixing the network parameters of the candidate convolutional neural network and the candidate second sub-classifier in the intermediate candidate neural network, taking the output of the candidate second sub-classifier as supervision information, and performing second training on the initial first sub-classifier based on second sample images to obtain the candidate neural network including the candidate convolutional neural network, a candidate first sub-classifier and the candidate second sub-classifier.
According to a third aspect of the embodiments of the present disclosure, a neural network is provided. The neural network includes: a convolutional neural network configured to perform feature extraction on a target image to obtain a feature sequence of the target image; and a classifier configured to classify the characters corresponding to the feature sequence to obtain the categories of the characters, the categories being used to perform character recognition on the target image. The convolutional neural network is determined by selecting its hyperparameters from multiple sets of candidate hyperparameters based on constraint conditions of the neural network.
According to a fourth aspect of the embodiments of the present disclosure, a character recognition apparatus is provided. The apparatus includes: a feature extraction module configured to perform feature extraction on a target image through a convolutional neural network of a target neural network to obtain a feature sequence of the target image; a classification module configured to classify, through a classifier of the target neural network, the characters corresponding to the feature sequence to obtain the categories of the characters; and a recognition module configured to perform character recognition on the target image based on the categories of the characters. The convolutional neural network is determined by selecting its hyperparameters from multiple sets of candidate hyperparameters based on constraint conditions of the target neural network.
In some embodiments, the classification module includes: a first classification unit configured to perform a first classification on the characters corresponding to the feature sequence to determine the non-blank characters among them; and a second classification unit configured to perform a second classification on the non-blank characters to determine the categories of the non-blank characters.
In some embodiments, the first classification unit includes: a probability determination sub-unit configured to determine the probability that each character corresponding to the feature sequence is a non-blank character; and a classification sub-unit configured to determine the characters whose probability is greater than a preset probability threshold as non-blank characters.
In some embodiments, the apparatus further includes: a search module configured to search the search space of the hyperparameters for the multiple sets of candidate hyperparameters; a network building module configured to, for each set of candidate hyperparameters, build an initial candidate convolutional neural network based on the set; a training module configured to train an initial candidate neural network including the initial candidate convolutional neural network and an initial classifier to obtain a candidate neural network, the candidate neural network including a candidate convolutional neural network and a candidate classifier; and a first determination module configured to, when the candidate neural network satisfies the constraint conditions, determine the candidate convolutional neural network as the convolutional neural network of the target neural network and determine the candidate classifier as the classifier of the target neural network.
In some embodiments, the classifier includes a first sub-classifier and a second sub-classifier. The first sub-classifier is configured to perform a first classification on the characters corresponding to the feature sequence to determine the non-blank characters among them; the second sub-classifier is configured to perform a second classification on the non-blank characters to determine their categories. The training module includes: a first training unit configured to perform first training on the initial candidate convolutional neural network and an initial second sub-classifier based on first sample images to obtain an intermediate candidate neural network including the candidate convolutional neural network, an initial first sub-classifier and a candidate second sub-classifier; and a second training unit configured to fix the network parameters of the candidate convolutional neural network and the candidate second sub-classifier in the intermediate candidate neural network, take the output of the candidate second sub-classifier as supervision information, and perform second training on the initial first sub-classifier based on second sample images to obtain the candidate neural network including the candidate convolutional neural network, a candidate first sub-classifier and the candidate second sub-classifier.
In some embodiments, the constraint conditions include an upper limit on the duration of character recognition performed by the target neural network, and the apparatus further includes: a first acquisition module configured to acquire the duration taken by the candidate neural network to perform character recognition on test images; and a second determination module configured to determine that the candidate neural network satisfies the constraint conditions when the duration is less than the upper limit.
In some embodiments, the constraint conditions include a lower limit on the accuracy of character recognition performed by the target neural network, and the apparatus further includes: a second acquisition module configured to acquire the recognition accuracy of the candidate neural network on test images; and a third determination module configured to determine that the candidate neural network satisfies the constraint conditions when the recognition accuracy is higher than the lower limit.
In some embodiments, the hyperparameters include at least one of the following: the depth of the convolutional neural network, the number of channels, the size of the convolution kernels, the number of downsampling operations, the downsampling method, the positions of the network layers in the convolutional neural network that perform downsampling, and the resolution of the target image input to the convolutional neural network.
According to a fifth aspect of the embodiments of the present disclosure, a neural network training apparatus is provided. The apparatus includes: a training module configured to train each of multiple initial candidate neural networks using sample images to obtain multiple candidate neural networks, where each initial candidate neural network includes: an initial candidate convolutional neural network configured to perform feature extraction on the sample images to obtain feature sequences of the sample images, the candidate hyperparameters of the initial candidate convolutional neural networks being at least partially different from one another; and an initial classifier configured to classify the characters corresponding to the feature sequences to obtain the categories of the characters, the categories being used to perform character recognition on the sample images; and a screening module configured to screen out, from the multiple candidate neural networks, a target neural network that satisfies constraint conditions.
In some embodiments, the initial classifier includes a first initial sub-classifier and a second initial sub-classifier. The first initial sub-classifier is configured to perform a first classification on the characters corresponding to the feature sequence of the sample image to determine the non-blank characters among them; the second initial sub-classifier is configured to perform a second classification on the non-blank characters to determine their categories. The training module includes: a first training unit configured to perform first training on the initial candidate convolutional neural network and the initial second sub-classifier based on first sample images to obtain an intermediate candidate neural network including a candidate convolutional neural network, an initial first sub-classifier and a candidate second sub-classifier; and a second training unit configured to fix the network parameters of the candidate convolutional neural network and the candidate second sub-classifier in the intermediate candidate neural network, take the output of the candidate second sub-classifier as supervision information, and perform second training on the initial first sub-classifier based on second sample images to obtain the candidate neural network including the candidate convolutional neural network, a candidate first sub-classifier and the candidate second sub-classifier.
According to a sixth aspect of the embodiments of the present disclosure, a computer-readable storage medium is provided, on which a computer program is stored. When the program is executed by a processor, the method described in any of the embodiments is implemented.
According to a seventh aspect of the embodiments of the present disclosure, an electronic device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the program, the method described in any of the embodiments is implemented.
According to an eighth aspect of the embodiments of the present disclosure, a computer program is provided. The computer program includes computer-readable code, and when the computer-readable code is executed by a processor, the method described in any of the embodiments is implemented.
The embodiments of the present disclosure use a convolutional neural network and a classifier to recognize characters in a target image. On one hand, since no recurrent neural network is needed, character recognition efficiency is improved. On the other hand, better target hyperparameters are selected from multiple sets of candidate hyperparameters to build the convolutional neural network, which guarantees the receptive field of the convolutional neural network so that the extracted features contain more effective information. The target neural network including the convolutional neural network and the classifier can therefore satisfy preset constraint conditions as a whole, enabling the target neural network to achieve high character recognition accuracy.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit the present disclosure.
BRIEF DESCRIPTION OF THE DRAWINGS
The accompanying drawings here are incorporated into and constitute a part of this specification. They illustrate embodiments consistent with the present disclosure and, together with the specification, serve to explain the technical solutions of the present disclosure.
Fig. 1 is a schematic diagram of a neural network for character recognition in the related art.
Fig. 2 is a flowchart of a character recognition method according to an embodiment of the present disclosure.
Fig. 3 is a schematic diagram of the overall flow of character recognition in the related art.
Fig. 4 is a schematic diagram of the classification scheme of a classifier according to an embodiment of the present disclosure.
Fig. 5 is a schematic diagram of a two-stage training scheme according to an embodiment of the present disclosure.
Fig. 6 is a flowchart of a neural network training method according to an embodiment of the present disclosure.
Fig. 7 is a schematic diagram of a neural network according to an embodiment of the present disclosure.
Fig. 8 is a block diagram of a character recognition apparatus according to an embodiment of the present disclosure.
Fig. 9 is a block diagram of a neural network training apparatus according to an embodiment of the present disclosure.
Fig. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
DETAILED DESCRIPTION
Exemplary embodiments will be described in detail here, examples of which are illustrated in the accompanying drawings. Where the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure; rather, they are merely examples of apparatuses and methods consistent with some aspects of the present disclosure as detailed in the appended claims.
The terminology used in the present disclosure is for the purpose of describing particular embodiments only and is not intended to limit the present disclosure. As used in the present disclosure and the appended claims, the singular forms "a", "said" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items. In addition, the term "at least one" herein means any one of multiple items or any combination of at least two of multiple items.
It should be understood that although the terms first, second, third, etc. may be used in the present disclosure to describe various pieces of information, such information should not be limited by these terms. These terms are only used to distinguish pieces of information of the same type from one another. For example, without departing from the scope of the present disclosure, first information may also be called second information, and similarly, second information may also be called first information. Depending on the context, the word "if" as used herein may be interpreted as "upon" or "when" or "in response to determining".
In order to enable those skilled in the art to better understand the technical solutions in the embodiments of the present disclosure, and to make the above objectives, features and advantages of the embodiments clearer and easier to understand, the technical solutions in the embodiments of the present disclosure are described in further detail below with reference to the accompanying drawings.
Character recognition generally refers to recognizing characters in an image. The characters may include text (for example, digits, Chinese characters, English letters, etc.) and symbols (for example, arithmetic operators, logical operators, etc.). In the related art, character recognition is generally performed through a neural network. A neural network for character recognition, as shown in Fig. 1, may include a convolutional neural network 101, a recurrent neural network 102, a classifier 103 and a decoder 104. The convolutional neural network 101 extracts features from the image to obtain feature sequences, the recurrent neural network 102 encodes the feature sequences, the classifier 103 classifies the encoded feature sequences, and the decoder 104 decodes the classification results of the classifier 103 to recognize the characters in the image. However, the processing of the recurrent neural network is time-consuming, resulting in low character recognition efficiency.
On this basis, an embodiment of the present disclosure provides a character recognition method. As shown in Fig. 2, the method includes:
Step 201: performing feature extraction on a target image through a convolutional neural network of a target neural network to obtain a feature sequence of the target image;
Step 202: classifying, through a classifier of the target neural network, the characters corresponding to the feature sequence of the target image to obtain the categories of the characters corresponding to the feature sequence of the target image;
Step 203: performing character recognition on the target image based on the categories of the characters corresponding to the feature sequence of the target image;
where the convolutional neural network is determined by selecting its hyperparameters from multiple sets of candidate hyperparameters based on constraint conditions of the target neural network.
In step 201, the target image may be an image that includes characters, for example, an image of an object such as a billboard or a certificate captured from a real scene, an image generated by screen recording, an image generated by format conversion, or an image generated in other ways. The characters in the target image may be in various fonts, such as handwritten or printed, which is not limited in the present disclosure. The characters may include at least one of digits, Chinese characters, kana, symbols, and the like. The target image may include one or more characters, and multiple characters may be arranged on the target image regularly or irregularly, for example, along the horizontal direction.
Feature extraction may be performed on the target image to obtain a feature map F1. Assuming the size of feature map F1 is 32×512×128, i.e., 32 rows, 512 columns and 128 channels, F1 may first be downsampled; for example, the downsampled feature map F2 has a size of 4×64×128. For characters written horizontally (i.e., arranged on the target image along the horizontal direction), the relationships among horizontal features are generally of more interest. Therefore, the downsampled feature map F2 may be pooled in the vertical direction (for example, by max pooling or average pooling) to obtain a 1×64×128 feature map F3. The features of all channels at each horizontal pixel position in feature map F3 are taken as one feature sequence, yielding 64 feature sequences t of 128 dimensions each.
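The F1 → F2 → F3 → sequences pipeline above can be sketched with array operations. Note that this is only a shape-level illustration: strided slicing stands in for the network's learned downsampling layers, and average pooling is used for the vertical pooling (max pooling would work the same way):

```python
import numpy as np

H, W, C = 32, 512, 128
f1 = np.random.rand(H, W, C)            # feature map F1: 32 x 512 x 128

# Downsample to 4 x 64 x 128 (plain strided slicing here, as a stand-in
# for the network's actual downsampling layers).
f2 = f1[::8, ::8, :]                    # F2: 4 x 64 x 128

# Pool over the vertical axis to get 1 x 64 x 128.
f3 = f2.mean(axis=0, keepdims=True)     # F3: 1 x 64 x 128

# One 128-dimensional feature sequence per horizontal position.
sequences = f3[0]
print(sequences.shape)                  # (64, 128)
```

Each of the 64 rows of `sequences` corresponds to one horizontal position and is what the classifier receives as a feature sequence t.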
In some embodiments, target hyperparameters may be determined for the convolutional neural network based on the constraint conditions of the entire target neural network including the convolutional neural network and the classifier, so that the convolutional neural network can take over the role of the recurrent neural network. The recurrent neural network can then be removed from the target neural network, while the performance of the target neural network remains on par with its performance before the removal.
The hyperparameters of the convolutional neural network are related to its receptive field, so that by selecting them appropriately, the features extracted by the convolutional neural network can contain sufficient effective information. The hyperparameters may include at least one of the following: the depth of the convolutional neural network, the number of channels, the size of the convolution kernels, the number of downsampling operations, the downsampling method, the positions of the network layers in the convolutional neural network that perform downsampling, and the resolution of the target image input to the convolutional neural network. The depth of the convolutional neural network is its number of layers: the more layers, the stronger its feature extraction capability and the more effective information the features extracted in one pass can contain, but the longer feature extraction takes. The number of channels is also called the width of the convolutional neural network: the more channels, the larger the receptive field and the more effective information the extracted features can contain. Likewise, the larger the convolution kernels, the larger the receptive field. The downsampling method may be parametric (for example, downsampling by convolution) or non-parametric (for example, downsampling by pooling). The positions of the downsampling layers, i.e., which layers of the convolutional neural network perform downsampling, determine the timing of downsampling: earlier downsampling reduces the amount of computation during character recognition but may make feature extraction insufficient, while later downsampling extracts features more fully but increases the amount of computation. In addition, the higher the resolution of the target image input to the convolutional neural network, the more effective information is obtained in one feature extraction pass. The target hyperparameters of the convolutional neural network therefore need to be selected appropriately, so as to both guarantee the receptive field of the convolutional neural network (so that the extracted features contain sufficient effective information) and enable the entire target neural network to satisfy certain constraint conditions. The constraint conditions may include at least one of the following: an upper limit on the duration of character recognition performed by the target neural network; and a lower limit on the accuracy of character recognition performed by the target neural network.
The target hyperparameters may be selected from multiple groups of candidate hyperparameters, for example, by means of a grid search. First, a search space and a search step size are defined for each of the above hyperparameters; for example, the search space for the depth of the convolutional neural network is defined as N1 to N2, the search space for the number of channels is defined as C1 to C2, and the search space for the number of downsampling operations is defined as 3 to 8. Then, combination schemes of these hyperparameters are determined; for example, combination scheme 1 is {depth N1, C1 channels, 3 downsampling operations}, combination scheme 2 is {depth N1, C1 channels, 4 downsampling operations}, and so on. Multiple combination schemes may be determined first, and a combination scheme that makes the entire target neural network satisfy the constraints may then be selected from among them as the target hyperparameters. Alternatively, one combination scheme may be selected at a time and checked: if it does not make the entire target neural network satisfy the constraints, the search continues; if it does, the search stops. Alternatively, all combination schemes may be traversed, and the optimal combination scheme that makes the entire target neural network satisfy the constraints may be selected as the target hyperparameters.
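For illustration only, the second search strategy above (check one combination scheme at a time and stop at the first that satisfies the constraints) may be sketched as follows. The concrete depth and channel ranges (standing in for the placeholders N1..N2 and C1..C2), the constraint thresholds, and the `evaluate` stub are assumptions made to keep the sketch runnable; a real system would train and test each candidate network.

```python
from itertools import product

# Illustrative search spaces; N1/N2 and C1/C2 in the text are placeholders,
# so the concrete numbers below are assumptions for demonstration only.
DEPTHS = [8, 12, 16]        # hypothetical N1..N2
CHANNELS = [32, 64, 128]    # hypothetical C1..C2
DOWNSAMPLES = range(3, 9)   # 3 to 8 downsampling operations, as in the text

MAX_LATENCY_MS = 50.0       # assumed upper limit on recognition time
MIN_ACCURACY = 0.95         # assumed lower limit on recognition accuracy

def evaluate(depth, channels, n_down):
    """Stub standing in for building, training, and testing a candidate
    network; a real system would measure latency and accuracy on test
    images. The toy formulas below merely make the sketch runnable."""
    latency_ms = 0.05 * depth * channels / n_down
    accuracy = min(0.99, 0.90 + 0.004 * depth)
    return latency_ms, accuracy

def grid_search():
    """Try one combination scheme at a time and stop at the first scheme
    that satisfies both constraints."""
    for depth, ch, nd in product(DEPTHS, CHANNELS, DOWNSAMPLES):
        latency_ms, acc = evaluate(depth, ch, nd)
        if latency_ms < MAX_LATENCY_MS and acc > MIN_ACCURACY:
            return {"depth": depth, "channels": ch, "downsamples": nd}
    return None  # no combination scheme satisfies the constraints
```

The exhaustive strategy differs only in that the loop records every satisfying scheme and returns the best one instead of stopping early.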
The hyperparameters corresponding to one combination scheme are called a group of candidate hyperparameters. An initial candidate convolutional neural network may be built based on a group of candidate hyperparameters, and an initial candidate neural network including the initial candidate convolutional neural network and an initial classifier may be trained to obtain a candidate neural network including a candidate convolutional neural network and the classifier; whether the candidate neural network satisfies the constraints is then determined. The candidate neural network may be tested on test images for this purpose. For example, the candidate neural network is determined to satisfy the constraints when its recognition time is less than the upper time limit; or when its recognition accuracy is higher than the lower accuracy limit; or when its recognition time is less than the upper time limit and its recognition accuracy is higher than the lower accuracy limit.

In step 202, a classifier composed of a fully connected layer may be used to classify the characters corresponding to each feature sequence t. Of course, other types of classifiers may also be used; the present disclosure does not limit this. Each character corresponds to a category; for example, the Chinese character "你" corresponds to category 1, "好" corresponds to category 2, the digit "1" corresponds to category 3, the digit "2" corresponds to category 4, the symbol "+" corresponds to category 5, and so on.
In the related art, there is no explicit alignment between the character sequence obtained by decoding and the feature sequence before decoding. For example, suppose the feature sequence t has dimension 5, comprising the five features t0 to t4, while the decoded character sequence is "al"; the decoded sequence has length 2 and cannot be aligned with the feature sequence. To solve this problem, features at multiple pixel positions corresponding to the same character are recognized as the same character; for example, the above feature sequence is recognized as a character sequence containing multiple consecutive repeated characters, such as "aaall" or "aalll". However, this approach cannot recognize text that genuinely contains two or more repeated characters, for example the two letters "l" in the English word "hello", or the two occurrences of "沾" in the Chinese idiom "沾沾自喜". To handle such cases, blank symbols may be inserted between characters; a blank symbol is a special character used for insertion between characters. Assuming the symbol "-" denotes a blank, several blanks may be inserted before the "a", after the "l", and between the "a" and the "l" of the character sequence "al", yielding, for example, the character sequence "--aaa---ll-". Then, CTC (Connectionist Temporal Classification) decoding merges the repeated characters in the character sequence to obtain the merged character sequence "-a-l-", and removes the blanks from the merged characters to obtain the character sequence "al". This process is shown in Figure 3.
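The merge-then-remove-blanks step of CTC decoding described above may be sketched as follows; the function name and the choice of "-" as the blank symbol follow the example in the text.

```python
BLANK = "-"  # the blank symbol, written as "-" as in the example above

def ctc_collapse(sequence: str) -> str:
    """CTC decoding step illustrated in Figure 3: first merge runs of
    identical symbols ("--aaa---ll-" -> "-a-l-"), then remove the
    blanks ("-a-l-" -> "al")."""
    merged = []
    previous = None
    for symbol in sequence:
        if symbol != previous:  # keep only the first symbol of each run
            merged.append(symbol)
        previous = symbol
    return "".join(s for s in merged if s != BLANK)
```

Because a blank separates the two "l" runs in a sequence such as "hh-e-l-l-oo", genuinely repeated characters survive the merge, which is exactly the problem the blank symbol solves.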
In practical applications, most characters in a character sequence are blanks, and only a few are characters from the target image. However, CTC decoding does not distinguish blanks from non-blanks; all characters in the sequence are classified in the same way. Taking Chinese character recognition as an example, there are about 20,000 Chinese characters, and classifying each character requires determining its category among these 20,000 characters. The efficiency of such a character classification process is therefore low.

To solve this problem, as shown in Figure 4, an embodiment of the present disclosure first performs a first classification on the characters corresponding to the feature sequence of the target image, to determine the non-blank characters among them, and then performs a second classification on the non-blank characters, to determine their categories. That is, a binary classification is performed first to distinguish blanks from non-blanks, and only the non-blanks are then classified among the 20,000 Chinese characters; the blanks require no further classification. Since blanks need not be classified, the efficiency of the classification process is effectively improved.

When performing the binary classification into blanks and non-blanks, the probability that each character corresponding to the feature sequence of the target image is a non-blank may be determined, and characters whose probability is greater than a preset probability threshold are determined to be non-blanks. Since the binary classification takes little time, this process effectively improves the efficiency of character classification and reduces the time consumed by the classification process.
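The thresholding step above may be sketched as follows; the default threshold value of 0.5 is an assumption, since the disclosure only requires a preset probability threshold.

```python
def select_non_blank_positions(non_blank_probs, threshold=0.5):
    """Return the indices of positions whose probability of being a
    non-blank character exceeds the preset threshold; only these
    positions are forwarded to the expensive full-vocabulary (second)
    classification, while the remaining positions are treated as blanks."""
    return [i for i, p in enumerate(non_blank_probs) if p > threshold]
```

In a sequence dominated by blanks, the returned index list is short, which is the source of the efficiency gain described above.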
Further, when the upper limit on the character recognition time of the target neural network is used as a constraint, the time saved in the classification process can be reallocated to the convolutional neural network, allowing it to adopt better parameters (for example, a greater depth or more channels) to guarantee its feature extraction capability, thereby further improving the accuracy of character recognition.

The above process may be implemented with two sub-classifiers, where a first sub-classifier performs the first classification and a second sub-classifier performs the second classification. In this case, the initial candidate neural network, which includes the initial candidate convolutional neural network and the initial classifier, may be trained in two stages. Specifically, as shown in Figure 5, the initial candidate neural network before training includes the initial candidate convolutional neural network, the initial classifier, and a CTC decoder, where the initial classifier includes an initial first sub-classifier and an initial second sub-classifier, and the CTC decoder is parameter-free and requires no optimization. First, the network parameters of the initial first sub-classifier are fixed, and a first training is performed on the initial candidate neural network, that is, on the initial candidate convolutional neural network and the initial second sub-classifier, to obtain an intermediate candidate neural network including the candidate convolutional neural network, the initial first sub-classifier, and a candidate second sub-classifier. Then, the network parameters of the candidate convolutional neural network and the candidate second sub-classifier in the intermediate candidate neural network are fixed, the output of the candidate second sub-classifier is used as supervision information, and a second training is performed on the initial first sub-classifier to obtain the candidate neural network including the candidate convolutional neural network, a candidate first sub-classifier, and the candidate second sub-classifier.
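One way to read the second training stage is that the frozen second sub-classifier's per-position output is converted into a binary blank/non-blank target for the first sub-classifier. A minimal sketch of constructing that supervision signal follows; treating class index 0 as the blank class and deriving a hard label via argmax are both assumptions, since the disclosure only states that the second sub-classifier's output serves as supervision information.

```python
def blank_supervision(char_outputs, blank_index=0):
    """Turn the frozen second sub-classifier's per-position score vectors
    into binary blank/non-blank targets for training the first
    sub-classifier: a position is labeled non-blank (1) when its
    highest-scoring class is not the blank class, otherwise blank (0)."""
    targets = []
    for scores in char_outputs:  # one score vector per sequence position
        best = max(range(len(scores)), key=lambda k: scores[k])
        targets.append(0 if best == blank_index else 1)
    return targets
```

The first sub-classifier can then be trained against these targets with an ordinary binary classification loss while the convolutional network and the second sub-classifier stay frozen.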
Each initial candidate neural network may be trained in the above manner to obtain a corresponding candidate neural network, and a target neural network satisfying the constraints is then determined from among the candidate neural networks. The candidate convolutional neural network, candidate first sub-classifier, and candidate second sub-classifier of the selected candidate neural network become, respectively, the convolutional neural network, first sub-classifier, and second sub-classifier of the target neural network.

The first training may use sample images with annotation information. The annotation information may be calibrated in advance, manually or otherwise, and is used to determine the ground truth of the characters in the sample images; it includes annotations indicating blanks as well as annotations indicating the character categories of non-blanks. In this training stage, the first sub-classifier used for the binary classification need not be trained. After the first training, the second sub-classifier can classify both blanks and the specific characters of non-blanks. In the second training, the output of the second sub-classifier (the blank/non-blank probability) is used as supervision information to train the first sub-classifier.
The inputs and outputs of the overall character recognition process are as follows.

(1) Feature extraction

Input: the target image to be recognized (e.g., a text image);

Output: a feature sequence.

To improve the inference speed of the model, the design is based on a lightweight convolutional neural network such as MobileNet, and a fine-grained search is performed over hyperparameters such as the depth, width, and downsampling strategy of the convolutional neural network, to obtain a neural network that is optimal in both speed and accuracy.

(2) Blank prediction

Input: the feature sequence;

Output: the probability that each position (e.g., each character) in the sequence is a blank.

A linear classifier with two categories predicts whether each position in the feature sequence is a blank.

(3) Non-blank character classification

Input: the feature sequence, and the probability that each position in the sequence is a blank;

Output: character classification confidences for the non-blank positions in the sequence.

Based on the probability that each position in the sequence is a blank, the feature vectors at the non-blank positions are extracted, and character classification is performed on them using a fully connected layer.

(4) CTC decoding

Input: the probability that each position in the sequence is a blank, and the character classification confidences of the non-blank positions;

Output: the predicted character string.

Based on the probability that each position in the sequence is a blank and the character classification confidences of the non-blank positions, the character classification confidences of the full feature sequence are restored, and CTC decoding is applied.
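The confidence-restoration step in (4) may be sketched as follows: the blank probability fills the blank column at every position, and the second sub-classifier's confidences, scaled by the non-blank probability, fill the remaining columns at the selected positions. This scattering scheme is one plausible implementation assumed for illustration, not an implementation mandated by the disclosure.

```python
def restore_confidences(blank_probs, non_blank_positions, non_blank_conf, num_classes):
    """Rebuild a full per-position confidence matrix of shape
    (sequence length) x (num_classes + 1), where column 0 is the blank
    class. `non_blank_conf` holds one confidence vector per selected
    position, in order; other positions keep zero non-blank confidence."""
    conf_iter = iter(non_blank_conf)
    matrix = []
    for position, p_blank in enumerate(blank_probs):
        row = [p_blank] + [0.0] * num_classes
        if position in non_blank_positions:
            # scale class confidences by the probability of not being blank
            row[1:] = [(1.0 - p_blank) * c for c in next(conf_iter)]
        matrix.append(row)
    return matrix
```

The restored matrix can then be fed to a standard CTC decoder exactly as if a single classifier had scored every position over the full vocabulary plus blank.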
As shown in Figure 6, an embodiment of the present disclosure further provides a neural network training method, the method including:

Step 601: training each of a plurality of initial candidate neural networks with sample images to obtain a plurality of candidate neural networks, where each initial candidate neural network includes:

an initial candidate convolutional neural network, configured to perform feature extraction on the sample images to obtain feature sequences of the sample images, where the candidate hyperparameters of the initial candidate convolutional neural networks differ at least in part; and

an initial classifier, configured to classify the characters corresponding to the feature sequences of the sample images to obtain the categories of those characters, where the categories are used for performing character recognition on the sample images;

Step 602: selecting, from the plurality of candidate neural networks, a target neural network that satisfies the constraints.

In some embodiments, the initial classifier includes a first initial sub-classifier, configured to perform a first classification on the characters corresponding to the feature sequences of the sample images to determine the non-blank characters among them, and a second initial sub-classifier, configured to perform a second classification on the non-blank characters to determine their categories. Training each of the plurality of initial candidate neural networks with sample images includes: performing a first training on the initial candidate convolutional neural network and the initial second sub-classifier based on first sample images, to obtain an intermediate candidate neural network including a candidate convolutional neural network, the initial first sub-classifier, and a candidate second sub-classifier; and fixing the network parameters of the candidate convolutional neural network and the candidate second sub-classifier in the intermediate candidate neural network, using the output of the candidate second sub-classifier as supervision information, and performing a second training on the initial first sub-classifier based on second sample images, to obtain the candidate neural network including the candidate convolutional neural network, a candidate first sub-classifier, and the candidate second sub-classifier.

Those skilled in the art will understand that, in the above methods of the specific implementations, the order in which the steps are written does not imply a strict execution order or impose any limitation on the implementation process; the specific execution order of the steps should be determined by their functions and possible internal logic.
As shown in Figure 7, an embodiment of the present disclosure further provides a neural network, the neural network including:

a convolutional neural network 701, configured to perform feature extraction on a target image to obtain a feature sequence of the target image; and

a classifier 702, configured to classify the characters corresponding to the feature sequence of the target image to obtain the categories of those characters, where the categories are used for performing character recognition on the target image;

where the convolutional neural network 701 is determined by selecting its hyperparameters from multiple groups of candidate hyperparameters based on the constraints of the neural network.

In some embodiments, the classifier 702 includes a first sub-classifier 7021, configured to perform a first classification on the characters corresponding to the feature sequence of the target image to determine the non-blank characters among them, and a second sub-classifier 7022, configured to perform a second classification on the non-blank characters to determine their categories.

The neural network of the embodiments of the present disclosure may be obtained by training with the neural network training method described in any of the above embodiments. The trained neural network may be used to perform the character recognition method described in any of the foregoing embodiments.

In some embodiments, the neural network may further include a decoder 703, which is a parameter-free functional module and may decode in a CTC decoding manner.
As shown in Figure 8, an embodiment of the present disclosure further provides a character recognition apparatus, the apparatus including:

a feature extraction module 801, configured to perform feature extraction on a target image through the convolutional neural network of a target neural network, to obtain a feature sequence of the target image;

a classification module 802, configured to classify the characters corresponding to the feature sequence of the target image through the classifier of the target neural network, to obtain the categories of those characters; and

a recognition module 803, configured to perform character recognition on the target image based on the categories of the characters corresponding to the feature sequence of the target image;

where the convolutional neural network is determined by selecting its hyperparameters from multiple groups of candidate hyperparameters based on the constraints of the target neural network.

In some embodiments, the classification module includes: a first classification unit, configured to perform a first classification on the characters corresponding to the feature sequence of the target image to determine the non-blank characters among them; and a second classification unit, configured to perform a second classification on the non-blank characters to determine their categories.

In some embodiments, the first classification unit includes: a probability determination subunit, configured to determine the probability that a character corresponding to the feature sequence of the target image is a non-blank; and a classification subunit, configured to determine characters whose probability is greater than a preset probability threshold to be non-blanks.

In some embodiments, the apparatus further includes: a search module, configured to search multiple groups of candidate hyperparameters from the search space of the hyperparameters; a network building module, configured to build an initial candidate convolutional neural network based on each group of candidate hyperparameters; a training module, configured to train an initial candidate neural network including the initial candidate convolutional neural network and an initial classifier, to obtain a candidate neural network including a candidate convolutional neural network and a candidate classifier; and a first determination module, configured to, when the candidate neural network satisfies the constraints, determine the candidate convolutional neural network to be the convolutional neural network of the target neural network and the candidate classifier to be the classifier of the target neural network.

In some embodiments, the classifier includes a first sub-classifier, configured to perform a first classification on the characters corresponding to the feature sequence of the target image to determine the non-blank characters among them, and a second sub-classifier, configured to perform a second classification on the non-blank characters to determine their categories. The training module includes: a first training unit, configured to fix the network parameters of the initial first sub-classifier and perform a first training on the initial candidate neural network, that is, the initial candidate convolutional neural network and the initial second sub-classifier, based on first sample images, to obtain an intermediate candidate neural network including the candidate convolutional neural network, the initial first sub-classifier, and a candidate second sub-classifier; and a second training unit, configured to fix the network parameters of the candidate convolutional neural network and the candidate second sub-classifier in the intermediate candidate neural network, use the output of the candidate second sub-classifier as supervision information, and perform a second training on the initial first sub-classifier based on second sample images, to obtain the candidate neural network including the candidate convolutional neural network, a candidate first sub-classifier, and the candidate second sub-classifier.

In some embodiments, the constraints include an upper limit on the time the target neural network takes to perform character recognition, and the apparatus further includes: a first acquisition module, configured to acquire the time the candidate neural network takes to perform character recognition on test images; and a second determination module, configured to determine that the candidate neural network satisfies the constraints when that time is less than the upper time limit.

In some embodiments, the constraints include a lower limit on the accuracy with which the target neural network performs character recognition, and the apparatus further includes: a second acquisition module, configured to acquire the recognition accuracy of the candidate neural network when performing character recognition on test images; and a third determination module, configured to determine that the candidate neural network satisfies the constraints when that recognition accuracy is higher than the lower accuracy limit.

In some embodiments, the hyperparameters include at least one of the following: the depth of the convolutional neural network, the number of channels, the size of the convolution kernels, the number of downsampling operations, the downsampling manner, the positions of the network layers that perform downsampling in the convolutional neural network, and the resolution of the target image input to the convolutional neural network.
As shown in FIG. 9, an embodiment of the present disclosure further provides a neural network training apparatus, the apparatus including:
a training module 901, configured to train each of a plurality of initial candidate neural networks on sample images, to obtain a plurality of candidate neural networks, wherein each initial candidate neural network includes:
an initial candidate convolutional neural network, configured to perform feature extraction on a sample image to obtain a feature sequence of the sample image, wherein the candidate hyperparameters of the initial candidate convolutional neural networks differ at least in part; and
an initial classifier, configured to classify the characters corresponding to the feature sequence of the sample image to obtain the categories of the characters corresponding to the feature sequence, the categories being used to perform character recognition on the sample image; and
a screening module 902, configured to screen out, from the plurality of candidate neural networks, a target neural network that satisfies a constraint condition.
In some embodiments, the initial classifier includes a first initial sub-classifier, configured to perform a first classification on the characters corresponding to the feature sequence of the sample image so as to determine the non-blank characters among them, and a second initial sub-classifier, configured to perform a second classification on the non-blank characters so as to determine their categories. The training module includes: a first training unit, configured to perform first training on the initial candidate convolutional neural network and the initial second sub-classifier based on first sample images, to obtain an intermediate candidate neural network including a candidate convolutional neural network, the initial first sub-classifier and a candidate second sub-classifier; and a second training unit, configured to fix the network parameters of the candidate convolutional neural network and the candidate second sub-classifier in the intermediate candidate neural network, use the output of the candidate second sub-classifier as supervision information, and perform second training on the initial first sub-classifier based on second sample images, to obtain the candidate neural network including the candidate convolutional neural network, a candidate first sub-classifier and the candidate second sub-classifier.
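The two training stages of this embodiment can be sketched schematically as follows. The `Module` objects are stand-ins that merely count parameter updates and honor a freeze flag; they are not real networks, and a real implementation would perform gradient steps in a deep-learning framework:

```python
class Module:
    """Stand-in for a trainable network component; counts parameter updates."""
    def __init__(self, name):
        self.name = name
        self.trainable = True
        self.updates = 0

    def step(self):
        # A real framework would apply a gradient update here.
        if self.trainable:
            self.updates += 1

def first_training(backbone, second_clf, first_batches):
    # Stage 1: jointly train the candidate CNN and the second sub-classifier
    # on the first sample images.
    for _ in first_batches:
        backbone.step()
        second_clf.step()

def second_training(backbone, first_clf, second_clf, second_batches):
    # Stage 2: fix the network parameters of the candidate CNN and the
    # candidate second sub-classifier...
    backbone.trainable = False
    second_clf.trainable = False
    for _ in second_batches:
        # ...and train only the first sub-classifier on the second sample
        # images, with the frozen second sub-classifier's output serving
        # as supervision information (placeholder here).
        _supervision = second_clf.name
        first_clf.step()
```

After both stages, only the first sub-classifier has received updates in stage 2, while the backbone and second sub-classifier retain their stage-1 parameters.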
In some embodiments, the functions of, or modules included in, the apparatus provided by the embodiments of the present disclosure may be used to execute the methods described in the method embodiments above; for their specific implementation, reference may be made to the description of those method embodiments, which is not repeated here for brevity.
An embodiment of this specification further provides an electronic device, which includes at least a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the method described in any of the foregoing embodiments.
FIG. 10 shows a more specific schematic diagram of the hardware structure of an electronic device provided by an embodiment of this specification. The device may include a processor 1001, a memory 1002, an input/output interface 1003, a communication interface 1004 and a bus 1005. The processor 1001, the memory 1002, the input/output interface 1003 and the communication interface 1004 are communicatively connected to one another within the device through the bus 1005.
The processor 1001 may be implemented as a general-purpose CPU (Central Processing Unit), a microprocessor, an Application-Specific Integrated Circuit (ASIC), or one or more integrated circuits, and is configured to execute relevant programs so as to implement the character recognition method or the neural network training method provided by the embodiments of this specification. The processor 1001 may further include a graphics card, such as an Nvidia Titan X or 1080Ti graphics card.
The memory 1002 may be implemented in the form of a ROM (Read-Only Memory), a RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 1002 may store an operating system and other application programs. When the technical solutions provided by the embodiments of this specification are implemented through software or firmware, the relevant program code is stored in the memory 1002 and invoked for execution by the processor 1001.
The input/output interface 1003 is used to connect input/output modules so as to implement information input and output. An input/output module may be configured in the device as a component (not shown in the figure), or may be externally connected to the device to provide the corresponding functions. Input modules may include a keyboard, a mouse, a touch screen, a microphone and various sensors; output modules may include a display, a speaker, a vibrator, indicator lights and the like.
The communication interface 1004 is used to connect a communication module (not shown in the figure), so as to send information of the present device to the communication modules of other devices, or to receive information sent by the communication modules of other devices. The communication module may communicate in a wired manner (e.g., USB or network cable) or in a wireless manner (e.g., mobile network, WiFi or Bluetooth).
The bus 1005 includes a path that transfers information between the components of the device (e.g., the processor 1001, the memory 1002, the input/output interface 1003 and the communication interface 1004).
It should be noted that, although only the processor 1001, the memory 1002, the input/output interface 1003, the communication interface 1004 and the bus 1005 are shown for the above device, in a specific implementation the device may further include other components necessary for normal operation. In addition, those skilled in the art will understand that the above device may also include only the components necessary to implement the solutions of the embodiments of this specification, rather than all the components shown in the figure.
An embodiment of the present disclosure further provides a computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method described in any of the foregoing embodiments.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible to a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
From the description of the above implementations, those skilled in the art can clearly understand that the embodiments of this specification may be implemented by means of software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solutions of the embodiments of this specification, in essence or with respect to the part contributing to the prior art, may be embodied in the form of a software product. Such a computer software product may be stored in a storage medium, such as a ROM/RAM, a magnetic disk or an optical disc, and includes a number of instructions that cause an electronic device (which may be a personal computer, a server, a network device or the like) to execute the methods described in the various embodiments, or in certain parts of the embodiments, of this specification.
The systems, apparatuses, modules or units set forth in the above embodiments may be specifically implemented by computer chips or entities, or by products having certain functions. A typical implementing device is a computer, which may take the form of a personal computer, a laptop computer, a cellular phone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an e-mail device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
The embodiments in this specification are described in a progressive manner; for identical or similar parts between the embodiments, reference may be made from one to another, and each embodiment focuses on its differences from the others. In particular, since the apparatus embodiments are substantially similar to the method embodiments, they are described relatively simply, and for the relevant parts reference may be made to the description of the method embodiments. The apparatus embodiments described above are merely illustrative; the modules described as separate components may or may not be physically separate, and when implementing the solutions of the embodiments of this specification, the functions of the modules may be implemented in one or more pieces of software and/or hardware. Some or all of the modules may also be selected according to actual needs to achieve the purpose of the solution of the embodiment. Those of ordinary skill in the art can understand and implement this without creative effort.
The above is merely a specific implementation of the embodiments of this specification. It should be noted that those of ordinary skill in the art may make several improvements and refinements without departing from the principles of the embodiments of this specification, and such improvements and refinements should also be regarded as falling within the scope of protection of the embodiments of this specification.

Claims (16)

  1. A character recognition method, comprising:
    performing feature extraction on a target image through a convolutional neural network of a target neural network, to obtain a feature sequence of the target image;
    classifying the characters corresponding to the feature sequence through a classifier of the target neural network, to obtain the categories of the characters corresponding to the feature sequence; and
    performing character recognition on the target image based on the categories of the characters;
    wherein the convolutional neural network is determined, by selecting hyperparameters of the convolutional neural network, from multiple sets of candidate hyperparameters based on a constraint condition of the target neural network.
  2. The method according to claim 1, wherein classifying the characters corresponding to the feature sequence to obtain the categories of the characters corresponding to the feature sequence comprises:
    performing a first classification on the characters corresponding to the feature sequence, to determine the non-blank characters among the characters corresponding to the feature sequence; and
    performing a second classification on the non-blank characters, to determine the categories of the non-blank characters.
  3. The method according to claim 2, wherein performing the first classification on the characters corresponding to the feature sequence to determine the non-blank characters among them comprises:
    determining the probability that a character corresponding to the feature sequence is a non-blank character; and
    determining a character whose probability is greater than a preset probability threshold to be a non-blank character.
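Under an assumed representation in which each element of the feature sequence carries a non-blank probability, the thresholding step of claim 3 can be sketched as follows; the 0.5 default and the pair representation are illustrative choices, not values from the disclosure:

```python
def select_non_blank(frame_predictions, threshold=0.5):
    """Keep the characters whose non-blank probability exceeds the threshold.

    frame_predictions: sequence of (character, p_non_blank) pairs, one per
    element of the feature sequence. The default threshold of 0.5 is an
    illustrative assumption.
    """
    return [char for char, p in frame_predictions if p > threshold]
```

The surviving characters are then passed to the second classification, which assigns each of them a category.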
  4. The method according to any one of claims 1 to 3, further comprising:
    searching the multiple sets of candidate hyperparameters from a search space of the hyperparameters; and
    for each set of candidate hyperparameters:
    establishing an initial candidate convolutional neural network based on the set of candidate hyperparameters;
    training an initial candidate neural network comprising the initial candidate convolutional neural network and an initial classifier, to obtain a candidate neural network comprising a candidate convolutional neural network and a candidate classifier; and
    in a case where the candidate neural network satisfies the constraint condition, determining the candidate convolutional neural network to be the convolutional neural network of the target neural network, and determining the candidate classifier to be the classifier of the target neural network.
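The per-candidate procedure of claim 4 can be sketched as a generic selection loop. `build`, `train` and `satisfies_constraints` are placeholders for the establish/train/constraint-check steps described above, not functions from the disclosure:

```python
def search_target_network(candidate_hyperparam_sets, build, train,
                          satisfies_constraints):
    """For each candidate hyperparameter set: build an initial candidate
    network, train it, and return the first trained candidate that meets
    the constraint condition (or None if none does)."""
    for hyperparams in candidate_hyperparam_sets:
        candidate = train(build(hyperparams))
        if satisfies_constraints(candidate):
            return candidate
    return None
```

The returned candidate supplies both the convolutional neural network and the classifier of the target neural network.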
  5. The method according to claim 4, wherein the classifier comprises a first sub-classifier and a second sub-classifier; the first sub-classifier is configured to perform a first classification on the characters corresponding to the feature sequence, to determine the non-blank characters among them; the second sub-classifier is configured to perform a second classification on the non-blank characters, to determine the categories of the non-blank characters; and training the initial candidate neural network comprising the initial candidate convolutional neural network and the initial classifier comprises:
    performing first training on the initial candidate convolutional neural network and an initial second sub-classifier based on first sample images, to obtain an intermediate candidate neural network comprising the candidate convolutional neural network, an initial first sub-classifier and a candidate second sub-classifier; and
    fixing the network parameters of the candidate convolutional neural network and the candidate second sub-classifier in the intermediate candidate neural network, using the output of the candidate second sub-classifier as supervision information, and performing second training on the initial first sub-classifier based on second sample images, to obtain the candidate neural network comprising the candidate convolutional neural network, a candidate first sub-classifier and the candidate second sub-classifier.
  6. The method according to claim 4 or 5, wherein the constraint condition comprises an upper limit on the duration of character recognition performed by the target neural network, and the method further comprises:
    acquiring the duration taken by the candidate neural network to perform character recognition on a test image; and
    in a case where the duration is less than the upper limit, determining that the candidate neural network satisfies the constraint condition.
  7. The method according to claim 4 or 5, wherein the constraint condition comprises a lower limit on the accuracy of character recognition performed by the target neural network, and the method further comprises:
    acquiring the recognition accuracy of character recognition performed by the candidate neural network on a test image; and
    in a case where the recognition accuracy is higher than the lower limit, determining that the candidate neural network satisfies the constraint condition.
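Taken together, the duration constraint of claim 6 and the accuracy constraint of claim 7 reduce to simple comparisons against measured values. The function below is an illustrative combination of both checks, not code from the disclosure; a limit of `None` means the corresponding constraint is not imposed:

```python
def satisfies_constraints(duration, accuracy, max_duration=None,
                          min_accuracy=None):
    """Return True when the measured recognition duration is below the upper
    limit and the measured recognition accuracy is above the lower limit."""
    if max_duration is not None and not duration < max_duration:
        return False  # claim 6: duration must be less than the upper limit
    if min_accuracy is not None and not accuracy > min_accuracy:
        return False  # claim 7: accuracy must be higher than the lower limit
    return True
```

A candidate neural network passing this check would be retained as the target neural network.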
  8. The method according to any one of claims 1 to 7, wherein the hyperparameters comprise at least one of the following: the depth of the convolutional neural network, the number of channels, the size of the convolution kernels, the number of downsampling operations, the downsampling method, the positions of the network layers in the convolutional neural network that perform downsampling, and the resolution of the target image input to the convolutional neural network.
  9. A neural network training method, comprising:
    training each of a plurality of initial candidate neural networks on sample images, to obtain a plurality of candidate neural networks, wherein each initial candidate neural network comprises:
    an initial candidate convolutional neural network, configured to perform feature extraction on a sample image to obtain a feature sequence of the sample image, wherein the candidate hyperparameters of the initial candidate convolutional neural networks differ at least in part; and
    an initial classifier, configured to classify the characters corresponding to the feature sequence to obtain the categories of the characters corresponding to the feature sequence, the categories of the characters being used to perform character recognition on the sample image; and
    screening out, from the plurality of candidate neural networks, a target neural network that satisfies a constraint condition.
  10. The method according to claim 9, wherein the initial classifier comprises a first initial sub-classifier and a second initial sub-classifier; the first initial sub-classifier is configured to perform a first classification on the characters corresponding to the feature sequence, to determine the non-blank characters among the characters corresponding to the feature sequence of the sample image; the second initial sub-classifier is configured to perform a second classification on the non-blank characters, to determine the categories of the non-blank characters; and training each of the plurality of initial candidate neural networks on the sample images comprises:
    performing first training on the initial candidate convolutional neural network and an initial second sub-classifier based on first sample images, to obtain an intermediate candidate neural network comprising a candidate convolutional neural network, an initial first sub-classifier and a candidate second sub-classifier; and
    fixing the network parameters of the candidate convolutional neural network and the candidate second sub-classifier in the intermediate candidate neural network, using the output of the candidate second sub-classifier as supervision information, and performing second training on the initial first sub-classifier based on second sample images, to obtain the candidate neural network comprising the candidate convolutional neural network, a candidate first sub-classifier and the candidate second sub-classifier.
  11. A neural network, comprising:
    a convolutional neural network, configured to perform feature extraction on a target image to obtain a feature sequence of the target image; and
    a classifier, configured to classify the characters corresponding to the feature sequence to obtain the categories of the characters corresponding to the feature sequence, wherein the categories of the characters are used to perform character recognition on the target image;
    wherein the convolutional neural network is determined, by selecting hyperparameters of the convolutional neural network, from multiple sets of candidate hyperparameters based on a constraint condition of the neural network.
  12. A character recognition apparatus, comprising:
    a feature extraction module, configured to perform feature extraction on a target image through a convolutional neural network of a target neural network, to obtain a feature sequence of the target image;
    a classification module, configured to classify the characters corresponding to the feature sequence through a classifier of the target neural network, to obtain the categories of the characters corresponding to the feature sequence; and
    a recognition module, configured to perform character recognition on the target image based on the categories of the characters;
    wherein the convolutional neural network is determined, by selecting hyperparameters of the convolutional neural network, from multiple sets of candidate hyperparameters based on a constraint condition of the target neural network.
  13. A neural network training apparatus, comprising:
    a training module, configured to train each of a plurality of initial candidate neural networks on sample images, to obtain a plurality of candidate neural networks, wherein each initial candidate neural network comprises:
    an initial candidate convolutional neural network, configured to perform feature extraction on a sample image to obtain a feature sequence of the sample image, wherein the candidate hyperparameters of the initial candidate convolutional neural networks differ at least in part; and
    an initial classifier, configured to classify the characters corresponding to the feature sequence to obtain the categories of the characters corresponding to the feature sequence, the categories of the characters being used to perform character recognition on the sample image; and
    a screening module, configured to screen out, from the plurality of candidate neural networks, a target neural network that satisfies a constraint condition.
  14. A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method according to any one of claims 1 to 10.
  15. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the method according to any one of claims 1 to 10.
  16. A computer program, comprising computer-readable code which, when executed by a processor, implements the method according to any one of claims 1 to 10.
PCT/CN2022/086989 2021-06-28 2022-04-15 Character recognition method and apparatus, neural network training method and apparatus, and neural network, storage medium and electronic device WO2023273516A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110719983.4A CN113298188A (en) 2021-06-28 2021-06-28 Character recognition and neural network training method and device
CN202110719983.4 2021-06-28

Publications (1)

Publication Number Publication Date
WO2023273516A1 (en)




Also Published As

Publication number Publication date
CN113298188A (en) 2021-08-24

