WO2023273516A1 - Character recognition method and apparatus, neural network training method and apparatus, and neural network, storage medium and electronic device - Google Patents


Info

Publication number
WO2023273516A1
WO2023273516A1 · PCT/CN2022/086989 · CN2022086989W
Authority
WO
WIPO (PCT)
Prior art keywords
neural network
candidate
classifier
initial
convolutional neural
Prior art date
Application number
PCT/CN2022/086989
Other languages
French (fr)
Chinese (zh)
Inventor
张正夫
梁鼎
Original Assignee
上海商汤智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海商汤智能科技有限公司
Publication of WO2023273516A1 publication Critical patent/WO2023273516A1/en


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/60 Type of objects
    • G06V 20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 Character recognition

Definitions

  • The present disclosure relates to the technical field of character recognition, and in particular to character recognition and neural network training methods and apparatuses, a neural network, a computer-readable storage medium and an electronic device.
  • Character recognition is an important research direction in the field of computer vision and has a wide range of application scenarios. Taking characters arranged in the horizontal direction as an example, character recognition methods in the related art need a recurrent neural network to capture the relationships between image features in the horizontal direction, so that the extracted features contain more effective information and the accuracy of character recognition improves. However, the recurrent neural network takes a long time to process, resulting in low character recognition efficiency.
  • The present disclosure provides a character recognition and neural network training method and apparatus, a computer-readable storage medium and an electronic device.
  • A method for character recognition includes: performing feature extraction on a target image through a convolutional neural network of a target neural network to obtain a feature sequence of the target image; classifying, through a classifier of the target neural network, the characters corresponding to the feature sequence to obtain the categories of the characters corresponding to the feature sequence; and performing character recognition on the target image based on the categories of the characters; wherein the convolutional neural network is determined from multiple groups of candidate hyperparameters by selecting the hyperparameters of the convolutional neural network based on the constraints of the target neural network.
  • Classifying the characters corresponding to the feature sequence to obtain the categories of the characters includes: performing a first classification on the characters corresponding to the feature sequence to determine the non-blank characters among them; and performing a second classification on the non-blank characters to determine the categories of the non-blank characters.
  • Performing the first classification on the characters corresponding to the feature sequence to determine the non-blank characters includes: determining the probability that each character corresponding to the feature sequence is a non-blank character, and determining characters whose probability is greater than a preset probability threshold as non-blank characters.
  • The method further includes: searching the multiple sets of candidate hyperparameters from the search space of the hyperparameters; for each set of candidate hyperparameters, establishing an initial candidate convolutional neural network based on that set of candidate hyperparameters; training the initial candidate neural network comprising the initial candidate convolutional neural network and an initial classifier to obtain a candidate neural network comprising a candidate convolutional neural network and a candidate classifier; and, when the candidate neural network satisfies the constraints, determining the candidate convolutional neural network as the convolutional neural network of the target neural network and the candidate classifier as the classifier of the target neural network.
  • The classifier includes a first sub-classifier and a second sub-classifier. The first sub-classifier is used to perform a first classification on the characters corresponding to the feature sequence to determine the non-blank characters among them; the second sub-classifier is used to perform a second classification on the non-blank characters to determine their categories. Training the initial candidate neural network comprising the initial candidate convolutional neural network and the initial classifier includes: performing first training on the initial candidate convolutional neural network and the initial second sub-classifier based on a first sample image to obtain an intermediate candidate neural network comprising the candidate convolutional neural network, the initial first sub-classifier and the candidate second sub-classifier; fixing the network parameters of the candidate convolutional neural network and the candidate second sub-classifier in the intermediate candidate neural network, using the output of the candidate second sub-classifier as supervision information, and performing second training on the initial first sub-classifier based on a second sample image to obtain the candidate neural network comprising the candidate convolutional neural network, the candidate first sub-classifier and the candidate second sub-classifier.
  • the constraints include an upper limit of the character recognition time of the target neural network
  • The method further includes: acquiring the duration taken by the candidate neural network to perform character recognition on a test image; when the duration is less than the upper limit, determining that the candidate neural network satisfies the constraint condition.
  • The constraints include a lower limit on the character recognition accuracy of the target neural network. The method further includes: acquiring the recognition accuracy of the candidate neural network on a test image; when the recognition accuracy is higher than the lower limit, determining that the candidate neural network satisfies the constraint condition.
  • The hyperparameters include at least one of the following: the depth of the convolutional neural network, the number of channels, the size of the convolution kernel, the number of downsampling operations, the downsampling method, the position of the network layer in the convolutional neural network at which downsampling is performed, and the resolution of the target image input to the convolutional neural network.
  • A neural network training method includes: using sample images to train each of a plurality of initial candidate neural networks to obtain a plurality of candidate neural networks, wherein each initial candidate neural network includes: an initial candidate convolutional neural network for performing feature extraction on the sample image to obtain a feature sequence of the sample image, the candidate hyperparameters of each initial candidate convolutional neural network being at least partially different; and an initial classifier configured to classify the characters corresponding to the feature sequence to obtain the categories of the characters, the categories being used to perform character recognition on the sample image; and screening out, from the plurality of candidate neural networks, a target neural network that satisfies the constraints.
  • The initial classifier includes a first initial sub-classifier and a second initial sub-classifier. The first initial sub-classifier is used to perform a first classification on the characters corresponding to the feature sequence to determine the non-blank characters among the characters corresponding to the feature sequence of the sample image; the second initial sub-classifier is used to perform a second classification on the non-blank characters to determine their categories. Using the sample images to train each of the plurality of initial candidate neural networks includes: performing first training on the initial candidate convolutional neural network and the initial second sub-classifier based on a first sample image to obtain an intermediate candidate neural network comprising the candidate convolutional neural network, the initial first sub-classifier and the candidate second sub-classifier; fixing the network parameters of the candidate convolutional neural network and the candidate second sub-classifier in the intermediate candidate neural network, using the output of the candidate second sub-classifier as supervision information, and performing second training on the initial first sub-classifier based on a second sample image to obtain the candidate neural network comprising the candidate convolutional neural network, the candidate first sub-classifier and the candidate second sub-classifier.
  • A neural network includes: a convolutional neural network configured to extract features of a target image to obtain a feature sequence of the target image; and a classifier configured to classify the characters corresponding to the feature sequence to obtain the categories of the characters, the categories being used to perform character recognition on the target image; wherein the convolutional neural network is determined from multiple sets of candidate hyperparameters by selecting its hyperparameters based on the constraints of the neural network.
  • A character recognition apparatus includes: a feature extraction module configured to perform feature extraction on a target image through a convolutional neural network of a target neural network to obtain a feature sequence of the target image; a classification module configured to classify, through a classifier of the target neural network, the characters corresponding to the feature sequence to obtain the categories of the characters; and a recognition module configured to perform character recognition on the target image based on the categories of the characters; wherein the convolutional neural network is determined from multiple groups of candidate hyperparameters by selecting its hyperparameters based on the constraints of the target neural network.
  • The classification module includes: a first classification unit configured to perform a first classification on the characters corresponding to the feature sequence to determine the non-blank characters among them; and a second classification unit configured to perform a second classification on the non-blank characters to determine their categories.
  • The first classification unit includes: a probability determination subunit configured to determine the probability that a character corresponding to the feature sequence is a non-blank character; and a classification subunit configured to determine characters whose probability is greater than a preset probability threshold as non-blank characters.
  • The apparatus further includes: a search module configured to search the multiple sets of candidate hyperparameters from the hyperparameter search space; a network building module configured to, for each set of candidate hyperparameters, establish an initial candidate convolutional neural network based on that set; a training module configured to train the initial candidate neural network comprising the initial candidate convolutional neural network and an initial classifier to obtain a candidate neural network comprising a candidate convolutional neural network and a candidate classifier; and a first determination module configured to, when the candidate neural network satisfies the constraint condition, determine the candidate convolutional neural network as the convolutional neural network of the target neural network and the candidate classifier as the classifier of the target neural network.
  • The classifier includes a first sub-classifier and a second sub-classifier. The first sub-classifier is used to perform a first classification on the characters corresponding to the feature sequence to determine the non-blank characters among them; the second sub-classifier is used to perform a second classification on the non-blank characters to determine their categories. The training module includes: a first training unit configured to perform first training on the initial candidate convolutional neural network and the initial second sub-classifier based on a first sample image, to obtain an intermediate candidate neural network comprising the candidate convolutional neural network, the initial first sub-classifier and the candidate second sub-classifier; and a second training unit configured to fix the network parameters of the candidate convolutional neural network and the candidate second sub-classifier in the intermediate candidate neural network, use the output of the candidate second sub-classifier as supervision information, and perform second training on the initial first sub-classifier based on a second sample image, to obtain the candidate neural network comprising the candidate convolutional neural network, the candidate first sub-classifier and the candidate second sub-classifier.
  • the constraint condition includes an upper limit of the character recognition time of the target neural network
  • The apparatus further includes: a first acquisition module configured to acquire the duration taken by the candidate neural network to perform character recognition on a test image; and a second determination module configured to determine that the candidate neural network satisfies the constraint condition when the duration is less than the upper limit.
  • The constraints include a lower limit on the character recognition accuracy of the target neural network. The apparatus further includes: a second acquisition module configured to acquire the recognition accuracy of the candidate neural network on a test image; and a third determination module configured to determine that the candidate neural network satisfies the constraints when the recognition accuracy is higher than the lower limit.
  • The hyperparameters include at least one of the following: the depth of the convolutional neural network, the number of channels, the size of the convolution kernel, the number of downsampling operations, the downsampling method, the position of the network layer in the convolutional neural network at which downsampling is performed, and the resolution of the target image input to the convolutional neural network.
  • A neural network training apparatus includes: a training module configured to use sample images to train each of a plurality of initial candidate neural networks to obtain a plurality of candidate neural networks, wherein each initial candidate neural network includes: an initial candidate convolutional neural network for performing feature extraction on the sample image to obtain a feature sequence of the sample image, the candidate hyperparameters of each initial candidate convolutional neural network being at least partially different; and an initial classifier configured to classify the characters corresponding to the feature sequence to obtain the categories of the characters, the categories being used to perform character recognition on the sample image; and a screening module configured to screen out a target neural network satisfying the constraints from the plurality of candidate neural networks.
  • The initial classifier includes a first initial sub-classifier and a second initial sub-classifier. The first initial sub-classifier is used to perform a first classification on the characters corresponding to the feature sequence to determine the non-blank characters among the characters corresponding to the feature sequence of the sample image; the second initial sub-classifier is used to perform a second classification on the non-blank characters to determine their categories. The training module includes: a first training unit configured to perform first training on the initial candidate convolutional neural network and the initial second sub-classifier based on a first sample image, to obtain an intermediate candidate neural network comprising the candidate convolutional neural network, the initial first sub-classifier and the candidate second sub-classifier; and a second training unit configured to fix the network parameters of the candidate convolutional neural network and the candidate second sub-classifier in the intermediate candidate neural network, use the output of the candidate second sub-classifier as supervision information, and perform second training on the initial first sub-classifier based on a second sample image, to obtain the candidate neural network comprising the candidate convolutional neural network, the candidate first sub-classifier and the candidate second sub-classifier.
  • a computer-readable storage medium on which a computer program is stored, and when the program is executed by a processor, the method described in any embodiment is implemented.
  • An electronic device includes a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the program, the method described in any embodiment is implemented.
  • a computer program includes computer readable code, and when the computer readable code is executed by a processor, the method described in any embodiment is implemented.
  • Embodiments of the present disclosure use a convolutional neural network and a classifier to identify characters from target images.
  • Since no recurrent neural network is used, the efficiency of character recognition is improved; by selecting better target hyperparameters to build the convolutional neural network, the receptive field of the convolutional neural network is ensured, so that the extracted features can contain more effective information. Therefore, the target neural network including the convolutional neural network and the classifier can generally meet the preset constraint conditions and achieve higher character recognition accuracy.
  • FIG. 1 is a schematic diagram of a neural network used for character recognition in the related art.
  • FIG. 2 is a flowchart of a character recognition method according to an embodiment of the present disclosure.
  • FIG. 3 is a schematic diagram of an overall process of character recognition in the related art.
  • FIG. 4 is a schematic diagram of a classification method of a classifier according to an embodiment of the present disclosure.
  • FIG. 5 is a schematic diagram of a two-stage training method of an embodiment of the present disclosure.
  • FIG. 6 is a flowchart of a neural network training method according to an embodiment of the present disclosure.
  • FIG. 7 is a schematic diagram of a neural network of an embodiment of the present disclosure.
  • FIG. 8 is a block diagram of a character recognition device according to an embodiment of the present disclosure.
  • FIG. 9 is a block diagram of a neural network training device according to an embodiment of the present disclosure.
  • FIG. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
  • first, second, third, etc. may be used in the present disclosure to describe various information, the information should not be limited to these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of the present disclosure, first information may also be called second information, and similarly, second information may also be called first information. Depending on the context, the word “if” as used herein may be interpreted as “at” or “when” or “in response to a determination.”
  • Character recognition generally refers to recognizing characters from images, and the characters may include text (for example, numbers, Chinese characters, English letters, etc.) and symbols (for example, arithmetic operation symbols, logical operation symbols, etc.). In related technologies, character recognition is generally performed through a neural network.
  • a neural network for character recognition is shown in FIG. 1 , which may include a convolutional neural network 101 , a recurrent neural network 102 , a classifier 103 and a decoder 104 .
  • the convolutional neural network 101 is used to extract features from the image to obtain a feature sequence
  • the recurrent neural network 102 is used to encode the feature sequence
  • the classifier 103 is used to classify the encoded feature sequence
  • the decoder 104 is used to decode the classification result of the classifier 103 to recognize the characters in the image.
  • the processing by the recurrent neural network takes a long time, resulting in low character recognition efficiency.
  • an embodiment of the present disclosure provides a character recognition method, as shown in FIG. 2 , the method includes:
  • Step 201: Perform feature extraction on the target image through the convolutional neural network of the target neural network to obtain a feature sequence of the target image;
  • Step 202: Classify the characters corresponding to the feature sequence of the target image through the classifier of the target neural network to obtain the categories of the characters corresponding to the feature sequence of the target image;
  • Step 203: Perform character recognition on the target image based on the character categories corresponding to the feature sequence of the target image;
  • the convolutional neural network is determined from multiple groups of candidate hyperparameters based on constraints of the target neural network.
  • The target image may be an image including characters, for example, an image of an object such as a billboard or certificate captured in a real scene, an image generated by screen recording, an image generated by format conversion, or an image generated in another way.
  • the characters in the target image may be characters in various fonts such as handwriting and printing, which is not limited in the present disclosure.
  • the characters may include at least one of numerals, Chinese characters, kana, symbols, and the like.
  • the target image may include one or more characters, and the multiple characters may be arranged on the target image regularly or irregularly, for example, may be arranged on the target image along a horizontal direction.
  • Feature extraction can be performed on the target image to obtain a feature map F1.
  • the feature map F1 can be down-sampled first, for example, the size of the down-sampled feature map F2 is 4 ⁇ 64 ⁇ 128.
  • the downsampled feature map F2 can be pooled in the vertical direction (for example, by maximum pooling or average pooling) to obtain a feature map F3 of 1 × 64 × 128.
  • the features of each channel of each horizontal pixel position in the feature map F3 are determined as a feature sequence, and a total of 64 128-dimensional feature sequences t are obtained.
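The pooling-and-splitting step above can be sketched in plain Python. This is a simplified stand-in for the tensor operations a deep-learning framework would perform; the H × W × C layout and the 4 × 64 × 128 sizes follow the example in the text:

```python
import random

def pool_vertical_max(feature_map):
    """Max-pool an H x W x C feature map over the vertical (H) axis,
    then split the resulting 1 x W x C map into W feature vectors of
    dimension C, one per horizontal pixel position."""
    H, W, C = len(feature_map), len(feature_map[0]), len(feature_map[0][0])
    return [
        [max(feature_map[h][w][c] for h in range(H)) for c in range(C)]
        for w in range(W)
    ]

# Toy feature map F2 with H=4, W=64, C=128, matching the example sizes
random.seed(0)
H, W, C = 4, 64, 128
F2 = [[[random.random() for _ in range(C)] for _ in range(W)] for _ in range(H)]
seq = pool_vertical_max(F2)  # 64 feature vectors, each 128-dimensional
```

The result corresponds to the 64 feature sequences t of dimension 128 described above.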
  • Target hyperparameters can be determined for the convolutional neural network based on the constraints of the entire target neural network including the convolutional neural network and the classifier, thereby enabling the convolutional neural network to take over the function of the recurrent neural network; the recurrent neural network can then be removed from the target neural network while the performance of the entire target neural network remains on par with its performance before the removal.
  • The hyperparameters of the convolutional neural network are related to its receptive field, so by reasonably selecting these hyperparameters, the features extracted by the convolutional neural network can contain enough effective information.
  • The hyperparameters may include at least one of the following: the depth of the convolutional neural network, the number of channels, the size of the convolution kernel, the number of downsampling operations, the downsampling method, the position of the network layer in the convolutional neural network at which downsampling is performed, and the resolution of the target image input to the convolutional neural network. Here, the depth of the convolutional neural network is its number of layers.
  • the downsampling manner may include a downsampling manner with parameters (for example, performing downsampling through convolution processing) and a downsampling manner without parameters (for example, performing downsampling through pooling processing).
  • The position of the downsampling network layer in the convolutional neural network indicates which layers of the convolutional neural network perform downsampling, that is, the timing of downsampling.
  • Earlier downsampling reduces the amount of computation in the character recognition process but may lead to insufficient feature extraction; conversely, later downsampling extracts features more fully but increases the amount of computation.
  • The higher the resolution of the target image input to the convolutional neural network, the more effective information is obtained in one feature extraction.
  • the constraints may include at least one of the following: an upper limit of the character recognition time of the target neural network; a lower limit of the character recognition accuracy of the target neural network.
  • the target hyperparameters may be selected from multiple groups of candidate hyperparameters, for example, may be obtained by searching in a grid search manner.
  • the search space and search step size of each of the above hyperparameters are first defined.
  • the search space defining the depth of the convolutional neural network is N1 to N2
  • the search space defining the number of channels of the convolutional neural network is C1 to C2.
  • the search space that defines the number of downsampling is from 3 to 8 times.
  • combination scheme 1 is {depth of the convolutional neural network N1, number of channels C1, 3 downsampling operations}
  • combination scheme 2 is {depth of the convolutional neural network N1, number of channels C1, 4 downsampling operations}, and so on.
  • Multiple combination schemes can be determined first, and then the combination scheme that makes the entire target neural network meet the constraint conditions is selected from the multiple combination schemes as the target hyperparameter.
  • one combination scheme can be selected each time, and then it is determined whether the combination scheme can make the entire target neural network meet the constraint conditions, if not, continue the search, and if yes, stop the search.
  • all combination schemes can be traversed, and the optimal combination scheme that makes the entire target neural network meet the constraint conditions can be selected as the target hyperparameter.
  • the hyperparameters corresponding to a combined scheme are called a set of candidate hyperparameters.
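The enumeration of combination schemes can be sketched with `itertools.product`. The concrete bounds and step sizes below are illustrative placeholders for N1/N2, C1/C2 and the downsampling range mentioned above, not values from the disclosure:

```python
from itertools import product

# Hypothetical search space for three of the hyperparameters
search_space = {
    "depth":        range(4, 9),          # N1=4 .. N2=8, step 1
    "channels":     range(64, 257, 64),   # C1=64 .. C2=256, step 64
    "downsampling": range(3, 9),          # 3 .. 8 downsampling operations
}

# Each combination scheme is one "set of candidate hyperparameters"
names = list(search_space)
candidates = [dict(zip(names, values))
              for values in product(*search_space.values())]
```

Each entry of `candidates` can then be used to build one initial candidate convolutional neural network.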
  • An initial candidate convolutional neural network can be established based on the group of candidate hyperparameters, and the initial candidate neural network including the initial candidate convolutional neural network and the initial classifier is trained to obtain a candidate neural network including a candidate convolutional neural network and a candidate classifier; it is then determined whether the candidate neural network meets the constraints.
  • The candidate neural network may be tested on a test image to determine whether it satisfies the constraint conditions. For example, the time the candidate neural network takes to perform character recognition on the test image can be measured; if this duration is less than the upper limit, the candidate neural network is determined to satisfy the constraint condition.
  • Similarly, if the recognition accuracy of the candidate neural network on the test image is higher than the lower limit of accuracy, the candidate neural network is determined to satisfy the constraint condition.
  • the duration is less than the upper limit of the duration and the recognition accuracy is higher than the lower limit of accuracy, it is determined that the candidate neural network satisfies the constraint condition.
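The search-then-check loop described above can be sketched as follows. Everything here is a hypothetical stand-in: the search-space values, the `train_candidate` placeholder, and the toy `measure` proxies are illustrative only and not part of the disclosure; a real system would train each candidate network and time it on test images.

```python
import itertools

# Hypothetical search space over the hyperparameters discussed above.
SEARCH_SPACE = {
    "depth": [8, 12, 16],
    "channels": [32, 64, 128],
    "downsample_times": [3, 4],
}

def train_candidate(scheme):
    # Placeholder for training an initial candidate neural network built
    # from this combination scheme.
    return scheme

def measure(candidate):
    # Toy proxies: deeper/wider networks are slower but more accurate.
    latency = candidate["depth"] * candidate["channels"] / 100.0
    accuracy = 0.80 + 0.001 * candidate["depth"] * candidate["downsample_times"]
    return latency, accuracy

def search(max_latency=15.0, min_accuracy=0.82):
    # "Select one scheme at a time, stop once the constraints are met"
    # (the second strategy listed above).
    for values in itertools.product(*SEARCH_SPACE.values()):
        scheme = dict(zip(SEARCH_SPACE, values))
        latency, accuracy = measure(train_candidate(scheme))
        if latency < max_latency and accuracy > min_accuracy:
            return scheme
    return None

best = search()
```

The other two strategies map onto the same loop: collect all feasible schemes instead of returning the first, or traverse the whole space and keep the best scheme under some score.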
  • A classifier composed of a fully connected layer may be used to classify the character corresponding to each feature in the feature sequence t.
  • the type of the classifier may also be other types, which is not limited in the present disclosure.
  • Each character corresponds to a category; for example, the Chinese character "you" corresponds to category 1, "good" corresponds to category 2, the number "1" corresponds to category 3, the number "2" corresponds to category 4, the symbol "+" corresponds to category 5, and so on.
  • Since there is no explicit alignment relationship between the character sequence obtained after decoding and the feature sequence before decoding (for example, assuming the dimension of the feature sequence t is 5, comprising the 5 features t0 to t4, and the decoded character sequence is "al", the length of the decoded character sequence is 2, which cannot be aligned with the feature sequence), the features at multiple pixel positions corresponding to the same character will be recognized as the same character; for example, the above feature sequence may be recognized as a character sequence including multiple consecutive repeated characters, such as "aaall" or "aalll".
  • a blank character can be inserted between characters.
  • A blank character is a special character inserted between ordinary characters. Assuming that the symbol "-" represents a blank character, several blank characters can be inserted before "a", after "l", and/or between "a" and "l" of the character sequence "al", for example, to obtain the character sequence "--aaa---ll-".
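The decoding rule implied by this example (merge consecutive repeated labels, then drop blanks) is the standard CTC collapse; a minimal sketch:

```python
def ctc_collapse(seq, blank="-"):
    """Merge consecutive repeated labels, then drop blank labels."""
    out = []
    prev = None
    for ch in seq:
        # Keep a label only when it differs from its predecessor and is
        # not the blank symbol.
        if ch != prev and ch != blank:
            out.append(ch)
        prev = ch
    return "".join(out)

print(ctc_collapse("--aaa---ll-"))  # -> "al"
print(ctc_collapse("aalll"))        # -> "al"
print(ctc_collapse("l-l"))          # -> "ll": the blank preserves the repeat
```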
  • In view of this, the embodiment of the present disclosure first performs a first classification on the characters corresponding to the feature sequence of the target image to determine the non-blank characters among them, and then performs a second classification on the non-blank characters to determine the category of the non-blank characters.
  • The above process first performs a binary classification to distinguish blank characters from non-blank characters, and then classifies the non-blank characters among, for example, 20,000 Chinese character categories, while blank characters need no further classification. Since blank characters do not need to be classified, the efficiency of the classification process is effectively improved.
  • Specifically, the probability of each character corresponding to the feature sequence of the target image being a non-blank character can be determined, and characters with a probability greater than a preset probability threshold are determined as non-blank characters. Since the time spent on the binary classification is relatively short, the above process can effectively improve the efficiency of character classification and save time in the character classification process.
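A minimal sketch of that thresholding step, with illustrative probabilities and a hypothetical threshold of 0.5:

```python
import numpy as np

# Illustrative non-blank probabilities for a length-5 feature sequence
# (not values from the disclosure).
p_non_blank = np.array([0.05, 0.91, 0.88, 0.10, 0.95])
threshold = 0.5  # hypothetical preset probability threshold

# Positions whose probability exceeds the threshold are treated as
# non-blank characters.
non_blank_positions = np.flatnonzero(p_non_blank > threshold)
print(non_blank_positions)
```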
  • In this way, the convolutional neural network can adopt better parameters (for example, a greater depth and a larger number of channels) to ensure its feature extraction ability, thereby further improving the accuracy of character recognition.
  • the above process can be realized by using two sub-classifiers, wherein the first sub-classifier is used for the first classification, and the second sub-classifier is used for the second classification.
  • the initial candidate neural network including the initial candidate convolutional neural network and the initial classifier may be trained in a two-stage training manner.
  • The initial candidate neural network before training includes an initial candidate convolutional neural network, an initial classifier, and a CTC decoder, where the initial classifier includes an initial first sub-classifier and an initial second sub-classifier, and the CTC decoder is a parameter-free decoder that requires no optimization.
  • During training, the network parameters of the initial first sub-classifier can be fixed, and the initial candidate convolutional neural network and the initial second sub-classifier are first trained to obtain an intermediate candidate neural network comprising the candidate convolutional neural network, the initial first sub-classifier, and a candidate second sub-classifier. Then, the network parameters of the candidate convolutional neural network and the candidate second sub-classifier in the intermediate candidate neural network are fixed, the output result of the candidate second sub-classifier is used as supervision information, and the initial first sub-classifier undergoes a second training to obtain the candidate neural network comprising the candidate convolutional neural network, a candidate first sub-classifier, and the candidate second sub-classifier.
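The two-stage schedule can be sketched as control flow alone, with the gradient updates stubbed out; the module names and `frozen` flags below are illustrative stand-ins, not an actual training implementation:

```python
# Placeholder modules with freeze flags; a real implementation would hold
# network parameters (e.g. with requires_grad-style switches).
modules = {
    "conv": {"frozen": False},
    "first_sub": {"frozen": True},   # stage 1: first sub-classifier fixed
    "second_sub": {"frozen": False},
}

def train_step(modules):
    # Stub: report which modules would receive parameter updates.
    return [name for name, m in modules.items() if not m["frozen"]]

stage1_updated = train_step(modules)          # first training

# Second training: freeze what was just trained, unfreeze the first
# sub-classifier, and supervise it with the second sub-classifier's output.
modules["conv"]["frozen"] = True
modules["second_sub"]["frozen"] = True
modules["first_sub"]["frozen"] = False
stage2_updated = train_step(modules)

print(stage1_updated)  # ['conv', 'second_sub']
print(stage2_updated)  # ['first_sub']
```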
  • Each initial candidate neural network can be trained in the above manner to obtain the corresponding candidate neural network, and then the target neural network satisfying the constraint conditions is determined from the candidate neural networks; the candidate convolutional neural network, the candidate first sub-classifier, and the candidate second sub-classifier in the selected candidate neural network serve respectively as the convolutional neural network, the first sub-classifier, and the second sub-classifier of the target neural network.
  • During training, sample images with annotation information can be used. The annotation information can be calibrated manually or in other ways in advance and is used to determine the ground truth of the characters in the sample image, including annotation information indicating blank characters and annotation information indicating the character categories of non-blank characters.
  • After the first training, the second sub-classifier can classify blank characters and the specific categories of non-blank characters. During the second training, the output result of the second sub-classifier (the blank/non-blank probability) is used as supervisory information to train the first sub-classifier.
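One plausible way to derive that supervisory signal, assuming (purely for illustration) that the second sub-classifier outputs a distribution over a vocabulary whose index 0 is the blank class:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Toy logits of the trained second sub-classifier, one row per sequence
# position; column 0 is assumed to be the blank class.
logits = np.array([
    [4.0, 0.5, 0.2],   # confidently blank
    [0.1, 3.0, 0.4],   # confidently a non-blank character
])
probs = softmax(logits)

# Binary supervision target for the first sub-classifier:
# P(non-blank) = 1 - P(blank).
target_non_blank = 1.0 - probs[:, 0]
```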
  • Convolutional neural network: input the target image to be recognized (such as a text image); the design is based on lightweight convolutional neural networks such as MobileNet, and hyperparameters such as the depth, width, and downsampling strategy of the convolutional neural network are searched carefully to obtain the optimal trade-off between speed and accuracy; output the feature sequence of the target image.
  • First sub-classifier: input the feature sequence; output the probability of a blank character at each position (e.g., each character) in the sequence.
  • Second sub-classifier: input the feature sequence and the probability that each position in the sequence is a blank character; the feature vectors at the non-blank positions in the sequence are taken out, and a fully connected layer is used for character classification; output the character classification confidence of the non-blank characters in the sequence.
  • Decoder: input the probability of a blank character at each position in the sequence and the character classification confidence of the non-blank characters in the sequence; the character classification confidence of the feature sequence is restored and decoded using CTC.
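The stages above can be strung together in a toy sketch; the blank probabilities and class predictions are illustrative stand-ins for the two sub-classifiers' outputs, and the restoration step follows the CTC rule of merging identical predictions at adjacent positions:

```python
import numpy as np

# First sub-classifier output: per-position blank probability for a
# length-6 feature sequence (illustrative values).
p_blank = np.array([0.9, 0.1, 0.1, 0.9, 0.2, 0.9])
non_blank = np.flatnonzero(p_blank < 0.5)   # positions 1, 2, 4

# Second sub-classifier output at the non-blank positions (illustrative:
# positions 1 and 2 both predict class 7, position 4 predicts class 3).
pred = [7, 7, 3]

# CTC-style restoration: identical classes at adjacent positions are one
# character; repeats separated by a blank position are kept.
decoded = []
prev_pos = prev_cls = None
for pos, cls in zip(non_blank, pred):
    if not (prev_pos is not None and pos == prev_pos + 1 and cls == prev_cls):
        decoded.append(cls)
    prev_pos, prev_cls = pos, cls

print(decoded)  # -> [7, 3]
```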
  • the embodiment of the present disclosure also provides a neural network training method, the method comprising:
  • Step 601 Using sample images, train each of the multiple initial candidate neural networks respectively to obtain multiple candidate neural networks, where each initial candidate neural network includes:
  • the initial candidate convolutional neural network is used to perform feature extraction on the sample image to obtain a feature sequence of the sample image; the candidate hyperparameters of each initial candidate convolutional neural network are at least partially different;
  • the initial classifier is used to classify the characters corresponding to the feature sequence of the sample image to obtain the category of the character corresponding to the feature sequence of the sample image, where the category is used to perform character recognition on the sample image;
  • Step 602 Screen out a target neural network satisfying the constraints from the plurality of candidate neural networks.
  • In some embodiments, the initial classifier includes an initial first sub-classifier, configured to perform a first classification on the characters corresponding to the feature sequence of the sample image so as to determine the non-blank characters among them, and an initial second sub-classifier, configured to perform a second classification on the non-blank characters to determine their categories. Training each of the multiple initial candidate neural networks using sample images includes: performing a first training on the initial candidate convolutional neural network and the initial second sub-classifier based on a first sample image, to obtain an intermediate candidate neural network comprising the candidate convolutional neural network, the initial first sub-classifier, and a candidate second sub-classifier; fixing the network parameters of the candidate convolutional neural network and the candidate second sub-classifier in the intermediate candidate neural network, using the output result of the candidate second sub-classifier as supervision information, and performing a second training on the initial first sub-classifier based on a second sample image, to obtain the candidate neural network.
  • It should be noted that the writing order of the steps does not imply a strict execution order or constitute any limitation on the implementation process; the specific execution order of each step should be determined by its function and possible internal logic.
  • an embodiment of the present disclosure also provides a neural network, and the neural network includes:
  • a convolutional neural network 701 configured to extract features from the target image to obtain a feature sequence of the target image
  • a classifier 702, configured to classify the characters corresponding to the feature sequence of the target image to obtain the category of the character corresponding to the feature sequence of the target image, where the category is used to perform character recognition on the target image;
  • the convolutional neural network 701 is determined from multiple groups of candidate hyperparameters based on constraints of the neural network.
  • In some embodiments, the classifier 702 includes a first sub-classifier 7021, configured to perform a first classification on the characters corresponding to the feature sequence of the target image so as to determine the non-blank characters among them, and a second sub-classifier 7022, configured to perform a second classification on the non-blank characters to determine the category of the non-blank characters.
  • the neural network in the embodiments of the present disclosure can be trained by using the neural network training method described in any of the above embodiments.
  • the trained neural network can be used to implement the character recognition method described in any of the foregoing embodiments.
  • In some embodiments, the neural network may further include a decoder 703, which is a functional module without parameters and may perform decoding in a CTC decoding manner.
  • an embodiment of the present disclosure also provides a character recognition device, which includes:
  • the feature extraction module 801 is used to extract the features of the target image through the convolutional neural network of the target neural network to obtain the feature sequence of the target image;
  • a classification module 802 configured to classify the character corresponding to the feature sequence of the target image through the classifier of the target neural network, to obtain the category of the character corresponding to the feature sequence of the target image;
  • a recognition module 803, configured to perform character recognition on the target image based on the character category corresponding to the feature sequence of the target image
  • the convolutional neural network is determined from multiple groups of candidate hyperparameters based on constraints of the target neural network.
  • In some embodiments, the classification module includes: a first classification unit, configured to perform a first classification on the characters corresponding to the feature sequence of the target image so as to determine the non-blank characters among them; and a second classification unit, configured to perform a second classification on the non-blank characters to determine the category of the non-blank characters.
  • In some embodiments, the first classification unit includes: a probability determination subunit, configured to determine the probability that a character corresponding to the feature sequence of the target image is a non-blank character; and a classification subunit, configured to determine characters with a probability greater than a preset probability threshold as non-blank characters.
  • In some embodiments, the apparatus further includes: a search module, configured to search multiple groups of candidate hyperparameters from the hyperparameter search space; a network building module, configured to establish an initial candidate convolutional neural network based on each group of candidate hyperparameters; a training module, configured to train the initial candidate neural network comprising the initial candidate convolutional neural network and an initial classifier to obtain a candidate neural network comprising a candidate convolutional neural network and a candidate classifier; and a first determination module, configured to, when the candidate neural network satisfies the constraint conditions, determine the candidate convolutional neural network as the convolutional neural network of the target neural network and determine the candidate classifier as the classifier of the target neural network.
  • In some embodiments, the classifier includes a first sub-classifier, configured to perform a first classification on the characters corresponding to the feature sequence of the target image so as to determine the non-blank characters among them, and a second sub-classifier, configured to perform a second classification on the non-blank characters to determine the category of the non-blank characters;
  • the training module includes: a first training unit, configured to fix the network parameters of the initial first sub-classifier and perform a first training on the initial candidate convolutional neural network and the initial second sub-classifier based on a first sample image, to obtain an intermediate candidate neural network comprising the candidate convolutional neural network, the initial first sub-classifier, and a candidate second sub-classifier; and a second training unit, configured to fix the network parameters of the candidate convolutional neural network and the candidate second sub-classifier in the intermediate candidate neural network, use the output result of the candidate second sub-classifier as supervision information, and perform a second training on the initial first sub-classifier based on a second sample image, to obtain the candidate neural network comprising the candidate convolutional neural network, a candidate first sub-classifier, and the candidate second sub-classifier.
  • In some embodiments, the constraint conditions include an upper limit of the character recognition duration of the target neural network; the apparatus further includes: a first acquisition module, configured to acquire the duration taken by the candidate neural network to perform character recognition on a test image; and a second determination module, configured to determine that the candidate neural network satisfies the constraint conditions when the duration is less than the upper limit of the duration.
  • In some embodiments, the constraint conditions include a lower limit of the character recognition accuracy of the target neural network; the apparatus further includes: a second acquisition module, configured to acquire the recognition accuracy of the candidate neural network on the test image; and a third determination module, configured to determine that the candidate neural network satisfies the constraint conditions when the recognition accuracy is higher than the lower limit of accuracy.
  • In some embodiments, the hyperparameters include at least one of the following: the depth of the convolutional neural network, the number of channels, the size of the convolution kernel, the number of downsampling times, the downsampling method, the positions of the network layers in the convolutional neural network that perform downsampling, and the resolution of the target image input to the convolutional neural network.
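A hypothetical group of candidate hyperparameters covering these fields might look as follows; every value here is illustrative only, not taken from the disclosure:

```python
# One illustrative group of candidate hyperparameters.
candidate_hyperparams = {
    "depth": 12,                      # number of convolutional layers
    "channels": 64,                   # channel width
    "kernel_size": 3,                 # convolution kernel size
    "downsample_times": 3,            # number of downsampling operations
    "downsample_method": "stride",    # e.g. strided conv vs. pooling
    "downsample_layers": [2, 5, 8],   # which layers perform downsampling
    "input_resolution": (32, 320),    # H x W of the input target image
}
print(sorted(candidate_hyperparams))
```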
  • the embodiment of the present disclosure also provides a neural network training device, the device includes:
  • the training module 901 is configured to use sample images to train each of the multiple initial candidate neural networks respectively to obtain multiple candidate neural networks, wherein each initial candidate neural network includes:
  • the initial candidate convolutional neural network is used to perform feature extraction on the sample image to obtain a feature sequence of the sample image; the candidate hyperparameters of each initial candidate convolutional neural network are at least partially different;
  • the initial classifier is used to classify the characters corresponding to the feature sequence of the sample image to obtain the category of the character corresponding to the feature sequence of the sample image, where the category is used to perform character recognition on the sample image;
  • a screening module 902 configured to screen out a target neural network satisfying constraints from the plurality of candidate neural networks.
  • In some embodiments, the initial classifier includes an initial first sub-classifier, configured to perform a first classification on the characters corresponding to the feature sequence of the sample image so as to determine the non-blank characters among them, and an initial second sub-classifier, configured to perform a second classification on the non-blank characters to determine the category of the non-blank characters; the training module includes: a first training unit, configured to perform a first training on the initial candidate convolutional neural network and the initial second sub-classifier based on a first sample image, to obtain an intermediate candidate neural network comprising the candidate convolutional neural network, the initial first sub-classifier, and a candidate second sub-classifier; and a second training unit, configured to fix the network parameters of the candidate convolutional neural network and the candidate second sub-classifier in the intermediate candidate neural network, use the output result of the candidate second sub-classifier as supervision information, and perform a second training on the initial first sub-classifier, to obtain the candidate neural network comprising the candidate convolutional neural network, a candidate first sub-classifier, and the candidate second sub-classifier.
  • The functions or modules included in the apparatus provided by the embodiments of the present disclosure can be used to execute the methods described in the method embodiments above; for specific implementation, reference may be made to the description of the method embodiments above, and for brevity, details are not repeated here.
  • The embodiment of this specification also provides an electronic device, which at least includes a memory, a processor, and a computer program stored in the memory and operable on the processor, wherein the processor, when executing the program, implements the method described in any of the preceding embodiments.
  • FIG. 10 shows a schematic diagram of a more specific hardware structure of an electronic device provided by the embodiment of this specification.
  • the device may include: a processor 1001 , a memory 1002 , an input/output interface 1003 , a communication interface 1004 and a bus 1005 .
  • the processor 1001 , the memory 1002 , the input/output interface 1003 and the communication interface 1004 are connected to each other within the device through the bus 1005 .
  • The processor 1001 can be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits, and is configured to execute relevant programs to implement the character recognition method or the neural network training method provided in the embodiments of this specification.
  • the processor 1001 may also include a graphics card, and the graphics card may be an Nvidia titan X graphics card or a 1080Ti graphics card.
  • the memory 1002 can be implemented in the form of ROM (Read Only Memory, read-only memory), RAM (Random Access Memory, random access memory), static storage device, dynamic storage device, etc.
  • the memory 1002 can store an operating system and other application programs. When implementing the technical solutions provided by the embodiments of this specification through software or firmware, the relevant program codes are stored in the memory 1002 and invoked by the processor 1001 for execution.
  • the input/output interface 1003 is used to connect the input/output module to realize information input and output.
  • the input/output module can be configured in the device as a component (not shown in the figure), or can be externally connected to the device to provide corresponding functions.
  • the input module may include a keyboard, mouse, touch screen, microphone, various sensors, etc.
  • the output module may include a display, a speaker, a vibrator, an indicator light, and the like.
  • the communication interface 1004 is used to connect with a communication module (not shown in the figure), so as to send the information of the own device to the communication module of other devices, or receive the information sent by the communication modules of other devices.
  • the communication module can realize communication through wired methods (such as USB, network cable, etc.), and can also realize communication through wireless methods (such as mobile network, WIFI, Bluetooth, etc.).
  • Bus 1005 includes a path for transferring information between various components of the device (eg, processor 1001, memory 1002, input/output interface 1003, and communication interface 1004).
  • It should be noted that although the above device only shows the processor 1001, the memory 1002, the input/output interface 1003, the communication interface 1004, and the bus 1005, in a specific implementation process the device may also include other components necessary for normal operation. In addition, those skilled in the art can understand that the above-mentioned device may include only the components necessary to implement the solutions of the embodiments of this specification, and does not necessarily include all the components shown in the figure.
  • An embodiment of the present disclosure further provides a computer-readable storage medium, on which a computer program is stored, and when the program is executed by a processor, the method described in any one of the foregoing embodiments is implemented.
  • Computer-readable media include permanent and non-permanent, removable and non-removable media; information storage may be implemented by any method or technology.
  • Information may be computer readable instructions, data structures, modules of a program, or other data.
  • Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Flash memory or other memory technology, Compact Disc Read-Only Memory (CD-ROM), Digital Versatile Disc (DVD) or other optical storage, Magnetic tape cartridge, tape magnetic disk storage or other magnetic storage device or any other non-transmission medium that can be used to store information that can be accessed by a computing device.
  • computer-readable media excludes transitory computer-readable media, such as modulated data signals and carrier waves.
  • A typical implementing device is a computer, which may take the form of a personal computer, laptop computer, cellular phone, camera phone, smartphone, personal digital assistant, media player, navigation device, e-mail device, game console, tablet computer, wearable device, or any combination of these devices.
  • Each embodiment in this specification is described in a progressive manner; the same or similar parts of the embodiments can be referred to each other, and each embodiment focuses on its differences from the other embodiments. In particular, since the apparatus embodiments basically correspond to the method embodiments, their description is relatively simple, and for relevant parts, reference may be made to the description of the method embodiments. The apparatus embodiments described above are only illustrative; the modules described as separate components may or may not be physically separated, and the functions of the modules may be implemented in one or more pieces of software and/or hardware. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this embodiment, which can be understood and implemented by those skilled in the art without creative effort.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Character Discrimination (AREA)

Abstract

Provided in the embodiments of the present disclosure are a character recognition method and apparatus, a neural network training method and apparatus, and a neural network, a storage medium and an electronic device. A character is recognized from a target image by using a convolutional neural network and a classifier. Since there is no need to use a recurrent neural network, the character recognition efficiency is improved; in addition, a convolutional neural network is established by means of selecting a relatively good target hyper-parameter from a plurality of groups of candidate hyper-parameters, so as to ensure a receptive field of the convolutional neural network, such that an extracted feature can include more effective information. Therefore, a target neural network comprising a convolutional neural network and a classifier can generally meet a preset constraint condition, such that the target neural network can obtain a relatively high character recognition accuracy.

Description

Character recognition and neural network training method and device, neural network, storage medium and electronic equipment

Cross-Reference Statement

This application claims the priority of the Chinese Patent Application No. 202110719983.4 filed with the China Patent Office on June 28, 2021, the entire contents of which are incorporated herein by reference.

Technical Field

The present disclosure relates to the technical field of character recognition, and in particular to a character recognition and neural network training method and device, a neural network, a computer-readable storage medium, and electronic equipment.

Background

Character recognition is an important research direction in the field of computer vision and has a wide range of application scenarios. Taking the recognition of horizontally arranged characters as an example, character recognition methods in the related art need to use a recurrent neural network to extract the horizontal relationships between image features, so that the extracted features contain more effective information and the accuracy of character recognition is improved. However, the recurrent neural network takes a long time to process, resulting in low character recognition efficiency.

Summary

The present disclosure provides a character recognition and neural network training method and device, a computer-readable storage medium, and electronic equipment.
根据本公开实施例的第一方面,提供一种字符识别方法,所述方法包括:通过目标神经网络的卷积神经网络对目标图像进行特征提取,得到所述目标图像的特征序列;通过所述目标神经网络的分类器对所述特征序列对应的字符进行分类,得到所述特征序列对应的字符的类别;基于所述字符的类别对所述目标图像进行字符识别;其中,通过选取所述卷积神经网络的超参数,基于所述目标神经网络的约束条件,从多组候选超参数中确定所述卷积神经网络。According to the first aspect of an embodiment of the present disclosure, there is provided a method for character recognition, the method comprising: performing feature extraction on a target image through a convolutional neural network of the target neural network to obtain a feature sequence of the target image; through the The classifier of the target neural network classifies the characters corresponding to the feature sequence to obtain the category of the character corresponding to the feature sequence; character recognition is performed on the target image based on the category of the character; wherein, by selecting the volume hyperparameters of the convolutional neural network, and determine the convolutional neural network from multiple groups of candidate hyperparameters based on the constraints of the target neural network.
在一些实施例中,所述对所述特征序列对应的字符进行分类,得到所述特征序列对应的字符的类别,包括:对所述特征序列对应的字符进行第一分类,以确定所述特征序列对应的字符中的非空白符;对所述非空白符进行第二分类,以确定所述非空白符的类别。In some embodiments, the classifying the character corresponding to the feature sequence to obtain the category of the character corresponding to the feature sequence includes: first classifying the character corresponding to the feature sequence to determine the feature A non-blank character in the characters corresponding to the sequence; performing a second classification on the non-blank character to determine the category of the non-blank character.
In some embodiments, performing the first classification on the characters corresponding to the feature sequence to determine the non-blank characters includes: determining the probability that each character corresponding to the feature sequence is a non-blank character; and determining the characters whose probability is greater than a preset probability threshold as non-blank characters.
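As an illustrative sketch only (not part of the claimed method), the first classification described above can be viewed as thresholding a per-timestep non-blank probability; the threshold value of 0.5 here is a hypothetical example, not one the disclosure specifies:

```python
import numpy as np

def first_classification(non_blank_probs, threshold=0.5):
    """Return the indices of timesteps whose non-blank probability
    exceeds the preset threshold (hypothetical value)."""
    probs = np.asarray(non_blank_probs)
    return np.nonzero(probs > threshold)[0]

# e.g. six timesteps of the feature sequence
probs = [0.1, 0.9, 0.8, 0.2, 0.95, 0.05]
print(first_classification(probs).tolist())  # [1, 2, 4]
```

The timesteps kept by this step would then be passed to the second classification, which assigns them concrete character categories.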
In some embodiments, the method further includes: searching the search space of the hyperparameters for the multiple sets of candidate hyperparameters; for each set of candidate hyperparameters, building an initial candidate convolutional neural network based on the set; training an initial candidate neural network including the initial candidate convolutional neural network and an initial classifier to obtain a candidate neural network, the candidate neural network including a candidate convolutional neural network and a candidate classifier; and, when the candidate neural network satisfies the constraint conditions, determining the candidate convolutional neural network as the convolutional neural network of the target neural network and determining the candidate classifier as the classifier of the target neural network.
In some embodiments, the classifier includes a first sub-classifier and a second sub-classifier. The first sub-classifier is configured to perform a first classification on the characters corresponding to the feature sequence to determine the non-blank characters among them; the second sub-classifier is configured to perform a second classification on the non-blank characters to determine their categories. Training the initial candidate neural network, which includes the initial candidate convolutional neural network and the initial classifier, includes: performing first training on the initial candidate convolutional neural network and an initial second sub-classifier based on first sample images to obtain an intermediate candidate neural network including the candidate convolutional neural network, an initial first sub-classifier and a candidate second sub-classifier; and fixing the network parameters of the candidate convolutional neural network and the candidate second sub-classifier in the intermediate candidate neural network, taking the output of the candidate second sub-classifier as supervision information, and performing second training on the initial first sub-classifier based on second sample images to obtain the candidate neural network including the candidate convolutional neural network, a candidate first sub-classifier and the candidate second sub-classifier.
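The second training stage described above can be sketched numerically. The following toy example is purely illustrative: the "backbone features", the linear stand-ins for the two sub-classifiers, the class count of 5, blank class id 0, and all dimensions are hypothetical choices, and a simple logistic regression replaces the real first sub-classifier. It only shows the control flow of stage two: the backbone and second sub-classifier are frozen, and the first sub-classifier is fitted against blank/non-blank labels derived from the second sub-classifier's output.

```python
import numpy as np

rng = np.random.default_rng(0)

T, D = 64, 8                       # timesteps per image, feature dim (hypothetical)
feats = rng.normal(size=(T, D))    # backbone output, frozen in stage two
W2 = rng.normal(size=(D, 5))       # frozen second sub-classifier (5 classes, id 0 = blank)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Supervision for stage two: a timestep is "non-blank" when the frozen
# second sub-classifier does not predict the blank class.
teacher = (softmax(feats @ W2).argmax(axis=-1) != 0).astype(float)

# Stage two: train only the first sub-classifier (here a logistic
# regression with a bias column); feats and W2 are never updated.
X = np.concatenate([feats, np.ones((T, 1))], axis=1)
w1 = np.zeros(D + 1)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w1)))
    grad = X.T @ (p - teacher) / T     # cross-entropy gradient
    w1 -= 0.5 * grad

pred = (1.0 / (1.0 + np.exp(-(X @ w1))) > 0.5).astype(float)
print("agreement with teacher labels:", (pred == teacher).mean())
```

In the actual scheme, this second stage would follow a first stage that jointly trains the convolutional backbone and the second sub-classifier on the first sample images.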
In some embodiments, the constraint conditions include an upper limit on the duration of character recognition performed by the target neural network, and the method further includes: acquiring the duration taken by the candidate neural network to perform character recognition on test images; and determining that the candidate neural network satisfies the constraint conditions when the duration is less than the upper limit.
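A minimal sketch of such a latency check, assuming a candidate network exposed as a `recognize` callable (the helper names and the averaging over test images are illustrative choices, not part of the disclosure):

```python
import time

def recognition_latency(recognize, test_images):
    """Average wall-clock seconds per image for a candidate network's
    `recognize` callable (hypothetical helper)."""
    start = time.perf_counter()
    for img in test_images:
        recognize(img)
    return (time.perf_counter() - start) / len(test_images)

def meets_latency_constraint(recognize, test_images, max_seconds):
    # The constraint is satisfied when the measured duration is below
    # the preset upper limit.
    return recognition_latency(recognize, test_images) < max_seconds

# Stand-in "network": a trivial function instead of a real model.
fake_recognize = lambda img: "ok"
print(meets_latency_constraint(fake_recognize, [None] * 100, 0.01))
```

An accuracy constraint (next embodiment) would be checked the same way, comparing measured recognition accuracy on test images against the preset lower limit.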
In some embodiments, the constraint conditions include a lower limit on the accuracy of character recognition performed by the target neural network, and the method further includes: acquiring the recognition accuracy of the candidate neural network on test images; and determining that the candidate neural network satisfies the constraint conditions when the recognition accuracy is higher than the lower limit.
In some embodiments, the hyperparameters include at least one of the following: the depth of the convolutional neural network, the number of channels, the size of the convolution kernels, the number of downsampling operations, the downsampling method, the positions of the network layers in the convolutional neural network that perform downsampling, and the resolution of the target image input to the convolutional neural network.
According to a second aspect of the embodiments of the present disclosure, a neural network training method is provided. The method includes: training each of multiple initial candidate neural networks using sample images to obtain multiple candidate neural networks, where each initial candidate neural network includes: an initial candidate convolutional neural network configured to perform feature extraction on the sample images to obtain feature sequences of the sample images, the candidate hyperparameters of the initial candidate convolutional neural networks being at least partially different from one another; and an initial classifier configured to classify the characters corresponding to the feature sequences to obtain the categories of the characters, the categories being used to perform character recognition on the sample images; and screening out, from the multiple candidate neural networks, a target neural network that satisfies constraint conditions.
In some embodiments, the initial classifier includes a first initial sub-classifier and a second initial sub-classifier. The first initial sub-classifier is configured to perform a first classification on the characters corresponding to the feature sequence of the sample image to determine the non-blank characters among them; the second initial sub-classifier is configured to perform a second classification on the non-blank characters to determine their categories. Training each of the multiple initial candidate neural networks using the sample images includes: performing first training on the initial candidate convolutional neural network and the initial second sub-classifier based on first sample images to obtain an intermediate candidate neural network including a candidate convolutional neural network, an initial first sub-classifier and a candidate second sub-classifier; and fixing the network parameters of the candidate convolutional neural network and the candidate second sub-classifier in the intermediate candidate neural network, taking the output of the candidate second sub-classifier as supervision information, and performing second training on the initial first sub-classifier based on second sample images to obtain the candidate neural network including the candidate convolutional neural network, a candidate first sub-classifier and the candidate second sub-classifier.
According to a third aspect of the embodiments of the present disclosure, a neural network is provided. The neural network includes: a convolutional neural network configured to perform feature extraction on a target image to obtain a feature sequence of the target image; and a classifier configured to classify the characters corresponding to the feature sequence to obtain the categories of the characters, the categories being used to perform character recognition on the target image. The convolutional neural network is determined by selecting its hyperparameters from multiple sets of candidate hyperparameters based on constraint conditions of the neural network.
According to a fourth aspect of the embodiments of the present disclosure, a character recognition apparatus is provided. The apparatus includes: a feature extraction module configured to perform feature extraction on a target image through a convolutional neural network of a target neural network to obtain a feature sequence of the target image; a classification module configured to classify, through a classifier of the target neural network, the characters corresponding to the feature sequence to obtain the categories of the characters; and a recognition module configured to perform character recognition on the target image based on the categories of the characters. The convolutional neural network is determined by selecting its hyperparameters from multiple sets of candidate hyperparameters based on constraint conditions of the target neural network.
In some embodiments, the classification module includes: a first classification unit configured to perform a first classification on the characters corresponding to the feature sequence to determine the non-blank characters among them; and a second classification unit configured to perform a second classification on the non-blank characters to determine the categories of the non-blank characters.
In some embodiments, the first classification unit includes: a probability determination sub-unit configured to determine the probability that each character corresponding to the feature sequence is a non-blank character; and a classification sub-unit configured to determine the characters whose probability is greater than a preset probability threshold as non-blank characters.
In some embodiments, the apparatus further includes: a search module configured to search the search space of the hyperparameters for the multiple sets of candidate hyperparameters; a network building module configured to, for each set of candidate hyperparameters, build an initial candidate convolutional neural network based on the set; a training module configured to train an initial candidate neural network including the initial candidate convolutional neural network and an initial classifier to obtain a candidate neural network, the candidate neural network including a candidate convolutional neural network and a candidate classifier; and a first determination module configured to, when the candidate neural network satisfies the constraint conditions, determine the candidate convolutional neural network as the convolutional neural network of the target neural network and determine the candidate classifier as the classifier of the target neural network.
In some embodiments, the classifier includes a first sub-classifier and a second sub-classifier. The first sub-classifier is configured to perform a first classification on the characters corresponding to the feature sequence to determine the non-blank characters among them; the second sub-classifier is configured to perform a second classification on the non-blank characters to determine their categories. The training module includes: a first training unit configured to perform first training on the initial candidate convolutional neural network and an initial second sub-classifier based on first sample images to obtain an intermediate candidate neural network including the candidate convolutional neural network, an initial first sub-classifier and a candidate second sub-classifier; and a second training unit configured to fix the network parameters of the candidate convolutional neural network and the candidate second sub-classifier in the intermediate candidate neural network, take the output of the candidate second sub-classifier as supervision information, and perform second training on the initial first sub-classifier based on second sample images to obtain the candidate neural network including the candidate convolutional neural network, a candidate first sub-classifier and the candidate second sub-classifier.
In some embodiments, the constraint conditions include an upper limit on the duration of character recognition performed by the target neural network, and the apparatus further includes: a first acquisition module configured to acquire the duration taken by the candidate neural network to perform character recognition on test images; and a second determination module configured to determine that the candidate neural network satisfies the constraint conditions when the duration is less than the upper limit.
In some embodiments, the constraint conditions include a lower limit on the accuracy of character recognition performed by the target neural network, and the apparatus further includes: a second acquisition module configured to acquire the recognition accuracy of the candidate neural network on test images; and a third determination module configured to determine that the candidate neural network satisfies the constraint conditions when the recognition accuracy is higher than the lower limit.
In some embodiments, the hyperparameters include at least one of the following: the depth of the convolutional neural network, the number of channels, the size of the convolution kernels, the number of downsampling operations, the downsampling method, the positions of the network layers in the convolutional neural network that perform downsampling, and the resolution of the target image input to the convolutional neural network.
According to a fifth aspect of the embodiments of the present disclosure, a neural network training apparatus is provided. The apparatus includes: a training module configured to train each of multiple initial candidate neural networks using sample images to obtain multiple candidate neural networks, where each initial candidate neural network includes: an initial candidate convolutional neural network configured to perform feature extraction on the sample images to obtain feature sequences of the sample images, the candidate hyperparameters of the initial candidate convolutional neural networks being at least partially different from one another; and an initial classifier configured to classify the characters corresponding to the feature sequences to obtain the categories of the characters, the categories being used to perform character recognition on the sample images; and a screening module configured to screen out, from the multiple candidate neural networks, a target neural network that satisfies constraint conditions.
In some embodiments, the initial classifier includes a first initial sub-classifier and a second initial sub-classifier. The first initial sub-classifier is configured to perform a first classification on the characters corresponding to the feature sequence of the sample image to determine the non-blank characters among them; the second initial sub-classifier is configured to perform a second classification on the non-blank characters to determine their categories. The training module includes: a first training unit configured to perform first training on the initial candidate convolutional neural network and the initial second sub-classifier based on first sample images to obtain an intermediate candidate neural network including a candidate convolutional neural network, an initial first sub-classifier and a candidate second sub-classifier; and a second training unit configured to fix the network parameters of the candidate convolutional neural network and the candidate second sub-classifier in the intermediate candidate neural network, take the output of the candidate second sub-classifier as supervision information, and perform second training on the initial first sub-classifier based on second sample images to obtain the candidate neural network including the candidate convolutional neural network, a candidate first sub-classifier and the candidate second sub-classifier.
According to a sixth aspect of the embodiments of the present disclosure, a computer-readable storage medium is provided, on which a computer program is stored. When the program is executed by a processor, the method described in any of the embodiments is implemented.
According to a seventh aspect of the embodiments of the present disclosure, an electronic device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the program, the method described in any of the embodiments is implemented.
According to an eighth aspect of the embodiments of the present disclosure, a computer program is provided. The computer program includes computer-readable code, and when the computer-readable code is executed by a processor, the method described in any of the embodiments is implemented.
The embodiments of the present disclosure use a convolutional neural network and a classifier to recognize characters in a target image. On one hand, since no recurrent neural network is needed, character recognition efficiency is improved. On the other hand, better target hyperparameters are selected from multiple sets of candidate hyperparameters to build the convolutional neural network, which guarantees the receptive field of the convolutional neural network so that the extracted features contain more effective information. The target neural network including the convolutional neural network and the classifier can therefore satisfy preset constraint conditions as a whole, enabling the target neural network to achieve high character recognition accuracy.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit the present disclosure.
BRIEF DESCRIPTION OF THE DRAWINGS
The accompanying drawings here are incorporated into and constitute a part of this specification. They illustrate embodiments consistent with the present disclosure and, together with the specification, serve to explain the technical solutions of the present disclosure.
Fig. 1 is a schematic diagram of a neural network for character recognition in the related art.
Fig. 2 is a flowchart of a character recognition method according to an embodiment of the present disclosure.
Fig. 3 is a schematic diagram of the overall flow of character recognition in the related art.
Fig. 4 is a schematic diagram of the classification scheme of a classifier according to an embodiment of the present disclosure.
Fig. 5 is a schematic diagram of a two-stage training scheme according to an embodiment of the present disclosure.
Fig. 6 is a flowchart of a neural network training method according to an embodiment of the present disclosure.
Fig. 7 is a schematic diagram of a neural network according to an embodiment of the present disclosure.
Fig. 8 is a block diagram of a character recognition apparatus according to an embodiment of the present disclosure.
Fig. 9 is a block diagram of a neural network training apparatus according to an embodiment of the present disclosure.
Fig. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
DETAILED DESCRIPTION
Exemplary embodiments will be described in detail here, examples of which are illustrated in the accompanying drawings. Where the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure; rather, they are merely examples of apparatuses and methods consistent with some aspects of the present disclosure as detailed in the appended claims.
The terminology used in the present disclosure is for the purpose of describing particular embodiments only and is not intended to limit the present disclosure. As used in the present disclosure and the appended claims, the singular forms "a", "said" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items. In addition, the term "at least one" herein means any one of multiple items or any combination of at least two of multiple items.
It should be understood that although the terms first, second, third, etc. may be used in the present disclosure to describe various pieces of information, such information should not be limited by these terms. These terms are only used to distinguish pieces of information of the same type from one another. For example, without departing from the scope of the present disclosure, first information may also be called second information, and similarly, second information may also be called first information. Depending on the context, the word "if" as used herein may be interpreted as "upon" or "when" or "in response to determining".
In order to enable those skilled in the art to better understand the technical solutions in the embodiments of the present disclosure, and to make the above objectives, features and advantages of the embodiments clearer and easier to understand, the technical solutions in the embodiments of the present disclosure are described in further detail below with reference to the accompanying drawings.
Character recognition generally refers to recognizing characters in an image. The characters may include text (for example, digits, Chinese characters, English letters, etc.) and symbols (for example, arithmetic operators, logical operators, etc.). In the related art, character recognition is generally performed through a neural network. A neural network for character recognition, as shown in Fig. 1, may include a convolutional neural network 101, a recurrent neural network 102, a classifier 103 and a decoder 104. The convolutional neural network 101 extracts features from the image to obtain feature sequences, the recurrent neural network 102 encodes the feature sequences, the classifier 103 classifies the encoded feature sequences, and the decoder 104 decodes the classification results of the classifier 103 to recognize the characters in the image. However, the processing of the recurrent neural network is time-consuming, resulting in low character recognition efficiency.
On this basis, an embodiment of the present disclosure provides a character recognition method. As shown in Fig. 2, the method includes:
Step 201: performing feature extraction on a target image through a convolutional neural network of a target neural network to obtain a feature sequence of the target image;
Step 202: classifying, through a classifier of the target neural network, the characters corresponding to the feature sequence of the target image to obtain the categories of the characters corresponding to the feature sequence of the target image;
Step 203: performing character recognition on the target image based on the categories of the characters corresponding to the feature sequence of the target image;
where the convolutional neural network is determined by selecting its hyperparameters from multiple sets of candidate hyperparameters based on constraint conditions of the target neural network.
In step 201, the target image may be an image that includes characters, for example, an image of an object such as a billboard or a certificate captured from a real scene, an image generated by screen recording, an image generated by format conversion, or an image generated in other ways. The characters in the target image may be in various fonts, such as handwritten or printed, which is not limited in the present disclosure. The characters may include at least one of digits, Chinese characters, kana, symbols, and the like. The target image may include one or more characters, and multiple characters may be arranged on the target image regularly or irregularly, for example, along the horizontal direction.
Feature extraction may be performed on the target image to obtain a feature map F1. Assuming the size of feature map F1 is 32×512×128, i.e., 32 rows, 512 columns and 128 channels, F1 may first be downsampled; for example, the downsampled feature map F2 has a size of 4×64×128. For characters written horizontally (i.e., arranged on the target image along the horizontal direction), the relationships among horizontal features are generally of more interest. Therefore, the downsampled feature map F2 may be pooled in the vertical direction (for example, by max pooling or average pooling) to obtain a 1×64×128 feature map F3. The features of all channels at each horizontal pixel position in feature map F3 are taken as one feature sequence, yielding 64 feature sequences t of 128 dimensions each.
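The F1 → F2 → F3 → sequences pipeline above can be sketched with array operations. Note that this is only a shape-level illustration: strided slicing stands in for the network's learned downsampling layers, and average pooling is used for the vertical pooling (max pooling would work the same way):

```python
import numpy as np

H, W, C = 32, 512, 128
f1 = np.random.rand(H, W, C)            # feature map F1: 32 x 512 x 128

# Downsample to 4 x 64 x 128 (plain strided slicing here, as a stand-in
# for the network's actual downsampling layers).
f2 = f1[::8, ::8, :]                    # F2: 4 x 64 x 128

# Pool over the vertical axis to get 1 x 64 x 128.
f3 = f2.mean(axis=0, keepdims=True)     # F3: 1 x 64 x 128

# One 128-dimensional feature sequence per horizontal position.
sequences = f3[0]
print(sequences.shape)                  # (64, 128)
```

Each of the 64 rows of `sequences` corresponds to one horizontal position and is what the classifier receives as a feature sequence t.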
In some embodiments, target hyperparameters may be determined for the convolutional neural network based on the constraint conditions of the entire target neural network including the convolutional neural network and the classifier, so that the convolutional neural network can take over the role of the recurrent neural network. The recurrent neural network can then be removed from the target neural network, while the performance of the target neural network remains on par with its performance before the removal.
The hyperparameters of the convolutional neural network are related to its receptive field, so that by selecting them appropriately, the features extracted by the convolutional neural network can contain sufficient effective information. The hyperparameters may include at least one of the following: the depth of the convolutional neural network, the number of channels, the size of the convolution kernels, the number of downsampling operations, the downsampling method, the positions of the network layers in the convolutional neural network that perform downsampling, and the resolution of the target image input to the convolutional neural network. The depth of the convolutional neural network is its number of layers: the more layers, the stronger its feature extraction capability and the more effective information the features extracted in one pass can contain, but the longer feature extraction takes. The number of channels is also called the width of the convolutional neural network: the more channels, the larger the receptive field and the more effective information the extracted features can contain. Likewise, the larger the convolution kernels, the larger the receptive field. The downsampling method may be parametric (for example, downsampling by convolution) or non-parametric (for example, downsampling by pooling). The positions of the downsampling layers, i.e., which layers of the convolutional neural network perform downsampling, determine the timing of downsampling: earlier downsampling reduces the amount of computation during character recognition but may make feature extraction insufficient, while later downsampling extracts features more fully but increases the amount of computation. In addition, the higher the resolution of the target image input to the convolutional neural network, the more effective information is obtained in one feature extraction pass. The target hyperparameters of the convolutional neural network therefore need to be selected appropriately, so as to both guarantee the receptive field of the convolutional neural network (so that the extracted features contain sufficient effective information) and enable the entire target neural network to satisfy certain constraint conditions. The constraint conditions may include at least one of the following: an upper limit on the duration of character recognition performed by the target neural network; and a lower limit on the accuracy of character recognition performed by the target neural network.
The target hyperparameters may be selected from multiple groups of candidate hyperparameters, for example, by means of a grid search. First, a search space and a search step size are defined for each of the above hyperparameters; for example, the search space for the depth of the convolutional neural network is defined as N1 to N2, the search space for the number of channels is defined as C1 to C2, and the search space for the number of downsampling operations is defined as 3 to 8. Then, combination schemes of these hyperparameters are determined; for example, combination scheme 1 is {depth N1, C1 channels, 3 downsampling operations}, combination scheme 2 is {depth N1, C1 channels, 4 downsampling operations}, and so on. Multiple combination schemes may be determined first, and a combination scheme that makes the entire target neural network satisfy the constraints may then be selected from among them as the target hyperparameters. Alternatively, one combination scheme may be selected at a time and checked: if it does not make the entire target neural network satisfy the constraints, the search continues; if it does, the search stops. Alternatively, all combination schemes may be traversed, and the optimal combination scheme that makes the entire target neural network satisfy the constraints may be selected as the target hyperparameters.
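For illustration only, the second search strategy above (check one combination scheme at a time and stop at the first that satisfies the constraints) may be sketched as follows. The concrete depth and channel ranges (standing in for the placeholders N1..N2 and C1..C2), the constraint thresholds, and the `evaluate` stub are assumptions made to keep the sketch runnable; a real system would train and test each candidate network.

```python
from itertools import product

# Illustrative search spaces; N1/N2 and C1/C2 in the text are placeholders,
# so the concrete numbers below are assumptions for demonstration only.
DEPTHS = [8, 12, 16]        # hypothetical N1..N2
CHANNELS = [32, 64, 128]    # hypothetical C1..C2
DOWNSAMPLES = range(3, 9)   # 3 to 8 downsampling operations, as in the text

MAX_LATENCY_MS = 50.0       # assumed upper limit on recognition time
MIN_ACCURACY = 0.95         # assumed lower limit on recognition accuracy

def evaluate(depth, channels, n_down):
    """Stub standing in for building, training, and testing a candidate
    network; a real system would measure latency and accuracy on test
    images. The toy formulas below merely make the sketch runnable."""
    latency_ms = 0.05 * depth * channels / n_down
    accuracy = min(0.99, 0.90 + 0.004 * depth)
    return latency_ms, accuracy

def grid_search():
    """Try one combination scheme at a time and stop at the first scheme
    that satisfies both constraints."""
    for depth, ch, nd in product(DEPTHS, CHANNELS, DOWNSAMPLES):
        latency_ms, acc = evaluate(depth, ch, nd)
        if latency_ms < MAX_LATENCY_MS and acc > MIN_ACCURACY:
            return {"depth": depth, "channels": ch, "downsamples": nd}
    return None  # no combination scheme satisfies the constraints
```

The exhaustive strategy differs only in that the loop records every satisfying scheme and returns the best one instead of stopping early.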
The hyperparameters corresponding to one combination scheme are called a group of candidate hyperparameters. An initial candidate convolutional neural network may be built based on a group of candidate hyperparameters, and an initial candidate neural network including the initial candidate convolutional neural network and an initial classifier may be trained to obtain a candidate neural network including a candidate convolutional neural network and the classifier; whether the candidate neural network satisfies the constraints is then determined. The candidate neural network may be tested on test images for this purpose. For example, the candidate neural network is determined to satisfy the constraints when its recognition time is less than the upper time limit; or when its recognition accuracy is higher than the lower accuracy limit; or when its recognition time is less than the upper time limit and its recognition accuracy is higher than the lower accuracy limit.

In step 202, a classifier composed of a fully connected layer may be used to classify the characters corresponding to each feature sequence t. Of course, other types of classifiers may also be used; the present disclosure does not limit this. Each character corresponds to a category; for example, the Chinese character "你" corresponds to category 1, "好" corresponds to category 2, the digit "1" corresponds to category 3, the digit "2" corresponds to category 4, the symbol "+" corresponds to category 5, and so on.
In the related art, there is no explicit alignment between the character sequence obtained by decoding and the feature sequence before decoding. For example, suppose the feature sequence t has dimension 5, comprising the five features t0 to t4, while the decoded character sequence is "al"; the decoded sequence has length 2 and cannot be aligned with the feature sequence. To solve this problem, features at multiple pixel positions corresponding to the same character are recognized as the same character; for example, the above feature sequence is recognized as a character sequence containing multiple consecutive repeated characters, such as "aaall" or "aalll". However, this approach cannot recognize text that genuinely contains two or more repeated characters, for example the two letters "l" in the English word "hello", or the two occurrences of "沾" in the Chinese idiom "沾沾自喜". To handle such cases, blank symbols may be inserted between characters; a blank symbol is a special character used for insertion between characters. Assuming the symbol "-" denotes a blank, several blanks may be inserted before the "a", after the "l", and between the "a" and the "l" of the character sequence "al", yielding, for example, the character sequence "--aaa---ll-". Then, CTC (Connectionist Temporal Classification) decoding merges the repeated characters in the character sequence to obtain the merged character sequence "-a-l-", and removes the blanks from the merged characters to obtain the character sequence "al". This process is shown in Figure 3.
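The merge-then-remove-blanks step of CTC decoding described above may be sketched as follows; the function name and the choice of "-" as the blank symbol follow the example in the text.

```python
BLANK = "-"  # the blank symbol, written as "-" as in the example above

def ctc_collapse(sequence: str) -> str:
    """CTC decoding step illustrated in Figure 3: first merge runs of
    identical symbols ("--aaa---ll-" -> "-a-l-"), then remove the
    blanks ("-a-l-" -> "al")."""
    merged = []
    previous = None
    for symbol in sequence:
        if symbol != previous:  # keep only the first symbol of each run
            merged.append(symbol)
        previous = symbol
    return "".join(s for s in merged if s != BLANK)
```

Because a blank separates the two "l" runs in a sequence such as "hh-e-l-l-oo", genuinely repeated characters survive the merge, which is exactly the problem the blank symbol solves.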
In practical applications, most characters in a character sequence are blanks, and only a few are characters from the target image. However, CTC decoding does not distinguish blanks from non-blanks; all characters in the sequence are classified in the same way. Taking Chinese character recognition as an example, there are about 20,000 Chinese characters, and classifying each character requires determining its category among these 20,000 characters. The efficiency of such a character classification process is therefore low.

To solve this problem, as shown in Figure 4, an embodiment of the present disclosure first performs a first classification on the characters corresponding to the feature sequence of the target image, to determine the non-blank characters among them, and then performs a second classification on the non-blank characters, to determine their categories. That is, a binary classification is performed first to distinguish blanks from non-blanks, and only the non-blanks are then classified among the 20,000 Chinese characters; the blanks require no further classification. Since blanks need not be classified, the efficiency of the classification process is effectively improved.

When performing the binary classification into blanks and non-blanks, the probability that each character corresponding to the feature sequence of the target image is a non-blank may be determined, and characters whose probability is greater than a preset probability threshold are determined to be non-blanks. Since the binary classification takes little time, this process effectively improves the efficiency of character classification and reduces the time consumed by the classification process.
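The thresholding step above may be sketched as follows; the default threshold value of 0.5 is an assumption, since the disclosure only requires a preset probability threshold.

```python
def select_non_blank_positions(non_blank_probs, threshold=0.5):
    """Return the indices of positions whose probability of being a
    non-blank character exceeds the preset threshold; only these
    positions are forwarded to the expensive full-vocabulary (second)
    classification, while the remaining positions are treated as blanks."""
    return [i for i, p in enumerate(non_blank_probs) if p > threshold]
```

In a sequence dominated by blanks, the returned index list is short, which is the source of the efficiency gain described above.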
Further, when the upper limit on the character recognition time of the target neural network is used as a constraint, the time saved in the classification process can be reallocated to the convolutional neural network, allowing it to adopt better parameters (for example, a greater depth or more channels) to guarantee its feature extraction capability, thereby further improving the accuracy of character recognition.

The above process may be implemented with two sub-classifiers, where a first sub-classifier performs the first classification and a second sub-classifier performs the second classification. In this case, the initial candidate neural network, which includes the initial candidate convolutional neural network and the initial classifier, may be trained in two stages. Specifically, as shown in Figure 5, the initial candidate neural network before training includes the initial candidate convolutional neural network, the initial classifier, and a CTC decoder, where the initial classifier includes an initial first sub-classifier and an initial second sub-classifier, and the CTC decoder is parameter-free and requires no optimization. First, the network parameters of the initial first sub-classifier are fixed, and a first training is performed on the initial candidate neural network, that is, on the initial candidate convolutional neural network and the initial second sub-classifier, to obtain an intermediate candidate neural network including the candidate convolutional neural network, the initial first sub-classifier, and a candidate second sub-classifier. Then, the network parameters of the candidate convolutional neural network and the candidate second sub-classifier in the intermediate candidate neural network are fixed, the output of the candidate second sub-classifier is used as supervision information, and a second training is performed on the initial first sub-classifier to obtain the candidate neural network including the candidate convolutional neural network, a candidate first sub-classifier, and the candidate second sub-classifier.
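One way to read the second training stage is that the frozen second sub-classifier's per-position output is converted into a binary blank/non-blank target for the first sub-classifier. A minimal sketch of constructing that supervision signal follows; treating class index 0 as the blank class and deriving a hard label via argmax are both assumptions, since the disclosure only states that the second sub-classifier's output serves as supervision information.

```python
def blank_supervision(char_outputs, blank_index=0):
    """Turn the frozen second sub-classifier's per-position score vectors
    into binary blank/non-blank targets for training the first
    sub-classifier: a position is labeled non-blank (1) when its
    highest-scoring class is not the blank class, otherwise blank (0)."""
    targets = []
    for scores in char_outputs:  # one score vector per sequence position
        best = max(range(len(scores)), key=lambda k: scores[k])
        targets.append(0 if best == blank_index else 1)
    return targets
```

The first sub-classifier can then be trained against these targets with an ordinary binary classification loss while the convolutional network and the second sub-classifier stay frozen.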
Each initial candidate neural network may be trained in the above manner to obtain a corresponding candidate neural network, and a target neural network satisfying the constraints is then determined from among the candidate neural networks. The candidate convolutional neural network, candidate first sub-classifier, and candidate second sub-classifier of the selected candidate neural network become, respectively, the convolutional neural network, first sub-classifier, and second sub-classifier of the target neural network.

The first training may use sample images with annotation information. The annotation information may be calibrated in advance, manually or otherwise, and is used to determine the ground truth of the characters in the sample images; it includes annotations indicating blanks as well as annotations indicating the character categories of non-blanks. In this training stage, the first sub-classifier used for the binary classification need not be trained. After the first training, the second sub-classifier can classify both blanks and the specific characters of non-blanks. In the second training, the output of the second sub-classifier (the blank/non-blank probability) is used as supervision information to train the first sub-classifier.
The inputs and outputs of the overall character recognition process are as follows.

(1) Feature extraction

Input: the target image to be recognized (e.g., a text image);

Output: a feature sequence.

To improve the inference speed of the model, the design is based on a lightweight convolutional neural network such as MobileNet, and a fine-grained search is performed over hyperparameters such as the depth, width, and downsampling strategy of the convolutional neural network, to obtain a neural network that is optimal in both speed and accuracy.

(2) Blank prediction

Input: the feature sequence;

Output: the probability that each position (e.g., each character) in the sequence is a blank.

A linear classifier with two categories predicts whether each position in the feature sequence is a blank.

(3) Non-blank character classification

Input: the feature sequence, and the probability that each position in the sequence is a blank;

Output: character classification confidences for the non-blank positions in the sequence.

Based on the probability that each position in the sequence is a blank, the feature vectors at the non-blank positions are extracted, and character classification is performed on them using a fully connected layer.

(4) CTC decoding

Input: the probability that each position in the sequence is a blank, and the character classification confidences of the non-blank positions;

Output: the predicted character string.

Based on the probability that each position in the sequence is a blank and the character classification confidences of the non-blank positions, the character classification confidences of the full feature sequence are restored, and CTC decoding is applied.
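The confidence-restoration step in (4) may be sketched as follows: the blank probability fills the blank column at every position, and the second sub-classifier's confidences, scaled by the non-blank probability, fill the remaining columns at the selected positions. This scattering scheme is one plausible implementation assumed for illustration, not an implementation mandated by the disclosure.

```python
def restore_confidences(blank_probs, non_blank_positions, non_blank_conf, num_classes):
    """Rebuild a full per-position confidence matrix of shape
    (sequence length) x (num_classes + 1), where column 0 is the blank
    class. `non_blank_conf` holds one confidence vector per selected
    position, in order; other positions keep zero non-blank confidence."""
    conf_iter = iter(non_blank_conf)
    matrix = []
    for position, p_blank in enumerate(blank_probs):
        row = [p_blank] + [0.0] * num_classes
        if position in non_blank_positions:
            # scale class confidences by the probability of not being blank
            row[1:] = [(1.0 - p_blank) * c for c in next(conf_iter)]
        matrix.append(row)
    return matrix
```

The restored matrix can then be fed to a standard CTC decoder exactly as if a single classifier had scored every position over the full vocabulary plus blank.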
As shown in Figure 6, an embodiment of the present disclosure further provides a neural network training method, the method including:

Step 601: training each of a plurality of initial candidate neural networks with sample images to obtain a plurality of candidate neural networks, where each initial candidate neural network includes:

an initial candidate convolutional neural network, configured to perform feature extraction on the sample images to obtain feature sequences of the sample images, where the candidate hyperparameters of the initial candidate convolutional neural networks differ at least in part; and

an initial classifier, configured to classify the characters corresponding to the feature sequences of the sample images to obtain the categories of those characters, where the categories are used for performing character recognition on the sample images;

Step 602: selecting, from the plurality of candidate neural networks, a target neural network that satisfies the constraints.

In some embodiments, the initial classifier includes a first initial sub-classifier, configured to perform a first classification on the characters corresponding to the feature sequences of the sample images to determine the non-blank characters among them, and a second initial sub-classifier, configured to perform a second classification on the non-blank characters to determine their categories. Training each of the plurality of initial candidate neural networks with sample images includes: performing a first training on the initial candidate convolutional neural network and the initial second sub-classifier based on first sample images, to obtain an intermediate candidate neural network including a candidate convolutional neural network, the initial first sub-classifier, and a candidate second sub-classifier; and fixing the network parameters of the candidate convolutional neural network and the candidate second sub-classifier in the intermediate candidate neural network, using the output of the candidate second sub-classifier as supervision information, and performing a second training on the initial first sub-classifier based on second sample images, to obtain the candidate neural network including the candidate convolutional neural network, a candidate first sub-classifier, and the candidate second sub-classifier.

Those skilled in the art will understand that, in the above methods of the specific implementations, the order in which the steps are written does not imply a strict execution order or impose any limitation on the implementation process; the specific execution order of the steps should be determined by their functions and possible internal logic.
As shown in Figure 7, an embodiment of the present disclosure further provides a neural network, the neural network including:

a convolutional neural network 701, configured to perform feature extraction on a target image to obtain a feature sequence of the target image; and

a classifier 702, configured to classify the characters corresponding to the feature sequence of the target image to obtain the categories of those characters, where the categories are used for performing character recognition on the target image;

where the convolutional neural network 701 is determined by selecting its hyperparameters from multiple groups of candidate hyperparameters based on the constraints of the neural network.

In some embodiments, the classifier 702 includes a first sub-classifier 7021, configured to perform a first classification on the characters corresponding to the feature sequence of the target image to determine the non-blank characters among them, and a second sub-classifier 7022, configured to perform a second classification on the non-blank characters to determine their categories.

The neural network of the embodiments of the present disclosure may be obtained by training with the neural network training method described in any of the above embodiments. The trained neural network may be used to perform the character recognition method described in any of the foregoing embodiments.

In some embodiments, the neural network may further include a decoder 703, which is a parameter-free functional module and may decode in a CTC decoding manner.
As shown in Figure 8, an embodiment of the present disclosure further provides a character recognition apparatus, the apparatus including:

a feature extraction module 801, configured to perform feature extraction on a target image through the convolutional neural network of a target neural network, to obtain a feature sequence of the target image;

a classification module 802, configured to classify the characters corresponding to the feature sequence of the target image through the classifier of the target neural network, to obtain the categories of those characters; and

a recognition module 803, configured to perform character recognition on the target image based on the categories of the characters corresponding to the feature sequence of the target image;

where the convolutional neural network is determined by selecting its hyperparameters from multiple groups of candidate hyperparameters based on the constraints of the target neural network.

In some embodiments, the classification module includes: a first classification unit, configured to perform a first classification on the characters corresponding to the feature sequence of the target image to determine the non-blank characters among them; and a second classification unit, configured to perform a second classification on the non-blank characters to determine their categories.

In some embodiments, the first classification unit includes: a probability determination subunit, configured to determine the probability that a character corresponding to the feature sequence of the target image is a non-blank; and a classification subunit, configured to determine characters whose probability is greater than a preset probability threshold to be non-blanks.

In some embodiments, the apparatus further includes: a search module, configured to search multiple groups of candidate hyperparameters from the search space of the hyperparameters; a network building module, configured to build an initial candidate convolutional neural network based on each group of candidate hyperparameters; a training module, configured to train an initial candidate neural network including the initial candidate convolutional neural network and an initial classifier, to obtain a candidate neural network including a candidate convolutional neural network and a candidate classifier; and a first determination module, configured to, when the candidate neural network satisfies the constraints, determine the candidate convolutional neural network to be the convolutional neural network of the target neural network and the candidate classifier to be the classifier of the target neural network.

In some embodiments, the classifier includes a first sub-classifier, configured to perform a first classification on the characters corresponding to the feature sequence of the target image to determine the non-blank characters among them, and a second sub-classifier, configured to perform a second classification on the non-blank characters to determine their categories. The training module includes: a first training unit, configured to fix the network parameters of the initial first sub-classifier and perform a first training on the initial candidate neural network, that is, the initial candidate convolutional neural network and the initial second sub-classifier, based on first sample images, to obtain an intermediate candidate neural network including the candidate convolutional neural network, the initial first sub-classifier, and a candidate second sub-classifier; and a second training unit, configured to fix the network parameters of the candidate convolutional neural network and the candidate second sub-classifier in the intermediate candidate neural network, use the output of the candidate second sub-classifier as supervision information, and perform a second training on the initial first sub-classifier based on second sample images, to obtain the candidate neural network including the candidate convolutional neural network, a candidate first sub-classifier, and the candidate second sub-classifier.

In some embodiments, the constraints include an upper limit on the time the target neural network takes to perform character recognition, and the apparatus further includes: a first acquisition module, configured to acquire the time the candidate neural network takes to perform character recognition on test images; and a second determination module, configured to determine that the candidate neural network satisfies the constraints when that time is less than the upper time limit.

In some embodiments, the constraints include a lower limit on the accuracy with which the target neural network performs character recognition, and the apparatus further includes: a second acquisition module, configured to acquire the recognition accuracy of the candidate neural network when performing character recognition on test images; and a third determination module, configured to determine that the candidate neural network satisfies the constraints when that recognition accuracy is higher than the lower accuracy limit.

In some embodiments, the hyperparameters include at least one of the following: the depth of the convolutional neural network, the number of channels, the size of the convolution kernels, the number of downsampling operations, the downsampling manner, the positions of the network layers that perform downsampling in the convolutional neural network, and the resolution of the target image input to the convolutional neural network.
As shown in FIG. 9, an embodiment of the present disclosure further provides a neural network training apparatus, the apparatus including:
a training module 901, configured to train each of a plurality of initial candidate neural networks on sample images, to obtain a plurality of candidate neural networks, wherein each initial candidate neural network includes:
an initial candidate convolutional neural network, configured to perform feature extraction on a sample image to obtain a feature sequence of the sample image, wherein the candidate hyperparameters of the initial candidate convolutional neural networks differ at least in part; and
an initial classifier, configured to classify the characters corresponding to the feature sequence of the sample image to obtain the categories of the characters corresponding to the feature sequence, the categories being used to perform character recognition on the sample image; and
a screening module 902, configured to screen out, from the plurality of candidate neural networks, a target neural network that satisfies a constraint condition.
In some embodiments, the initial classifier includes a first initial sub-classifier, configured to perform a first classification on the characters corresponding to the feature sequence of the sample image so as to determine the non-blank characters among them, and a second initial sub-classifier, configured to perform a second classification on the non-blank characters so as to determine their categories. The training module includes: a first training unit, configured to perform first training on the initial candidate convolutional neural network and the initial second sub-classifier based on first sample images, to obtain an intermediate candidate neural network including a candidate convolutional neural network, the initial first sub-classifier and a candidate second sub-classifier; and a second training unit, configured to fix the network parameters of the candidate convolutional neural network and the candidate second sub-classifier in the intermediate candidate neural network, use the output of the candidate second sub-classifier as supervision information, and perform second training on the initial first sub-classifier based on second sample images, to obtain the candidate neural network including the candidate convolutional neural network, a candidate first sub-classifier and the candidate second sub-classifier.
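The two training stages of this embodiment can be sketched schematically as follows. The `Module` objects are stand-ins that merely count parameter updates and honor a freeze flag; they are not real networks, and a real implementation would perform gradient steps in a deep-learning framework:

```python
class Module:
    """Stand-in for a trainable network component; counts parameter updates."""
    def __init__(self, name):
        self.name = name
        self.trainable = True
        self.updates = 0

    def step(self):
        # A real framework would apply a gradient update here.
        if self.trainable:
            self.updates += 1

def first_training(backbone, second_clf, first_batches):
    # Stage 1: jointly train the candidate CNN and the second sub-classifier
    # on the first sample images.
    for _ in first_batches:
        backbone.step()
        second_clf.step()

def second_training(backbone, first_clf, second_clf, second_batches):
    # Stage 2: fix the network parameters of the candidate CNN and the
    # candidate second sub-classifier...
    backbone.trainable = False
    second_clf.trainable = False
    for _ in second_batches:
        # ...and train only the first sub-classifier on the second sample
        # images, with the frozen second sub-classifier's output serving
        # as supervision information (placeholder here).
        _supervision = second_clf.name
        first_clf.step()
```

After both stages, only the first sub-classifier has received updates in stage 2, while the backbone and second sub-classifier retain their stage-1 parameters.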
In some embodiments, the functions of, or modules included in, the apparatus provided by the embodiments of the present disclosure may be used to execute the methods described in the method embodiments above; for their specific implementation, reference may be made to the description of those method embodiments, which is not repeated here for brevity.
An embodiment of this specification further provides an electronic device, which includes at least a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the method described in any of the foregoing embodiments.
FIG. 10 shows a more specific schematic diagram of the hardware structure of an electronic device provided by an embodiment of this specification. The device may include a processor 1001, a memory 1002, an input/output interface 1003, a communication interface 1004 and a bus 1005. The processor 1001, the memory 1002, the input/output interface 1003 and the communication interface 1004 are communicatively connected to one another within the device through the bus 1005.
The processor 1001 may be implemented as a general-purpose CPU (Central Processing Unit), a microprocessor, an Application-Specific Integrated Circuit (ASIC), or one or more integrated circuits, and is configured to execute relevant programs so as to implement the character recognition method or the neural network training method provided by the embodiments of this specification. The processor 1001 may further include a graphics card, such as an Nvidia Titan X or 1080Ti graphics card.
The memory 1002 may be implemented in the form of a ROM (Read-Only Memory), a RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 1002 may store an operating system and other application programs. When the technical solutions provided by the embodiments of this specification are implemented through software or firmware, the relevant program code is stored in the memory 1002 and invoked for execution by the processor 1001.
The input/output interface 1003 is used to connect input/output modules so as to implement information input and output. An input/output module may be configured in the device as a component (not shown in the figure), or may be externally connected to the device to provide the corresponding functions. Input modules may include a keyboard, a mouse, a touch screen, a microphone and various sensors; output modules may include a display, a speaker, a vibrator, indicator lights and the like.
The communication interface 1004 is used to connect a communication module (not shown in the figure), so as to send information of the present device to the communication modules of other devices, or to receive information sent by the communication modules of other devices. The communication module may communicate in a wired manner (e.g., USB or network cable) or in a wireless manner (e.g., mobile network, WiFi or Bluetooth).
The bus 1005 includes a path that transfers information between the components of the device (e.g., the processor 1001, the memory 1002, the input/output interface 1003 and the communication interface 1004).
It should be noted that, although only the processor 1001, the memory 1002, the input/output interface 1003, the communication interface 1004 and the bus 1005 are shown for the above device, in a specific implementation the device may further include other components necessary for normal operation. In addition, those skilled in the art will understand that the above device may also include only the components necessary to implement the solutions of the embodiments of this specification, rather than all the components shown in the figure.
An embodiment of the present disclosure further provides a computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method described in any of the foregoing embodiments.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible to a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
From the description of the above implementations, those skilled in the art can clearly understand that the embodiments of this specification may be implemented by means of software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solutions of the embodiments of this specification, in essence or with respect to the part contributing to the prior art, may be embodied in the form of a software product. Such a computer software product may be stored in a storage medium, such as a ROM/RAM, a magnetic disk or an optical disc, and includes a number of instructions that cause an electronic device (which may be a personal computer, a server, a network device or the like) to execute the methods described in the various embodiments, or in certain parts of the embodiments, of this specification.
The systems, apparatuses, modules or units set forth in the above embodiments may be specifically implemented by computer chips or entities, or by products having certain functions. A typical implementing device is a computer, which may take the form of a personal computer, a laptop computer, a cellular phone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an e-mail device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
The embodiments in this specification are described in a progressive manner; for identical or similar parts between the embodiments, reference may be made from one to another, and each embodiment focuses on its differences from the others. In particular, since the apparatus embodiments are substantially similar to the method embodiments, they are described relatively simply, and for the relevant parts reference may be made to the description of the method embodiments. The apparatus embodiments described above are merely illustrative; the modules described as separate components may or may not be physically separate, and when implementing the solutions of the embodiments of this specification, the functions of the modules may be implemented in one or more pieces of software and/or hardware. Some or all of the modules may also be selected according to actual needs to achieve the purpose of the solution of the embodiment. Those of ordinary skill in the art can understand and implement this without creative effort.
The above is merely a specific implementation of the embodiments of this specification. It should be noted that those of ordinary skill in the art may make several improvements and refinements without departing from the principles of the embodiments of this specification, and such improvements and refinements should also be regarded as falling within the scope of protection of the embodiments of this specification.

Claims (16)

  1. A character recognition method, comprising:
    performing feature extraction on a target image through a convolutional neural network of a target neural network, to obtain a feature sequence of the target image;
    classifying the characters corresponding to the feature sequence through a classifier of the target neural network, to obtain the categories of the characters corresponding to the feature sequence; and
    performing character recognition on the target image based on the categories of the characters;
    wherein the convolutional neural network is determined, by selecting hyperparameters of the convolutional neural network, from multiple sets of candidate hyperparameters based on a constraint condition of the target neural network.
  2. The method according to claim 1, wherein classifying the characters corresponding to the feature sequence to obtain the categories of the characters corresponding to the feature sequence comprises:
    performing a first classification on the characters corresponding to the feature sequence, to determine the non-blank characters among the characters corresponding to the feature sequence; and
    performing a second classification on the non-blank characters, to determine the categories of the non-blank characters.
  3. The method according to claim 2, wherein performing the first classification on the characters corresponding to the feature sequence to determine the non-blank characters among them comprises:
    determining the probability that a character corresponding to the feature sequence is a non-blank character; and
    determining a character whose probability is greater than a preset probability threshold to be a non-blank character.
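Under an assumed representation in which each element of the feature sequence carries a non-blank probability, the thresholding step of claim 3 can be sketched as follows; the 0.5 default and the pair representation are illustrative choices, not values from the disclosure:

```python
def select_non_blank(frame_predictions, threshold=0.5):
    """Keep the characters whose non-blank probability exceeds the threshold.

    frame_predictions: sequence of (character, p_non_blank) pairs, one per
    element of the feature sequence. The default threshold of 0.5 is an
    illustrative assumption.
    """
    return [char for char, p in frame_predictions if p > threshold]
```

The surviving characters are then passed to the second classification, which assigns each of them a category.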
  4. The method according to any one of claims 1 to 3, further comprising:
    searching the multiple sets of candidate hyperparameters from a search space of the hyperparameters; and
    for each set of candidate hyperparameters:
    establishing an initial candidate convolutional neural network based on the set of candidate hyperparameters;
    training an initial candidate neural network comprising the initial candidate convolutional neural network and an initial classifier, to obtain a candidate neural network comprising a candidate convolutional neural network and a candidate classifier; and
    in a case where the candidate neural network satisfies the constraint condition, determining the candidate convolutional neural network to be the convolutional neural network of the target neural network, and determining the candidate classifier to be the classifier of the target neural network.
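The per-candidate procedure of claim 4 can be sketched as a generic selection loop. `build`, `train` and `satisfies_constraints` are placeholders for the establish/train/constraint-check steps described above, not functions from the disclosure:

```python
def search_target_network(candidate_hyperparam_sets, build, train,
                          satisfies_constraints):
    """For each candidate hyperparameter set: build an initial candidate
    network, train it, and return the first trained candidate that meets
    the constraint condition (or None if none does)."""
    for hyperparams in candidate_hyperparam_sets:
        candidate = train(build(hyperparams))
        if satisfies_constraints(candidate):
            return candidate
    return None
```

The returned candidate supplies both the convolutional neural network and the classifier of the target neural network.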
  5. The method according to claim 4, wherein the classifier comprises a first sub-classifier and a second sub-classifier; the first sub-classifier is configured to perform a first classification on the characters corresponding to the feature sequence, to determine the non-blank characters among them; the second sub-classifier is configured to perform a second classification on the non-blank characters, to determine the categories of the non-blank characters; and training the initial candidate neural network comprising the initial candidate convolutional neural network and the initial classifier comprises:
    performing first training on the initial candidate convolutional neural network and an initial second sub-classifier based on first sample images, to obtain an intermediate candidate neural network comprising the candidate convolutional neural network, an initial first sub-classifier and a candidate second sub-classifier; and
    fixing the network parameters of the candidate convolutional neural network and the candidate second sub-classifier in the intermediate candidate neural network, using the output of the candidate second sub-classifier as supervision information, and performing second training on the initial first sub-classifier based on second sample images, to obtain the candidate neural network comprising the candidate convolutional neural network, a candidate first sub-classifier and the candidate second sub-classifier.
  6. The method according to claim 4 or 5, wherein the constraint condition comprises an upper limit on the duration of character recognition performed by the target neural network, and the method further comprises:
    acquiring the duration taken by the candidate neural network to perform character recognition on a test image; and
    in a case where the duration is less than the upper limit, determining that the candidate neural network satisfies the constraint condition.
  7. The method according to claim 4 or 5, wherein the constraint condition comprises a lower limit on the accuracy of character recognition performed by the target neural network, and the method further comprises:
    acquiring the recognition accuracy of character recognition performed by the candidate neural network on a test image; and
    in a case where the recognition accuracy is higher than the lower limit, determining that the candidate neural network satisfies the constraint condition.
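Taken together, the duration constraint of claim 6 and the accuracy constraint of claim 7 reduce to simple comparisons against measured values. The function below is an illustrative combination of both checks, not code from the disclosure; a limit of `None` means the corresponding constraint is not imposed:

```python
def satisfies_constraints(duration, accuracy, max_duration=None,
                          min_accuracy=None):
    """Return True when the measured recognition duration is below the upper
    limit and the measured recognition accuracy is above the lower limit."""
    if max_duration is not None and not duration < max_duration:
        return False  # claim 6: duration must be less than the upper limit
    if min_accuracy is not None and not accuracy > min_accuracy:
        return False  # claim 7: accuracy must be higher than the lower limit
    return True
```

A candidate neural network passing this check would be retained as the target neural network.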
  8. The method according to any one of claims 1 to 7, wherein the hyperparameters comprise at least one of the following: the depth of the convolutional neural network, the number of channels, the size of the convolution kernels, the number of downsampling operations, the downsampling method, the positions of the network layers in the convolutional neural network that perform downsampling, and the resolution of the target image input to the convolutional neural network.
  9. A neural network training method, comprising:
    training each of a plurality of initial candidate neural networks on sample images, to obtain a plurality of candidate neural networks, wherein each initial candidate neural network comprises:
    an initial candidate convolutional neural network, configured to perform feature extraction on a sample image to obtain a feature sequence of the sample image, wherein the candidate hyperparameters of the initial candidate convolutional neural networks differ at least in part; and
    an initial classifier, configured to classify the characters corresponding to the feature sequence to obtain the categories of the characters corresponding to the feature sequence, the categories of the characters being used to perform character recognition on the sample image; and
    screening out, from the plurality of candidate neural networks, a target neural network that satisfies a constraint condition.
  10. The method according to claim 9, wherein the initial classifier comprises a first initial sub-classifier and a second initial sub-classifier; the first initial sub-classifier is configured to perform a first classification on the characters corresponding to the feature sequence, to determine the non-blank characters among the characters corresponding to the feature sequence of the sample image; the second initial sub-classifier is configured to perform a second classification on the non-blank characters, to determine the categories of the non-blank characters; and training each of the plurality of initial candidate neural networks on the sample images comprises:
    performing first training on the initial candidate convolutional neural network and an initial second sub-classifier based on first sample images, to obtain an intermediate candidate neural network comprising a candidate convolutional neural network, an initial first sub-classifier and a candidate second sub-classifier; and
    fixing the network parameters of the candidate convolutional neural network and the candidate second sub-classifier in the intermediate candidate neural network, using the output of the candidate second sub-classifier as supervision information, and performing second training on the initial first sub-classifier based on second sample images, to obtain the candidate neural network comprising the candidate convolutional neural network, a candidate first sub-classifier and the candidate second sub-classifier.
  11. A neural network, comprising:
    a convolutional neural network, configured to perform feature extraction on a target image to obtain a feature sequence of the target image; and
    a classifier, configured to classify the characters corresponding to the feature sequence to obtain the categories of the characters corresponding to the feature sequence, wherein the categories of the characters are used to perform character recognition on the target image;
    wherein the convolutional neural network is determined, by selecting hyperparameters of the convolutional neural network, from multiple sets of candidate hyperparameters based on a constraint condition of the neural network.
  12. A character recognition apparatus, comprising:
    a feature extraction module, configured to perform feature extraction on a target image through a convolutional neural network of a target neural network, to obtain a feature sequence of the target image;
    a classification module, configured to classify the characters corresponding to the feature sequence through a classifier of the target neural network, to obtain the categories of the characters corresponding to the feature sequence; and
    a recognition module, configured to perform character recognition on the target image based on the categories of the characters;
    wherein the convolutional neural network is determined, by selecting hyperparameters of the convolutional neural network, from multiple sets of candidate hyperparameters based on a constraint condition of the target neural network.
  13. A neural network training apparatus, comprising:
    a training module, configured to train each of a plurality of initial candidate neural networks on sample images, to obtain a plurality of candidate neural networks, wherein each initial candidate neural network comprises:
    an initial candidate convolutional neural network, configured to perform feature extraction on a sample image to obtain a feature sequence of the sample image, wherein the candidate hyperparameters of the initial candidate convolutional neural networks differ at least in part; and
    an initial classifier, configured to classify the characters corresponding to the feature sequence to obtain the categories of the characters corresponding to the feature sequence, the categories of the characters being used to perform character recognition on the sample image; and
    a screening module, configured to screen out, from the plurality of candidate neural networks, a target neural network that satisfies a constraint condition.
  14. A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method according to any one of claims 1 to 10.
  15. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the method according to any one of claims 1 to 10.
  16. A computer program, comprising computer-readable code which, when executed by a processor, implements the method according to any one of claims 1 to 10.
PCT/CN2022/086989 2021-06-28 2022-04-15 Character recognition method and apparatus, neural network training method and apparatus, and neural network, storage medium and electronic device WO2023273516A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110719983.4A CN113298188A (en) 2021-06-28 2021-06-28 Character recognition and neural network training method and device
CN202110719983.4 2021-06-28

Publications (1)

Publication Number Publication Date
WO2023273516A1 (en)




Also Published As

Publication number Publication date
CN113298188A (en) 2021-08-24

