CN110942004A - Handwriting recognition method and device based on neural network model and electronic equipment


Info

Publication number
CN110942004A
Authority
CN
China
Prior art keywords
neural network
loss function
character
text
function value
Prior art date
Legal status
Pending
Application number
CN201911143048.7A
Other languages
Chinese (zh)
Inventor
刘俊仕
Current Assignee
Shenzhen Chase Technology Co Ltd
Shenzhen Zhuiyi Technology Co Ltd
Original Assignee
Shenzhen Chase Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Chase Technology Co Ltd filed Critical Shenzhen Chase Technology Co Ltd
Priority to CN201911143048.7A
Publication of CN110942004A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/22 Character recognition characterised by the type of writing
    • G06V30/226 Character recognition characterised by the type of writing of cursive writing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Character Discrimination (AREA)

Abstract

The embodiment of the application discloses a handwriting recognition method and device based on a neural network model and electronic equipment, and relates to the technical field of image recognition. The neural network model comprises a first neural network and a second neural network, and the method comprises the following steps: preprocessing an image to be recognized to obtain at least one text line, wherein the image to be recognized comprises a handwritten font; inputting a text line into the first neural network to obtain at least one segmented character; and inputting the at least one segmented character into the second neural network to output the text recognized from the handwritten font. In the embodiment of the application, the handwritten font is segmented by the first neural network and the segmented characters are recognized by the second neural network, so that the segmentation result and the recognition result can each be well controlled, improving the segmentation and recognition accuracy as well as the network training efficiency.

Description

Handwriting recognition method and device based on neural network model and electronic equipment
Technical Field
The embodiment of the application relates to the technical field of image recognition, in particular to a handwriting recognition method and device based on a neural network model and electronic equipment.
Background
Chinese handwriting recognition has long been applied to photographed documents, checks, forms, certificates, postal envelopes, notes, manuscripts, and the like. Most existing handwritten Chinese character recognition frameworks are based on traditional preprocessing, feature extraction, and classifiers. With the rise of deep learning, handwriting recognition methods based on deep learning generally outperform traditional methods, but the recognition accuracy is still low when recognizing continuous handwritten Chinese characters.
Disclosure of Invention
The embodiment of the application provides a handwriting recognition method and device based on a neural network model, an electronic device, and a storage medium, so as to overcome the above defects.
In a first aspect, an embodiment of the present application provides a handwriting recognition method based on a neural network model, where the neural network model includes a first neural network and a second neural network, the method includes: preprocessing the image to be recognized to obtain at least one text line, wherein the image to be recognized comprises a handwritten font; inputting the text line into the first neural network to obtain at least one segmented character; and inputting the at least one segmented character into the second neural network to output corresponding text after the handwriting font is recognized.
Optionally, the inputting the text line into the first neural network to obtain at least one segmented character includes: inputting the text line into a first neural network, and obtaining estimated position information of at least one character and segmentation labels corresponding to the estimated position information, wherein the segmentation labels comprise divisible labels; determining a partitionable position according to the partitionable label; and performing character segmentation on the text line region according to the segmentable position to obtain at least one segmented character.
Optionally, after the inputting the at least one segmented character into the second neural network to output the text after recognizing the handwritten font, the method includes: obtaining a user's evaluation result of the output text recognized from the handwritten font, wherein the evaluation result comprises an error character, and a correct character and correct position information corresponding to the error character; taking the correct position information as real position information, and acquiring a first loss function value corresponding to the error character, wherein the first loss function value corresponds to the first neural network and is used for measuring the error between the output of the first neural network corresponding to the error character and the real position information corresponding to the error character; taking the correct character as a real character, and acquiring a second loss function value corresponding to the error character, wherein the second loss function value corresponds to the second neural network and is used for measuring the error between the output of the second neural network corresponding to the error character and the real character corresponding to the error character; comparing the first loss function value and the second loss function value with a preset threshold value respectively, determining the loss function value exceeding the preset threshold value as a target loss function value, and determining the neural network corresponding to the target loss function value as a target neural network; and adjusting the network parameters of the target neural network based on the target loss function value, and using the adjusted target neural network for the next handwriting recognition.
Optionally, the comparing the first loss function value and the second loss function value with a preset threshold, determining a loss function value exceeding the preset threshold as a target loss function value, and determining a neural network corresponding to the target loss function value as a target neural network, includes: if the first loss function value exceeds a preset threshold value, determining the first loss function value as a target loss function value, and determining the first neural network as a target neural network for adjusting the first neural network corresponding to the first loss function value; and if the second loss function value exceeds a preset threshold value, determining the second loss function value as a target loss function value, and determining the second neural network as a target neural network so as to adjust the second neural network corresponding to the second loss function value.
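As an illustrative aid only (not part of the original application), the threshold comparison described above can be sketched in Python; the function and variable names are assumptions:

    # A minimal sketch of the threshold rule: any loss value exceeding the
    # preset threshold marks its corresponding network as a target to adjust.
    def select_target_networks(first_loss, second_loss, threshold):
        targets = []
        if first_loss > threshold:
            targets.append(("first_neural_network", first_loss))
        if second_loss > threshold:
            targets.append(("second_neural_network", second_loss))
        return targets  # each entry: (network to adjust, its target loss value)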
Optionally, the method further comprises: training the neural network model by using training handwritten text, wherein the training handwritten text comprises a training text line, each real character in the training text line, the real position information of each real character, and the corresponding real segmentation label; the training the neural network model using training handwritten text comprises: inputting the training text line into the first neural network to obtain at least one segmented character, taking the real position information of each character in the training text line and the corresponding real segmentation label as the expected output of the first neural network, and obtaining a first loss function according to the expected output and the actual output of the first neural network; inputting the at least one segmented character into the second neural network, taking each real character in the training text line as the expected output, and acquiring a second loss function according to the expected output and the actual output of the second neural network; when at least one of the first loss function and the second loss function does not meet a preset convergence condition or the number of iterations does not exceed a preset number, adjusting the model parameters of the neural network model according to whether the first loss function and the second loss function meet the preset convergence condition, and acquiring the next training text line to be input into the first neural network for the next training; and when the first loss function and the second loss function both meet the preset convergence condition and the number of iterations exceeds the preset number, stopping training the neural network model and obtaining the trained neural network model for handwriting recognition.
Optionally, the adjusting the model parameters of the neural network model according to the determination result of whether the first loss function and the second loss function satisfy the preset convergence condition includes: if the first loss function does not meet a preset convergence condition, adjusting parameters of the first neural network; and if the second loss function does not meet the preset convergence condition, adjusting the parameters of the second neural network.
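As a non-authoritative sketch of this training procedure, the loop below assumes callable helpers for the loss computation, the convergence test, and the parameter update; none of these names come from the application itself:

    # Alternating training sketch: iterate until BOTH losses satisfy the
    # convergence condition AND the iteration count exceeds the preset number;
    # on each step, adjust only the network(s) whose loss has not converged.
    def train(first_net, second_net, text_lines, loss1_fn, loss2_fn,
              update, converged, preset_iters):
        iters = 0
        while True:
            line = text_lines[iters % len(text_lines)]
            chars = first_net(line.image)                  # segmented characters
            loss1 = loss1_fn(chars, line.positions, line.labels)
            loss2 = loss2_fn(second_net(chars), line.characters)
            if converged(loss1) and converged(loss2) and iters > preset_iters:
                break                                      # stop training
            if not converged(loss1):
                update(first_net, loss1)                   # adjust first network
            if not converged(loss2):
                update(second_net, loss2)                  # adjust second network
            iters += 1
        return first_net, second_net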
Optionally, after the inputting the at least one segmented character into the second neural network to output the text after recognizing the handwritten font, the method further comprises: and inputting the recognized text into a preset language model to obtain a corrected text, wherein the preset language model is used for correcting the expression rationality of the recognized text.
In a second aspect, an embodiment of the present application provides a handwriting recognition apparatus based on a neural network model, where the neural network model includes a first neural network and a second neural network, the apparatus includes: the preprocessing module is used for preprocessing the image to be recognized to obtain at least one text line, and the image to be recognized comprises a handwritten font; a segmentation module for inputting the text line into the first neural network to obtain at least one segmented character; and the recognition module is used for inputting the at least one segmented character into the second neural network so as to output the text after the handwriting font is recognized.
Optionally, the segmentation module comprises: a text line input sub-module, a position determination sub-module, and a character cutting sub-module, wherein: the text line input submodule is used for inputting the text line into the first neural network to obtain estimated position information of at least one character and the segmentation label corresponding to the estimated position information, the segmentation labels including segmentable labels; the position determining submodule is used for determining the segmentable positions according to the segmentable labels; and the character cutting submodule is used for performing character segmentation on the text line region according to the segmentable positions to obtain at least one segmented character.
Optionally, the handwriting recognition apparatus based on neural network model further includes: the system comprises an evaluation result acquisition module, a first loss acquisition module, a second loss acquisition module, a target network determination module and a network parameter adjustment module, wherein: the evaluation result acquisition module is used for acquiring an evaluation result of the text of which the handwriting font is recognized by the user based on the output, wherein the evaluation result comprises an error character, a correct character corresponding to the error character and correct position information; a first loss obtaining module, configured to obtain a first loss function value corresponding to the error character by using the correct position information as real position information, where the first loss function value corresponds to the first neural network, and is used to measure an error between an output of the first neural network corresponding to the error character and the real position information corresponding to the error character; a second loss obtaining module, configured to obtain a second loss function value corresponding to the error character by using the correct character as a real character, where the second loss function value corresponds to the second neural network and is used to measure an error between an output of the second neural network corresponding to the error character and the real character corresponding to the error character; a target network determining module, configured to compare the first loss function value and the second loss function value with a preset threshold, determine a loss function value exceeding the preset threshold as a target loss function value, and determine a neural network corresponding to the target loss function value as a target neural network; and the network parameter adjusting module is used for adjusting the network parameters of the target neural network based on the target loss function value and using the adjusted target neural network for next handwriting recognition.
Optionally, the target network determining module includes: a first network determination submodule and a second network determination submodule, wherein: a first network determining submodule, configured to determine the first loss function value as a target loss function value if the first loss function value exceeds a preset threshold, and determine the first neural network as a target neural network, so as to adjust the first neural network corresponding to the first loss function value; and the second network determining submodule is used for determining the second loss function value as a target loss function value if the second loss function value exceeds a preset threshold value, and determining the second neural network as a target neural network so as to adjust the second neural network corresponding to the second loss function value.
Optionally, the handwriting recognition apparatus based on neural network model further includes: and the neural network training module is used for training the neural network model by using a training handwritten text, wherein the training handwritten text comprises a training text line, each real character in the training text line, real position information of each real character and a corresponding real segmentation label.
Optionally, the neural network training module includes: the device comprises a first input submodule, a second input submodule, a first judgment submodule and a second judgment submodule, wherein: the first input submodule is used for inputting the training text line into the first neural network to obtain at least one segmented character, taking the real position information of each character in the training text line and the corresponding real segmentation label as the expected output of the first neural network, and acquiring a first loss function according to the expected output and the actual output of the first neural network; the second input submodule is used for inputting the at least one segmented character into the second neural network, taking each real character in the training text line as expected output, and acquiring a second loss function according to the expected output and the actual output of the second neural network; a first judging submodule, configured to, when at least one of the first loss function and the second loss function does not satisfy a preset convergence condition or the iteration number does not exceed a preset number, adjust a model parameter of the neural network model according to a judgment result of whether the first loss function and the second loss function satisfy the preset convergence condition, and obtain a next training text line, and input the next training text line into the first neural network for a next training; and the second judgment submodule is used for stopping the training of the neural network model and obtaining the trained neural network model for handwriting recognition when the first loss function and the second loss function both meet a preset convergence condition and the iteration times exceed the preset times.
Optionally, the handwriting recognition apparatus based on neural network model further includes: and the recognized text correction module is used for inputting the recognized text into a preset language model to obtain a corrected text, and the preset language model is used for correcting the expression reasonableness of the recognized text.
In a third aspect, an embodiment of the present application provides an electronic device, which may include: a memory; one or more processors coupled with the memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs being configured to perform the method of the first aspect as described above.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium having program code stored therein, the program code being invoked by a processor to perform the method according to the first aspect.
The embodiment of the application provides a handwriting recognition method and device based on a neural network model, electronic equipment, and a storage medium, wherein the neural network model comprises a first neural network and a second neural network. At least one text line is obtained by preprocessing an image to be recognized comprising a handwritten font; the text line is then input into the first neural network to obtain at least one segmented character, and finally the at least one segmented character is input into the second neural network to output the text corresponding to the recognized handwritten font. Thus, the handwritten font is segmented by the first neural network and the segmented characters are recognized by the second neural network, so that the segmentation result and the recognition result can each be well controlled, and the segmentation and recognition accuracy and the network training efficiency are improved.
These and other aspects of the present application will be more readily apparent from the following description of the embodiments.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings used in the description of the embodiments are briefly introduced below; it is apparent that the drawings in the following description are only some embodiments, not all embodiments, of the present application. All other embodiments and drawings obtained by a person skilled in the art based on the embodiments of the present application without any inventive step are within the scope of the present application.
FIG. 1 is a schematic diagram of an application environment suitable for use in embodiments of the present application;
FIG. 2 is a flow chart illustrating a neural network model-based handwriting recognition method according to an embodiment of the present application;
FIG. 3 is a flow chart illustrating a method for neural network model-based handwriting recognition according to another embodiment of the present application;
FIG. 4 is a segmentation schematic diagram of a neural network model-based handwriting recognition method according to another embodiment of the present application;
FIG. 5 is a flow chart illustrating a method for neural network model-based handwriting recognition according to another embodiment of the present application;
FIG. 6 is a flow chart illustrating a neural network model-based handwriting recognition method according to yet another embodiment of the present application;
FIG. 7 is a flow chart illustrating a method for training a neural network model according to yet another embodiment of the present application;
FIG. 8 is a schematic flow chart diagram illustrating a method for training a neural network model according to yet another embodiment of the present application;
FIG. 9 is a block diagram of a neural network model-based handwriting recognition apparatus according to an embodiment of the present application;
FIG. 10 is a block diagram of a neural network model-based handwriting recognition apparatus according to another embodiment of the present application;
FIG. 11 is a block diagram illustrating an electronic device for executing a neural network model-based handwriting recognition method according to an embodiment of the present application;
FIG. 12 is a block diagram illustrating a computer-readable storage medium for executing a neural network model-based handwriting recognition method according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The main disadvantage of existing Chinese handwriting recognition is that the recognition of a continuous line of handwritten Chinese characters (namely, a text line including a handwritten font) is not accurate enough, and the accuracy rate is often low. In the process of research, the inventor found that one main reason for inaccurate recognition of text lines is that the segmentation effect on handwritten Chinese is poor: connected strokes, radicals, and the like are often segmented by mistake, which reduces the recognition rate of the text line.
Therefore, in order to remedy the above defects, the embodiments of the present application provide a handwriting recognition method and device based on a neural network model and an electronic device.
In order to better understand the method, the apparatus, the electronic device, and the storage medium for handwriting recognition based on a neural network model provided in the embodiments of the present application, an application environment suitable for the embodiments of the present application is described below.
Referring to fig. 1, fig. 1 is a schematic diagram of an application environment suitable for one embodiment of the present application. The handwriting recognition method based on the neural network model provided by the embodiment of the application can be applied to the handwriting recognition system 10 based on the neural network model shown in fig. 1. The handwriting recognition system 10 based on the neural network model includes a terminal device 100 and a server 120.
The terminal device 100 may be any of various electronic devices with a display screen, including but not limited to a smartphone, a tablet computer, a laptop computer, a desktop computer, a wearable electronic device, and the like. In some embodiments, the terminal device 100 may receive data input through an input device provided on its display screen; a handwriting input signal is received based on the display screen input, and the terminal device 100 may obtain the corresponding characters from the handwriting input signal.
The server 120 may be a traditional server, a cloud server, a single server, or a server cluster, and is not limited specifically herein.
The terminal device 100 and the server 120 are located in a wireless network or a wired network, and the terminal device 100 and the server 120 can perform data interaction. In some embodiments, the server 120 may be communicatively connected to a plurality of terminal devices 100, the terminal devices 100 may be communicatively connected to each other through the Internet, and the server 120 may also be used as a transmission medium to implement data interaction between them through the Internet.
In some embodiments, the terminal device 100 may store the neural network model. In other embodiments, the server 120 may store the neural network model.
The above application environments are only examples for facilitating understanding, and it is to be understood that the embodiments of the present application are not limited to the above application environments.
The method, the apparatus, the electronic device and the storage medium for handwriting recognition based on a neural network model provided in the embodiments of the present application will be described in detail below with specific embodiments.
Referring to fig. 2, an embodiment of the present application provides a handwriting recognition method based on a neural network model, which can be applied to the terminal device or the server. As will be explained in detail with respect to the flow shown in fig. 2, the above-mentioned handwriting recognition method based on neural network model may specifically include the following steps:
step S110: and preprocessing the image to be recognized to obtain at least one text line.
The image to be recognized includes a handwritten font, which may include, but is not limited to, handwritten Chinese and English text.
In some embodiments, the image to be recognized may be obtained by photographing a document, a form, a ticket, a manuscript, and the like, and the image to be recognized may be preprocessed to obtain at least one text line. The preprocessing includes segmenting the image into text lines, denoising the segmented text lines, normalizing the text line size, performing binarization, and the like, so that the image to be recognized is preprocessed into at least one standard text line for subsequent operations.
It should be noted that the text line refers to an image including the text line in this embodiment.
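As an illustrative sketch only, the preprocessing chain described above (line segmentation, denoising, size normalization, binarization) might look as follows in Python with OpenCV; the projection-based line splitting and the target size are assumptions, not details from the application:

    import cv2

    def preprocess(image_path, target_h=256, target_w=2048):
        gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
        gray = cv2.fastNlMeansDenoising(gray)                  # denoise
        _, binary = cv2.threshold(                             # binarize (Otsu)
            gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
        # Split into text lines by horizontal projection: rows containing
        # ink belong to a line, blank rows separate lines.
        row_ink = binary.sum(axis=1)
        lines, start = [], None
        for y, ink in enumerate(row_ink):
            if ink > 0 and start is None:
                start = y
            elif ink == 0 and start is not None:
                lines.append(binary[start:y])
                start = None
        if start is not None:
            lines.append(binary[start:])
        # Normalize each line image to a standard size for the first network.
        return [cv2.resize(line, (target_w, target_h)) for line in lines]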
In some embodiments, step S110 may be preceded by acquiring an image to be recognized. In one example, a user may upload an image to be recognized including a handwritten font through a client installed in a terminal device, such as an APP, an applet, or the like, and thus, the terminal device may acquire the image to be recognized based on the client.
In some embodiments, step S110 may be performed by the terminal device. In other embodiments, step S110 may also be executed by the server, for example, after the terminal device acquires the image to be recognized, the terminal device may send the image to be recognized to the server, and instruct the server to pre-process the image to be recognized, so as to obtain at least one text line.
Step S120: the line of text is input into a first neural network to obtain at least one segmented character.
The first neural network is used for segmentation, can segment input text lines and obtains at least one segmented character. In some embodiments, a text line is input to the first neural network, a plurality of segmentation labels may be output, a segmentation position is determined according to the segmentation labels, and the text line is segmented according to the segmentation position to obtain at least one segmented character.
In some embodiments, the first neural network may be constructed based on a Convolutional Neural Network (CNN). Specifically, in one example, the input features of the first neural network may be a 256x2048 two-dimensional image matrix, corresponding to a grayscale image 256 pixels high and 2048 pixels wide, and the output features may be a 2048x1 one-dimensional vector with values of 0 or 1, where 0 indicates a position that is not segmentable and 1 indicates a position that can be segmented. The segmentation positions can therefore be determined from the output of the first neural network, and the characters segmented accordingly to obtain at least one segmented character. Because current Chinese handwriting segmentation methods are usually based on rules and image processing, they are difficult to adjust and optimize when recognition results are inaccurate; by segmenting handwritten Chinese characters with a neural network, this embodiment can automatically fit the data, giving stronger generalization capability.
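For illustration only, a PyTorch sketch of this input/output contract follows; the internal layers are placeholders, and only the 256x2048 input and 2048x1 binary output come from the example above:

    import torch
    import torch.nn as nn

    class ColumnSegmenter(nn.Module):
        # Maps a 256x2048 grayscale text line to one 0/1 label per pixel
        # column (1 = segmentable, 0 = not segmentable).
        def __init__(self):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.MaxPool2d((2, 1)),             # shrink height, keep columns
                nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d((1, 2048)),  # collapse height to 1
            )
            self.head = nn.Conv2d(64, 1, kernel_size=1)  # per-column logit

        def forward(self, x):                     # x: (batch, 1, 256, 2048)
            logits = self.head(self.features(x)).squeeze(1).squeeze(1)
            return (torch.sigmoid(logits) > 0.5).long()  # (batch, 2048)

    labels = ColumnSegmenter()(torch.rand(1, 1, 256, 2048))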
In some embodiments, the first neural network may be stored in a server, and the server runs the first neural network. The server obtains the text line and inputs the text line into the first neural network, and at least one segmented character output by the first neural network can be obtained. Therefore, the first neural network can be stored in the server and run by the server, so that the occupation of the storage space, the calculation resources and the like of the terminal equipment can be greatly reduced, and the terminal equipment with insufficient storage space and less calculation resources can also realize the handwriting recognition method based on the neural network model provided by the embodiment.
In other embodiments, the first neural network may be stored and executed on the terminal device. The terminal device obtains the text line, inputs the text line into the first neural network, and can obtain at least one segmented character output by the first neural network. Therefore, when the terminal equipment and the server do not establish communication connection or the communication connection is disconnected, the terminal equipment can still segment the text line to obtain the segmented characters. Therefore, the handwriting recognition method based on the neural network model provided by the embodiment can also be suitable for an off-line environment, can still normally operate in an environment with a poor network state, and achieves the technical effect which can be achieved by the embodiment.
Step S130: inputting the at least one segmented character into the second neural network to output the text recognized from the handwritten font.
The second neural network is used for recognition. In this embodiment, the segmented characters may be recognized to output the text recognized from the handwritten font. It should be noted that one segmented character corresponds to one image.
In some embodiments, the second neural network may be constructed based on a convolutional neural network. Specifically, in one example, the input feature size of the second neural network may be a 64x64 two-dimensional image matrix. The second neural network may classify the character picture into one of the 3755 Chinese character classes of the level-1 standard character library and output a 3755x1 Softmax vector, with each position corresponding to one of the 3755 character classes. Thus, from the output Softmax vector, the target Chinese character corresponding to the position with the largest value, that is, the recognition result of the input segmented character, can be determined.
In some embodiments, the segmented characters may be scaled to match the input feature size of the second neural network, and then the scaled segmented characters matching the input feature size may be input to the second neural network. In one example, the input feature size of the second neural network may be 64X64, and may be other sizes, which is not limited herein.
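Purely as a sketch of this step (the model and the character set are placeholders, not details from the application):

    import cv2
    import torch

    def recognize(char_img, model, charset):
        # char_img: 2-D numpy array for one segmented character;
        # charset: assumed list of the 3755 character classes.
        resized = cv2.resize(char_img, (64, 64)).astype("float32") / 255.0
        x = torch.from_numpy(resized)[None, None]     # shape (1, 1, 64, 64)
        with torch.no_grad():
            probs = torch.softmax(model(x), dim=1)    # (1, 3755) Softmax vector
        return charset[probs.argmax(dim=1).item()]    # class with largest value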
In some embodiments, the second neural network may be stored on a server, and the server runs the second neural network. The server obtains the segmented characters and inputs them into the second neural network to obtain the recognized text output by the second neural network. Storing the second neural network on the server and running it there can greatly reduce the occupation of the terminal device's storage space, computing resources, and the like, so that terminal devices with insufficient storage space and limited computing resources can also implement the handwriting recognition method based on the neural network model provided in this embodiment.
In other embodiments, the second neural network may be stored and run on the terminal device. The terminal device obtains the segmented characters, inputs them into the second neural network, and can obtain the recognized text output by the second neural network. Thus, when no communication connection is established between the terminal device and the server, or the connection is broken, the terminal device can still recognize the segmented characters. Therefore, the handwriting recognition method based on the neural network model provided in this embodiment is also suitable for an offline environment and can still operate normally where the network state is poor, achieving the technical effects of this embodiment.
In the handwriting recognition method based on the neural network model provided in this embodiment, at least one text line is obtained by preprocessing an image to be recognized that includes a handwritten font; the text line is then input into the first neural network to obtain at least one segmented character, and finally the at least one segmented character is input into the second neural network to output the text corresponding to the recognized handwritten font. The handwritten font is thus segmented by the first neural network and the segmented characters are recognized by the second neural network, so that the segmentation result and the recognition result can each be well controlled, and the segmentation and recognition accuracy and the network training efficiency are improved.
Referring to fig. 3, another embodiment of the present application provides a handwriting recognition method based on a neural network model, which can be applied to the terminal device or the server. As will be explained in detail with respect to the flow shown in fig. 3, the above-mentioned handwriting recognition method based on neural network model may specifically include the following steps:
step S210: and preprocessing the image to be recognized to obtain at least one text line.
Step S220: inputting the text line into a first neural network to obtain estimated position information of at least one character and a segmentation label corresponding to the estimated position information.
In some embodiments, the first neural network may be constructed based on a convolutional neural network. Specifically, in one example, the first neural network may begin with a one-dimensional convolution using a 7x64 convolution kernel, followed by several two-dimensional convolution layers, and finally perform a two-class classification based on fully connected layers to obtain the segmentation labels. A segmentation label corresponds to the estimated position information of a character, and whether the position corresponding to the estimated position information needs to be segmented can be determined from the segmentation label. The segmentation labels include segmentable labels, and the positions to be segmented can be determined from the segmentable labels. In one example, the segmentable label may be 1.
In a specific embodiment, the first neural network is constructed based on CNN and includes 7 layers, and the convolution parameters of the first to seventh layers may be 3x3x32, 3x3x64, 3x3x128, 3x3x256, 3x3x512, 3x3x1024, and 3x3x2048, respectively.
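A sketch using exactly the seven channel widths listed above (the 3x3 kernels come from the text; strides and pooling are assumptions, and the initial 7x64 one-dimensional convolution and the final fully connected classifier mentioned earlier are omitted):

    import torch.nn as nn

    def make_first_network_backbone():
        # Seven 3x3 convolution layers with the channel widths stated above.
        channels = [1, 32, 64, 128, 256, 512, 1024, 2048]
        layers = []
        for c_in, c_out in zip(channels[:-1], channels[1:]):
            layers += [nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
                       nn.ReLU(inplace=True),
                       nn.MaxPool2d((2, 1))]   # assumed: halve height only
        return nn.Sequential(*layers)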
The character and the text line both correspond to an image, and the estimated position information of the character may be determined by the position of the character in the text line, for example, the coordinate of the lower left corner of the character in the text line may be used as the estimated position information of the character, and the coordinate of the center of the character in the text line may also be used as the estimated position information of the character, which is not limited in this embodiment.
In some embodiments, the characters include blank characters, text characters, and the like; text characters correspond to actual text, while blank characters do not contain any text. In one example, a segmentable label may indicate that the estimated position information of the character corresponds to a blank character.
Step S230: determining the segmentable positions according to the segmentable labels.
In some embodiments, the segmentable positions may be determined according to the segmentable labels: if a segmentable label corresponds to a blank character, the segmentable position may be determined from the estimated position information of that segmentable label, for example, by taking the region of the text line corresponding to the segmentable label as a segmentable position.
For example, referring to FIG. 4, FIG. 4 illustrates a segmentation schematic for an input text line containing text characters (glossed "...according to the motor vehicle traffic accident..."). The lower image is the image corresponding to the output segmentation result, where the black areas correspond to segmentable labels, that is, segmentable positions. Additionally, in some embodiments, the white areas may correspond to non-segmentable labels. The black areas correspond to blank characters in the text line, and the white areas correspond to text characters in the text line.
In other embodiments, the segmentation labels may also include non-segmentable labels, which may be 0, for example. The non-segmentable labels may characterize text characters, and the segmentable labels may characterize blank characters.
As an embodiment, taking the upper left corner of the text line as the coordinate origin (0,0) in pixel units, with the positive y-axis pointing downward and the positive x-axis pointing rightward, the segmentation labels may be acquired pixel by pixel along the positive x-axis starting from x = 0; in this case, the estimated position information of a segmentation label may include only the x coordinate. Adjacent segmentable and non-segmentable labels can then be examined, the start position or end position of a segmentable position determined from their estimated position information, and the whole area between the start position and the end position taken as the segmentable position.
Specifically, in one example, when the segmentation labels are acquired one by one along the positive x-axis, if the label before a segmentable label is a non-segmentable label, the start position of the segmentable position may be determined from the estimated position information of that segmentable label, and the next segmentation label is then acquired; when the label after a segmentable label is found to be a non-segmentable label, the end position of the segmentable position may be determined from the estimated position information of that segmentable label. The whole area between the start position and the end position can then be taken as the segmentable position, the start position and the end position being the segmentation limits of the segmentable position.
In some embodiments, in order to avoid erroneous segmentation and improve the segmentation accuracy, the segmentable positions may be determined according to whether a specified number of segmentable labels are obtained consecutively, so that when the number of consecutively obtained segmentable labels is lower than the specified number, no segmentable position is determined and the positions corresponding to those segmentable labels are not segmented. This effectively prevents wrong segmentation that would split a complete character, improving the segmentation accuracy and thus the recognition accuracy. In one example, a segmentable position is determined only if three segmentable labels are obtained consecutively. For example, suppose the segmentable label is 1. If only a single 1 is obtained, or only two consecutive 1s are obtained (local label sequences such as "010" or "011"), no segmentable position is determined from those labels, that is, their positions are not considered to correspond to a blank character. If three 1s are obtained consecutively, the positions of the three 1s are considered to correspond to a blank character, and the segmentable position can be determined from the three consecutively obtained segmentable labels.
In another example, after three segmentation labels with consecutive estimated position information are obtained, the segmentable position is determined from the three labels, which may be implemented as follows: judge whether the first and second segmentation labels, whose estimated position information is consecutive, are both segmentable labels; if so, continue to acquire the third segmentation label whose estimated position information follows. If the third segmentation label is also a segmentable label, the segmentable position is determined from the first, second, and third segmentation labels: for example, the segmentable position may be determined from the estimated position information of the second segmentation label, or the position between the estimated position information of the first segmentation label and that of the third segmentation label may be determined as the segmentable position, or the whole area between the estimated position information of the first and third segmentation labels may be determined as the segmentable position; this is not limited here.
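The run-length rule above can be sketched as follows (three consecutive segmentable labels as the minimum run; the function name is an illustrative assumption):

    def segmentable_spans(labels, min_run=3):
        # labels: per-column 0/1 sequence (1 = segmentable); returns a list
        # of (start_x, end_x) spans with at least min_run consecutive 1s,
        # so isolated patterns such as "010"/"011" never split a character.
        spans, start = [], None
        for x, lab in enumerate(labels):
            if lab == 1 and start is None:
                start = x
            elif lab == 0 and start is not None:
                if x - start >= min_run:
                    spans.append((start, x))   # start/end are the cut limits
                start = None
        if start is not None and len(labels) - start >= min_run:
            spans.append((start, len(labels)))
        return spans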
Step S240: performing character segmentation on the text line region according to the segmentable positions to obtain at least one segmented character.
Character segmentation is performed on the text line region according to the segmentable positions to obtain at least one segmented character. In some embodiments, a location may be determined from each segmentable position and the text line region cut at that location. In other embodiments, the text line region may be character-cut according to the start and end positions of the segmentable positions: the image between the start and end position of each segmentable position is removed, and each remaining image strip is taken as a segmented character.
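Continuing the sketch above, the cutting step might remove each blank span and keep the remaining strips as segmented characters (an assumed implementation, reusing the spans from segmentable_spans):

    def cut_characters(line_img, spans):
        # line_img: 2-D array (height x width); spans: segmentable regions.
        chars, prev = [], 0
        for start, end in spans:
            if start > prev:
                chars.append(line_img[:, prev:start])  # keep character strip
            prev = end                                 # drop the blank region
        if prev < line_img.shape[1]:
            chars.append(line_img[:, prev:])
        return chars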
For example, referring to FIG. 4 again, the upper image is an input text line containing text characters (glossed "...according to the motor vehicle traffic accident..."), and the lower image corresponds to the output segmentation result, where the black areas correspond to segmentable labels, that is, segmentable positions, and the white areas may correspond to non-segmentable labels; the black areas correspond to blank characters and the white areas to text characters in the text line. Performing character segmentation on the text line region shown in FIG. 4 according to the segmentable positions yields segmented characters such as those glossed "real", "executed", "of", "root", "according to", "machine", "moving", "vehicle", "traffic", "accident", and "event", along with the punctuation marks.
Step S250: inputting the at least one segmented character into the second neural network to output the text recognized from the handwritten font.
In one embodiment, the second neural network may be constructed based on a residual neural network (ResNet). In one example, the input features of the second neural network are a 64x64 two-dimensional image matrix, the output features are a 3755x1 Softmax vector, and the second neural network may include 50 network layers.
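For illustration, a 50-layer residual classifier with the stated input and output sizes can be built by adapting torchvision's resnet50; the single-channel first layer and the other adaptation details are assumptions, and only the depth and sizes come from the text above:

    import torch.nn as nn
    from torchvision.models import resnet50

    def make_second_network(num_classes=3755):
        net = resnet50(weights=None)          # 50-layer residual network
        net.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2,
                              padding=3, bias=False)  # 1-channel 64x64 input
        net.fc = nn.Linear(net.fc.in_features, num_classes)  # 3755-way output
        return net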
It should be noted that, portions not described in detail in this embodiment may refer to the foregoing embodiments, and are not described herein again.
In the handwriting recognition method based on the neural network model provided by this embodiment, at least one text line is obtained by preprocessing the image to be recognized; the text line is input into the first neural network to obtain estimated position information of at least one character and the segmentation labels corresponding to the estimated position information; the segmentable positions are determined according to the segmentable labels; character segmentation is performed on the text line region according to the segmentable positions to obtain at least one segmented character; and finally the at least one segmented character is input into the second neural network to output the text corresponding to the recognized handwritten font. The segmentable positions are thus determined by the first neural network, the text line is segmented, and the segmented characters are recognized to output the recognized text, realizing handwriting recognition. Because segmentation and recognition are each handled by a dedicated neural network, the first neural network used for segmentation can be optimized as needed to improve segmentation accuracy, and the second neural network used for recognition can be optimized to improve recognition accuracy.
In addition, in some embodiments, after the text is recognized, the recognized text can be corrected through a preset language model, so that after the segmented characters are recognized, the recognized single character result is corrected and the recognition result is optimized by using the language model, and the recognition accuracy is improved. Specifically, referring to fig. 5, a method for handwriting recognition based on a neural network model according to another embodiment of the present application is shown, where the method includes: s310 to S360.
S310: preprocessing the image to be recognized to obtain at least one text line.
S320: inputting the text line into the first neural network to obtain estimated position information of at least one character and the segmentation label corresponding to the estimated position information.
S330: determining the segmentable positions according to the segmentable labels.
S340: performing character segmentation on the text line region according to the segmentable positions to obtain at least one segmented character.
S350: inputting the at least one segmented character into the second neural network to output the text recognized from the handwritten font.
S360: inputting the recognized text into a preset language model to obtain a corrected text.
The preset language model is used for correcting the expression reasonableness of the recognized text. After the segmented characters are recognized, the recognized single-character results are corrected and the recognition result is optimized by using the language model, improving the recognition accuracy.
In an embodiment, the preset language model may adopt an N-gram language model; the output text corresponding to the recognized handwritten font is input into the N-gram language model to obtain the corrected text. In one example, the recognized text includes a misrecognized character pair glossed "enter-people" (for instance, 入民 instead of 人民, "people"); after word segmentation with N = 2 (a 2-gram), adjacent characters are combined into pairs, the improbable pair is detected as an error by the 2-gram model, and it is corrected to "people".
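As a toy sketch of the 2-gram idea (the frequency table is invented for illustration; real systems estimate it from a large corpus):

    # Toy bigram frequencies; both entries are assumed values.
    bigram_freq = {("人", "民"): 0.9, ("入", "民"): 0.001}

    def correct_first_char(candidates, next_char):
        # Pick the candidate whose bigram with the following character is most
        # frequent, e.g. correcting the pair glossed "enter-people" to "people".
        return max(candidates, key=lambda c: bigram_freq.get((c, next_char), 1e-6))

    print(correct_first_char(["入", "人"], "民"))  # -> "人"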
In other embodiments, to further improve the correction effect, the preset language model may be replaced by a more complex neural network language model, such as a recurrent neural network-based language model.
It should be noted that, portions not described in detail in this embodiment may refer to the foregoing embodiments, and are not described herein again.
In addition, in some embodiments, after the text after the handwritten font is recognized is output, the evaluation of the handwriting recognition result by the user can be obtained, and the parameters of the neural network model are adjusted according to the evaluation result, so that the performance of the neural network model is optimized, and the recognition accuracy is further improved. Specifically, referring to fig. 6, a method for handwriting recognition based on a neural network model according to still another embodiment of the present application is shown, where the method includes:
step S410: and preprocessing the image to be recognized to obtain at least one text line.
Step S420: the line of text is input into a first neural network to obtain at least one segmented character.
Step S430: inputting the at least one segmented character into the second neural network to output the text recognized from the handwritten font.
Step S440: obtaining the user's evaluation result of the output text recognized from the handwritten font.
In some embodiments, the terminal device may display a text corresponding to the recognized handwritten word for the user to view, and may receive an evaluation result input by the user based on the terminal device. The evaluation result comprises an error character, a correct character corresponding to the error character and correct position information. The correct position information is position information corresponding to the correct character, and may be position information of the correct character in a text line, for example.
In one example, when the terminal device displays a text corresponding to the recognized handwritten word, the terminal device may correspondingly display the input image to be recognized or a text line corresponding to the image to be recognized, and the user may select a position or an area in the image to be recognized or the text line as the correct position information. For example, the image to be recognized actually includes "we are not one person on the road", but the text after the handwritten font is recognized as "three or two of us on the road", and the "not" character is erroneously divided into "three" and "two", at this time, the user can input the wrong characters as "three" and "two", and the correct character corresponding to the wrong character is "not". And as an implementation mode, the user can select the left boundary position and the right boundary position of 'no' in the image or text line to be recognized as the correct position information.
For another example, the image to be recognized actually includes "the motor vehicle is driving on Eighty-Five Road", but the text recognized from the handwritten font is "the motor vehicle is driving on Person-Five Road", where the character "eight" is erroneously recognized as "person". In this case, the user can input "person" as the wrong character and "eight" as the corresponding correct character. As an implementation, the user can select the left boundary position and the right boundary position of "eight" in the image or text line to be recognized as the correct position information.
It should be noted that the above is only an exemplary illustration, and the correct position information may also be at least one of a left boundary position, a right boundary position, an upper boundary position, a lower boundary position, an upper left corner position, a lower left corner position, an upper right corner position, a lower right corner position, and the like, which is not limited herein. In some examples, the aforementioned locations may correspond to coordinates in an image or text line to be recognized.
Step S450: taking the correct position information as the real position information, and obtaining a first loss function value corresponding to the error character.
The first loss function value corresponds to the first neural network and is used for measuring the error between the output of the first neural network corresponding to the error character and the real position information corresponding to the error character. In some embodiments, the first loss function value may employ cross-entropy.
It can be understood that if a segmentation error incorrectly splits one correct character into two error characters, the positions of the two error characters in the text line can be obtained. From these two positions, the output of the first neural network corresponding to the erroneous split, namely the segmentation position corresponding to the segmentation label, can be obtained and recorded as the erroneous segmentation position. The erroneous segmentation position may be a single coordinate or an area between two coordinates.
In some embodiments, the correct position information may be taken as the real position information, the output of the first neural network corresponding to the error character may be obtained to determine the erroneous segmentation position, and the first loss function value corresponding to the error character may then be computed from the correct position information and the erroneous segmentation position.
In one embodiment, in pixel units, the upper-left corner of the text line is taken as the coordinate origin (0,0), with the positive y-axis pointing downward and the positive x-axis pointing rightward. The x-coordinate of the correct position information is taken as the x-coordinate of the real position information; the output of the first neural network corresponding to the error character is obtained to determine the x-coordinate of the erroneous segmentation position between the error characters; and the first loss function value is obtained from the x-coordinate of the real position information and the x-coordinate of the erroneous segmentation position. For example, the difference between the x-coordinate of the erroneous segmentation position and the x-coordinate of the real position information may be taken as the first loss function value. In one example, if the correct position information includes a plurality of positions, the difference between the x-coordinate of the erroneous segmentation position and the x-coordinate of each correct position may be obtained, and the average of these differences taken as the first loss function value. In other examples, the cross-entropy between the x-coordinate of the erroneous segmentation position and the x-coordinate of the real position information may also be taken as the first loss function value.
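As a purely illustrative sketch, the averaged-difference variant described above can be written as follows; the function name and the use of absolute differences are assumptions, since the embodiment does not fix a sign convention for the difference:

```python
from typing import List

def first_loss_value(wrong_split_x: float, correct_xs: List[float]) -> float:
    """Average (absolute) difference between the erroneous segmentation
    x-coordinate produced by the first neural network and the x-coordinates
    of the user-supplied correct position information (pixel units, origin
    at the upper-left corner of the text line)."""
    diffs = [abs(wrong_split_x - x) for x in correct_xs]
    return sum(diffs) / len(diffs)

# Example: the network split at x=135, but the user marked the correct
# character's left/right boundaries at x=120 and x=152.
loss = first_loss_value(135.0, [120.0, 152.0])  # (15 + 17) / 2 = 16.0
```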
In some examples, the correct position information includes a left boundary position and a right boundary position; the x-coordinates of these two positions are taken as the x-coordinates of the real position information, and after segmentation is performed at these two x-coordinates, the correct character located between them can be segmented out.
In some embodiments, the evaluation result may not include correct position information, that is, the recognized text may contain no segmentation error. In this case the first loss function value may be taken as 0, so that the network parameters of the first neural network need not be adjusted subsequently.
Step S460: taking the correct character as the real character, and obtaining a second loss function value corresponding to the error character.
The second loss function value corresponds to the second neural network and is used for measuring the error between the output of the second neural network corresponding to the error character and the real character corresponding to the error character. Since the second neural network performs recognition, the second loss function value is used to measure the recognition accuracy and guide its adjustment. In some embodiments, the second loss function value may employ cross-entropy.
In some embodiments, the recognized text may contain no recognition error. In this case the second loss function value may be taken as 0, so that the network parameters of the second neural network need not be adjusted subsequently.
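By way of a non-limiting example, since the second neural network acts as a classifier over the character set, the cross-entropy mentioned above can be computed from its output distribution for the error character; the sketch below assumes the network exposes per-character probabilities, which is an assumption made for this example:

```python
import math
from typing import Dict

def second_loss_value(char_probs: Dict[str, float], real_char: str) -> float:
    """Cross-entropy of the second neural network's output distribution
    against the user-supplied real character (one-hot target): the loss is
    -log p(real_char). A probability floor avoids log(0)."""
    p = max(char_probs.get(real_char, 0.0), 1e-12)
    return -math.log(p)

# Example: the network put most mass on "person" while the real character
# is "eight".
probs = {"person": 0.90, "eight": 0.08, "six": 0.02}
loss = second_loss_value(probs, "eight")  # -ln(0.08) ≈ 2.53
```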
Step S470: comparing the first loss function value and the second loss function value with a preset threshold respectively, determining any loss function value exceeding the preset threshold as a target loss function value, and determining the neural network corresponding to the target loss function value as a target neural network.
The preset threshold can be set as needed. It can be understood that, within a certain range, the higher the preset threshold, the looser the accuracy requirement on the neural network; the lower the preset threshold, the stricter the accuracy requirement. In some examples, the preset threshold may be 2.5, 3.5, 5, and so on, which is not limited herein.
If the first loss function value exceeds the preset threshold, the first loss function value is determined as a target loss function value and the first neural network as a target neural network, so that the first neural network corresponding to the first loss function value can be adjusted. In this way, the segmentation effect of the first neural network can be judged by comparing the first loss function value with the preset threshold; if the first loss function value exceeds the preset threshold, the segmentation effect can be considered poor.
If the second loss function value exceeds the preset threshold, the second loss function value is determined as a target loss function value and the second neural network as a target neural network, so that the second neural network corresponding to the second loss function value can be adjusted. In this way, the recognition effect of the second neural network can be judged by comparing the second loss function value with the preset threshold; if the second loss function value exceeds the preset threshold, the recognition effect can be considered poor.
Therefore, by obtaining the first loss function value and the second loss function value and comparing each with the preset threshold, it can be determined whether the currently output recognized text is accurate. When it is inaccurate, it can further be determined whether the cause is inaccurate segmentation by the first neural network, inaccurate recognition by the second neural network, or both. The inaccurate neural network is then determined as the target neural network, and the loss function value exceeding the preset threshold as the target loss function value, so that subsequent repair can be targeted: training and optimization continue by adjusting only the network parameters that need it. This greatly improves the training efficiency of the neural networks and helps improve the accuracy of handwriting recognition.
Step S480: adjusting the network parameters of the target neural network based on the target loss function value, and using the adjusted target neural network for the next handwriting recognition.
By comparing the first loss function value and the second loss function value with the preset threshold respectively, it can be determined whether the cause of an inaccurate final output text is poor segmentation by the first neural network, poor recognition by the second neural network, or both. The loss function value exceeding the preset threshold is determined as the target loss function value, and the corresponding neural network as the target neural network; the network parameters of the target neural network can then be adjusted based on the target loss function value, and the adjusted target neural network used for the next handwriting recognition. Through such targeted training and optimization, the accuracy of handwriting recognition using the first neural network and the second neural network is continuously improved.
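For illustration only, the selection and adjustment logic of steps S470 and S480 can be sketched as follows; the container names and the adjust_parameters hook are hypothetical placeholders, and in practice the adjustment would be a gradient step driven by the target loss function value:

```python
from typing import Callable, Dict

def select_and_adjust(loss_values: Dict[str, float],
                      networks: Dict[str, object],
                      adjust_parameters: Callable[[object, float], None],
                      preset_threshold: float = 3.5) -> None:
    """Compare each loss function value with the preset threshold (S470);
    every network whose loss exceeds the threshold becomes a target neural
    network and has its parameters adjusted based on the target loss
    function value (S480)."""
    for name, loss in loss_values.items():
        if loss > preset_threshold:
            # this network segmented or recognized poorly, so adjust it
            adjust_parameters(networks[name], loss)

# Example: only the second (recognition) network exceeds the threshold,
# so only it is adjusted before the next handwriting recognition.
select_and_adjust(loss_values={"first": 0.0, "second": 4.1},
                  networks={"first": object(), "second": object()},
                  adjust_parameters=lambda net, loss: None)  # placeholder hook
```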
It should be noted that, portions not described in detail in this embodiment may refer to the foregoing embodiments, and are not described herein again.
On the basis of the foregoing embodiments, the handwriting recognition method based on a neural network model according to this embodiment obtains the user's evaluation of the handwriting recognition result and adjusts the parameters of the neural network model according to the evaluation result, so that the performance of the neural network model can be continuously optimized during repeated use and the recognition accuracy further improved. Moreover, two neural networks are used for segmentation and recognition respectively: the first neural network performs segmentation, and the second neural network recognizes the segmented characters. A first loss function value for the first neural network and a second loss function value for the second neural network can therefore be obtained from the final recognition result and compared with the preset threshold respectively, so as to determine whether a character recognition error was caused by inaccurate segmentation by the first neural network or inaccurate recognition by the second neural network. The inaccurate network can then be adjusted in a targeted manner, which improves the accuracy of the next handwriting recognition, continuously optimizes the handwriting recognition effect, and improves the training efficiency of the neural networks.
In addition, an embodiment of the present application further provides a training method for the neural network model. Specifically, referring to fig. 7, a training method of the neural network model according to yet another embodiment of the present application is shown; the method may include steps S510 to S550.
Step S510: inputting a training text line into the first neural network to obtain at least one segmented character, taking the real position information of each character in the training text line and the corresponding real segmentation label as the expected output of the first neural network, and obtaining a first loss function from the expected output and the actual output of the first neural network.
In some embodiments, the neural network model may be trained using training handwritten text. The training handwritten text comprises a training text line, each real character in the training text line, the real position information of each real character, and the corresponding real segmentation labels. The training handwritten text may be derived from handwritten documents. In some examples, 5000 manuscripts containing handwritten Chinese characters can be collected, and the training handwritten text obtained after labeling them; specifically, each real character in a training text line, the real position information of each real character in the training text line, and the corresponding real segmentation labels can be labeled.
The real position information of a real character comprises the coordinates of the real character, and the real segmentation labels for cutting between characters can be obtained from the coordinates of each real character.
In some embodiments, if two real characters overlap, the overlapping portion may be treated as the partitionable label.
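For illustration only, the following sketch derives real segmentation labels from the labeled character coordinates; the data layout and the rule for overlapping characters follow the description above, while the names and the treatment of gaps between non-overlapping characters are assumptions made for this example:

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class RealChar:
    char: str
    left_x: int   # real position information: left boundary (pixels)
    right_x: int  # real position information: right boundary (pixels)

def segmentation_labels(chars: List[RealChar]) -> List[Tuple[int, int]]:
    """Derive real segmentation labels (x-ranges where a cut is allowed)
    from the coordinates of consecutive real characters. If two characters
    overlap, the overlapping portion is treated as the partitionable label."""
    labels = []
    for a, b in zip(chars, chars[1:]):
        if a.right_x >= b.left_x:        # overlap: cut anywhere inside it
            labels.append((b.left_x, a.right_x))
        else:                            # gap: cut anywhere in the gap
            labels.append((a.right_x, b.left_x))
    return labels

line = [RealChar("we", 0, 30), RealChar("are", 28, 60), RealChar("on", 70, 95)]
print(segmentation_labels(line))  # [(28, 30), (60, 70)]
```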
Optionally, the greater the difference between the actual output of the first neural network and the expected output, the greater the first loss function value, indicating a greater need to adjust the current network parameters of the first neural network. In some embodiments, the first loss function value may comprise a first cross-entropy.
Step S520: inputting the at least one segmented character into the second neural network, taking each real character in the training text line as the expected output, and obtaining a second loss function from the expected output and the actual output of the second neural network.
Optionally, the greater the difference between the actual output of the second neural network and the expected output, the greater the second loss function value, indicating a greater need to adjust the current network parameters of the second neural network. In some embodiments, the second loss function value may comprise a second cross-entropy.
Step S530: judging whether the first loss function and the second loss function both satisfy a preset convergence condition and whether the number of iterations exceeds a preset number.
The number of iterations may be the number of times the network parameters of the neural network model are adjusted within one round of model training. In one example, the number of iterations may be the number of consecutive executions of steps S510 to S540, with each execution of step S550 completing one round of model training. The preset number can be set as needed. Using the number of iterations as a further basis for stopping training ensures that a round of training only stops after at least the preset number of iterations, which avoids stopping too early, before the actual training effect is reached, merely because both loss functions happen to satisfy the preset convergence condition in the early stage of training.
In one embodiment, whether the first loss function satisfies the preset convergence condition may be determined by judging whether the first loss function is smaller than a first loss threshold; when it is, the first loss function is determined to satisfy the preset convergence condition. Likewise, whether the second loss function satisfies the preset convergence condition may be determined by judging whether the second loss function is smaller than a second loss threshold; when it is, the second loss function is determined to satisfy the preset convergence condition. Further, when the first loss function is smaller than the first loss threshold and the second loss function is smaller than the second loss threshold, both are determined to satisfy the preset convergence condition. The first loss threshold and the second loss threshold may be the same or different, which is not limited herein.
In some embodiments, when the number of iterations exceeds the preset number, it may first be determined whether the first loss function satisfies the preset convergence condition, and then, if it does, whether the second loss function satisfies the preset convergence condition. In still other embodiments, the order may be reversed: when the number of iterations exceeds the preset number, it may first be determined whether the second loss function satisfies the preset convergence condition, and then, if it does, whether the first loss function satisfies the preset convergence condition.
In this embodiment, after judging whether the first loss function and the second loss function both satisfy the preset convergence condition and whether the number of iterations exceeds the preset number, the method may further include:
when at least one of the first loss function and the second loss function does not satisfy the preset convergence condition, or the number of iterations does not exceed the preset number, step S540 may be performed;
when both the first loss function and the second loss function satisfy the preset convergence condition and the number of iterations exceeds the preset number, step S550 may be performed.
Step S540: adjusting the model parameters of the neural network model according to the judgment of whether the first loss function and the second loss function satisfy the preset convergence condition, and obtaining the next training text line and inputting it into the first neural network for the next training.
That is, when at least one of the first loss function and the second loss function does not satisfy the preset convergence condition, or the number of iterations does not exceed the preset number, the model parameters of the neural network model can be adjusted according to which loss functions fail the preset convergence condition, and the next training text line obtained and input into the first neural network for the next training.
Step S550: stopping training the neural network model and obtaining the trained neural network model for handwriting recognition.
In this way, both loss functions satisfying the preset convergence condition and the number of iterations exceeding the preset number together constitute the condition for finishing model training: when both hold, training of the neural network model is stopped and the trained neural network model is obtained for handwriting recognition.
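For illustration only, the overall loop of steps S510 to S550 can be outlined as follows; all names are placeholders, and the two networks are abstracted behind callables, so this is a schematic sketch rather than a definitive implementation:

```python
from typing import Callable, Tuple

def train(step: Callable[[], Tuple[float, float]],
          adjust_first: Callable[[float], None],
          adjust_second: Callable[[float], None],
          first_loss_threshold: float = 0.1,
          second_loss_threshold: float = 0.1,
          preset_number: int = 500) -> int:
    """Schematic loop for steps S510-S550: `step` runs S510 and S520 on the
    next training text line and returns (first_loss, second_loss); training
    stops only when both losses satisfy the convergence condition (are below
    their thresholds) AND the number of iterations exceeds the preset number."""
    iterations = 0
    while True:
        loss1, loss2 = step()                      # S510 + S520
        iterations += 1
        if (loss1 < first_loss_threshold           # S530: convergence check
                and loss2 < second_loss_threshold
                and iterations > preset_number):
            return iterations                      # S550: stop training
        if loss1 >= first_loss_threshold:          # S540: targeted adjustment
            adjust_first(loss1)
        if loss2 >= second_loss_threshold:
            adjust_second(loss2)

# Toy usage: both losses decay geometrically, so training stops at the first
# iteration after the preset number at which both losses are under threshold.
state = {"l1": 1.0, "l2": 1.0}
def fake_step():
    state["l1"] *= 0.99
    state["l2"] *= 0.99
    return state["l1"], state["l2"]

print(train(fake_step, lambda l: None, lambda l: None))  # prints 501
```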
It should be noted that, portions not described in detail in this embodiment may refer to the foregoing embodiments, and are not described herein again.
In some embodiments, referring to fig. 8, a method for training a neural network model according to still another embodiment of the present application includes:
Step S610: inputting the training text line into the first neural network to obtain at least one segmented character, taking the real position information of each character in the training text line and the corresponding real segmentation label as the expected output of the first neural network, and obtaining a first loss function from the expected output and the actual output of the first neural network.
Step S620: inputting the at least one segmented character into the second neural network, taking each real character in the training text line as the expected output, and obtaining a second loss function from the expected output and the actual output of the second neural network.
Step S630: judging whether the first loss function and the second loss function both satisfy a preset convergence condition and whether the number of iterations exceeds a preset number.
Step S640a: if the first loss function does not satisfy the preset convergence condition, adjusting the parameters of the first neural network.
Step S640b: if the second loss function does not satisfy the preset convergence condition, adjusting the parameters of the second neural network.
In this way, network parameters can be adjusted in a targeted manner according to whether each loss function satisfies the preset convergence condition: when segmentation is poor, the parameters of the first neural network are adjusted so as to reduce the first loss function obtained next time; when recognition is poor, the parameters of the second neural network are adjusted so as to reduce the second loss function obtained next time. By continuously adjusting the network parameters of the neural network model in this way, the handwriting recognition effect of the neural network model is improved and the recognition accuracy increased.
In some embodiments, the network parameters of the neural network model may be adjusted and optimized based on adaptive moment estimation (Adam). In one embodiment, for the first neural network, the momentum factor may be set to 0.97, the base learning rate to 0.0001, and the training batch size (BATCH_SIZE) to 10, i.e., one training iteration of the first neural network requires obtaining 10 training text lines and inputting them into the first neural network. For the second neural network, the momentum factors can be set to 0.9 and 0.999, the base learning rate to 0.0001, and the training batch size to 32, i.e., one training iteration of the second neural network requires 32 characters. Training the neural network model by adjusting the network parameters of the first neural network and the second neural network in this way improves the generalization ability of handwriting recognition based on the neural network model and improves the recognition effect.
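As one concrete, non-limiting reading of these hyperparameters, the following sketch configures two optimizers in PyTorch; the use of PyTorch, the interpretation of the single momentum factor 0.97 as Adam's first-moment coefficient beta1, and the placeholder network definitions are all assumptions made for this example:

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the segmentation and recognition networks.
first_net = nn.Linear(64, 2)    # placeholder first (segmentation) network
second_net = nn.Linear(64, 10)  # placeholder second (recognition) network

# First neural network: momentum factor 0.97 (read here as beta1),
# base learning rate 0.0001, and a batch of 10 text lines per iteration.
optimizer_first = torch.optim.Adam(first_net.parameters(),
                                   lr=1e-4, betas=(0.97, 0.999))
FIRST_BATCH_SIZE = 10

# Second neural network: momentum factors 0.9 and 0.999,
# base learning rate 0.0001, and a batch of 32 characters per iteration.
optimizer_second = torch.optim.Adam(second_net.parameters(),
                                    lr=1e-4, betas=(0.9, 0.999))
SECOND_BATCH_SIZE = 32
```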
Step S650: stopping training the neural network model and obtaining the trained neural network model for handwriting recognition.
It should be noted that, portions not described in detail in this embodiment may refer to the foregoing embodiments, and are not described herein again.
Referring to fig. 9, fig. 9 shows a block diagram of a handwriting recognition apparatus based on a neural network model according to an embodiment of the present application. As explained below with respect to the block diagram shown in fig. 9, the neural network model-based handwriting recognition apparatus 900 includes a preprocessing module 910, a segmentation module 920, and a recognition module 930, and the neural network model comprises a first neural network and a second neural network, wherein:
the preprocessing module 910 is configured to preprocess the image to be recognized to obtain at least one text line, where the image to be recognized includes a handwritten font;
a segmentation module 920, configured to input the text line into the first neural network to obtain at least one segmented character;
a recognition module 930, configured to input the at least one segmented character into the second neural network to output the text recognized from the handwritten font.
The handwriting recognition device based on the neural network model provided by the embodiment of the application is used for realizing the corresponding handwriting recognition method based on the neural network model in the embodiment of the method, has the beneficial effects of the corresponding method embodiment, and is not repeated herein.
It can be clearly understood by those skilled in the art that the handwriting recognition device based on the neural network model provided in the embodiment of the present application can implement each process in the method embodiment of fig. 2, and for convenience and brevity of description, the specific working processes of the above-described device and module may refer to the corresponding processes in the foregoing method embodiment, and are not described herein again.
Referring to fig. 10, fig. 10 shows a block diagram of a handwriting recognition apparatus based on a neural network model according to another embodiment of the present application. As explained below with respect to the block diagram shown in fig. 10, the neural network model-based handwriting recognition apparatus 1000 includes a preprocessing module 1010, a segmentation module 1020, and a recognition module 1030, and the neural network model comprises a first neural network and a second neural network, wherein:
the preprocessing module 1010 is configured to preprocess the image to be recognized to obtain at least one text line, where the image to be recognized includes a handwritten font;
a segmentation module 1020 for inputting the text line into the first neural network to obtain at least one segmented character;
a recognition module 1030, configured to input the at least one segmented character into the second neural network to output the text recognized from the handwritten font.
Further, the segmentation module 1020 includes: a text line input sub-module 1021, a position determination sub-module 1022, and a character cutting sub-module 1023, wherein:
a text line input submodule 1021, configured to input the text line into the first neural network and obtain estimated position information of at least one character and the segmentation label corresponding to the estimated position information, wherein the segmentation label includes a divisible label;
a position determining submodule 1022, configured to determine a divisible position according to the divisible label;
a character cutting submodule 1023, configured to perform character segmentation on the text line area according to the divisible positions to obtain at least one segmented character.
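For illustration only, the cutting performed by this submodule can be sketched as follows; the representation of the text line as a NumPy array and the prior extraction of divisible x-coordinates from the segmentation labels are assumptions made for this example:

```python
from typing import List
import numpy as np

def cut_characters(text_line: np.ndarray,
                   divisible_xs: List[int]) -> List[np.ndarray]:
    """Cut a text-line image (H x W, grayscale) into character images at the
    divisible positions determined from the first network's segmentation labels."""
    bounds = [0] + sorted(divisible_xs) + [text_line.shape[1]]
    return [text_line[:, a:b] for a, b in zip(bounds, bounds[1:]) if b > a]

# Example: a 32x100 text line cut at x=30 and x=62 yields three characters.
line = np.zeros((32, 100), dtype=np.uint8)
chars = cut_characters(line, [30, 62])
print([c.shape for c in chars])  # [(32, 30), (32, 32), (32, 38)]
```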
Further, the neural network model-based handwriting recognition apparatus 1000 further includes: an evaluation result obtaining module 1041, a first loss obtaining module 1042, a second loss obtaining module 1043, a target network determining module 1044, and a network parameter adjusting module 1045, wherein:
an evaluation result obtaining module 1041, configured to obtain an evaluation result of the text after the user identifies the handwritten font based on the output, where the evaluation result includes an error character, a correct character corresponding to the error character, and correct position information;
a first loss obtaining module 1042, configured to use the correct position information as real position information, to obtain a first loss function value corresponding to the error character, where the first loss function value corresponds to the first neural network, and is used to measure an error between an output of the first neural network corresponding to the error character and the real position information corresponding to the error character;
a second loss obtaining module 1043, configured to use the correct character as a real character, obtain a second loss function value corresponding to the incorrect character, where the second loss function value corresponds to the second neural network, and is used to measure an error between an output of the second neural network corresponding to the incorrect character and the real character corresponding to the incorrect character;
a target network determining module 1044, configured to compare the first loss function value and the second loss function value with a preset threshold, determine a loss function value exceeding the preset threshold as a target loss function value, and determine a neural network corresponding to the target loss function value as a target neural network;
a network parameter adjusting module 1045, configured to adjust a network parameter of the target neural network based on the target loss function value, and use the adjusted target neural network for next handwriting recognition.
Further, the target network determination module 1044 includes: a first network determination submodule and a second network determination submodule, wherein:
a first network determining submodule, configured to determine the first loss function value as a target loss function value if the first loss function value exceeds a preset threshold, and determine the first neural network as a target neural network, so as to adjust the first neural network corresponding to the first loss function value;
and the second network determining submodule is used for determining the second loss function value as a target loss function value if the second loss function value exceeds a preset threshold value, and determining the second neural network as a target neural network so as to adjust the second neural network corresponding to the second loss function value.
Further, the neural network model-based handwriting recognition apparatus 1000 further includes a neural network training module 1050 and a recognized text correction module 1060, wherein:
the neural network training module 1050 is configured to train the neural network model using training handwritten text, where the training handwritten text comprises a training text line, each real character in the training text line, the real position information of each real character, and the corresponding real segmentation labels;
the recognized text correction module 1060 is configured to input the recognized text into a preset language model to obtain corrected text, where the preset language model is used for correcting the expression reasonableness of the recognized text.
Further, the neural network training module 1050 includes: the device comprises a first input submodule, a second input submodule, a first judgment submodule and a second judgment submodule, wherein:
the first input submodule is used for inputting the training text line into the first neural network to obtain at least one segmented character, taking the real position information of each character in the training text line and the corresponding real segmentation label as the expected output of the first neural network, and acquiring a first loss function according to the expected output and the actual output of the first neural network;
the second input submodule is used for inputting the at least one segmented character into the second neural network, taking each real character in the training text line as expected output, and acquiring a second loss function according to the expected output and the actual output of the second neural network;
a first judging submodule, configured to, when at least one of the first loss function and the second loss function does not satisfy a preset convergence condition or the iteration number does not exceed a preset number, adjust a model parameter of the neural network model according to a judgment result of whether the first loss function and the second loss function satisfy the preset convergence condition, and obtain a next training text line, and input the next training text line into the first neural network for a next training;
and the second judgment submodule is used for stopping the training of the neural network model and obtaining the trained neural network model for handwriting recognition when the first loss function and the second loss function both meet a preset convergence condition and the iteration times exceed the preset times.
The handwriting recognition device based on the neural network model provided by the embodiment of the application is used for realizing the corresponding handwriting recognition method based on the neural network model in the embodiment of the method, has the beneficial effects of the corresponding method embodiment, and is not repeated herein. It can be clearly understood by those skilled in the art that the handwriting recognition device based on the neural network model provided in the embodiment of the present application can implement each process in the method embodiments of fig. 2 to fig. 8, and for convenience and simplicity of description, the specific working processes of the above-described device and module may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, the coupling or direct coupling or communication connection between the modules shown or discussed may be through some interfaces, and the indirect coupling or communication connection between the devices or modules may be in an electrical, mechanical or other form.
In addition, functional modules in the embodiments of the present application may be integrated into one processing module, or each of the modules may exist alone physically, or two or more modules are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode.
An embodiment of the present application provides an electronic device, which includes a processor and a memory, where at least one instruction, at least one program, a set of codes, or a set of instructions is stored in the memory and is loaded and executed by the processor to implement the neural network model-based handwriting recognition method described with reference to fig. 2 to fig. 8 in the foregoing method embodiments. In this embodiment, the electronic device may be any device capable of running an application, such as a mobile phone, a tablet, a computer, a wearable device, or a server; for specific implementation, refer to the methods described in the foregoing method embodiments.
The memory may be used to store software programs and modules, and the processor executes various functional applications and performs data processing by running the software programs and modules stored in the memory. The memory mainly includes a program storage area and a data storage area: the program storage area may store the operating system, the application programs required by at least one function, and the like, while the data storage area may store data created according to the use of the device, and the like. In addition, the memory may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. Accordingly, the memory may also include a memory controller to provide the processor with access to the memory.
Referring to fig. 11, a block diagram of a mobile terminal according to an embodiment of the present application is shown. The electronic device 1100 in the present application may include one or more of the following components: a processor 1110, a memory 1120, and one or more applications, where the one or more applications may be stored in the memory 1120 and configured to be executed by the one or more processors 1110, and the one or more programs are configured to perform the methods described in the foregoing method embodiments.
The processor 1110 may include one or more processing cores. Using various interfaces and circuitry, the processor 1110 connects the various components of the electronic device 1100, and performs the various functions and data processing of the electronic device 1100 by running or executing the instructions, programs, code sets, or instruction sets stored in the memory 1120 and invoking the data stored in the memory 1120. Optionally, the processor 1110 may be implemented in hardware using at least one of Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA), and Programmable Logic Array (PLA). The processor 1110 may integrate one or a combination of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a modem, and the like, where the CPU mainly handles the operating system, the user interface, the application programs, and so on; the GPU is responsible for rendering and drawing display content; and the modem handles wireless communication. It can be appreciated that the modem may also be implemented as a separate communication chip rather than being integrated into the processor 1110.
The memory 1120 may include Random Access Memory (RAM) or Read-Only Memory (ROM). The memory 1120 may be used to store instructions, programs, code, code sets, or instruction sets. The memory 1120 may include a program storage area and a data storage area, where the program storage area may store instructions for implementing the operating system, instructions for implementing at least one function (such as a touch function, a sound playing function, or an image playing function), instructions for implementing the foregoing method embodiments, and the like. The data storage area may store data created during use of the electronic device 1100 (such as phone books, audio and video data, and chat log data), and the like.
Further, the electronic device 1100 may further include a foldable display screen, which may be a Liquid Crystal Display (LCD), an Organic Light-Emitting Diode (OLED) display, or the like. The display screen is used to display information entered by the user, information provided to the user, and various graphical user interfaces, which may be composed of graphics, text, icons, numbers, video, and any combination thereof.
Those skilled in the art will appreciate that the configuration shown in fig. 11 is a block diagram of only a portion of the configuration associated with the present application and does not constitute a limitation on the mobile terminal to which the present application applies, and that a particular mobile terminal may include more or less components than those shown in fig. 11, or may combine certain components, or have a different arrangement of components.
Referring to fig. 12, a block diagram of a computer-readable storage medium according to an embodiment of the present application is shown. The computer readable storage medium 1200 has stored therein a program code 1210, said program code 1210 being invokable by a processor for performing the method described in the above method embodiments.
The computer-readable storage medium 1200 may be an electronic memory such as a flash memory, an EEPROM (electrically erasable programmable read only memory), an EPROM, a hard disk, or a ROM. Optionally, the computer-readable storage medium 1200 includes a non-transitory computer-readable storage medium. The computer readable storage medium 1200 has storage space for program code 1210 that performs any of the method steps described above. The program code can be read from or written to one or more computer program products. The program code 1210 may be compressed, for example, in a suitable form.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a smart gateway, a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present application.
While the present embodiments have been described with reference to the accompanying drawings, the present embodiments are not limited to the above embodiments, which are merely illustrative and not restrictive, and those skilled in the art can make various changes and modifications without departing from the spirit and scope of the present invention.

Claims (10)

1. A handwriting recognition method based on a neural network model, wherein the neural network model comprises a first neural network and a second neural network, the method comprising:
preprocessing an image to be recognized to obtain at least one text line, wherein the image to be recognized comprises a handwritten font;
inputting the text line into the first neural network to obtain at least one segmented character; and
inputting the at least one segmented character into the second neural network to output the corresponding text recognized from the handwritten font.
2. The method of claim 1, wherein said inputting the text line into the first neural network to obtain at least one segmented character comprises:
inputting the text line into the first neural network, and obtaining estimated position information of at least one character and the segmentation label corresponding to the estimated position information, wherein the segmentation label comprises a divisible label;
determining a divisible position according to the divisible label; and
performing character segmentation on the text line region according to the divisible position to obtain at least one segmented character.
3. The method of claim 1, wherein after inputting the at least one segmented character into the second neural network to output the text recognized from the handwritten font, the method comprises:
obtaining the user's evaluation result of the output text recognized from the handwritten font, wherein the evaluation result comprises an error character, the correct character corresponding to the error character, and correct position information;
taking the correct position information as real position information, and obtaining a first loss function value corresponding to the error character, wherein the first loss function value corresponds to the first neural network and is used for measuring the error between the output of the first neural network corresponding to the error character and the real position information corresponding to the error character;
taking the correct character as the real character, and obtaining a second loss function value corresponding to the error character, wherein the second loss function value corresponds to the second neural network and is used for measuring the error between the output of the second neural network corresponding to the error character and the real character corresponding to the error character;
comparing the first loss function value and the second loss function value with a preset threshold respectively, determining the loss function value exceeding the preset threshold as a target loss function value, and determining the neural network corresponding to the target loss function value as a target neural network; and
adjusting the network parameters of the target neural network based on the target loss function value, and using the adjusted target neural network for the next handwriting recognition.
4. The method of claim 3, wherein the comparing the first loss function value and the second loss function value with a preset threshold respectively, determining the loss function value exceeding the preset threshold as a target loss function value, and determining the neural network corresponding to the target loss function value as a target neural network comprises:
if the first loss function value exceeds the preset threshold, determining the first loss function value as a target loss function value and the first neural network as a target neural network, so as to adjust the first neural network corresponding to the first loss function value; and
if the second loss function value exceeds the preset threshold, determining the second loss function value as a target loss function value and the second neural network as a target neural network, so as to adjust the second neural network corresponding to the second loss function value.
5. The method of claim 1, further comprising:
training the neural network model by using a training handwritten text, wherein the training handwritten text comprises a training text line, each real character in the training text line, real position information of each real character and a corresponding real segmentation label;
the training the neural network model using training handwritten text comprises:
inputting the training text line into the first neural network to obtain at least one segmented character, taking the real position information of each character in the training text line and the corresponding real segmentation label as the expected output of the first neural network, and obtaining a first loss function according to the expected output and the actual output of the first neural network;
inputting the at least one segmented character into the second neural network, taking each real character in the training text line as expected output, and acquiring a second loss function according to the expected output and the actual output of the second neural network;
when at least one of the first loss function and the second loss function does not satisfy a preset convergence condition, or the number of iterations does not exceed a preset number, adjusting the model parameters of the neural network model according to the judgment of whether the first loss function and the second loss function satisfy the preset convergence condition, and obtaining the next training text line and inputting it into the first neural network for the next training; and
when the first loss function and the second loss function both satisfy the preset convergence condition and the number of iterations exceeds the preset number, stopping training the neural network model and obtaining the trained neural network model for handwriting recognition.
6. The method of claim 5, wherein the adjusting the model parameters of the neural network model according to the judgment of whether the first loss function and the second loss function satisfy the preset convergence condition comprises:
if the first loss function does not meet a preset convergence condition, adjusting parameters of the first neural network;
and if the second loss function does not meet the preset convergence condition, adjusting the parameters of the second neural network.
7. The method of claim 1, wherein after inputting the at least one segmented character into the second neural network to output the text recognized from the handwritten font, the method further comprises:
inputting the recognized text into a preset language model to obtain corrected text, wherein the preset language model is used for correcting the expression reasonableness of the recognized text.
8. An apparatus for handwriting recognition based on a neural network model, wherein the neural network model comprises a first neural network and a second neural network, the apparatus comprising:
a preprocessing module, configured to preprocess an image to be recognized to obtain at least one text line, wherein the image to be recognized comprises a handwritten font;
a segmentation module, configured to input the text line into the first neural network to obtain at least one segmented character; and
a recognition module, configured to input the at least one segmented character into the second neural network to output the text recognized from the handwritten font.
9. An electronic device, comprising:
a memory;
one or more processors coupled with the memory;
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs being configured to perform the method of any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that a program code is stored in the computer-readable storage medium, which program code, when executed by a processor, implements the method according to any one of claims 1 to 7.
CN201911143048.7A 2019-11-20 2019-11-20 Handwriting recognition method and device based on neural network model and electronic equipment Pending CN110942004A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911143048.7A CN110942004A (en) 2019-11-20 2019-11-20 Handwriting recognition method and device based on neural network model and electronic equipment

Publications (1)

Publication Number Publication Date
CN110942004A (en) 2020-03-31

Family

ID=69907787

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911143048.7A Pending CN110942004A (en) 2019-11-20 2019-11-20 Handwriting recognition method and device based on neural network model and electronic equipment

Country Status (1)

Country Link
CN (1) CN110942004A (en)

Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102663378A (en) * 2012-03-22 2012-09-12 杭州新锐信息技术有限公司 Method for identifying joined-up handwritten characters
US9378435B1 (en) * 2014-06-10 2016-06-28 David Prulhiere Image segmentation in optical character recognition using neural networks
CN105447522A (en) * 2015-11-25 2016-03-30 成都数联铭品科技有限公司 Complex image character recognition system
CN106295646A (en) * 2016-08-10 2017-01-04 东方网力科技股份有限公司 License plate character segmentation method and device based on deep learning
CN108764242A (en) * 2018-05-21 2018-11-06 浙江工业大学 Offline handwritten Chinese character style recognition method based on deep convolutional neural networks
CN108985297A (en) * 2018-06-04 2018-12-11 平安科技(深圳)有限公司 Handwriting model training and handwritten-image recognition method, apparatus, device and medium
CN109242025A (en) * 2018-09-14 2019-01-18 北京旷视科技有限公司 Model iterative correction method, apparatus and system
CN109325464A (en) * 2018-10-16 2019-02-12 上海翎腾智能科技有限公司 Finger-pointing reading character recognition and translation method based on artificial intelligence
CN109376658A (en) * 2018-10-26 2019-02-22 信雅达系统工程股份有限公司 OCR method based on deep learning
CN109727363A (en) * 2018-11-16 2019-05-07 恒银金融科技股份有限公司 Method for recognizing Chinese-character amounts on bills
CN109740627A (en) * 2018-11-27 2019-05-10 南京邮电大学 Insect image recognition system and method based on parallel convolutional neural networks
CN109740605A (en) * 2018-12-07 2019-05-10 天津大学 Handwritten Chinese text recognition method based on CNN
CN109726715A (en) * 2018-12-27 2019-05-07 信雅达系统工程股份有限公司 Character image serialized recognition and structured data output method
CN110427896A (en) * 2019-08-07 2019-11-08 成都理工大学 Intelligent garbage classification system based on convolutional neural networks

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
E Dawei et al.: "Python Programming and Application Tutorial" (Computer Planning Textbook for Higher Education Institutions in Fujian Province), Xiamen University Press, pages 2-10 *

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113537418A (en) * 2020-04-14 2021-10-22 天津科技大学 Identification system for handwritten Chinese characters
CN111626383B (en) * 2020-05-29 2023-11-07 Oppo广东移动通信有限公司 Font identification method and device, electronic equipment and storage medium
CN111626383A (en) * 2020-05-29 2020-09-04 Oppo广东移动通信有限公司 Font identification method and device, electronic equipment and storage medium
CN114187598A (en) * 2020-08-25 2022-03-15 合肥本源量子计算科技有限责任公司 Handwritten digit recognition method, system, device and computer readable storage medium
CN114187598B (en) * 2020-08-25 2024-02-09 本源量子计算科技(合肥)股份有限公司 Handwriting digital recognition method, handwriting digital recognition equipment and computer readable storage medium
CN112241994A (en) * 2020-09-28 2021-01-19 北京迈格威科技有限公司 Model training method, rendering device, electronic equipment and storage medium
CN112241994B (en) * 2020-09-28 2024-05-31 爱芯元智半导体股份有限公司 Model training method, rendering method, device, electronic equipment and storage medium
CN112464931A (en) * 2020-11-06 2021-03-09 马上消费金融股份有限公司 Text detection method, model training method and related equipment
CN112464931B (en) * 2020-11-06 2021-07-30 马上消费金融股份有限公司 Text detection method, model training method and related equipment
CN112329470A (en) * 2020-11-09 2021-02-05 北京中科闻歌科技股份有限公司 Intelligent address identification method and device based on end-to-end model training
CN112329470B (en) * 2020-11-09 2024-05-28 北京中科闻歌科技股份有限公司 Intelligent address identification method and device based on end-to-end model training
CN113569608A (en) * 2021-02-08 2021-10-29 腾讯科技(深圳)有限公司 Text recognition method, device and equipment based on deep learning and storage medium
CN112990175A (en) * 2021-04-01 2021-06-18 深圳思谋信息科技有限公司 Method and device for recognizing handwritten Chinese characters, computer equipment and storage medium
CN113673415B (en) * 2021-08-18 2022-03-04 山东建筑大学 Handwritten Chinese character identity authentication method and system
CN113673415A (en) * 2021-08-18 2021-11-19 山东建筑大学 Handwritten Chinese character identity authentication method and system
WO2023109433A1 (en) * 2021-12-16 2023-06-22 中移(苏州)软件技术有限公司 Character coordinate extraction method and apparatus, device, medium, and program product

Similar Documents

Publication Publication Date Title
CN110942004A (en) Handwriting recognition method and device based on neural network model and electronic equipment
CN110569830B (en) Multilingual text recognition method, device, computer equipment and storage medium
CN109829453B (en) Method and device for recognizing characters in card and computing equipment
CN110942074B (en) Character segmentation recognition method and device, electronic equipment and storage medium
CN106980856B (en) Formula identification method and system and symbolic reasoning calculation method and system
US10643094B2 (en) Method for line and word segmentation for handwritten text images
KR102435365B1 (en) Certificate recognition method and apparatus, electronic device, computer readable storage medium
CN111488826A (en) Text recognition method and device, electronic equipment and storage medium
CN110598686B (en) Invoice identification method, system, electronic equipment and medium
CN113486828B (en) Image processing method, device, equipment and storage medium
US9286527B2 (en) Segmentation of an input by cut point classification
CN109508716B (en) Image character positioning method and device
CN112949649B (en) Text image identification method and device and computing equipment
CN114419636A (en) Text recognition method, device, equipment and storage medium
US9418281B2 (en) Segmentation of overwritten online handwriting input
CN113610809A (en) Fracture detection method, fracture detection device, electronic device, and storage medium
CN113537184A (en) OCR (optical character recognition) model training method and device, computer equipment and storage medium
CN111832551A (en) Text image processing method and device, electronic scanning equipment and storage medium
CN116030472A (en) Text coordinate determining method and device
US20150169949A1 (en) Segmentation of Devanagari-Script Handwriting for Recognition
CN111783780B (en) Image processing method, device and computer readable storage medium
CN113887375A (en) Text recognition method, device, equipment and storage medium
CN110399877B (en) Optical character recognition of concatenated characters
CN113807343A (en) Character recognition method and device, computer equipment and storage medium
JP2020119291A (en) Information processing device and program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200331