WO2019232872A1 - Handwriting model training method, Chinese character recognition method, apparatus, device and medium - Google Patents

Info

Publication number
WO2019232872A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
handwriting
neural network
training
network model
Application number
PCT/CN2018/094403
Inventor
吴启 (Wu Qi)
周罡 (Zhou Gang)
Original Assignee
Ping An Technology (Shenzhen) Co., Ltd. (平安科技(深圳)有限公司)
Application filed by Ping An Technology (Shenzhen) Co., Ltd.
Publication of WO2019232872A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/32 Digital ink
    • G06V30/36 Matching; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds

Definitions

  • the present application relates to the field of handwriting recognition, and in particular to a handwriting model training method, a Chinese character recognition method, an apparatus, a device, and a medium.
  • OCR: optical character recognition
  • a handwriting model training method includes:
  • obtaining a handwriting training sample, where the handwriting training sample includes a handwriting image and a label Chinese character associated with the handwriting image;
  • dividing the handwriting training sample into a training set and a test set;
  • inputting the training set into a convolutional recurrent neural network model to obtain the forward output of the convolutional recurrent neural network model, and, based on that forward output, using a back-propagation algorithm based on batch gradient descent to update the weights and biases in the convolutional recurrent neural network model to obtain a handwriting training model;
  • inputting the test set into the handwriting training model to obtain the recognized Chinese character corresponding to each handwriting image, and obtaining a recognition accuracy rate based on the recognized Chinese characters and the label Chinese characters; if the recognition accuracy rate is greater than a preset accuracy rate, determining that the handwriting training model is a handwriting recognition model.
  • a handwriting model training device includes:
  • a training sample acquisition module configured to obtain a handwriting training sample, where the handwriting training sample includes a handwriting image and a label Chinese character associated with the handwriting image;
  • a training sample processing module configured to divide the handwriting training sample into a training set and a test set;
  • a training model acquisition module configured to input the training set into a convolutional recurrent neural network model to obtain the forward output of the convolutional recurrent neural network model and, based on that forward output, to update the weights and biases in the convolutional recurrent neural network model using a back-propagation algorithm based on batch gradient descent to obtain a handwriting training model;
  • a recognition model acquisition module configured to input the test set into the handwriting training model, obtain the recognized Chinese character corresponding to each handwriting image, and obtain a recognition accuracy rate based on the recognized Chinese characters and the label Chinese characters; if the recognition accuracy rate is greater than a preset accuracy rate, the handwriting training model is determined to be a handwriting recognition model.
  • a computer device includes a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor.
  • when the processor executes the computer-readable instructions, the following steps are implemented:
  • obtaining a handwriting training sample, where the handwriting training sample includes a handwriting image and a label Chinese character associated with the handwriting image;
  • dividing the handwriting training sample into a training set and a test set;
  • inputting the training set into a convolutional recurrent neural network model to obtain the forward output of the convolutional recurrent neural network model, and, based on that forward output, using a back-propagation algorithm based on batch gradient descent to update the weights and biases in the convolutional recurrent neural network model to obtain a handwriting training model;
  • inputting the test set into the handwriting training model to obtain the recognized Chinese character corresponding to each handwriting image, and obtaining a recognition accuracy rate based on the recognized Chinese characters and the label Chinese characters; if the recognition accuracy rate is greater than a preset accuracy rate, determining that the handwriting training model is a handwriting recognition model.
  • One or more non-volatile readable storage media storing computer-readable instructions, where, when the computer-readable instructions are executed by one or more processors, the one or more processors implement the following steps:
  • obtaining a handwriting training sample, where the handwriting training sample includes a handwriting image and a label Chinese character associated with the handwriting image;
  • dividing the handwriting training sample into a training set and a test set;
  • inputting the training set into a convolutional recurrent neural network model to obtain the forward output of the convolutional recurrent neural network model, and, based on that forward output, using a back-propagation algorithm based on batch gradient descent to update the weights and biases in the convolutional recurrent neural network model to obtain a handwriting training model;
  • inputting the test set into the handwriting training model to obtain the recognized Chinese character corresponding to each handwriting image, and obtaining a recognition accuracy rate based on the recognized Chinese characters and the label Chinese characters; if the recognition accuracy rate is greater than a preset accuracy rate, determining that the handwriting training model is a handwriting recognition model.
  • a semantic database is queried based on the recognition result to obtain a target Chinese character corresponding to the single-font image.
  • an original image acquisition module configured to acquire an original image, where the original image includes handwriting and a background image;
  • an effective image acquisition module configured to preprocess the original image to obtain an effective image;
  • a target image acquisition module configured to process the effective image using a kernel density estimation algorithm and an erosion method to remove the background image and obtain a target image containing the handwriting;
  • a single-font image acquisition module configured to use a vertical projection method to cut the target image into single characters, obtaining single-font images;
  • a recognition result acquisition module configured to input each single-font image into a handwriting recognition model for recognition and obtain a recognition result corresponding to the single-font image, where the handwriting recognition model is obtained using the foregoing handwriting model training method;
  • a target Chinese character confirmation module configured to query a semantic database based on the recognition result to obtain the target Chinese character corresponding to the single-font image.
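The vertical projection method named for the single-font image acquisition module can be sketched as follows. This is a minimal illustration, assuming the target image is already binarized (non-zero = handwriting pixel), the text runs horizontally, and characters are separated by at least one empty column; the function names and the `min_width` parameter are hypothetical, not taken from the patent.

```python
import numpy as np

def vertical_projection_segments(binary_img, min_width=1):
    """Return (start_col, end_col) ranges, end exclusive, of single characters.

    Assumes binary_img is a 2-D array with non-zero values on handwriting
    pixels and that characters are separated by fully empty columns.
    """
    # Vertical projection: number of ink pixels in every column.
    profile = (np.asarray(binary_img) > 0).sum(axis=0)
    segments = []
    start = None
    for col, count in enumerate(profile):
        if count > 0 and start is None:
            start = col                          # entering a character
        elif count == 0 and start is not None:
            if col - start >= min_width:
                segments.append((start, col))    # leaving a character
            start = None
    # A character touching the right border has no trailing empty column.
    if start is not None and len(profile) - start >= min_width:
        segments.append((start, len(profile)))
    return segments

def cut_characters(binary_img, min_width=1):
    """Cut the image into one sub-image per detected character column range."""
    img = np.asarray(binary_img)
    return [img[:, s:e] for s, e in vertical_projection_segments(img, min_width)]
```

For a 5 × 10 image with ink in columns 1 to 2 and 5 to 7, the projection yields the ranges (1, 3) and (5, 8). Real handwriting with touching strokes would need extra rules (for example splitting wide segments), which this sketch does not attempt.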
  • a computer device includes a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor.
  • when the processor executes the computer-readable instructions, the following steps are implemented:
  • a semantic database is queried based on the recognition result to obtain a target Chinese character corresponding to the single-font image.
  • One or more non-volatile readable storage media storing computer-readable instructions, where, when the computer-readable instructions are executed by one or more processors, the one or more processors implement the following steps:
  • a semantic database is queried based on the recognition result to obtain a target Chinese character corresponding to the single-font image.
  • FIG. 1 is an application scenario diagram of a handwriting model training method according to an embodiment of the present application
  • FIG. 2 is a flowchart of a handwriting model training method according to an embodiment of the present application
  • FIG. 3 is a specific flowchart of step S30 in FIG. 2;
  • FIG. 4 is a schematic diagram of a handwriting model training device according to an embodiment of the present application.
  • FIG. 5 is a flowchart of a Chinese character recognition method according to an embodiment of the present application.
  • FIG. 6 is a specific flowchart of step S52 in FIG. 5;
  • FIG. 7 is a specific flowchart of step S53 in FIG. 5;
  • FIG. 8 is a specific flowchart of step S534 in FIG. 7;
  • FIG. 9 is a schematic diagram of a Chinese character recognition device in an embodiment of the present application.
  • FIG. 10 is a schematic diagram of a computer device according to an embodiment of the present application.
  • the handwriting model training method provided in the embodiment of the present application can be applied in the application environment shown in FIG. 1.
  • the application environment of the handwriting model training method includes a server and a client.
  • the client communicates with the server through a network.
  • the client is a device that can interact with the user, including, but not limited to, a computer, a smartphone, and a tablet computer.
  • the handwriting model training method provided in the embodiment of the present application is applied to a server.
  • a handwriting model training method includes the following steps:
  • obtain a handwriting training sample, where the handwriting training sample includes a handwriting image and a label Chinese character associated with the handwriting image.
  • the server obtains handwriting training samples from the database to provide data sources for subsequent model training.
  • the handwriting training samples refer to handwriting samples stored in a database in advance for training a neural network model.
  • the handwriting training sample includes handwriting images and labeled Chinese characters associated with the handwriting images.
  • the handwritten image refers to an image carrying Chinese characters handwritten by different people.
  • one handwritten image corresponds to one handwritten Chinese character, and each handwritten image carries a corresponding sequential label.
  • the sequential label is a label indicating the order of the handwritten image.
  • the label Chinese character is the Chinese character, in a standard font, that matches the handwritten image and is obtained from the secondary Chinese character library. Standard fonts include, but are not limited to, Song, Kai, and imitation Song. Associating a label Chinese character with each handwritten image makes it easy to identify which character the handwritten image represents. In this embodiment, the label Chinese character associated with a handwritten image only needs to be in one of the commonly used Song, Kai, or imitation Song fonts, which saves storage space and simplifies model training. For example, if the handwriting in a handwritten image is "I", the label Chinese character associated with that image is "I" in Song, Kai, or imitation Song from the secondary Chinese character library.
  • a training set is data for adjusting parameters in a convolutional recurrent neural network model.
  • a test set is data used to test the recognition accuracy of a trained convolutional recurrent neural network model.
  • a ten-fold cross-validation method is used to divide the handwriting training sample into a training set and a test set.
  • the ten-fold cross-validation method is a commonly used method to test the accuracy of the algorithm.
  • specifically, the ten-fold cross-validation method divides the handwriting training samples in a 9:1 ratio: the handwriting training samples are divided into 10 groups, 9 groups are used as the training set to train the convolutional recurrent neural network model, and the remaining group is used as the test set to verify the accuracy of the trained convolutional recurrent neural network model.
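The 9:1 division described above can be sketched as a shuffle followed by a split into 10 groups. This is a minimal illustration, not the patent's implementation: shuffling before grouping, the `seed` argument, and the choice of which group becomes the test set (`test_fold`) are assumptions made here for reproducibility.

```python
import random

def ten_fold_split(samples, test_fold=0, seed=0):
    """Divide samples into 10 groups; 1 group is the test set, 9 the training set.

    The shuffle and the seed are assumptions of this sketch.
    """
    if not 0 <= test_fold < 10:
        raise ValueError("test_fold must be in 0..9")
    items = list(samples)
    random.Random(seed).shuffle(items)           # deterministic shuffle
    folds = [items[i::10] for i in range(10)]    # 10 near-equal groups
    test_set = folds[test_fold]
    train_set = [s for i, fold in enumerate(folds) if i != test_fold for s in fold]
    return train_set, test_set
```

With 100 samples this yields 90 training and 10 test samples. Rotating `test_fold` from 0 to 9 with the same seed gives the full ten-fold cross-validation, in which every sample is used for testing exactly once.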
  • the training set is input into the convolutional recurrent neural network model to obtain the forward output of the convolutional recurrent neural network model; based on that forward output, a back-propagation algorithm based on batch gradient descent is used to update the weights and biases in the convolutional recurrent neural network model to obtain the handwriting training model.
  • the Convolutional-Recurrent Neural Networks (C-RNN) model is composed of a Convolutional Neural Networks (CNN) model and a Recurrent Neural Networks (RNN) model.
  • because the output of the convolutional neural network model is fed into the recurrent neural network model, the forward output of the convolutional recurrent neural network model is the forward output of the recurrent neural network model.
  • batch gradient descent (BGD) means that, when the weights and biases of the convolutional recurrent neural network model are updated by back-propagation, each update is based on the forward outputs obtained for all handwritten images in the training set rather than for a single image.
  • the back-propagation algorithm adjusts the weights and biases between the hidden layer and the output layer, and then those between the input layer and the hidden layer, in the reverse order of the time-sequence states.
  • the handwriting training model refers to the model after the training set is input into the convolutional recurrent neural network model for training.
  • the server inputs the training set into the convolutional recurrent neural network model and uses the convolutional layers and pooling layers of the convolutional neural network model to extract features from the handwritten images in the training set, obtaining the image features of each handwritten image.
  • the image feature is the pixel matrix obtained after the handwritten image is processed by the convolutional layers and pooling layers.
  • the image features are input into a recurrent neural network model for training, and the forward output of the recurrent neural network model is obtained, that is, the forward output of the convolutional recurrent neural network model is obtained.
  • the forward output of the recurrent neural network model is the pixel matrix output by its output layer after the image features of the handwritten image are processed by the recurrent neural network model.
  • obtaining the forward output of the convolutional recurrent neural network model makes it possible to update the weights and biases of the model based on that output.
  • specifically, an error function is constructed from the forward output and the label Chinese characters associated with the handwritten images, and the partial derivatives of the error function are used to update the weights and biases of the convolutional recurrent neural network model to obtain the handwriting training model.
  • because the back-propagation algorithm based on batch gradient descent is used, each update of the weights and biases is based on the error function constructed from all handwritten images in the training set. This ensures that the parameters of the convolutional recurrent neural network model are fully updated and thereby improves the recognition accuracy of the handwriting training model.
  • the test set is input into the handwriting training model, and the recognition Chinese characters corresponding to each handwriting image are obtained.
  • a recognition accuracy rate is obtained based on the recognized Chinese characters and the label Chinese characters. If the recognition accuracy rate is greater than a preset accuracy rate, the handwriting training model is determined to be the handwriting recognition model.
  • the handwriting recognition model is a handwriting training model whose recognition accuracy on the test set meets the preset accuracy rate; it can be used to recognize handwriting images. After training of the handwriting training model is complete, the handwriting images of the handwriting training samples in the test set are input into the handwriting training model one by one to obtain the recognized Chinese character corresponding to each handwriting image.
  • in this embodiment, the recognized Chinese characters are the Chinese characters output by the handwriting training model for the handwriting images.
  • the recognition accuracy of the handwriting training model is calculated as: recognition accuracy = number of correct recognitions / number of handwritten images in the test set. If the recognition accuracy rate of the handwriting training model is greater than the preset accuracy rate, the handwriting training model is determined to be the handwriting recognition model; otherwise, the handwriting training model is retrained until its recognition accuracy meets the requirement.
  • the preset accuracy rate is a preset threshold for judging whether the accuracy of the handwriting training model meets the requirement. For example, with a preset accuracy rate of 82%, if the recognition accuracy obtained on the test set is greater than 82% (such as 85% or 90%), the recognition accuracy of the handwriting training model on the handwriting training samples meets the requirement, and the handwriting training model can be determined to be the handwriting recognition model.
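The accuracy formula and the acceptance rule can be written out directly. This is a minimal sketch: the function names are hypothetical, and the default threshold of 0.82 simply reuses the 82% example value from the text.

```python
def recognition_accuracy(recognized, labels):
    """Accuracy = number of correct recognitions / number of test images."""
    if len(recognized) != len(labels) or len(labels) == 0:
        raise ValueError("need equally long, non-empty sequences")
    correct = sum(r == t for r, t in zip(recognized, labels))
    return correct / len(labels)

def accept_as_recognition_model(recognized, labels, preset_accuracy=0.82):
    """True if the accuracy is strictly greater than the preset accuracy rate."""
    return recognition_accuracy(recognized, labels) > preset_accuracy
```

With 9 of 10 test characters recognized correctly the accuracy is 0.9 and the model is accepted. With 8 of 10 (0.8) it is not, and the handwriting training model would be retrained.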
  • in this embodiment, the training set is input into the convolutional recurrent neural network model to obtain its forward output, and then, based on that forward output, the weights and biases are updated using a back-propagation algorithm based on batch gradient descent to obtain the handwriting training model.
  • the test set is input into the handwriting training model for testing.
  • the handwriting training model is determined as a handwriting recognition model for recognizing a handwriting image, so that the obtained handwriting recognition model can recognize the handwriting with high recognition accuracy.
  • in this embodiment, the convolutional recurrent neural network model is composed of a convolutional neural network model and a recurrent neural network model, and both are used for model training.
  • inputting the training set into the convolutional recurrent neural network model, obtaining its forward output, and updating the weights and biases in the convolutional recurrent neural network model with a back-propagation algorithm based on batch gradient descent to obtain the handwriting training model specifically includes the following steps:
  • S31 Input the training set into the convolutional neural network model to obtain the image features corresponding to the handwritten images in the training set.
  • the convolutional neural network model includes multiple convolutional layers and pooling layers. After the training set is obtained, the handwriting images in the corresponding handwriting training samples are input into the convolutional neural network model for training, and the output of each convolutional layer is computed layer by layer as
  $a_m^l = \sigma(z_m^l) = \sigma(a_m^{l-1} * W^l + b^l)$,
  where $a_m^l$ is the output for the m-th sequential label at the l-th convolutional layer, $z_m^l$ is that output before the activation function is applied, $a_m^{l-1}$ is the output for the m-th sequential label at the (l-1)-th convolutional layer (that is, the output of the previous layer), $\sigma$ is the activation function, which for the convolutional layers is ReLU (Rectified Linear Unit), $*$ denotes the convolution operation, $W^l$ is the convolution kernel (weights) of the l-th convolutional layer, and $b^l$ is the bias of the l-th convolutional layer.
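The convolutional-layer formula $a_m^l = \sigma(a_m^{l-1} * W^l + b^l)$ with ReLU can be sketched in NumPy for the simplest case. This assumes a single input channel, a single kernel, stride 1 and no padding; as in common deep-learning libraries, `*` is implemented as cross-correlation (no kernel flip). The helper names are hypothetical.

```python
import numpy as np

def relu(z):
    """Rectified linear unit: max(z, 0) element-wise."""
    return np.maximum(z, 0.0)

def conv2d_valid(a_prev, kernel):
    """'Valid' 2-D convolution as used in CNNs (cross-correlation, no flip),
    one input channel, one output channel, stride 1."""
    h, w = a_prev.shape
    kh, kw = kernel.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(a_prev[i:i + kh, j:j + kw] * kernel)
    return out

def conv_layer_forward(a_prev, W, b):
    """z^l = a^{l-1} * W^l + b^l and a^l = ReLU(z^l); returns (a^l, z^l)."""
    z = conv2d_valid(a_prev, W) + b
    return relu(z), z
```

Returning both `a` and `z` mirrors the formula's distinction between the output before and after activation; back-propagation needs `z` to compute the ReLU derivative.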
  • in the pooling layer, max-pooling downsampling is used to reduce the dimension of the convolutional layer output; max pooling takes the maximum value within each m × m sample region.
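Max pooling over non-overlapping m × m regions can be sketched with a reshape. This assumes a 2-D single-channel input and stride equal to the window size; trailing rows and columns that do not fill a whole window are dropped, which is one common convention rather than something the patent specifies.

```python
import numpy as np

def max_pool(a, m=2):
    """Non-overlapping m x m max pooling: keep the maximum of each m x m block.

    Rows/columns that do not fill a whole block are dropped (an assumption).
    """
    h, w = a.shape
    h2, w2 = h // m, w // m
    # blocks[i, r, j, c] == a[i*m + r, j*m + c]
    blocks = a[:h2 * m, :w2 * m].reshape(h2, m, w2, m)
    return blocks.max(axis=(1, 3))
```

For a 4 × 4 input holding 0 to 15 row by row, `max_pool(a, 2)` keeps 5, 7, 13 and 15, halving each spatial dimension.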
  • T(m) represents the output of the output layer of the convolutional neural network model, that is, the image feature of the handwritten image corresponding to the m-th sequential label.
  • the image feature carries the sequential label, and the sequential label of the image feature is consistent with the sequential label of the handwritten image from which it was obtained.
  • the image features are input into the recurrent neural network model, whose forward pass for the m-th sequential label is
  $h(m) = \sigma(U' T(m) + W' h(m-1) + b')$, $o(m) = V' h(m) + c'$, $y(m) = \hat{\sigma}(o(m))$,
  where $h(m)$ is the hidden-layer output and $\sigma$ the hidden-layer activation function; U' is the weight between the pooling layer of the convolutional neural network model and the hidden layer of the recurrent neural network model; W' is the weight between the hidden layer and itself; b' is the bias between the input layer and the hidden layer; T(m) is the input for the m-th sequential label received by the input layer of the recurrent neural network model; o(m) is the input that the hidden layer passes to the output layer; V' is the weight between the hidden layer and the output layer; and c' is the bias between the hidden layer and the output layer.
  • y(m) is the forward output of the recurrent neural network model, specifically the forward output corresponding to a handwritten image in the training set after it is input into the recurrent neural network model, and $\hat{\sigma}$ is the activation function of the output layer of the recurrent neural network model, generally the softmax function.
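The recurrent forward pass can be sketched as a loop over the sequential labels, with softmax as the output activation as stated above. Two assumptions are made here and are not taken from the patent: the hidden activation is tanh, and the hidden state starts at zero. Argument names mirror U', W', b', V', c'.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax of a 1-D vector."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def rnn_forward(T, U, W, b, V, c):
    """Forward pass over T(1..M).

    h(m) = tanh(U T(m) + W h(m-1) + b)   (tanh is an assumed hidden activation)
    o(m) = V h(m) + c
    y(m) = softmax(o(m))
    T has shape (M, d_in); returns y with shape (M, n_classes).
    """
    h = np.zeros(W.shape[0])          # assumed initial hidden state h(0) = 0
    ys = []
    for t in T:
        h = np.tanh(U @ t + W @ h + b)
        o = V @ h + c
        ys.append(softmax(o))
    return np.array(ys)
```

Each row of the result is a probability distribution over candidate characters, which is what the loss function in step S33 compares with the label Chinese character.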
  • S33 Construct a loss function according to the forward output of the recurrent neural network model and the labeled Chinese characters.
  • in the loss function $E_{loss}(\theta)$, N is the number of handwriting training samples, $E_{loss}(\theta)$ is the average of the total errors of all handwriting images in the N handwriting training samples, and M is the number of sequential labels carried by the handwriting images in a handwriting training sample.
  • each error term compares the forward output of the handwritten image corresponding to the m-th sequential label in the n-th handwriting training sample with the label Chinese character corresponding to that sequential label, and θ denotes the set of weights and biases.
  • θ comprises the weights and biases of the convolutional neural network model together with the weights and biases of the recurrent neural network model.
  • S34 Based on the loss function, use the back-propagation algorithm based on batch gradient descent to update and adjust the weights and biases in the recurrent neural network model and the convolutional neural network model to obtain the handwriting training model.
  • the back-propagation algorithm based on batch gradient descent obtains the error values of all handwriting images in the N handwriting training samples, averages them, and uses the back-propagation algorithm to update and adjust the weights and biases in the recurrent neural network model and the convolutional neural network model.
  • specifically, the forward output of one handwriting image in a handwriting training sample is obtained, and its error is computed from the forward output and the corresponding label Chinese character. The errors of all handwriting images in that sample are then summed to obtain the sample error. Finally, the sample errors of all handwriting training samples in the training set are summed to obtain the total error of the training set, and this total is averaged to obtain $E_{loss}(\theta)$.
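The accumulation just described (image error, then sample error, then total error, then average) can be sketched as below. The source does not show the per-image error term, so cross-entropy between the softmax forward output and the label character is assumed here purely for illustration; labels are assumed to be given as class indices.

```python
import numpy as np

def image_error(y_pred, label_index, eps=1e-12):
    """Error for one handwritten image: cross-entropy between the softmax
    forward output and the label character (assumed error measure)."""
    return -np.log(y_pred[label_index] + eps)

def e_loss(forward_outputs, label_indices):
    """E_loss = (1/N) * sum over N samples of (sum over M images of image error).

    forward_outputs: list of N arrays, each of shape (M, n_classes)
    label_indices:   list of N integer sequences, each of length M
    """
    n_samples = len(forward_outputs)
    total_error = 0.0
    for y, labels in zip(forward_outputs, label_indices):
        sample_error = sum(image_error(y[m], labels[m]) for m in range(len(labels)))
        total_error += sample_error       # accumulate over the training set
    return total_error / n_samples        # average of the total error
```

For a sample whose two images have forward outputs [0.5, 0.5] and [0.25, 0.75] with labels 0 and 1, the sample error is -(ln 0.5 + ln 0.75) ≈ 0.98; two identical samples average to the same value.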
  • using the back-propagation algorithm based on batch gradient descent to update and adjust the weights and biases in the recurrent neural network model and the convolutional neural network model lets the errors generated by all handwritten images in the training set participate in updating the weights and biases of both models, which ensures comprehensive training of the handwriting training model and improves its accuracy.
  • specifically, the partial derivatives $\partial E_{loss}(\theta) / \partial \theta$ with respect to the weights and biases are computed and used to update and adjust the weights and biases in the recurrent neural network model and the convolutional neural network model, yielding the handwriting training model.
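A full-batch gradient step can be illustrated on the softmax output layer alone. This is a simplified sketch, not the patent's procedure: it updates only V' and c' (no back-propagation into the hidden or convolutional layers), assumes cross-entropy as the error, one image per sample (M = 1), and a hypothetical learning rate `lr`. What it does show faithfully is the batch property: every update uses the gradient averaged over all N examples.

```python
import numpy as np

def softmax_rows(Z):
    """Row-wise numerically stable softmax."""
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def batch_gradient_descent(X, y, n_classes, lr=0.5, epochs=200):
    """Full-batch gradient descent for a softmax output layer o = X V^T + c.

    X: (N, d) features, y: (N,) integer labels. Each update uses
    dE_loss/dtheta averaged over ALL N examples (batch gradient descent).
    """
    n, d = X.shape
    V = np.zeros((n_classes, d))
    c = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]                      # one-hot label characters
    losses = []
    for _ in range(epochs):
        P = softmax_rows(X @ V.T + c)             # forward output for all N
        losses.append(-np.mean(np.log(P[np.arange(n), y] + 1e-12)))
        G = (P - Y) / n                           # dE_loss/do, averaged over N
        dV = G.T @ X                              # dE_loss/dV
        dc = G.sum(axis=0)                        # dE_loss/dc
        V -= lr * dV                              # theta <- theta - lr * dE/dtheta
        c -= lr * dc
    return V, c, losses
```

Compared with stochastic or mini-batch updates, this makes exactly one parameter update per pass over the training set, which gives smoother but more expensive steps.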
  • in steps S31-S34, the image features corresponding to the handwritten images in the training set are obtained through the convolutional neural network model and then input into the recurrent neural network model for training to obtain its forward output, and a loss function is constructed from the forward output and the label Chinese characters.
  • the back-propagation algorithm based on batch gradient descent is used to update and adjust the weights and offsets in the recurrent neural network model and the convolutional neural network model to obtain the handwriting training model.
  • the training comprehensiveness of the handwriting training model is guaranteed, thereby improving the accuracy of the handwriting training model.
  • in this embodiment, the training set is input into the convolutional recurrent neural network model: the convolutional neural network model produces the image features corresponding to the handwritten images, the image features are input into the recurrent neural network model to obtain its forward output, and a loss function is constructed from that forward output and the label Chinese characters.
  • based on the loss function, a back-propagation algorithm based on batch gradient descent is used to update the weights and biases in the convolutional recurrent neural network model and obtain the handwriting training model, which ensures that the parameters of the convolutional recurrent neural network model are fully updated and improves the recognition accuracy of the handwriting training model.
  • the test set is input into the handwriting training model for testing. If the recognition accuracy rate of the handwriting training model on the handwriting training samples is greater than the preset accuracy rate, the recognition accuracy of the handwriting training model on the handwriting training samples meets the requirement.
  • the handwriting training model is then determined to be the handwriting recognition model for recognizing handwriting images.
  • the handwriting recognition model has high recognition accuracy.
  • a handwriting model training device corresponds to the handwriting model training method in the above embodiment.
  • the handwriting model training device includes a training sample acquisition module 10, a training sample processing module 20, a training model acquisition module 30, and a recognition model acquisition module 40.
  • the functional modules are described in detail as follows:
  • the training sample acquisition module 10 is configured to acquire a handwriting training sample.
  • the handwriting training sample includes a handwriting image and a label Chinese character associated with the handwriting image.
  • the training sample processing module 20 is configured to divide a handwriting training sample into a training set and a test set.
  • the training model acquisition module 30 is configured to input the training set into the convolutional recurrent neural network model to obtain its forward output and, based on that forward output, to update the weights and biases in the convolutional recurrent neural network model using a back-propagation algorithm based on batch gradient descent to obtain a handwriting training model.
  • the recognition model acquisition module 40 is configured to input the test set into the handwriting training model, obtain the recognized Chinese character corresponding to each handwriting image, and obtain a recognition accuracy rate based on the recognized Chinese characters and the label Chinese characters. If the recognition accuracy rate is greater than a preset accuracy rate, the handwriting training model is determined to be the handwriting recognition model.
  • the convolutional recurrent neural network model includes a convolutional neural network model and a recurrent neural network model.
  • the training model acquisition module 30 includes an image feature acquisition unit 31, a forward output acquisition unit 32, a loss function construction unit 33, and a training model acquisition unit 34.
  • An image feature obtaining unit 31 is configured to input a training set into a convolutional neural network model, and obtain an image feature corresponding to a handwritten image in the training set.
  • a forward output acquisition unit 32 is configured to input image features corresponding to the handwritten image in the training set into a recurrent neural network model for training, and obtain a forward output of the recurrent neural network model.
  • the loss function construction unit 33 is configured to construct a loss function according to the forward output of the recurrent neural network model and the labeled Chinese characters.
  • the loss function E_loss(θ) is expressed in terms of the following quantities: N is the number of handwriting images in the handwriting training samples; E_loss(θ) is the average of the total errors of all N handwriting images; and M is the number of sequential labels carried by each handwriting image in the handwriting training samples.
  • the loss also depends on the forward output of the handwriting image corresponding to the m-th sequential label in the n-th handwriting training sample and on the label Chinese character corresponding to the m-th sequential label in the n-th handwriting training sample; θ denotes the set of weights and biases.
  • the training model acquisition unit 34 is configured to update the weights and biases of the recurrent neural network model and the convolutional neural network model according to the loss function, using a back-propagation algorithm based on batch gradient descent, to obtain the handwriting training model.
  • a Chinese character recognition method specifically includes the following steps:
  • S51 Acquire an original image.
  • the original image includes handwriting and a background image.
  • the original image is an image that has not undergone any processing and that contains the handwriting to be recognized.
  • the original images in this embodiment include handwriting and background images.
  • the background image refers to an image corresponding to a background pattern on the original image.
  • the original image may be acquired by, but is not limited to, crawling it from a webpage or retrieving it from a database connected to the server; the original images in the database may have been uploaded in advance by a terminal device.
  • the effective image is the image obtained after the original image is preprocessed.
  • the specific steps by which the server obtains the effective image are: (1) determine whether the original image is a color image; if it is, perform grayscale processing on it to obtain a grayscale image, so that the three components R (red), G (green), and B (blue) of each pixel in the color image are replaced by a single value, which simplifies the subsequent range normalization. Understandably, if the original image is not a color image, it is already a grayscale image and needs no further graying. (2) Perform range normalization on the pixel matrix corresponding to the grayscale image to obtain the effective image. Range normalization of the pixel matrix preserves the relative relationships among the pixels while improving computation speed.
  • S53 Use the kernel density estimation algorithm and the erosion method to process the effective image, remove the background image, and obtain a target image including handwriting.
  • Kernel density estimation algorithm is a non-parametric method that studies the data distribution characteristics from the data sample itself and is used to estimate the probability density function.
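  • in its standard form, the kernel density estimate at a point x is $\hat{f}_h(x) = \frac{1}{nh}\sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right)$, where $x_1, \dots, x_n$ are the samples, $h > 0$ is the bandwidth, and $K$ is the kernel function.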
  • the erosion method performs erosion on an image; here, erosion removes the background portions of the image and leaves only the handwritten portion.
  • the kernel density estimation formula is applied to the frequency distribution histogram of the effective image to obtain a smooth curve corresponding to that histogram, and the effective image is layered according to the pixels corresponding to the local minima and maxima of the smoothed curve.
  • each layered image is eroded to remove the background image and keep the handwritten part.
  • the eroded layered images are superimposed to obtain the target image.
  • superposition means combining the layered images, each of which now retains only its handwritten portion, into one image, which yields the target image.
  • S54 The vertical projection method is used to perform single font cutting on the target image to obtain a single font image.
  • the vertical projection method refers to a method in which each line of handwritten characters is projected in a vertical direction to obtain a vertical projection histogram.
  • the vertical projection histogram shows the number of handwriting pixels in each column of the target image.
  • the abscissa of the vertical projection histogram represents the horizontal position across the width of the target image, and the ordinate represents the number of pixels of the target image at that position.
  • the target image is cut according to a preset cutting threshold to obtain single font images.
  • a single font image is an image corresponding to a single character.
  • the cutting threshold is a preset value used to locate the positions at which the target image is cut into single characters.
  • for example, if the preset cutting threshold is 10, every abscissa position at which the vertical projection histogram of the target image has a pixel count less than or equal to 10 (e.g., 0, 9, or 10) is a dividing point between two adjacent handwritten characters, and the target image is cut at these dividing points to obtain its single font images. Understandably, the pixels of each handwritten character are relatively concentrated, while the pixels in the gaps between Chinese characters are sparse. This density shows up in the vertical projection histogram: columns containing Chinese characters have relatively high pixel counts, and columns without Chinese characters have relatively low pixel counts.
  • the vertical projection method can effectively perform single font cutting on the target image, obtain single font images, and provide technical support for subsequent model recognition.
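The vertical projection cut can be sketched as follows, assuming the target image is a binary matrix (a list of rows) in which handwriting pixels are 1. The function names are hypothetical, and the default threshold of 10 is taken from the example above; a real implementation would work on image arrays rather than nested lists.

```python
from typing import List, Tuple


def vertical_projection(binary: List[List[int]]) -> List[int]:
    """Number of handwriting pixels (value 1) in each column of a binary text-line image."""
    return [sum(column) for column in zip(*binary)]


def split_characters(binary: List[List[int]], cut_threshold: int = 10) -> List[Tuple[int, int]]:
    """Half-open column ranges [start, end) of single characters. Columns whose
    projection is <= cut_threshold are dividing points between adjacent characters."""
    segments, start = [], None
    for x, count in enumerate(vertical_projection(binary)):
        if count > cut_threshold and start is None:
            start = x                      # first column of a character
        elif count <= cut_threshold and start is not None:
            segments.append((start, x))    # dividing point reached
            start = None
    if start is not None:                  # character touching the right edge
        segments.append((start, len(binary[0])))
    return segments


def crop(binary: List[List[int]], segment: Tuple[int, int]) -> List[List[int]]:
    """Cut one single font image out of the line."""
    start, end = segment
    return [row[start:end] for row in binary]
```

For example, on a three-row line whose column counts are [3, 2, 0, 0, 3, 1], `split_characters(line, cut_threshold=0)` returns [(0, 2), (4, 6)].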
  • the single-font image is input to a handwriting recognition model for recognition, and a recognition result corresponding to the single-font image is obtained.
  • the handwriting recognition model is obtained by using the above-mentioned handwriting model training method.
  • the handwriting recognition model is a pre-trained model for identifying handwriting.
  • the recognition result refers to an output with a recognition probability greater than a preset probability.
  • the preset probability refers to a preset probability used to determine whether the recognition probability meets the requirements.
  • a single font image is input into a handwriting recognition model, and a recognition probability corresponding to each single font image is obtained.
  • the recognition probability refers to a probability that the single font image may be a specific Chinese character.
  • the recognition probability is compared with a preset probability. If the recognition probability is greater than the preset probability, the corresponding recognition result is obtained, which is helpful to improve the accuracy of the recognition result.
  • the preset probability is 85%
  • for example, the single font image corresponding to “sea” is input into the handwriting recognition model to obtain the recognition results whose recognition probability is greater than the preset probability.
  • the recognition result may be either of two candidate characters; that is, the probability that the single font image corresponding to “sea” is recognized as each of the two candidates is greater than 85%, so both candidates may be output as recognition results.
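Filtering by the preset probability can be sketched as below. The sketch assumes the model yields an independent score per candidate character (with a single softmax output, at most one class could exceed 85%, since two such probabilities would sum to more than 1); the dictionary input and the function name are hypothetical.

```python
from typing import Dict, List


def recognition_results(scores: Dict[str, float], preset_probability: float = 0.85) -> List[str]:
    """Return candidate characters whose recognition probability exceeds the preset
    probability, most probable first."""
    kept = [ch for ch, p in scores.items() if p > preset_probability]
    return sorted(kept, key=lambda ch: scores[ch], reverse=True)
```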
  • the semantic database is a preset knowledge base for performing semantic analysis on the recognition results. Semantic analysis is an analysis of the context-dependent nature of the recognition results.
  • the semantic library is composed of a large number of Chinese sentences.
  • the target Chinese character is the Chinese character corresponding to the single-font image that matches the semantics after querying the semantic database.
  • the target Chinese character needs to be further determined using the semantic library. For example, for the four single font images “sea”, “dead”, “stone”, and “rotten”, the recognition results may be “sea”, “dead”, “stone”, “rotten” or a variant in which a character is replaced by a similar-looking candidate, such as “column” in place of “rotten”.
  • the semantic library therefore needs to be queried.
  • the Chinese sentences it contains are used to judge which recognition result is more accurate.
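One simple way to realize the semantic query is to enumerate every combination of candidates and keep the string that occurs in the most library sentences. This is a sketch only: the source does not specify its matching rule, and plain substring counting against an in-memory list of sentences, as well as the function name, are assumptions.

```python
from itertools import product
from typing import Optional, Sequence


def pick_by_semantics(candidates: Sequence[Sequence[str]],
                      semantic_library: Sequence[str]) -> Optional[str]:
    """candidates[i] lists the recognition results for the i-th single font image.
    Return the combination that occurs in the most library sentences, or None."""
    best, best_hits = None, 0
    for combo in product(*candidates):
        text = "".join(combo)
        hits = sum(text in sentence for sentence in semantic_library)
        if hits > best_hits:  # ties keep the earlier combination
            best, best_hits = text, hits
    return best
```

The number of combinations grows multiplicatively with the candidates per image, which stays small when only results above the preset probability are kept.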
  • the Chinese character recognition method provided in this embodiment preprocesses the original image to obtain an effective image, then processes the effective image with the kernel density estimation algorithm and the erosion method to remove the background image and retain a target image containing only handwriting, which provides the data source for the subsequent single font cutting.
  • the vertical projection method is used to cut the single font of the target image to obtain the single font image.
  • the obtained single font image is input to the handwriting recognition model for recognition, and the recognition result is obtained based on the recognition probability value corresponding to the single font image.
  • Using handwriting recognition model to identify single font images can improve the recognition accuracy.
  • the semantic library is queried based on the recognition results, and the target Chinese character corresponding to each single font image is obtained from the Chinese sentences stored in the semantic library.
  • in this way the target Chinese characters corresponding to the single font images can be selected accurately, and the combined judgment of the handwriting recognition model and the semantic library improves the accuracy of handwriting recognition.
  • step S52 preprocessing the original image to obtain a valid image, specifically includes the following steps:
  • the handwriting itself is small relative to the background image.
  • the handwriting is therefore easily mishandled during grayscale processing. To ensure that it is not mistakenly cleared during that processing, each pixel of the original image is first enlarged.
  • the value of the n-th pixel in the original image is $x_n$.
  • each pixel in the original image is enlarged by a power operation, so that $x_n$ is replaced by its power-magnified value.
  • enlarging the pixels in the original image can effectively avoid handwriting being mistakenly processed when the original image is grayed out.
  • after the original image is enlarged, if it is a color image rather than a grayscale image, grayscale processing is performed on it to obtain a grayscale image. Understandably, if the original image is already a grayscale image, no grayscale processing is required.
  • the grayscale image is formed from sampled pixels computed from R (red), G (green), and B (blue), the three components of each pixel of the original image; each sampled pixel is the single grayscale value that replaces the R, G, and B components of the corresponding pixel of the color image.
  • graying a color original image effectively reduces the amount of data and the computational complexity of obtaining the effective image in the subsequent steps.
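The graying step can be sketched as follows, assuming pixels are stored as (R, G, B) tuples in nested lists. The source does not state the weights used to form the sampled pixel, so the ITU-R BT.601 luma weights below are an assumption, and the function names are hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]

# Assumed weights (ITU-R BT.601 luma); the source does not state its coefficients.
R_WEIGHT, G_WEIGHT, B_WEIGHT = 0.299, 0.587, 0.114


def is_color(image: List[List[Pixel]]) -> bool:
    """Treat the image as color if any pixel has unequal R, G, B components."""
    return any(not (r == g == b) for row in image for (r, g, b) in row)


def to_grayscale(image: List[List[Pixel]]) -> List[List[float]]:
    """Replace the three R, G, B components of each pixel by one sampled gray value."""
    return [[R_WEIGHT * r + G_WEIGHT * g + B_WEIGHT * b for (r, g, b) in row] for row in image]
```

Because the weights sum to 1, a pixel whose three components are equal keeps its value after graying.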
  • S522 Perform range normalization on the pixel matrix corresponding to the grayscale image to obtain the effective image, where the range normalization formula is $x' = \frac{x - M_{\min}}{M_{\max} - M_{\min}}$; x is a pixel before normalization, x′ is the corresponding pixel of the effective image after normalization, $M_{\min}$ is the smallest pixel in the pixel matrix M corresponding to the grayscale image, and $M_{\max}$ is the largest pixel in M.
  • range normalization compresses the data into the interval [0, 1]. Range-normalizing the pixel matrix of the grayscale image and then multiplying the result by 255 makes the pixel data easy to process while preserving the relationships among the pixels.
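A minimal sketch of S522, assuming the grayscale image is a list of rows: pixels are mapped with $x' = (x - M_{\min})/(M_{\max} - M_{\min})$ and then scaled by 255 as the text describes. How to treat a constant image (where $M_{\max} = M_{\min}$) is not specified in the source; mapping it to 0 is an assumption, as is the function name.

```python
from typing import List


def range_normalize(matrix: List[List[float]], scale: float = 255.0) -> List[List[float]]:
    """x' = (x - M_min) / (M_max - M_min), then multiplied by `scale` (255 per the text)."""
    values = [v for row in matrix for v in row]
    m_min, m_max = min(values), max(values)
    span = m_max - m_min
    if span == 0:
        # Assumption: a constant image carries no contrast, so map it to 0.
        return [[0.0 for _ in row] for row in matrix]
    return [[(v - m_min) / span * scale for v in row] for row in matrix]
```

Passing `scale=1.0` yields the plain [0, 1] range normalization.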
  • the background image and each handwriting have their own corresponding pixel matrix. After obtaining the background image in the grayscale image and the pixel matrix corresponding to each handwriting, the pixel matrix is subjected to a range normalization process to obtain an effective image corresponding to the pixel matrix after the range normalization process. Performing the range normalization processing on the pixel matrix can improve the processing speed of obtaining the target image.
  • in steps S521-S522, enlarging the original image effectively prevents the handwriting from being mishandled when the original image is grayed in the next step.
  • graying the original image to obtain a grayscale image reduces the amount of data to be processed in subsequent steps, and range normalization of the grayscale image speeds up obtaining the target image.
  • step S53 uses a kernel density estimation algorithm and an erosion method to process the effective image, removes the background image, and obtains a target image including handwriting, which specifically includes the following steps:
  • S531 Count the number of occurrences of pixels in the effective image, and obtain a frequency distribution histogram corresponding to the effective image.
  • the horizontal axis of a frequency distribution histogram represents the continuous values of the sample data; each interval on the horizontal axis, whose width is the group distance of one group, is the base of a small rectangle. The vertical axis represents the ratio of frequency to group distance, and this ratio is the height of the small rectangle. The graph formed by these small rectangles is called a frequency distribution histogram.
  • in this embodiment, the horizontal axis of the frequency distribution histogram represents pixel values as continuous values in [0, 255], the group distance of each small rectangle is 1, and the vertical axis represents the ratio of the frequency of each pixel value to the group distance, which is the height of the corresponding small rectangle.
  • the frequency distribution histogram can vividly display the number of occurrences of pixels in the effective image, so that the distribution of the data can be reflected at a glance.
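Counting pixel occurrences (S531) might look like the sketch below. It assumes pixel values in [0, 255] stored as nested lists, rounds non-integer values to the nearest level, and reports relative frequencies; these choices and the function name are illustrative.

```python
from typing import List


def frequency_histogram(image: List[List[float]], levels: int = 256) -> List[float]:
    """Relative frequency of each pixel level 0..levels-1. The group distance is 1,
    so each bar height (frequency / group distance) equals the frequency itself."""
    counts = [0] * levels
    for row in image:
        for v in row:
            level = min(levels - 1, max(0, int(round(v))))  # nearest valid level
            counts[level] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("the image has no pixels")
    return [c / total for c in counts]
```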
  • S532 The Gaussian kernel density estimation method is used to process the frequency distribution histogram to obtain the frequency maximum and frequency minimum corresponding to the frequency distribution histogram, and obtain corresponding pixels according to the frequency maximum and frequency minimum.
  • Gaussian kernel density estimation method refers to a kernel density estimation method whose kernel function is a Gaussian kernel.
  • the Gaussian kernel function is $K(x) = \frac{1}{\sqrt{2\pi}} e^{-x^{2}/2}$, where x is the pixel (the independent variable) and e and π are constants.
  • a frequency maximum is a local maximum of the (smoothed) frequency distribution histogram, and a frequency minimum is a local minimum of it; an effective image may therefore have several of each.
  • the Gaussian kernel density estimation method is used to apply Gaussian smoothing to the frequency distribution histogram of the effective image, yielding a Gaussian smooth curve corresponding to the histogram. From the frequency maxima and minima on this curve, the corresponding pixels on the horizontal axis are obtained. Obtaining these pixels in this embodiment prepares for the subsequent layering of the effective image into layered images.
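The smoothing and extremum search of S532 can be sketched with plain Python. This is a minimal illustration, not the patent's implementation: the histogram's relative frequencies are used as weights in a Gaussian kernel density estimate evaluated at every pixel level, the bandwidth of 5 is a hypothetical choice (the source gives none), and extrema are found by comparing each level with its two neighbours.

```python
import math
from typing import List, Tuple


def gaussian_kernel(u: float) -> float:
    """K(u) = exp(-u^2 / 2) / sqrt(2*pi)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def smooth_histogram(hist: List[float], bandwidth: float = 5.0) -> List[float]:
    """Gaussian KDE of the pixel distribution evaluated at every level x:
    f(x) = (1/h) * sum_i hist[i] * K((x - i) / h), with hist holding relative frequencies."""
    return [sum(w * gaussian_kernel((x - i) / bandwidth) for i, w in enumerate(hist) if w) / bandwidth
            for x in range(len(hist))]


def local_extrema(curve: List[float]) -> Tuple[List[int], List[int]]:
    """Indices of interior local maxima and minima. Comparisons are strict, so flat
    plateaus are not reported (rare on a smooth KDE curve)."""
    maxima, minima = [], []
    for x in range(1, len(curve) - 1):
        if curve[x - 1] < curve[x] > curve[x + 1]:
            maxima.append(x)
        elif curve[x - 1] > curve[x] < curve[x + 1]:
            minima.append(x)
    return maxima, minima
```

Weighting each level by its relative frequency is equivalent to placing one kernel on every pixel of the image and dividing by the pixel count, but only needs 256 kernel centres.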
  • S533 Perform layer processing on the effective image based on the pixels corresponding to the maximum frequency and the minimum frequency to obtain a layered image.
  • a layered image is an image obtained by layering the effective image based on the frequency maxima and minima. After the pixels corresponding to the frequency maxima and minima are obtained, the effective image is layered according to the pixels corresponding to the frequency maxima: the pixels of the effective image are clustered into as many classes as there are frequency maxima, and the effective image is divided into the same number of layers. The pixels corresponding to the frequency minima then serve as the boundary values between classes, and from these boundaries the pixels belonging to each layer are obtained.
  • for example, suppose the pixels corresponding to the frequency maxima in the effective image are 12, 54, 97, 113, 159, and 172, and the pixels corresponding to the frequency minima are 26, 69, 104, 139, and 163.
  • the number of frequency maxima shows that the pixels of the effective image fall into 6 classes, so the effective image is divided into 6 layers.
  • the pixels corresponding to the frequency minima are used as the boundary values between classes; the smallest pixel value is 0 and the largest is 255.
  • hence the layer containing pixel 12 covers the pixel range [0, 26); the layer containing 54 covers [26, 69); the layer containing 97 covers [69, 104); the layer containing 113 covers [104, 139); the layer containing 159 covers [139, 163); and the layer containing 172 covers [163, 255].
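The layer boundaries in the example above can be reproduced with a short sketch. It assumes half-open ranges, so a pixel equal to a frequency minimum falls into the upper layer, and it represents each layer as a binary mask over the effective image; the mask representation and the function names are assumptions, since the source does not specify how a layered image is stored.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple


def layer_ranges(minima: Sequence[int], lowest: int = 0, highest: int = 255) -> List[Tuple[int, int]]:
    """Pixel range of each layer: [b_k, b_{k+1}) for all but the last, which is closed [b_K, highest]."""
    bounds = [lowest, *sorted(minima), highest]
    return list(zip(bounds[:-1], bounds[1:]))


def layer_index(pixel: float, minima: Sequence[int]) -> int:
    """Layer a pixel falls into; a pixel equal to a boundary belongs to the upper layer."""
    return bisect_right(sorted(minima), pixel)


def layer_masks(image: List[List[float]], minima: Sequence[int]) -> List[List[List[int]]]:
    """One binary mask per layer: 1 where the pixel belongs to that layer, else 0."""
    k = len(minima) + 1
    return [[[1 if layer_index(v, minima) == layer else 0 for v in row] for row in image]
            for layer in range(k)]
```

With the minima 26, 69, 104, 139, and 163, `layer_ranges` returns the six ranges listed above, and the maxima 12, 54, 97, 113, 159, and 172 map to layers 0 through 5.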
  • S534 Erode the layered images, and superimpose the eroded layered images to obtain the target image.
  • the layered image is binarized.
  • the binarization process refers to a process in which pixels on an image are set to 0 (black) or 1 (white), and the entire image presents a clear black and white effect.
  • the binarized layered image is eroded to remove the background portion and retain the handwritten portion of the layered image.
  • erosion is a morphological operation that removes part of the content of an image. Because each layered image contains pixels from a different range, the layered images must be superimposed after erosion to generate a target image containing only handwriting.
  • in steps S531-S534, a frequency distribution histogram of the effective image is obtained, the pixels corresponding to the frequency maxima and minima are obtained from it, and the layered images are derived from these pixels. Finally, the layered images are binarized, eroded, and superimposed, which distinguishes the handwriting from the background image in the original image; the background image is removed and the target image of the handwriting is obtained.
  • in step S534, eroding the layered images specifically includes the following steps:
  • S5341 Binarize the layered image to obtain a layered binarized image. Specifically, after the layered image is obtained, each sampled pixel of the layered image is compared with a preselected threshold; pixels greater than or equal to the threshold are set to 1 and pixels less than the threshold are set to 0.
  • 0 represents a background pixel
  • 1 represents a target pixel (handwriting pixel).
  • the threshold can be obtained by calculating the between-class variance of the layered image (as in Otsu's method) or set from empirical values.
  • the choice of threshold affects the binarization of the layered image: a well-chosen threshold gives a good binarization, while a poorly chosen one degrades it.
  • the threshold in this embodiment is determined based on empirical values.
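Binarization as described in S5341 can be sketched as follows. `binarize` follows the text (pixels at or above the threshold become 1); `otsu_threshold` illustrates the between-class-variance alternative mentioned above and is a generic implementation of Otsu's method, not code from the source. Both assume pixel values in [0, 255] held in nested lists, and the names are hypothetical.

```python
from typing import List


def binarize(image: List[List[float]], threshold: float) -> List[List[int]]:
    """1 (target/handwriting pixel) if value >= threshold, else 0 (background), as in the text."""
    return [[1 if v >= threshold else 0 for v in row] for row in image]


def otsu_threshold(image: List[List[float]], levels: int = 256) -> int:
    """Threshold t maximizing the between-class variance of classes {v < t} and {v >= t}."""
    hist = [0] * levels
    for row in image:
        for v in row:
            hist[min(levels - 1, max(0, int(round(v))))] += 1
    total = sum(hist)
    weighted_total = sum(i * c for i, c in enumerate(hist))
    best_t, best_score = 0, -1.0
    n0, sum0 = 0, 0  # pixel count and value sum of the lower class {v < t}
    for t in range(1, levels):
        n0 += hist[t - 1]
        sum0 += (t - 1) * hist[t - 1]
        n1 = total - n0
        if n0 == 0 or n1 == 0:
            continue
        mean0, mean1 = sum0 / n0, (weighted_total - sum0) / n1
        score = n0 * n1 * (mean0 - mean1) ** 2  # proportional to between-class variance
        if score > best_score:
            best_t, best_score = t, score
    return best_t
```

In this embodiment the threshold is empirical, so only `binarize` would be needed; `otsu_threshold` shows the data-driven option.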
  • S5342 Detect pixels in the layered binarized image to obtain a connected area corresponding to the layered binarized image.
  • the connected area refers to an area surrounded by adjacent pixels around a specific pixel. If a certain pixel is 0 and its neighboring pixels are 1, the area surrounded by the neighboring pixels is regarded as the connected area.
  • the pixel matrix corresponding to the layered binarized image is scanned row by row, and pixels that satisfy the connectivity rule (4-neighborhood or 8-neighborhood connectivity) are marked with the same label.
  • 4-neighborhood connectivity means that a specific pixel has the same value as its adjacent pixels in the four directions up, down, left, and right;
  • 8-neighborhood connectivity means that a specific pixel has the same value as its adjacent pixels in the eight directions up, down, left, right, upper left, lower left, upper right, and lower right.
  • the pixel matrix includes rows and columns.
  • the specific process of detecting and labeling the pixels in the binarized image is: (1) scan the pixel matrix row by row; in each row, each sequence of consecutive 1 pixels (target pixels) forms a cluster, and the start, end, and row number of the cluster are recorded. The start of a cluster is its first pixel, and the end is its last pixel. (2) For each cluster in every row except the first, check whether it overlaps any cluster in the previous row.
  • an associated cluster is a cluster in the previous row that overlaps a cluster in the row being examined; an equivalent pair is a pair of labels carried by clusters that are connected to each other.
  • for example, suppose the row being examined is the third row of the pixel matrix.
  • cluster A in the third row overlaps two clusters in the second row, labeled 1 and 2.
  • the smaller label, 1, of the two second-row clusters is assigned to cluster A, and the labels of the connected clusters are recorded as an equivalent pair, that is, (1, 2).
  • the clusters labeled 1 and 2 therefore form one connected region.
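The run-based labeling with equivalent pairs described above can be sketched as a two-pass procedure: clusters are labeled row by row, equivalent pairs are recorded in a union-find structure, and each label is then replaced by the representative of its equivalence class. The data layout and function names are hypothetical; `eight_connected` switches between the 8-neighborhood and 4-neighborhood rules.

```python
from typing import Dict, List, Tuple

Run = Tuple[int, int]  # (start column, end column), both inclusive


def find_runs(row: List[int]) -> List[Run]:
    """Maximal sequences of consecutive target pixels (value 1) in one row."""
    runs, start = [], None
    for x, v in enumerate(row):
        if v == 1 and start is None:
            start = x
        elif v != 1 and start is not None:
            runs.append((start, x - 1))
            start = None
    if start is not None:
        runs.append((start, len(row) - 1))
    return runs


def label_runs(binary: List[List[int]], eight_connected: bool = True) -> List[Tuple[int, int, int, int]]:
    """Return (row, start, end, label) for every run; equal labels mean one connected region."""
    parent: Dict[int, int] = {}

    def find(a: int) -> int:
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    def union(a: int, b: int) -> None:  # record the equivalent pair (a, b)
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    slack = 1 if eight_connected else 0  # diagonal contact counts under 8-connectivity
    runs_out, previous, next_label = [], [], 1
    for y, row in enumerate(binary):
        current = []
        for start, end in find_runs(row):
            # associated clusters: runs in the previous row whose columns overlap this run
            linked = [lab for s, e, lab in previous if s <= end + slack and e >= start - slack]
            if linked:
                lab = min(linked)  # smallest label of the associated clusters
                for other in linked:
                    union(lab, other)
            else:
                lab = next_label
                parent[lab] = lab
                next_label += 1
            current.append((start, end, lab))
            runs_out.append((y, start, end, lab))
        previous = current
    # second pass: replace every label by its equivalence-class representative
    return [(y, s, e, find(lab)) for y, s, e, lab in runs_out]
```

On the three-row matrix [[1, 0, 1], [1, 0, 1], [1, 1, 1]], the third-row run overlaps clusters 1 and 2, so the pair (1, 2) is recorded and every run ends up with label 1.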
  • the imerode function in MATLAB or the cvErode function in OpenCV is used to erode the connected regions of the layered binarized image. Specifically, a structuring element is selected. In this embodiment, the 8 pixels adjacent to a specific pixel in the pixel matrix are taken as the connected region of that pixel, so the selected structuring element is a 3×3 pixel matrix. The structuring element is scanned over the pixel matrix of the layered binarized image, and at each position the covered part of the pixel matrix is compared with the structuring element to see whether they are completely consistent.
  • if they are completely consistent, the corresponding 9 pixels in the pixel matrix all become 1; if they are not completely consistent, the corresponding 9 pixels all become 0, where 0 (black) marks the eroded part of the layered binarized image.
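The sketch below uses standard binary erosion with a 3×3 all-ones structuring element, in which only the centre pixel is set to 1 when all nine covered pixels are 1. This is the core rule of library routines such as OpenCV's `erode` and MATLAB's `imerode`; the source's wording, in which all nine pixels are rewritten, differs, so treat this as a substitute illustration rather than the patent's exact rule. Treating pixels outside the image as 0 is a simplifying assumption, and library routines may pad differently.

```python
from typing import List


def erode(binary: List[List[int]], size: int = 3) -> List[List[int]]:
    """Binary erosion with a size x size all-ones structuring element.
    A pixel stays 1 only if every pixel under the element is 1; pixels outside
    the image are treated as 0 (a simplifying assumption)."""
    r = size // 2
    height, width = len(binary), len(binary[0])
    return [[int(all(0 <= y + dy < height and 0 <= x + dx < width and binary[y + dy][x + dx] == 1
                     for dy in range(-r, r + 1) for dx in range(-r, r + 1)))
             for x in range(width)]
            for y in range(height)]
```

An isolated pixel or a one-pixel-wide stroke disappears entirely, while a solid block shrinks by one pixel on every side.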
  • the layered binarized image is filtered according to a preset erosion-resistance range for handwriting areas: the parts of the layered binarized image whose erosion resistance lies outside this range are deleted, leaving the areas whose erosion resistance lies within it.
  • the erosion resistance of a handwriting area is calculated as $s_1 / s_2$, where $s_1$ is the total area of the layered binarized image after erosion and $s_2$ is its total area before erosion.
  • the preset erosion-resistance range for handwriting areas is [0.05, 0.8]; for each layered binarized image, the ratio $s_1 / s_2$ of its total area after erosion to its total area before erosion is calculated.
  • if this ratio is not within the preset erosion-resistance range, the corresponding area of the layered binarized image is not handwriting and is removed.
  • if the ratio is within [0.05, 0.8], the corresponding area of the layered binarized image is handwriting and is retained.
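The erosion-resistance filter can be sketched as below, assuming each area is available as a binary matrix before and after erosion. The range [0.05, 0.8] comes from the text; the function names and the convention of returning 0.0 for an empty area are assumptions. Under this rule, areas that barely shrink (ratio above 0.8, such as large solid blocks) and areas that almost vanish (ratio below 0.05) are discarded as background.

```python
from typing import List, Tuple


def area(binary: List[List[int]]) -> int:
    """Total area = number of 1 pixels."""
    return sum(map(sum, binary))


def erosion_resistance(before: List[List[int]], after: List[List[int]]) -> float:
    """s1 / s2: area after erosion over area before erosion (0.0 for an empty area)."""
    s2, s1 = area(before), area(after)
    return s1 / s2 if s2 else 0.0


def is_handwriting(before: List[List[int]], after: List[List[int]],
                   resistance_range: Tuple[float, float] = (0.05, 0.8)) -> bool:
    """Keep the area as handwriting iff its erosion resistance lies in the preset range."""
    low, high = resistance_range
    return low <= erosion_resistance(before, after) <= high
```

For instance, a 5×5 solid block that erodes to its inner 3×3 has a ratio of 9/25 = 0.36 and is kept, while a 20×20 block that erodes to 18×18 has a ratio of 0.81 and is discarded.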
  • in steps S5341-S5343, the layered image is binarized to obtain a layered binarized image, and the pixels of the layered binarized image are then detected to obtain its pixel matrix and the connected region of each pixel.
  • the connected region of each pixel is checked with the structuring element.
  • pixels whose neighborhood in the pixel matrix is not completely consistent with the structuring element become 0, and the pixels with value 0 appear black in the layered binarized image.
  • the black part is the eroded part of the layered binarized image.
  • the ratio of the total area of the layered binarized image after erosion to its total area before erosion is calculated, and whether this ratio lies within the preset erosion-resistance range for handwriting areas determines what is kept: the background image is removed and the handwriting is retained, yielding the target image.
  • the Chinese character recognition method enlarges and grays the original image to obtain a grayscale image, then performs range normalization on it to obtain an effective image. This makes it convenient for the subsequent steps to layer the effective image with the Gaussian kernel density estimation algorithm, binarize, erode, and superimpose it, remove the background image, and retain a target image containing only handwriting.
  • the vertical projection method is used to cut the single font of the target image to obtain the single font image.
  • the obtained single font image is input to the handwriting recognition model for recognition, and the recognition result is obtained based on the recognition probability value corresponding to the single font image.
  • Query the semantic database based on the recognition results, and obtain the target Chinese characters corresponding to the single font image according to the Chinese sentences stored in the semantic database.
  • the accuracy of the handwriting recognition can be improved through the handwriting recognition model and the judgment and screening of the semantic database.
  • a Chinese character recognition device is provided, which corresponds one-to-one to the Chinese character recognition method in the above embodiment.
  • the Chinese character recognition device includes an original image acquisition module 51, a valid image acquisition module 52, a target image acquisition module 53, a single font image acquisition module 54, a recognition result acquisition module 55, and a target Chinese character confirmation module 56.
  • the detailed description of each function module is as follows:
  • the original image obtaining module 51 is configured to obtain an original image, and the original image includes a handwriting and a background image.
  • An effective image acquisition module 52 is configured to pre-process the original image to obtain an effective image.
  • a target image acquisition module 53 is configured to process a valid image by using a kernel density estimation algorithm and an erosion method, remove a background image, and obtain a target image including handwriting.
  • the single-font image acquisition module 54 is configured to perform single-font cutting on a target image by using a vertical projection method to obtain a single-font image.
  • the recognition result acquisition module 55 is configured to input a single font image into a handwriting recognition model for recognition, and obtain a recognition result corresponding to the single font image.
  • the handwriting recognition model is obtained by using the above handwriting model training method.
  • the target Chinese character confirmation module 56 is configured to query a semantic library based on the recognition result to obtain a target Chinese character corresponding to a single font image.
  • the effective image acquisition module 52 includes a grayscale image acquisition unit 521 and a range normalization processing unit 522.
  • a grayscale image acquisition unit 521 is configured to perform enlargement and grayscale processing on an original image to obtain a grayscale image.
  • the range normalization processing unit 522 is configured to perform range normalization on the pixel matrix corresponding to the grayscale image to obtain the effective image, where the range normalization formula is $x' = \frac{x - M_{\min}}{M_{\max} - M_{\min}}$; x is a pixel before normalization, x′ is the corresponding pixel of the effective image after normalization, $M_{\min}$ is the smallest pixel in the pixel matrix M corresponding to the grayscale image, and $M_{\max}$ is the largest pixel in M.
  • the target image acquisition module 53 includes a first processing unit 531, a second processing unit 532, a layered image acquisition unit 533, and a layered image processing unit 534.
  • the first processing unit 531 is configured to count the number of occurrences of pixels in the effective image, and obtain a frequency distribution histogram corresponding to the effective image.
  • a second processing unit 532 is configured to process the frequency distribution histogram with the Gaussian kernel density estimation method to obtain its frequency maxima and frequency minima, and to obtain the pixels corresponding to these maxima and minima.
  • a layered image acquisition unit 533 is configured to perform a layered processing on the effective image based on the pixels corresponding to the maximum frequency value and the minimum frequency value to obtain a layered image.
  • a layered image processing unit 534 is configured to perform an erosion process on the layered image, and superimpose the layered image after the erosion process to obtain a target image.
  • the layered image processing unit 534 includes a binarization processing unit 5341, a connected region acquisition unit 5342, and a connected region processing unit 5343.
  • a binarization processing unit 5341 is configured to perform binarization processing on the layered image to obtain a layered binarized image.
  • the connected region obtaining unit 5342 is configured to detect pixels in the layered binarized image to obtain a connected region corresponding to the layered binarized image.
  • the connected region processing unit 5343 is configured to perform an erosion process on the connected region corresponding to the layered binary image.
  • A computer device is provided.
  • The computer device may be a server, and its internal structure may be as shown in FIG. 10.
  • The computer device includes a processor, a memory, a network interface, and a database connected through a system bus.
  • The processor of the computer device provides computing and control capabilities.
  • The memory of the computer device includes a non-volatile storage medium and an internal memory.
  • The non-volatile storage medium stores an operating system, computer-readable instructions, and a database.
  • The internal memory provides an environment for running the operating system and the computer-readable instructions stored in the non-volatile storage medium.
  • The database of the computer device stores the handwriting recognition model.
  • The network interface of the computer device communicates with external terminals over a network connection.
  • When executed by the processor, the computer-readable instructions implement a handwriting model training method.
  • A computer device is also provided. It includes a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor.
  • When the processor executes the computer-readable instructions, it implements the following steps:
    - obtaining handwriting training samples, which include handwriting images and label Chinese characters associated with the handwriting images;
    - dividing the handwriting training samples into a training set and a test set;
    - inputting the training set into a convolutional recurrent neural network model to obtain the forward output of that model;
    - according to that forward output, updating the weights and biases in the model with a back-propagation algorithm based on batch gradient descent to obtain a handwriting training model;
    - inputting the test set into the handwriting training model to obtain the recognized Chinese character for each handwriting image, and computing the recognition accuracy from the recognized and label Chinese characters;
    - if the recognition accuracy is greater than the preset accuracy, determining the handwriting training model as the handwriting recognition model.
  • The convolutional recurrent neural network model includes a convolutional neural network model and a recurrent neural network model.
  • The training set is input into the convolutional neural network model to obtain the image features of the handwriting images in the training set.
  • These image features are input into the recurrent neural network model for training to obtain the forward output of the recurrent neural network model. A loss function $E_{loss}(\theta)$ is then constructed from that forward output and the label Chinese characters.
  • In the loss function:
    - N is the number of handwriting training samples;
    - $E_{loss}(\theta)$ is the average of the total errors of all handwriting images in the N handwriting training samples;
    - M is the number of sequence labels carried by the handwriting images in a handwriting training sample.
  • The loss is built from two quantities for each sample n and sequence label m: the forward output of the handwriting image corresponding to the m-th sequence label in the n-th handwriting training sample, and the label Chinese character corresponding to that sequence label. θ denotes the set of weights and biases.
  • According to the loss function, the weights and biases in the recurrent neural network model and the convolutional neural network model are updated with a back-propagation algorithm based on batch gradient descent to obtain the handwriting training model.
  • One or more non-volatile storage media storing computer-readable instructions are provided. When executed by one or more processors, the instructions cause the one or more processors to implement the following steps:
    - obtaining handwriting training samples, which include handwriting images and label Chinese characters associated with the handwriting images;
    - dividing the handwriting training samples into a training set and a test set;
    - inputting the training set into a convolutional recurrent neural network model to obtain its forward output;
    - according to that forward output, updating the weights and biases in the model with a back-propagation algorithm based on batch gradient descent to obtain a handwriting training model;
    - inputting the test set into the handwriting training model to obtain the recognized Chinese character for each handwriting image, and computing the recognition accuracy from the recognized and label Chinese characters;
    - if the recognition accuracy is greater than the preset accuracy, determining the handwriting training model as the handwriting recognition model.
  • The convolutional recurrent neural network model includes a convolutional neural network model and a recurrent neural network model. The training set is input into the convolutional neural network model to obtain the image features of the handwriting images in the training set. These image features are input into the recurrent neural network model for training to obtain its forward output. A loss function $E_{loss}(\theta)$ is then constructed from that forward output and the label Chinese characters.
  • In the loss function:
    - N is the number of handwriting training samples;
    - $E_{loss}(\theta)$ is the average of the total errors of all handwriting images in the N handwriting training samples;
    - M is the number of sequence labels carried by the handwriting images in a handwriting training sample.
  • The loss is built from two quantities for each sample n and sequence label m: the forward output of the handwriting image corresponding to the m-th sequence label in the n-th handwriting training sample, and the label Chinese character corresponding to that sequence label. θ denotes the set of weights and biases.
  • According to the loss function, the weights and biases in the recurrent neural network model and the convolutional neural network model are updated with a back-propagation algorithm based on batch gradient descent to obtain the handwriting training model.
  • A computer device is also provided. It includes a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor.
  • When the processor executes the computer-readable instructions, it implements the following steps:
    - obtaining an original image that includes handwriting and a background image;
    - preprocessing the original image to obtain a valid image;
    - processing the valid image with a kernel density estimation algorithm and an erosion method to remove the background image and obtain a target image containing the handwriting;
    - segmenting the target image into single characters with a vertical projection method to obtain single-character images;
    - inputting the single-character images into the handwriting recognition model, obtained with the above handwriting model training method, to obtain the recognition result for each single-character image;
    - querying the semantic library based on the recognition results to obtain the target Chinese character for each single-character image.
  • When the processor executes the computer-readable instructions, it further implements the following steps:
    - enlarging and grayscaling the original image to obtain a grayscale image;
    - applying range normalization to the pixel matrix of the grayscale image to obtain the valid image. The range normalization formula is $x' = \frac{x - M_{\min}}{M_{\max} - M_{\min}}$, where x is a pixel of the valid image before normalization, x′ is that pixel after normalization, $M_{\min}$ is the smallest pixel in the pixel matrix M of the grayscale image, and $M_{\max}$ is the largest pixel in that matrix.
  • When the processor executes the computer-readable instructions, it further implements the following steps:
    - counting the occurrences of each pixel value in the valid image to obtain its frequency distribution histogram;
    - processing the histogram with Gaussian kernel density estimation to obtain its frequency maxima and minima and the pixels corresponding to them;
    - layering the valid image based on those pixels to obtain layered images;
    - eroding the layered images and superimposing the eroded layered images to obtain the target image.
  • When the processor executes the computer-readable instructions, it further implements the following steps:
    - binarizing each layered image to obtain a layered binarized image;
    - detecting and labeling the pixels in the layered binarized image to obtain its connected regions;
    - eroding those connected regions.
  • One or more non-volatile storage media storing computer-readable instructions are provided. When executed by one or more processors, the instructions cause the one or more processors to implement the following steps:
    - obtaining an original image that includes handwriting and a background image;
    - preprocessing the original image to obtain a valid image;
    - processing the valid image with a kernel density estimation algorithm and an erosion method to remove the background image and obtain a target image containing the handwriting;
    - segmenting the target image into single characters with the vertical projection method to obtain single-character images;
    - inputting the single-character images into the handwriting recognition model, obtained with the above handwriting model training method, to obtain the recognition result for each single-character image;
    - querying the semantic library based on the recognition results to obtain the target Chinese character for each single-character image.
  • The following steps are further implemented:
    - enlarging and grayscaling the original image to obtain a grayscale image;
    - applying range normalization to the pixel matrix of the grayscale image to obtain the valid image. The range normalization formula is $x' = \frac{x - M_{\min}}{M_{\max} - M_{\min}}$, where x is a pixel of the valid image before normalization, x′ is that pixel after normalization, $M_{\min}$ is the smallest pixel in the pixel matrix M of the grayscale image, and $M_{\max}$ is the largest pixel in that matrix.
  • The following steps are also implemented:
    - counting the occurrences of each pixel value in the valid image to obtain its frequency distribution histogram;
    - processing the histogram with Gaussian kernel density estimation to obtain its frequency maxima and minima and the pixels corresponding to them;
    - layering the valid image based on the pixels corresponding to the frequency maxima and minima to obtain layered images;
    - eroding the layered images and superimposing the eroded layered images to obtain the target image.
  • The following steps are further implemented:
    - binarizing each layered image to obtain a layered binarized image;
    - detecting and labeling the pixels in the layered binarized image to obtain its connected regions;
    - eroding those connected regions.
  • Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory can include random access memory (RAM) or external cache memory.
  • RAM is available in various forms, such as:
    - static RAM (SRAM);
    - dynamic RAM (DRAM);
    - synchronous DRAM (SDRAM);
    - double data rate SDRAM (DDRSDRAM);
    - enhanced SDRAM (ESDRAM);
    - Synchlink DRAM (SLDRAM);
    - Rambus direct RAM (RDRAM);
    - direct Rambus dynamic RAM (DRDRAM);
    - Rambus dynamic RAM (RDRAM).

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Character Discrimination (AREA)

Abstract

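A handwriting model training method, a Chinese character recognition method, and an apparatus, device, and medium. The method includes:
(S10) obtaining handwriting training samples, which include handwriting images and label Chinese characters associated with the handwriting images;
(S20) dividing the handwriting training samples into a training set and a test set;
(S30) inputting the training set into a convolutional recurrent neural network model to obtain the forward output of the model and, according to that forward output, updating the weights and biases in the model with a back-propagation algorithm based on batch gradient descent to obtain a handwriting training model;
(S40) inputting the test set into the handwriting training model to obtain the recognized Chinese character for each handwriting image, computing a recognition accuracy from the recognized and label Chinese characters, and, if the recognition accuracy is greater than a preset accuracy, determining the handwriting training model as the handwriting recognition model.
The resulting handwriting recognition model recognizes handwriting with high accuracy.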

Description

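Handwriting model training method, Chinese character recognition method, apparatus, device, and medium
This application is based on, and claims priority to, Chinese invention patent application No. 201810563511.2, filed on June 4, 2018, entitled "Handwriting model training method, Chinese character recognition method, apparatus, device, and medium".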
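Technical Field
This application relates to the field of handwriting recognition, and in particular to a handwriting model training method, a Chinese character recognition method, an apparatus, a device, and a medium.
Background
Traditional Chinese character recognition mostly relies on OCR (Optical Character Recognition). Several factors make its accuracy hard to guarantee:
- Chinese characters come in many typefaces, such as Song (宋体), Kai (楷体), Yao (姚体), and FangSong (仿宋).
- Some characters have complex structures, such as "魑" and "魅".
- Many characters are structurally similar, such as "受" and "爱".
OCR can recognize standard sentences that are written simply and regularly. Sentences made of handwritten characters are different: everyone's writing habits differ, and the characters are not composed of standard horizontal, vertical, left-falling, and right-falling strokes. OCR recognizes such sentences inaccurately, which greatly limits the performance of the recognition system and leads to low precision and unsatisfactory results.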
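Summary
In view of the above technical problems, it is necessary to provide a handwriting model training method, apparatus, device, and medium that can improve recognition accuracy.
A handwriting model training method, including:
obtaining handwriting training samples, the handwriting training samples including handwriting images and label Chinese characters associated with the handwriting images;
dividing the handwriting training samples into a training set and a test set;
inputting the training set into a convolutional recurrent neural network model to obtain the forward output of the convolutional recurrent neural network model, and, according to that forward output, updating the weights and biases in the model with a back-propagation algorithm based on batch gradient descent to obtain a handwriting training model;
inputting the test set into the handwriting training model to obtain the recognized Chinese character corresponding to each handwriting image, and obtaining a recognition accuracy based on the recognized Chinese characters and the label Chinese characters. If the recognition accuracy is greater than a preset accuracy, the handwriting training model is determined as the handwriting recognition model.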
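A handwriting model training apparatus, including:
a training sample acquisition module, configured to obtain handwriting training samples, the handwriting training samples including handwriting images and label Chinese characters associated with the handwriting images;
a training sample processing module, configured to divide the handwriting training samples into a training set and a test set;
a training model acquisition module, configured to input the training set into a convolutional recurrent neural network model to obtain its forward output, and, according to that forward output, to update the weights and biases in the model with a back-propagation algorithm based on batch gradient descent to obtain a handwriting training model;
a recognition model acquisition module, configured to input the test set into the handwriting training model to obtain the recognized Chinese character corresponding to each handwriting image, and to obtain a recognition accuracy based on the recognized and label Chinese characters. If the recognition accuracy is greater than a preset accuracy, the module determines the handwriting training model as the handwriting recognition model.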
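A computer device, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor, when executing the computer-readable instructions, implements the following steps:
obtaining handwriting training samples, the handwriting training samples including handwriting images and label Chinese characters associated with the handwriting images;
dividing the handwriting training samples into a training set and a test set;
inputting the training set into a convolutional recurrent neural network model to obtain its forward output, and, according to that forward output, updating the weights and biases in the model with a back-propagation algorithm based on batch gradient descent to obtain a handwriting training model;
inputting the test set into the handwriting training model to obtain the recognized Chinese character corresponding to each handwriting image, and obtaining a recognition accuracy based on the recognized and label Chinese characters. If the recognition accuracy is greater than a preset accuracy, the handwriting training model is determined as the handwriting recognition model.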
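One or more non-volatile readable storage media storing computer-readable instructions, wherein the computer-readable instructions, when executed by one or more processors, cause the one or more processors to implement the following steps:
obtaining handwriting training samples, the handwriting training samples including handwriting images and label Chinese characters associated with the handwriting images;
dividing the handwriting training samples into a training set and a test set;
inputting the training set into a convolutional recurrent neural network model to obtain its forward output, and, according to that forward output, updating the weights and biases in the model with a back-propagation algorithm based on batch gradient descent to obtain a handwriting training model;
inputting the test set into the handwriting training model to obtain the recognized Chinese character corresponding to each handwriting image, and obtaining a recognition accuracy based on the recognized and label Chinese characters. If the recognition accuracy is greater than a preset accuracy, the handwriting training model is determined as the handwriting recognition model.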
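On the same basis, and in view of the above technical problems, it is also necessary to provide a Chinese character recognition method, apparatus, device, and medium with high recognition accuracy.
A Chinese character recognition method, including:
obtaining an original image, the original image including handwriting and a background image;
preprocessing the original image to obtain a valid image;
processing the valid image with a kernel density estimation algorithm and an erosion method to remove the background image and obtain a target image containing the handwriting;
segmenting the target image into single characters with a vertical projection method to obtain single-character images;
inputting the single-character images into a handwriting recognition model for recognition to obtain the recognition result corresponding to each single-character image, the handwriting recognition model being obtained with the above handwriting model training method;
querying a semantic library based on the recognition results to obtain the target Chinese character corresponding to each single-character image.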
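A Chinese character recognition apparatus, including:
an original image acquisition module, configured to obtain an original image, the original image including handwriting and a background image;
a valid image acquisition module, configured to preprocess the original image to obtain a valid image;
a target image acquisition module, configured to process the valid image with a kernel density estimation algorithm and an erosion method to remove the background image and obtain a target image containing the handwriting;
a single-character image acquisition module, configured to segment the target image into single characters with a vertical projection method to obtain single-character images;
a recognition result acquisition module, configured to input the single-character images into a handwriting recognition model for recognition to obtain the recognition result corresponding to each single-character image, the handwriting recognition model being obtained with the above handwriting model training method;
a target Chinese character confirmation module, configured to query a semantic library based on the recognition results to obtain the target Chinese character corresponding to each single-character image.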
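A computer device, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor, when executing the computer-readable instructions, implements the following steps:
obtaining an original image, the original image including handwriting and a background image;
preprocessing the original image to obtain a valid image;
processing the valid image with a kernel density estimation algorithm and an erosion method to remove the background image and obtain a target image containing the handwriting;
segmenting the target image into single characters with a vertical projection method to obtain single-character images;
inputting the single-character images into a handwriting recognition model for recognition to obtain the recognition result corresponding to each single-character image, the handwriting recognition model being obtained with the above handwriting model training method;
querying a semantic library based on the recognition results to obtain the target Chinese character corresponding to each single-character image.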
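One or more non-volatile readable storage media storing computer-readable instructions, wherein the computer-readable instructions, when executed by one or more processors, cause the one or more processors to implement the following steps:
obtaining an original image, the original image including handwriting and a background image;
preprocessing the original image to obtain a valid image;
processing the valid image with a kernel density estimation algorithm and an erosion method to remove the background image and obtain a target image containing the handwriting;
segmenting the target image into single characters with a vertical projection method to obtain single-character images;
inputting the single-character images into a handwriting recognition model for recognition to obtain the recognition result corresponding to each single-character image, the handwriting recognition model being obtained with the above handwriting model training method;
querying a semantic library based on the recognition results to obtain the target Chinese character corresponding to each single-character image.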
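Details of one or more embodiments of this application are set forth in the accompanying drawings and the description below. Other features and advantages of this application will become apparent from the description, the drawings, and the claims.
Brief Description of the Drawings
To explain the technical solutions of the embodiments of this application more clearly, the drawings used in describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of this application; a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a diagram of an application scenario of the handwriting model training method in an embodiment of this application;
FIG. 2 is a flowchart of the handwriting model training method in an embodiment of this application;
FIG. 3 is a detailed flowchart of step S30 in FIG. 2;
FIG. 4 is a schematic diagram of the handwriting model training apparatus in an embodiment of this application;
FIG. 5 is a flowchart of the Chinese character recognition method in an embodiment of this application;
FIG. 6 is a detailed flowchart of step S52 in FIG. 5;
FIG. 7 is a detailed flowchart of step S53 in FIG. 5;
FIG. 8 is a detailed flowchart of step S534 in FIG. 7;
FIG. 9 is a schematic diagram of the Chinese character recognition apparatus in an embodiment of this application;
FIG. 10 is a schematic diagram of a computer device in an embodiment of this application.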
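Detailed Description
The technical solutions in the embodiments of this application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of this application.
The handwriting model training method provided by the embodiments of this application can be applied in the application environment shown in FIG. 1. This environment includes a server and a client, and the client communicates with the server through a network. The client is a device capable of human-computer interaction with a user, including but not limited to computers, smartphones, and tablets. The handwriting model training method is applied to the server.
In one embodiment, as shown in FIG. 2, a handwriting model training method is provided, which includes the following steps: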
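S10: Obtain handwriting training samples, which include handwriting images and label Chinese characters associated with the handwriting images.
Specifically, the server obtains the handwriting training samples from a database, which provides the data source for subsequent model training. Handwriting training samples are handwriting samples stored in the database in advance for training the neural network model. They include handwriting images and the label Chinese characters associated with them.

A handwriting image is an image carrying a Chinese character handwritten by one of various people. To make the handwriting recognition model easier to train, in this embodiment each handwriting image corresponds to one handwritten character. Each handwriting image also carries a sequence label, that is, a label indicating the image's position in order. For example, if there are N handwriting training samples (N being a positive integer) and each contains M handwriting images, the corresponding sequence labels are 1, 2, 3, ..., M.

A label Chinese character is a Chinese character in a standard typeface, obtained from the level-2 Chinese character library, that matches the handwriting image. Standard typefaces include but are not limited to Song, Kai, and FangSong. Associating a label Chinese character with a handwriting image makes it easy to identify which character the handwriting image represents. In this embodiment, only characters in the commonly used Song, Kai, or FangSong typefaces may be associated with handwriting images as label Chinese characters, which saves storage space and reduces the amount of model training. For example, if the handwritten character in an image is "我", the associated label Chinese character is "我" in a typeface such as Song, Kai, or FangSong from the level-2 Chinese character library.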
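S20: Divide the handwriting training samples into a training set and a test set.
The training set is the data used to adjust the parameters of the convolutional recurrent neural network model. The test set is the data used to test the recognition accuracy of the trained convolutional recurrent neural network model.

Specifically, ten-fold cross-validation, a common method for testing the accuracy of an algorithm, is used to divide the handwriting training samples into the training set and the test set. In this embodiment the samples are divided at a ratio of 9:1. The handwriting training samples are split into 10 groups; 9 groups form the training set used to train the convolutional recurrent neural network model, and the remaining group forms the test set used to verify the accuracy of the trained model.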
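The 9:1 split in S20 can be sketched as follows. This is a minimal illustration that assumes the samples fit in a Python list and that the groups are formed after a seeded shuffle. The function name `ten_fold_split` and its parameters are hypothetical and are not taken from the application. Rotating `test_fold` over 0 to 9 would give full ten-fold cross-validation; S20 as written uses a single held-out group.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def ten_fold_split(samples: Sequence[T], test_fold: int = 0,
                   seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle the samples, cut them into 10 groups, hold one group out."""
    if not 0 <= test_fold < 10:
        raise ValueError("test_fold must be between 0 and 9")
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)            # reproducible shuffle
    folds = [indices[k::10] for k in range(10)]     # 10 near-equal groups
    test = [samples[i] for i in folds[test_fold]]   # 1 group  -> test set
    train = [samples[i]                             # 9 groups -> training set
             for k, fold in enumerate(folds) if k != test_fold
             for i in fold]
    return train, test


train_set, test_set = ten_fold_split(list(range(100)))
print(len(train_set), len(test_set))  # 90 10
```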
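S30: Input the training set into the convolutional recurrent neural network model to obtain its forward output, and, according to that forward output, update the weights and biases in the model with a back-propagation algorithm based on batch gradient descent to obtain the handwriting training model.
The terms used in this step are defined as follows:
- The convolutional recurrent neural network (C-RNN) model is a neural network model composed of a convolutional neural network (CNN) model and a recurrent neural network (RNN) model. The forward output of the convolutional recurrent neural network model is the forward output of its recurrent neural network model.
- Batch gradient descent (BGD) means that, when the weights and biases of the convolutional recurrent neural network model are updated by back-propagation, the update is based on the forward outputs obtained for the handwriting images of all handwriting training samples in the training set.
- The back-propagation algorithm adjusts the weights and biases between the hidden layer and the output layer, and those between the input layer and the hidden layer, in reverse order of the time-step states.
- The handwriting training model is the model obtained after the convolutional recurrent neural network model is trained on the training set.

Specifically, the server inputs the training set into the convolutional recurrent neural network model. The convolutional and pooling layers of the convolutional neural network model extract and process features from the handwriting images in the training set, giving the image features of each handwriting image. An image feature is the pixel matrix obtained after a handwriting image passes through the convolutional and pooling layers. The image features are then input into the recurrent neural network model for training to obtain its forward output, which is the forward output of the convolutional recurrent neural network model. The forward output of the recurrent neural network model is the pixel matrix produced at its output layer after it processes the image features of a handwriting image.

Obtaining the forward output of the convolutional recurrent neural network model makes it possible to update the model's weights and biases based on that output. After the forward outputs for the handwriting images of all handwriting training samples in the training set are obtained, an error function is constructed from these forward outputs and the label Chinese characters associated with the handwriting images. The partial derivatives of the error function are used to update the weights and biases, which yields the handwriting training model. With back-propagation based on batch gradient descent, each update follows an error function built from all handwriting images in the training set. This ensures that the parameters of the model are fully updated and thereby improves the recognition accuracy of the handwriting training model.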
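S40: Input the test set into the handwriting training model to obtain the recognized Chinese character corresponding to each handwriting image, and obtain the recognition accuracy based on the recognized and label Chinese characters. If the recognition accuracy is greater than the preset accuracy, determine the handwriting training model as the handwriting recognition model.
The handwriting recognition model is the model whose recognition accuracy, measured by testing the handwriting training model on the test set, meets the preset accuracy; it can be used to recognize handwriting images. After training of the handwriting training model is completed, the handwriting image of each handwriting training sample in the test set is input into it in turn to obtain the recognized Chinese character for each image. In this embodiment, a recognized Chinese character is the character that the handwriting training model outputs for a handwriting image.

For each handwriting image, its recognized Chinese character and label Chinese character determine whether the handwriting training model recognized it correctly; if so, the count of correct recognitions is incremented by 1. The recognition accuracy is then computed as:

recognition accuracy = number of correct recognitions / number of handwriting images in the test set

If the recognition accuracy of the handwriting training model is greater than the preset accuracy, the model is determined as the handwriting recognition model. Otherwise, the handwriting training model must be retrained until its recognition accuracy meets the requirement. The preset accuracy is a preset threshold for judging whether the accuracy of the handwriting training model meets the requirement. For example, suppose the preset accuracy is 82% and the recognition accuracy measured on the test set is greater than 82% (such as 85% or 90%). Then the model's recognition accuracy meets the requirement, and the model can be determined as the handwriting recognition model.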
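The accuracy check in S40 reduces to a ratio and a strict comparison. The sketch below assumes that recognized and label characters are given as parallel sequences. The names `recognition_accuracy` and `accept_model` are illustrative, and the default of 0.82 is taken from the 82% example above.

```python
from typing import Sequence


def recognition_accuracy(recognized: Sequence[str], labels: Sequence[str]) -> float:
    """Accuracy = number of correct recognitions / number of test images."""
    if len(recognized) != len(labels) or not labels:
        raise ValueError("need two non-empty sequences of equal length")
    correct = sum(1 for r, t in zip(recognized, labels) if r == t)
    return correct / len(labels)


def accept_model(recognized: Sequence[str], labels: Sequence[str],
                 preset_accuracy: float = 0.82) -> bool:
    """Accept the model only if accuracy is strictly greater than the preset."""
    return recognition_accuracy(recognized, labels) > preset_accuracy


print(recognition_accuracy(["我", "你", "他", "我"], ["我", "你", "她", "我"]))  # 0.75
```

The comparison is strict (`>`), matching "greater than the preset accuracy". A model that hits the preset exactly is therefore sent back for retraining.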
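In the handwriting model training method provided by this embodiment, the training set is input into the convolutional recurrent neural network model to obtain its forward output. Based on that forward output, the weights and biases of the model are updated with a back-propagation algorithm based on batch gradient descent to obtain the handwriting training model. This ensures that the model's parameters are fully updated and improves the recognition accuracy of the handwriting training model. Finally, the test set is input into the handwriting training model for testing. If the model's recognition accuracy is greater than the preset accuracy, it meets the requirement, and the model is determined as the handwriting recognition model used to recognize handwriting images. The resulting handwriting recognition model recognizes handwriting with high accuracy.
In one embodiment, the convolutional recurrent neural network model consists of a convolutional neural network model and a recurrent neural network model, so both are used when training the handwriting training model. As shown in FIG. 3, step S30 specifically includes the following steps. (Step S30 inputs the training set into the convolutional recurrent neural network model to obtain its forward output and, according to that forward output, updates the model's weights and biases with a back-propagation algorithm based on batch gradient descent to obtain a handwriting training model.)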
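S31: Input the training set into the convolutional neural network model to obtain the image features of the handwriting images in the training set.
Specifically, the convolutional neural network model includes multiple convolutional layers and pooling layers. After the training set is obtained, the handwriting images of the corresponding handwriting training samples are input into the model for training, and the output of each convolutional layer is computed as
$$a_m^l = \sigma(z_m^l) = \sigma\left(a_m^{l-1} * W^l + b^l\right)$$
where:
- $a_m^l$ is the output of the l-th convolutional layer for the m-th sequence label;
- $z_m^l$ is that output before the activation function is applied;
- $a_m^{l-1}$ is the output of convolutional layer l-1 for the m-th sequence label, that is, the output of the previous layer;
- σ is the activation function. The convolutional layers use ReLU (Rectified Linear Unit), which performs better here than other activation functions;
- * denotes the convolution operation;
- $W^l$ is the convolution kernel (weights) of the l-th convolutional layer, and $b^l$ is its bias.

If layer l is a pooling layer, max-pooling down-sampling reduces the dimensionality of the convolutional layer's output, $a_m^l = \mathrm{pool}(a_m^{l-1})$. Here pool denotes the down-sampling computation; with max pooling, it takes the maximum value within each m × m region.

Finally, the output of the output layer is obtained with the formula
[Formula image PCTCN2018094403-appb-000002: output-layer output $T^{(m)}$]
where $T^{(m)}$ is the output of the output layer of the convolutional neural network model. This output is the image feature of the handwriting image corresponding to the m-th sequence label. The image feature carries a sequence label, which matches the sequence label of its handwriting image.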
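A minimal NumPy sketch of one convolutional layer $a^l = \sigma(a^{l-1} * W^l + b^l)$ with ReLU, followed by max pooling, for a single-channel image. The sketch makes these assumptions:
- 'valid' cross-correlation, which is what most deep-learning libraries implement under the name convolution;
- a scalar bias;
- non-overlapping pooling windows.

The function names are hypothetical, and the application does not specify kernel sizes or strides.

```python
import numpy as np


def conv2d_valid(a_prev: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """'Valid' 2-D cross-correlation of a single-channel map with one kernel."""
    kh, kw = kernel.shape
    out_h = a_prev.shape[0] - kh + 1
    out_w = a_prev.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(a_prev[i:i + kh, j:j + kw] * kernel)
    return out


def conv_layer(a_prev: np.ndarray, kernel: np.ndarray, bias: float) -> np.ndarray:
    z = conv2d_valid(a_prev, kernel) + bias  # z^l = a^{l-1} * W^l + b^l
    return np.maximum(z, 0.0)                # a^l = ReLU(z^l)


def max_pool(a: np.ndarray, size: int = 2) -> np.ndarray:
    """Non-overlapping size x size max pooling; leftover rows/cols are dropped."""
    h, w = a.shape[0] // size, a.shape[1] // size
    blocks = a[:h * size, :w * size].reshape(h, size, w, size)
    return blocks.max(axis=(1, 3))


x = np.arange(16, dtype=float).reshape(4, 4)
feature = conv_layer(x, np.ones((2, 2)), bias=-10.0)
pooled = max_pool(feature)
```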
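S32: Input the image features of the handwriting images in the training set into the recurrent neural network model for training to obtain the forward output of the recurrent neural network model.
Specifically, the image features of the handwriting images in the training set are passed from the convolutional neural network model into the hidden layer of the recurrent neural network model. The hidden-layer output is obtained with
$$h^{(m)} = \sigma'\left(U' T^{(m)} + W' h^{(m-1)} + b'\right)$$
where:
- $h^{(m)}$ is the hidden-layer output of the recurrent neural network model for the m-th sequence label;
- σ' is the activation function of the hidden layer;
- U' is the weight between the convolutional layer of the convolutional neural network model and the hidden layer of the recurrent neural network model. If layer l is a pooling layer, U' is instead the weight between that pooling layer and the hidden layer;
- W' is the weight between hidden layers, from the previous hidden state to the current one;
- b' is the bias between the input layer and the hidden layer;
- $T^{(m)}$ is the image feature, received by the input layer of the recurrent neural network model, of the handwriting image corresponding to the m-th sequence label.

The hidden-layer output $h^{(m)}$ is then passed to the output layer through $o^{(m)} = V' h^{(m)} + c'$, where:
- $o^{(m)}$ is the input from the hidden layer to the output layer;
- V' is the weight between the hidden layer and the output layer;
- c' is the bias between the hidden layer and the output layer.

The output layer computes $y^{(m)} = \sigma''(o^{(m)})$ from its input $o^{(m)}$ to obtain the forward output $y^{(m)}$ of the recurrent neural network model. $y^{(m)}$ is the forward output obtained when a handwriting image in the training set is input into the recurrent neural network model. σ'' is the activation function of the output layer, generally the softmax function.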
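The recurrent part of S32 can be sketched in NumPy as below. The sketch makes these assumptions:
- It uses the standard Elman recurrence $h^{(m)} = \sigma'(U'T^{(m)} + W'h^{(m-1)} + b')$, which matches the stated roles of U' (input to hidden) and W' (hidden to hidden).
- It takes tanh as σ'. The application does not name the hidden activation.
- It starts from a zero hidden state.
- It uses softmax for σ'', as the text suggests.

Shapes and names are illustrative.

```python
import numpy as np


def softmax(o: np.ndarray) -> np.ndarray:
    e = np.exp(o - o.max())              # shift for numerical stability
    return e / e.sum()


def rnn_forward(T, U, W, b, V, c):
    """Run the recurrence over image features T of shape (M, d).

    U: (H, d) input-to-hidden, W: (H, H) hidden-to-hidden, b: (H,),
    V: (K, H) hidden-to-output, c: (K,). Returns y with shape (M, K).
    """
    h = np.zeros(W.shape[0])               # h^(0) = 0 (assumed)
    outputs = []
    for t_m in T:                          # one step per sequence label m
        h = np.tanh(U @ t_m + W @ h + b)   # h^(m) = sigma'(U'T^(m) + W'h^(m-1) + b')
        o = V @ h + c                      # o^(m) = V'h^(m) + c'
        outputs.append(softmax(o))         # y^(m) = softmax(o^(m))
    return np.stack(outputs)


rng = np.random.default_rng(0)
M, d, H, K = 3, 4, 5, 6
y = rnn_forward(rng.normal(size=(M, d)), rng.normal(size=(H, d)),
                rng.normal(size=(H, H)), np.zeros(H),
                rng.normal(size=(K, H)), np.zeros(K))
```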
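S33: Construct a loss function from the forward output of the recurrent neural network model and the label Chinese characters. The loss function is:
[Formula image PCTCN2018094403-appb-000003: loss function $E_{loss}(\theta)$]
where:
- N is the number of handwriting training samples;
- $E_{loss}(\theta)$ is the average of the total errors of all handwriting images in the N handwriting training samples;
- M is the number of sequence labels carried by the handwriting images in a handwriting training sample;
- the symbol in formula image PCTCN2018094403-appb-000004 denotes the forward output of the handwriting image corresponding to the m-th sequence label in the n-th handwriting training sample;
- the symbol in formula image PCTCN2018094403-appb-000005 denotes the label Chinese character corresponding to the m-th sequence label in the n-th handwriting training sample;
- θ denotes the set of weights and biases.

Specifically, θ is the set of the weights and biases of the convolutional neural network model together with those of the recurrent neural network model.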
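S34: According to the loss function, update the weights and biases in the recurrent neural network model and the convolutional neural network model with a back-propagation algorithm based on batch gradient descent to obtain the handwriting training model.
Here, back-propagation based on batch gradient descent means the following: the error values of all handwriting images in the N handwriting training samples are obtained and averaged, and the back-propagation algorithm then updates the weights and biases in the recurrent and convolutional neural network models.
In this embodiment, the errors are accumulated in three stages:
1. The forward output of one handwriting image in a handwriting training sample is obtained, and its error is computed from that forward output and the corresponding label Chinese character.
2. The errors of all handwriting images in the sample are added up to give the sample error, that is, the sum of the errors of all handwriting images in the sample.
3. The sample errors of all handwriting training samples in the training set are added up to give the total error of the training set, which is averaged to give $E_{loss}(\theta)$.

Updating the weights and biases of the recurrent and convolutional neural network models with back-propagation based on batch gradient descent lets the error of every handwriting image in the training set contribute to updating the convolutional recurrent neural network model (both its recurrent and convolutional parts). This ensures comprehensive training and improves the accuracy of the handwriting training model.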
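After $E_{loss}(\theta)$ is obtained, its partial derivatives are computed and used to update the weights and biases in the recurrent and convolutional neural network models, yielding the handwriting training model. The partial-derivative formula is:
[Formula image PCTCN2018094403-appb-000006: partial derivatives of $E_{loss}(\theta)$ used to update θ]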
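The loss formula itself survives only as an image, so the sketch below illustrates batch gradient descent on a stand-in model: a linear softmax classifier trained with mean cross-entropy. Both the model and the cross-entropy loss are assumptions chosen for brevity. They are not the C-RNN or the loss of S33. What the sketch does show is the BGD pattern described in S34: the gradient is averaged over every sample in the training set before each single parameter update.

```python
import numpy as np


def softmax_rows(Z: np.ndarray) -> np.ndarray:
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)


def mean_cross_entropy(X, Y, W, b) -> float:
    """Average loss over ALL training samples (stand-in for E_loss)."""
    P = softmax_rows(X @ W + b)
    return float(-np.mean(np.sum(Y * np.log(P + 1e-12), axis=1)))


def batch_gradient_descent(X, Y, lr=0.5, epochs=200):
    """Full-batch GD: exactly one parameter update per pass over the whole set."""
    n, d = X.shape
    W = np.zeros((d, Y.shape[1]))
    b = np.zeros(Y.shape[1])
    for _ in range(epochs):
        P = softmax_rows(X @ W + b)
        G = (P - Y) / n              # gradient of the mean loss w.r.t. the logits
        W -= lr * (X.T @ G)          # dE/dW averaged over all samples
        b -= lr * G.sum(axis=0)      # dE/db averaged over all samples
    return W, b


X = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [0.0, 2.0]])
Y = np.eye(2)[[0, 1, 0, 1]]          # one-hot labels
W, b = batch_gradient_descent(X, Y)
```

Mini-batch or stochastic descent would instead update after subsets of the samples. The application's stated reason for full-batch updates is that every image's error takes part in each update.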
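Steps S31-S34 proceed as follows. The convolutional neural network model extracts the image features of the handwriting images in the training set. These features are input into the recurrent neural network model for training to obtain its forward output, and a loss function is constructed from the forward output and the label Chinese characters. Finally, according to the loss function, the weights and biases in the recurrent and convolutional neural network models are updated with back-propagation based on batch gradient descent to obtain the handwriting training model. This ensures comprehensive training and thereby improves the accuracy of the handwriting training model.
In the handwriting model training method provided by this embodiment, the training set is input into the convolutional recurrent neural network model. The convolutional neural network model produces the image features of the handwriting images, and these features are input into the recurrent neural network model to obtain its forward output. A loss function is constructed from that forward output and the label Chinese characters. Using this loss, the weights and biases of the convolutional recurrent neural network model are updated with a back-propagation algorithm based on batch gradient descent to obtain the handwriting training model. This ensures that the model's parameters are fully updated and improves its recognition accuracy.

To further verify the accuracy of the handwriting training model, the test set is input into it for testing. If the model's recognition accuracy is greater than the preset accuracy, it meets the requirement, and the model is determined as the handwriting recognition model used to recognize handwriting images. This handwriting recognition model has high recognition accuracy.
It should be understood that the sequence numbers of the steps in the above embodiments do not imply an order of execution. The execution order of each process should be determined by its function and internal logic and should not limit the implementation of the embodiments of this application in any way.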
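In one embodiment, a handwriting model training apparatus is provided that corresponds one-to-one to the handwriting model training method in the above embodiments. As shown in FIG. 4, the apparatus includes a training sample acquisition module 10, a training sample processing module 20, a training model acquisition module 30, and a recognition model acquisition module 40. The functional modules are described in detail as follows:
The training sample acquisition module 10 is configured to obtain handwriting training samples, which include handwriting images and label Chinese characters associated with the handwriting images.
The training sample processing module 20 is configured to divide the handwriting training samples into a training set and a test set.
The training model acquisition module 30 is configured to input the training set into the convolutional recurrent neural network model to obtain its forward output. According to that forward output, it updates the weights and biases in the model with a back-propagation algorithm based on batch gradient descent to obtain the handwriting training model.
The recognition model acquisition module 40 is configured to input the test set into the handwriting training model to obtain the recognized Chinese character for each handwriting image, and to obtain the recognition accuracy based on the recognized and label Chinese characters. If the recognition accuracy is greater than the preset accuracy, it determines the handwriting training model as the handwriting recognition model.
Specifically, the convolutional recurrent neural network model includes a convolutional neural network model and a recurrent neural network model.
The training model acquisition module 30 includes an image feature acquisition unit 31, a forward output acquisition unit 32, a loss function construction unit 33, and a training model acquisition unit 34.
The image feature acquisition unit 31 is configured to input the training set into the convolutional neural network model to obtain the image features of the handwriting images in the training set.
The forward output acquisition unit 32 is configured to input the image features of the handwriting images in the training set into the recurrent neural network model for training to obtain the forward output of the recurrent neural network model.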
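The loss function construction unit 33 is configured to construct a loss function from the forward output of the recurrent neural network model and the label Chinese characters. The loss function is:
[Formula image PCTCN2018094403-appb-000007: loss function $E_{loss}(\theta)$]
where:
- N is the number of handwriting training samples;
- $E_{loss}(\theta)$ is the average of the total errors of all handwriting images in the N handwriting training samples;
- M is the number of sequence labels carried by the handwriting images in a handwriting training sample;
- the symbol in formula image PCTCN2018094403-appb-000008 denotes the forward output of the handwriting image corresponding to the m-th sequence label in the n-th handwriting training sample;
- the symbol in formula image PCTCN2018094403-appb-000009 denotes the label Chinese character corresponding to the m-th sequence label in the n-th handwriting training sample;
- θ denotes the set of weights and biases.

The training model acquisition unit 34 is configured to update, according to the loss function, the weights and biases in the recurrent and convolutional neural network models with a back-propagation algorithm based on batch gradient descent to obtain the handwriting training model.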
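In one embodiment, as shown in FIG. 5, a Chinese character recognition method is provided, which specifically includes the following steps:
S51: Obtain an original image that includes handwriting and a background image.
The original image is a specific, unprocessed image that is required to contain handwriting. In this embodiment, the original image includes handwriting and a background image, where the background image is the image corresponding to the background pattern on the original image. The original image can be obtained by, among other means, crawling web pages or accessing a database connected to the server. Images in that database may be uploaded in advance by terminal devices.
S52: Preprocess the original image to obtain a valid image.
The valid image is the original image after preprocessing. The server obtains it in two steps:
(1) Determine whether the original image is a color image. If it is, convert it to grayscale to obtain a grayscale image. This replaces the three components R (red), G (green), and B (blue) of each pixel with a single value, which simplifies the subsequent range normalization. If the original image is not a color image, it is already a grayscale image and needs no conversion.
(2) Apply range normalization to the pixel matrix of the grayscale image to obtain the valid image. Range normalization preserves the relative relationships in the pixel matrix while increasing computation speed.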
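S53: Process the valid image with a kernel density estimation algorithm and an erosion method to remove the background image and obtain a target image containing the handwriting.
The target image is an image that contains only the handwriting. Kernel density estimation is a non-parametric method for estimating a probability density function; it studies the distribution of the data starting from the data samples themselves. Its formula is
$$\hat{f}_h(x) = \frac{1}{nh}\sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right)$$
where:
- $\hat{f}_h(x)$ is the estimated probability density of the pixel;
- K(·) is the kernel function;
- h is the pixel range (bandwidth);
- x is the pixel whose probability density is estimated;
- $x_i$ is the i-th pixel within range h;
- n is the number of pixels within range h.

The erosion method processes an image by erosion. Here, erosion removes the background-image portion of the image and keeps only the handwriting.

In this embodiment, the kernel density estimation formula is applied to the frequency distribution histogram of the valid image to obtain a smooth curve, and the pixels at the curve's local minima and maxima are identified. The valid image is layered according to these pixels. After layering, each layered image is eroded to remove the background and keep the handwriting. Finally, the layered and eroded images are superimposed to obtain the target image. Superposition is the process of combining the layered images, each of which retains only handwriting, into one image; this yields the target image.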
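S54: Segment the target image into single characters with the vertical projection method to obtain single-character images.
The vertical projection method projects each line of handwriting vertically to obtain a vertical projection histogram. This histogram reflects the number of pixels of the target image in the vertical direction: its horizontal axis represents the width of the target image, and its vertical axis represents the distribution of pixel counts.

Specifically, each line of handwriting in the target image is scanned and the number of handwriting pixels at each horizontal position is counted; these counts form the vertical projection histogram. The target image is then cut according to this histogram and a preset cutting threshold to obtain single-character images, where a single-character image is the image of one character. The cutting threshold is a preset value used to cut the handwriting in the target image into single characters. Whenever a pixel count on the vertical axis of the histogram is less than or equal to the threshold, the corresponding horizontal position is a separation point between two adjacent handwritten characters, and the target image is cut there.

For example, suppose the preset cutting threshold is 10. Wherever the pixel count in the histogram is at most 10 (for example 0, 9, or 10), the corresponding horizontal position is a separation point between two adjacent characters. The target image is cut at those points to obtain its single-character images.

The pixels of each handwritten character are relatively concentrated, while those in the gaps between characters are sparse. The vertical projection histogram reflects this: positions with characters have high pixel counts, and positions without characters have low counts. The vertical projection method can therefore segment the target image into single characters effectively, which supports the subsequent model recognition.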
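A minimal sketch of the vertical-projection cut in S54, assuming a binarized text line in which handwriting pixels are 1. Columns whose projection count is at or below `cut_threshold` are treated as gaps, and each maximal run of columns above it becomes one single-character image. The function name and the default threshold of 0 are illustrative; the application's example uses a threshold of 10 on its own pixel counts.

```python
import numpy as np


def split_characters(line: np.ndarray, cut_threshold: int = 0) -> list:
    """Cut a binarized text line into single-character images."""
    img = np.asarray(line)
    projection = img.sum(axis=0)            # handwriting pixels per column
    inked = projection > cut_threshold      # counts <= threshold are gaps
    pieces, start = [], None
    for x, ink in enumerate(inked):
        if ink and start is None:
            start = x                       # a character begins
        elif not ink and start is not None:
            pieces.append(img[:, start:x])  # the character ends at a gap column
            start = None
    if start is not None:                   # a character touching the right edge
        pieces.append(img[:, start:])
    return pieces


line = np.zeros((3, 9), dtype=int)
line[:, 0:2] = 1
line[1, 4:7] = 1
line[:, 8] = 1
chars = split_characters(line)
```

Raising the threshold merges faint strokes into gaps. With `cut_threshold=1`, the thin middle stroke above (one pixel per column) would be dropped entirely. This illustrates why the threshold has to suit the stroke density of the images.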
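S55: Input the single-character images into the handwriting recognition model for recognition to obtain the recognition result for each single-character image; the handwriting recognition model is obtained with the above handwriting model training method.
The handwriting recognition model is a pre-trained model for recognizing handwriting. A recognition result is an output whose recognition probability is greater than a preset probability, where the preset probability is a preset value for judging whether a recognition probability meets the requirement.

Specifically, each single-character image is input into the handwriting recognition model to obtain its recognition probabilities, that is, the probabilities that the image is a particular Chinese character. Each recognition probability is compared with the preset probability, and the corresponding recognition result is kept if the recognition probability is greater than the preset probability. This helps improve the accuracy of the recognition results.

For example, suppose the preset probability is 85% and the single-character image of "海" is input into the handwriting recognition model. The recognition results whose probability exceeds the preset probability may be "诲" or "海"; that is, the probabilities of recognizing the image as "诲" and as "海" are both greater than 85%. Two recognition results, "诲" and "海", may therefore be output.
S56: Query the semantic library based on the recognition results to obtain the target Chinese character corresponding to each single-character image.
The semantic library is a preset knowledge base for semantic analysis of recognition results, where semantic analysis examines the contextual properties of the recognition results. The semantic library consists of a large number of Chinese sentences. The target Chinese character is the character for a single-character image that fits the semantics after the semantic library is queried.

Specifically, after the recognition results are obtained, the target Chinese characters are further determined with the semantic library. Take four single-character images "海", "枯", "石", and "烂", whose recognition results are "诲" or "海", "枯", "石", and "烂" or "栏". Some of these images have two or more recognition results, so the semantic library is queried, and the Chinese sentences it contains are used to choose the more accurate result. "海枯石烂" (an idiom meaning "until the seas run dry and the rocks crumble") fits the semantics in the library. The target characters of the four single-character images are therefore determined to be "海", "枯", "石", and "烂". Determining target characters from the semantic library improves the accuracy of single-character image recognition.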
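Steps S55 and S56 can be combined in a small sketch. It keeps every candidate whose score exceeds the preset probability, then picks the first candidate combination that appears in a sentence of the semantic library. Two parts of the sketch are assumptions, not a data structure or values specified in the application:
- The library is modeled as a plain list of sentences searched by substring.
- The candidate scores in the usage example are made up for illustration.

```python
from itertools import product
from typing import Dict, List, Optional, Sequence


def pick_target_characters(candidates: Sequence[Dict[str, float]],
                           library: Sequence[str],
                           preset: float = 0.85) -> Optional[str]:
    """Return the first candidate string that occurs in the sentence library."""
    kept: List[List[str]] = []
    for scores in candidates:                       # one dict per character image
        above = [ch for ch, p in sorted(scores.items(), key=lambda kv: -kv[1])
                 if p > preset]                     # S55: recognition results
        if not above:
            return None                             # no acceptable result here
        kept.append(above)
    for combo in product(*kept):                    # S56: query the library
        text = "".join(combo)
        if any(text in sentence for sentence in library):
            return text
    return None


library = ["海枯石烂,此心不渝"]                       # hypothetical semantic library
scores = [{"诲": 0.90, "海": 0.88}, {"枯": 0.95}, {"石": 0.97}, {"烂": 0.91, "栏": 0.86}]
print(pick_target_characters(scores, library))       # 海枯石烂
```

Enumerating every combination grows multiplicatively with the number of ambiguous positions. That is acceptable for short phrases, but a longer text would need a smarter search.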
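In the Chinese character recognition method provided by this embodiment, the original image is preprocessed to obtain a valid image. The valid image is processed with kernel density estimation and erosion to remove the background image and keep a target image containing only handwriting, which provides the data for subsequent single-character segmentation. The vertical projection method segments the target image into single-character images. These are input into the handwriting recognition model, and recognition results are obtained from their recognition probabilities. Recognizing single-character images with the handwriting recognition model improves recognition accuracy. The semantic library is then queried based on the recognition results, and the target Chinese character for each single-character image is obtained from the Chinese sentences stored in the library. The semantic library filters out the target Chinese character that precisely matches each single-character image, and the combined judgment of the handwriting recognition model and the semantic library improves the precision of handwriting recognition.
In one embodiment, as shown in FIG. 6, step S52 (preprocessing the original image to obtain a valid image) specifically includes the following steps:
S521: Enlarge and grayscale the original image to obtain a grayscale image.
In the original image, the handwriting is small relative to the background image, so it is easily removed by mistake during grayscale conversion. To ensure that the handwriting is not erased during grayscale conversion, each pixel of the original image is enlarged. If the n-th pixel of the original image has value $x_n$, a power amplification is applied to every pixel so that $x_n$ becomes [formula image PCTCN2018094403-appb-000011]. In this embodiment, enlarging the pixels of the original image effectively prevents the handwriting from being removed by mistake during grayscale conversion.
After the original image is enlarged, it is converted to grayscale if it is a color image; if it is already a grayscale image, no conversion is needed. When the original image is a color image, each pixel is processed with the formula Y = 0.299R + 0.587G + 0.114B to obtain a sampled pixel, and the sampled pixels form the grayscale image. Here R (red), G (green), and B (blue) are the three components of the original image. A sampled pixel is the grayscale pixel that replaces the R, G, and B components of the corresponding color pixel.
Converting a color original image to grayscale effectively reduces the amount of data and the computational complexity of the subsequent steps that obtain the valid image.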
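The weighted-sum conversion Y = 0.299R + 0.587G + 0.114B from S521 can be written as a single NumPy matrix product. The sketch makes these assumptions:
- The input is an H×W×3 RGB array; any extra channel, such as alpha, is ignored.
- 2-D input is treated as already grayscale and passed through unchanged.

The power-amplification step is omitted because its formula survives only as an image.

```python
import numpy as np

LUMA_WEIGHTS = np.array([0.299, 0.587, 0.114])   # Y = 0.299R + 0.587G + 0.114B


def to_grayscale(image: np.ndarray) -> np.ndarray:
    """Return a float grayscale image; 2-D input is treated as already gray."""
    img = np.asarray(image, dtype=np.float64)
    if img.ndim == 2:
        return img
    return img[..., :3] @ LUMA_WEIGHTS            # weighted sum over R, G, B


rgb = np.array([[[255, 255, 255], [255, 0, 0]],
                [[0, 255, 0], [0, 0, 255]]], dtype=np.uint8)
gray = to_grayscale(rgb)
```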
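S522: Apply range normalization to the pixel matrix of the grayscale image to obtain the valid image. The range normalization formula is
$$x' = \frac{x - M_{\min}}{M_{\max} - M_{\min}}$$
where:
- x is a pixel of the valid image before normalization, and x′ is that pixel after normalization;
- $M_{\min}$ is the smallest pixel in the pixel matrix M of the grayscale image;
- $M_{\max}$ is the largest pixel in the pixel matrix M of the grayscale image.

Range normalization compresses the data into the range [0, 1]. The normalized pixel matrix of the grayscale image is then multiplied by 255. This makes the data in the pixel matrix easy to process while preserving the relationships among its pixels. In the grayscale image, the background image and each handwritten character have their own pixel matrices. Once these are obtained, they are range-normalized, and the normalized pixel matrices form the valid image. Range normalization of the pixel matrices speeds up the acquisition of the target image.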
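Range normalization followed by scaling by 255, as described in S522, can be sketched as below. The handling of a constant image is an assumption: there $M_{\max} = M_{\min}$ and the formula would divide by zero, and the application does not address that case.

```python
import numpy as np


def range_normalize(gray: np.ndarray, scale: float = 255.0) -> np.ndarray:
    """x' = (x - M_min) / (M_max - M_min), then multiplied by `scale`."""
    m = np.asarray(gray, dtype=np.float64)
    m_min, m_max = m.min(), m.max()
    if m_max == m_min:                 # constant image: avoid division by zero
        return np.zeros_like(m)
    return (m - m_min) / (m_max - m_min) * scale


valid = range_normalize(np.array([[50, 100], [150, 250]]))
```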
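Steps S521-S522 have three effects:
- Enlarging the original image effectively prevents the handwriting from being removed by mistake when the image is converted to grayscale in the next step.
- Grayscale conversion reduces the amount of data processed in later steps.
- Range normalization of the grayscale image speeds up the acquisition of the target image.
In one embodiment, as shown in FIG. 7, step S53 specifically includes the following steps. (Step S53 processes the valid image with a kernel density estimation algorithm and an erosion method to remove the background image and obtain the target image containing the handwriting.)
S531: Count the occurrences of each pixel value in the valid image to obtain the frequency distribution histogram of the valid image.
In a frequency distribution histogram, the horizontal axis represents continuous values of the sample data. Each small interval on the horizontal axis is the width of one group and forms the base of a small rectangle. The vertical axis represents the ratio of frequency to group width, and this ratio is the height of the rectangle. The set of rectangles is called a frequency histogram.

Specifically, for the valid image, the horizontal axis represents pixel values as continuous values in [0, 255], with a group width of 1 per rectangle. The vertical axis represents the ratio of the frequency of the corresponding pixel value to the group width, which is the height of that rectangle. The histogram shows how often each pixel value occurs in the valid image, so the data distribution is visible at a glance.
S532: Process the frequency distribution histogram with Gaussian kernel density estimation to obtain its frequency maxima and frequency minima, and obtain the pixels corresponding to them.
Gaussian kernel density estimation is kernel density estimation whose kernel function is the Gaussian kernel:
$$K(x) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{x^2}{2}}$$
Here K(x) is the Gaussian kernel function of the pixel x (the independent variable), x is the pixel, and e and π are constants. A frequency maximum is a frequency value in the frequency distribution histogram that is a local maximum; a frequency minimum is one that is a local minimum.

Specifically, Gaussian kernel density estimation smooths the frequency distribution histogram of the valid image and yields the corresponding Gaussian smoothing curve. The frequency maxima and minima on this curve give the corresponding pixels on the horizontal axis. In this embodiment, obtaining these pixels makes it easy to separate the valid image into layers in the next step.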
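Steps S531 and S532 can be sketched together in three stages:
1. Build the 256-bin histogram of the valid image.
2. Smooth it with the Gaussian-kernel estimate $\hat{f}_h(x) = \frac{1}{nh}\sum_i K\left(\frac{x - x_i}{h}\right)$, evaluated on the integer grid 0 to 255.
3. Read off the local maxima and minima of the smoothed curve.

The bandwidth of 5 is an arbitrary assumption, since the application does not give h. Plateau handling in `local_extrema` is deliberately simple.

```python
import numpy as np


def gaussian_kernel(u: np.ndarray) -> np.ndarray:
    """K(u) = exp(-u^2 / 2) / sqrt(2*pi)."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)


def smoothed_histogram(image: np.ndarray, bandwidth: float = 5.0) -> np.ndarray:
    """Gaussian KDE of the pixel values, evaluated at x = 0..255."""
    values = np.clip(np.rint(image).astype(int).ravel(), 0, 255)
    freq = np.bincount(values, minlength=256) / values.size  # histogram, bin width 1
    grid = np.arange(256)
    u = (grid[:, None] - grid[None, :]) / bandwidth          # (x - x_i) / h
    # Grouping equal pixels: (1/(n h)) sum_i K(.) == (1/h) sum_v freq[v] K(.)
    return gaussian_kernel(u) @ freq / bandwidth


def local_extrema(curve: np.ndarray):
    """Indices of the local maxima and minima of a 1-D curve."""
    d = np.diff(curve)
    maxima = [i for i in range(1, len(curve) - 1) if d[i - 1] > 0 and d[i] <= 0]
    minima = [i for i in range(1, len(curve) - 1) if d[i - 1] < 0 and d[i] >= 0]
    return maxima, minima


pixels = np.array([50] * 1000 + [200] * 1000)   # a toy two-mode valid image
peaks, valleys = local_extrema(smoothed_histogram(pixels))
```

A larger bandwidth merges nearby modes and so produces fewer layers. A smaller one keeps more peaks, including spurious ones from noise.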
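S533: Layer the valid image based on the pixels corresponding to the frequency maxima and minima to obtain layered images.
A layered image is an image obtained by layering the valid image based on the frequency maxima and minima. After the pixels corresponding to the frequency maxima and minima are obtained, the valid image is layered according to the pixels of the frequency maxima. The pixels of the valid image are clustered into as many classes as there are frequency maxima, and the valid image is divided into the same number of layers. The pixels of the frequency minima then serve as the boundary values between classes, and these boundaries determine the pixel range of each layered image.
For example, suppose the pixels corresponding to the frequency maxima of the valid image are 12, 54, 97, 113, 159, and 172, and those corresponding to the frequency minima are 26, 69, 104, 139, and 163. There are six frequency maxima, so the pixels of the valid image can be divided into 6 classes and the image into 6 layers, with the minima as the boundary values between classes. The smallest pixel is 0 and the largest is 255, so the boundary values give the following layers:

| Layer centered on pixel | Pixel range |
|---|---|
| 12 | [0, 26) |
| 54 | [26, 69) |
| 97 | [69, 104) |
| 113 | [104, 139) |
| 159 | [139, 163) |
| 172 | [163, 255] |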
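The layering rule in S533 maps directly onto `numpy.digitize`. In that rule, the frequency-minimum pixels act as half-open class boundaries and the last layer is closed at 255. The sketch below returns one boolean mask per layer; the function name is illustrative.

```python
import numpy as np


def split_into_layers(image: np.ndarray, minima_pixels) -> list:
    """One boolean mask per layer: [0, b1), [b1, b2), ..., [b_last, 255]."""
    boundaries = np.sort(np.asarray(minima_pixels, dtype=float))
    layer_index = np.digitize(image, boundaries)   # 0 .. len(boundaries)
    return [layer_index == k for k in range(len(boundaries) + 1)]


minima = [26, 69, 104, 139, 163]                   # values from the example above
img = np.array([0, 25, 26, 68, 69, 103, 104, 138, 139, 162, 163, 255])
layers = split_into_layers(img, minima)            # 6 masks, one per layer
```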
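S534: Erode the layered images and superimpose the eroded layered images to obtain the target image.
After the layered images are obtained, they are binarized. Binarization sets each pixel of an image to 0 (black) or 1 (white), so that the whole image shows a clear black-and-white effect. The binarized layered images are then eroded to remove the background and keep the handwriting. Erosion is a morphological operation that removes part of an image's content. The pixels of each layered image belong to a different range, so after erosion the layered images must also be superimposed to produce a target image containing only the handwriting.
Steps S531-S534 proceed as follows. The frequency distribution histogram of the valid image is obtained. The histogram gives the pixels corresponding to the frequency maxima and minima, and these pixels are used to obtain the layered images. Finally, the layered images are binarized, eroded, and superimposed. This separates the handwriting in the original image from the background image, removes the background, and yields the target image of the handwriting.
In one embodiment, as shown in FIG. 8, eroding the layered images in step S534 specifically includes the following steps:
S5341: Binarize each layered image to obtain a layered binarized image.
A layered binarized image is the image obtained by binarizing a layered image. Specifically, after a layered image is obtained, its sampled pixels are compared with a preselected threshold. Pixels greater than or equal to the threshold are set to 1, and pixels below it are set to 0. In this embodiment, 0 represents background pixels and 1 represents target (handwriting) pixels.

The threshold can be obtained by computing the between-class variance of the layered image, or it can be set from experience. The choice of threshold affects the binarization result: a suitable threshold gives good binarization, and an unsuitable one degrades it. For ease of operation and to simplify computation, the threshold in this embodiment is set from experience.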
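A sketch of S5341, assuming that pixels at or above the threshold become 1 (handwriting) and the rest become 0. Besides a fixed empirical threshold, which the embodiment uses, the sketch includes Otsu's method. Otsu's method is one standard way to pick the threshold by maximizing the between-class variance that the text mentions. The application does not name Otsu, so that choice is an assumption.

```python
import numpy as np


def binarize(layer: np.ndarray, threshold: float) -> np.ndarray:
    """1 = handwriting (pixel >= threshold), 0 = background."""
    return (np.asarray(layer) >= threshold).astype(np.uint8)


def otsu_threshold(gray: np.ndarray) -> int:
    """Threshold that maximizes the between-class variance of an 8-bit image."""
    values = np.clip(np.rint(gray).astype(int).ravel(), 0, 255)
    prob = np.bincount(values, minlength=256) / values.size
    omega = np.cumsum(prob)                    # weight of class {pixel <= t}
    mu = np.cumsum(prob * np.arange(256))      # first moment of that class
    mu_total = mu[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        between = (mu_total * omega - mu) ** 2 / (omega * (1.0 - omega))
    between = np.nan_to_num(between, nan=0.0, posinf=0.0, neginf=0.0)
    return int(np.argmax(between)) + 1         # pixels >= t + 1 form class 1


layer = np.array([[30, 30, 200], [200, 30, 200]])
mask = binarize(layer, otsu_threshold(layer))
```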
S5342: Detect and label the pixels in the layered binarized image to obtain the connected regions of the layered binarized image.
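A connected region is the region enclosed by the pixels adjacent to a particular pixel. For example, if a particular pixel is 0 and its adjacent pixels are 1, the region enclosed by those adjacent pixels is taken as a connected region.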
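After the layered binarized image of each layered image is obtained, its pixel matrix is scanned row by row. Pixels that satisfy the connectivity rule (4-neighborhood or 8-neighborhood connectivity) are marked with the same label. 4-neighborhood connectivity means that a particular pixel has the same value as its neighbors in the four directions up, down, left and right. 8-neighborhood connectivity means that a particular pixel has the same value as its neighbors in the eight directions up, down, left, right, upper-left, lower-left, upper-right and lower-right.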
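Specifically, the pixel matrix consists of rows and columns. The pixels of the binarized image are detected and labeled as follows.

(1) Scan the pixel matrix row by row. Group each sequence of consecutive 1-valued pixels (target pixels) in a row into a run, and record the run's start, end and row number. The start of a run is its first pixel, and the end is its last pixel.

(2) For every row except the first, compare each run in that row with all runs in the previous row to check for overlap.
- If the run overlaps none of them, assign it a new label.
- If it overlaps exactly one run in the previous row, assign it that run's label.
- If it overlaps two or more runs in the previous row, assign it the smallest label among those associated runs. Also write the labels of those previous-row runs into an equivalence pair, indicating that they belong to the same class.

Here, associated runs are the runs in the previous row that overlap the run in the current row, and an equivalence pair is the set of labels of mutually connected runs.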
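For example, suppose the current row is the third row of a pixel matrix and contains two runs, A and B. Run A overlaps two runs in the second row, labeled 1 and 2. The smaller of those two labels, 1, is assigned to run A, and labels 1 and 2 are recorded as the equivalence pair (1, 2). The runs labeled 1 and 2 then belong to one connected region.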
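The run-based labeling in steps (1) and (2) can be sketched as below. Equivalence pairs are stored in a small union-find structure and resolved in a second pass. This is a common way to implement such a two-pass scheme, but it is an assumption here. The `connectivity` switch is also an assumption: with 8-connectivity, diagonally touching runs also count as overlapping. All names are hypothetical.

```python
import numpy as np


def find_runs(row):
    """(start, end) column pairs of consecutive 1s in one row, end inclusive."""
    runs, start = [], None
    for col, value in enumerate(row):
        if value and start is None:
            start = col
        elif not value and start is not None:
            runs.append((start, col - 1))
            start = None
    if start is not None:
        runs.append((start, len(row) - 1))
    return runs


def label_runs(binary, connectivity=8):
    """Two-pass run labeling; returns an int label image (0 = background)."""
    parent = {}

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    def union(a, b):  # record an equivalence pair
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    slack = 1 if connectivity == 8 else 0
    binary = np.asarray(binary)
    labels = np.zeros(binary.shape, dtype=int)
    prev, next_label = [], 1
    for r, row in enumerate(binary):
        current = []
        for s, e in find_runs(row):
            hits = [lab for ps, pe, lab in prev if ps <= e + slack and pe >= s - slack]
            if not hits:  # no overlap with the previous row: new label
                lab = next_label
                parent[lab] = lab
                next_label += 1
            else:  # smallest associated label wins
                lab = min(hits)
                for other in hits:
                    union(lab, other)
            current.append((s, e, lab))
            labels[r, s:e + 1] = lab
        prev = current
    # Second pass: replace each provisional label by its class representative.
    resolved = labels.copy()
    for lab in parent:
        resolved[labels == lab] = find(lab)
    return resolved


u_shape = np.array([[1, 0, 1],
                    [1, 0, 1],
                    [1, 1, 1]])
print(np.unique(label_runs(u_shape)))  # [0 1]
diagonal = np.array([[1, 0], [0, 1]])
print(len(np.unique(label_runs(diagonal, connectivity=4))) - 1)  # 2
```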
S5343: Erode the connected regions of the layered binarized image.
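The connected regions of the layered binarized image are eroded using the imerode function in MATLAB or the cvErode function in OpenCV. Specifically, a structuring element is selected. In this embodiment, the 8 pixels adjacent to a given pixel in the pixel matrix form that pixel's connected region, so the structuring element is a 3×3 pixel matrix. The structuring element is scanned over the pixel matrix of the layered binarized image, and at each position the pixels under it are compared with the structuring element. If they match completely, the 9 corresponding pixels of the pixel matrix all become 1. Otherwise, the 9 corresponding pixels all become 0. The pixels set to 0 (black) form the eroded part of the layered binarized image.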
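Below is a NumPy sketch of binary erosion with the 3×3 structuring element. It follows the standard definition used by MATLAB `imerode` and OpenCV `cv2.erode`, which differs from the wording above. Only the pixel under the center of the structuring element is updated, not all 9 pixels. That pixel is set to 1 if every pixel under the element is 1, and to 0 otherwise. Padding the border with 1s is intended to mirror those libraries' default border handling for erosion. The function name is hypothetical.

```python
import numpy as np


def erode(binary, structure=None):
    """Binary erosion: a pixel stays 1 only if the structuring element,
    centred on it, fits entirely inside the foreground."""
    img = np.asarray(binary, dtype=bool)
    if structure is None:
        structure = np.ones((3, 3), dtype=bool)  # 3x3, i.e. the 8-neighbourhood
    kh, kw = structure.shape
    ph, pw = kh // 2, kw // 2
    # Pad with True so the image border itself does not erode the foreground.
    padded = np.pad(img, ((ph, ph), (pw, pw)), constant_values=True)
    out = np.ones_like(img)
    h, w = img.shape
    for dy in range(kh):
        for dx in range(kw):
            if structure[dy, dx]:
                out &= padded[dy:dy + h, dx:dx + w]
    return out.astype(np.uint8)


square = np.zeros((7, 7), dtype=np.uint8)
square[1:6, 1:6] = 1             # 5x5 block of foreground
print(int(erode(square).sum()))  # 9: the block shrinks to 3x3
```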
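The layered binarized images are then filtered against a preset erosion-resistance range for handwriting regions. Parts of a layered binarized image that fall outside the range are deleted, and the parts within the range are retained. Superimposing the pixel matrices of the retained parts of every layered binarized image yields a target image that contains only the handwriting. The erosion resistance of a handwriting region can be computed with the formula
$$\frac{s_1}{s_2}$$
where s_1 is the total area of the layered binarized image after erosion and s_2 is its total area before erosion.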
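For example, suppose the preset erosion-resistance range for handwriting regions is [0.05, 0.8]. The formula
$$\frac{s_1}{s_2}$$
is used to compute, for each layered binarized image, the ratio of its total area after erosion to its total area before erosion. If the ratio for a region of a layered binarized image lies outside the preset range, that region is not handwriting and is removed. If the ratio lies within [0.05, 0.8], that region is handwriting and is retained. Superimposing the pixel matrices of the layered binarized images then yields the target image containing the handwriting.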
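The erosion-resistance filter can be sketched with the example range [0.05, 0.8]. The ratio s1/s2 is computed per layer from the foreground pixel counts before and after erosion. Layers inside the range are kept, and the kept layers are superimposed with a logical OR. Several details are assumptions: filtering whole layers rather than individual regions, overlaying the un-eroded layers, and treating an empty layer as ratio 0. The eroded layers are passed in precomputed, so the sketch does not depend on a particular erosion routine. The function names are hypothetical.

```python
import numpy as np


def erosion_ratio(before, after):
    """s1 / s2: foreground area after erosion over foreground area before erosion."""
    s2 = int(np.count_nonzero(before))
    s1 = int(np.count_nonzero(after))
    return s1 / s2 if s2 else 0.0


def overlay_handwriting(layer_pairs, lo=0.05, hi=0.8):
    """OR together the (un-eroded) layers whose erosion ratio lies within [lo, hi]."""
    target = np.zeros_like(np.asarray(layer_pairs[0][0]), dtype=bool)
    for before, after in layer_pairs:
        if lo <= erosion_ratio(before, after) <= hi:
            target |= np.asarray(before, dtype=bool)
    return target.astype(np.uint8)


strokes = np.zeros((4, 5), dtype=np.uint8)
strokes[:2, :] = 1                 # 10 foreground pixels
strokes_eroded = np.zeros_like(strokes)
strokes_eroded[0, :2] = 1          # pretend 2 px survive erosion: ratio 0.2, kept
blob = np.zeros((4, 5), dtype=np.uint8)
blob[2:, :] = 1                    # pretend all 10 px survive erosion: ratio 1.0, dropped
result = overlay_handwriting([(strokes, strokes_eroded), (blob, blob.copy())])
print(erosion_ratio(strokes, strokes_eroded), bool((result == strokes).all()))  # 0.2 True
```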
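In steps S5341 to S5343, the layered images are first binarized to obtain layered binarized images. The pixels in each layered binarized image are then detected and labeled to obtain the connected region of each pixel in its pixel matrix. The structuring element is used to examine each pixel's connected region, and pixels of the pixel matrix that do not completely match the structuring element are set to 0. Pixels with value 0 appear black, and this black part is the eroded part of the layered binarized image. Finally, the ratio of each layered binarized image's total area after erosion to its total area before erosion is computed and checked against the preset erosion-resistance range for handwriting regions. This removes the background, retains the handwriting, and yields the target image.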
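This Chinese character recognition method enlarges and grayscales the original image to obtain a grayscale image, then applies range normalization to the grayscale image to obtain the effective image. The subsequent steps use the Gaussian kernel density estimation algorithm to layer the effective image and then binarize, erode and superimpose the layers. This removes the background and leaves a target image that contains only the handwriting. The target image is segmented into single-character images by the vertical projection method. Each single-character image is input into the handwriting recognition model, and a recognition result is obtained from the recognition probability values of that image. The recognition result is used to query a semantic library, and the target Chinese character for each single-character image is obtained from the Chinese sentences stored there. Screening the results with both the handwriting recognition model and the semantic library improves the accuracy of handwriting recognition.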
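This section does not detail the vertical projection segmentation it mentions. A common minimal form is sketched here as an assumption: sum the foreground pixels in each column of the binary target image and cut wherever a column is empty. Real Chinese handwriting often needs an extra merging step, because characters with left and right components can contain blank columns inside a single character; that refinement is omitted. The function name is hypothetical.

```python
import numpy as np


def vertical_projection_segments(binary):
    """Half-open column ranges [start, end) covering runs of non-empty columns."""
    profile = np.asarray(binary).sum(axis=0)  # foreground pixels per column
    segments, start = [], None
    for col, count in enumerate(profile):
        if count > 0 and start is None:
            start = col
        elif count == 0 and start is not None:
            segments.append((start, col))
            start = None
    if start is not None:
        segments.append((start, len(profile)))
    return segments


page = np.zeros((3, 8), dtype=np.uint8)
page[:, 1:3] = 1  # first "character"
page[1, 5:7] = 1  # second "character"
segments = vertical_projection_segments(page)
print(segments)  # [(1, 3), (5, 7)]
crops = [page[:, s:e] for s, e in segments]  # single-character images
```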
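It should be understood that the sequence numbers of the steps in the above embodiments do not imply an order of execution. The execution order of each process should be determined by its function and internal logic, and should not limit the implementation of the embodiments of the present application in any way.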
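In an embodiment, a Chinese character recognition apparatus is provided that corresponds one-to-one to the Chinese character recognition method of the above embodiments. As shown in FIG. 9, the apparatus includes an original image acquisition module 51, an effective image acquisition module 52, a target image acquisition module 53, a single-character image acquisition module 54, a recognition result acquisition module 55 and a target Chinese character confirmation module 56. The functional modules are described in detail as follows:
The original image acquisition module 51 is configured to acquire an original image that includes handwriting and a background image.
The effective image acquisition module 52 is configured to preprocess the original image to obtain an effective image.
The target image acquisition module 53 is configured to process the effective image using a kernel density estimation algorithm and an erosion method, remove the background image, and obtain a target image that includes the handwriting.
The single-character image acquisition module 54 is configured to segment the target image into single characters using a vertical projection method to obtain single-character images.
The recognition result acquisition module 55 is configured to input the single-character images into a handwriting recognition model for recognition and obtain the recognition result of each single-character image. The handwriting recognition model is obtained by the handwriting model training method described above.
The target Chinese character confirmation module 56 is configured to query a semantic library based on the recognition results and obtain the target Chinese character for each single-character image.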
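Specifically, the effective image acquisition module 52 includes a grayscale image acquisition unit 521 and a range normalization unit 522.
The grayscale image acquisition unit 521 is configured to enlarge and grayscale the original image to obtain a grayscale image.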
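The range normalization unit 522 is configured to apply range normalization to the pixel matrix of the grayscale image to obtain the effective image. The range normalization formula is
$$x' = \frac{x - M_{\min}}{M_{\max} - M_{\min}}$$
where x is a pixel of the effective image before normalization, x' is the corresponding pixel after normalization, M_min is the smallest pixel in the pixel matrix M of the grayscale image, and M_max is the largest pixel in M.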
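The range (min-max) normalization formula can be checked with a small sketch. Two details are assumptions not stated in the text: the guard for a constant image and the optional `scale` factor for mapping back to 0..255, which the later 8-bit histogram step would need. The function name is hypothetical.

```python
import numpy as np


def range_normalize(m, scale=1.0):
    """x' = (x - M_min) / (M_max - M_min), optionally multiplied by `scale`."""
    m = np.asarray(m, dtype=float)
    m_min, m_max = m.min(), m.max()
    if m_max == m_min:  # constant image: avoid division by zero
        return np.zeros_like(m)
    return (m - m_min) / (m_max - m_min) * scale


print(range_normalize([[50, 100], [150, 250]]).tolist())
# [[0.0, 0.25], [0.5, 1.0]]
```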
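Specifically, the target image acquisition module 53 includes a first processing unit 531, a second processing unit 532, a layered image acquisition unit 533 and a layered image processing unit 534.
The first processing unit 531 is configured to count the occurrences of each pixel value in the effective image to obtain the frequency distribution histogram of the effective image.
The second processing unit 532 is configured to process the frequency distribution histogram with a Gaussian kernel density estimation method to obtain its frequency maxima and frequency minima, and to obtain the pixel values corresponding to those maxima and minima.
The layered image acquisition unit 533 is configured to layer the effective image based on the pixel values corresponding to the frequency maxima and minima to obtain layered images.
The layered image processing unit 534 is configured to erode the layered images and superimpose the eroded layered images to obtain the target image.
Specifically, the layered image processing unit 534 includes a binarization unit 5341, a connected region acquisition unit 5342 and a connected region processing unit 5343.
The binarization unit 5341 is configured to binarize the layered images to obtain layered binarized images.
The connected region acquisition unit 5342 is configured to detect and label the pixels in the layered binarized images to obtain their connected regions.
The connected region processing unit 5343 is configured to erode the connected regions of the layered binarized images.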
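In an embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in FIG. 10. It includes a processor, a memory, a network interface and a database connected via a system bus. The processor provides computing and control capabilities. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer-readable instructions and a database, and the internal memory provides an environment in which the operating system and computer-readable instructions stored in the non-volatile storage medium run. The database stores the handwriting recognition model. The network interface communicates with external terminals over a network connection. When executed by the processor, the computer-readable instructions implement a handwriting model training method.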
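In an embodiment, a computer device is provided. It includes a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor. When executing the computer-readable instructions, the processor implements the following steps:
- acquiring handwriting training samples, which include handwriting images and the label Chinese characters associated with them;
- dividing the handwriting training samples into a training set and a test set;
- inputting the training set into a convolutional recurrent neural network model to obtain its forward output, and, based on that forward output, updating the model's weights and biases with a backpropagation algorithm based on batch gradient descent to obtain a handwriting training model; and
- inputting the test set into the handwriting training model to obtain the recognized Chinese character for each handwriting image, computing a recognition accuracy from the recognized and label Chinese characters, and, if the recognition accuracy is greater than a preset accuracy, determining the handwriting training model to be the handwriting recognition model.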
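The dataset split and the acceptance test in these steps can be sketched independently of the network itself. The 80/20 split ratio, the fixed shuffle seed and the function names are assumptions. The acceptance rule follows the text and requires the accuracy to be strictly greater than the preset value.

```python
import random


def split_samples(samples, train_ratio=0.8, seed=0):
    """Shuffle (image, label) samples and split them into training and test sets."""
    items = list(samples)
    random.Random(seed).shuffle(items)
    cut = int(len(items) * train_ratio)
    return items[:cut], items[cut:]


def accept_model(recognized, labels, preset_accuracy):
    """Return (accepted, accuracy); accepted only if accuracy > preset_accuracy."""
    correct = sum(r == t for r, t in zip(recognized, labels))
    accuracy = correct / len(labels)
    return accuracy > preset_accuracy, accuracy


train, test = split_samples(range(10))
print(len(train), len(test))  # 8 2
print(accept_model(["一", "二", "三", "四"], ["一", "二", "王", "四"], 0.7))  # (True, 0.75)
```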
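In an embodiment, when executing the computer-readable instructions, the processor further implements the following steps. The convolutional recurrent neural network model includes a convolutional neural network model and a recurrent neural network model. The training set is input into the convolutional neural network model to obtain the image features of the handwriting images in the training set. Those image features are input into the recurrent neural network model for training to obtain its forward output. A loss function is then constructed from the forward output of the recurrent neural network model and the label Chinese characters. The specific expression of the loss function is:
[Equation image PCTCN2018094403-appb-000017: loss function E_loss(θ), not reproduced in the source text]
Here, N denotes the number of handwriting images in the handwriting training samples, and E_loss(θ) denotes the average of the total errors of all handwriting images in the N handwriting training samples. M denotes the number of sequence labels carried by a handwriting image in the handwriting training samples. The symbol shown in image PCTCN2018094403-appb-000018 denotes the forward output of the handwriting image for the m-th sequence label in the n-th handwriting training sample, and the symbol shown in image PCTCN2018094403-appb-000019 denotes the label Chinese character for that sequence label. θ denotes the set of weights and biases. According to the loss function, the weights and biases of the recurrent neural network model and the convolutional neural network model are updated and adjusted with a backpropagation algorithm based on batch gradient descent to obtain the handwriting training model.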
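The exact loss expression is only available as an image in the source. The sketch below shows one common form consistent with the description, labeled as an assumption: the per-sample error is summed over the M sequence labels, using cross-entropy as the per-label error, and then averaged over the N samples. It should not be read as the patent's formula. The function name and the (N, M, C) probability layout are hypothetical.

```python
import numpy as np


def sequence_loss(probs, labels, eps=1e-12):
    """Average over N samples of the summed per-label cross-entropy (assumed form).

    probs:  (N, M, C) softmax outputs, one distribution per sequence label
    labels: (N, M) integer class indices of the label Chinese characters
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    n, m = labels.shape
    # probability assigned to the correct class at every (sample, label) position
    picked = probs[np.arange(n)[:, None], np.arange(m)[None, :], labels]
    return float(-np.log(np.clip(picked, eps, 1.0)).sum() / n)


uniform = np.full((2, 3, 4), 0.25)  # N=2 samples, M=3 labels, C=4 classes
print(sequence_loss(uniform, np.zeros((2, 3), dtype=int)))  # equals 3 * ln 4
```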
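In an embodiment, one or more non-volatile readable storage media storing computer-readable instructions are provided. When executed by one or more processors, the computer-readable instructions cause the one or more processors to implement the following steps:
- acquiring handwriting training samples, which include handwriting images and the label Chinese characters associated with them;
- dividing the handwriting training samples into a training set and a test set;
- inputting the training set into a convolutional recurrent neural network model to obtain its forward output, and, based on that forward output, updating the model's weights and biases with a backpropagation algorithm based on batch gradient descent to obtain a handwriting training model; and
- inputting the test set into the handwriting training model to obtain the recognized Chinese character for each handwriting image, computing a recognition accuracy from the recognized and label Chinese characters, and, if the recognition accuracy is greater than a preset accuracy, determining the handwriting training model to be the handwriting recognition model.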
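In an embodiment, when executed by the processor, the computer-readable instructions further implement the following steps. The convolutional recurrent neural network model includes a convolutional neural network model and a recurrent neural network model. The training set is input into the convolutional neural network model to obtain the image features of the handwriting images in the training set. Those image features are input into the recurrent neural network model for training to obtain its forward output. A loss function is then constructed from the forward output of the recurrent neural network model and the label Chinese characters. The specific expression of the loss function is:
[Equation image PCTCN2018094403-appb-000020: loss function E_loss(θ), not reproduced in the source text]
Here, N denotes the number of handwriting images in the handwriting training samples, and E_loss(θ) denotes the average of the total errors of all handwriting images in the N handwriting training samples. M denotes the number of sequence labels carried by a handwriting image in the handwriting training samples. The symbol shown in image PCTCN2018094403-appb-000021 denotes the forward output of the handwriting image for the m-th sequence label in the n-th handwriting training sample, and the symbol shown in image PCTCN2018094403-appb-000022 denotes the label Chinese character for that sequence label. θ denotes the set of weights and biases. According to the loss function, the weights and biases of the recurrent neural network model and the convolutional neural network model are updated and adjusted with a backpropagation algorithm based on batch gradient descent to obtain the handwriting training model.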
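In an embodiment, a computer device is provided. It includes a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor. When executing the computer-readable instructions, the processor implements the following steps:
- acquiring an original image that includes handwriting and a background image;
- preprocessing the original image to obtain an effective image;
- processing the effective image using a kernel density estimation algorithm and an erosion method to remove the background image and obtain a target image that includes the handwriting;
- segmenting the target image into single characters using a vertical projection method to obtain single-character images;
- inputting the single-character images into a handwriting recognition model for recognition to obtain the recognition result of each single-character image, the handwriting recognition model being obtained by the handwriting model training method described above; and
- querying a semantic library based on the recognition results to obtain the target Chinese character for each single-character image.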
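In an embodiment, when executing the computer-readable instructions, the processor further implements the following steps: enlarging and grayscaling the original image to obtain a grayscale image; and applying range normalization to the pixel matrix of the grayscale image to obtain the effective image. The range normalization formula is
$$x' = \frac{x - M_{\min}}{M_{\max} - M_{\min}}$$
where x is a pixel of the effective image before normalization, x' is the corresponding pixel after normalization, M_min is the smallest pixel in the pixel matrix M of the grayscale image, and M_max is the largest pixel in M.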
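In an embodiment, when executing the computer-readable instructions, the processor further implements the following steps:
- counting the occurrences of each pixel value in the effective image to obtain its frequency distribution histogram;
- processing the frequency distribution histogram with a Gaussian kernel density estimation method to obtain its frequency maxima and frequency minima, and obtaining the pixel values corresponding to those maxima and minima;
- layering the effective image based on those pixel values to obtain layered images; and
- eroding the layered images and superimposing the eroded layered images to obtain the target image.
In an embodiment, when executing the computer-readable instructions, the processor further implements the following steps: binarizing the layered images to obtain layered binarized images; detecting and labeling the pixels in the layered binarized images to obtain their connected regions; and eroding those connected regions.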
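In an embodiment, one or more non-volatile readable storage media storing computer-readable instructions are provided. When executed by one or more processors, the computer-readable instructions cause the one or more processors to implement the following steps:
- acquiring an original image that includes handwriting and a background image;
- preprocessing the original image to obtain an effective image;
- processing the effective image using a kernel density estimation algorithm and an erosion method to remove the background image and obtain a target image that includes the handwriting;
- segmenting the target image into single characters using a vertical projection method to obtain single-character images;
- inputting the single-character images into a handwriting recognition model for recognition to obtain the recognition result of each single-character image, the handwriting recognition model being obtained by the handwriting model training method described above; and
- querying a semantic library based on the recognition results to obtain the target Chinese character for each single-character image.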
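In an embodiment, when executed by the processor, the computer-readable instructions further implement the following steps: enlarging and grayscaling the original image to obtain a grayscale image; and applying range normalization to the pixel matrix of the grayscale image to obtain the effective image. The range normalization formula is
$$x' = \frac{x - M_{\min}}{M_{\max} - M_{\min}}$$
where x is a pixel of the effective image before normalization, x' is the corresponding pixel after normalization, M_min is the smallest pixel in the pixel matrix M of the grayscale image, and M_max is the largest pixel in M.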
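In an embodiment, when executed by the processor, the computer-readable instructions further implement the following steps:
- counting the occurrences of each pixel value in the effective image to obtain its frequency distribution histogram;
- processing the frequency distribution histogram with a Gaussian kernel density estimation method to obtain its frequency maxima and frequency minima, and obtaining the pixel values corresponding to those maxima and minima;
- layering the effective image based on those pixel values to obtain layered images; and
- eroding the layered images and superimposing the eroded layered images to obtain the target image.
In an embodiment, when executed by the processor, the computer-readable instructions further implement the following steps: binarizing the layered images to obtain layered binarized images; detecting and labeling the pixels in the layered binarized images to obtain their connected regions; and eroding those connected regions.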
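A person of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be completed by computer-readable instructions directing the relevant hardware. The computer-readable instructions can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the above method embodiments. Any reference to memory, storage, a database or other media in the embodiments provided in the present application can include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM) or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
Those skilled in the art will clearly understand that the above division into functional units and modules is used only as an example, for convenience and brevity of description. In practical applications, these functions can be assigned to different functional units or modules as needed. That is, the internal structure of the apparatus can be divided into different functional units or modules to complete all or part of the functions described above.
The above embodiments are intended only to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand the following. They can still modify the technical solutions recorded in the foregoing embodiments or make equivalent substitutions for some of their technical features. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and shall all fall within the protection scope of the present application.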

Claims (20)

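  1. A handwriting model training method, characterized by comprising:
    acquiring handwriting training samples, the handwriting training samples comprising handwriting images and label Chinese characters associated with the handwriting images;
    dividing the handwriting training samples into a training set and a test set;
    inputting the training set into a convolutional recurrent neural network model to obtain a forward output of the convolutional recurrent neural network model, and, according to the forward output of the convolutional recurrent neural network model, updating weights and biases in the convolutional recurrent neural network model using a backpropagation algorithm based on batch gradient descent to obtain a handwriting training model;
    inputting the test set into the handwriting training model to obtain a recognized Chinese character corresponding to each handwriting image, obtaining a recognition accuracy based on the recognized Chinese characters and the label Chinese characters, and, if the recognition accuracy is greater than a preset accuracy, determining the handwriting training model as a handwriting recognition model.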
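  2. The handwriting model training method according to claim 1, wherein the convolutional recurrent neural network model comprises a convolutional neural network model and a recurrent neural network model;
    the inputting the training set into a convolutional recurrent neural network model to obtain a forward output of the convolutional recurrent neural network model, and, according to the forward output of the convolutional recurrent neural network model, updating weights and biases in the convolutional recurrent neural network model using a backpropagation algorithm based on batch gradient descent to obtain a handwriting training model comprises:
    inputting the training set into the convolutional neural network model to obtain image features corresponding to the handwriting images in the training set;
    inputting the image features corresponding to the handwriting images in the training set into the recurrent neural network model for training to obtain a forward output of the recurrent neural network model;
    constructing a loss function according to the forward output of the recurrent neural network model and the label Chinese characters, the specific expression of the loss function being:
    [Equation image PCTCN2018094403-appb-100001: loss function E_loss(θ), not reproduced in the source text]
    where N denotes the number of handwriting images in the handwriting training samples, E_loss(θ) denotes the average of the total errors of all handwriting images in the N handwriting training samples, M denotes the number of sequence labels carried by a handwriting image in the handwriting training samples, the symbol shown in image PCTCN2018094403-appb-100002 denotes the forward output of the handwriting image corresponding to the m-th sequence label in the n-th handwriting training sample, the symbol shown in image PCTCN2018094403-appb-100003 denotes the label Chinese character corresponding to the m-th sequence label in the n-th handwriting training sample, and θ denotes the set of weights and biases;
    according to the loss function, updating and adjusting the weights and biases in the recurrent neural network model and the convolutional neural network model using a backpropagation algorithm based on batch gradient descent to obtain the handwriting training model.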
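  3. A Chinese character recognition method, characterized by comprising:
    acquiring an original image, the original image comprising handwriting and a background image;
    preprocessing the original image to obtain an effective image;
    processing the effective image using a kernel density estimation algorithm and an erosion method to remove the background image and obtain a target image comprising the handwriting;
    segmenting the target image into single characters using a vertical projection method to obtain single-character images;
    inputting the single-character images into a handwriting recognition model for recognition to obtain recognition results corresponding to the single-character images, the handwriting recognition model being obtained by the handwriting model training method according to claim 1 or 2;
    querying a semantic library based on the recognition results to obtain target Chinese characters corresponding to the single-character images.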
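  4. The Chinese character recognition method according to claim 3, wherein the preprocessing the original image to obtain an effective image comprises:
    enlarging and grayscaling the original image to obtain a grayscale image;
    performing range normalization on the pixel matrix corresponding to the grayscale image to obtain the effective image, wherein the formula of the range normalization is
    $$x' = \frac{x - M_{\min}}{M_{\max} - M_{\min}}$$
    where x is a pixel of the effective image before normalization, x′ is a pixel of the effective image after normalization, M_min is the smallest pixel in the pixel matrix M corresponding to the grayscale image, and M_max is the largest pixel in the pixel matrix M corresponding to the grayscale image.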
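  5. The Chinese character recognition method according to claim 3, wherein the processing the effective image using a kernel density estimation algorithm and an erosion method to remove the background image and obtain a target image comprising the handwriting comprises:
    counting the number of occurrences of pixels in the effective image to obtain a frequency distribution histogram corresponding to the effective image;
    processing the frequency distribution histogram using a Gaussian kernel density estimation method to obtain frequency maxima and frequency minima corresponding to the frequency distribution histogram, and obtaining the corresponding pixels according to the frequency maxima and frequency minima;
    layering the effective image based on the pixels corresponding to the frequency maxima and the frequency minima to obtain layered images;
    eroding the layered images, and superimposing the eroded layered images to obtain the target image.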
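  6. The Chinese character recognition method according to claim 5, wherein the eroding the layered images comprises:
    binarizing the layered images to obtain layered binarized images;
    detecting and labeling pixels in the layered binarized images to obtain connected regions corresponding to the layered binarized images;
    eroding the connected regions corresponding to the layered binarized images.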
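  7. A handwriting model training apparatus, characterized by comprising:
    a training sample acquisition module, configured to acquire handwriting training samples, the handwriting training samples comprising handwriting images and label Chinese characters associated with the handwriting images;
    a training sample processing module, configured to divide the handwriting training samples into a training set and a test set;
    a training model acquisition module, configured to input the training set into a convolutional recurrent neural network model to obtain a forward output of the convolutional recurrent neural network model, and, according to the forward output of the convolutional recurrent neural network model, update weights and biases in the convolutional recurrent neural network model using a backpropagation algorithm based on batch gradient descent to obtain a handwriting training model;
    a recognition model acquisition module, configured to input the test set into the handwriting training model to obtain a recognized Chinese character corresponding to each handwriting image, obtain a recognition accuracy based on the recognized Chinese characters and the label Chinese characters, and, if the recognition accuracy is greater than a preset accuracy, determine the handwriting training model as a handwriting recognition model.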
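  8. A Chinese character recognition apparatus, characterized by comprising:
    an original image acquisition module, configured to acquire an original image, the original image comprising handwriting and a background image;
    an effective image acquisition module, configured to preprocess the original image to obtain an effective image;
    a target image acquisition module, configured to process the effective image using a kernel density estimation algorithm and an erosion method to remove the background image and obtain a target image comprising the handwriting;
    a single-character image acquisition module, configured to segment the target image into single characters using a vertical projection method to obtain single-character images;
    a recognition result acquisition module, configured to input the single-character images into a handwriting recognition model for recognition to obtain recognition results corresponding to the single-character images, the handwriting recognition model being obtained by the handwriting model training method according to claim 1 or 2;
    a target Chinese character confirmation module, configured to query a semantic library based on the recognition results to obtain target Chinese characters corresponding to the single-character images.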
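  9. A computer device, comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, characterized in that the processor, when executing the computer-readable instructions, implements the following steps:
    acquiring handwriting training samples, the handwriting training samples comprising handwriting images and label Chinese characters associated with the handwriting images;
    dividing the handwriting training samples into a training set and a test set;
    inputting the training set into a convolutional recurrent neural network model to obtain a forward output of the convolutional recurrent neural network model, and, according to the forward output of the convolutional recurrent neural network model, updating weights and biases in the convolutional recurrent neural network model using a backpropagation algorithm based on batch gradient descent to obtain a handwriting training model;
    inputting the test set into the handwriting training model to obtain a recognized Chinese character corresponding to each handwriting image, obtaining a recognition accuracy based on the recognized Chinese characters and the label Chinese characters, and, if the recognition accuracy is greater than a preset accuracy, determining the handwriting training model as a handwriting recognition model.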
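  10. The computer device according to claim 9, wherein the convolutional recurrent neural network model comprises a convolutional neural network model and a recurrent neural network model;
    the inputting the training set into a convolutional recurrent neural network model to obtain a forward output of the convolutional recurrent neural network model, and, according to the forward output of the convolutional recurrent neural network model, updating weights and biases in the convolutional recurrent neural network model using a backpropagation algorithm based on batch gradient descent to obtain a handwriting training model comprises:
    inputting the training set into the convolutional neural network model to obtain image features corresponding to the handwriting images in the training set;
    inputting the image features corresponding to the handwriting images in the training set into the recurrent neural network model for training to obtain a forward output of the recurrent neural network model;
    constructing a loss function according to the forward output of the recurrent neural network model and the label Chinese characters, the specific expression of the loss function being:
    [Equation image PCTCN2018094403-appb-100005: loss function E_loss(θ), not reproduced in the source text]
    where N denotes the number of handwriting images in the handwriting training samples, E_loss(θ) denotes the average of the total errors of all handwriting images in the N handwriting training samples, M denotes the number of sequence labels carried by a handwriting image in the handwriting training samples, the symbol shown in image PCTCN2018094403-appb-100006 denotes the forward output of the handwriting image corresponding to the m-th sequence label in the n-th handwriting training sample, the symbol shown in image PCTCN2018094403-appb-100007 denotes the label Chinese character corresponding to the m-th sequence label in the n-th handwriting training sample, and θ denotes the set of weights and biases;
    according to the loss function, updating and adjusting the weights and biases in the recurrent neural network model and the convolutional neural network model using a backpropagation algorithm based on batch gradient descent to obtain the handwriting training model.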
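  11. A computer device, comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, characterized in that the processor, when executing the computer-readable instructions, implements the following steps:
    acquiring an original image, the original image comprising handwriting and a background image;
    preprocessing the original image to obtain an effective image;
    processing the effective image using a kernel density estimation algorithm and an erosion method to remove the background image and obtain a target image comprising the handwriting;
    segmenting the target image into single characters using a vertical projection method to obtain single-character images;
    inputting the single-character images into a handwriting recognition model for recognition to obtain recognition results corresponding to the single-character images, the handwriting recognition model being obtained by the handwriting model training method according to claim 1 or 2;
    querying a semantic library based on the recognition results to obtain target Chinese characters corresponding to the single-character images.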
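  12. The computer device according to claim 11, wherein the preprocessing the original image to obtain an effective image comprises:
    enlarging and grayscaling the original image to obtain a grayscale image;
    performing range normalization on the pixel matrix corresponding to the grayscale image to obtain the effective image, wherein the formula of the range normalization is
    $$x' = \frac{x - M_{\min}}{M_{\max} - M_{\min}}$$
    where x is a pixel of the effective image before normalization, x′ is a pixel of the effective image after normalization, M_min is the smallest pixel in the pixel matrix M corresponding to the grayscale image, and M_max is the largest pixel in the pixel matrix M corresponding to the grayscale image.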
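  13. The computer device according to claim 11, wherein the processing the effective image using a kernel density estimation algorithm and an erosion method to remove the background image and obtain a target image comprising the handwriting comprises:
    counting the number of occurrences of pixels in the effective image to obtain a frequency distribution histogram corresponding to the effective image;
    processing the frequency distribution histogram using a Gaussian kernel density estimation method to obtain frequency maxima and frequency minima corresponding to the frequency distribution histogram, and obtaining the corresponding pixels according to the frequency maxima and frequency minima;
    layering the effective image based on the pixels corresponding to the frequency maxima and the frequency minima to obtain layered images;
    eroding the layered images, and superimposing the eroded layered images to obtain the target image.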
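  14. The computer device according to claim 13, wherein the eroding the layered images comprises:
    binarizing the layered images to obtain layered binarized images;
    detecting and labeling pixels in the layered binarized images to obtain connected regions corresponding to the layered binarized images;
    eroding the connected regions corresponding to the layered binarized images.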
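  15. One or more non-volatile readable storage media storing computer-readable instructions, characterized in that the computer-readable instructions, when executed by one or more processors, cause the one or more processors to implement the following steps:
    acquiring handwriting training samples, the handwriting training samples comprising handwriting images and label Chinese characters associated with the handwriting images;
    dividing the handwriting training samples into a training set and a test set;
    inputting the training set into a convolutional recurrent neural network model to obtain a forward output of the convolutional recurrent neural network model, and, according to the forward output of the convolutional recurrent neural network model, updating weights and biases in the convolutional recurrent neural network model using a backpropagation algorithm based on batch gradient descent to obtain a handwriting training model;
    inputting the test set into the handwriting training model to obtain a recognized Chinese character corresponding to each handwriting image, obtaining a recognition accuracy based on the recognized Chinese characters and the label Chinese characters, and, if the recognition accuracy is greater than a preset accuracy, determining the handwriting training model as a handwriting recognition model.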
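  16. The non-volatile readable storage medium according to claim 15, wherein the convolutional recurrent neural network model comprises a convolutional neural network model and a recurrent neural network model;
    the inputting the training set into a convolutional recurrent neural network model to obtain a forward output of the convolutional recurrent neural network model, and, according to the forward output of the convolutional recurrent neural network model, updating weights and biases in the convolutional recurrent neural network model using a backpropagation algorithm based on batch gradient descent to obtain a handwriting training model comprises:
    inputting the training set into the convolutional neural network model to obtain image features corresponding to the handwriting images in the training set;
    inputting the image features corresponding to the handwriting images in the training set into the recurrent neural network model for training to obtain a forward output of the recurrent neural network model;
    constructing a loss function according to the forward output of the recurrent neural network model and the label Chinese characters, the specific expression of the loss function being:
    [Equation image PCTCN2018094403-appb-100009: loss function E_loss(θ), not reproduced in the source text]
    where N denotes the number of handwriting images in the handwriting training samples, E_loss(θ) denotes the average of the total errors of all handwriting images in the N handwriting training samples, M denotes the number of sequence labels carried by a handwriting image in the handwriting training samples, the symbol shown in image PCTCN2018094403-appb-100010 denotes the forward output of the handwriting image corresponding to the m-th sequence label in the n-th handwriting training sample, the symbol shown in image PCTCN2018094403-appb-100011 denotes the label Chinese character corresponding to the m-th sequence label in the n-th handwriting training sample, and θ denotes the set of weights and biases;
    according to the loss function, updating and adjusting the weights and biases in the recurrent neural network model and the convolutional neural network model using a backpropagation algorithm based on batch gradient descent to obtain the handwriting training model.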
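  17. One or more non-volatile readable storage media storing computer-readable instructions, characterized in that the computer-readable instructions, when executed by one or more processors, cause the one or more processors to implement the following steps:
    acquiring an original image, the original image comprising handwriting and a background image;
    preprocessing the original image to obtain an effective image;
    processing the effective image using a kernel density estimation algorithm and an erosion method to remove the background image and obtain a target image comprising the handwriting;
    segmenting the target image into single characters using a vertical projection method to obtain single-character images;
    inputting the single-character images into a handwriting recognition model for recognition to obtain recognition results corresponding to the single-character images, the handwriting recognition model being obtained by the handwriting model training method according to claim 1 or 2;
    querying a semantic library based on the recognition results to obtain target Chinese characters corresponding to the single-character images.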
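  18. The non-volatile readable storage medium according to claim 17, wherein the preprocessing the original image to obtain an effective image comprises:
    enlarging and grayscaling the original image to obtain a grayscale image;
    performing range normalization on the pixel matrix corresponding to the grayscale image to obtain the effective image, wherein the formula of the range normalization is
    $$x' = \frac{x - M_{\min}}{M_{\max} - M_{\min}}$$
    where x is a pixel of the effective image before normalization, x′ is a pixel of the effective image after normalization, M_min is the smallest pixel in the pixel matrix M corresponding to the grayscale image, and M_max is the largest pixel in the pixel matrix M corresponding to the grayscale image.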
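  19. The non-volatile readable storage medium according to claim 17, wherein the processing the effective image using a kernel density estimation algorithm and an erosion method to remove the background image and obtain a target image comprising the handwriting comprises:
    counting the number of occurrences of pixels in the effective image to obtain a frequency distribution histogram corresponding to the effective image;
    processing the frequency distribution histogram using a Gaussian kernel density estimation method to obtain frequency maxima and frequency minima corresponding to the frequency distribution histogram, and obtaining the corresponding pixels according to the frequency maxima and frequency minima;
    layering the effective image based on the pixels corresponding to the frequency maxima and the frequency minima to obtain layered images;
    eroding the layered images, and superimposing the eroded layered images to obtain the target image.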
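  20. The non-volatile readable storage medium according to claim 19, wherein the eroding the layered images comprises:
    binarizing the layered images to obtain layered binarized images;
    detecting and labeling pixels in the layered binarized images to obtain connected regions corresponding to the layered binarized images;
    eroding the connected regions corresponding to the layered binarized images.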
PCT/CN2018/094403 2018-06-04 2018-07-04 Handwriting model training method, Chinese character recognition method, apparatus, device and medium WO2019232872A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810563511.2 2018-06-04
CN201810563511.2A CN109086652A (zh) 2018-06-04 2018-06-04 Handwriting model training method, Chinese character recognition method, apparatus, device and medium

Publications (1)

Publication Number Publication Date
WO2019232872A1 true WO2019232872A1 (zh) 2019-12-12

Family

ID=64839309

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/094403 WO2019232872A1 (zh) 2018-06-04 2018-07-04 Handwriting model training method, Chinese character recognition method, apparatus, device and medium

Country Status (2)

Country Link
CN (1) CN109086652A (zh)
WO (1) WO2019232872A1 (zh)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
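CN110378372A (zh) * 2019-06-11 2019-10-25 中国科学院自动化研究所南京人工智能芯片创新研究院 Graph data recognition method, apparatus, computer device and storage medium
CN110363086A (zh) * 2019-06-11 2019-10-22 中国科学院自动化研究所南京人工智能芯片创新研究院 Graph data recognition method, apparatus, computer device and storage medium
CN110363303B (zh) * 2019-06-14 2023-07-07 平安科技(深圳)有限公司 Method and apparatus for intelligently allocating model training memory, and computer-readable storage medium
CN111414844B (zh) * 2020-03-17 2023-08-29 北京航天自动控制研究所 Container number recognition method based on a convolutional recurrent neural network
CN111428715A (zh) * 2020-03-26 2020-07-17 广州市南方人力资源评价中心有限公司 Character recognition method based on a neural network
CN112434699A (zh) * 2020-11-25 2021-03-02 杭州六品文化创意有限公司 Automatic extraction and intelligent scoring system for handwritten Chinese characters, radicals or strokes
CN112632979A (zh) * 2020-12-31 2021-04-09 上海臣星软件技术有限公司 Text generation method, apparatus, device and medium
CN113176830A (zh) * 2021-04-30 2021-07-27 北京百度网讯科技有限公司 Recognition model training and recognition method, apparatus, electronic device and storage medium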

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5812697A (en) * 1994-06-10 1998-09-22 Nippon Steel Corporation Method and apparatus for recognizing hand-written characters using a weighting dictionary
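CN104268541A (zh) * 2014-09-15 2015-01-07 青岛高校信息产业有限公司 Intelligent image recognition method for equipment nameplates and energy efficiency labels
CN105139036A (zh) * 2015-06-19 2015-12-09 四川大学 Handwritten digit recognition method based on sparse coding
CN107784316A (zh) * 2016-08-26 2018-03-09 阿里巴巴集团控股有限公司 Image recognition method, apparatus, system and computing device
CN107909564A (zh) * 2017-10-23 2018-04-13 昆明理工大学 Deep-learning-based fully convolutional network image crack detection method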

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
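CN105184226A (zh) * 2015-08-11 2015-12-23 北京新晨阳光科技有限公司 Digit recognition method and apparatus, and neural network training method and apparatus
CN107122809B (zh) * 2017-04-24 2020-04-28 北京工业大学 Neural network feature learning method based on image auto-encoding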

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
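CN111414917A (zh) * 2020-03-18 2020-07-14 民生科技有限责任公司 Recognition method for low-pixel-density text
CN111414917B (zh) * 2020-03-18 2023-05-12 民生科技有限责任公司 Recognition method for low-pixel-density text
CN113686031A (zh) * 2020-05-19 2021-11-23 山东大学 Machine-learning-based liquid level pattern recognition method for loop heat pipe solar systems
CN113686031B (zh) * 2020-05-19 2022-06-24 山东大学 Machine-learning-based liquid level pattern recognition method for loop heat pipe solar systems
CN112052852A (zh) * 2020-09-09 2020-12-08 国家气象信息中心 Deep-learning-based character recognition method for handwritten meteorological archive data
CN112052852B (zh) * 2020-09-09 2023-12-29 国家气象信息中心 Deep-learning-based character recognition method for handwritten meteorological archive data
CN112364860A (zh) * 2020-11-05 2021-02-12 北京字跳网络技术有限公司 Training method and apparatus for a character recognition model, and electronic device
CN113343814A (zh) * 2021-05-31 2021-09-03 太原理工大学 Handwritten digit image recognition method based on single-node photonic reservoir computing
CN113343814B (zh) * 2021-05-31 2022-06-14 太原理工大学 Handwritten digit image recognition method based on single-node photonic reservoir computing
CN115880782A (zh) * 2023-02-16 2023-03-31 广州佰锐网络科技有限公司 AI-based signature action recognition and localization method, recognition training method and system
CN115880782B (zh) * 2023-02-16 2023-08-08 广州佰锐网络科技有限公司 AI-based signature action recognition and localization method, recognition training method and system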

Also Published As

Publication number Publication date
CN109086652A (zh) 2018-12-25

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18921693

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 11/03/2021)

122 Ep: pct application non-entry in european phase

Ref document number: 18921693

Country of ref document: EP

Kind code of ref document: A1