WO2019232853A1 - Chinese model training method, Chinese image recognition method, device, apparatus and medium


Info

Publication number
WO2019232853A1
Authority
WO
WIPO (PCT)
Application number
PCT/CN2018/094235
Other languages
French (fr)
Chinese (zh)
Inventor
高梁梁
周罡
Original Assignee
平安科技(深圳)有限公司
Application filed by 平安科技(深圳)有限公司


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/24 Character recognition characterised by the processing or recognition method
    • G06V30/242 Division of the character sequences into groups prior to recognition; Selection of dictionaries
    • G06V30/244 Division of the character sequences into groups prior to recognition; Selection of dictionaries using graphical properties, e.g. alphabet type or font
    • G06V30/2455 Discrimination between machine-print, hand-print and cursive writing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs

Definitions

  • The present application relates to the field of image recognition, and in particular to a Chinese model training method, a Chinese image recognition method, a device, an apparatus, and a medium.
  • a Chinese model training method includes:
  • the original handwriting recognition model is tested using the trained handwritten Chinese images in the test set, and the target handwriting recognition model is obtained when the test accuracy is greater than a preset accuracy rate.
  • a Chinese model training device includes:
  • Training handwritten Chinese image acquisition module for acquiring training handwritten Chinese images
  • a training handwritten Chinese image division module configured to divide the trained handwritten Chinese image into a training set and a test set according to a preset ratio
  • an original handwriting recognition model acquisition module, configured to sequentially label the training handwritten Chinese images in the training set, input the labeled training handwritten Chinese images into a convolutional neural network-long short-term memory neural network for training, and update the network parameters of the convolutional neural network-long short-term memory neural network using a time-series classification algorithm to obtain an original handwriting recognition model;
  • a target handwriting recognition model acquisition module is used to test the original handwriting recognition model using the trained handwritten Chinese images in the test set, and obtain a target handwriting recognition model when the test accuracy rate is greater than a preset accuracy rate.
  • a computer device includes a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, the following steps are implemented:
  • the original handwriting recognition model is tested using the trained handwritten Chinese images in the test set, and the target handwriting recognition model is obtained when the test accuracy is greater than a preset accuracy rate.
  • a non-volatile storage medium stores a computer program.
  • the computer program is executed by a processor, the following steps are implemented:
  • the original handwriting recognition model is tested using the trained handwritten Chinese images in the test set, and the target handwriting recognition model is obtained when the test accuracy is greater than a preset accuracy rate.
  • a Chinese image recognition method includes:
  • the text areas to be recognized are input into a target handwriting recognition model for recognition, and the handwritten Chinese characters corresponding to each text area to be recognized are obtained; wherein the target handwriting recognition model is obtained by using the Chinese model training method.
  • a Chinese image recognition device includes:
  • a to-be-recognized Chinese image acquisition module configured to obtain the to-be-recognized Chinese image, wherein the to-be-recognized Chinese image includes handwritten Chinese characters and a background picture;
  • An original image acquisition module configured to pre-process the Chinese image to be identified to obtain an original image
  • a target image acquisition module configured to process the original image by using a kernel density estimation algorithm, remove a background picture, and obtain a target image including the handwritten Chinese character;
  • a to-be-recognized text area acquisition module configured to use the text positioning technology to perform text positioning on the target image to obtain the to-be-recognized text area;
  • a handwritten Chinese character acquisition module, configured to input the text areas to be recognized into a target handwriting recognition model for recognition and obtain the handwritten Chinese characters corresponding to each text area to be recognized; wherein the target handwriting recognition model is obtained by using the Chinese model training method.
  • a computer device includes a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, the following steps are implemented:
  • the text area to be recognized is input into a target handwriting recognition model for recognition, and handwritten Chinese characters corresponding to each of the text area to be recognized are obtained; wherein the target handwriting recognition model is obtained by using the Chinese model training method.
  • One or more non-volatile readable storage media storing computer-readable instructions, which when executed by one or more processors, cause the one or more processors to perform the following steps:
  • the text area to be recognized is input to a target handwriting recognition model for recognition, and handwritten Chinese characters corresponding to each of the text area to be recognized are obtained; wherein the target handwriting recognition model is obtained by using the Chinese model training method.
  • FIG. 1 is an application scenario diagram of a Chinese model training method or a Chinese image recognition method in an embodiment of the present application
  • FIG. 2 is a flowchart of a Chinese model training method according to an embodiment of the present application.
  • FIG. 3 is a specific flowchart of step S13 in FIG. 2;
  • FIG. 4 is a schematic diagram of a Chinese model training device according to an embodiment of the present application.
  • FIG. 5 is a flowchart of a Chinese image recognition method according to an embodiment of the present application.
  • FIG. 6 is a specific flowchart of step S22 in FIG. 5;
  • FIG. 7 is a specific flowchart of step S23 in FIG. 5;
  • FIG. 8 is a specific flowchart of step S234 in FIG. 7;
  • FIG. 9 is a schematic diagram of a Chinese image recognition device according to an embodiment of the present application.
  • FIG. 10 is a schematic diagram of a computer device according to an embodiment of the present application.
  • the Chinese model training method provided in the embodiment of the present application can be applied in the application environment shown in FIG. 1.
  • the application environment of the Chinese model training method includes a server and a computer device, wherein the computer device communicates with the server through a network, and the computer device is a device that can interact with the user, including, but not limited to, a computer, a smart phone, and a tablet device.
  • the Chinese model training method provided in the embodiment of the present application is applied to a server.
  • a Chinese model training method is provided.
  • the method is applied to the server in FIG. 1 as an example, and includes the following steps:
  • the training handwritten Chinese image is a sample image collected from an open source library for model training in advance.
  • the training handwritten Chinese image includes N (N is a positive integer) handwriting samples corresponding to each Chinese in the Chinese secondary word library.
  • The Chinese secondary character library is a Chinese character library coded in the order of the radical strokes of the characters. Specifically, N handwriting samples written by different people are collected from the open source library so that the server obtains the training handwritten Chinese images. Because different users have different writing habits, training with these N handwriting samples (that is, the training handwritten Chinese images) greatly improves the generalization of the model.
  • The training set is a set of learning samples used to fit the classifier by adjusting its parameters, that is, the machine learning model is trained with the training samples in the training set to determine the parameters of the machine learning model.
  • a test set is used to test the discrimination capabilities of a trained machine learning model, such as accuracy.
  • the preset ratio is a preset ratio for dividing the training handwritten Chinese image.
  • the training handwritten Chinese image can be divided according to a ratio of 9:1, that is, 90% of the training handwritten Chinese image can be used as the training set, and the remaining 10% of the training handwritten Chinese image can be used as the test set.
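  • As an illustrative sketch (not part of the patent), the preset-ratio split could be implemented as follows in Python; the sample-list format, function name, and fixed seed are assumptions.
```python
import random

def split_train_test(samples, train_ratio=0.9, seed=42):
    """Divide the training handwritten Chinese image samples into a training
    set and a test set according to a preset ratio (here 9:1)."""
    samples = list(samples)
    random.Random(seed).shuffle(samples)      # shuffle for an unbiased split
    cut = int(len(samples) * train_ratio)     # e.g. 90% of the samples
    return samples[:cut], samples[cut:]       # (training set, test set)

# Hypothetical usage:
# train_set, test_set = split_train_test(all_training_images)
```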
  • S13: Sequentially label the training handwritten Chinese images in the training set, input the labeled training handwritten Chinese images into the convolutional neural network-long short-term memory neural network for training, and update the network parameters of the convolutional neural network-long short-term memory neural network using a time-series classification algorithm to obtain the original handwriting recognition model.
  • The original handwriting recognition model is a model obtained through multiple training iterations of the convolutional neural network-long short-term memory neural network.
  • A long short-term memory (LSTM) network is a kind of recurrent neural network that is suitable for processing and predicting events in a time series whose intervals and delays are relatively long.
  • A convolutional neural network (CNN) is a locally connected network. Compared with a fully connected network, its biggest features are local connectivity and weight sharing. For a given pixel p in an image, the closer another pixel is to p, the greater its influence on p; this is local connectivity.
  • The weights learned for one area of the image can also be used for another area; this is weight sharing.
  • Weight sharing can be understood as convolution kernel sharing.
  • Here CNN denotes the convolutional neural network, and CTC denotes Connectionist Temporal Classification, the time-series classification algorithm referred to in this application.
  • the server performs labeling according to the chronological order of the training handwritten Chinese images, and inputs the labeled training handwritten Chinese images into a convolutional neural network-long-term short-term memory neural network for training to obtain the original handwriting recognition model.
  • each training handwritten Chinese image is arranged in order.
  • the training handwritten Chinese image is "I am very happy today"
  • each training handwritten Chinese image can be labeled with Arabic numerals from left to right, that is, "Today (1) days (2) very (3) open (4) heart (5)”, so that the training handwritten Chinese image has timeliness, so that the original handwriting recognition model can be trained in connection with the context and improve the accuracy of the model .
  • (1), (2), (3), (4), and (5) are sequential tags.
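  • A minimal Python sketch of this sequential labeling, assuming the character images of one sample are already available as a list (the function name and data layout are illustrative only):
```python
def label_in_sequence(char_images):
    """Attach left-to-right order labels (1, 2, 3, ...) to the character images
    of one training sample, e.g. 今(1) 天(2) 很(3) 开(4) 心(5)."""
    return [(order, image) for order, image in enumerate(char_images, start=1)]

# label_in_sequence(["img_jin", "img_tian", "img_hen", "img_kai", "img_xin"])
# -> [(1, 'img_jin'), (2, 'img_tian'), (3, 'img_hen'), (4, 'img_kai'), (5, 'img_xin')]
```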
  • Long-term short-term memory neural network has three layers of network structure: input layer, hidden layer and output layer.
  • the input layer is the first layer of the long-term and short-term memory neural network, which is used to receive external signals, that is, it is responsible for receiving training handwritten Chinese images.
  • the output layer is the last layer of the long-term and short-term memory neural network, which is used to output signals to the outside world, that is, it is responsible for outputting the calculation results of the long-term and short-term memory neural network.
  • Hidden layers are the layers between the input layer and the output layer of the long short-term memory neural network. They process the Chinese image features extracted by the convolutional neural network to obtain the calculation results of the long short-term memory neural network. Understandably, using the long short-term memory neural network for model training adds timing information to the training handwritten Chinese images, so that they are trained according to their context, thereby improving the accuracy of the target handwriting recognition model.
  • In step S13, the training handwritten Chinese images in the training set are sequentially labeled, the labeled training handwritten Chinese images are input into the convolutional neural network-long short-term memory neural network, the training is performed in time series, and the time-series classification algorithm is used to update the network parameters of the convolutional neural network-long short-term memory neural network to obtain the original handwriting recognition model. This specifically includes the following steps:
  • S131 Perform feature extraction on the trained handwritten Chinese image in a convolutional neural network to obtain Chinese image features.
  • Chinese image features are image features corresponding to the training handwritten Chinese image obtained by extracting the features of the training handwritten Chinese image using a convolutional neural network.
  • the convolutional neural network model includes a convolutional layer and a pooling layer.
  • the trained handwritten Chinese image is input into the convolutional neural network model for training.
  • the output of the convolutional layer of each layer is obtained through the calculation of the convolutional layer of each layer.
  • In the pooling layer, max-pooling downsampling is used to reduce the output of the convolutional layer.
  • Pooling refers to the downsampling calculation.
  • The downsampling calculation can use the max-pooling method.
  • Max-pooling simply takes the maximum value within each m × m window of the samples.
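  • For illustration, a minimal CNN front end with convolution followed by max-pooling might look like the following PyTorch sketch; the framework choice, channel counts, and kernel sizes are assumptions rather than values given in the patent.
```python
import torch
import torch.nn as nn

class ConvFeatureExtractor(nn.Module):
    """Convolutional layers followed by max-pooling downsampling: each pooling
    step takes the maximum value in a 2 x 2 window (m = 2 here)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),        # max-pooling reduces the convolution output
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )

    def forward(self, x):           # x: (batch, 1, height, width) grayscale image
        return self.features(x)     # Chinese image features

# feats = ConvFeatureExtractor()(torch.randn(1, 1, 32, 128))
```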
  • the Chinese image feature carries an order label, and the order label of the Chinese image feature is consistent with the order label of the training handwritten Chinese image corresponding to the Chinese image feature.
  • the first activation function is used to process the features of the Chinese image to obtain the neurons carrying the identification of the activation state.
  • each neuron in the hidden layer of the long-term and short-term memory neural network includes three gates, which are an input gate, a forgetting gate, and an output gate, respectively.
  • the forget gate determines the past information to be discarded in the neuron.
  • the input gate determines the information to be added to the neuron.
  • the output gate determines the information to be output in the neuron.
  • the first activation function is a function for activating a neuron state.
  • the state of the neuron determines the information discarded, added, and output by each gate (ie, input gate, forget gate, and output gate).
  • the activation status flag includes a pass flag and a fail flag.
  • the identifiers corresponding to the input gate, the forget gate, and the output gate in this embodiment are i, f, and o, respectively.
  • the Sigmoid (S-shaped growth curve) function is specifically selected as the first activation function.
  • The Sigmoid function is an S-shaped function common in biology. In information science, because it is monotonically increasing and its inverse function is also monotonically increasing, the Sigmoid function is often used as a threshold function for neural networks, mapping variables into the interval (0, 1). Its activation formula is σ(z) = 1 / (1 + e^(-z)), where z represents the output value of the forget gate.
  • the forgetting gate includes a forgetting threshold.
  • a neuron carrying an activation state identifier as a pass identifier is obtained.
  • The calculation formula of the forget gate is f_t = σ(W_f · [h_{t-1}, x_t] + b_f), where f_t represents the forgetting threshold (that is, the activation state), W_f represents the weight matrix of the forget gate, b_f represents the bias term of the forget gate, h_{t-1} represents the output of the neuron at the previous moment, and x_t represents the input data (i.e., the Chinese image features) at time t, where t is the current time and t-1 is the previous time.
  • Calculating the Chinese image features with the forget gate formula yields a scalar in the interval [0, 1]. Based on a comprehensive judgment of the current state and the past state, this scalar determines the proportion of past information the neuron keeps, which reduces the amount of data and computation and improves training efficiency.
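  • A small NumPy sketch of the forget-gate computation described above, using the standard LSTM form of the equation; the weight shapes and function names are illustrative assumptions.
```python
import numpy as np

def sigmoid(z):
    """First activation function: maps the gate input into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def forget_gate(W_f, b_f, h_prev, x_t):
    """f_t = sigmoid(W_f . [h_{t-1}, x_t] + b_f): a value in (0, 1) deciding what
    proportion of past information the neuron keeps."""
    concat = np.concatenate([h_prev, x_t])    # [h_{t-1}, x_t]
    return sigmoid(W_f @ concat + b_f)
```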
  • a second activation function is used to process the neuron carrying the identification of the activation state to obtain the output of the long-term and short-term memory neural network output layer.
  • The second activation function performs calculations on the neurons carrying the activation state identifier to obtain the output of the hidden layer.
  • a tanh (hyperbolic tangent) function is used as the activation function of the input gate (ie, the second activation function).
  • Non-linear factors can thus be introduced, enabling the trained target handwriting recognition model to solve more complex problems.
  • the activation function tanh (hyperbolic tangent) has the advantage of fast convergence speed, which can save training time and improve training efficiency.
  • The output of the input gate is calculated by the calculation formula of the input gate, i_t = σ(W_i · [h_{t-1}, x_t] + b_i), where W_i is the weight matrix of the input gate, i_t represents the input threshold, and b_i represents the bias term of the input gate.
  • Calculating the Chinese image features with the input gate formula yields a scalar in the interval [0, 1] (that is, the input threshold). Based on a comprehensive judgment of the current state and the past state, this scalar controls the proportion of current information the neuron receives, that is, the proportion of the newly input information, which reduces the amount of computation and improves training efficiency.
  • The state of the neuron is calculated as C_t = f_t · C_{t-1} + i_t · tanh(W_c · [h_{t-1}, x_t] + b_c), where W_c represents the weight matrix of the cell state, b_c represents the bias term of the cell state, i_t represents the input threshold, C_{t-1} represents the state of the neuron at the previous moment, and C_t represents the state of the neuron at time t.
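  • Continuing the sketch, the input gate and neuron state can be computed with the standard LSTM equations, which are consistent with the symbols defined above but are an assumption rather than text taken from the patent.
```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def input_gate_and_state(W_i, b_i, W_c, b_c, h_prev, x_t, f_t, C_prev):
    """Assumed standard LSTM update, consistent with the symbols above:
       i_t  = sigmoid(W_i . [h_{t-1}, x_t] + b_i)   (input gate / input threshold)
       C~_t = tanh(W_c . [h_{t-1}, x_t] + b_c)      (candidate state, tanh activation)
       C_t  = f_t * C_{t-1} + i_t * C~_t            (new neuron state)"""
    concat = np.concatenate([h_prev, x_t])
    i_t = sigmoid(W_i @ concat + b_i)
    c_tilde = np.tanh(W_c @ concat + b_c)
    C_t = f_t * C_prev + i_t * c_tilde
    return i_t, C_t
```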
  • The target output is the output of the output layer of the long short-term memory neural network, where a denotes the forward output of the hidden layer of the long short-term memory neural network and b denotes the backward output of the hidden layer.
  • The forward output of the hidden layer of the long short-term memory neural network refers to the probability of the Chinese image feature corresponding to the u-th order label being output when the hidden layer processes the sequence in forward time order.
  • The backward output refers to the probability of the Chinese image feature corresponding to the u-th order label being output when the hidden layer processes the sequence in reverse time order. For example, in the sample "I'm in a good mood today", suppose the Chinese image feature corresponding to the u-th sequential label is the character 天, the output of the hidden layer at time t-1 is 今, and the output of the hidden layer at time t+1 is 心.
  • The input of the input layer of the long short-term memory neural network at time t is the image feature of 天.
  • The output of the hidden layer at time t is then calculated.
  • The forward output of the hidden layer refers to the probability, computed from the output of the hidden layer at time t-1 (今) and the input of the input layer at time t, that the output of the hidden layer at time t is 天.
  • Because handwritten characters can resemble one another, the candidate outputs of the hidden layer at time t may include, for example, 天, 大, and 木.
  • The backward output of the hidden layer refers to the probability, computed from the output of the hidden layer at time t+1 (心) and the input of the input layer at time t, that the output at time t is 天.
  • A time-series classification (CTC) algorithm is used to update the network parameters of the convolutional neural network-long short-term memory neural network to obtain the target handwriting recognition model.
  • The network parameters of the convolutional neural network-long short-term memory neural network are its weights and biases.
  • First, the forward output of the Chinese image feature corresponding to the u-th order label at time t in the hidden layer of the long short-term memory neural network is calculated according to the forward output formula of the hidden layer, a(t, u) = y(t, l'_u) · Σ_i a(t-1, i), where the sum runs over the order labels i allowed to precede label u, y(t, ·) is the output-layer probability at time t (in particular, the probability that the output is a space at time t), a(t-1, i) is the forward output of the i-th Chinese image feature at time t-1, and l' is the label sequence extended with spaces, whose length is determined by the number of sequential labels.
  • Then, the backward output of the Chinese image feature corresponding to the u-th order label at time t in the hidden layer is calculated according to the backward output formula of the hidden layer, b(t, u) = Σ_i b(t+1, i) · y(t+1, l'_i), where the sum runs over the order labels i allowed to follow label u, y(t+1, ·) gives in particular the probability that the output is a space at time (t+1), and b(t+1, i) is the backward output of the Chinese image feature corresponding to the i-th sequential label at time t+1 in the hidden layer. A space denotes the blank character of the output layer of the long short-term memory neural network.
  • A loss function is then constructed from the output of the output layer of the long short-term memory neural network using the time-series classification (CTC) formula.
  • The loss is E_loss = -ln p(z|x), with p(z|x) = Σ_u a(t, u) b(t, u), where p(z|x) is the probability of the label sequence z given the input x, a(t, u) represents the forward output of the Chinese image feature corresponding to the u-th sequential label at time t in the hidden layer of the long short-term memory neural network, and b(t, u) represents the backward output of the Chinese image feature corresponding to the u-th sequential label at the t-th time in the hidden layer.
  • The original handwriting recognition model is obtained by updating the network parameters of the long short-term memory neural network and the convolutional neural network using the partial derivative of E_loss.
  • The partial derivative is ∂E_loss/∂θ, where θ is a network parameter, specifically the weights and biases in the convolutional neural network and the long short-term memory neural network.
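  • As an aside, the forward-backward computation and the gradient of E_loss described above are what off-the-shelf CTC implementations provide; a minimal PyTorch sketch is shown below, where the framework choice, tensor shapes, and vocabulary size are assumptions.
```python
import torch
import torch.nn as nn

# Illustrative sizes only: T time steps, a batch of 1, and an assumed vocabulary
# of 3755 characters plus one blank ("space") class at index 0.
T, N, C = 20, 1, 3755 + 1
log_probs = torch.randn(T, N, C, requires_grad=True).log_softmax(2)  # CNN-LSTM output
targets = torch.randint(1, C, (N, 5), dtype=torch.long)  # sequential labels of one sample
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 5, dtype=torch.long)

ctc = nn.CTCLoss(blank=0)                                      # blank = the "space" output
loss = ctc(log_probs, targets, input_lengths, target_lengths)  # E_loss = -ln p(z|x)
loss.backward()   # partial derivatives of E_loss drive the weight and bias updates
```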
  • In step S14, all the training handwritten Chinese images in the test set are input into the original handwriting recognition model for testing, and the test accuracy rate is obtained (that is, the number of accurate predictions divided by the number of training handwritten Chinese images in the test set). It is then judged whether the test accuracy rate is greater than the preset accuracy rate.
  • If the test accuracy rate is greater than the preset accuracy rate, the original handwriting recognition model is considered sufficiently accurate and is used as the target handwriting recognition model; otherwise, if the test accuracy rate is not greater than the preset accuracy rate, the predictions of the original handwriting recognition model are judged not accurate enough, steps S11-S13 are repeated for training, and the model is tested again until the test accuracy rate reaches the preset accuracy rate, at which point training stops. This further improves the accuracy of the target handwriting recognition model.
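  • A short Python sketch of this accuracy test and retraining loop; predict and train_one_round are hypothetical helpers, not functions named in the patent.
```python
def test_accuracy(model, test_samples, predict):
    """Test accuracy = number of correct predictions / number of test samples."""
    correct = sum(1 for image, label in test_samples if predict(model, image) == label)
    return correct / len(test_samples)

# Hypothetical training loop: keep training until the preset accuracy is reached.
# while test_accuracy(model, test_set, predict) <= preset_accuracy:
#     model = train_one_round(model, train_set)
```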
  • In the Chinese model training method provided in this embodiment, the training handwritten Chinese images are first acquired and divided into a training set and a test set according to a preset ratio, and the training handwritten Chinese images in the training set are labeled sequentially so that they carry timing information.
  • The labeled training handwritten Chinese images are input into the convolutional neural network-long short-term memory neural network for training, so that the network trains on the handwritten Chinese images according to their context, and the time-series classification algorithm is used to update the network parameters of the convolutional neural network-long short-term memory neural network to obtain the original handwriting recognition model. This solves the time-series problem of the uncertain alignment between input features and output labels, realizes end-to-end output, and improves the generalization of the original handwriting recognition model.
  • the original handwriting recognition model is tested using the training handwritten Chinese images in the test set.
  • the test accuracy is greater than the preset accuracy rate, the target handwriting recognition model is obtained, which further improves the accuracy of the target handwriting recognition model.
  • a Chinese model training device is provided, and the Chinese model training device corresponds to the Chinese model training method in the above embodiment one-to-one.
  • The Chinese model training device includes a training handwritten Chinese image acquisition module 11, a training handwritten Chinese image division module 12, an original handwriting recognition model acquisition module 13, and a target handwriting recognition model acquisition module 14. Each functional module is described in detail as follows:
  • a training handwritten Chinese image acquisition module 11 is configured to acquire a training handwritten Chinese image.
  • the training handwritten Chinese image division module 12 is configured to divide the training handwritten Chinese image into a training set and a test set according to a preset ratio.
  • The original handwriting recognition model acquisition module 13 is configured to sequentially label the training handwritten Chinese images in the training set, input the labeled training handwritten Chinese images into the convolutional neural network-long short-term memory neural network for training, and update the network parameters of the convolutional neural network-long short-term memory neural network using the time-series classification algorithm to obtain the original handwriting recognition model.
  • the original handwriting recognition model acquisition module 13 includes a Chinese image feature acquisition unit 131, an activation state neuron acquisition unit 132, an output layer output acquisition unit 133, and a target recognition model acquisition unit 134.
  • the Chinese image feature acquiring unit 131 is configured to perform feature extraction on a trained handwritten Chinese image in a convolutional neural network to acquire Chinese image features.
  • the activation state neuron acquisition unit 132 is configured to process a Chinese image feature using a first activation function in a hidden layer of a long-term and short-term memory neural network to acquire a neuron carrying an activation state identifier.
  • the output layer output obtaining unit 133 is configured to process the neuron carrying the activation state identifier in the hidden layer of the long-term and short-term memory neural network to obtain the output of the output layer of the long-term and short-term memory neural network.
  • the target recognition model acquisition unit 134 is configured to update the network parameters of the convolutional neural network-long-term and short-term memory neural network by using a time-series classification algorithm according to the output of the long-term and short-term memory neural network output layer to obtain a target handwriting recognition model.
  • the target handwriting recognition model acquisition module 14 is used to test the original handwriting recognition model using the training handwritten Chinese images in the test set. When the test accuracy is greater than a preset accuracy rate, the target handwriting recognition model is obtained.
  • Here the loss function is E_loss = -ln p(z|x) with p(z|x) = Σ_u a(t, u) b(t, u), where a(t, u) represents the forward output of the Chinese image feature corresponding to the u-th order label at time t in the hidden layer of the long short-term memory neural network, and b(t, u) represents the corresponding backward output.
  • Each module in the aforementioned Chinese model training device may be implemented in whole or in part by software, hardware, and a combination thereof.
  • the above-mentioned modules may be embedded in the hardware in or independent of the processor in the computer device, or may be stored in the memory of the computer device in the form of software, so that the processor can call and execute the operations corresponding to the above modules.
  • a computer device is provided.
  • the computer device may be a server, and its internal structure diagram may be as shown in FIG. 10.
  • the computer device includes a processor, a memory, a network interface, and a database connected through a system bus.
  • the processor of the computer device is used to provide computing and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system, a computer program, and a database.
  • the internal memory provides an environment for running the operating system and computer programs in a non-volatile storage medium.
  • the database of the computer equipment is used to store data generated or obtained during the execution of the Chinese model training method, such as a target handwriting recognition model.
  • the network interface of the computer device is used to communicate with an external terminal through a network connection.
  • the computer program is executed by a processor to implement a Chinese model training method.
  • a computer device including a memory, a processor, and a computer program stored on the memory and executable on the processor.
  • When the processor executes the computer program, the following steps are implemented: acquiring training handwritten Chinese images; dividing the training handwritten Chinese images into a training set and a test set according to a preset ratio; sequentially labeling the training handwritten Chinese images in the training set, inputting the labeled training handwritten Chinese images into the convolutional neural network-long short-term memory neural network for training, and updating the network parameters of the convolutional neural network-long short-term memory neural network using the time-series classification algorithm to obtain the original handwriting recognition model; and testing the original handwriting recognition model using the training handwritten Chinese images in the test set, and obtaining the target handwriting recognition model when the test accuracy is greater than a preset accuracy rate.
  • When the processor executes the computer program, the following steps are further implemented: performing feature extraction on the training handwritten Chinese images in the convolutional neural network to obtain Chinese image features; processing the Chinese image features with the first activation function in the hidden layer of the long short-term memory neural network to obtain the neurons carrying the activation state identifier; processing the neurons carrying the activation state identifier with the second activation function to obtain the output of the output layer of the long short-term memory neural network; and, according to the output of the output layer of the long short-term memory neural network, updating the network parameters of the convolutional neural network-long short-term memory neural network with the time-series classification algorithm to obtain the target handwriting recognition model.
  • Here p(z|x) = Σ_u a(t, u) b(t, u), where a(t, u) represents the forward output of the Chinese image feature corresponding to the u-th order label at the t-th time in the hidden layer of the long short-term memory neural network, and b(t, u) represents the corresponding backward output.
  • In an embodiment, one or more non-volatile readable storage media storing computer-readable instructions are provided. When the computer-readable instructions are executed by one or more processors, the one or more processors perform the following steps: acquiring training handwritten Chinese images; dividing the training handwritten Chinese images into a training set and a test set according to a preset ratio; sequentially labeling the training handwritten Chinese images in the training set, inputting the labeled training handwritten Chinese images into the convolutional neural network-long short-term memory neural network for training, and updating the network parameters of the convolutional neural network-long short-term memory neural network using the time-series classification algorithm to obtain the original handwriting recognition model; and testing the original handwriting recognition model using the training handwritten Chinese images in the test set, and obtaining the target handwriting recognition model when the test accuracy is greater than a preset accuracy rate.
  • When executed by the one or more processors, the computer-readable instructions further implement the following steps: performing feature extraction on the training handwritten Chinese images in the convolutional neural network to obtain Chinese image features; processing the Chinese image features with the first activation function in the hidden layer of the long short-term memory neural network to obtain the neurons carrying the activation state identifier; processing the neurons carrying the activation state identifier with the second activation function in the hidden layer of the long short-term memory neural network to obtain the output of the output layer of the long short-term memory neural network; and, according to the output of the output layer of the long short-term memory neural network, updating the network parameters of the convolutional neural network-long short-term memory neural network with the time-series classification algorithm to obtain the target handwriting recognition model.
  • Here p(z|x) = Σ_u a(t, u) b(t, u), where a(t, u) represents the forward output of the Chinese image feature corresponding to the u-th order label at the t-th time in the hidden layer of the long short-term memory neural network, and b(t, u) represents the corresponding backward output.
  • a Chinese image recognition method is provided.
  • the method is applied to the server in FIG. 1 as an example, and includes the following steps:
  • S21 Acquire a Chinese image to be identified.
  • the Chinese image to be identified includes handwritten Chinese characters and background pictures.
  • the Chinese image to be identified is an unprocessed image containing handwritten Chinese characters collected by a collection module on a computer device.
  • the Chinese image to be recognized includes handwritten Chinese characters and background pictures.
  • the background picture is a noise picture other than handwritten Chinese characters in the Chinese image to be identified. Noise pictures are pictures that interfere with handwritten Chinese characters.
  • the user can collect the Chinese image to be recognized containing handwritten Chinese characters and upload it to the server through the acquisition module on the computer device, so that the server acquires the Chinese image to be recognized.
  • the acquisition module includes but is not limited to camera shooting and local upload.
  • the original image is an image obtained by pre-processing the Chinese image to be identified and excluding interference factors.
  • Because the Chinese image to be identified may contain multiple interference factors, such as numerous colors, that are not conducive to subsequent identification, the Chinese image to be identified needs to be pre-processed to obtain the original image with interference factors excluded.
  • the original image can be understood as the image obtained after the background image is excluded from the Chinese image to be identified.
  • step S22 the Chinese image to be recognized is pre-processed to obtain the original image, which specifically includes the following steps:
  • S221 Enlarge and grayscale the Chinese image to be recognized to obtain a grayscale image.
  • the grayscale image is a grayscale image obtained after the Chinese image to be recognized is enlarged and grayscale processed.
  • the grayed image includes a matrix of pixel values.
  • the pixel value matrix refers to a matrix containing pixel values corresponding to each pixel in a Chinese image to be identified.
  • the server uses the imread function to read the pixel value of each pixel in the Chinese image to be identified, and performs enlargement and grayscale processing on the Chinese image to be identified to obtain a grayscale image.
  • the imread function is a function in computer language for reading pixel values in an image file.
  • the pixel value is a value assigned by the computer when the original image is digitized.
  • The Chinese image to be identified may contain multiple colors, and color itself is very susceptible to factors such as lighting; the colors of similar objects vary widely, so color alone can rarely provide key information. Therefore the Chinese image to be identified is grayscaled to eliminate interference and reduce the complexity of the image and the amount of information to be processed. However, if the handwritten Chinese characters in the Chinese image to be recognized are small, directly applying grayscale processing would make the strokes of the handwritten Chinese characters too thin, and they could be excluded as interference.
  • Therefore, the server first enlarges the image according to the rule x → x_r, where x represents an element in the pixel value matrix M, r is the number of times the element is enlarged, and the changed element x_r replaces x in the pixel value matrix M.
  • the graying process is a process for rendering the Chinese image to be recognized to have a clear black and white effect.
  • performing grayscale processing on the enlarged image includes: the color of each pixel in the Chinese image to be identified is determined by three components of R (red), G (green), and B (blue), and Each component has 256 values from 0 to 255 (0 is the darkest, and 255 is the brightest, white).
  • the grayscale image is a special color image with the same three components of R, G, and B.
  • the server can directly use the imread function to read the Chinese image to be identified, and the specific values of the three components of R, G, and B corresponding to each pixel in the grayscale image can be obtained.
  • the standardization process refers to a process of performing a standard transformation process on a grayscale image to transform it into a fixed standard form. Specifically, because the pixel values of each pixel in the grayscale image are scattered, the magnitude of the data is not uniform, which will affect the accuracy of subsequent model recognition. Therefore, the grayscale image needs to be standardized to uniformize the magnitude of the data. .
  • the server standardizes the grayscale image by using a formula for normalization processing to avoid the problem that the pixel values in the grayscale image are scattered and the order of data is not uniform.
  • The standardization formula is X' = (X - M_min) / (M_max - M_min), where X is a pixel value of the grayscale image M, X' is the corresponding pixel value of the original image, M_min is the smallest pixel value in the grayscale image M, and M_max is the largest pixel value in the grayscale image M.
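  • A one-function NumPy sketch of this standardization formula (assuming the grayscale image is not constant, so the denominator is non-zero):
```python
import numpy as np

def standardize(gray):
    """X' = (X - M_min) / (M_max - M_min): rescale the pixel value matrix of the
    grayscale image M into a uniform [0, 1] range."""
    gray = gray.astype(np.float64)
    return (gray - gray.min()) / (gray.max() - gray.min())
```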
  • S23 Use the kernel density estimation algorithm to process the original image, remove the background image, and obtain a target image including handwritten Chinese characters.
  • the kernel density estimation algorithm is a non-parametric method that studies the data distribution characteristics from the data sample itself to estimate the probability density function.
  • the target image refers to an image that contains only handwritten Chinese characters by processing the original image using a kernel density estimation algorithm.
  • the server uses a kernel density estimation algorithm to process the original image to eliminate background image interference and obtain a target image including handwritten Chinese characters.
  • The kernel density estimate is f(x) = (1 / (n·h)) · Σ_{i=1}^{n} K((x - x_i) / h), where K(·) is the kernel function, h is the pixel value range (bandwidth), x is the pixel value of the pixel whose probability density is to be estimated, x_i is the i-th pixel value within the range h, and n is the number of pixel values within the range h.
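  • A plain NumPy sketch of this kernel density estimate with the Gaussian kernel of step S232 below; the bandwidth value in the usage comment is an assumption.
```python
import numpy as np

def gaussian_kernel(u):
    """K(u) = exp(-u^2 / 2) / sqrt(2 * pi), the Gaussian kernel used in step S232."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def kde(x, samples, h):
    """f(x) = (1 / (n * h)) * sum_i K((x - x_i) / h) over the n pixel values x_i."""
    samples = np.asarray(samples, dtype=np.float64)
    return gaussian_kernel((x - samples) / h).sum() / (len(samples) * h)

# density = [kde(v, original_image.ravel(), h=5.0) for v in range(256)]  # h is assumed
```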
  • step S23 the original image is processed by using a kernel density estimation algorithm to remove the background image to obtain a target image including handwritten Chinese characters, which specifically includes the following steps:
  • S231 Perform statistics on pixel values in the original image to obtain a histogram of the original image.
  • the original image histogram is a histogram obtained by statistically calculating pixel values in the original image.
  • Histogram is a kind of statistical report diagram that represents the distribution of data by a series of vertical stripes or line segments of varying heights.
  • the horizontal axis of the histogram of the original image represents pixel values
  • the vertical axis represents the appearance frequency corresponding to the pixel values.
  • the server obtains the histogram of the original image by counting the pixel values in the original image, so that it can intuitively see the distribution of the pixel values in the original image, and provides technical support for subsequent Gaussian kernel density estimation algorithms.
  • the original image histogram is processed by using a Gaussian kernel density estimation algorithm to obtain at least one frequency maximum and at least one frequency minimum corresponding to the original image histogram.
  • the Gaussian kernel density estimation algorithm refers to a kernel density estimation method in which the kernel function is a Gaussian kernel function.
  • The formula of the Gaussian kernel function is K(x) = (1 / √(2π)) · e^(-x²/2), where K(x) is the Gaussian kernel with the pixel value (independent variable) x as input, x refers to a pixel value in the effective image, and e and π are constants.
  • Frequency maxima refer to the maxima at different frequency intervals in the frequency distribution histogram.
  • the frequency minimum value refers to the minimum value corresponding to the frequency maximum value in the same frequency interval in the frequency distribution histogram.
  • The Gaussian kernel density estimation method performs Gaussian smoothing on the frequency distribution histogram corresponding to the original image to obtain a Gaussian smooth curve. Based on the frequency maxima and frequency minima on the Gaussian smooth curve, the pixel values on the horizontal axis corresponding to the frequency maxima and frequency minima are obtained, so that the original image can subsequently be hierarchically segmented according to these pixel values to obtain layered images.
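  • A sketch of this smoothing and extrema search using SciPy's gaussian_filter1d and find_peaks as stand-ins for the operations the patent describes; the sigma value and bin count are assumptions.
```python
import numpy as np
from scipy.ndimage import gaussian_filter1d
from scipy.signal import find_peaks

def frequency_extrema(hist, sigma=3.0):
    """Gaussian-smooth the 256-bin histogram of the original image and return the
    pixel values at the frequency maxima and the frequency minima."""
    smoothed = gaussian_filter1d(np.asarray(hist, dtype=float), sigma=sigma)
    maxima, _ = find_peaks(smoothed)      # peaks of the smoothed curve
    minima, _ = find_peaks(-smoothed)     # valleys are peaks of the negated curve
    return maxima, minima

# hist, _ = np.histogram(original_image.ravel(), bins=256, range=(0, 256))
# maxima, minima = frequency_extrema(hist)
```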
  • S233 Perform hierarchical segmentation processing on the original image based on the frequency maximum and frequency minimum to obtain a layered image.
  • the layered image is an image obtained by performing layered segmentation processing on the original image based on the maximum and minimum values.
  • The server first obtains the pixel values corresponding to the frequency maxima and frequency minima and processes the original image accordingly: the pixel values of the original image are divided into as many classes as there are frequency maxima; the pixel values corresponding to the frequency minima are then used as the boundary values between the classes, and the original image is layered according to the classes and the boundaries between them to obtain the layered images.
  • the pixel values corresponding to the frequency maximum in the original image are 11, 53, 95, 116, and 158, and the pixel values corresponding to the minimum frequency are 21, 63, 105, and 135, respectively.
  • Based on the number of frequency maxima in the original image, it can be determined that the pixel values of the original image fall into 5 classes, so the original image can be divided into 5 layers, with the pixel values corresponding to the frequency minima used as the boundary values; the minimum pixel value is 0 and the maximum pixel value is 255.
  • Accordingly, the layer around the pixel value 11 covers the pixel value range [0, 21); the layer around 53 covers [21, 63); the layer around 95 covers [63, 105); the layer around 116 covers [105, 135); and the layer around 158 covers [135, 255].
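  • A NumPy sketch of this layered segmentation, using the frequency-minimum pixel values as class boundaries; filling pixels outside a layer with 0 is an assumption about the representation.
```python
import numpy as np

def layer_split(original, minima):
    """Split the original image into layers using the pixel values at the frequency
    minima as class boundaries, e.g. minima [21, 63, 105, 135] give the layers
    [0, 21), [21, 63), [63, 105), [105, 135) and [135, 255]."""
    bounds = [0] + list(minima) + [256]
    layers = []
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        mask = (original >= lo) & (original < hi)    # pixels belonging to this class
        layers.append(np.where(mask, original, 0))   # keep this layer, zero elsewhere
    return layers
```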
  • S234 Obtain a target image including handwritten Chinese characters based on the layered image.
  • After obtaining the layered images, the server performs binarization, erosion, and superposition processing on them to obtain a target image including handwritten Chinese characters.
  • the binarization process refers to a process in which the pixel value of a pixel on a layered image is set to 0 (black) or 1 (white), and the entire layered image presents an obvious black and white effect.
  • the binarized layered image is corroded to remove the background image part and retain the handwritten Chinese characters on the layered image. Because the pixel values on each layered image are pixel values belonging to different ranges, after the layered image is corroded, each layered image needs to be superimposed to generate a target image containing only handwritten Chinese characters.
  • the superimposing process refers to a process of superimposing a layered image with only a handwritten portion into an image, thereby achieving the purpose of obtaining a target image containing only handwritten Chinese characters.
  • the layered image is superimposed using the imadd function to obtain a target image containing only handwritten Chinese characters.
  • the imadd function is a function in computer language for superimposing layered images.
  • step S234 that is, based on the layered image, obtaining a target image including handwritten Chinese characters, specifically includes the following steps:
  • A binarized image is an image obtained by binarizing a layered image. Specifically, after the server obtains a layered image, it compares each sampled pixel value of the layered image with a preselected threshold, sets pixel values greater than or equal to the threshold to 1, and sets pixel values less than the threshold to 0.
  • the sampled pixel value is the pixel value corresponding to each pixel point in the layered image.
  • The size of the threshold affects the result of binarizing the layered image: when the threshold is chosen properly, the binarization of the layered image works well; when the threshold is chosen poorly, the binarization result suffers.
  • the threshold in this embodiment is determined by the developer based on experience. Binarize the layered image to facilitate subsequent corrosion treatment.
  • S2342 Detect pixels in the binarized image to obtain a connected area corresponding to the binarized image.
  • the connected area refers to an area surrounded by adjacent pixels around a specific pixel.
  • For example, if a particular pixel is 1 and the neighboring pixels around it are also 1, the area enclosed by these neighboring pixels is taken as a connected area.
  • the binarized image corresponds to a pixel matrix, which includes rows and columns.
  • Detecting pixels in a binarized image specifically includes the following processes: (1) Scan the pixel matrix line by line, group consecutive white pixels in each line into a sequence called a cluster, and note its starting point, End point and line number.
  • the etching process is an operation for removing the content of a part of an image in morphology.
  • the built-in imerode function is used to etch the connected areas of the binary image.
  • Eroding the connected region corresponding to the binarized image includes the following steps: first, an n × n structural element is selected. In this embodiment, the values of the 8 elements adjacent to each element in the pixel matrix are treated as the connected region of that element, so the selected structural element is a 3 × 3 pixel matrix.
  • the structural element is an n ⁇ n pixel matrix, where the matrix elements include 0 or 1.
  • The binarized image is filtered based on the preset anti-corrosion capability range of the handwritten region: the parts of the binarized image whose anti-corrosion capability is not within the range are deleted, and the parts whose anti-corrosion capability falls within the range are retained.
  • the target pixel image containing only handwritten Chinese characters can be obtained by superimposing the pixel matrix corresponding to each binarized image portion that fits the range of the corrosion resistance of the handwritten area.
  • The anti-corrosion ability of the handwritten area can be calculated with the formula p = s1 / s2, where s1 represents the total area after erosion in the binarized image, s2 represents the total area before erosion in the binarized image, and p is the corrosion resistance of the handwritten area.
  • The preset anti-corrosion range of the handwriting area is [0.01, 0.5]; according to the formula, the ratio p between the total area of each binarized image after erosion and its total area before erosion is calculated.
  • If the ratio p of the total area after erosion to the total area before erosion in a binarized image is not within the anti-corrosion capability range of the handwritten area, the binarized image of that region is a background image rather than handwriting and needs to be eroded away to remove the background image.
  • the ratio p of the total area after erosion to the total area before erosion in the binarized image is in the range of [0.01, 0.5], it means that the binarized image of the region is a handwritten Chinese character and needs to be retained.
  • the pixel matrix corresponding to the retained binary image is superimposed to obtain a target image containing handwritten Chinese characters.
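  • A sketch of this erosion-and-retention step assuming OpenCV (cv2) and 0/1 uint8 layer images; the kernel size follows the 3 × 3 structural element mentioned above, while the helper name is illustrative.
```python
import cv2
import numpy as np

def keep_handwriting(binary_layers, p_range=(0.01, 0.5)):
    """Erode each binarized layer with a 3 x 3 structural element and keep only the
    layers whose corrosion-resistance ratio p = s1 / s2 (area after erosion divided
    by area before erosion) lies in the preset range; superimpose the kept layers."""
    kernel = np.ones((3, 3), np.uint8)
    target = None
    for layer in binary_layers:                      # layers hold 0/1 uint8 values
        eroded = cv2.erode(layer, kernel)
        s2 = float(np.count_nonzero(layer)) or 1.0   # total area before erosion
        s1 = float(np.count_nonzero(eroded))         # total area after erosion
        p = s1 / s2
        if p_range[0] <= p <= p_range[1]:            # handwriting falls in this range
            target = layer if target is None else cv2.add(target, layer)
    return target
```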
  • In this way, each layered image is binarized to obtain a binarized image, and then the pixels in the binarized image are detected and labeled to obtain the connected areas corresponding to the binarized image.
  • During erosion, elements of the covered pixel matrix that do not completely match the structural element all become 0; the parts of the binarized image with element 0 are black, and the black part is the eroded part of the binarized image.
  • The ratio of the total area of the binarized image after erosion to its total area before erosion is then calculated, and it is judged whether the ratio is within the preset anti-corrosion range of the handwriting area, so as to remove the background image in each layered image and retain the handwritten Chinese characters; finally, the layered images are superimposed to obtain the target image.
  • S24 Use text positioning technology to perform text positioning on the target image to obtain the text area to be recognized.
  • the text region to be recognized refers to a region in the target image that contains only text. Since the target image also includes a non-Chinese character area, that is, an eroded part of the target image, in order to make the recognition result more accurate and save the recognition time of the model, it is necessary to perform text positioning on the target image.
  • Text positioning technology includes, but is not limited to, text positioning using OCR technology and ctpn network (Connectionist Text Proposal Network, text detection network). Among them, the ctpn network is a commonly used network for image text detection.
  • OCR stands for Optical Character Recognition.
  • Specifically, the proximity search method is first applied to the connected areas obtained in step S2342: one connected area is randomly selected as the starting connected area, and the area distance between each remaining connected area (the connected areas other than the starting one) and the starting connected area is calculated.
  • The connected area whose area distance is less than a preset threshold is selected as the target connected area in order to determine the direction of the expansion operation (i.e., up, down, left, or right).
  • the preset threshold is a preset threshold used to determine a distance between two connected regions.
  • Proximity search method refers to starting from a starting connected area, which can find the horizontal circumscribed rectangle of the starting connected area, and expand the connected area to the entire rectangle.
  • The expansion operation is performed on this rectangle, and the expansion direction is the direction of the nearest neighboring connected area.
  • the expansion operation is performed only when the expansion direction is horizontal.
  • the area distance is the distance between the neighboring boundaries of the two connected regions, where S is the starting connected region, S' is a remaining connected region, and (x_c, y_c) is the difference between the center vectors of the two connected regions; because the distance between the two connected regions is measured between their neighboring boundaries, the region extent needs to be subtracted.
  • the region extent is (x_c', y_c'), where (w', z') is the coordinate of the lower-right corner of the remaining connected area, (x', y') is the coordinate of the upper-left corner of the remaining connected area, (w, z) is the coordinate of the lower-right corner of the starting connected region, and (x, y) is the coordinate of the upper-left corner of the starting connected region; in this embodiment, this point is used as the origin coordinate. One plausible reading of this computation is sketched below.
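The exact area-distance formula is not reproduced in the text above, so the following hedged sketch follows one plausible reading: the gap between the nearest boundaries of the two circumscribed rectangles, i.e. the centre-to-centre offset minus the half-extents of both rectangles. The function name and tuple layout are illustrative assumptions.

    import math

    def area_distance(s_box, s2_box):
        """s_box = (x, y, w, z): upper-left and lower-right corners of S.
        s2_box = (x2, y2, w2, z2): upper-left and lower-right corners of S'."""
        x, y, w, z = s_box
        x2, y2, w2, z2 = s2_box
        # centre vector difference (x_c, y_c)
        xc = (x2 + w2) / 2 - (x + w) / 2
        yc = (y2 + z2) / 2 - (y + z) / 2
        # region half-extents to subtract (x_c', y_c')
        xe = (w2 - x2) / 2 + (w - x) / 2
        ye = (z2 - y2) / 2 + (z - y) / 2
        # distance between neighbouring boundaries, clamped at zero when the boxes overlap
        dx = max(abs(xc) - xe, 0.0)
        dy = max(abs(yc) - ye, 0.0)
        return math.hypot(dx, dy)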
  • the dilation operation is the dual of the erosion operation and is a morphological process for expanding an image.
  • the built-in imdilate function may be used to dilate the connected areas of the binary image.
  • the process of dilating the initial connected region includes the following steps: an n×n structuring element is selected; in this embodiment, the values of the 8 elements adjacent to each element in the pixel matrix are taken as the neighborhood of that element, so the selected structuring element is a 3×3 pixel matrix.
  • the structure element is an n ⁇ n pixel matrix, where the matrix elements include 0 or 1.
  • the connected area is scanned in the direction of the target connected area, and a logical AND operation is performed between the structuring element and the pixels of the connected area covered by it in that direction.
  • if the results of the AND operation are all 0, the covered pixels remain unchanged; if they are not all 0, the pixels covered by the structuring element are set to 1, and the part that becomes 1 is the dilated part of the initial connected region, as in the sketch below.
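A minimal dilation sketch follows, assuming OpenCV; the directional scanning toward the target connected area is omitted and the whole region is dilated with a 3x3 structuring element, which mirrors the imdilate-style behaviour referred to above.

    import numpy as np
    import cv2

    def dilate_region(binary_image):
        kernel = np.ones((3, 3), np.uint8)               # 3x3 structuring element
        # covered pixels whose neighbourhood is not all zero become 1
        return cv2.dilate(binary_image.astype(np.uint8), kernel, iterations=1)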
  • S25 Input the text area to be recognized into the target handwriting recognition model for recognition, and obtain handwritten Chinese characters corresponding to each text area to be recognized.
  • the target handwriting recognition model is obtained by using a Chinese model training method.
  • the server inputs the text areas to be recognized into the target handwriting recognition model for recognition, so that the target handwriting recognition model can draw on the surrounding context during recognition, obtain the handwritten Chinese characters corresponding to each text area to be recognized, and improve recognition accuracy.
  • the user can collect a Chinese image to be recognized that contains handwritten Chinese characters through the acquisition module on a computer device and upload it to the server, so that the server obtains the Chinese image to be recognized. The server then preprocesses the Chinese image to be recognized to obtain an original image from which interference factors are excluded. A kernel density estimation algorithm is used to process the original image and remove the background picture, yielding a target image that contains only handwritten Chinese characters and further eliminating interference. Text positioning technology is used to locate the text in the target image and obtain the text areas to be recognized, eliminating interference from non-Chinese-character regions. Finally, the server inputs the text areas to be recognized into the target handwriting recognition model, which draws on the surrounding context during recognition, obtains the handwritten Chinese characters corresponding to each text area, and improves recognition accuracy.
  • a Chinese image recognition device is provided, and the Chinese image recognition device corresponds to the Chinese image recognition method in the embodiment described above in a one-to-one manner.
  • the Chinese image recognition device includes a to-be-recognized Chinese image acquisition module 21, an original image acquisition module 22, a target image acquisition module 23, a to-be-recognized text region acquisition module 24 and a handwritten Chinese character acquisition module 25.
  • the detailed description of each function module is as follows:
  • the to-be-recognized Chinese image acquisition module 21 is configured to obtain the to-be-recognized Chinese image, and the to-be-recognized Chinese image includes handwritten Chinese characters and background pictures.
  • the original image acquisition module 22 is configured to preprocess the Chinese image to be recognized to obtain an original image.
  • a target image acquisition module 23 is configured to process the original image by using a kernel density estimation algorithm, remove the background picture, and obtain a target image including handwritten Chinese characters.
  • the text region to be recognized acquisition module 24 is configured to perform text positioning on the target image by using text positioning technology to acquire the text region to be recognized.
  • a handwritten Chinese character acquisition module 25 is configured to input a text area to be recognized into a target handwriting recognition model for recognition, and obtain a handwritten Chinese character corresponding to each text area to be recognized.
  • the target handwriting recognition model is obtained by using the Chinese model training method in the foregoing embodiment.
  • the original image acquisition module 22 includes a grayscale image acquisition unit 221 and an original image acquisition unit 222.
  • a grayscale image acquisition unit 221 is configured to perform enlargement and graying processing on the Chinese image to be recognized to obtain a grayscale image.
  • the original image obtaining unit 222 is configured to perform normalization processing on the grayscale image to obtain the original image.
  • the formula of the normalization processing is X' = (X - M_min) / (M_max - M_min), where X is a pixel value of the grayscale image M, X' is the corresponding pixel value of the original image, M_min is the smallest pixel value in the grayscale image M, and M_max is the largest pixel value in the grayscale image M; a sketch of this preprocessing is given below.
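As an illustration of units 221 and 222, the following sketch assumes OpenCV for the enlargement and graying; the scale factor and function name are assumptions, and the last line applies the min-max normalization X' = (X - M_min) / (M_max - M_min).

    import cv2
    import numpy as np

    def preprocess(image_bgr, scale=2.0):
        enlarged = cv2.resize(image_bgr, None, fx=scale, fy=scale,
                              interpolation=cv2.INTER_LINEAR)   # enlargement
        grey = cv2.cvtColor(enlarged, cv2.COLOR_BGR2GRAY)       # graying -> grayscale image M
        m_min, m_max = float(grey.min()), float(grey.max())
        # X' = (X - M_min) / (M_max - M_min)
        return (grey.astype(np.float32) - m_min) / (m_max - m_min + 1e-8)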
  • the target image acquisition module 23 includes an original image histogram acquisition unit 231, a frequency extreme value acquisition unit 232, a layered image acquisition unit 233, and a target image acquisition unit 234.
  • the original image histogram obtaining unit 231 is configured to perform statistics on pixel values in the original image to obtain a histogram of the original image.
  • a frequency extreme value acquisition unit 232 is configured to process the histogram of the original image by using a Gaussian kernel density estimation algorithm, and obtain at least one frequency maximum and at least one frequency minimum corresponding to the histogram of the original image.
  • a layered image acquisition unit 233 is configured to perform layered segmentation processing on the original image based on the frequency maximum and frequency minimum to obtain a layered image.
  • the target image acquisition unit 234 is configured to acquire a target image including a handwritten Chinese character based on the layered image.
  • the target image acquisition unit 234 includes a binarized image acquisition subunit 2341, a connected region acquisition subunit 2342, and a target image acquisition subunit 2343.
  • a binarized image acquisition subunit 2341 is configured to perform binarization processing on the layered image to obtain a binarized image.
  • the connected region acquisition subunit 2342 is configured to detect pixels in the binarized image and obtain a connected region corresponding to the binarized image.
  • a target image acquisition subunit 2343 is configured to perform erosion and superposition processing on the connected areas corresponding to the binary image, and acquire a target image including handwritten Chinese characters.
  • Each module in the above-mentioned Chinese image recognition device may be implemented in whole or in part by software, hardware, and a combination thereof.
  • the above modules may be embedded, in the form of hardware, in or independent of the processor in the computer device, or may be stored in the memory of the computer device in the form of software, so that the processor can invoke and execute the operations corresponding to the above modules.
  • a computer device is provided.
  • the computer device may be a server, and its internal structure diagram may be as shown in FIG. 10.
  • the computer device includes a processor, a memory, a network interface, and a database connected through a system bus.
  • the processor of the computer device is used to provide computing and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system, a computer program, and a database.
  • the internal memory provides an environment for running the operating system and computer programs in a non-volatile storage medium.
  • the database of the computer device is used to store data generated or obtained during execution of the Chinese model training method or the Chinese image recognition method, such as the target handwriting recognition model or the handwritten Chinese characters.
  • the network interface of the computer device is used to communicate with an external terminal through a network connection.
  • the computer program is executed by a processor to implement a Chinese image recognition method.
  • a computer device including a memory, a processor, and a computer program stored on the memory and executable on the processor.
  • when the processor executes the computer program, the following steps are performed: acquiring a Chinese image to be recognized, the Chinese image to be recognized including handwritten Chinese characters and a background picture; preprocessing the Chinese image to be recognized to obtain an original image; processing the original image by using a kernel density estimation algorithm to remove the background picture and obtain a target image including the handwritten Chinese characters; performing text positioning on the target image by using text positioning technology to obtain text areas to be recognized; and inputting the text areas to be recognized into a target handwriting recognition model for recognition to obtain the handwritten Chinese characters corresponding to each text area to be recognized, wherein the target handwriting recognition model is obtained by using the Chinese model training method.
  • when the processor executes the computer program, the following steps are further implemented: counting the pixel values in the original image to obtain an original image histogram; processing the original image histogram by using a Gaussian kernel density estimation method to obtain at least one frequency maximum and at least one frequency minimum corresponding to the original image histogram; performing layered segmentation processing on the original image based on the frequency maximum and the frequency minimum to obtain layered images; and obtaining, based on the layered images, a target image including handwritten Chinese characters, as sketched below.
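The histogram layering steps can be sketched as follows, assuming SciPy; the number of evaluation points, the use of argrelextrema to locate the frequency maxima and minima, and the function name are illustrative choices, not details from this disclosure (for large images the pixel values may need to be subsampled first).

    import numpy as np
    from scipy.stats import gaussian_kde
    from scipy.signal import argrelextrema

    def layer_by_kde(original_image):
        pixels = original_image.ravel()
        density = gaussian_kde(pixels)                   # Gaussian kernel density estimate
        xs = np.linspace(pixels.min(), pixels.max(), 256)
        ys = density(xs)
        maxima = xs[argrelextrema(ys, np.greater)[0]]    # frequency maxima
        minima = xs[argrelextrema(ys, np.less)[0]]       # frequency minima -> split points
        bounds = [pixels.min()] + list(minima) + [pixels.max()]
        layers = [((original_image >= lo) & (original_image <= hi)).astype(np.uint8)
                  for lo, hi in zip(bounds[:-1], bounds[1:])]
        return maxima, minima, layers                    # layered images for further processing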
  • when the processor executes the computer program, the following steps are further implemented: binarizing the layered image to obtain a binarized image; detecting and labeling the pixels in the binarized image to obtain the connected area corresponding to the binarized image; and eroding and superimposing the connected areas corresponding to the binarized images to obtain a target image including handwritten Chinese characters.
  • one or more non-volatile readable storage media storing computer-readable instructions are provided, and when the computer-readable instructions are executed by one or more processors, the one or more The processors perform the following steps: obtaining a Chinese image to be recognized, which includes handwritten Chinese characters and background pictures; preprocessing the Chinese image to be recognized to obtain the original image; processing the original image using a kernel density estimation algorithm to remove the background image To obtain a target image including handwritten Chinese characters; use text positioning technology to perform text positioning on the target image to obtain the text area to be recognized; input the text area to be recognized into the target handwriting recognition model for recognition, and obtain the correspondence of each text area to be recognized Handwritten Chinese characters; of which, the target handwriting recognition model is obtained using Chinese model training methods.
  • when the computer-readable instructions are executed by the one or more processors, the following steps are further implemented: performing statistics on the pixel values in the original image to obtain an original image histogram; processing the original image histogram by using a Gaussian kernel density estimation method to obtain at least one frequency maximum and at least one frequency minimum corresponding to the original image histogram; performing layered segmentation processing on the original image based on the frequency maximum and the frequency minimum to obtain layered images; and obtaining, based on the layered images, a target image including handwritten Chinese characters.
  • when the computer-readable instructions are executed by the one or more processors, the following steps are further implemented: binarizing the layered image to obtain a binarized image; detecting and labeling the pixels in the binarized image to obtain the connected area corresponding to the binarized image; and eroding and superimposing the connected areas corresponding to the binarized images to obtain a target image including handwritten Chinese characters.
  • Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory can include random access memory (RAM) or external cache memory.
  • RAM is available in various forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), dual data rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), synchronous chain (Synchlink) DRAM (SLDRAM), memory bus (Rambus) direct RAM (RDRAM), direct memory bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM).


Abstract

Disclosed are a Chinese model training method, a Chinese image recognition method, a device, an apparatus and a medium. The Chinese model training method comprises: obtaining training handwritten Chinese images (S11); dividing the training handwritten Chinese images into a training set and a test set according to a preset ratio (S12); labeling the training handwritten Chinese images in the training set in sequence, inputting the labeled training handwritten Chinese images into a convolutional neural network-long short-term memory neural network for training, and updating the network parameters of the convolutional neural network-long short-term memory neural network by using a connectionist temporal classification algorithm to obtain an original handwriting recognition model (S13); and testing the original handwriting recognition model by using the training handwritten Chinese images in the test set, and obtaining a target handwriting recognition model when the test accuracy is higher than a preset accuracy (S14). The Chinese model training method offers high training efficiency and high recognition accuracy.

Description

中文模型训练、中文图像识别方法、装置、设备及介质Chinese model training, Chinese image recognition method, device, equipment and medium
本专利申请以2018年6月4日提交的申请号为201810563508.0,名称为“中文模型训练、中文图像识别方法、装置、设备及介质”的中国发明专利申请为基础,并要求其优先权。This patent application is based on a Chinese invention patent application filed on June 4, 2018 with the application number 201810563508.0 and entitled "Chinese Model Training, Chinese Image Recognition Method, Device, Equipment, and Medium", and claims priority.
技术领域Technical field
本申请涉及图像识别领域,尤其涉及一种中文模型训练、中文图像识别方法、装置、设备及介质。The present application relates to the field of image recognition, and in particular, to a Chinese model training, a Chinese image recognition method, a device, a device, and a medium.
背景技术Background technique
随着信息时代的发展,人工智能技术作为核心技术越来越多的被用来解决人们生活中的具体问题。目前,在对手写汉字图像进行识别时,由于传统的卷积神经网络或者循环神经网络的输出是固定长度的,并不能满足端到端的手写字识别,需要预先对训练图片中的文字进行定位分割,获取单个字体图像,再对单个字体图像进行训练,训练效率低。With the development of the information age, artificial intelligence technology is increasingly used as a core technology to solve specific problems in people's lives. At present, when recognizing handwritten Chinese character images, because the output of traditional convolutional neural networks or recurrent neural networks is a fixed length, it cannot meet the end-to-end handwriting recognition, and the positioning and segmentation of the text in the training pictures are required in advance. , Obtain a single font image, and then train a single font image, the training efficiency is low.
发明内容Summary of the Invention
基于此,有必要针对上述技术问题,提供一种解决目前手写字识别模型的训练效率低的中文模型训练方法、装置、设备及介质。Based on this, it is necessary to provide a Chinese model training method, device, device, and medium that solve the current technical problems of handwriting recognition models with low training efficiency in response to the above technical problems.
一种中文模型训练方法,包括:A Chinese model training method includes:
获取训练手写中文图像;Obtain training handwritten Chinese images;
将所述训练手写中文图像按预设比例划分成训练集和测试集;Dividing the training handwritten Chinese image into a training set and a test set according to a preset ratio;
对所述训练集中的训练手写中文图像进行顺序标注,并将标注好的训练手写中文图像输入到卷积神经网络-长短时记忆神经网络中进行训练,采用时序分类算法对所述卷积神经网络-长短时记忆神经网络的网络参数进行更新,获取原始手写字识别模型;Sequentially label the trained handwritten Chinese images in the training set, and input the labeled trained handwritten Chinese images into a convolutional neural network-long and short-term memory neural network for training, and adopt a time-series classification algorithm to the convolutional neural network -The network parameters of the long-term and short-term memory neural network are updated to obtain the original handwriting recognition model;
采用所述测试集中的训练手写中文图像对所述原始手写字识别模型进行测试,在测试准确率大于预设准确率时,获取目标手写字识别模型。The original handwriting recognition model is tested using the trained handwritten Chinese images in the test set, and the target handwriting recognition model is obtained when the test accuracy is greater than a preset accuracy rate.
一种中文模型训练装置,包括:A Chinese model training device includes:
训练手写中文图像获取模块,用于获取训练手写中文图像;Training handwritten Chinese image acquisition module for acquiring training handwritten Chinese images;
训练手写中文图像划分模块,用于将所述训练手写中文图像按预设比例划分成训练集和测试集;A training handwritten Chinese image division module, configured to divide the trained handwritten Chinese image into a training set and a test set according to a preset ratio;
原始手写字识别模型获取模块,用于对所述训练集中的训练手写中文图像进行顺序标注,并将标注好的训练手写中文图像输入到卷积神经网络-长短时记忆神经网络中进行训练,采用时序分类算法对所述卷积神经网络-长短时记忆神经网络的网络参数进行更新,获取原始手写字识别模型;The original handwriting recognition model acquisition module is used to sequentially label the trained handwritten Chinese images in the training set, and input the labeled trained handwritten Chinese images into a convolutional neural network-long-term short-term memory neural network for training. The time series classification algorithm updates the network parameters of the convolutional neural network-long-term and short-term memory neural network to obtain the original handwriting recognition model;
目标手写字识别模型获取模块,用于采用所述测试集中的训练手写中文图像对所述原始手写字识别模型进行测试,在测试准确率大于预设准确率时,获取目标手写字识别模型。A target handwriting recognition model acquisition module is used to test the original handwriting recognition model using the trained handwritten Chinese images in the test set, and obtain a target handwriting recognition model when the test accuracy rate is greater than a preset accuracy rate.
一种计算机设备,包括存储器、处理器以及存储在所述存储器中并可在所述处理器上运行的计算机程序,所述处理器执行所述计算机程序时实现如下步骤:A computer device includes a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, the following steps are implemented:
获取训练手写中文图像;Obtain training handwritten Chinese images;
将所述训练手写中文图像按预设比例划分成训练集和测试集;Dividing the training handwritten Chinese image into a training set and a test set according to a preset ratio;
对所述训练集中的训练手写中文图像进行顺序标注,并将标注好的训练手写中文图像输入到卷积神经网络-长短时记忆神经网络中进行训练,采用时序分类算法对所述卷积神经网络-长短时记忆神经网络的网络参数进行更新,获取原始手写字识别模型;Sequentially label the trained handwritten Chinese images in the training set, and input the labeled trained handwritten Chinese images into a convolutional neural network-long and short-term memory neural network for training, and adopt a time-series classification algorithm to the convolutional neural network -The network parameters of the long-term and short-term memory neural network are updated to obtain the original handwriting recognition model;
采用所述测试集中的训练手写中文图像对所述原始手写字识别模型进行测试,在测试准确率大于预设准确率时,获取目标手写字识别模型。The original handwriting recognition model is tested using the trained handwritten Chinese images in the test set, and the target handwriting recognition model is obtained when the test accuracy is greater than a preset accuracy rate.
一种非易失性存储介质,所述非易失性存储介质存储有计算机程序,所述计算机程序被处理器执 行时实现如下步骤:A non-volatile storage medium stores a computer program. When the computer program is executed by a processor, the following steps are implemented:
获取训练手写中文图像;Obtain training handwritten Chinese images;
将所述训练手写中文图像按预设比例划分成训练集和测试集;Dividing the training handwritten Chinese image into a training set and a test set according to a preset ratio;
对所述训练集中的训练手写中文图像进行顺序标注,并将标注好的训练手写中文图像输入到卷积神经网络-长短时记忆神经网络中进行训练,采用时序分类算法对所述卷积神经网络-长短时记忆神经网络的网络参数进行更新,获取原始手写字识别模型;Sequentially label the trained handwritten Chinese images in the training set, and input the labeled trained handwritten Chinese images into a convolutional neural network-long and short-term memory neural network for training, and adopt a time-series classification algorithm to the convolutional neural network -The network parameters of the long-term and short-term memory neural network are updated to obtain the original handwriting recognition model;
采用所述测试集中的训练手写中文图像对所述原始手写字识别模型进行测试,在测试准确率大于预设准确率时,获取目标手写字识别模型。The original handwriting recognition model is tested using the trained handwritten Chinese images in the test set, and the target handwriting recognition model is obtained when the test accuracy is greater than a preset accuracy rate.
基于此,有必要针对上述技术问题,提供一种解决目前手写字识别不能端到端输出的中文图像识别方法、装置、设备及介质。Based on this, it is necessary to provide a method, a device, a device and a medium for recognizing Chinese images that cannot be output end-to-end by handwriting recognition in view of the above technical problems.
一种中文图像识别方法,包括:A Chinese image recognition method includes:
获取待识别中文图像,所述待识别中文图像包括手写汉字和背景图片;Obtaining a Chinese image to be identified, where the Chinese image to be identified includes handwritten Chinese characters and background pictures;
对所述待识别中文图像进行预处理,获取原始图像;Preprocessing the Chinese image to be identified to obtain an original image;
采用核密度估计算法对所述原始图像进行处理,去除所述背景图片,获取包括所述手写汉字的目标图像;Processing the original image using a kernel density estimation algorithm, removing the background picture, and obtaining a target image including the handwritten Chinese character;
采用文字定位技术对所述目标图像进行文字定位,获取待识别文字区域;Text positioning the target image using text positioning technology to obtain the text area to be recognized;
将待识别文字区域输入到目标手写字识别模型中进行识别,获取每一所述待识别文字区域对应的手写汉字;其中,目标手写字识别模型是采用所述中文模型训练方法获取的。The text area to be recognized is input into a target handwriting recognition model for recognition, and handwritten Chinese characters corresponding to each of the text area to be recognized are obtained; wherein the target handwriting recognition model is obtained by using the Chinese model training method.
一种中文图像识别装置,包括:A Chinese image recognition device includes:
待识别中文图像获取模块,用于获取待识别中文图像,所述待识别中文图像包括手写汉字和背景图片;A to-be-recognized Chinese image acquisition module, configured to obtain the to-be-recognized Chinese image, wherein the to-be-recognized Chinese image includes handwritten Chinese characters and a background picture;
原始图像获取模块,用于对所述待识别中文图像进行预处理,获取原始图像;An original image acquisition module, configured to pre-process the Chinese image to be identified to obtain an original image;
目标图像获取模块,用于采用核密度估计算法对所述原始图像进行处理,去除背景图片,获取包括所述手写汉字的目标图像;A target image acquisition module, configured to process the original image by using a kernel density estimation algorithm, remove a background picture, and obtain a target image including the handwritten Chinese character;
待识别文字区域获取模块,用于采用文字定位技术对所述目标图像进行文字定位,获取待识别文字区域;A to-be-recognized text area acquisition module, configured to use the text positioning technology to perform text positioning on the target image to obtain the to-be-recognized text area;
手写汉字获取模块,用于将待识别文字区域输入到目标手写字识别模型中进行识别,获取每一所述待识别文字区域对应的手写汉字;其中,目标手写字识别模型是采用所述中文模型训练方法获取的。A handwritten Chinese character acquisition module is configured to input a text area to be recognized into a target handwriting recognition model for recognition, and obtain a handwritten Chinese character corresponding to each of the text area to be recognized; wherein the target handwriting recognition model adopts the Chinese model Obtained by the training method.
一种计算机设备,包括存储器、处理器以及存储在所述存储器中并可在所述处理器上运行的计算机程序,所述处理器执行所述计算机程序时实现如下步骤:A computer device includes a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, the following steps are implemented:
获取待识别中文图像,所述待识别中文图像包括手写汉字和背景图片;Obtaining a Chinese image to be identified, where the Chinese image to be identified includes handwritten Chinese characters and background pictures;
对所述待识别中文图像进行预处理,获取原始图像;Preprocessing the Chinese image to be identified to obtain an original image;
采用核密度估计算法对所述原始图像进行处理,去除所述背景图片,获取包括所述手写汉字的目标图像;Processing the original image using a kernel density estimation algorithm, removing the background picture, and obtaining a target image including the handwritten Chinese character;
采用文字定位技术对所述目标图像进行文字定位,获取待识别文字区域;Text positioning the target image using text positioning technology to obtain the text area to be recognized;
将待识别文字区域输入到目标手写字识别模型中进行识别,获取每一所述待识别文字区域对应的手写汉字;其中,目标手写字识别模型是采用所述中文模型训练方法获取的。The text area to be recognized is input into a target handwriting recognition model for recognition, and handwritten Chinese characters corresponding to each of the text area to be recognized are obtained; wherein the target handwriting recognition model is obtained by using the Chinese model training method.
一个或多个存储有计算机可读指令的非易失性可读存储介质,所述计算机可读指令被一个或多个处理器执行时,使得所述一个或多个处理器执行如下步骤:One or more non-volatile readable storage media storing computer-readable instructions, which when executed by one or more processors, cause the one or more processors to perform the following steps:
获取待识别中文图像,所述待识别中文图像包括手写汉字和背景图片;Obtaining a Chinese image to be identified, where the Chinese image to be identified includes handwritten Chinese characters and background pictures;
对所述待识别中文图像进行预处理,获取原始图像;Preprocessing the Chinese image to be identified to obtain an original image;
采用核密度估计算法对所述原始图像进行处理,去除所述背景图片,获取包括所述手写汉字的目标图像;Processing the original image using a kernel density estimation algorithm, removing the background picture, and obtaining a target image including the handwritten Chinese character;
采用文字定位技术对所述目标图像进行文字定位,获取待识别文字区域;Text positioning the target image using text positioning technology to obtain the text area to be recognized;
将待识别文字区域输入到目标手写字识别模型中进行识别,获取每一所述待识别文字区域对应的 手写汉字;其中,目标手写字识别模型是采用所述中文模型训练方法获取的。The text area to be recognized is input to a target handwriting recognition model for recognition, and handwritten Chinese characters corresponding to each of the text area to be recognized are obtained; wherein the target handwriting recognition model is obtained by using the Chinese model training method.
本申请的一个或多个实施例的细节在下面的附图及描述中提出。本申请的其他特征和优点将从说明书、附图以及权利要求书变得明显。Details of one or more embodiments of the present application are set forth in the accompanying drawings and description below. Other features and advantages of the application will become apparent from the description, the drawings, and the claims.
附图说明BRIEF DESCRIPTION OF THE DRAWINGS
为了更清楚地说明本申请实施例的技术方案,下面将对本申请实施例的描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动性的前提下,还可以根据这些附图获得其他的附图。In order to explain the technical solutions of the embodiments of the present application more clearly, the drawings used in the description of the embodiments of the application will be briefly introduced below. Obviously, the drawings in the following description are just some embodiments of the application. For those of ordinary skill in the art, other drawings can be obtained based on these drawings without paying creative labor.
图1是本申请一实施例中中文模型训练方法或中文图像识别方法的一应用场景图;FIG. 1 is an application scenario diagram of a Chinese model training method or a Chinese image recognition method in an embodiment of the present application;
图2是本申请一实施例中中文模型训练方法的一流程图;2 is a flowchart of a Chinese model training method according to an embodiment of the present application;
图3是图2中步骤S13的一具体流程图;FIG. 3 is a specific flowchart of step S13 in FIG. 2;
图4是本申请一实施例中中文模型训练装置的一示意图;4 is a schematic diagram of a Chinese model training device according to an embodiment of the present application;
图5是本申请一实施例中中文图像识别方法的一流程图;5 is a flowchart of a Chinese image recognition method according to an embodiment of the present application;
图6是图5中步骤S22的一具体流程图;6 is a specific flowchart of step S22 in FIG. 5;
图7是图5中步骤S23的一具体流程图;FIG. 7 is a specific flowchart of step S23 in FIG. 5;
图8是图7中步骤S234的一具体流程图;8 is a specific flowchart of step S234 in FIG. 7;
图9是本申请一实施例中中文图像识别装置的一示意图;9 is a schematic diagram of a Chinese image recognition device according to an embodiment of the present application;
图10是本申请一实施例中计算机设备的一示意图。FIG. 10 is a schematic diagram of a computer device according to an embodiment of the present application.
具体实施方式Detailed ways
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有作出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。In the following, the technical solutions in the embodiments of the present application will be clearly and completely described with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are part of the embodiments of the present application, but not all of the embodiments. Based on the embodiments in this application, all other embodiments obtained by a person of ordinary skill in the art without creative efforts shall fall within the protection scope of this application.
本申请实施例提供的中文模型训练方法,可应用在如图1的应用环境中。该中文模型训练方法的应用环境包括服务器和计算机设备,其中,计算机设备通过网络与服务器进行通信,计算机设备是可与用户进行人机交互的设备,包括但不限于电脑、智能手机和平板等设备。本申请实施例提供的中文模型训练方法应用于服务器。The Chinese model training method provided in the embodiment of the present application can be applied in the application environment shown in FIG. 1. The application environment of the Chinese model training method includes a server and a computer device, wherein the computer device communicates with the server through a network, and the computer device is a device that can interact with the user, including, but not limited to, a computer, a smart phone, and a tablet device. . The Chinese model training method provided in the embodiment of the present application is applied to a server.
在一实施例中,如图2所示,提供一种中文模型训练方法,以该方法应用在图1中的服务器为例进行说明,包括如下步骤:In an embodiment, as shown in FIG. 2, a Chinese model training method is provided. The method is applied to the server in FIG. 1 as an example, and includes the following steps:
S11:获取训练手写中文图像。S11: Obtain training handwritten Chinese images.
其中,训练手写中文图像是预先从开源库中采集的用于进行模型训练的样本图像。该训练手写中文图像包括中文二级字库中每一中文对应的N(N为正整数)张手写字样本。中文二级字库是按汉字的部首笔划顺序编码的非常用汉字库。具体地,采集开源库中的不同人手写的N张手写字样本,以使服务器获取训练手写中文图像,由于不同用户的书写习惯不同,因此采用N张手写字样本(即训练手写中文图像)进行训练,极大的提高了模型的泛化性。The training handwritten Chinese image is a sample image collected from an open source library for model training in advance. The training handwritten Chinese image includes N (N is a positive integer) handwriting samples corresponding to each Chinese in the Chinese secondary word library. The Chinese secondary character library is a very useful Chinese character library that is coded in the order of radical strokes of Chinese characters. Specifically, N handwriting samples handwritten by different people in the open source library are collected to enable the server to obtain training handwritten Chinese images. Because different users have different writing habits, N handwriting samples (that is, training handwritten Chinese images) are used for Training greatly improves the generalization of the model.
S12:将训练手写中文图像按预设比例划分成训练集和测试集。S12: Divide the training handwritten Chinese image into a training set and a test set according to a preset ratio.
其中,训练集(training set)是学习样本数据集,是通过匹配一些参数来建立分类器,即采用训练集中的目标训练文本数据来训练机器学习模型,以确定机器学习模型的参数。测试集(test set)是用于测试训练好的机器学习模型的分辨能力,如准确率。预设比例是预先设置的用于对训练手写中文图像进行划分的比例。本实施例中,可按照9:1的比例对训练手写中文图像进行划分,即可将90%的训练手写中文图像作为训练集,剩余10%的训练手写中文图像作为测试集。Among them, the training set is a learning sample data set, which is to establish a classifier by matching some parameters, that is, training the machine learning model using the target training text data in the training set to determine the parameters of the machine learning model. A test set is used to test the discrimination capabilities of a trained machine learning model, such as accuracy. The preset ratio is a preset ratio for dividing the training handwritten Chinese image. In this embodiment, the training handwritten Chinese image can be divided according to a ratio of 9: 1, that is, 90% of the training handwritten Chinese image can be used as the training set, and the remaining 10% of the training handwritten Chinese image can be used as the test set.
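A minimal sketch of this 9:1 split, assuming the training handwritten Chinese images are available as a list of samples; the function name and the fixed random seed are illustrative assumptions.

    import random

    def split_dataset(samples, train_ratio=0.9, seed=42):
        shuffled = samples[:]
        random.Random(seed).shuffle(shuffled)            # shuffle before splitting
        cut = int(len(shuffled) * train_ratio)
        return shuffled[:cut], shuffled[cut:]            # training set, test set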
S13:对训练集中的训练手写中文图像进行顺序标注,并将标注好的训练手写中文图像输入到卷积神经网络-长短时记忆神经网络中进行训练,采用时序分类算法对卷积神经网络-长短时记忆神经网络的网络参数进行更新,获取原始手写字识别模型。S13: Annotate the training handwritten Chinese images in the training set in sequence, and input the labeled trained handwritten Chinese images into the convolutional neural network-long-term and short-term memory neural network for training, and use a time-series classification algorithm for the convolutional neural network-length The network parameters of the memory neural network are updated to obtain the original handwriting recognition model.
其中,原始手写字识别模型是经过长短时记忆神经网络多次迭代所得到的模型。长短时记忆神经(long-short term memory,简称LSTM)网络是一种时间递归神经网络,适合于处理和预测具有时间 序列,且时间序列间隔和延迟相对较长的重要事件。卷积神经网络(Convolutional Neural Network,CNN))是局部连接网络,相对于全连接网络其最大的特点就是局部连接性和权值共享性。对于一副图像中的某个像素p来说,离像素p越近的像素对其影响也就越大,即局部连接性越大。另外,根据自然图像的统计特性,某个区域的权值也可以用于另一个区域,即权值共享性。权值共享可以理解为卷积核共享,在卷积神经网络(CNN)中,将一个卷积核对给定的图像做卷积运算就可以提取一种中文图像特征,不同的卷积核可以提取不同的中文图像特征。由于卷积神经网络的局部连接性,使得模型的复杂度降低,提高模型训练的效率;并且,由于卷积神经网络的权值共享性,因此卷积神经网络可以并行学习,进一步提高模型训练效率。时序分类算法(Connectionist temporal classification,简称CTC),用于解决输入特征和输出标签之间对齐关系不确定的时间序列问题,是一种可以端到端同时优化模型参数和对齐切分的边界的算法。Among them, the original handwriting recognition model is a model obtained through multiple iterations of long-term and short-term memory neural networks. Long-short-term memory neural (LSTM) network is a kind of time recursive neural network, which is suitable for processing and predicting important events with time series, and the time series interval and delay are relatively long. Convolutional neural network (CNN) is a locally connected network. Compared with a fully connected network, its biggest feature is local connectivity and weight sharing. For a certain pixel p in an image, the closer the pixel p to the pixel p is, the more influence it has, that is, the greater the local connectivity. In addition, according to the statistical characteristics of natural images, the weight of a certain area can also be used for another area, that is, the weight sharing. Weight sharing can be understood as convolution kernel sharing. In a convolutional neural network (CNN), a convolution operation can be performed on a given image to extract a Chinese image feature. Different convolution kernels can be extracted. Different Chinese image features. Due to the local connectivity of the convolutional neural network, the complexity of the model is reduced, and the efficiency of model training is improved; and because the weights of the convolutional neural network are shared, the convolutional neural network can learn in parallel, further improving the efficiency of model training . Temporal classification algorithm (Connectionist Temporal Classification) (CTC) is used to solve the time series problem of uncertain alignment relationship between input features and output labels. It is an algorithm that can simultaneously optimize the model parameters and the boundary of the alignment and segmentation from end to end. .
具体地,服务器按照训练手写中文图像的时间顺序进行标注,并将标注好的训练手写中文图像输入到卷积神经网络-长短时记忆神经网络中进行训练,获取原始手写字识别模型。可以理解地,每个训练手写中文图像都是按顺序排列的,例如训练手写中文图像为“今天很开心”,则可按照从左到右以阿拉伯数字对每个训练手写中文图像进行标注,即“今(1)天(2)很(3)开(4)心(5)”,以使训练手写中文图像具备时序性,使得原始手写字识别模型能够联系上下文进行训练,提高模型的准确率。其中,(1)、(2)、(3)、(4)和(5)为顺序标签。Specifically, the server performs labeling according to the chronological order of the training handwritten Chinese images, and inputs the labeled training handwritten Chinese images into a convolutional neural network-long-term short-term memory neural network for training to obtain the original handwriting recognition model. Understandably, each training handwritten Chinese image is arranged in order. For example, the training handwritten Chinese image is "I am very happy today", then each training handwritten Chinese image can be labeled with Arabic numerals from left to right, that is, "Today (1) days (2) very (3) open (4) heart (5)", so that the training handwritten Chinese image has timeliness, so that the original handwriting recognition model can be trained in connection with the context and improve the accuracy of the model . Among them, (1), (2), (3), (4), and (5) are sequential tags.
长短时记忆神经网络具有输入层、隐藏层和输出层这三层网络结构。其中,输入层是长短时记忆神经网络的第一层,用于接收外界信号,即负责接收训练手写中文图像。输出层是长短时记忆神经网络的最后一层,用于向外界输出信号,即负责输出长短时记忆神经网络的计算结果。隐藏层是长短时记忆神经网络中除输入层和输出层之外的各层,用于对卷积神经网络提取的中文图像特征进行处理,获取长短时记忆神经网络的计算结果。可以理解地,采用长短时记忆神经网络进行模型训练增加了训练手写中文图像的时序性,以便根据上下文对训练手写中文图像进行训练,从而提高了目标手写字识别模型的准确率。Long-term short-term memory neural network has three layers of network structure: input layer, hidden layer and output layer. The input layer is the first layer of the long-term and short-term memory neural network, which is used to receive external signals, that is, it is responsible for receiving training handwritten Chinese images. The output layer is the last layer of the long-term and short-term memory neural network, which is used to output signals to the outside world, that is, it is responsible for outputting the calculation results of the long-term and short-term memory neural network. Hidden layers are layers other than the input layer and the output layer of the long-term and short-term memory neural network. They are used to process the Chinese image features extracted by the convolutional neural network to obtain the calculation results of the long-term and short-term memory neural network. Understandably, using long-short-term memory neural network for model training increases the timeliness of training handwritten Chinese images, so as to train the training handwritten Chinese images according to the context, thereby improving the accuracy of the target handwriting recognition model.
在一实施例中,如图3所示,步骤S13中,即对训练集中的训练手写中文图像进行顺序标注,并将标注好的训练手写中文图像输入到卷积神经网络-长短时记忆神经网络中进行训练,采用时序分类算法对卷积神经网络-长短时记忆神经网络的网络参数进行更新,获取原始手写字识别模型,具体包括如下步骤:In an embodiment, as shown in FIG. 3, in step S13, the training handwritten Chinese images in the training set are sequentially labeled, and the labeled training handwritten Chinese images are input to a convolutional neural network-long-term and short-term memory neural network. The training is performed in time series, and the time series classification algorithm is used to update the network parameters of the convolutional neural network-long and short-term memory neural network to obtain the original handwriting recognition model. The specific steps include the following steps:
S131:在卷积神经网络中对训练手写中文图像进行特征提取,获取中文图像特征。S131: Perform feature extraction on the trained handwritten Chinese image in a convolutional neural network to obtain Chinese image features.
中文图像特征是采用卷积神经网络对训练手写中文图像进行特征提取所获取到的训练手写中文图像对应的图像特征。卷积神经网络模型包括卷积层和池化层。将训练手写中文图像输入卷积神经网络模型中进行训练,通过每一层卷积层的计算,获取每一层的卷积层的输出,卷积层的输出可以通过公式a m l=σ(z m l)=σ(a m l-1*W l+b l)计算,其中,a m l表示第l层卷积层的第m个顺序标签的输出,即中文图像特征,z m l表示未采用激活函数处理前的第m个顺序标签的输出,a m l-1表示l-1层的第m个顺序标签输出(即第m个顺序标签所对应的训练手写中文图像的中文图像特征),σ表示激活函数,对于卷积层采用的激活函数σ为ReLu(Rectified Linear Unit,线性整流函数),相比其他激活函数的效果会更好),*表示卷积运算,W l表示第l层的卷积核(权值),b l表示第l层的偏置。若第l层是池化层,则在池化层采用最大池化的下样采样对卷积层的输出进行降维处理,具体降维公式为a m l=pool(a m l-1),其中,pool是指下采样计算,该下采样计算可以选择最大池化的方法,最大池化实际上就是在m*m的样本中取最大值。可以理解地,该中文图像特征携带有顺序标签,该中文图像特征的顺序标签与该中文图像特征对应的训练手写中文图像的顺序标签一致。 Chinese image features are image features corresponding to the training handwritten Chinese image obtained by extracting the features of the training handwritten Chinese image using a convolutional neural network. The convolutional neural network model includes a convolutional layer and a pooling layer. The trained handwritten Chinese image is input into the convolutional neural network model for training. The output of the convolutional layer of each layer is obtained through the calculation of the convolutional layer of each layer. The output of the convolutional layer can be calculated by the formula a m l = σ ( z m l ) = σ (a m l-1 * W l + b l ) calculation, where a m l represents the output of the m-th sequential label of the l-th convolution layer, that is, the Chinese image feature, z m l Represents the output of the m-th sequential label before the activation function is processed, and a m l-1 indicates the output of the m-th sequential label of the layer 1-1 (that is, the Chinese image of the training handwritten Chinese image corresponding to the m-th sequential label (Characteristics), σ represents the activation function, and the activation function σ used for the convolution layer is ReLu (Rectified Linear Unit, linear rectification function), which has a better effect than other activation functions), * represents the convolution operation, and W l represents The convolution kernel (weight) of the first layer, and b l represents the offset of the first layer. If the first layer is a pooling layer, the maximum pooling downsampling is used to reduce the output of the convolution layer in the pooling layer. The specific dimension reduction formula is a m l = pool (a m l-1 ) Among them, pool refers to the downsampling calculation. The downsampling calculation can choose the maximum pooling method. The maximum pooling is actually taking the maximum value in the m * m sample. Understandably, the Chinese image feature carries an order label, and the order label of the Chinese image feature is consistent with the order label of the training handwritten Chinese image corresponding to the Chinese image feature.
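The per-layer computation quoted above, a^l = ReLU(a^(l-1) * W^l + b^l) followed by max pooling, can be sketched as follows, assuming SciPy; the kernel, bias and pooling size are illustrative assumptions rather than the network configuration of this embodiment.

    import numpy as np
    from scipy.signal import correlate2d

    def conv_relu(feature_map, kernel, bias):
        z = correlate2d(feature_map, kernel, mode="same") + bias   # z^l
        return np.maximum(z, 0.0)                                  # ReLU activation

    def max_pool(feature_map, m=2):
        h, w = feature_map.shape
        h, w = h - h % m, w - w % m                                # crop to a multiple of m
        blocks = feature_map[:h, :w].reshape(h // m, m, w // m, m)
        return blocks.max(axis=(1, 3))                             # m x m max pooling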
S132:在长短时记忆神经网络的隐藏层采用第一激活函数对中文图像特征进行处理,获取携带激 活状态标识的神经元。S132: In the hidden layer of the long-term and short-term memory neural network, the first activation function is used to process the features of the Chinese image to obtain the neurons carrying the identification of the activation state.
其中,长短时记忆神经网络的隐藏层中的每个神经元包括三个门,其分别为输入门、遗忘门和输出门。遗忘门决定了在神经元中所要丢弃的过去的信息。输入门决定了在神经元中所要增加的信息。输出门决定了在神经元中所要输出的信息。第一激活函数是用于激活神经元状态的函数。神经元状态决定了各个门(即输入门、遗忘门和输出门)的丢弃、增加和输出的信息。激活状态标识包括通过标识和不通过标识。本实施例中的输入门、遗忘门和输出门对应的标识分别为i、f和o。Among them, each neuron in the hidden layer of the long-term and short-term memory neural network includes three gates, which are an input gate, a forgetting gate, and an output gate, respectively. The forget gate determines the past information to be discarded in the neuron. The input gate determines the information to be added to the neuron. The output gate determines the information to be output in the neuron. The first activation function is a function for activating a neuron state. The state of the neuron determines the information discarded, added, and output by each gate (ie, input gate, forget gate, and output gate). The activation status flag includes a pass flag and a fail flag. The identifiers corresponding to the input gate, the forget gate, and the output gate in this embodiment are i, f, and o, respectively.
In this embodiment, the Sigmoid (S-shaped growth curve) function is selected as the first activation function. The Sigmoid function is an S-shaped function that is common in biology; in information science, because it is monotonically increasing and its inverse function is also monotonically increasing, the Sigmoid function is often used as the threshold function of a neural network to map a variable into the interval 0-1. The activation function is calculated as σ(z) = 1/(1 + e^(-z)), where z represents the output value of the forget gate.
具体地,遗忘门中包括遗忘门限,通过计算每一神经元(中文图像特征)的激活状态,以获取携带激活状态标识为通过标识的神经元。其中,采用遗忘门的计算公式f t=σ(W f·[h t-1,x t]+b f)计算遗忘门哪些信息被接收(即只接收携带激活状态标识为通过标识的神经元),f t表示遗忘门限(即激活状态),W f表示遗忘门的权重矩阵,b f表示遗忘门的权值偏置项,h t-1表示上一时刻神经元的输出,x t表示t时刻的输入数据(即中文图像特征),t表示当前时刻,t-1表示上一时刻。遗忘门中还包括遗忘门限,通过遗忘门的计算公式对中文图像特征进行计算会得到一个0-1区间的标量,此标量决定了神经元根据当前状态和过去状态的综合判断所接收过去信息的比例,以达到数据的降维,减少计算量,提高训练效率。 Specifically, the forgetting gate includes a forgetting threshold. By calculating an activation state of each neuron (Chinese image feature), a neuron carrying an activation state identifier as a pass identifier is obtained. Among them, the calculation formula of the forgetting gate is f t = σ (W f · [h t-1 , x t ] + b f ) to calculate which information of the forgetting gate is received (that is, only the neurons carrying the activation status flag as the pass flag are received). ), F t represents the forgetting threshold (that is, the activation state), W f represents the weight matrix of the forgetting gate, b f represents the weight bias term of the forgetting gate, h t-1 represents the output of the neuron at the previous moment, and x t represents Input data (ie Chinese image features) at time t, where t is the current time and t-1 is the previous time. The forgetting gate also includes the forgetting threshold. Calculating the Chinese image features through the calculating formula of the forgetting gate will obtain a 0-1 interval scalar. This scalar determines the past information received by the neuron based on the comprehensive judgment of the current state and the past state. Proportion to achieve data reduction, reduce the amount of calculation, and improve training efficiency.
S133:在长短时记忆神经网络的隐藏层采用第二激活函数对携带激活状态标识的神经元进行处理,获取长短时记忆神经网络输出层的输出。S133: In the hidden layer of the long-term and short-term memory neural network, a second activation function is used to process the neuron carrying the identification of the activation state to obtain the output of the long-term and short-term memory neural network output layer.
具体地,在长短时记忆神经网络的隐藏层中的输入门中,采用第二激活函数携带激活状态标识为通过标识的神经元进行计算,获取隐藏层的输出。本实施例中,由于线性模型的表达能力不够,因此采用tanh(双曲正切)函数作为输入门的激活函数(即第二激活函数),可加入非线性因素使得训练出的目标手写字识别模型能够解决更复杂的问题。并且,激活函数tanh(双曲正切)具有收敛速度快的优点,可以节省训练时间,提高训练效率。Specifically, in the input gate in the hidden layer of the long-term and short-term memory neural network, the second activation function is used to carry the activation state identifier to perform calculation through the identified neurons to obtain the output of the hidden layer. In this embodiment, because the expressive ability of the linear model is insufficient, a tanh (hyperbolic tangent) function is used as the activation function of the input gate (ie, the second activation function). Non-linear factors can be added to make the trained target handwriting recognition model Able to solve more complex problems. In addition, the activation function tanh (hyperbolic tangent) has the advantage of fast convergence speed, which can save training time and improve training efficiency.
具体地,通过输入门的计算公式计算输入门的输出。其中,输入门中还包括输入门限,输入门的计算公式为i t=σ(W i·[h t-1,x t]+b i),W i为输入门的权值矩阵,i t表示输入门限,b i表示输入门的偏置项,通过输入门的计算公式对中文图像特征进行计算会得到一个0-1区间的标量(即输入门限),此标量控制了神经元根据当前状态和过去状态的综合判断所接收当前信息的比例,即接收新输入的信息的比例,以减少计算量,提高训练效率。 Specifically, the output of the input gate is calculated by a calculation formula of the input gate. Wherein the input gate further includes a calculation formula input threshold, the input gate is i t = σ (W i · [h t-1, x t] + b i), W i is the weight of input gates value matrix, i t Represents the input threshold, b i represents the bias term of the input gate, and calculating the Chinese image features through the calculation formula of the input gate will obtain a 0-1 interval scalar (that is, the input threshold). This scalar controls the neuron according to the current state Comprehensively judge with the past state the proportion of the current information received, that is, the proportion of the newly input information, to reduce the amount of calculation and improve the training efficiency.
Then, the current neuron state is calculated using the candidate-state formula C̃_t = tanh(W_c·[h_(t-1), x_t] + b_c) and the state-update formula C_t = f_t⊙C_(t-1) + i_t⊙C̃_t, where W_i is the weight matrix of the input gate, W_c is the weight matrix for computing the unit state, i_t is the input threshold, b_i is the bias term of the input gate, b_c is the bias term of the unit state, C_(t-1) is the neuron state at the previous moment, and C_t is the neuron state at time t. By performing a dot-product operation between the neuron state and the forgetting threshold (input threshold), the model outputs only the required information, which improves the efficiency of model learning.
其中,长短时记忆神经网络隐藏层的前向输出是指在长短时记忆神经网络隐藏层按照时间顺序输出的第u个顺序标签对应的中文图像特征的概率。后向输出是指在长短时记忆神经网络隐藏层按照时间逆顺序输出的第u个顺序标签对应的中文图像特征的概率。如“我今天心情很好”假设第u个顺序标签对应的中文图像特征为“天”,t-1时刻长短时记忆神经网络隐藏层的输出为“今”,根据t-1时刻长短时记忆神经网络隐藏层的输出“今”和t时刻的长短时记忆神经网络输入层的输入“天”计算t时刻长短时记忆神经网络隐藏层的输出,该t时刻的输出可能包括“天、大和木”,则长短时记忆神经网络隐藏层的前向输出指t时刻长短时记忆神经网络隐藏层的输出为“天”概率。假设t+1时刻长短时记忆神经网络隐藏层的输出为“心”,根据t+1时刻长短时记忆神经网络隐藏层的输出“心”和t时刻的长短时记忆神经网络输入层的输入“天”计算t时刻长短时记忆神经网络隐藏层的的输出,该t时刻长短时记忆神经网络隐藏层的的输出可能包括“天、大和木”,则长短时记忆神经网络隐藏层的后向输出指t时刻输出为“天”概率。The forward output of the hidden layer of the long-term and short-term memory neural network refers to the probability of the Chinese image features corresponding to the u-th order labels output by the hidden layer of the long-term and short-term memory neural network in time sequence. Backward output refers to the probability of Chinese image features corresponding to the u-th order label output by the hidden layer of the memory neural network in reverse order in time. For example, "I'm in a good mood today" assuming that the Chinese image corresponding to the u-th sequential label feature is "day", and the output of the hidden layer of the memory neural network at time t-1 is "today". The output of the hidden layer of the neural network is "now" and the length of the short-term memory at time t. The input of the neural network input layer is "day". The output of the hidden layer of the memory time at time t may be calculated. ", Then the forward output of the hidden layer of the long-term memory neural network refers to the probability that the output of the hidden layer of the long-term memory neural network at time t is" day ". Assume that the output of the hidden layer of the short-term memory neural network at time t + 1 is "heart", and according to the output of the hidden layer of the short-term memory neural network at time t + 1 and the input of the input layer of the short-term memory neural network at time t " "Day" calculates the output of the hidden layer of the memory neural network at time t, and the output of the hidden layer of the memory neural network at time t may include "day, Yamato and wood", then the backward output of the hidden layer of the long-term memory neural network Refers to the probability of "day" output at time t.
S134:根据长短时记忆神经网络输出层的输出,采用时序分类算法对卷积神经网络-长短时记忆神经网络的网络参数进行更新,获取目标手写字识别模型。S134: According to the output of the long-term and short-term memory neural network output layer, a time series classification algorithm is used to update the network parameters of the convolutional neural network and the long- and short-term memory neural network to obtain a target handwriting recognition model.
The network parameters of the CNN-LSTM neural network are the weights and biases. First, the forward output a(t,u) of the Chinese image feature corresponding to the u-th sequential label at time t in the LSTM hidden layer is computed with the forward-output formula of the hidden layer (given as an equation image in the original filing); in that formula, one term denotes the probability that the output at time t is a blank, a(t-1,i) denotes the forward output of the i-th Chinese image feature at time t-1, and l' denotes the number of sequential labels. Then the backward output b(t,u) of the Chinese image feature corresponding to the u-th sequential label at time t is computed with the backward-output formula of the hidden layer (also given as an equation image); in that formula, one term denotes the probability that the output at time (t+1) is a blank, and a(t+1,i) denotes the backward output of the Chinese image feature corresponding to the i-th sequential label at time t+1 in the LSTM hidden layer. A blank here is a blank character in the output of the LSTM output layer.
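The forward-output and backward-output formulas themselves appear only as equation images in the original filing. The quantities named around them (a per-frame blank probability, a(t-1,i), the label count l', and the product p(z|x) = a(t,u)b(t,u) used below) match the standard CTC forward-backward recursions, so the following NumPy sketch transcribes those standard recursions as an assumed reading of the filing; the function name, array shapes and blank index are illustrative.

import numpy as np

def ctc_forward_backward(y, labels, blank=0):
    # y: (T, C) per-frame probabilities from the LSTM output layer
    # labels: ground-truth label indices of the character sequence
    lp = [blank]                                   # extended label sequence l' with blanks
    for c in labels:
        lp += [c, blank]
    T, U = y.shape[0], len(lp)

    alpha = np.zeros((T, U))                       # a(t, u): forward output
    alpha[0, 0] = y[0, blank]
    if U > 1:
        alpha[0, 1] = y[0, lp[1]]
    for t in range(1, T):
        for u in range(U):
            s = alpha[t - 1, u]
            if u >= 1:
                s += alpha[t - 1, u - 1]
            if u >= 2 and lp[u] != blank and lp[u] != lp[u - 2]:
                s += alpha[t - 1, u - 2]
            alpha[t, u] = y[t, lp[u]] * s          # scaled by the frame probability of label l'_u

    beta = np.zeros((T, U))                        # b(t, u): backward output
    beta[T - 1, U - 1] = 1.0
    if U > 1:
        beta[T - 1, U - 2] = 1.0
    for t in range(T - 2, -1, -1):
        for u in range(U - 1, -1, -1):
            s = beta[t + 1, u] * y[t + 1, lp[u]]   # includes the blank probability at time t+1
            if u + 1 < U:
                s += beta[t + 1, u + 1] * y[t + 1, lp[u + 1]]
            if u + 2 < U and lp[u + 2] != blank and lp[u + 2] != lp[u]:
                s += beta[t + 1, u + 2] * y[t + 1, lp[u + 2]]
            beta[t, u] = s
    return alpha, beta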
Specifically, a loss function is constructed from the output of the LSTM output layer using the formula of the temporal classification algorithm: E_loss = -ln ∏_{(x,z)∈S} p(z|x), with p(z|x) = a(t,u)·b(t,u), where p(z|x) is the probability that the output of the LSTM output layer is z when the Chinese image feature x is input, a(t,u) is the forward output of the Chinese image feature corresponding to the u-th sequential label at time t in the LSTM hidden layer, and b(t,u) is the backward output of that Chinese image feature in the LSTM hidden layer. Finally, after E_loss is obtained, the network parameters of the LSTM neural network and the convolutional neural network are updated from the partial derivatives of E_loss, that is, ∂E_loss/∂θ, where θ denotes the network parameters, namely the weights and biases of the convolutional neural network and the LSTM neural network, and the original handwriting recognition model is obtained.
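As a concrete illustration of this training step, the sketch below assembles a small CNN-LSTM recognizer and updates its weights and biases θ with a CTC loss, which corresponds to the E_loss objective above up to averaging over the batch; automatic differentiation supplies the partial derivatives of E_loss. This is a minimal sketch assuming a PyTorch implementation; the layer sizes, the class count (3755 characters plus one blank) and the helper names are illustrative and not specified in the filing.

import torch
import torch.nn as nn

class CnnLstmRecognizer(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2))
        self.lstm = nn.LSTM(input_size=64 * 16, hidden_size=256,
                            bidirectional=True, batch_first=True)
        self.fc = nn.Linear(512, num_classes)      # num_classes includes the blank label

    def forward(self, x):                          # x: (N, 1, 32, W) grayscale line images
        f = self.cnn(x)                            # (N, 64, 16, W/2)
        f = f.permute(0, 3, 1, 2).flatten(2)       # (N, T, 1024) with T = W/2 time steps
        h, _ = self.lstm(f)
        return self.fc(h).log_softmax(dim=2)       # per-frame log-probabilities of the labels

model = CnnLstmRecognizer(num_classes=3756)        # 3755 characters + blank (assumed count)
ctc = nn.CTCLoss(blank=0)                          # negative log of prod p(z|x) over the batch
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_step(images, targets, target_lengths):
    # targets: (N, S) padded sequential labels of the training handwritten Chinese images
    log_probs = model(images).permute(1, 0, 2)     # CTCLoss expects (T, N, C)
    input_lengths = torch.full((images.size(0),), log_probs.size(0), dtype=torch.long)
    loss = ctc(log_probs, targets, input_lengths, target_lengths)
    optimizer.zero_grad()
    loss.backward()                                # partial derivatives of E_loss w.r.t. θ
    optimizer.step()                               # update the weights and biases
    return loss.item()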
S14: The original handwriting recognition model is tested with the training handwritten Chinese images in the test set, and the target handwriting recognition model is obtained when the test accuracy is greater than a preset accuracy.

Specifically, in step S14, all training handwritten Chinese images in the test set are input into the original handwriting recognition model for testing, and the test accuracy is obtained (that is, the number of correct predictions divided by the total number of training handwritten Chinese images in the test set). The test accuracy is then compared with the preset accuracy. If the test accuracy is greater than the preset accuracy, the original handwriting recognition model is considered sufficiently accurate and is taken as the target handwriting recognition model; otherwise, the predictions of the original handwriting recognition model are considered not accurate enough, steps S11-S13 are used again for further training, and the model is tested again until the test accuracy reaches the preset accuracy, after which training stops, further improving the accuracy of the target handwriting recognition model.

In this embodiment, the training handwritten Chinese images are first acquired and divided into a training set and a test set at a preset ratio, so that the training handwritten Chinese images in the training set can be labelled sequentially and thereby carry timing information. The labelled training handwritten Chinese images are input into the CNN-LSTM neural network for training so that, based on the temporal order of the images, the network can train on them with reference to their context; the temporal classification algorithm is used to update the network parameters of the CNN-LSTM network to obtain the original handwriting recognition model, which resolves the sequence problem that the alignment between input features and output labels is uncertain, achieves end-to-end output, and improves the generalization of the original handwriting recognition model. Finally, the original handwriting recognition model is tested with the training handwritten Chinese images in the test set, and the target handwriting recognition model is obtained when the test accuracy is greater than the preset accuracy, further improving the accuracy of the target handwriting recognition model.
In one embodiment, a Chinese model training device is provided, and the device corresponds one-to-one to the Chinese model training method in the embodiment above. As shown in FIG. 4, the Chinese model training device includes a training handwritten Chinese image acquisition module 11, a training handwritten Chinese image division module 12, an original handwriting recognition model acquisition module 13 and a target handwriting recognition model acquisition module 14. The functional modules are described in detail as follows:

The training handwritten Chinese image acquisition module 11 is configured to acquire training handwritten Chinese images.

The training handwritten Chinese image division module 12 is configured to divide the training handwritten Chinese images into a training set and a test set at a preset ratio.

The original handwriting recognition model acquisition module 13 is configured to label the training handwritten Chinese images in the training set sequentially, input the labelled training handwritten Chinese images into the CNN-LSTM neural network for training, and update the network parameters of the CNN-LSTM network with the temporal classification algorithm to obtain the original handwriting recognition model.

Specifically, the original handwriting recognition model acquisition module 13 includes a Chinese image feature acquisition unit 131, an activation-state neuron acquisition unit 132, an output layer output acquisition unit 133 and a target recognition model acquisition unit 134.

The Chinese image feature acquisition unit 131 is configured to perform feature extraction on the training handwritten Chinese images in the convolutional neural network to obtain Chinese image features.

The activation-state neuron acquisition unit 132 is configured to process the Chinese image features with a first activation function in the hidden layer of the LSTM neural network to obtain neurons carrying an activation-state identifier.

The output layer output acquisition unit 133 is configured to process the neurons carrying the activation-state identifier with a second activation function in the hidden layer of the LSTM neural network to obtain the output of the LSTM output layer.

The target recognition model acquisition unit 134 is configured to update the network parameters of the CNN-LSTM network with the temporal classification algorithm, based on the output of the LSTM output layer, to obtain the target handwriting recognition model.

The target handwriting recognition model acquisition module 14 is configured to test the original handwriting recognition model with the training handwritten Chinese images in the test set and to obtain the target handwriting recognition model when the test accuracy is greater than the preset accuracy.

Specifically, the formula of the temporal classification algorithm is E_loss = -ln ∏_{(x,z)∈S} p(z|x), with p(z|x) = a(t,u)·b(t,u), where p(z|x) is the probability that the output of the LSTM output layer is z when the Chinese image feature x is input, a(t,u) is the forward output of the Chinese image feature corresponding to the u-th sequential label at time t in the LSTM hidden layer, and b(t,u) is the backward output of that Chinese image feature in the LSTM hidden layer.
For the specific limitations of the Chinese model training device, reference may be made to the limitations of the Chinese model training method above, which are not repeated here. Each module in the Chinese model training device may be implemented in whole or in part by software, hardware or a combination thereof. The modules may be embedded in, or independent of, the processor of a computer device in hardware form, or stored in the memory of the computer device in software form, so that the processor can invoke and execute the operations corresponding to the modules.

In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in FIG. 10. The computer device includes a processor, a memory, a network interface and a database connected through a system bus. The processor provides computing and control capabilities. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program and a database. The internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The database is used to store data generated or acquired during execution of the Chinese model training method, such as the target handwriting recognition model. The network interface is used to communicate with external terminals through a network connection. When executed by the processor, the computer program implements a Chinese model training method.

In one embodiment, a computer device is provided, including a memory, a processor and a computer program stored in the memory and executable on the processor. When executing the computer program, the processor implements the following steps: acquiring training handwritten Chinese images; dividing the training handwritten Chinese images into a training set and a test set at a preset ratio; labelling the training handwritten Chinese images in the training set sequentially, inputting the labelled training handwritten Chinese images into the CNN-LSTM neural network for training, and updating the network parameters of the CNN-LSTM network with the temporal classification algorithm to obtain the original handwriting recognition model; and testing the original handwriting recognition model with the training handwritten Chinese images in the test set and obtaining the target handwriting recognition model when the test accuracy is greater than the preset accuracy.

In one embodiment, when executing the computer program, the processor further implements the following steps: performing feature extraction on the training handwritten Chinese images in the convolutional neural network to obtain Chinese image features; processing the Chinese image features with a first activation function in the hidden layer of the LSTM neural network to obtain neurons carrying an activation-state identifier; processing the neurons carrying the activation-state identifier with a second activation function in the hidden layer of the LSTM neural network to obtain the output of the LSTM output layer; and, based on the output of the LSTM output layer, updating the network parameters of the CNN-LSTM network with the temporal classification algorithm to obtain the target handwriting recognition model.

Specifically, the formula of the temporal classification algorithm is E_loss = -ln ∏_{(x,z)∈S} p(z|x), with p(z|x) = a(t,u)·b(t,u), where p(z|x) is the probability that the output of the LSTM output layer is z when the Chinese image feature x is input, a(t,u) is the forward output of the Chinese image feature corresponding to the u-th sequential label at time t in the LSTM hidden layer, and b(t,u) is the backward output of that Chinese image feature in the LSTM hidden layer.

In one embodiment, one or more non-volatile readable storage media storing computer-readable instructions are provided. When executed by one or more processors, the computer-readable instructions cause the one or more processors to perform the following steps: acquiring training handwritten Chinese images; dividing the training handwritten Chinese images into a training set and a test set at a preset ratio; labelling the training handwritten Chinese images in the training set sequentially, inputting the labelled training handwritten Chinese images into the CNN-LSTM neural network for training, and updating the network parameters of the CNN-LSTM network with the temporal classification algorithm to obtain the original handwriting recognition model; and testing the original handwriting recognition model with the training handwritten Chinese images in the test set and obtaining the target handwriting recognition model when the test accuracy is greater than the preset accuracy.

In one embodiment, when executed by the one or more processors, the computer-readable instructions further cause the one or more processors to perform the following steps: performing feature extraction on the training handwritten Chinese images in the convolutional neural network to obtain Chinese image features; processing the Chinese image features with a first activation function in the hidden layer of the LSTM neural network to obtain neurons carrying an activation-state identifier; processing the neurons carrying the activation-state identifier with a second activation function in the hidden layer of the LSTM neural network to obtain the output of the LSTM output layer; and, based on the output of the LSTM output layer, updating the network parameters of the CNN-LSTM network with the temporal classification algorithm to obtain the target handwriting recognition model.

Specifically, the formula of the temporal classification algorithm is E_loss = -ln ∏_{(x,z)∈S} p(z|x), with p(z|x) = a(t,u)·b(t,u), where p(z|x) is the probability that the output of the LSTM output layer is z when the Chinese image feature x is input, a(t,u) is the forward output of the Chinese image feature corresponding to the u-th sequential label at time t in the LSTM hidden layer, and b(t,u) is the backward output of that Chinese image feature in the LSTM hidden layer.
In one embodiment, as shown in FIG. 5, a Chinese image recognition method is provided. The method is described by taking its application to the server in FIG. 1 as an example, and includes the following steps:

S21: A Chinese image to be recognized is acquired, the Chinese image to be recognized including handwritten Chinese characters and a background picture.

The Chinese image to be recognized is an unprocessed image containing handwritten Chinese characters collected by an acquisition module on a computer device. It includes handwritten Chinese characters and a background picture. The background picture is the noise picture in the Chinese image to be recognized other than the handwritten Chinese characters, that is, a picture that interferes with the handwritten Chinese characters. In this embodiment, the user can collect a Chinese image to be recognized containing handwritten Chinese characters through the acquisition module on the computer device and upload it to the server, so that the server acquires the image. The acquisition module includes, but is not limited to, camera shooting and local upload.

S22: The Chinese image to be recognized is preprocessed to obtain an original image.

The original image is the image obtained by preprocessing the Chinese image to be recognized to exclude interference factors. Specifically, because the Chinese image to be recognized may contain various interference factors, such as a large variety of colours, it is not well suited to subsequent recognition. The image therefore needs to be preprocessed to obtain an original image with the interference factors excluded; the original image can be understood as the picture obtained after the background picture is removed from the Chinese image to be recognized.

In one embodiment, as shown in FIG. 6, step S22 of preprocessing the Chinese image to be recognized to obtain the original image specifically includes the following steps:
S221: The Chinese image to be recognized is magnified and converted to grayscale to obtain a grayscale image.

The grayscale image is the image obtained after the Chinese image to be recognized is magnified and grayscale processed. The grayscale image corresponds to a pixel-value matrix, that is, a matrix containing the pixel value of each pixel in the Chinese image to be recognized. In this embodiment, the server uses the imread function to read the pixel value of each pixel in the Chinese image to be recognized, and magnifies and grayscale-processes the image to obtain the grayscale image. The imread function is a function in a computer language for reading the pixel values of an image file. A pixel value is the value assigned by the computer when the original image is digitized.

Because the Chinese image to be recognized may contain many colours, and colour itself is easily affected by factors such as illumination, with large colour variation among objects of the same kind, colour alone can hardly provide key information. The Chinese image to be recognized is therefore grayscale processed to exclude interference and reduce the complexity of the image and the amount of information to be processed. However, when the handwritten Chinese characters in the image are small, converting to grayscale directly would make the strokes of the handwritten characters too thin, and they would be excluded as interference. To increase the stroke thickness, the Chinese image to be recognized is first magnified and then converted to grayscale, which avoids the problem that direct grayscale conversion makes the strokes so thin that they are discarded as interference.

Specifically, the server performs the magnification according to the formula x → x^r, where x is an element of the matrix M and r is the power, and the transformed element x^r replaces x in the pixel-value matrix M.

Grayscale processing renders the Chinese image to be recognized with a clear black-and-white effect. Specifically, grayscale processing of the magnified image works as follows: the colour of each pixel in the Chinese image to be recognized is determined by its three components R (red), G (green) and B (blue), and each component can take one of 256 values from 0 to 255 (0 is darkest, representing black; 255 is brightest, representing white). A grayscale image is a special colour image in which the three components R, G and B are equal. In this embodiment, the server can directly read the Chinese image to be recognized with the imread function and obtain the specific values of the R, G and B components of each pixel of the grayscale image.
S222: The grayscale image is standardized to obtain the original image.

Standardization here means applying a standard transformation to the grayscale image so that it is converted into a fixed standard form. Specifically, because the pixel values of the pixels in the grayscale image are rather scattered, the orders of magnitude of the data are not uniform, which would affect the accuracy of subsequent model recognition; the grayscale image therefore needs to be standardized to unify the order of magnitude of the data.

Specifically, the server standardizes the grayscale image with the standardization formula, which avoids the problem that the scattered pixel values of the grayscale image leave the data on different orders of magnitude. The standardization formula is X' = (X - M_min) / (M_max - M_min), where X is a pixel value of the grayscale image M, X' is the corresponding pixel value of the original image, M_min is the smallest pixel value in the grayscale image M, and M_max is the largest pixel value in the grayscale image M.
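A minimal sketch of the preprocessing in steps S221-S222 follows, assuming a NumPy implementation: the pixel values are raised to the power r as the magnification step, the R, G and B components are merged into a grayscale image, and the result is standardized with X' = (X - M_min) / (M_max - M_min). The function name, the default value of r, the averaging of R, G and B, and the use of matplotlib's imread in place of the unspecified imread function are illustrative assumptions.

import numpy as np
from matplotlib.image import imread    # stands in for the imread function named in the text

def preprocess(path, r=2):
    img = imread(path).astype(np.float64)          # pixel-value matrix of the image
    if img.ndim == 3:                              # grayscale: make R, G and B identical
        img = img[..., :3].mean(axis=2)
    img = img ** r                                 # magnification step x -> x**r from S221
    m_min, m_max = img.min(), img.max()
    return (img - m_min) / (m_max - m_min)         # standardization formula from S222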
S23: The original image is processed with a kernel density estimation algorithm to remove the background picture and obtain a target image that includes the handwritten Chinese characters.

Kernel density estimation is a non-parametric method that studies the distribution characteristics of the data from the data samples themselves and is used to estimate a probability density function. The target image is the image obtained by processing the original image with the kernel density estimation algorithm so that it contains only handwritten Chinese characters. Specifically, the server processes the original image with the kernel density estimation algorithm to exclude the interference of the background picture and obtain the target image including the handwritten Chinese characters.

Specifically, the kernel density estimate is computed as f̂(x) = (1/(n·h)) Σ_{i=1..n} K((x - x_i)/h), where K(·) is the kernel function, h is the pixel-value range, x is the pixel value of the pixel whose probability density is to be estimated, x_i is the i-th pixel value within the range h, n is the number of pixel values within the range h, and f̂(x) is the estimated probability density of the pixel.
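The kernel density estimate as written above can be transcribed almost literally; the short NumPy sketch below does so, taking the kernel as a function argument and using the Gaussian kernel that step S232 introduces as its default. The names and the vectorized form are illustrative assumptions.

import numpy as np

def gaussian_kernel(x):
    # K(x) = exp(-x**2 / 2) / sqrt(2 * pi); e and pi are the constants named in the text
    return np.exp(-0.5 * x ** 2) / np.sqrt(2.0 * np.pi)

def kde(x, samples, h, kernel=gaussian_kernel):
    # f_hat(x) = (1 / (n * h)) * sum_i K((x - x_i) / h) over the n pixel values within range h
    samples = np.asarray(samples, dtype=np.float64)
    return kernel((x - samples) / h).sum() / (samples.size * h)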
In one embodiment, as shown in FIG. 7, step S23 of processing the original image with the kernel density estimation algorithm to remove the background picture and obtain the target image including the handwritten Chinese characters specifically includes the following steps:

S231: The pixel values in the original image are counted to obtain a histogram of the original image.

The original-image histogram is the histogram obtained by counting the pixel values in the original image. A histogram is a statistical chart in which a series of vertical bars or line segments of varying height represents the distribution of data. In this embodiment, the horizontal axis of the original-image histogram represents the pixel value and the vertical axis represents the frequency with which that pixel value occurs. By counting the pixel values in the original image, the server obtains the original-image histogram, so that the distribution of pixel values in the original image can be seen intuitively and technical support is provided for the subsequent estimation with the Gaussian kernel density estimation algorithm.

S232: The original-image histogram is processed with a Gaussian kernel density estimation algorithm to obtain at least one frequency maximum and at least one frequency minimum corresponding to the original-image histogram.

The Gaussian kernel density estimation algorithm is the kernel density estimation method in which the kernel function is the Gaussian kernel. The formula of the Gaussian kernel is K(x) = (1/√(2π))·e^(-x²/2), where K(x) is the Gaussian kernel evaluated at the pixel value x of the effective image, and e and π are constants. A frequency maximum is a local maximum of the frequency distribution histogram over a frequency interval, and a frequency minimum is the local minimum corresponding to a frequency maximum over the same frequency interval.

Specifically, the frequency distribution histogram corresponding to the original image is smoothed with the Gaussian kernel density estimation method to obtain the Gaussian smoothed curve corresponding to the histogram. Based on the frequency maxima and frequency minima on the Gaussian smoothed curve, the pixel values on the horizontal axis corresponding to these maxima and minima are obtained, so that the original image can subsequently be split into layers based on the pixel values corresponding to the obtained frequency maxima and minima.

S233: The original image is split into layers based on the frequency maxima and frequency minima to obtain layered images.

A layered image is an image obtained by splitting the original image into layers based on the maxima and minima. The server first obtains the pixel values corresponding to the frequency maxima and frequency minima and partitions the original image according to the pixel values corresponding to the frequency maxima: the pixel values of the original image are divided into as many classes as there are frequency maxima in the original image. The pixel values corresponding to the frequency minima are then taken as the boundaries between the classes, and the original image is split into layers according to the classes and the boundaries between them to obtain the layered images.

For example, suppose the pixel values corresponding to the frequency maxima in the original image are 11, 53, 95, 116 and 158, and the pixel values corresponding to the frequency minima are 21, 63, 105 and 135. From the number of frequency maxima it can be determined that the pixel values of the original image fall into 5 classes, so the original image can be split into 5 layers, with the pixel values of the frequency minima as the class boundaries. Since the smallest pixel value is 0 and the largest is 255, the class boundaries determine a layered image around pixel value 11 covering pixel values [0, 21); a layered image around pixel value 53 covering [21, 63); a layered image around pixel value 95 covering [63, 105); a layered image around pixel value 116 covering [105, 135); and a layered image around pixel value 158 covering [135, 255].
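A hedged sketch of steps S231-S233 follows: the pixel values are smoothed with a Gaussian kernel density estimate, the frequency maxima and minima are read off the smoothed curve, and the minima are used as class boundaries to split the image into layers, so that the example values above (maxima 11, 53, 95, 116 and 158; minima 21, 63, 105 and 135) would give five layers. The use of scipy's gaussian_kde and argrelextrema and the rescaling to the 0-255 range are implementation assumptions rather than details of the filing.

import numpy as np
from scipy.stats import gaussian_kde
from scipy.signal import argrelextrema

def split_into_layers(original):
    img = original * 255.0                                # back to the 0-255 range of the example
    density = gaussian_kde(img.ravel())(np.arange(256))   # Gaussian-smoothed histogram curve
    maxima = argrelextrema(density, np.greater)[0]        # pixel values at the frequency maxima
    minima = argrelextrema(density, np.less)[0]           # pixel values at the frequency minima
    bounds = np.concatenate(([0.0], minima, [256.0]))     # the minima act as class boundaries
    # e.g. minima 21, 63, 105, 135 give the layers [0,21), [21,63), [63,105), [105,135), [135,256)
    layers = [(img >= lo) & (img < hi) for lo, hi in zip(bounds[:-1], bounds[1:])]
    return maxima, layers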
S234: A target image including the handwritten Chinese characters is obtained based on the layered images.

After obtaining the layered images, the server binarizes, erodes and superimposes them to obtain the target image including the handwritten Chinese characters. Binarization sets the pixel value of each pixel of a layered image to 0 (black) or 1 (white) so that the whole layered image shows a clear black-and-white effect. After a layered image is binarized, the binarized layered image is eroded to remove the background-picture part and keep the handwritten Chinese characters in the layered image. Because the pixel values of each layered image belong to different ranges, the layered images still need to be superimposed after erosion to generate the target image containing only the handwritten Chinese characters. Superposition is the process of superimposing the layered images, in which only the handwriting parts remain, into a single image, so as to obtain the target image containing only the handwritten Chinese characters. In this embodiment, the imadd function is used to superimpose the layered images to obtain the target image containing only handwritten Chinese characters; imadd is a function in a computer language for superimposing layered images.

In one embodiment, as shown in FIG. 8, step S234 of obtaining the target image including the handwritten Chinese characters based on the layered images specifically includes the following steps:

S2341: The layered images are binarized to obtain binarized images.

A binarized image is the image obtained by binarizing a layered image. Specifically, after obtaining a layered image, the server compares the sampled pixel values of the layered image with a preselected threshold, setting pixel values greater than or equal to the threshold to 1 and pixel values smaller than the threshold to 0. A sampled pixel value is the pixel value of each pixel in the layered image. The size of the threshold affects the result of binarization: with a suitable threshold the binarization of the layered image works well, while an unsuitable threshold degrades it. For convenience of operation and to simplify the calculation, the threshold in this embodiment is determined empirically by the developers. Binarizing the layered images facilitates the subsequent erosion processing.

S2342: The pixels in the binarized image are detected and labelled to obtain the connected regions corresponding to the binarized image.

A connected region is a region enclosed by the neighbouring pixels around a particular pixel. In a binarized image, a connected region is determined from a particular pixel and its neighbouring pixels; for example, when a particular pixel is 0 and its surrounding neighbouring pixels are all 1, the region enclosed by those neighbouring pixels is taken as a connected region.

Specifically, the binarized image corresponds to a pixel matrix with rows and columns. Detecting and labelling the pixels in the binarized image specifically includes the following process. (1) The pixel matrix is scanned row by row; the consecutive white pixels in each row form a sequence called a run, and its start point, end point and row number are recorded. (2) For the runs in every row except the first, if a run has no overlap with any run in the previous row, it is given a new label; if it overlaps with exactly one run in the previous row, it is given the label of that run; if it overlaps with two or more runs in the previous row, the current run is given the smallest label of the associated runs, and the labels of those runs in the previous row are written into equivalence pairs, indicating that they belong to the same class. For example, if a run in the second row overlaps with two runs (1 and 2) in the previous row, it is given the smaller of the two labels, namely 1, and the labels of those runs in the previous row are written as the equivalence pair (1, 2). An equivalence pair records the labels of two runs that are connected to each other; for example, (1, 2) means that the run labelled 1 and the run labelled 2 are connected and form one connected region. In this embodiment, the 8 neighbouring pixels adjacent to a particular pixel in the pixel matrix are taken as the connected region of that element.
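A compact sketch of steps S2341-S2342 follows: one layered image is thresholded to a 0/1 image and its 8-connected regions are labelled. ndimage.label is used here in place of the row-run and equivalence-pair procedure described above and yields the same connected regions; the threshold value and the function names are illustrative assumptions.

import numpy as np
from scipy import ndimage

def binarize_and_label(layer_mask, image, threshold=0.5):
    # S2341: keep only this layer's pixels and threshold them to 0 (black) or 1 (white);
    # the threshold is chosen empirically in the text, 0.5 is only an illustrative value
    binary = ((image * layer_mask) >= threshold).astype(np.uint8)
    # S2342: label the 8-connected regions (3x3 structuring element = 8-neighbourhood)
    eight_connectivity = np.ones((3, 3), dtype=int)
    labels, num_regions = ndimage.label(binary, structure=eight_connectivity)
    return binary, labels, num_regions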
S2343: The connected regions corresponding to the binarized images are eroded and the results superimposed to obtain the target image including the handwritten Chinese characters.

Erosion is the morphological operation used to remove part of the content of an image. The connected regions of the binarized image are eroded with the imerode function built into MATLAB. Specifically, eroding the connected regions corresponding to a binarized image includes the following steps. First, an n×n structuring element is selected; in this embodiment, the 8 elements adjacent to each element of the pixel matrix are taken as the connected region of that element, so the selected structuring element is a 3×3 pixel matrix. A structuring element is an n×n pixel matrix whose elements are 0 or 1. The pixel matrix of the layered binarized image is scanned to find the pixels whose value is 1, that is, the pixels inside the connected regions, and the 8 neighbouring pixels of each such pixel are checked: if they are all 1, the pixel matrix is left unchanged; if they are not all 1, the 8 neighbouring pixels of that pixel are set to 0 (black). The parts set to 0 are the eroded parts of the layered binarized image. MATLAB is application software for numerical computation in mathematical and technical applications.

The binarized images are then screened against a preset range of handwriting-region corrosion resistance: the parts of a binarized image that are not within the handwriting corrosion-resistance range are deleted, and the parts within the range are kept. The target image containing only the handwritten Chinese characters is obtained by superimposing the pixel matrices of the screened binarized image parts that fall within the handwriting corrosion-resistance range. The corrosion resistance of the handwriting region can be computed as p = s1 / s2, where s1 is the total area of the binarized image after erosion, s2 is the total area of the binarized image before erosion, and p is the corrosion resistance of the handwriting region.

For example, suppose the preset handwriting corrosion-resistance range is [0.01, 0.5]. For each binarized image, the ratio p of the total area after erosion to the total area before erosion is computed from the formula above. If the ratio p of a region of the binarized image is not within the preset handwriting corrosion-resistance range, the binarized image of that region is a background image rather than handwriting, and erosion is applied to remove that background image. If the ratio p of a region is within [0.01, 0.5], the binarized image of that region is handwritten Chinese characters and is kept. The pixel matrices of the retained binarized images are superimposed to obtain the target image containing the handwritten Chinese characters.

In steps S2341-S2343, the layered images are binarized to obtain binarized images, the pixels of the binarized images are detected and labelled to obtain the corresponding connected regions, and the elements of the pixel matrix that do not fully match the structuring element are set to 0; the parts whose elements are 0 are black, and those black parts are the eroded parts of the binarized image. By computing the ratio p of the total area of a binarized image after erosion to its total area before erosion and judging whether the ratio lies within the preset handwriting corrosion-resistance range, the background image in each layered image is removed and the handwritten Chinese characters are kept; finally, the layered images are superimposed, achieving the purpose of obtaining the target image.
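The erosion and screening of step S2343 can be sketched as follows: every binarized layer is eroded with a 3×3 structuring element, the ratio p = s1 / s2 of the area after erosion to the area before erosion is computed, the layers whose ratio falls inside the preset handwriting range are kept, and the survivors are superimposed. scipy's binary_erosion stands in for MATLAB's imerode, a logical OR stands in for imadd, and the exact form of the ratio formula is an assumption, since it appears only as an equation image in the filing.

import numpy as np
from scipy import ndimage

def filter_and_merge(binary_layers, p_range=(0.01, 0.5)):
    structure = np.ones((3, 3), dtype=int)            # 3x3 structuring element (8-neighbourhood)
    target = None
    for binary in binary_layers:
        eroded = ndimage.binary_erosion(binary, structure=structure)
        s2 = float(binary.sum())                      # total white area before erosion
        s1 = float(eroded.sum())                      # total white area after erosion
        p = s1 / s2 if s2 > 0 else 0.0                # assumed corrosion-resistance ratio
        if p_range[0] <= p <= p_range[1]:             # handwriting strokes survive, noise does not
            layer = binary.astype(bool)
            target = layer if target is None else (target | layer)
    return target                                     # target image containing only the handwriting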
S24: Text localization is performed on the target image with a text localization technique to obtain the text regions to be recognized.

A text region to be recognized is a region of the target image that contains only text. Because the target image also includes non-Chinese-character regions, namely the eroded parts of the target image, text localization is performed on the target image to make the recognition result more accurate and to save model recognition time. Text localization techniques include, but are not limited to, localization with OCR technology and with a CTPN (Connectionist Text Proposal Network), a network commonly used for text detection in images. OCR (Optical Character Recognition) technology refers to the process of analysing and recognizing image files of text material to obtain text and layout information; it generally involves two steps, text localization, that is, finding the position of the text in the picture, and text recognition, that is, recognizing the text that has been found. In this embodiment, only the text localization step of OCR is used.

Specifically, taking OCR technology as an example, the text localization steps are as follows:

1. First, with a neighbour-search method, one connected region is arbitrarily selected from the connected regions obtained in step S2342 as the starting connected region, the region distances between the remaining connected regions (the connected regions other than the starting one) and the starting connected region are computed, and the connected regions whose region distance is smaller than a preset threshold are selected as target connected regions, so as to determine the direction of the dilation operation (up, down, left or right). The preset threshold is a predetermined threshold for judging the distance between two connected regions. The neighbour-search method starts from a starting connected region, finds the horizontal circumscribed rectangle of that region and extends the connected region to the whole rectangle; when the distance between the starting connected region and its nearest neighbouring region is smaller than the preset threshold, the rectangle is dilated, the dilation direction being the direction in which the nearest neighbouring region lies. The dilation operation is performed only when the dilation direction is horizontal. The region distance is computed with the distance formula (given as an equation image in the original filing), in which S is the starting connected region, S' is a remaining connected region, and (x_c, y_c) is the difference of the centre vectors of the two connected regions; because the distance between the two connected regions is computed with respect to their adjacent boundaries, the region lengths are subtracted to obtain (x_c', y_c'), where (w', z') is the coordinate of the lower-right corner of the remaining connected region, (x', y') is the coordinate of its upper-left corner, (w, z) is the coordinate of the lower-right corner of the starting connected region, and (x, y) is the coordinate of its upper-left corner, which is taken as the origin in this embodiment.

2. The direction of the dilation operation is determined from the direction of the target connected region, and the starting connected region is dilated in that direction to obtain the text region to be recognized. Dilation is the morphological operation used to expand an image. The connected regions of the binarized image are dilated with the imdilate function built into MATLAB. Specifically, dilating the starting connected region includes the following steps. An n×n structuring element is selected; in this embodiment, the 8 elements adjacent to each element of the pixel matrix are taken as the connected region of that element, so the selected structuring element is a 3×3 pixel matrix whose elements are 0 or 1. The connected region is scanned along the direction of the target connected region, and a logical AND is computed between the structuring element and the part of the connected region it covers in that direction; if the results are all 0, the pixel matrix is left unchanged; if they are not all 0, the pixels covered by the structuring element are all set to 1, and the parts set to 1 are the dilated parts of the starting connected region. The rules of the logical AND operation are 0&&0=0, 0&&1=0, 1&&0=0 and 1&&1=1, where && is the logical AND operator.
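A rough sketch of the text localization in step S24 is given below. Instead of the full neighbour-search and directional dilation described above, it simply dilates the target image horizontally so that characters closer than a gap threshold merge into one region, labels the merged regions and returns their bounding boxes as candidate text areas; this simplification, the gap threshold and the use of scipy in place of MATLAB's imdilate are assumptions made only for illustration.

import numpy as np
from scipy import ndimage

def localize_text(target, gap_threshold=20):
    # horizontal-only dilation: the dilation operation is performed in the horizontal direction
    structure = np.ones((1, 2 * gap_threshold + 1), dtype=int)
    dilated = ndimage.binary_dilation(np.asarray(target) > 0, structure=structure)
    labels, _ = ndimage.label(dilated)
    boxes = ndimage.find_objects(labels)              # one (row slice, column slice) per text area
    return [np.asarray(target)[box] for box in boxes]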
S25: The text regions to be recognized are input into the target handwriting recognition model for recognition, and the handwritten Chinese characters corresponding to each text region to be recognized are obtained.

The target handwriting recognition model is obtained with the Chinese model training method. Specifically, the server inputs the text regions to be recognized into the target handwriting recognition model for recognition, so that the model can recognize them with reference to their context, obtains the handwritten Chinese characters corresponding to each text region to be recognized, and improves the recognition accuracy.

In this embodiment, the user can collect a Chinese image to be recognized containing handwritten Chinese characters through the acquisition module on a computer device and upload it to the server, so that the server acquires the image. The server then preprocesses the Chinese image to be recognized to obtain the original image with interference factors excluded. The original image is processed with the kernel density estimation algorithm to remove the background picture and obtain the target image containing only handwritten Chinese characters, further excluding interference. Text localization is performed on the target image with the text localization technique to obtain the text regions to be recognized, so as to exclude the interference of non-Chinese-character regions. The server inputs the text regions to be recognized into the target handwriting recognition model for recognition, so that the model can recognize them with reference to their context, obtains the handwritten Chinese characters corresponding to each text region, and improves the recognition accuracy.

It should be understood that the numbering of the steps in the above embodiments does not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic and does not constitute any limitation on the implementation of the embodiments of the present application.
在一实施例中,提供一种中文图像识别装置,该中文图像识别装置与上述实施例中中文图像识别方法一一对应。如图9所示,该中文图像识别装置包括待识别中文图像获取模块21、原始图像获取模块22、目标图像获取模块23、待识别文字区域获取模块24和手写汉字获取模块25。各功能模块详细说明如下:In one embodiment, a Chinese image recognition device is provided, and the Chinese image recognition device corresponds to the Chinese image recognition method in the embodiment described above in a one-to-one manner. As shown in FIG. 9, the Chinese image recognition device includes a Chinese image acquisition module 21 to be identified, an original image acquisition module 22, a target image acquisition module 23, a text region acquisition module 24 and a handwritten Chinese character acquisition module 25. The detailed description of each function module is as follows:
待识别中文图像获取模块21,用于获取待识别中文图像,待识别中文图像包括手写汉字和背景图片。The to-be-recognized Chinese image acquisition module 21 is configured to obtain the to-be-recognized Chinese image, and the to-be-recognized Chinese image includes handwritten Chinese characters and background pictures.
原始图像获取模块22,用于对待识别中文图像进行预处理,获取原始图像。The original image acquisition module 22 is configured to preprocess the Chinese image to be recognized to obtain an original image.
目标图像获取模块23,用于采用核密度估计算法对原始图像进行处理,去除背景图片,获取包括 手写汉字的目标图像。A target image acquisition module 23 is configured to process the original image by using a kernel density estimation algorithm, remove the background picture, and obtain a target image including handwritten Chinese characters.
待识别文字区域获取模块24，用于采用文字定位技术对目标图像进行文字定位，获取待识别文字区域。The to-be-recognized text region acquisition module 24 is configured to perform text positioning on the target image by using a text positioning technology to obtain the to-be-recognized text region.
手写汉字获取模块25,用于将待识别文字区域输入到目标手写字识别模型中进行识别,获取每一待识别文字区域对应的手写汉字。其中,目标手写字识别模型是采用上述实施例中中文模型训练方法获取的。A handwritten Chinese character acquisition module 25 is configured to input a text area to be recognized into a target handwriting recognition model for recognition, and obtain a handwritten Chinese character corresponding to each text area to be recognized. The target handwriting recognition model is obtained by using the Chinese model training method in the foregoing embodiment.
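As a rough illustration of the kind of model module 25 feeds its text regions into, a convolutional network followed by a bidirectional LSTM can be sketched as below. The layer sizes, the 32-pixel input height, and the vocabulary size of 3755 classes are assumptions chosen for the example; this is not the network recited in this application.

```python
import torch.nn as nn


class CRNNSketch(nn.Module):
    """Illustrative CNN + long short-term memory recognizer (sketch only)."""

    def __init__(self, num_classes=3755):               # class count is an assumed example value
        super().__init__()
        self.cnn = nn.Sequential(                        # convolutional part extracts Chinese image features
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.lstm = nn.LSTM(128 * 8, 256, bidirectional=True, batch_first=True)
        self.fc = nn.Linear(512, num_classes + 1)        # +1 for the blank label of a CTC-style loss

    def forward(self, x):                                # x: (N, 1, 32, W)
        f = self.cnn(x)                                  # (N, 128, 8, W/4)
        f = f.permute(0, 3, 1, 2).flatten(2)             # unroll along time steps: (N, W/4, 1024)
        out, _ = self.lstm(f)                            # LSTM links each step to its context
        return self.fc(out)                              # per-time-step class scores
```

The per-time-step scores from such a network would be trained with a temporal classification loss and decoded into a character sequence for each to-be-recognized text region; this mirrors, but does not reproduce, the training procedure described in this application.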
具体地,原始图像获取模块22包括灰度化图像获取单元221和原始图像获取单元222。Specifically, the original image acquisition module 22 includes a grayscale image acquisition unit 221 and an original image acquisition unit 222.
灰度化图像获取单元221,用于对原始图像进行放大和灰度化处理,获取灰度化图像。A grayscale image acquisition unit 221 is configured to perform enlargement and grayscale processing on an original image to obtain a grayscale image.
原始图像获取单元222，用于对灰度化图像进行标准化处理，获取原始图像，其中，标准化处理的公式为 X′=(X−M_min)/(M_max−M_min)，X是灰度化图像M的像素值，X′是原始图像的像素值，M_min是灰度化图像M中最小的像素值，M_max是灰度化图像M中最大的像素值。The original image acquisition unit 222 is configured to perform normalization processing on the grayscale image to obtain the original image, where the normalization formula is X′ = (X − M_min) / (M_max − M_min), in which X is a pixel value of the grayscale image M, X′ is the corresponding pixel value of the original image, M_min is the smallest pixel value in the grayscale image M, and M_max is the largest pixel value in the grayscale image M.
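A direct NumPy rendering of this min-max formula might look as follows; it is a sketch of the formula itself, not the applicant's implementation, and the guard against a constant image is an added assumption.

```python
import numpy as np


def normalize(gray: np.ndarray) -> np.ndarray:
    """Map pixel values X of the grayscale image M to original-image values X'."""
    m_min, m_max = float(gray.min()), float(gray.max())
    if m_max == m_min:                       # avoid division by zero on a uniform image
        return np.zeros_like(gray, dtype=np.float32)
    return (gray.astype(np.float32) - m_min) / (m_max - m_min)   # X' = (X - M_min) / (M_max - M_min)
```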
具体地,目标图像获取模块23包括原始图像直方图获取单元231、频率极值获取单元232、分层图像获取单元233和目标图像获取单元234。Specifically, the target image acquisition module 23 includes an original image histogram acquisition unit 231, a frequency extreme value acquisition unit 232, a layered image acquisition unit 233, and a target image acquisition unit 234.
原始图像直方图获取单元231,用于对原始图像中的像素值进行统计,获取原始图像直方图。The original image histogram obtaining unit 231 is configured to perform statistics on pixel values in the original image to obtain a histogram of the original image.
频率极值获取单元232，用于采用高斯核密度估计算法对原始图像直方图进行处理，获取与原始图像直方图对应的至少一个频率极大值和至少一个频率极小值。The frequency extreme value acquisition unit 232 is configured to process the original image histogram by using a Gaussian kernel density estimation algorithm to obtain at least one frequency maximum and at least one frequency minimum corresponding to the original image histogram.
分层图像获取单元233,用于基于频率极大值和频率极小值对原始图像进行分层切分处理,获取分层图像。A layered image acquisition unit 233 is configured to perform layered segmentation processing on the original image based on the frequency maximum and frequency minimum to obtain a layered image.
目标图像获取单元234,用于基于分层图像,获取包括手写汉字的目标图像。The target image acquisition unit 234 is configured to acquire a target image including a handwritten Chinese character based on the layered image.
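One plausible way to carry out the processing of units 232 and 233, assuming 8-bit grayscale values and using SciPy's Gaussian kernel density estimate, is sketched below; the exact smoothing and extremum-picking procedure used by the applicant is not specified here. The resulting layers are what unit 234, described next, consumes.

```python
import numpy as np
from scipy.signal import argrelextrema
from scipy.stats import gaussian_kde


def split_into_layers(original: np.ndarray):
    """Smooth the pixel-value histogram with a Gaussian KDE and split at frequency minima (sketch)."""
    pixels = original.ravel().astype(np.float64)
    density = gaussian_kde(pixels)(np.arange(256))      # KDE-smoothed histogram of the original image
    maxima = argrelextrema(density, np.greater)[0]      # frequency maxima (central gray level of each layer)
    minima = argrelextrema(density, np.less)[0]         # frequency minima (thresholds for layered segmentation)
    bounds = [0, *minima.tolist(), 256]
    layers = [((original >= lo) & (original < hi)).astype(np.uint8) * 255
              for lo, hi in zip(bounds[:-1], bounds[1:])]
    return maxima, minima, layers                       # layered images
```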
具体地,目标图像获取单元234包括二值化图像获取子单元2341、连通区域获取子单元2342和目标图像获取子单元2343。Specifically, the target image acquisition unit 234 includes a binarized image acquisition subunit 2341, a connected region acquisition subunit 2342, and a target image acquisition subunit 2343.
二值化图像获取子单元2341,用于对分层图像进行二值化处理,获取二值化图像。A binarized image acquisition subunit 2341 is configured to perform binarization processing on the layered image to obtain a binarized image.
连通区域获取子单元2342,用于对二值化图像中的像素进行检测标记,获取二值化图像对应的连通区域。The connected region acquisition subunit 2342 is configured to detect pixels in the binarized image and obtain a connected region corresponding to the binarized image.
目标图像获取子单元2343,用于对二值化图像对应的连通区域进行腐蚀和叠加处理,获取包括手写汉字的目标图像。A target image acquisition subunit 2343 is configured to perform erosion and superposition processing on the connected areas corresponding to the binary image, and acquire a target image including handwritten Chinese characters.
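The processing of sub-units 2341-2343 can be approximated with OpenCV as below; Otsu binarization, the 3x3 erosion kernel, and a single erosion pass are illustrative assumptions rather than the parameters of this application.

```python
import cv2
import numpy as np


def layer_to_target(layer: np.ndarray) -> np.ndarray:
    """Binarize one layered image, label its connected regions, erode them, and superimpose (sketch)."""
    _, binary = cv2.threshold(layer, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)   # binarized image
    num, labels = cv2.connectedComponents(binary)                    # detect and mark connected regions
    kernel = np.ones((3, 3), np.uint8)
    target = np.zeros_like(binary)
    for i in range(1, num):                                          # label 0 is the background
        region = (labels == i).astype(np.uint8) * 255
        target = cv2.bitwise_or(target, cv2.erode(region, kernel))   # erode and superimpose
    return target                                                    # target image including the handwritten Chinese characters
```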
关于中文图像识别装置的具体限定可以参见上文中对于中文图像识别方法的限定，在此不再赘述。上述中文图像识别装置中的各个模块可全部或部分通过软件、硬件及其组合来实现。上述各模块可以硬件形式内嵌于或独立于计算机设备中的处理器中，也可以以软件形式存储于计算机设备中的存储器中，以便于处理器调用执行以上各个模块对应的操作。For the specific limitations of the Chinese image recognition device, reference may be made to the limitations of the Chinese image recognition method described above, which are not repeated here. Each module in the above Chinese image recognition device may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in hardware form in, or independent of, the processor of the computer device, or stored in software form in the memory of the computer device, so that the processor can invoke and perform the operations corresponding to each module.
在一个实施例中，提供了一种计算机设备，该计算机设备可以是服务器，其内部结构图可以如图10所示。该计算机设备包括通过系统总线连接的处理器、存储器、网络接口和数据库。其中，该计算机设备的处理器用于提供计算和控制能力。该计算机设备的存储器包括非易失性存储介质、内存储器。该非易失性存储介质存储有操作系统、计算机程序和数据库。该内存储器为非易失性存储介质中的操作系统和计算机程序的运行提供环境。该计算机设备的数据库用于存储执行中文模型训练方法或中文图像识别方法过程中生成或获取的数据，如目标手写字识别模型或手写汉字。该计算机设备的网络接口用于与外部的终端通过网络连接通信。该计算机程序被处理器执行时以实现一种中文图像识别方法。In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in FIG. 10. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The database of the computer device is configured to store data generated or obtained during execution of the Chinese model training method or the Chinese image recognition method, such as the target handwriting recognition model or the handwritten Chinese characters. The network interface of the computer device is configured to communicate with an external terminal through a network connection. When executed by the processor, the computer program implements a Chinese image recognition method.
在一个实施例中，提供了一种计算机设备，包括存储器、处理器及存储在存储器上并可在处理器上运行的计算机程序，处理器执行计算机程序时实现以下步骤：获取待识别中文图像，待识别中文图像包括手写汉字和背景图片；对待识别中文图像进行预处理，获取原始图像；采用核密度估计算法对原始图像进行处理，去除背景图片，获取包括手写汉字的目标图像；采用文字定位技术对目标图像进行文字定位，获取待识别文字区域；将待识别文字区域输入到目标手写字识别模型中进行识别，获取每一待识别文字区域对应的手写汉字；其中，目标手写字识别模型是采用中文模型训练方法获取的。In one embodiment, a computer device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, the following steps are implemented: acquiring a to-be-recognized Chinese image, where the to-be-recognized Chinese image includes handwritten Chinese characters and a background picture; preprocessing the to-be-recognized Chinese image to obtain an original image; processing the original image by using a kernel density estimation algorithm to remove the background picture and obtain a target image including the handwritten Chinese characters; performing text positioning on the target image by using a text positioning technology to obtain to-be-recognized text regions; and inputting the to-be-recognized text regions into a target handwriting recognition model for recognition to obtain the handwritten Chinese characters corresponding to each to-be-recognized text region, where the target handwriting recognition model is obtained by using the Chinese model training method.
在一个实施例中，处理器执行计算机程序时还实现以下步骤：对原始图像中的像素值进行统计，获取原始图像直方图；采用高斯核密度估算方法对原始图像直方图进行处理，获取与原始图像直方图对应的至少一个频率极大值和至少一个频率极小值；基于频率极大值和频率极小值对原始图像进行分层切分处理，获取分层图像；基于分层图像，获取包括手写汉字的目标图像。In one embodiment, when the processor executes the computer program, the following steps are further implemented: performing statistics on the pixel values in the original image to obtain an original image histogram; processing the original image histogram by using a Gaussian kernel density estimation method to obtain at least one frequency maximum and at least one frequency minimum corresponding to the original image histogram; performing layered segmentation processing on the original image based on the frequency maximum and the frequency minimum to obtain a layered image; and obtaining, based on the layered image, the target image including the handwritten Chinese characters.
在一个实施例中，处理器执行计算机程序时还实现以下步骤：对分层图像进行二值化处理，获取二值化图像；对二值化图像中的像素进行检测标记，获取二值化图像对应的连通区域；对二值化图像对应的连通区域进行腐蚀和叠加处理，获取包括手写汉字的目标图像。In one embodiment, when the processor executes the computer program, the following steps are further implemented: performing binarization processing on the layered image to obtain a binarized image; detecting and marking pixels in the binarized image to obtain connected regions corresponding to the binarized image; and performing erosion and superposition processing on the connected regions corresponding to the binarized image to obtain the target image including the handwritten Chinese characters.
在一个实施例中，提供了一个或多个存储有计算机可读指令的非易失性可读存储介质，所述计算机可读指令被一个或多个处理器执行时，使得所述一个或多个处理器执行如下步骤：获取待识别中文图像，待识别中文图像包括手写汉字和背景图片；对待识别中文图像进行预处理，获取原始图像；采用核密度估计算法对原始图像进行处理，去除背景图片，获取包括手写汉字的目标图像；采用文字定位技术对目标图像进行文字定位，获取待识别文字区域；将待识别文字区域输入到目标手写字识别模型中进行识别，获取每一待识别文字区域对应的手写汉字；其中，目标手写字识别模型是采用中文模型训练方法获取的。In one embodiment, one or more non-volatile readable storage media storing computer-readable instructions are provided. When the computer-readable instructions are executed by one or more processors, the one or more processors perform the following steps: acquiring a to-be-recognized Chinese image, where the to-be-recognized Chinese image includes handwritten Chinese characters and a background picture; preprocessing the to-be-recognized Chinese image to obtain an original image; processing the original image by using a kernel density estimation algorithm to remove the background picture and obtain a target image including the handwritten Chinese characters; performing text positioning on the target image by using a text positioning technology to obtain to-be-recognized text regions; and inputting the to-be-recognized text regions into a target handwriting recognition model for recognition to obtain the handwritten Chinese characters corresponding to each to-be-recognized text region, where the target handwriting recognition model is obtained by using the Chinese model training method.
在一个实施例中，所述计算机可读指令被一个或多个处理器执行时，使得所述一个或多个处理器执行时还实现以下步骤：对原始图像中的像素值进行统计，获取原始图像直方图；采用高斯核密度估算方法对原始图像直方图进行处理，获取与原始图像直方图对应的至少一个频率极大值和至少一个频率极小值；基于频率极大值和频率极小值对原始图像进行分层切分处理，获取分层图像；基于分层图像，获取包括手写汉字的目标图像。In one embodiment, when the computer-readable instructions are executed by the one or more processors, the one or more processors further perform the following steps: performing statistics on the pixel values in the original image to obtain an original image histogram; processing the original image histogram by using a Gaussian kernel density estimation method to obtain at least one frequency maximum and at least one frequency minimum corresponding to the original image histogram; performing layered segmentation processing on the original image based on the frequency maximum and the frequency minimum to obtain a layered image; and obtaining, based on the layered image, the target image including the handwritten Chinese characters.
在一个实施例中，所述计算机可读指令被一个或多个处理器执行时，使得所述一个或多个处理器执行时还实现以下步骤：对分层图像进行二值化处理，获取二值化图像；对二值化图像中的像素进行检测标记，获取二值化图像对应的连通区域；对二值化图像对应的连通区域进行腐蚀和叠加处理，获取包括手写汉字的目标图像。In one embodiment, when the computer-readable instructions are executed by the one or more processors, the one or more processors further perform the following steps: performing binarization processing on the layered image to obtain a binarized image; detecting and marking pixels in the binarized image to obtain connected regions corresponding to the binarized image; and performing erosion and superposition processing on the connected regions corresponding to the binarized image to obtain the target image including the handwritten Chinese characters.
本领域普通技术人员可以理解实现上述实施例方法中的全部或部分流程，是可以通过计算机程序来指令相关的硬件来完成，所述的计算机程序可存储于一非易失性计算机可读取存储介质中，该计算机程序在执行时，可包括如上述各方法的实施例的流程。其中，本申请所提供的各实施例中所使用的对存储器、存储、数据库或其它介质的任何引用，均可包括非易失性和/或易失性存储器。非易失性存储器可包括只读存储器（ROM）、可编程ROM（PROM）、电可编程ROM（EPROM）、电可擦除可编程ROM（EEPROM）或闪存。易失性存储器可包括随机存取存储器（RAM）或者外部高速缓冲存储器。作为说明而非局限，RAM以多种形式可得，诸如静态RAM（SRAM）、动态RAM（DRAM）、同步DRAM（SDRAM）、双数据率SDRAM（DDRSDRAM）、增强型SDRAM（ESDRAM）、同步链路（Synchlink）DRAM（SLDRAM）、存储器总线（Rambus）直接RAM（RDRAM）、直接存储器总线动态RAM（DRDRAM）、以及存储器总线动态RAM（RDRAM）等。Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments may be implemented by a computer program instructing relevant hardware. The computer program may be stored in a non-volatile computer-readable storage medium, and when executed, the computer program may include the processes of the above method embodiments. Any reference to memory, storage, a database, or another medium used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or an external cache memory. By way of illustration and not limitation, RAM is available in various forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
所属领域的技术人员可以清楚地了解到，为了描述的方便和简洁，仅以上述各功能单元、模块的划分进行举例说明，实际应用中，可以根据需要而将上述功能分配由不同的功能单元、模块完成，即将所述装置的内部结构划分成不同的功能单元或模块，以完成以上描述的全部或者部分功能。Those skilled in the art can clearly understand that, for convenience and brevity of description, only the division into the above functional units and modules is used as an example. In practical applications, the above functions may be allocated to different functional units or modules as needed; that is, the internal structure of the device may be divided into different functional units or modules to complete all or part of the functions described above.
以上所述实施例仅用以说明本申请的技术方案，而非对其限制；尽管参照前述实施例对本申请进行了详细的说明，本领域的普通技术人员应当理解：其依然可以对前述各实施例所记载的技术方案进行修改，或者对其中部分技术特征进行等同替换；而这些修改或者替换，并不使相应技术方案的本质脱离本申请各实施例技术方案的精神和范围，均应包含在本申请的保护范围之内。The above embodiments are only intended to describe the technical solutions of this application, not to limit them. Although this application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of this application, and shall all fall within the protection scope of this application.

Claims (20)

  1. 一种中文模型训练方法,其特征在于,包括:A Chinese model training method, comprising:
    获取训练手写中文图像;Obtain training handwritten Chinese images;
    将所述训练手写中文图像按预设比例划分成训练集和测试集;Dividing the training handwritten Chinese image into a training set and a test set according to a preset ratio;
    对所述训练集中的训练手写中文图像进行顺序标注，并将标注好的训练手写中文图像输入到卷积神经网络-长短时记忆神经网络中进行训练，采用时序分类算法对所述卷积神经网络-长短时记忆神经网络的网络参数进行更新，获取原始手写字识别模型；Sequentially labeling the training handwritten Chinese images in the training set, inputting the labeled training handwritten Chinese images into a convolutional neural network-long short-term memory neural network for training, and updating network parameters of the convolutional neural network-long short-term memory neural network by using a time-series classification algorithm to obtain an original handwriting recognition model;
    采用所述测试集中的训练手写中文图像对所述原始手写字识别模型进行测试,在测试准确率大于预设准确率时,获取目标手写字识别模型。The original handwriting recognition model is tested using the trained handwritten Chinese images in the test set, and the target handwriting recognition model is obtained when the test accuracy is greater than a preset accuracy rate.
  2. 如权利要求1所述的中文模型训练方法，其特征在于，所述将标注好的训练手写中文图像输入到卷积神经网络-长短时记忆神经网络中进行训练，采用时序分类算法对所述卷积神经网络-长短时记忆神经网络的网络参数进行更新，获取原始手写字识别模型，包括：The Chinese model training method according to claim 1, wherein the inputting the labeled training handwritten Chinese images into a convolutional neural network-long short-term memory neural network for training, and updating network parameters of the convolutional neural network-long short-term memory neural network by using a time-series classification algorithm to obtain an original handwriting recognition model comprises:
    在卷积神经网络中对所述训练手写中文图像进行特征提取,获取中文图像特征;Performing feature extraction on the trained handwritten Chinese image in a convolutional neural network to obtain Chinese image features;
    在长短时记忆神经网络的隐藏层采用第一激活函数对所述中文图像特征进行处理,获取携带激活状态标识的神经元;Processing the Chinese image features using a first activation function in a hidden layer of the long-term and short-term memory neural network to obtain a neuron carrying an activation state identifier;
    在所述长短时记忆神经网络的隐藏层采用第二激活函数对所述携带激活状态标识的神经元进行处理,获取长短时记忆神经网络输出层的输出;Applying a second activation function to the neuron carrying the activation state identifier in the hidden layer of the long-term and short-term memory neural network to obtain the output of the long-term and short-term memory neural network output layer;
    根据所述长短时记忆神经网络输出层的输出,采用时序分类算法对所述卷积神经网络-长短时记忆神经网络的网络参数进行更新,获取所述目标手写字识别模型。According to the output of the long-short-term memory neural network output layer, a time series classification algorithm is used to update network parameters of the convolutional neural network-long-short-term memory neural network to obtain the target handwriting recognition model.
  3. 如权利要求2所述的中文模型训练方法，其特征在于，所述时序分类算法的公式具体为：E_loss = -ln ∏_{(x,z)∈S} p(z|x)，p(z|x) = a(t,u)b(t,u)，其中，p(z|x)表示输入所述中文图像特征x，在所述长短时记忆神经网络输出层的输出为z的概率，a(t,u)表示第t时刻第u个顺序标签对应的所述中文图像特征在长短时记忆神经网络隐藏层的前向输出，b(t,u)表示第t时刻第u个顺序标签对应的所述中文图像特征在长短时记忆神经网络隐藏层的后向输出。The Chinese model training method according to claim 2, wherein the formula of the time-series classification algorithm is E_loss = -ln ∏_{(x,z)∈S} p(z|x), with p(z|x) = a(t,u)b(t,u), where p(z|x) denotes the probability that the output of the output layer of the long short-term memory neural network is z given the input Chinese image feature x, a(t,u) denotes the forward output, in the hidden layer of the long short-term memory neural network, of the Chinese image feature corresponding to the u-th sequential label at time t, and b(t,u) denotes the backward output, in the hidden layer of the long short-term memory neural network, of the Chinese image feature corresponding to the u-th sequential label at time t.
  4. 一种中文图像识别方法，其特征在于，包括：A Chinese image recognition method, comprising:
    获取待识别中文图像,所述待识别中文图像包括手写汉字和背景图片;Obtaining a Chinese image to be identified, where the Chinese image to be identified includes handwritten Chinese characters and background pictures;
    对所述待识别中文图像进行预处理,获取原始图像;Preprocessing the Chinese image to be identified to obtain an original image;
    采用核密度估计算法对所述原始图像进行处理,去除所述背景图片,获取包括所述手写汉字的目标图像;Processing the original image using a kernel density estimation algorithm, removing the background picture, and obtaining a target image including the handwritten Chinese character;
    采用文字定位技术对所述目标图像进行文字定位,获取待识别文字区域;Text positioning the target image using text positioning technology to obtain the text area to be recognized;
    将待识别文字区域输入到目标手写字识别模型中进行识别，获取每一所述待识别文字区域对应的手写汉字；其中，目标手写字识别模型是采用权利要求1-3任意一项所述中文模型训练方法获取的。Inputting the to-be-recognized text regions into a target handwriting recognition model for recognition to obtain the handwritten Chinese character corresponding to each to-be-recognized text region, wherein the target handwriting recognition model is obtained by using the Chinese model training method according to any one of claims 1 to 3.
  5. 如权利要求4所述的中文图像识别方法,其特征在于,采用核密度估计算法对所述原始图像进行处理,获取保留所述手写汉字的目标图像,包括:The Chinese image recognition method according to claim 4, wherein processing the original image by using a kernel density estimation algorithm to obtain a target image retaining the handwritten Chinese characters comprises:
    对所述原始图像中的像素值进行统计,获取原始图像直方图;Performing statistics on pixel values in the original image to obtain a histogram of the original image;
    采用高斯核密度估算方法对所述原始图像直方图进行处理,获取与原始图像直方图对应的至少一个频率极大值和至少一个频率极小值;Processing the original image histogram using a Gaussian kernel density estimation method to obtain at least one frequency maximum and at least one frequency minimum corresponding to the original image histogram;
    基于所述频率极大值和频率极小值对所述原始图像进行分层切分处理,获取分层图像;Performing hierarchical segmentation processing on the original image based on the frequency maximum and frequency minimum to obtain a layered image;
    基于所述分层图像,获取包括所述手写汉字的目标图像。Based on the layered image, a target image including the handwritten Chinese character is acquired.
  6. 如权利要求5所述的中文图像识别方法,其特征在于,所述基于所述分层图像,获取包括所述手写汉字的目标图像,包括:The Chinese image recognition method according to claim 5, wherein the acquiring a target image including the handwritten Chinese character based on the layered image comprises:
    对所述分层图像进行二值化处理,获取二值化图像;Performing a binarization process on the layered image to obtain a binarized image;
    对所述二值化图像中的像素进行检测标记,获取所述二值化图像对应的连通区域;Detect and mark pixels in the binarized image to obtain a connected area corresponding to the binarized image;
    对所述二值化图像对应的连通区域进行腐蚀和叠加处理,获取所述包括手写汉字的目标图像。Eroding and superimposing the connected area corresponding to the binary image to obtain the target image including handwritten Chinese characters.
  7. 一种中文模型训练装置,其特征在于,包括:A Chinese model training device, comprising:
    训练手写中文图像获取模块,用于获取训练手写中文图像;Training handwritten Chinese image acquisition module for acquiring training handwritten Chinese images;
    训练手写中文图像划分模块,用于将所述训练手写中文图像按预设比例划分成训练集和测试集;A training handwritten Chinese image division module, configured to divide the trained handwritten Chinese image into a training set and a test set according to a preset ratio;
    原始手写字识别模型获取模块，用于对所述训练集中的训练手写中文图像进行顺序标注，并将标注好的训练手写中文图像输入到卷积神经网络-长短时记忆神经网络中进行训练，采用时序分类算法对所述卷积神经网络-长短时记忆神经网络的网络参数进行更新，获取原始手写字识别模型；An original handwriting recognition model acquisition module, configured to sequentially label the training handwritten Chinese images in the training set, input the labeled training handwritten Chinese images into a convolutional neural network-long short-term memory neural network for training, and update network parameters of the convolutional neural network-long short-term memory neural network by using a time-series classification algorithm to obtain an original handwriting recognition model;
    目标手写字识别模型获取模块,用于采用所述测试集中的训练手写中文图像对所述原始手写字识别模型进行测试,在测试准确率大于预设准确率时,获取目标手写字识别模型。A target handwriting recognition model acquisition module is used to test the original handwriting recognition model using the trained handwritten Chinese images in the test set, and obtain a target handwriting recognition model when the test accuracy rate is greater than a preset accuracy rate.
  8. 一种中文图像识别装置,其特征在于,包括:A Chinese image recognition device, comprising:
    待识别中文图像获取模块,用于获取待识别中文图像,所述待识别中文图像包括手写汉字和背景图片;A to-be-recognized Chinese image acquisition module, configured to obtain the to-be-recognized Chinese image, wherein the to-be-recognized Chinese image includes handwritten Chinese characters and a background picture;
    原始图像获取模块,用于对所述待识别中文图像进行预处理,获取原始图像;An original image acquisition module, configured to pre-process the Chinese image to be identified to obtain an original image;
    目标图像获取模块,用于采用核密度估计算法对所述原始图像进行处理,去除所述背景图片,获取包括所述手写汉字的目标图像;A target image acquisition module, configured to process the original image by using a kernel density estimation algorithm, remove the background picture, and obtain a target image including the handwritten Chinese character;
    待识别文字区域获取模块,用于采用文字定位技术对所述目标图像进行文字定位,获取待识别文字区域;A to-be-recognized text area acquisition module, configured to use the text positioning technology to perform text positioning on the target image to obtain the to-be-recognized text area;
    手写汉字获取模块,用于将待识别文字区域输入到目标手写字识别模型中进行识别,获取每一所述待识别文字区域对应的手写汉字。A handwritten Chinese character acquisition module is configured to input a text area to be recognized into a target handwriting recognition model for recognition, and obtain a handwritten Chinese character corresponding to each of the text area to be recognized.
  9. 一种计算机设备,包括存储器、处理器以及存储在所述存储器中并可在所述处理器上运行的计算机程序,其特征在于,所述处理器执行所述计算机程序时实现如下步骤:A computer device includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the following steps when the computer program is executed:
    获取训练手写中文图像;Obtain training handwritten Chinese images;
    将所述训练手写中文图像按预设比例划分成训练集和测试集;Dividing the training handwritten Chinese image into a training set and a test set according to a preset ratio;
    对所述训练集中的训练手写中文图像进行顺序标注，并将标注好的训练手写中文图像输入到卷积神经网络-长短时记忆神经网络中进行训练，采用时序分类算法对所述卷积神经网络-长短时记忆神经网络的网络参数进行更新，获取原始手写字识别模型；Sequentially labeling the training handwritten Chinese images in the training set, inputting the labeled training handwritten Chinese images into a convolutional neural network-long short-term memory neural network for training, and updating network parameters of the convolutional neural network-long short-term memory neural network by using a time-series classification algorithm to obtain an original handwriting recognition model;
    采用所述测试集中的训练手写中文图像对所述原始手写字识别模型进行测试,在测试准确率大于预设准确率时,获取目标手写字识别模型。The original handwriting recognition model is tested using the trained handwritten Chinese images in the test set, and the target handwriting recognition model is obtained when the test accuracy is greater than a preset accuracy rate.
  10. 如权利要求9所述的计算机设备，其特征在于，所述将标注好的训练手写中文图像输入到卷积神经网络-长短时记忆神经网络中进行训练，采用时序分类算法对所述卷积神经网络-长短时记忆神经网络的网络参数进行更新，获取原始手写字识别模型，包括：The computer device according to claim 9, wherein the inputting the labeled training handwritten Chinese images into a convolutional neural network-long short-term memory neural network for training, and updating network parameters of the convolutional neural network-long short-term memory neural network by using a time-series classification algorithm to obtain an original handwriting recognition model comprises:
    在卷积神经网络中对所述训练手写中文图像进行特征提取,获取中文图像特征;Performing feature extraction on the trained handwritten Chinese image in a convolutional neural network to obtain Chinese image features;
    在长短时记忆神经网络的隐藏层采用第一激活函数对所述中文图像特征进行处理,获取携带激活状态标识的神经元;Processing the Chinese image features using a first activation function in a hidden layer of the long-term and short-term memory neural network to obtain a neuron carrying an activation state identifier;
    在所述长短时记忆神经网络的隐藏层采用第二激活函数对所述携带激活状态标识的神经元进行处理,获取长短时记忆神经网络输出层的输出;Applying a second activation function to the neuron carrying the activation state identifier in the hidden layer of the long-term and short-term memory neural network to obtain the output of the long-term and short-term memory neural network output layer;
    根据所述长短时记忆神经网络输出层的输出,采用时序分类算法对所述卷积神经网络-长短时记忆神经网络的网络参数进行更新,获取所述目标手写字识别模型。According to the output of the long-short-term memory neural network output layer, a time series classification algorithm is used to update network parameters of the convolutional neural network-long-short-term memory neural network to obtain the target handwriting recognition model.
  11. 如权利要求10所述的计算机设备，其特征在于，所述时序分类算法的公式具体为：E_loss = -ln ∏_{(x,z)∈S} p(z|x)，p(z|x) = a(t,u)b(t,u)，其中，p(z|x)表示输入所述中文图像特征x，在所述长短时记忆神经网络输出层的输出为z的概率，a(t,u)表示第t时刻第u个顺序标签对应的所述中文图像特征在长短时记忆神经网络隐藏层的前向输出，b(t,u)表示第t时刻第u个顺序标签对应的所述中文图像特征在长短时记忆神经网络隐藏层的后向输出。The computer device according to claim 10, wherein the formula of the time-series classification algorithm is E_loss = -ln ∏_{(x,z)∈S} p(z|x), with p(z|x) = a(t,u)b(t,u), where p(z|x) denotes the probability that the output of the output layer of the long short-term memory neural network is z given the input Chinese image feature x, a(t,u) denotes the forward output, in the hidden layer of the long short-term memory neural network, of the Chinese image feature corresponding to the u-th sequential label at time t, and b(t,u) denotes the backward output, in the hidden layer of the long short-term memory neural network, of the Chinese image feature corresponding to the u-th sequential label at time t.
  12. 一种计算机设备,包括存储器、处理器以及存储在所述存储器中并可在所述处理器上运行的计算机程序,其特征在于,所述处理器执行所述计算机程序时实现如下步骤:A computer device includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the following steps when the computer program is executed:
    获取待识别中文图像,所述待识别中文图像包括手写汉字和背景图片;Obtaining a Chinese image to be identified, where the Chinese image to be identified includes handwritten Chinese characters and background pictures;
    对所述待识别中文图像进行预处理,获取原始图像;Preprocessing the Chinese image to be identified to obtain an original image;
    采用核密度估计算法对所述原始图像进行处理,去除所述背景图片,获取包括所述手写汉字的目标图像;Processing the original image using a kernel density estimation algorithm, removing the background picture, and obtaining a target image including the handwritten Chinese character;
    采用文字定位技术对所述目标图像进行文字定位,获取待识别文字区域;Text positioning the target image using text positioning technology to obtain the text area to be recognized;
    将待识别文字区域输入到目标手写字识别模型中进行识别，获取每一所述待识别文字区域对应的手写汉字；其中，目标手写字识别模型是采用权利要求1-3任意一项所述中文模型训练方法获取的。Inputting the to-be-recognized text regions into a target handwriting recognition model for recognition to obtain the handwritten Chinese character corresponding to each to-be-recognized text region, wherein the target handwriting recognition model is obtained by using the Chinese model training method according to any one of claims 1 to 3.
  13. 如权利要求12所述的计算机设备,其特征在于,采用核密度估计算法对所述原始图像进行处理,获取保留所述手写汉字的目标图像,包括:The computer device according to claim 12, wherein processing the original image using a kernel density estimation algorithm to obtain a target image retaining the handwritten Chinese characters comprises:
    对所述原始图像中的像素值进行统计,获取原始图像直方图;Performing statistics on pixel values in the original image to obtain a histogram of the original image;
    采用高斯核密度估算方法对所述原始图像直方图进行处理,获取与原始图像直方图对应的至少一个频率极大值和至少一个频率极小值;Processing the original image histogram using a Gaussian kernel density estimation method to obtain at least one frequency maximum and at least one frequency minimum corresponding to the original image histogram;
    基于所述频率极大值和频率极小值对所述原始图像进行分层切分处理,获取分层图像;Performing hierarchical segmentation processing on the original image based on the frequency maximum and frequency minimum to obtain a layered image;
    基于所述分层图像,获取包括所述手写汉字的目标图像。Based on the layered image, a target image including the handwritten Chinese character is acquired.
  14. 如权利要求13所述的计算机设备,其特征在于,所述基于所述分层图像,获取包括所述手写汉字的目标图像,包括:The computer device according to claim 13, wherein the acquiring a target image including the handwritten Chinese character based on the layered image comprises:
    对所述分层图像进行二值化处理,获取二值化图像;Performing a binarization process on the layered image to obtain a binarized image;
    对所述二值化图像中的像素进行检测标记,获取所述二值化图像对应的连通区域;Detect and mark pixels in the binarized image to obtain a connected area corresponding to the binarized image;
    对所述二值化图像对应的连通区域进行腐蚀和叠加处理,获取所述包括手写汉字的目标图像。Eroding and superimposing the connected area corresponding to the binary image to obtain the target image including handwritten Chinese characters.
  15. 一个或多个存储有计算机可读指令的非易失性可读存储介质，其特征在于，所述计算机可读指令被一个或多个处理器执行时，使得所述一个或多个处理器执行如下步骤：One or more non-volatile readable storage media storing computer-readable instructions, wherein, when the computer-readable instructions are executed by one or more processors, the one or more processors are caused to perform the following steps:
    获取训练手写中文图像;Obtain training handwritten Chinese images;
    将所述训练手写中文图像按预设比例划分成训练集和测试集;Dividing the training handwritten Chinese image into a training set and a test set according to a preset ratio;
    对所述训练集中的训练手写中文图像进行顺序标注，并将标注好的训练手写中文图像输入到卷积神经网络-长短时记忆神经网络中进行训练，采用时序分类算法对所述卷积神经网络-长短时记忆神经网络的网络参数进行更新，获取原始手写字识别模型；Sequentially labeling the training handwritten Chinese images in the training set, inputting the labeled training handwritten Chinese images into a convolutional neural network-long short-term memory neural network for training, and updating network parameters of the convolutional neural network-long short-term memory neural network by using a time-series classification algorithm to obtain an original handwriting recognition model;
    采用所述测试集中的训练手写中文图像对所述原始手写字识别模型进行测试,在测试准确率大于预设准确率时,获取目标手写字识别模型。The original handwriting recognition model is tested using the trained handwritten Chinese images in the test set, and the target handwriting recognition model is obtained when the test accuracy is greater than a preset accuracy rate.
  16. 如权利要求15所述的非易失性可读存储介质，其特征在于，所述将标注好的训练手写中文图像输入到卷积神经网络-长短时记忆神经网络中进行训练，采用时序分类算法对所述卷积神经网络-长短时记忆神经网络的网络参数进行更新，获取原始手写字识别模型，包括：The non-volatile readable storage medium according to claim 15, wherein the inputting the labeled training handwritten Chinese images into a convolutional neural network-long short-term memory neural network for training, and updating network parameters of the convolutional neural network-long short-term memory neural network by using a time-series classification algorithm to obtain an original handwriting recognition model comprises:
    在卷积神经网络中对所述训练手写中文图像进行特征提取,获取中文图像特征;Performing feature extraction on the trained handwritten Chinese image in a convolutional neural network to obtain Chinese image features;
    在长短时记忆神经网络的隐藏层采用第一激活函数对所述中文图像特征进行处理,获取携带激活状态标识的神经元;Processing the Chinese image features using a first activation function in a hidden layer of the long-term and short-term memory neural network to obtain a neuron carrying an activation state identifier;
    在所述长短时记忆神经网络的隐藏层采用第二激活函数对所述携带激活状态标识的神经元进行处理,获取长短时记忆神经网络输出层的输出;Applying a second activation function to the neuron carrying the activation state identifier in the hidden layer of the long-term and short-term memory neural network to obtain the output of the long-term and short-term memory neural network output layer;
    根据所述长短时记忆神经网络输出层的输出,采用时序分类算法对所述卷积神经网络-长短时记忆神经网络的网络参数进行更新,获取所述目标手写字识别模型。According to the output of the long-short-term memory neural network output layer, a time series classification algorithm is used to update network parameters of the convolutional neural network-long-short-term memory neural network to obtain the target handwriting recognition model.
  17. 如权利要求16所述的非易失性可读存储介质，其特征在于，所述时序分类算法的公式具体为：E_loss = -ln ∏_{(x,z)∈S} p(z|x)，p(z|x) = a(t,u)b(t,u)，其中，p(z|x)表示输入所述中文图像特征x，在所述长短时记忆神经网络输出层的输出为z的概率，a(t,u)表示第t时刻第u个顺序标签对应的所述中文图像特征在长短时记忆神经网络隐藏层的前向输出，b(t,u)表示第t时刻第u个顺序标签对应的所述中文图像特征在长短时记忆神经网络隐藏层的后向输出。The non-volatile readable storage medium according to claim 16, wherein the formula of the time-series classification algorithm is E_loss = -ln ∏_{(x,z)∈S} p(z|x), with p(z|x) = a(t,u)b(t,u), where p(z|x) denotes the probability that the output of the output layer of the long short-term memory neural network is z given the input Chinese image feature x, a(t,u) denotes the forward output, in the hidden layer of the long short-term memory neural network, of the Chinese image feature corresponding to the u-th sequential label at time t, and b(t,u) denotes the backward output, in the hidden layer of the long short-term memory neural network, of the Chinese image feature corresponding to the u-th sequential label at time t.
  18. 一个或多个存储有计算机可读指令的非易失性可读存储介质，其特征在于，所述计算机可读指令被一个或多个处理器执行时，使得所述一个或多个处理器执行如下步骤：One or more non-volatile readable storage media storing computer-readable instructions, wherein, when the computer-readable instructions are executed by one or more processors, the one or more processors are caused to perform the following steps:
    获取待识别中文图像,所述待识别中文图像包括手写汉字和背景图片;Obtaining a Chinese image to be identified, where the Chinese image to be identified includes handwritten Chinese characters and background pictures;
    对所述待识别中文图像进行预处理,获取原始图像;Preprocessing the Chinese image to be identified to obtain an original image;
    采用核密度估计算法对所述原始图像进行处理,去除所述背景图片,获取包括所述手写汉字的目标图像;Processing the original image using a kernel density estimation algorithm, removing the background picture, and obtaining a target image including the handwritten Chinese character;
    采用文字定位技术对所述目标图像进行文字定位,获取待识别文字区域;Text positioning the target image using text positioning technology to obtain the text area to be recognized;
    将待识别文字区域输入到目标手写字识别模型中进行识别，获取每一所述待识别文字区域对应的手写汉字；其中，目标手写字识别模型是采用权利要求1-3任意一项所述中文模型训练方法获取的。Inputting the to-be-recognized text regions into a target handwriting recognition model for recognition to obtain the handwritten Chinese character corresponding to each to-be-recognized text region, wherein the target handwriting recognition model is obtained by using the Chinese model training method according to any one of claims 1 to 3.
  19. 如权利要求18所述的非易失性可读存储介质,其特征在于,采用核密度估计算法对所述原始图像进行处理,获取保留所述手写汉字的目标图像,包括:The non-volatile readable storage medium of claim 18, wherein processing the original image using a kernel density estimation algorithm to obtain a target image retaining the handwritten Chinese characters comprises:
    对所述原始图像中的像素值进行统计,获取原始图像直方图;Performing statistics on pixel values in the original image to obtain a histogram of the original image;
    采用高斯核密度估算方法对所述原始图像直方图进行处理,获取与原始图像直方图对应的至少一个频率极大值和至少一个频率极小值;Processing the original image histogram using a Gaussian kernel density estimation method to obtain at least one frequency maximum and at least one frequency minimum corresponding to the original image histogram;
    基于所述频率极大值和频率极小值对所述原始图像进行分层切分处理,获取分层图像;Performing hierarchical segmentation processing on the original image based on the frequency maximum and frequency minimum to obtain a layered image;
    基于所述分层图像,获取包括所述手写汉字的目标图像。Based on the layered image, a target image including the handwritten Chinese character is acquired.
  20. 如权利要求19所述的非易失性可读存储介质,其特征在于,所述基于所述分层图像,获取包括所述手写汉字的目标图像,包括:The non-volatile readable storage medium according to claim 19, wherein the acquiring a target image including the handwritten Chinese character based on the layered image comprises:
    对所述分层图像进行二值化处理,获取二值化图像;Performing a binarization process on the layered image to obtain a binarized image;
    对所述二值化图像中的像素进行检测标记,获取所述二值化图像对应的连通区域;Detect and mark pixels in the binarized image to obtain a connected area corresponding to the binarized image;
    对所述二值化图像对应的连通区域进行腐蚀和叠加处理,获取所述包括手写汉字的目标图像。Eroding and superimposing the connected area corresponding to the binary image to obtain the target image including handwritten Chinese characters.
PCT/CN2018/094235 2018-06-04 2018-07-03 Chinese model training method, chinese image recognition method, device, apparatus and medium WO2019232853A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810563508.0 2018-06-04
CN201810563508.0A CN109102037B (en) 2018-06-04 2018-06-04 Chinese model training and Chinese image recognition method, device, equipment and medium

Publications (1)

Publication Number Publication Date
WO2019232853A1 true WO2019232853A1 (en) 2019-12-12

Family

ID=64796652

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/094235 WO2019232853A1 (en) 2018-06-04 2018-07-03 Chinese model training method, chinese image recognition method, device, apparatus and medium

Country Status (2)

Country Link
CN (1) CN109102037B (en)
WO (1) WO2019232853A1 (en)

Cited By (33)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111190576A (en) * 2019-12-17 2020-05-22 平安医疗健康管理股份有限公司 Character recognition-based component set display method and device and computer equipment
CN111275120A (en) * 2020-01-22 2020-06-12 支付宝(杭州)信息技术有限公司 Training method and device of image recognition model, and image recognition method and device
CN111291758A (en) * 2020-02-17 2020-06-16 北京百度网讯科技有限公司 Method and device for identifying characters of seal
CN111310846A (en) * 2020-02-28 2020-06-19 平安科技(深圳)有限公司 Method, device, storage medium and server for selecting sample image
CN111310868A (en) * 2020-03-13 2020-06-19 厦门大学 Water-based handwritten character recognition method based on convolutional neural network
CN111401363A (en) * 2020-03-12 2020-07-10 上海眼控科技股份有限公司 Frame number image generation method and device, computer equipment and storage medium
CN111401375A (en) * 2020-03-09 2020-07-10 苏宁云计算有限公司 Text recognition model training method, text recognition device and text recognition equipment
CN111507929A (en) * 2020-04-15 2020-08-07 上海眼控科技股份有限公司 Meteorological cloud picture prediction method and device, computer equipment and storage medium
CN111738141A (en) * 2020-06-19 2020-10-02 首都师范大学 Hard-tipped writing calligraphy work judging method
CN111814539A (en) * 2020-05-28 2020-10-23 平安科技(深圳)有限公司 Character recognition method and device based on infrared light and ultraviolet light and computer equipment
CN111861990A (en) * 2020-06-10 2020-10-30 宜通世纪物联网研究院(广州)有限公司 Method, system and storage medium for detecting bad appearance of product
CN111860682A (en) * 2020-07-30 2020-10-30 上海高德威智能交通系统有限公司 Sequence identification method, sequence identification device, image processing equipment and storage medium
CN111881727A (en) * 2020-06-16 2020-11-03 深圳数联天下智能科技有限公司 Live body discrimination method, device and equipment based on thermal imaging and storage medium
CN112001482A (en) * 2020-08-14 2020-11-27 佳都新太科技股份有限公司 Vibration prediction and model training method and device, computer equipment and storage medium
CN112101344A (en) * 2020-08-25 2020-12-18 腾讯科技(深圳)有限公司 Video text tracking method and device
CN112183335A (en) * 2020-09-28 2021-01-05 中国人民大学 Handwritten image recognition method and system based on unsupervised learning
CN112241994A (en) * 2020-09-28 2021-01-19 北京迈格威科技有限公司 Model training method, rendering device, electronic equipment and storage medium
CN112580623A (en) * 2020-12-25 2021-03-30 北京百度网讯科技有限公司 Image generation method, model training method, related device and electronic equipment
CN112732943A (en) * 2021-01-20 2021-04-30 北京大学 Chinese character library automatic generation method and system based on reinforcement learning
CN112784845A (en) * 2021-01-12 2021-05-11 安徽淘云科技有限公司 Handwritten character detection method, electronic equipment and storage device
CN112801085A (en) * 2021-02-09 2021-05-14 沈阳麟龙科技股份有限公司 Method, device, medium and electronic equipment for recognizing characters in image
CN113204984A (en) * 2020-10-10 2021-08-03 河南中医药大学 Traditional Chinese medicine handwritten prescription identification method under small amount of labeled data
CN113269045A (en) * 2021-04-28 2021-08-17 南京大学 Chinese artistic word detection and recognition method under natural scene
CN113362249A (en) * 2021-06-24 2021-09-07 平安普惠企业管理有限公司 Text image synthesis method and device, computer equipment and storage medium
CN113378609A (en) * 2020-03-10 2021-09-10 中国移动通信集团辽宁有限公司 Method and device for identifying agent signature
CN113436222A (en) * 2021-05-31 2021-09-24 新东方教育科技集团有限公司 Image processing method, image processing apparatus, electronic device, and storage medium
CN113505784A (en) * 2021-06-11 2021-10-15 清华大学 Automatic nail annotation analysis method and device, electronic equipment and storage medium
CN113792723A (en) * 2021-09-08 2021-12-14 浙江力石科技股份有限公司 Optimization method and system for litho character recognition
CN114140796A (en) * 2021-11-30 2022-03-04 马鞍山学院 Shaft part surface character visual identification method based on linear array camera
CN114399772A (en) * 2021-12-20 2022-04-26 北京百度网讯科技有限公司 Sample generation, model training and trajectory recognition methods, devices, equipment and medium
CN114549296A (en) * 2022-04-21 2022-05-27 北京世纪好未来教育科技有限公司 Training method of image processing model, image processing method and electronic equipment
CN115424274A (en) * 2022-09-01 2022-12-02 中国海洋大学 Sea-wading picture recognition method and system based on computer vision
CN117218667A (en) * 2023-11-07 2023-12-12 华侨大学 Chinese character recognition method and system based on character roots

Families Citing this family (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109840524B (en) * 2019-01-04 2023-07-11 平安科技(深圳)有限公司 Text type recognition method, device, equipment and storage medium
CN109858409A (en) * 2019-01-18 2019-06-07 深圳壹账通智能科技有限公司 Manual figure conversion method, device, equipment and medium
CN111488877A (en) * 2019-01-29 2020-08-04 北京新唐思创教育科技有限公司 OCR recognition method, device and terminal for teaching system
CN109902678A (en) * 2019-02-12 2019-06-18 北京奇艺世纪科技有限公司 Model training method, character recognition method, device, electronic equipment and computer-readable medium
CN111626313B (en) * 2019-02-28 2023-06-02 银河水滴科技(北京)有限公司 Feature extraction model training method, image processing method and device
CN110110585B (en) * 2019-03-15 2023-05-30 西安电子科技大学 Intelligent paper reading implementation method and system based on deep learning and computer program
CN110162459A (en) * 2019-04-15 2019-08-23 深圳壹账通智能科技有限公司 Test cases generation method, device and computer readable storage medium
CN110210297B (en) * 2019-04-25 2023-12-26 上海海事大学 Method for locating and extracting Chinese characters in customs clearance image
CN112183563A (en) * 2019-07-01 2021-01-05 Tcl集团股份有限公司 Image recognition model generation method, storage medium and application server
CN112307820B (en) * 2019-07-29 2022-03-22 北京易真学思教育科技有限公司 Text recognition method, device, equipment and computer readable medium
CN110751034B (en) * 2019-09-16 2023-09-01 平安科技(深圳)有限公司 Pedestrian behavior recognition method and terminal equipment
CN111078073B (en) * 2019-12-17 2021-03-23 科大讯飞股份有限公司 Handwriting amplification method and related device
CN111368632A (en) * 2019-12-27 2020-07-03 上海眼控科技股份有限公司 Signature identification method and device
CN111310808B (en) * 2020-02-03 2024-03-22 平安科技(深圳)有限公司 Training method and device for picture recognition model, computer system and storage medium
CN111414916B (en) * 2020-02-29 2024-05-31 中国平安财产保险股份有限公司 Method and device for extracting and generating text content in image and readable storage medium
CN111898603A (en) * 2020-08-10 2020-11-06 上海瑞美锦鑫健康管理有限公司 Physical examination order recognition method and system based on deep neural network
CN112149678A (en) * 2020-09-17 2020-12-29 支付宝实验室(新加坡)有限公司 Character recognition method and device for special language and recognition model training method and device
CN112132050B (en) * 2020-09-24 2024-03-29 北京计算机技术及应用研究所 On-line handwritten Chinese character recognition algorithm and visual key stroke evaluation method
CN112990220B (en) * 2021-04-19 2022-08-05 烟台中科网络技术研究所 Intelligent identification method and system for target text in image
CN113361666B (en) * 2021-06-15 2023-10-10 浪潮金融信息技术有限公司 Handwritten character recognition method, system and medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100056457A1 (en) * 2005-08-11 2010-03-04 Barbas Iii Carlos F Zinc Finger Binding Domains for CNN
CN105574513A (en) * 2015-12-22 2016-05-11 北京旷视科技有限公司 Character detection method and device
CN107798327A (en) * 2017-10-31 2018-03-13 北京小米移动软件有限公司 Character identifying method and device
CN107832400A (en) * 2017-11-01 2018-03-23 山东大学 A kind of method that location-based LSTM and CNN conjunctive models carry out relation classification

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105184226A (en) * 2015-08-11 2015-12-23 北京新晨阳光科技有限公司 Digital identification method, digital identification device, neural network training method and neural network training device
CN106408038A (en) * 2016-09-09 2017-02-15 华南理工大学 Rotary Chinese character identifying method based on convolution neural network model
CN106531157B (en) * 2016-10-28 2019-10-22 中国科学院自动化研究所 Regularization accent adaptive approach in speech recognition

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100056457A1 (en) * 2005-08-11 2010-03-04 Barbas Iii Carlos F Zinc Finger Binding Domains for CNN
CN105574513A (en) * 2015-12-22 2016-05-11 北京旷视科技有限公司 Character detection method and device
CN107798327A (en) * 2017-10-31 2018-03-13 北京小米移动软件有限公司 Character identifying method and device
CN107832400A (en) * 2017-11-01 2018-03-23 山东大学 A kind of method that location-based LSTM and CNN conjunctive models carry out relation classification

Cited By (49)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111190576A (en) * 2019-12-17 2020-05-22 平安医疗健康管理股份有限公司 Character recognition-based component set display method and device and computer equipment
CN111190576B (en) * 2019-12-17 2022-09-23 深圳平安医疗健康科技服务有限公司 Character recognition-based component set display method and device and computer equipment
CN111275120B (en) * 2020-01-22 2022-07-26 支付宝(杭州)信息技术有限公司 Training method and device of image recognition model and image recognition method and device
CN111275120A (en) * 2020-01-22 2020-06-12 支付宝(杭州)信息技术有限公司 Training method and device of image recognition model, and image recognition method and device
CN111291758A (en) * 2020-02-17 2020-06-16 北京百度网讯科技有限公司 Method and device for identifying characters of seal
CN111310846A (en) * 2020-02-28 2020-06-19 平安科技(深圳)有限公司 Method, device, storage medium and server for selecting sample image
CN111401375A (en) * 2020-03-09 2020-07-10 苏宁云计算有限公司 Text recognition model training method, text recognition device and text recognition equipment
CN113378609B (en) * 2020-03-10 2023-07-21 中国移动通信集团辽宁有限公司 Agent proxy signature identification method and device
CN113378609A (en) * 2020-03-10 2021-09-10 中国移动通信集团辽宁有限公司 Method and device for identifying agent signature
CN111401363A (en) * 2020-03-12 2020-07-10 上海眼控科技股份有限公司 Frame number image generation method and device, computer equipment and storage medium
CN111310868A (en) * 2020-03-13 2020-06-19 厦门大学 Water-based handwritten character recognition method based on convolutional neural network
CN111507929A (en) * 2020-04-15 2020-08-07 上海眼控科技股份有限公司 Meteorological cloud picture prediction method and device, computer equipment and storage medium
CN111814539A (en) * 2020-05-28 2020-10-23 平安科技(深圳)有限公司 Character recognition method and device based on infrared light and ultraviolet light and computer equipment
CN111814539B (en) * 2020-05-28 2023-07-21 平安科技(深圳)有限公司 Character recognition method and device based on infrared light and ultraviolet light and computer equipment
CN111861990A (en) * 2020-06-10 2020-10-30 宜通世纪物联网研究院(广州)有限公司 Method, system and storage medium for detecting bad appearance of product
CN111861990B (en) * 2020-06-10 2024-02-13 广东宜通联云智能信息有限公司 Method, system and storage medium for detecting bad appearance of product
CN111881727A (en) * 2020-06-16 2020-11-03 深圳数联天下智能科技有限公司 Live body discrimination method, device and equipment based on thermal imaging and storage medium
CN111881727B (en) * 2020-06-16 2024-02-06 深圳数联天下智能科技有限公司 Living body screening method, device, equipment and storage medium based on thermal imaging
CN111738141A (en) * 2020-06-19 2020-10-02 首都师范大学 Hard-tipped writing calligraphy work judging method
CN111738141B (en) * 2020-06-19 2023-07-07 首都师范大学 Hard-tipped pen calligraphy work judging method
CN111860682A (en) * 2020-07-30 2020-10-30 上海高德威智能交通系统有限公司 Sequence identification method, sequence identification device, image processing equipment and storage medium
CN112001482B (en) * 2020-08-14 2024-05-24 佳都科技集团股份有限公司 Vibration prediction and model training method, device, computer equipment and storage medium
CN112001482A (en) * 2020-08-14 2020-11-27 佳都新太科技股份有限公司 Vibration prediction and model training method and device, computer equipment and storage medium
CN112101344A (en) * 2020-08-25 2020-12-18 腾讯科技(深圳)有限公司 Video text tracking method and device
CN112241994B (en) * 2020-09-28 2024-05-31 爱芯元智半导体股份有限公司 Model training method, rendering method, device, electronic equipment and storage medium
CN112241994A (en) * 2020-09-28 2021-01-19 北京迈格威科技有限公司 Model training method, rendering device, electronic equipment and storage medium
CN112183335A (en) * 2020-09-28 2021-01-05 中国人民大学 Handwritten image recognition method and system based on unsupervised learning
CN113204984A (en) * 2020-10-10 2021-08-03 河南中医药大学 Traditional Chinese medicine handwritten prescription identification method under small amount of labeled data
CN112580623A (en) * 2020-12-25 2021-03-30 北京百度网讯科技有限公司 Image generation method, model training method, related device and electronic equipment
CN112580623B (en) * 2020-12-25 2023-07-25 北京百度网讯科技有限公司 Image generation method, model training method, related device and electronic equipment
CN112784845A (en) * 2021-01-12 2021-05-11 安徽淘云科技有限公司 Handwritten character detection method, electronic equipment and storage device
CN112732943A (en) * 2021-01-20 2021-04-30 北京大学 Chinese character library automatic generation method and system based on reinforcement learning
CN112732943B (en) * 2021-01-20 2023-09-22 北京大学 Chinese character library automatic generation method and system based on reinforcement learning
CN112801085A (en) * 2021-02-09 2021-05-14 沈阳麟龙科技股份有限公司 Method, device, medium and electronic equipment for recognizing characters in image
CN113269045A (en) * 2021-04-28 2021-08-17 南京大学 Chinese artistic word detection and recognition method under natural scene
CN113436222A (en) * 2021-05-31 2021-09-24 新东方教育科技集团有限公司 Image processing method, image processing apparatus, electronic device, and storage medium
CN113505784A (en) * 2021-06-11 2021-10-15 清华大学 Automatic nail annotation analysis method and device, electronic equipment and storage medium
CN113362249A (en) * 2021-06-24 2021-09-07 平安普惠企业管理有限公司 Text image synthesis method and device, computer equipment and storage medium
CN113362249B (en) * 2021-06-24 2023-11-24 广州云智达创科技有限公司 Text image synthesis method, text image synthesis device, computer equipment and storage medium
CN113792723A (en) * 2021-09-08 2021-12-14 浙江力石科技股份有限公司 Optimization method and system for litho character recognition
CN113792723B (en) * 2021-09-08 2024-01-16 浙江力石科技股份有限公司 Optimization method and system for identifying stone carving characters
CN114140796A (en) * 2021-11-30 2022-03-04 马鞍山学院 Shaft part surface character visual identification method based on linear array camera
CN114399772A (en) * 2021-12-20 2022-04-26 北京百度网讯科技有限公司 Sample generation, model training and trajectory recognition methods, devices, equipment and medium
CN114399772B (en) * 2021-12-20 2024-02-27 北京百度网讯科技有限公司 Sample generation, model training and track recognition methods, devices, equipment and media
CN114549296B (en) * 2022-04-21 2022-07-12 北京世纪好未来教育科技有限公司 Training method of image processing model, image processing method and electronic equipment
CN114549296A (en) * 2022-04-21 2022-05-27 北京世纪好未来教育科技有限公司 Training method of image processing model, image processing method and electronic equipment
CN115424274A (en) * 2022-09-01 2022-12-02 中国海洋大学 Sea-wading picture recognition method and system based on computer vision
CN117218667A (en) * 2023-11-07 2023-12-12 华侨大学 Chinese character recognition method and system based on character roots
CN117218667B (en) * 2023-11-07 2024-03-08 华侨大学 Chinese character recognition method and system based on character roots

Also Published As

Publication number Publication date
CN109102037B (en) 2024-03-05
CN109102037A (en) 2018-12-28

Similar Documents

Publication Publication Date Title
WO2019232853A1 (en) Chinese model training method, chinese image recognition method, device, apparatus and medium
CN108710866B (en) Chinese character model training method, chinese character recognition method, device, equipment and medium
WO2019232843A1 (en) Handwritten model training method and apparatus, handwritten image recognition method and apparatus, and device and medium
WO2021120752A1 (en) Region-based self-adaptive model training method and device, image detection method and device, and apparatus and medium
WO2019232852A1 (en) Handwriting training sample obtaining method and apparatus, and device and medium
WO2019232873A1 (en) Character model training method, character recognition method, apparatuses, device and medium
WO2019232872A1 (en) Handwritten character model training method, chinese character recognition method, apparatus, device, and medium
WO2019232850A1 (en) Method and apparatus for recognizing handwritten chinese character image, computer device, and storage medium
Goodfellow et al. Multi-digit number recognition from street view imagery using deep convolutional neural networks
US9367766B2 (en) Text line detection in images
CN110838126B (en) Cell image segmentation method, cell image segmentation device, computer equipment and storage medium
WO2019232849A1 (en) Chinese character model training method, handwritten character recognition method, apparatuses, device and medium
Zhang et al. Road recognition from remote sensing imagery using incremental learning
Vanetti et al. Gas meter reading from real world images using a multi-net system
WO2019232870A1 (en) Method for acquiring handwritten character training sample, apparatus, computer device, and storage medium
US11144799B2 (en) Image classification method, computer device and medium
CN116596875B (en) Wafer defect detection method and device, electronic equipment and storage medium
CN109189965A (en) Pictograph search method and system
CN113780145A (en) Sperm morphology detection method, sperm morphology detection device, computer equipment and storage medium
He et al. Aggregating local context for accurate scene text detection
Yang et al. An improved algorithm for the detection of fastening targets based on machine vision
CN115908363B (en) Tumor cell statistics method, device, equipment and storage medium
CN117115824A (en) Visual text detection method based on stroke region segmentation strategy
Li et al. A pre-training strategy for convolutional neural network applied to Chinese digital gesture recognition
KR20190093752A (en) Method and system for scene text detection using deep learning

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18922025

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 11/03/2021)

122 Ep: pct application non-entry in european phase

Ref document number: 18922025

Country of ref document: EP

Kind code of ref document: A1