WO2019232850A1

WO2019232850A1 - Method and apparatus for recognizing handwritten Chinese character image, computer device, and storage medium

Info

Publication number: WO2019232850A1
Application number: PCT/CN2018/094222
Authority: WO
Inventors: 高梁梁; 周罡
Original assignee: 平安科技（深圳）有限公司
Priority date: 2018-06-04
Filing date: 2018-07-03
Publication date: 2019-12-12
Also published as: CN109002756A

Abstract

A method and apparatus for recognizing a handwritten Chinese character image, a computer device, and a storage medium. The method for recognizing a handwritten Chinese character image comprises: obtaining an original image, the original image comprising a handwritten Chinese character and a background picture (S10); pre-processing the original image to obtain a valid image (S20); processing the valid image using a kernel density estimation algorithm, and removing the background picture to obtain a target image comprising the handwritten Chinese character (S30); performing single character-based cutting on the target image using a vertical projection method to obtain a single character image to be recognized (S40); and inputting the single character image to be recognized to a target handwritten character recognition model of a long short-term memory neural network for recognition to obtain a handwritten Chinese character corresponding to the single character image to be recognized (S50). The method for recognizing a handwritten Chinese character image can effectively recognize similar Chinese characters having complex structures, thereby improving the accuracy of the recognition of a handwritten character image.

Description

Handwritten Chinese character image recognition method and device, computer equipment and storage medium

This patent application is based on a Chinese invention patent application with application number 201810564691.6 filed on June 4, 2018 and entitled "Method, Device, Computer Equipment, and Storage Medium for Handwritten Chinese Character Recognition", and claims its priority.

Technical field

The present application relates to the field of image recognition, and in particular, to a method, a device, a computer device, and a storage medium for recognizing a handwritten Chinese character image.

Background art

Chinese characters appear in many typefaces, such as Songti, Kaiti, Yaoti, and Fangsong (imitation Song). Some Chinese characters have relatively complicated structures, such as "魑" and "魅", and many Chinese characters are structurally similar to one another, such as "受" (accept) and "爱" (love). Standardized printed text can be recognized with OCR (optical character recognition) technology. Handwritten text is harder: writing habits differ from person to person, and handwritten characters are not the standard forms built from regular horizontal, vertical, and left-falling strokes, so OCR recognition is often inaccurate. Accuracy drops further for similar-looking characters that are not composed of simple strokes, which impairs the recognition of handwritten Chinese characters.

Summary of the Invention

Based on this, it is necessary to provide a method, device, computer equipment, and storage medium for handwritten Chinese character image recognition in response to the above technical problems.

A handwritten Chinese character image recognition method includes:

Obtaining an original image, the original image including handwritten Chinese characters and a background picture;

Preprocessing the original image to obtain a valid image;

Adopting a kernel density estimation algorithm and processing the effective image to remove the background image and obtain a target image including the handwritten Chinese character;

Performing single-character cutting on the target image using a vertical projection method to obtain a single-character image to be recognized;

Inputting the single-character image to be recognized into a target handwriting recognition model based on a long short-term memory (LSTM) neural network for recognition, to obtain the handwritten Chinese character corresponding to the single-character image to be recognized.

A handwritten Chinese character image recognition device includes:

An original image acquisition module, configured to acquire an original image, where the original image includes handwritten Chinese characters and a background picture;

An effective image acquisition module, configured to pre-process the original image to obtain an effective image;

A target image acquisition module, configured to process the effective image by using a kernel density estimation algorithm to acquire a target image in which the handwritten Chinese character is retained;

A to-be-recognized single-character image acquisition module, configured to perform single-character cutting on the target image using a vertical projection method to obtain a single-character image to be recognized;

A handwritten Chinese character acquisition module, configured to input the single-character image to be recognized into a target handwriting recognition model based on a long short-term memory (LSTM) neural network for recognition, and obtain the handwritten Chinese character corresponding to the single-character image to be recognized.

A computer device includes a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, the following steps are implemented:

Preprocessing the original image to obtain a valid image;

One or more non-volatile readable storage media storing computer-readable instructions, which when executed by one or more processors, cause the one or more processors to perform the following steps:

Preprocessing the original image to obtain a valid image;

Details of one or more embodiments of the present application are set forth in the accompanying drawings and description below. Other features and advantages of the application will become apparent from the description, the drawings, and the claims.

BRIEF DESCRIPTION OF THE DRAWINGS

In order to explain the technical solutions of the embodiments of the present application more clearly, the drawings used in the description of the embodiments of the application will be briefly introduced below. Obviously, the drawings in the following description are just some embodiments of the application. For those of ordinary skill in the art, other drawings can be obtained based on these drawings without paying creative labor.

FIG. 1 is an application scenario diagram of a handwritten Chinese character image recognition method according to an embodiment of the present application;

FIG. 2 is a flowchart of a method for recognizing a handwritten Chinese character image according to an embodiment of the present application;

FIG. 3 is a specific flowchart of step S20 in FIG. 2;

FIG. 4 is a specific flowchart of step S30 in FIG. 2;

FIG. 5 is a specific flowchart of step S34 in FIG. 4;

FIG. 6 is another flowchart of a method for recognizing handwritten Chinese characters in an embodiment of the present application;

FIG. 7 is a specific flowchart of step S63 in FIG. 6;

FIG. 8 is a schematic diagram of a handwritten Chinese character image recognition device according to an embodiment of the present application;

FIG. 9 is a schematic diagram of a computer device in an embodiment of the present application.

Detailed description

In the following, the technical solutions in the embodiments of the present application will be clearly and completely described with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are part of the embodiments of the present application, but not all of the embodiments. Based on the embodiments in this application, all other embodiments obtained by a person of ordinary skill in the art without creative efforts shall fall within the protection scope of this application.

The handwritten Chinese character image recognition method provided in the embodiment of the present application can be applied in an application environment as shown in FIG. 1. The application environment includes a server and a computer device. The computer device communicates with the server through a network and is a device that can perform human-computer interaction with a user, including, but not limited to, a computer, a smartphone, and a tablet device. The handwritten Chinese character image recognition method provided in the embodiment of the present application is applied to the server.

In an embodiment, as shown in FIG. 2, a handwritten Chinese character image recognition method is provided. The method is applied to the server in FIG. 1 as an example for description, and includes the following steps:

S10: Obtain the original image. The original image includes handwritten Chinese characters and background pictures.

The original image is an unprocessed image containing handwritten Chinese characters collected by a collection module on a computer device. The original image includes handwritten Chinese characters and background pictures. The background picture is a noise picture other than handwritten Chinese characters in the original image. Noise pictures are pictures that interfere with handwritten Chinese characters. In this embodiment, a user may collect an original image containing handwritten Chinese characters and upload it to a server through a collection module on a computer device, so that the server obtains the original image. The acquisition module includes but is not limited to camera shooting and local upload.

S20: Preprocess the original image to obtain a valid image.

The effective image is the image obtained by preprocessing the original image to exclude interference factors. The original image may contain multiple interference factors, such as multiple colors, that hinder subsequent recognition, so it must be preprocessed to obtain an effective image free of them. The effective image can be understood as the image obtained after the original image excludes the background image.

In an embodiment, as shown in FIG. 3, in step S20, the original image is pre-processed to obtain a valid image, which specifically includes the following steps:

S21: Enlarge and grayscale the original image to obtain a grayscale image.

The grayscale image is a grayscale image obtained after the original image is enlarged and grayscaled. The grayed image includes a matrix of pixel values. The pixel value matrix refers to a matrix containing pixel values corresponding to each pixel in the original image. In this embodiment, the server uses the imread function to read the pixel value of each pixel in the original image, and performs enlargement and grayscale processing on the original image to obtain a grayscale image. The imread function is a function in computer language for reading pixel values in an image file. The pixel value is a value assigned by the computer when the original image is digitized.

The original image may contain multiple colors, and color is highly sensitive to factors such as lighting: objects of the same kind can vary widely in color, so color rarely carries key information. The original image is therefore converted to grayscale to eliminate this interference and to reduce both the complexity of the image and the amount of information to process. However, if the handwritten Chinese characters in the original image are small, graying the image directly leaves their strokes so thin that they are excluded as interference. To thicken the strokes, the original image is enlarged first and grayed afterwards.

Specifically, the server enlarges the original image according to the formula $x \rightarrow x^{r}$, where $x$ is an element of the pixel value matrix $M$ and $r$ is the power; each element $x$ in $M$ is replaced by the transformed element $x^{r}$.
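As a rough illustration, the element-wise transform $x \rightarrow x^{r}$ could be sketched in Python as follows. The function name `power_transform` and the sample exponent are hypothetical; the source gives only the formula and does not state a value for r.

```python
def power_transform(matrix, r):
    """Replace every element x of the pixel value matrix with x ** r."""
    return [[x ** r for x in row] for row in matrix]


# Hypothetical 2 x 2 pixel value matrix and exponent, for illustration only.
M = [[1, 2], [3, 4]]
transformed = power_transform(M, 2)
print(transformed)
```

The input matrix is left untouched and a new matrix is returned, which keeps the original pixel values available for later steps.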

Graying renders the original image with a distinct black-and-white effect. Specifically, the color of each pixel in the original image is determined by three components, R (red), G (green), and B (blue), each of which takes one of 256 values from 0 to 255 (0 is the darkest, black; 255 is the brightest, white). A grayscale image is a special color image in which the R, G, and B components are equal. In this embodiment, the server may use the imread function to read the original image directly and obtain the values of the R, G, and B components corresponding to each pixel in the grayscale image.

S22: Standardize the grayscale image to obtain a valid image.

Standardization transforms the grayscale image into a fixed standard form. Because the pixel values in the grayscale image are scattered, the data are not on a uniform scale, which affects the accuracy of subsequent model recognition. The grayscale image therefore needs to be standardized to bring the data to a uniform scale.

Specifically, the server standardizes the grayscale image with a standardization formula, which avoids the problem of scattered pixel values on non-uniform scales. The standardization formula is

$$X' = \frac{X - M_{\min}}{M_{\max} - M_{\min}}$$

where $X$ is a pixel value of the grayscale image $M$, $X'$ is the corresponding pixel value of the effective image, $M_{\min}$ is the smallest pixel value in the grayscale image $M$, and $M_{\max}$ is the largest pixel value in the grayscale image $M$.
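A minimal Python sketch of this step follows, assuming the standardization is ordinary min-max scaling, X' = (X - M_min) / (M_max - M_min), which is what the definitions of M_min and M_max suggest. The function name and the handling of a constant image are assumptions, not taken from the source.

```python
def min_max_normalize(matrix):
    """Scale every pixel value into [0, 1] using the matrix minimum and maximum."""
    flat = [x for row in matrix for x in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # Assumption: a constant image has no spread, so map everything to 0.0.
        return [[0.0 for _ in row] for row in matrix]
    return [[(x - lo) / (hi - lo) for x in row] for row in matrix]


gray = [[0, 128], [255, 51]]  # hypothetical grayscale values
print(min_max_normalize(gray))
```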

S30: Use a kernel density estimation algorithm to process the effective image, remove the background image, and obtain a target image including handwritten Chinese characters.

Among them, the kernel density estimation algorithm (kernel density estimation) is a non-parametric method that studies the data distribution characteristics from the data sample itself to estimate the probability density function. The target image refers to an image that contains only handwritten Chinese characters by processing a valid image using a kernel density estimation algorithm. Specifically, the server uses a kernel density estimation algorithm to process the effective image to eliminate background image interference and obtain a target image including handwritten Chinese characters.

Specifically, the calculation formula of the kernel density estimation algorithm is

$$\hat{f}_h(x) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right)$$

where $K(\cdot)$ is the kernel function, $h$ is the pixel value range, $x$ is the pixel value of the pixel whose probability density is to be estimated, $x_i$ is the $i$-th pixel value within the range $h$, $n$ is the number of pixel values within the range $h$, and $\hat{f}_h(x)$ is the estimated probability density of the pixel.
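The estimator can be sketched in a few lines of Python, assuming the standard form of kernel density estimation (the average of kernel values at (x - x_i) / h, divided by h) and the standard Gaussian kernel that the document adopts in step S32. The function names and sample values are illustrative only.

```python
import math


def gaussian_kernel(u):
    """Standard Gaussian kernel: exp(-u^2 / 2) / sqrt(2 * pi)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde(x, samples, h, kernel=gaussian_kernel):
    """Estimate the probability density at x from the samples with bandwidth h."""
    n = len(samples)
    return sum(kernel((x - xi) / h) for xi in samples) / (n * h)


# Hypothetical pixel values; the density is higher near the cluster around 20.
pixels = [18, 20, 22, 200]
print(kde(20, pixels, h=5.0) > kde(100, pixels, h=5.0))
```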

In an embodiment, as shown in FIG. 4, in step S30, a kernel density estimation algorithm is used to process an effective image to obtain a target image including handwritten Chinese characters, which specifically includes the following steps:

S31: Perform statistics on the pixel values in the effective image to obtain a valid image histogram.

The effective image histogram is a histogram obtained by statistically calculating pixel values in the effective image. Histogram (Histogram) is a kind of statistical report diagram that represents the distribution of data by a series of vertical stripes or line segments of varying heights. In this embodiment, the horizontal axis of the effective image histogram represents a pixel value, and the vertical axis represents a frequency of occurrence corresponding to the pixel value. The server obtains the effective image histogram by counting the pixel values in the effective image, so that it can intuitively see the distribution of the pixel values in the effective image, and provides technical support for subsequent Gaussian kernel density estimation algorithms.
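The counting step might look like the sketch below. It assumes integer pixel values in the range 0 to 255, as in the worked example given for step S33; the function name `pixel_histogram` is hypothetical.

```python
from collections import Counter


def pixel_histogram(matrix, levels=256):
    """Return a list whose v-th entry is the number of pixels with value v."""
    counts = Counter(x for row in matrix for x in row)
    return [counts.get(v, 0) for v in range(levels)]


image = [[0, 0, 3], [3, 3, 255]]  # hypothetical pixel values
hist = pixel_histogram(image)
print(hist[0], hist[3], hist[255])
```

The index of the returned list plays the role of the horizontal axis (pixel value) and the entry plays the role of the vertical axis (frequency).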

S32: Use a Gaussian kernel density estimation algorithm to process the effective image histogram to obtain at least one frequency maximum and at least one frequency minimum corresponding to the effective image histogram.

The Gaussian kernel density estimation algorithm is a kernel density estimation method whose kernel function is the Gaussian kernel function. The formula of the Gaussian kernel function is

$$K(x) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{x^{2}}{2}}$$

where $K(x)$ is the Gaussian kernel function with the pixel value $x$ as its independent variable, $x$ is a pixel value in the effective image, and $e$ and $\pi$ are constants. A frequency maximum is a maximum within a frequency interval of the frequency distribution histogram. A frequency minimum is the minimum that corresponds to the frequency maximum in the same frequency interval of the frequency distribution histogram.

Specifically, the Gaussian kernel density estimation method is used to apply Gaussian smoothing to the frequency distribution histogram of the effective image, yielding a Gaussian smooth curve. The pixel values on the horizontal axis that correspond to the frequency maxima and frequency minima on this curve are then obtained. These pixel values are used subsequently to segment the effective image into layers and obtain the layered images.
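One possible way to realize the smoothing and the search for extrema is sketched below. The bandwidth `h`, the function names, and the use of strict neighbor comparisons to detect extrema are assumptions; the source does not say how the maxima and minima are located on the curve.

```python
import math


def smooth_histogram(hist, h):
    """Gaussian-smooth a histogram; h is the bandwidth in pixel-value units."""
    curve = []
    for v in range(len(hist)):
        total = 0.0
        for u, count in enumerate(hist):
            if count:
                z = (v - u) / h
                total += count * math.exp(-0.5 * z * z)
        curve.append(total)
    return curve


def local_extrema(curve):
    """Return (maxima, minima) as lists of indices, using strict comparisons."""
    maxima, minima = [], []
    for i in range(1, len(curve) - 1):
        if curve[i] > curve[i - 1] and curve[i] > curve[i + 1]:
            maxima.append(i)
        elif curve[i] < curve[i - 1] and curve[i] < curve[i + 1]:
            minima.append(i)
    return maxima, minima


# Hypothetical histogram with two peaks, at pixel values 50 and 200.
hist = [0] * 256
hist[50], hist[200] = 100, 80
maxima, minima = local_extrema(smooth_histogram(hist, h=10.0))
print(maxima, minima)
```

Flat plateaus are ignored by the strict comparisons, which is acceptable for a smoothed floating-point curve but would need extra handling for raw integer counts.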

S33: Perform hierarchical segmentation processing on the effective image based on the frequency maximum and frequency minimum to obtain a layered image.

The layered image is an image obtained by hierarchically segmenting the effective image based on the frequency maxima and frequency minima. The server first obtains the pixel values corresponding to the frequency maxima and frequency minima. The pixel values of the effective image are divided into as many classes as there are frequency maxima, and the pixel values corresponding to the frequency minima serve as the boundary values between classes. The effective image is then layered according to these classes and boundaries to obtain the layered images.

For example, suppose the pixel values corresponding to the frequency maxima in the effective image are 18, 59, 95, 118, and 153, and the pixel values corresponding to the frequency minima are 27, 65, 105, and 133. Since there are five frequency maxima, the pixel values of the effective image are divided into 5 classes and the effective image into 5 layers, with the pixel values at the frequency minima serving as the boundary values between classes. Because the minimum pixel value is 0 and the maximum pixel value is 255, the boundary values determine the following layers: the layered image for pixel value 18 covers the pixel value range [0, 27); the layered image for pixel value 59 covers [27, 65); the layered image for pixel value 95 covers [65, 105); the layered image for pixel value 118 covers [105, 133); and the layered image for pixel value 153 covers [133, 255].
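The worked example can be reproduced with a short Python sketch. It uses the boundary values 27, 65, 105, and 133 from the example and half-open intervals, so that a pixel equal to a boundary value falls into the upper layer. Representing each layer as a 0/1 mask is an assumption made here for illustration; the source does not specify how a layered image is stored.

```python
from bisect import bisect_right


def layer_of(pixel, boundaries):
    """Index of the layer whose half-open interval contains the pixel value."""
    return bisect_right(boundaries, pixel)


def split_into_layers(matrix, boundaries):
    """Return one 0/1 mask per layer, marking the pixels that belong to it."""
    masks = [[[0] * len(row) for row in matrix] for _ in range(len(boundaries) + 1)]
    for r, row in enumerate(matrix):
        for c, pixel in enumerate(row):
            masks[layer_of(pixel, boundaries)][r][c] = 1
    return masks


boundaries = [27, 65, 105, 133]          # pixel values at the frequency minima
image = [[18, 27, 64], [105, 133, 255]]  # hypothetical pixel values
print([[layer_of(p, boundaries) for p in row] for row in image])
```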

S34: Obtain a target image including handwritten Chinese characters based on the layered image.

After obtaining the layered images, the server binarizes, erodes, and superimposes them to obtain a target image including the handwritten Chinese characters. Binarization sets the pixel value of each pixel on a layered image to 0 (black) or 1 (white), so that the whole layered image shows a distinct black-and-white effect. After a layered image is binarized, it is eroded to remove the background image part and retain the handwritten Chinese characters on it. Because the pixel values on each layered image belong to a different range, the eroded layered images must be superimposed to generate a target image containing only handwritten Chinese characters. Superimposition combines the layered images that retain only handwriting into a single image. In this embodiment, the layered images are superimposed using the imadd function, a function in a computer language for superimposing images.

In an embodiment, as shown in FIG. 5, in step S34, that is, based on the layered image, obtaining a target image including handwritten Chinese characters, specifically includes the following steps:

S341: Perform binarization processing on the layered image to obtain a binarized image.

A binarized image is the image obtained by binarizing a layered image. Specifically, after the server obtains the layered image, it compares each sampled pixel value of the layered image with a preselected threshold, setting pixel values greater than or equal to the threshold to 1 and pixel values less than the threshold to 0. The sampled pixel values are the pixel values of the individual pixels in the layered image. The choice of threshold affects the quality of binarization: a well-chosen threshold gives a good result, while a poorly chosen one degrades it. To simplify the calculation, the threshold in this embodiment is set by the developer based on experience. Binarizing the layered image facilitates the subsequent erosion.
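The thresholding rule described above amounts to a single comparison per pixel. In the sketch below, the function name and the sample threshold are hypothetical; the source leaves the threshold to the developer's experience.

```python
def binarize(matrix, threshold):
    """Set pixels >= threshold to 1 and pixels < threshold to 0."""
    return [[1 if x >= threshold else 0 for x in row] for row in matrix]


layer = [[10, 120], [128, 200]]  # hypothetical pixel values of one layered image
print(binarize(layer, threshold=120))
```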

S342: Detect pixels in the binarized image to obtain a connected region corresponding to the binarized image.

A connected region is the area enclosed by the pixels adjacent to a specific pixel. In a binarized image, a connected region is one in which the specific pixel and its adjacent pixels are 1 while the pixels surrounding them are all 0. For example, if a specific pixel is 0 and its adjacent pixels are 1, the area enclosed by those adjacent pixels is taken as the connected region.

Specifically, the binarized image corresponds to a pixel matrix made up of rows and columns. Detecting the pixels in the binarized image includes the following process: (1) Scan the pixel matrix row by row, group each sequence of consecutive white pixels in a row into a run, and record its start point, end point, and row number. (2) For each run in every row except the first: if it overlaps no run in the previous row, give it a new label; if it overlaps exactly one run in the previous row, give it that run's label; if it overlaps two or more runs in the previous row, give it the smallest label among those runs and record their labels as equivalence pairs, indicating that they belong to the same class. For example, if a run in the second row overlaps two runs in the first row labeled 1 and 2, it is given the smaller label, 1, and the pair (1, 2) is recorded as an equivalence pair. An equivalence pair holds the labels of two runs that are connected to each other; (1, 2), for instance, indicates that the run labeled 1 and the run labeled 2 are connected and form one connected region. In this embodiment, the eight pixels adjacent to a specific pixel in the pixel matrix are used as the connected region of that element.
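The run-based labeling described above can be sketched as follows. The sketch assumes 8-connectivity (runs in adjacent rows are connected if their column ranges overlap or touch diagonally) and resolves equivalence pairs with a small union-find structure, which is an implementation choice made here rather than something the source specifies. All names are hypothetical.

```python
def label_runs(binary):
    """Label white (1) pixels by scanning runs row by row and merging labels."""
    parent = {}  # union-find over provisional labels

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)  # the smallest label stays the root

    runs, prev, next_label = [], [], 1
    for r, row in enumerate(binary):
        cur, c, width = [], 0, len(row)
        while c < width:
            if row[c] != 1:
                c += 1
                continue
            start = c
            while c < width and row[c] == 1:
                c += 1
            end = c - 1
            # Runs of the previous row that overlap this run or touch it diagonally.
            touching = [lab for s, e, lab in prev if s <= end + 1 and e >= start - 1]
            if touching:
                label = min(touching)
                for other in touching:
                    union(label, other)  # record the equivalence pair
            else:
                label = next_label
                parent[label] = label
                next_label += 1
            cur.append((start, end, label))
            runs.append((r, start, end, label))
        prev = cur

    labels = [[0] * len(row) for row in binary]
    for r, start, end, label in runs:
        root = find(label)
        for c in range(start, end + 1):
            labels[r][c] = root
    return labels


image = [[1, 0, 1], [1, 1, 1], [0, 0, 0], [0, 0, 1]]
print(label_runs(image))
```

Final labels are the smallest provisional label of each region, so they are not necessarily consecutive.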

S343: Eroding and superimposing the connected area corresponding to the binary image to obtain a target image including handwritten Chinese characters.

Erosion is a morphological operation that removes part of the content of an image. In this embodiment, the imerode function built into Matlab is used to erode the connected regions of the binarized image; Matlab is application software for numerical computation. Specifically, eroding the connected region corresponding to the binarized image includes the following steps. First, an n × n structuring element is selected. The structuring element is an n × n pixel matrix whose elements are 0 or 1. Because this embodiment uses the 8 elements adjacent to each element in the pixel matrix as that element's connected region, the selected structuring element is a 3 × 3 pixel matrix. Next, the pixel matrix of the binarized layered image is scanned for pixels with a value of 1, and the 8 pixels adjacent to each such pixel are checked. If they are all 1, they remain unchanged; if they are not all 1, the 8 pixels adjacent to that pixel in the pixel matrix become 0 (black). The part that becomes 0 is the eroded part of the binarized layered image.
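For orientation, the sketch below implements standard binary erosion with a 3 × 3 all-ones structuring element: a white pixel survives only if it and its eight neighbors are all white. This is the textbook definition and may differ in detail from the procedure described above, which speaks of setting the neighboring pixels to 0. Treating pixels outside the image as 0 is a further assumption, and the function name is hypothetical.

```python
def erode3x3(binary):
    """Binary erosion of a non-empty 0/1 matrix with a 3 x 3 all-ones element."""
    rows, cols = len(binary), len(binary[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] != 1:
                continue
            keep = True
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    inside = 0 <= rr < rows and 0 <= cc < cols
                    # Assumption: pixels beyond the border count as 0.
                    if not inside or binary[rr][cc] != 1:
                        keep = False
            out[r][c] = 1 if keep else 0
    return out


square = [[1] * 5 for _ in range(5)]
for row in erode3x3(square):
    print(row)
```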

The binarized image is then filtered against a preset erosion-resistance range for handwritten regions. Parts of the binarized image whose erosion resistance falls outside this range are deleted, leaving the parts whose erosion resistance falls within it. Superimposing the pixel matrices of the retained parts yields a target image containing only handwritten Chinese characters. The erosion resistance of a handwritten region is calculated with the formula

$$p = \frac{s_1}{s_2}$$

where $s_1$ is the total area of the binarized image after erosion, $s_2$ is the total area of the binarized image before erosion, and $p$ is the erosion resistance of the handwritten region.

For example, suppose the preset erosion-resistance range of the handwritten region is [0.05, 0.8]. The ratio $p = s_1 / s_2$ of the total area after erosion to the total area before erosion is calculated for each binarized image. If $p$ is outside the preset range, the binarized image of that region is background rather than handwriting and is removed. If $p$ is within [0.05, 0.8], the binarized image of that region is a handwritten Chinese character and is retained. The pixel matrices of the retained binarized images are superimposed to obtain a target image containing the handwritten Chinese characters.
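The filtering rule from the example can be written out as a short sketch. It assumes the area of a binarized image is its count of white (1) pixels and that the ratio is the area after erosion divided by the area before erosion, with the range [0.05, 0.8] taken from the example. The function names, the handling of an empty region, and the logical-OR form of superimposition are assumptions.

```python
def erosion_resistance(before, after):
    """Ratio p of the white area after erosion to the white area before erosion."""
    s2 = sum(map(sum, before))
    s1 = sum(map(sum, after))
    return s1 / s2 if s2 else 0.0  # assumption: an empty region has resistance 0


def is_handwriting(before, after, low=0.05, high=0.8):
    """Keep a region only if its erosion resistance lies within [low, high]."""
    return low <= erosion_resistance(before, after) <= high


def superimpose(images):
    """Combine same-sized 0/1 images; a pixel is 1 if it is 1 in any image."""
    rows, cols = len(images[0]), len(images[0][0])
    return [[1 if any(img[r][c] for img in images) else 0 for c in range(cols)]
            for r in range(rows)]


before = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]  # hypothetical region, area 9
after = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]   # after erosion, area 1
print(erosion_resistance(before, after), is_handwriting(before, after))
```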

In steps S341-S343, the layered image is binarized to obtain a binarized image, and the pixels in the binarized image are detected and labeled to obtain the connected regions corresponding to the binarized image. During erosion, elements of the pixel matrix whose neighborhoods are not all 1 become 0; elements with value 0 are black, and the black part is the eroded part of the binarized image. The ratio of the total area of the binarized image after erosion to its total area before erosion is then calculated and checked against the preset erosion-resistance range of the handwritten region, so that the background image in each layered image is removed and the handwritten Chinese characters are retained. Finally, the layered images are superimposed to obtain the target image.

S40: Perform single-character cutting on the target image using a vertical projection method to obtain single-character images to be recognized.

The vertical projection method projects each line of handwritten Chinese characters in the target image vertically to obtain a vertical projection histogram. The vertical projection histogram reflects the number of pixels of the target image in the vertical direction.

Specifically, single-character cutting with the vertical projection method includes the following steps. The server scans the lines of handwritten Chinese characters in the target image one by one to obtain the pixel values of each line, and builds the corresponding vertical projection histogram to obtain the number of pixels at each position. The target image is then cut repeatedly at the minima of the vertical projection histogram to obtain the single-character images to be recognized. Understandably, the pixels of each handwritten Chinese character are relatively concentrated, while the pixels in the gaps between characters are relatively sparse, and this difference in density is reflected in the vertical projection histogram: the pixel count is relatively high where there is a character and relatively low where there is none. The vertical projection method thus cuts the target image into single characters effectively, is simple to implement, and provides technical support for the subsequent handwriting recognition.
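A simplified sketch of vertical projection cutting is shown below. It assumes a binarized line image in which handwriting pixels are 1, and it treats columns whose projection does not exceed a gap threshold as the minima at which to cut. The threshold parameter and function names are hypothetical simplifications of the "cut at the minima" rule in the text.

```python
def vertical_projection(binary):
    """Number of handwriting (1) pixels in each column."""
    return [sum(column) for column in zip(*binary)]


def cut_characters(binary, gap_threshold=0):
    """Split a line image into character images at low-projection columns."""
    projection = vertical_projection(binary)
    spans, start = [], None
    for c, count in enumerate(projection):
        if count > gap_threshold and start is None:
            start = c                      # a character begins
        elif count <= gap_threshold and start is not None:
            spans.append((start, c))       # a character ends at a gap
            start = None
    if start is not None:
        spans.append((start, len(projection)))
    return [[row[s:e] for row in binary] for s, e in spans]


line = [[1, 1, 0, 0, 1],
        [0, 1, 0, 1, 1]]
print(vertical_projection(line))
print(cut_characters(line))
```

Characters whose strokes touch or overlap horizontally would not be separated by a zero threshold, which is one reason a real system might cut at local minima instead.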

S50: Input the single-character image to be recognized into a target handwriting recognition model based on a long short-term memory neural network for recognition, and obtain the handwritten Chinese character corresponding to the single-character image to be recognized.

The target handwriting recognition model is a handwriting recognition model trained in advance on a long short-term memory neural network. A long short-term memory (LSTM) network is a recurrent neural network suited to processing and predicting time series in which important events are separated by relatively long intervals and delays. Specifically, the server inputs the single-character images to be recognized into the target handwriting recognition model, which can draw on context during recognition, obtains the handwritten Chinese character corresponding to each image, and thereby improves recognition accuracy.
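To make the idea of context concrete, the sketch below runs one time step of a standard LSTM cell on scalar values. It uses the commonly published gate equations (forget, input, and output gates plus a candidate value), not the specific network of this application; the parameter names and the scalar simplification are assumptions for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, p):
    """One step of a scalar LSTM cell; p maps parameter names to floats."""
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])    # forget gate
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])    # input gate
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])    # output gate
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])  # candidate value
    c = f * c_prev + i * g      # new cell state mixes old memory and new input
    h = o * math.tanh(c)        # new hidden state passed to the next step
    return h, c


# Hypothetical parameters: all zeros, so every gate evaluates to 0.5.
names = ["wf", "uf", "bf", "wi", "ui", "bi", "wo", "uo", "bo", "wg", "ug", "bg"]
params = {name: 0.0 for name in names}
h, c = lstm_step(1.0, 0.0, 2.0, params)
print(h, c)
```

The cell state `c` is what carries information from earlier inputs forward, which is the mechanism that lets the model take preceding characters into account.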

In this embodiment, the user may upload to the server an original image containing handwritten Chinese characters collected by the acquisition module on the computer device, so that the server obtains the original image. The server then preprocesses the original image to obtain an effective image that excludes interference factors. The kernel density estimation algorithm is used to process the effective image and remove the background image, yielding a target image that contains only handwritten Chinese characters and further eliminating interference. The vertical projection method, which is easy to implement, is used to cut the target image into single-character images to be recognized. These images are input into the target handwriting recognition model based on a long short-term memory neural network, which gives the sequence of single-character images a temporal order so that the model can draw on context during recognition, obtain the handwritten Chinese character corresponding to each single-character image, and improve recognition accuracy.

In one embodiment, the handwritten Chinese character image recognition method further includes: training a target handwriting recognition model in advance. Specifically, as shown in FIG. 6, pre-training the target handwriting recognition model includes the following steps:

S61: Acquire a training handwritten Chinese character image.

The training handwritten Chinese character images are sample images collected in advance from an open-source library for model training. They include N handwriting samples (N is a positive integer) for each Chinese character in the secondary Chinese character library. The secondary Chinese character library is a library of less commonly used Chinese characters, coded in the order of their radicals and strokes. Specifically, N handwriting sample images written by different people are collected from the open-source library as the training handwritten Chinese character images, so that the server obtains the training handwritten Chinese character images. Because different users have different writing habits, training with N handwriting samples (that is, the training handwritten Chinese character images) greatly improves the generalization of the model.

S62: Use a vertical projection method to perform single font cutting on the training handwritten Chinese character image to obtain a training single font image.

The cutting process of single font cutting of the training handwritten Chinese character image by the vertical projection method is the same as step S40. To avoid repetition, details are not described herein again. The training single font image is a single font image used for input model training.
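As a rough illustration of the vertical projection idea, the following minimal Python sketch counts the foreground pixels in each column of a binarized image and cuts wherever empty columns separate two characters. It assumes the image is a list of rows with 1 for ink and 0 for background. The function names `vertical_projection` and `cut_single_characters` are hypothetical and are not taken from the application, and the actual procedure of step S40 may differ in detail.

```python
def vertical_projection(image):
    """Count foreground (non-zero) pixels in every column of a binary image."""
    if not image:
        return []
    width = len(image[0])
    return [sum(1 for row in image if row[x]) for x in range(width)]


def cut_single_characters(image):
    """Split a binary image into sub-images at columns that contain no ink.

    Assumption: characters are separated by at least one empty column.
    """
    projection = vertical_projection(image)
    spans = []      # (start, end) column ranges, end exclusive
    start = None
    for x, count in enumerate(projection):
        if count > 0 and start is None:
            start = x                    # first inked column of a character
        elif count == 0 and start is not None:
            spans.append((start, x))     # empty column closes the character
            start = None
    if start is not None:                # character touching the right edge
        spans.append((start, len(projection)))
    return [[row[a:b] for row in image] for a, b in spans]
```

For a two-row image such as `[[1, 1, 0, 0, 1, 0], [1, 0, 0, 0, 1, 1]]`, the projection has two empty columns in the middle, so the sketch returns two sub-images, one per character.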

S63: Label the training single font images in sequence, input the labeled training single font images into the long short-term memory neural network for training, and use a stochastic gradient descent algorithm to update the network parameters of the long short-term memory neural network to obtain the target handwriting recognition model.

When updating the network parameters, the stochastic gradient descent algorithm uses one randomly selected sample (a training single font image) for each update instead of all samples, which speeds up training. The network parameters are the weights and biases between the layers of the long short-term memory neural network. Because the long short-term memory neural network has a memory of earlier time steps, it is used to process the training single font images, which carry a time-series state.

The long short-term memory neural network has a network structure consisting of an input layer, at least one hidden layer, and an output layer. The input layer is the first layer of the network; it receives external signals, that is, the training single font images. The output layer is the last layer of the network; it outputs signals to the outside, that is, the calculation results of the network. The hidden layers are all layers other than the input layer and the output layer; they process the training single font images and produce the calculation results of the network. Understandably, using a long short-term memory neural network for model training lets the training single font images be treated as an ordered sequence, so that they are trained according to context, which improves the accuracy of the target handwriting recognition model. In this embodiment, the output layer of the long short-term memory neural network uses Softmax (a regression model) to classify the output weight matrix. Softmax is a classification function commonly used in neural networks. It maps the outputs of multiple neurons into the interval [0, 1] so that they can be interpreted as probabilities; it is simple to compute and supports multi-class output, which makes the output more accurate.
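To make the Softmax mapping concrete, the following minimal Python sketch converts a list of raw output scores into values in [0, 1] that sum to 1. The function name `softmax` and the max-subtraction trick for numerical stability are illustrative choices, not details stated in the application.

```python
import math


def softmax(scores):
    """Map raw output scores to probabilities that sum to 1."""
    shift = max(scores)                          # subtracting the max avoids overflow
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

The index of the largest probability can then be taken as the predicted class, for example one candidate Chinese character per output neuron.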

In this embodiment, the training handwritten Chinese character images are first acquired, and single font cutting is performed on them using the vertical projection method to obtain the training single font images. The training single font images are then labeled in sequence so that they carry timing information, and the labeled training single font images are input into the long short-term memory neural network for training. Following the time series of the training single font images, the long short-term memory neural network trains on them according to context, which improves the accuracy of the target handwriting recognition model.

In an embodiment, as shown in FIG. 7, step S63, in which the training single font images are labeled in sequence, the labeled training single font images are input into the long short-term memory neural network for training, and the stochastic gradient descent algorithm is used to update the network parameters of the long short-term memory neural network to obtain the target handwriting recognition model, specifically includes the following steps:

S631: In the hidden layer of the long short-term memory neural network, process the training single font image using a first activation function to obtain neurons carrying an activation state identifier.

Each neuron in the hidden layer of the long short-term memory neural network includes three gates: an input gate, a forget gate, and an output gate. The forget gate determines which past information is discarded in the neuron. The input gate determines which information is added to the neuron. The output gate determines which information is output from the neuron. The first activation function is a function for activating the neuron state. The state of the neuron determines the information discarded, added, and output by each gate (that is, the forget gate, the input gate, and the output gate). The activation state identifier is either a pass identifier or a fail identifier. The identifiers corresponding to the input gate, the forget gate, and the output gate in this embodiment are i, f, and o, respectively.

In this embodiment, the Sigmoid (S-shaped growth curve) function is selected as the first activation function. The Sigmoid function is an S-shaped function common in biology. In information science, because the function and its inverse are both monotonically increasing, the Sigmoid function is often used as a threshold function in neural networks, mapping a variable into the interval between 0 and 1. Its calculation formula is $\sigma(z) = \dfrac{1}{1 + e^{-z}}$, where z represents the value to which the function is applied; in the forget gate this is $W_f \cdot [h_{t-1}, x_t] + b_f$.

Specifically, the forget gate includes a forgetting threshold. By calculating the activation state of each neuron (training single font image), the neurons whose activation state identifier is the pass identifier are obtained. The calculation formula of the forget gate, $f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$, determines which information the forget gate retains (that is, only neurons whose activation state identifier is the pass identifier are retained). Here $f_t$ represents the forgetting threshold (that is, the activation state), $W_f$ the weight matrix of the forget gate, $b_f$ the bias term of the forget gate, $h_{t-1}$ the output of the neuron at the previous moment, and $x_t$ the input data at the current moment (that is, the training single font image); t denotes the current moment and t-1 the previous moment. Applying the forget gate formula to the training single font image yields a scalar in the interval from 0 to 1. Based on a combined judgment of the current state and the past state, this scalar determines how much past information the neuron retains, which reduces the data, reduces the amount of calculation, and improves training efficiency.
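The forget gate computation $f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$ can be sketched in a few lines of Python. This is only an illustration under stated assumptions: vectors are plain lists, $[h_{t-1}, x_t]$ is taken to mean concatenation, the weight matrix is a list of rows, and the names `sigmoid` and `forget_gate` are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function: 1 / (1 + e^(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def forget_gate(W_f, b_f, h_prev, x_t):
    """Return f_t, one value in [0, 1] per hidden unit.

    W_f: list of rows, each as long as len(h_prev) + len(x_t)
    b_f: list of biases, one per row of W_f
    """
    concat = h_prev + x_t    # [h_{t-1}, x_t] as a single vector
    return [
        sigmoid(sum(w * v for w, v in zip(row, concat)) + b)
        for row, b in zip(W_f, b_f)
    ]
```

A value of $f_t$ close to 1 keeps most of the previous state, while a value close to 0 discards it.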

S632: In the hidden layer of the long short-term memory neural network, process the neurons carrying the activation state identifier using a second activation function to obtain the output value of the hidden layer of the long short-term memory neural network.

The output value of the hidden layer of the long short-term memory neural network includes the output value of the input gate, the output value of the output gate, and the neuron state. Specifically, in the input gate of the hidden layer, the second activation function is applied to the neurons whose activation state identifier is the pass identifier to obtain the output value of the hidden layer. In this embodiment, because a linear model has insufficient expressive power, the tanh (hyperbolic tangent) function is used as the activation function of the input gate (that is, the second activation function). It introduces non-linearity, so that the trained target handwriting recognition model can solve more complex problems. In addition, the tanh activation function converges quickly, which saves training time and improves training efficiency.

Specifically, the output value of the input gate is calculated by the calculation formula of the input gate. The input gate further includes an input threshold, and its calculation formula is $i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$, where $W_i$ is the weight matrix of the input gate, $i_t$ represents the input threshold, and $b_i$ represents the bias term of the input gate. Applying the input gate formula to the training single font image yields a scalar in the interval from 0 to 1 (that is, the input threshold). Based on a combined judgment of the current state and the past state, this scalar controls the proportion of current information, that is, newly input information, that the neuron accepts, which reduces the amount of calculation and improves training efficiency.

The current neuron state is calculated using the neuron state formulas $\tilde{C}_t = \tanh(W_c \cdot [h_{t-1}, x_t] + b_c)$ and $C_t = f_t * C_{t-1} + i_t * \tilde{C}_t$, where $W_c$ represents the weight matrix of the neuron state, $b_c$ represents the bias term of the neuron state, $\tilde{C}_t$ is the intermediate candidate state, $C_{t-1}$ represents the state of the neuron at the previous moment, and $C_t$ represents the state of the neuron at the current moment. By multiplying the previous neuron state element by element with the forgetting threshold, and the candidate state with the input threshold, the model outputs only the required information, which improves the efficiency of model learning.

Finally, the output gate calculation formula $o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$ is used to calculate which information is output by the output gate, and the formula $h_t = o_t * \tanh(C_t)$ is then used to calculate the output value of the neuron at the current moment, where $o_t$ represents the output threshold, $W_o$ represents the weight matrix of the output gate, $b_o$ represents the bias term of the output gate, and $h_t$ represents the output value of the neuron at the current moment.
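Putting the gates together, one time step of an LSTM neuron layer can be sketched as follows. The sketch assumes the standard LSTM state update, $\tilde{C}_t = \tanh(W_c \cdot [h_{t-1}, x_t] + b_c)$ and $C_t = f_t * C_{t-1} + i_t * \tilde{C}_t$, which matches the symbols defined in the text but is an assumption about the exact formulas intended. The parameter dictionary layout and the names `affine` and `lstm_step` are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def affine(W, b, v):
    """Compute W . v + b for a weight matrix W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def lstm_step(p, h_prev, c_prev, x_t):
    """One LSTM time step; returns (h_t, C_t).

    p holds W_f, b_f, W_i, b_i, W_c, b_c, W_o, b_o (hypothetical layout).
    """
    concat = h_prev + x_t                                             # [h_{t-1}, x_t]
    f_t = [sigmoid(z) for z in affine(p["W_f"], p["b_f"], concat)]    # forget gate
    i_t = [sigmoid(z) for z in affine(p["W_i"], p["b_i"], concat)]    # input gate
    o_t = [sigmoid(z) for z in affine(p["W_o"], p["b_o"], concat)]    # output gate
    c_tilde = [math.tanh(z) for z in affine(p["W_c"], p["b_c"], concat)]
    # Element-wise products: keep part of the old state, add part of the new.
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_tilde)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t
```

With all weights and biases set to zero, every gate evaluates to 0.5 and the candidate state to 0, so the new state is simply half of the previous state, which is a convenient sanity check.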

S633: According to the output value of the hidden layer of the long short-term memory neural network, update the network parameters of the long short-term memory neural network using the stochastic gradient descent algorithm to obtain the target handwriting recognition model.

The calculation formulas of the stochastic gradient descent algorithm are a loss function $J(\theta)$ and an update formula that takes the partial derivative of the loss function with respect to each network parameter. In these formulas, $J(\theta)$ is the loss function; m is the number of selected training single font images, with m = 1; $\theta_j$ is the network parameter of the j-th layer of the long short-term memory neural network, such as $W_i$ or $b_i$; $h_\theta(x)$ is the output value of the hidden layer of the long short-term memory neural network; and $(x^i, y^i)$ represents the i-th training single font image, where $x^i$ is the training single font image and $y^i$ is the real result.

First, the loss function is constructed. Because the stochastic gradient descent algorithm is used here, each update of the network parameters uses one randomly selected sample (a training single font image), so m = 1 in the loss function formula. Then, partial derivatives of the loss function are taken to update the network parameters, that is, the weights and biases between the layers. Applying the updated weights and biases of each layer to the long short-term memory neural network yields the target handwriting recognition model.
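The single-sample update (m = 1) can be illustrated with a deliberately small Python sketch. Several things here are assumptions rather than details from the application: the model is a plain linear function $h_\theta(x) = \theta \cdot x$ instead of an LSTM, the loss is the squared error $\frac{1}{2}(h_\theta(x) - y)^2$, and the learning rate, step count, and the names `predict`, `sgd_step`, and `train` are all hypothetical.

```python
import random


def predict(theta, x):
    """Linear stand-in for h_theta(x); the real model would be the LSTM."""
    return sum(t * v for t, v in zip(theta, x))


def sgd_step(theta, x, y, learning_rate):
    """One update from a single sample (m = 1).

    For the assumed loss J = 0.5 * (h(x) - y)^2, dJ/dtheta_j = (h(x) - y) * x_j.
    """
    error = predict(theta, x) - y
    return [t - learning_rate * error * v for t, v in zip(theta, x)]


def train(samples, theta, learning_rate=0.1, steps=2000, seed=0):
    """Repeatedly pick one random sample and update the parameters."""
    rng = random.Random(seed)
    for _ in range(steps):
        x, y = rng.choice(samples)       # randomly selected training sample
        theta = sgd_step(theta, x, y, learning_rate)
    return theta
```

For example, training on the samples `([1.0, 0.0], 1.0)`, `([1.0, 1.0], 3.0)`, and `([1.0, 2.0], 5.0)` from a zero start drives the parameters toward `[1.0, 2.0]`, the line that fits all three points.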

Further, the weights in the target handwriting recognition model enable the model to decide which old information to discard, which new information to add, and which information to output. The output layer of the target handwriting recognition model finally outputs a probability value, which is the probability that the training single font image is recognized as the corresponding Chinese character. The model can be widely used in handwriting recognition to accurately recognize handwritten character images.

It should be understood that the size of the sequence numbers of the steps in the above embodiments does not mean the order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of this application.

In one embodiment, FIG. 8 shows a schematic diagram of a handwritten Chinese character image recognition device corresponding to the handwritten Chinese character image recognition method in the above embodiment. As shown in FIG. 8, the handwritten Chinese character image recognition device includes an original image acquisition module 10, a valid image acquisition module 20, a target image acquisition module 30, a to-be-recognized single-font image acquisition module 40, and a handwritten Chinese character acquisition module 50. Each functional module is described in detail as follows:

The original image obtaining module 10 is configured to obtain an original image, where the original image includes handwritten Chinese characters and a background picture.

The effective image acquisition module 20 is configured to pre-process the original image to obtain a valid image.

The target image acquisition module 30 is configured to process a valid image by using a kernel density estimation algorithm, remove a background picture, and obtain a target image including handwritten Chinese characters.

The to-be-recognized single-font image acquisition module 40 is configured to obtain a to-be-recognized single-font image by performing single-font cutting on a target image using a vertical projection method.

The handwritten Chinese character acquisition module 50 is configured to input the single font image to be recognized into a target handwriting recognition model based on a long short-term memory neural network for recognition, and obtain the handwritten Chinese character corresponding to the single font image to be recognized.

Specifically, the effective image acquisition module 20 includes a grayscale image acquisition unit 21 and an effective image acquisition unit 22.

A grayscale image acquisition unit 21 is configured to perform enlargement and grayscale processing on an original image to obtain a grayscale image.

The effective image obtaining unit 22 is configured to perform normalization processing on the grayscale image to obtain an effective image, using the normalization formula given for the corresponding step of the method embodiment above.

Specifically, the target image acquisition module 30 includes an effective image histogram acquisition unit 31, a frequency extreme value acquisition unit 32, a layered image acquisition unit 33, and a target image acquisition unit 34.

The effective image histogram acquisition unit 31 is configured to perform statistics on pixel values in the effective image to obtain an effective image histogram.

A frequency extreme value acquisition unit 32 is configured to process the effective image histogram by using a Gaussian kernel density estimation algorithm, and obtain at least one frequency maximum and at least one frequency minimum corresponding to the effective image histogram.

A layered image acquisition unit 33 is configured to perform hierarchical segmentation processing on an effective image based on a frequency maximum and a frequency minimum to obtain a layered image.

The target image acquisition unit 34 is configured to acquire a target image including a handwritten Chinese character based on the layered image.
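The histogram and extreme-value units can be pictured with the following minimal Python sketch, which smooths a pixel-value histogram with a Gaussian kernel and then looks for local maxima and minima of the smoothed curve. The bandwidth value, the 256 grey levels, and the function names `histogram`, `smooth_histogram`, and `local_extrema` are assumptions made for illustration; the application's own procedure is described in the method embodiment and may differ.

```python
import math


def histogram(pixels, levels=256):
    """Count how often each grey level occurs."""
    counts = [0] * levels
    for p in pixels:
        counts[p] += 1
    return counts


def smooth_histogram(counts, bandwidth):
    """Gaussian kernel density estimate evaluated at every grey level."""
    total = sum(counts)
    norm = 1.0 / (total * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(c * math.exp(-0.5 * ((g - v) / bandwidth) ** 2)
                   for v, c in enumerate(counts) if c)
        for g in range(len(counts))
    ]


def local_extrema(density):
    """Return (maxima, minima) as lists of grey levels with strict extrema."""
    maxima, minima = [], []
    for k in range(1, len(density) - 1):
        if density[k - 1] < density[k] > density[k + 1]:
            maxima.append(k)
        elif density[k - 1] > density[k] < density[k + 1]:
            minima.append(k)
    return maxima, minima
```

In this picture, the maxima mark grey levels where pixels cluster, and the minima between them are natural cut points for splitting the image into layers.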

Specifically, the target image acquisition unit 34 includes a binarized image acquisition subunit 341, a connected region acquisition subunit 342, and a target image acquisition subunit 343.

The binarized image acquisition subunit 341 is configured to perform binarization processing on the layered image to acquire a binarized image.

The connected region acquisition subunit 342 is configured to detect pixels in the binarized image and acquire a connected region corresponding to the binarized image.

A target image acquisition subunit 343 is configured to perform erosion and superposition processing on the connected areas corresponding to the binary image, and obtain a target image including handwritten Chinese characters.

Specifically, the handwritten Chinese character image recognition device further includes a handwriting recognition model training module 60 for pre-training the target handwriting recognition model.

The handwriting recognition model training module 60 includes a training handwritten Chinese character image obtaining unit 61, a training single font image obtaining unit 62, and a target handwriting recognition model obtaining unit 63.

The training handwritten Chinese character image acquiring unit 61 is configured to acquire a training handwritten Chinese character image.

A training single font image acquisition unit 62 is configured to perform single font cutting on a training handwritten Chinese character image by using a vertical projection method to obtain a training single font image.

The target handwriting recognition model acquisition unit 63 is configured to label the training single font images in sequence, input the labeled training single font images into the long short-term memory neural network for training, and use the stochastic gradient descent algorithm to update the network parameters of the long short-term memory neural network to obtain the target handwriting recognition model.

Specifically, the target handwriting recognition model acquisition unit 63 includes an activation state neuron acquisition subunit 631, a network output value acquisition subunit 632, and a target recognition model acquisition subunit 633.

The activation state neuron acquisition subunit 631 is configured to process the training single font image by using a first activation function in the hidden layer of the long short-term memory neural network to acquire neurons carrying an activation state identifier.

The network output value acquisition subunit 632 is configured to process the neurons carrying the activation state identifier by using a second activation function in the hidden layer of the long short-term memory neural network to obtain the output value of the hidden layer of the long short-term memory neural network.

The target recognition model acquisition subunit 633 is configured to update the network parameters of the long short-term memory neural network by using the stochastic gradient descent algorithm, according to the output value of the hidden layer of the long short-term memory neural network, to obtain the target handwriting recognition model; the calculation formulas of the stochastic gradient descent algorithm are those described for step S633.

For specific limitations on the handwritten Chinese character image recognition device, reference may be made to the foregoing limitations on the handwritten Chinese character image recognition method, and details are not described herein again. Each module in the above-mentioned handwritten Chinese character image recognition device may be implemented in whole or in part by software, hardware, and a combination thereof. The above-mentioned modules may be embedded in the hardware in or independent of the processor in the computer device, or may be stored in the memory of the computer device in the form of software, so that the processor can call and execute the operations corresponding to the above modules.

In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in FIG. 9. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device stores data generated or obtained during execution of the handwritten Chinese character image recognition method, such as handwritten Chinese characters. The network interface of the computer device is used to communicate with an external terminal through a network connection. When executed by the processor, the computer program implements a handwritten Chinese character image recognition method.

In one embodiment, a computer device is provided, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, the following steps are implemented: obtaining an original image, the original image including handwritten Chinese characters and a background picture; preprocessing the original image to obtain a valid image; processing the valid image using a kernel density estimation algorithm to remove the background picture and obtain a target image including the handwritten Chinese characters; performing single font cutting on the target image using a vertical projection method to obtain a single font image to be recognized; and inputting the single font image to be recognized into a target handwriting recognition model based on a long short-term memory neural network for recognition, to obtain the handwritten Chinese character corresponding to the single font image to be recognized.

In one embodiment, when the processor executes the computer program, the following steps are further implemented: enlarging and graying the original image to obtain a grayscale image; and normalizing the grayscale image to obtain a valid image, using the normalization formula given in the method embodiment above.

In one embodiment, when the processor executes the computer program, the following steps are further implemented: counting the pixel values in the valid image to obtain a valid image histogram; processing the valid image histogram using a Gaussian kernel density estimation algorithm to obtain at least one frequency maximum and at least one frequency minimum corresponding to the valid image histogram; performing hierarchical segmentation on the valid image based on the frequency maximum and the frequency minimum to obtain a layered image; and obtaining, based on the layered image, a target image including the handwritten Chinese characters.

In an embodiment, when the processor executes the computer program, the following steps are further implemented: binarizing the layered image to obtain a binarized image; detecting and marking pixels in the binarized image to obtain the connected regions corresponding to the binarized image; and performing erosion and superposition processing on the connected regions corresponding to the binarized image to obtain a target image including the handwritten Chinese characters.

In one embodiment, when the processor executes the computer program, the processor further implements the following steps: obtaining training handwritten Chinese character images; performing single font cutting on the training handwritten Chinese character images using a vertical projection method to obtain training single font images; and labeling the training single font images in sequence, inputting the labeled training single font images into the long short-term memory neural network for training, and using the stochastic gradient descent algorithm to update the network parameters of the long short-term memory neural network to obtain the target handwriting recognition model, wherein the calculation formulas of the stochastic gradient descent algorithm are those described for step S633.

In one embodiment, when the processor executes the computer program, the processor further implements the following steps: in the hidden layer of the long short-term memory neural network, processing the training single font image using the first activation function to obtain neurons carrying an activation state identifier; in the hidden layer of the long short-term memory neural network, processing the neurons carrying the activation state identifier using the second activation function to obtain the output value of the hidden layer of the long short-term memory neural network; and, according to the output value of the hidden layer of the long short-term memory neural network, updating the network parameters of the long short-term memory neural network using the stochastic gradient descent algorithm to obtain the target handwriting recognition model.

In one embodiment, one or more non-volatile readable storage media storing computer-readable instructions are provided. When the computer-readable instructions are executed by one or more processors, the one or more processors implement the following steps: obtaining an original image, the original image including handwritten Chinese characters and a background picture; preprocessing the original image to obtain a valid image; processing the valid image using a kernel density estimation algorithm to remove the background picture and obtain a target image including the handwritten Chinese characters; performing single font cutting on the target image using a vertical projection method to obtain a single font image to be recognized; and inputting the single font image to be recognized into a target handwriting recognition model based on a long short-term memory neural network for recognition, to obtain the handwritten Chinese character corresponding to the single font image to be recognized.

In one embodiment, when the computer-readable instructions are executed by one or more processors, the one or more processors further implement the following steps: enlarging and graying the original image to obtain a grayscale image; and normalizing the grayscale image to obtain a valid image, using the normalization formula given in the method embodiment above.

In an embodiment, when the computer-readable instructions are executed by one or more processors, the one or more processors further implement the following steps: counting the pixel values in the valid image to obtain a valid image histogram; processing the valid image histogram using a Gaussian kernel density estimation algorithm to obtain at least one frequency maximum and at least one frequency minimum corresponding to the valid image histogram; performing hierarchical segmentation on the valid image based on the frequency maximum and the frequency minimum to obtain a layered image; and obtaining, based on the layered image, a target image including the handwritten Chinese characters.

In one embodiment, when the computer-readable instructions are executed by one or more processors, the one or more processors further implement the following steps: binarizing the layered image to obtain a binarized image; detecting and marking pixels in the binarized image to obtain the connected regions corresponding to the binarized image; and performing erosion and superposition processing on the connected regions corresponding to the binarized image to obtain a target image including the handwritten Chinese characters.

In one embodiment, when the computer-readable instructions are executed by one or more processors, the one or more processors further implement the following steps: obtaining training handwritten Chinese character images; performing single font cutting on the training handwritten Chinese character images using a vertical projection method to obtain training single font images; and labeling the training single font images in sequence, inputting the labeled training single font images into the long short-term memory neural network for training, and using the stochastic gradient descent algorithm to update the network parameters of the long short-term memory neural network to obtain the target handwriting recognition model, wherein the calculation formulas of the stochastic gradient descent algorithm are those described for step S633.

In one embodiment, when the computer-readable instructions are executed by one or more processors, the one or more processors further implement the following steps: in the hidden layer of the long short-term memory neural network, processing the training single font image using the first activation function to obtain neurons carrying an activation state identifier; in the hidden layer of the long short-term memory neural network, processing the neurons carrying the activation state identifier using the second activation function to obtain the output value of the hidden layer of the long short-term memory neural network; and, according to the output value of the hidden layer of the long short-term memory neural network, updating the network parameters of the long short-term memory neural network using the stochastic gradient descent algorithm to obtain the target handwriting recognition model.

Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be completed by a computer program instructing related hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the methods described above. Any reference to memory, storage, a database, or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in various forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

Those skilled in the art can clearly understand that, for convenience and brevity of description, the above division of functional units and modules is only used as an example. In practical applications, the above functions can be assigned to different functional units and modules as required; that is, the internal structure of the device can be divided into different functional units or modules to complete all or part of the functions described above.

The above embodiments are only used to describe the technical solutions of the present application and are not intended to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they can still modify the technical solutions described in the foregoing embodiments, or equivalently replace some of the technical features; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and shall all fall within the protection scope of the present application.

Claims

A method for recognizing handwritten Chinese character images, comprising:

Obtaining an original image, the original image including handwritten Chinese characters and a background picture;

Preprocessing the original image to obtain an effective image;

Processing the effective image using a kernel density estimation algorithm to remove the background picture and obtain a target image including the handwritten Chinese characters;

Performing single-character cutting on the target image using a vertical projection method to obtain a single-character image to be recognized;

Inputting the single-character image to be recognized into a target handwriting recognition model based on a long short-term memory (LSTM) neural network for recognition, to obtain the handwritten Chinese character corresponding to the single-character image to be recognized.
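The vertical projection step can be illustrated with a minimal sketch. It assumes a binarized text-line image in which nonzero pixels are ink, and it cuts wherever a column contains no ink; the function name `vertical_projection_cut` and the toy array are hypothetical and are not taken from the application.

```python
import numpy as np

def vertical_projection_cut(binary):
    """Return (start, end) column spans of ink runs; end is exclusive.

    binary: 2-D array, nonzero where ink is present (assumed convention).
    """
    binary = np.asarray(binary)
    # Vertical projection: number of ink pixels in every column.
    profile = (binary != 0).sum(axis=0)
    has_ink = profile > 0
    spans = []
    start = None
    for col, ink in enumerate(has_ink):
        if ink and start is None:
            start = col                  # a run of ink columns begins
        elif not ink and start is not None:
            spans.append((start, col))   # the run ended at an empty column
            start = None
    if start is not None:
        spans.append((start, len(has_ink)))
    return spans

# Toy text line: two ink runs separated by two empty columns.
line = np.array([
    [0, 1, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 1, 1],
    [0, 0, 1, 0, 0, 1, 0],
])
spans = vertical_projection_cut(line)
single_character_images = [line[:, a:b] for a, b in spans]
```

A cut at every empty column can split characters with a left-right structure, so a practical implementation would likely add a minimum-gap or minimum-width rule; that rule is left out of this sketch.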
The method of claim 1, wherein preprocessing the original image to obtain an effective image comprises:

Performing enlargement and graying processing on the original image to obtain a grayscale image;

Performing normalization processing on the grayscale image to obtain the effective image, wherein the formula of the normalization processing is

$$X' = \frac{X - M_{\min}}{M_{\max} - M_{\min}}$$

where $X$ is a pixel value of the grayscale image $M$, $X'$ is the corresponding pixel value of the effective image, $M_{\min}$ is the smallest pixel value in the grayscale image $M$, and $M_{\max}$ is the largest pixel value in the grayscale image $M$.
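As an illustration of the normalization step, the sketch below assumes the usual min-max form X′ = (X − Mmin) / (Mmax − Mmin), which is what the symbol definitions suggest. The function name and the handling of a constant image are assumptions made for this example.

```python
import numpy as np

def min_max_normalize(gray):
    """Scale a grayscale image into [0, 1] by min-max normalization."""
    gray = np.asarray(gray, dtype=np.float64)
    m_min = gray.min()   # Mmin: smallest pixel value in the grayscale image
    m_max = gray.max()   # Mmax: largest pixel value in the grayscale image
    if m_max == m_min:
        # Constant image: avoid division by zero (this handling is an assumption).
        return np.zeros_like(gray)
    return (gray - m_min) / (m_max - m_min)

gray = np.array([[50, 100], [150, 250]])
effective = min_max_normalize(gray)
```

After this step every pixel value lies in [0, 1], so the later histogram and density estimation operate on a fixed range regardless of the original image's brightness.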
The method of claim 1, wherein processing the effective image using a kernel density estimation algorithm to remove the background picture and obtain a target image including the handwritten Chinese characters comprises:

Counting the pixel values in the effective image to obtain an effective image histogram;

Processing the effective image histogram using a Gaussian kernel density estimation method to obtain at least one frequency maximum and at least one frequency minimum corresponding to the effective image histogram;

Performing layered segmentation processing on the effective image based on the frequency maximum and the frequency minimum to obtain a layered image;

Acquiring, based on the layered image, a target image including the handwritten Chinese characters.
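The histogram, density estimation, and layering steps can be sketched as follows. This is a minimal NumPy-only illustration, assuming pixel values already normalized to [0, 1]; the bandwidth, the grid size, and the use of the density minima as layer boundaries are assumptions for the example, not parameters stated in the application.

```python
import numpy as np

def kde_extrema(pixels, bandwidth=0.05, grid_size=256):
    """Smooth the pixel-value histogram with a Gaussian kernel and return
    the grid positions of the local maxima and minima of the density."""
    pixels = np.asarray(pixels, dtype=np.float64).ravel()
    counts, edges = np.histogram(pixels, bins=grid_size, range=(0.0, 1.0))
    centers = (edges[:-1] + edges[1:]) / 2
    grid = np.linspace(0.0, 1.0, grid_size)
    # Gaussian kernel centered on each histogram bin, weighted by its count.
    z = (grid[:, None] - centers[None, :]) / bandwidth
    density = (np.exp(-0.5 * z ** 2) * counts[None, :]).sum(axis=1)
    density /= counts.sum() * bandwidth * np.sqrt(2.0 * np.pi)
    # Interior points strictly higher (lower) than both neighbours.
    left, mid, right = density[:-2], density[1:-1], density[2:]
    maxima = grid[1:-1][(mid > left) & (mid > right)]
    minima = grid[1:-1][(mid < left) & (mid < right)]
    return maxima, minima

# Toy "image": a dark cluster near 0.2 and a lighter cluster near 0.7.
pixels = np.concatenate([np.full(500, 0.2), np.full(500, 0.7)])
maxima, minima = kde_extrema(pixels)
# Layering: each pixel is assigned the index of the interval between minima.
layers = np.digitize(pixels, minima)
```

Each maximum corresponds to a dominant gray level (for example ink or background), and each minimum is a natural threshold between two such levels.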
The method for recognizing a handwritten Chinese character image according to claim 3, wherein acquiring a target image including the handwritten Chinese characters based on the layered image comprises:

Performing binarization processing on the layered image to obtain a binarized image;

Detecting and marking pixels in the binarized image to obtain the connected regions corresponding to the binarized image;

Eroding and superimposing the connected regions corresponding to the binarized image to obtain the target image including the handwritten Chinese characters.
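The connected-region and erosion steps can be sketched with plain NumPy. The example assumes 4-connectivity, a plus-shaped structuring element, and that "superimposing" means taking the union of the eroded regions; all three are assumptions for illustration, and the function names are hypothetical.

```python
from collections import deque
import numpy as np

def label_connected(binary):
    """Label 4-connected foreground regions; returns (labels, count)."""
    binary = np.asarray(binary, dtype=bool)
    labels = np.zeros(binary.shape, dtype=int)
    rows, cols = binary.shape
    count = 0
    for r in range(rows):
        for c in range(cols):
            if binary[r, c] and labels[r, c] == 0:
                count += 1
                labels[r, c] = count
                queue = deque([(r, c)])
                while queue:  # breadth-first flood fill of one region
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and binary[ny, nx] and labels[ny, nx] == 0):
                            labels[ny, nx] = count
                            queue.append((ny, nx))
    return labels, count

def erode(mask):
    """Erode a boolean mask with a plus-shaped structuring element."""
    mask = np.asarray(mask, dtype=bool)
    p = np.pad(mask, 1, constant_values=False)
    return p[1:-1, 1:-1] & p[:-2, 1:-1] & p[2:, 1:-1] & p[1:-1, :-2] & p[1:-1, 2:]

def erode_and_superimpose(binary):
    """Erode every connected region separately and take the union."""
    labels, count = label_connected(binary)
    target = np.zeros(labels.shape, dtype=bool)
    for k in range(1, count + 1):
        target |= erode(labels == k)
    return target

binary = np.zeros((5, 7), dtype=bool)
binary[1:4, 1:4] = True   # a 3x3 stroke fragment
binary[2, 5] = True       # an isolated noise pixel
labels, count = label_connected(binary)
target = erode_and_superimpose(binary)
```

In this toy input the isolated pixel disappears after erosion while the core of the larger region survives, which is the kind of noise suppression the step appears to aim for.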
The method of claim 1, wherein the handwritten Chinese character image recognition method further comprises: training the target handwriting recognition model in advance;

Training the target handwriting recognition model in advance comprises:

Obtaining training handwritten Chinese character images;

Performing single-character cutting on the training handwritten Chinese character images using a vertical projection method to obtain training single-character images;

Labeling the training single-character images in sequence, inputting the labeled training single-character images into a long short-term memory (LSTM) neural network for training, and updating the network parameters of the LSTM neural network using a stochastic gradient descent algorithm to obtain the target handwriting recognition model.
The method for recognizing handwritten Chinese character images according to claim 5, wherein inputting the labeled training single-character image into the long short-term memory (LSTM) neural network for training, and updating the network parameters of the LSTM neural network using a stochastic gradient descent algorithm to obtain the target handwriting recognition model, comprises:

Processing the training single-character image using a first activation function in a hidden layer of the LSTM neural network to obtain neurons carrying an activation state identifier;

Processing the neurons carrying the activation state identifier using a second activation function in the hidden layer of the LSTM neural network to obtain an output value of the hidden layer of the LSTM neural network;

Updating the network parameters of the LSTM neural network using the stochastic gradient descent algorithm according to the output value of the hidden layer of the LSTM neural network to obtain the target handwriting recognition model, wherein the stochastic gradient descent algorithm is defined by two formulas, one for the loss function J(θ) and one for the network parameter θ_j, in which J(θ) is the loss function, m is the number of selected single-character images and m = 1, θ_j is the network parameter of the LSTM neural network in the j-th layer, h_θ(x) is the output value of the hidden layer of the LSTM neural network, and (x_i, y_i) represents the i-th training single-character image.
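To make the role of m = 1 concrete, the following sketch shows one stochastic gradient descent update per training sample. The claim's own formulas are not available in this text, so the example assumes a squared-error loss J(θ) = ½ (h_θ(x_i) − y_i)², a linear stand-in h_θ(x) = θ·x in place of the LSTM hidden-layer output, and a hypothetical learning rate `lr`; none of these choices is confirmed by the application.

```python
import numpy as np

def sgd_step(theta, x_i, y_i, lr=0.1):
    """One parameter update computed from a single sample (m = 1)."""
    h = theta @ x_i              # stand-in for h_theta(x); not an LSTM
    grad = (h - y_i) * x_i       # gradient of 0.5 * (h - y)^2 w.r.t. theta
    return theta - lr * grad

theta = np.zeros(2)
samples = [
    (np.array([1.0, 0.0]), 2.0),
    (np.array([0.0, 1.0]), -1.0),
]
for _ in range(200):
    for x_i, y_i in samples:     # parameters change after every sample
        theta = sgd_step(theta, x_i, y_i)
```

Because the parameters are updated after each sample rather than after a full pass over the training set, each step is cheap, at the cost of noisier progress toward the minimum of the loss.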
A handwritten Chinese character image recognition device, comprising:

An original image acquisition module, configured to acquire an original image, where the original image includes handwritten Chinese characters and a background picture;

An effective image acquisition module, configured to pre-process the original image to obtain an effective image;

A target image acquisition module, configured to process the effective image using a kernel density estimation algorithm, remove the background picture, and obtain a target image including the handwritten Chinese characters;

A to-be-recognized single-character image acquisition module, configured to perform single-character cutting on the target image using a vertical projection method to obtain a single-character image to be recognized;

A handwritten Chinese character acquisition module, configured to input the single-character image to be recognized into a target handwriting recognition model based on a long short-term memory (LSTM) neural network for recognition, and obtain the handwritten Chinese character corresponding to the single-character image to be recognized.
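The module structure of the device can be pictured as a simple composition of four stages applied to an acquired original image. The class name `RecognitionDevice`, its field names, and the string-based stand-ins in the usage lines are hypothetical placeholders; a real device would plug in the preprocessing, kernel density, vertical projection, and LSTM components.

```python
from dataclasses import dataclass
from typing import Any, Callable, List

@dataclass
class RecognitionDevice:
    """Chains the stages named in the device claim (illustrative only)."""
    preprocess: Callable[[Any], Any]           # effective image acquisition
    remove_background: Callable[[Any], Any]    # target image acquisition
    cut_characters: Callable[[Any], List[Any]] # single-character image acquisition
    recognize: Callable[[Any], Any]            # handwritten character acquisition

    def run(self, original_image: Any) -> List[Any]:
        effective = self.preprocess(original_image)
        target = self.remove_background(effective)
        singles = self.cut_characters(target)
        return [self.recognize(single) for single in singles]

# Stand-in stages operating on a string instead of an image.
device = RecognitionDevice(
    preprocess=lambda img: img,
    remove_background=lambda img: img.strip(),
    cut_characters=list,
    recognize=str.upper,
)
result = device.run(" ab ")
```

Keeping each stage behind its own callable mirrors the claim's division into modules, so any one stage can be replaced without touching the others.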
The device for recognizing handwritten Chinese characters according to claim 7, wherein the target image acquisition module comprises:

An effective image histogram acquisition unit, configured to perform statistics on pixel values in the effective image to obtain an effective image histogram;

A frequency extreme value obtaining unit, configured to process the effective image histogram by using a Gaussian kernel density estimation algorithm to obtain at least one frequency maximum and at least one frequency minimum corresponding to the effective image histogram;

A layered image acquisition unit, configured to perform layered segmentation processing on the effective image based on the frequency maximum and frequency minimum to obtain a layered image;

A target image acquisition unit, configured to acquire a target image including the handwritten Chinese characters based on the layered image.
A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the following steps when executing the computer program:

Obtaining an original image, the original image including handwritten Chinese characters and a background picture;

Preprocessing the original image to obtain an effective image;

Processing the effective image using a kernel density estimation algorithm to remove the background picture and obtain a target image including the handwritten Chinese characters;

Performing single-character cutting on the target image using a vertical projection method to obtain a single-character image to be recognized;

Inputting the single-character image to be recognized into a target handwriting recognition model based on a long short-term memory (LSTM) neural network for recognition, to obtain the handwritten Chinese character corresponding to the single-character image to be recognized.
The computer device according to claim 9, wherein preprocessing the original image to obtain an effective image comprises:

Performing enlargement and graying processing on the original image to obtain a grayscale image;

Performing normalization processing on the grayscale image to obtain the effective image, wherein the formula of the normalization processing is

$$X' = \frac{X - M_{\min}}{M_{\max} - M_{\min}}$$

where $X$ is a pixel value of the grayscale image $M$, $X'$ is the corresponding pixel value of the effective image, $M_{\min}$ is the smallest pixel value in the grayscale image $M$, and $M_{\max}$ is the largest pixel value in the grayscale image $M$.
The computer device according to claim 9, wherein processing the effective image using a kernel density estimation algorithm to remove the background picture and obtain a target image including the handwritten Chinese characters comprises:

Counting the pixel values in the effective image to obtain an effective image histogram;

Processing the effective image histogram using a Gaussian kernel density estimation method to obtain at least one frequency maximum and at least one frequency minimum corresponding to the effective image histogram;

Performing layered segmentation processing on the effective image based on the frequency maximum and the frequency minimum to obtain a layered image;

Acquiring, based on the layered image, a target image including the handwritten Chinese characters.
The computer device according to claim 11, wherein acquiring a target image including the handwritten Chinese characters based on the layered image comprises:

Performing binarization processing on the layered image to obtain a binarized image;

Detecting and marking pixels in the binarized image to obtain the connected regions corresponding to the binarized image;

Eroding and superimposing the connected regions corresponding to the binarized image to obtain the target image including the handwritten Chinese characters.
The computer device according to claim 9, wherein the processor further implements the following step when executing the computer program: training the target handwriting recognition model in advance;

Training the target handwriting recognition model in advance comprises:

Obtaining training handwritten Chinese character images;

Performing single-character cutting on the training handwritten Chinese character images using a vertical projection method to obtain training single-character images;

Labeling the training single-character images in sequence, inputting the labeled training single-character images into a long short-term memory (LSTM) neural network for training, and updating the network parameters of the LSTM neural network using a stochastic gradient descent algorithm to obtain the target handwriting recognition model.
The computer device according to claim 13, wherein inputting the labeled training single-character image into the long short-term memory (LSTM) neural network for training, and updating the network parameters of the LSTM neural network using a stochastic gradient descent algorithm to obtain the target handwriting recognition model, comprises:

Processing the training single-character image using a first activation function in a hidden layer of the LSTM neural network to obtain neurons carrying an activation state identifier;

Processing the neurons carrying the activation state identifier using a second activation function in the hidden layer of the LSTM neural network to obtain an output value of the hidden layer of the LSTM neural network;

Updating the network parameters of the LSTM neural network using the stochastic gradient descent algorithm according to the output value of the hidden layer of the LSTM neural network to obtain the target handwriting recognition model, wherein the stochastic gradient descent algorithm is defined by two formulas, one for the loss function J(θ) and one for the network parameter θ_j, in which J(θ) is the loss function, m is the number of selected single-character images and m = 1, θ_j is the network parameter of the LSTM neural network in the j-th layer, h_θ(x) is the output value of the hidden layer of the LSTM neural network, and (x_i, y_i) represents the i-th training single-character image.
One or more non-volatile readable storage media storing computer-readable instructions, characterized in that when the computer-readable instructions are executed by one or more processors, the one or more processors are caused to execute the following steps:

Obtaining an original image, the original image including handwritten Chinese characters and a background picture;

Preprocessing the original image to obtain an effective image;

Processing the effective image using a kernel density estimation algorithm to remove the background picture and obtain a target image including the handwritten Chinese characters;

Performing single-character cutting on the target image using a vertical projection method to obtain a single-character image to be recognized;

Inputting the single-character image to be recognized into a target handwriting recognition model based on a long short-term memory (LSTM) neural network for recognition, to obtain the handwritten Chinese character corresponding to the single-character image to be recognized.
The non-volatile readable storage medium of claim 15, wherein preprocessing the original image to obtain an effective image comprises:

Performing enlargement and graying processing on the original image to obtain a grayscale image;

Performing normalization processing on the grayscale image to obtain the effective image, wherein the formula of the normalization processing is

$$X' = \frac{X - M_{\min}}{M_{\max} - M_{\min}}$$

where $X$ is a pixel value of the grayscale image $M$, $X'$ is the corresponding pixel value of the effective image, $M_{\min}$ is the smallest pixel value in the grayscale image $M$, and $M_{\max}$ is the largest pixel value in the grayscale image $M$.
The non-volatile readable storage medium according to claim 15, wherein processing the effective image using a kernel density estimation algorithm to remove the background picture and obtain a target image including the handwritten Chinese characters comprises:

Counting the pixel values in the effective image to obtain an effective image histogram;

Processing the effective image histogram using a Gaussian kernel density estimation method to obtain at least one frequency maximum and at least one frequency minimum corresponding to the effective image histogram;

Performing layered segmentation processing on the effective image based on the frequency maximum and the frequency minimum to obtain a layered image;

Acquiring, based on the layered image, a target image including the handwritten Chinese characters.
The non-volatile readable storage medium according to claim 17, wherein acquiring a target image including the handwritten Chinese characters based on the layered image comprises:

Performing binarization processing on the layered image to obtain a binarized image;

Detecting and marking pixels in the binarized image to obtain the connected regions corresponding to the binarized image;

Eroding and superimposing the connected regions corresponding to the binarized image to obtain the target image including the handwritten Chinese characters.
The non-volatile readable storage medium according to claim 15, wherein the one or more processors are further caused to execute the following step: training the target handwriting recognition model in advance;

Training the target handwriting recognition model in advance comprises:

Obtaining training handwritten Chinese character images;

Performing single-character cutting on the training handwritten Chinese character images using a vertical projection method to obtain training single-character images;

Labeling the training single-character images in sequence, inputting the labeled training single-character images into a long short-term memory (LSTM) neural network for training, and updating the network parameters of the LSTM neural network using a stochastic gradient descent algorithm to obtain the target handwriting recognition model.
The non-volatile readable storage medium according to claim 19, wherein inputting the labeled training single-character image into the long short-term memory (LSTM) neural network for training, and updating the network parameters of the LSTM neural network using a stochastic gradient descent algorithm to obtain the target handwriting recognition model, comprises:

Processing the training single-character image using a first activation function in a hidden layer of the LSTM neural network to obtain neurons carrying an activation state identifier;

Processing the neurons carrying the activation state identifier using a second activation function in the hidden layer of the LSTM neural network to obtain an output value of the hidden layer of the LSTM neural network;

Updating the network parameters of the LSTM neural network using the stochastic gradient descent algorithm according to the output value of the hidden layer of the LSTM neural network to obtain the target handwriting recognition model, wherein the stochastic gradient descent algorithm is defined by two formulas, one for the loss function J(θ) and one for the network parameter θ_j, in which J(θ) is the loss function, m is the number of selected single-character images and m = 1, θ_j is the network parameter of the LSTM neural network in the j-th layer, h_θ(x) is the output value of the hidden layer of the LSTM neural network, and (x_i, y_i) represents the i-th training single-character image.