WO2019232850A1 - Method and apparatus for recognizing handwritten chinese character image, computer device, and storage medium - Google Patents

Method and apparatus for recognizing handwritten chinese character image, computer device, and storage medium

Info

Publication number
WO2019232850A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
term
handwritten chinese
neural network
short
Prior art date
Application number
PCT/CN2018/094222
Other languages
French (fr)
Chinese (zh)
Inventor
高梁梁
周罡
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2019232850A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/32 Digital ink
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133 Distances to prototypes
    • G06F18/24137 Distances to cluster centroids
    • G06F18/2414 Smoothing the distance, e.g. radial basis function networks [RBFN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/28 Quantising the image, e.g. histogram thresholding for discrimination between background and foreground patterns
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/32 Digital ink
    • G06V30/36 Matching; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/28 Character recognition specially adapted to the type of the alphabet, e.g. Latin alphabet
    • G06V30/287 Character recognition specially adapted to the type of the alphabet, e.g. Latin alphabet of Kanji, Hiragana or Katakana characters

Definitions

  • The present application relates to the field of image recognition, and in particular to a method, an apparatus, a computer device, and a storage medium for recognizing a handwritten Chinese character image.
  • The single-font image to be recognized is input into a target handwriting recognition model based on a long short-term memory (LSTM) neural network for recognition, and the handwritten Chinese character corresponding to the single-font image to be recognized is obtained.
  • An original image acquisition module configured to acquire an original image, where the original image includes handwritten Chinese characters and a background picture;
  • An effective image acquisition module configured to pre-process the original image to obtain an effective image;
  • A target image acquisition module configured to process the effective image by using a kernel density estimation algorithm, remove the background picture, and acquire a target image in which the handwritten Chinese characters are retained;
  • A to-be-recognized single-font image acquisition module configured to perform single-font cutting on the target image by using a vertical projection method to obtain a single-font image to be recognized;
  • A handwritten Chinese character acquisition module configured to input the single-font image to be recognized into a target handwriting recognition model based on a long short-term memory (LSTM) neural network for recognition, and obtain the handwritten Chinese character corresponding to the single-font image to be recognized.
  • A computer device includes a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, the following steps are implemented:
  • The single-font image to be recognized is input into a target handwriting recognition model based on a long short-term memory (LSTM) neural network for recognition, and the handwritten Chinese character corresponding to the single-font image to be recognized is obtained.
  • One or more non-volatile readable storage media storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the following steps:
  • The single-font image to be recognized is input into a target handwriting recognition model based on a long short-term memory (LSTM) neural network for recognition, and the handwritten Chinese character corresponding to the single-font image to be recognized is obtained.
  • FIG. 1 is an application scenario diagram of a handwritten Chinese character image recognition method according to an embodiment of the present application;
  • FIG. 3 is a specific flowchart of step S20 in FIG. 2;
  • FIG. 4 is a specific flowchart of step S30 in FIG. 2;
  • FIG. 5 is a specific flowchart of step S34 in FIG. 4;
  • FIG. 6 is another flowchart of a method for recognizing handwritten Chinese characters in an embodiment of the present application;
  • FIG. 7 is a specific flowchart of step S63 in FIG. 6;
  • FIG. 9 is a schematic diagram of a computer device in an embodiment of the present application.
  • The handwritten Chinese character image recognition method provided in the embodiment of the present application can be applied in the application environment shown in FIG. 1.
  • The application environment of the handwritten Chinese character image recognition method includes a server and a computer device. The computer device communicates with the server through a network and is a device that can perform human-computer interaction with a user, including, but not limited to, a computer, a smartphone, and a tablet device.
  • The handwritten Chinese character image recognition method provided in the embodiment of the present application is applied to a server.
  • A handwritten Chinese character image recognition method is provided.
  • The method is described using its application to the server in FIG. 1 as an example, and includes the following steps:
  • the original image includes handwritten Chinese characters and background pictures.
  • The effective image is the image obtained by pre-processing the original image to exclude interference factors.
  • Because the original image may contain multiple interference factors, such as multiple colors, it is not conducive to subsequent recognition. Therefore, the original image needs to be pre-processed to obtain an effective image that excludes interference factors.
  • The effective image can be understood as the image obtained after the background picture is excluded from the original image.
  • The grayscale image is the image obtained after the original image is enlarged and grayscaled.
  • The grayscale image includes a pixel value matrix.
  • The pixel value matrix is a matrix containing the pixel value corresponding to each pixel in the original image.
  • The server uses the imread function to read the pixel value of each pixel in the original image, and performs enlargement and grayscale processing on the original image to obtain a grayscale image.
  • The imread function is a function in a programming language for reading the pixel values in an image file.
  • The pixel value is the value assigned by the computer when the original image is digitized.
  • The original image may contain multiple colors, and color is very sensitive to factors such as lighting: objects of the same kind vary widely in color, so color itself rarely provides key information. The original image therefore needs to be grayscaled to eliminate interference and to reduce the complexity of the image and the amount of information to process. However, if the handwritten Chinese characters in the original image are small and grayscale processing is performed directly, the strokes of the handwritten Chinese characters become too thin and are excluded as interference. Therefore, to increase the stroke thickness, the original image is first enlarged and then grayscaled.
  • Grayscale processing renders the original image with a pronounced black-and-white effect.
  • Grayscale processing of the enlarged image works as follows: the color of each pixel in the original image is determined by three components, R (red), G (green), and B (blue), and each component takes one of 256 values from 0 to 255 (0 is the darkest, black, and 255 is the brightest, white).
  • The grayscale image is a special color image in which the R, G, and B components are equal.
  • The server may directly use the imread function to read the original image and obtain the specific values of the R, G, and B components corresponding to each pixel in the grayscale image.
  • The server standardizes the grayscale image with a normalization formula, which avoids the problems of widely scattered pixel values and inconsistent orders of magnitude in the grayscale image.
  • The standardization formula is $X' = \frac{X - M_{min}}{M_{max} - M_{min}}$, where $X$ is a pixel value of the grayscale image M, $X'$ is the corresponding pixel value of the effective image, $M_{min}$ is the smallest pixel value in the grayscale image M, and $M_{max}$ is the largest pixel value in the grayscale image M.
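The min-max standardization described above can be sketched in a few lines. This is only an illustration under stated assumptions: the image is a plain list of rows, the function name `min_max_normalize` is hypothetical, and the handling of a completely flat image (all pixels equal) is a choice made here, not something the source specifies.

```python
def min_max_normalize(gray):
    """Scale every pixel of a 2-D grayscale matrix into [0, 1]."""
    flat = [v for row in gray for v in row]
    m_min, m_max = min(flat), max(flat)
    if m_max == m_min:
        # Assumption: a flat image maps to all zeros to avoid dividing by zero.
        return [[0.0 for _ in row] for row in gray]
    span = m_max - m_min
    return [[(v - m_min) / span for v in row] for row in gray]


normalized = min_max_normalize([[50, 100], [150, 250]])
```

After this step the smallest pixel maps to 0 and the largest to 1, so all values share the same scale.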
  • S30 Use a kernel density estimation algorithm to process the effective image, remove the background image, and obtain a target image including handwritten Chinese characters.
  • The kernel density estimation algorithm is a non-parametric method that estimates the probability density function by studying the distribution characteristics of the data samples themselves.
  • The target image is the image, containing only handwritten Chinese characters, obtained by processing the effective image with the kernel density estimation algorithm.
  • The server uses the kernel density estimation algorithm to process the effective image, eliminating background picture interference and obtaining a target image including handwritten Chinese characters.
  • The kernel density estimate is calculated as $\hat{f}_h(x) = \frac{1}{nh}\sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right)$, where $K(\cdot)$ is the kernel function, $h$ is the pixel value range, $x$ is the pixel value whose probability density is to be estimated, $x_i$ is the i-th pixel value within the range $h$, and $n$ is the number of pixel values within the range $h$.
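A kernel density estimate with the symbols defined above (K, h, x, x_i, n) might be computed as follows. This is a minimal sketch: the standard Gaussian kernel is assumed for K, and the function names `gaussian_kernel` and `kde` are illustrative rather than taken from the source.

```python
import math


def gaussian_kernel(u):
    """Standard Gaussian kernel, assumed here as the kernel function K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde(x, samples, h):
    """Estimate the density at x from the samples x_i with range (bandwidth) h."""
    n = len(samples)
    return sum(gaussian_kernel((x - xi) / h) for xi in samples) / (n * h)


# Density at pixel value 60 given a handful of observed pixel values.
density = kde(60.0, [55.0, 58.0, 61.0, 120.0], h=5.0)
```

A larger h produces a smoother estimate, and a smaller h follows the individual samples more closely.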
  • The effective image histogram is the histogram obtained by counting the pixel values in the effective image.
  • A histogram is a statistical chart that represents the distribution of data with a series of vertical bars or line segments of varying heights.
  • The horizontal axis of the effective image histogram represents the pixel value, and the vertical axis represents the frequency of occurrence of that pixel value.
  • The server obtains the effective image histogram by counting the pixel values in the effective image, which shows the distribution of pixel values in the effective image directly and provides technical support for the subsequent Gaussian kernel density estimation.
  • S32 Use a Gaussian kernel density estimation algorithm to process the effective image histogram to obtain at least one frequency maximum and at least one frequency minimum corresponding to the effective image histogram.
  • The Gaussian kernel density estimation algorithm is a kernel density estimation method whose kernel function is a Gaussian kernel function.
  • The formula of the Gaussian kernel function is $K(x) = \frac{1}{\sqrt{2\pi}} e^{-\frac{x^2}{2}}$, where $K(x)$ is the Gaussian kernel function with the pixel value $x$ as the independent variable, $x$ is a pixel value in the effective image, and $e$ and $\pi$ are constants.
  • Frequency maxima are the local maxima of the frequency in the different frequency intervals of the frequency distribution histogram.
  • A frequency minimum is the minimum value that corresponds to a frequency maximum within the same frequency interval of the frequency distribution histogram.
  • Gaussian kernel density estimation is used to perform Gaussian smoothing on the frequency distribution histogram of the effective image, yielding a Gaussian-smoothed curve. The pixel values on the horizontal axis that correspond to the frequency maxima and frequency minima on this curve are then obtained, so that the effective image can subsequently be segmented into layers based on those pixel values.
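The smoothing and extremum search described above can be illustrated roughly as follows. This sketch assumes the histogram is a list indexed by pixel value, uses a truncated Gaussian window with a hypothetical `sigma` parameter, and clamps indices at the borders; none of these details come from the source.

```python
import math


def smooth_histogram(hist, sigma=2.0):
    """Gaussian-smooth a frequency histogram given as a list of counts."""
    radius = int(3 * sigma)
    offsets = list(range(-radius, radius + 1))
    weights = [math.exp(-0.5 * (k / sigma) ** 2) for k in offsets]
    total = sum(weights)
    weights = [w / total for w in weights]  # normalize so the weights sum to 1
    n = len(hist)
    smoothed = []
    for i in range(n):
        acc = 0.0
        for k, w in zip(offsets, weights):
            j = min(max(i + k, 0), n - 1)  # clamp indices at the borders
            acc += w * hist[j]
        smoothed.append(acc)
    return smoothed


def local_extrema(curve):
    """Return (maxima, minima) as lists of interior indices of the curve."""
    maxima, minima = [], []
    for i in range(1, len(curve) - 1):
        if curve[i - 1] < curve[i] > curve[i + 1]:
            maxima.append(i)
        elif curve[i - 1] > curve[i] < curve[i + 1]:
            minima.append(i)
    return maxima, minima
```

Applied to a smoothed 256-bin histogram, the indices returned by `local_extrema` are the pixel values on the horizontal axis that correspond to the frequency maxima and minima.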
  • S33 Perform hierarchical segmentation processing on the effective image based on the frequency maximum and frequency minimum to obtain a layered image.
  • The layered image is an image obtained by performing hierarchical segmentation on the effective image based on the frequency maxima and frequency minima.
  • The server first obtains the pixel values corresponding to the frequency maxima and the frequency minima. The number of frequency maxima determines the number of classes into which the pixel values of the effective image are divided; the pixel values corresponding to the frequency minima are then used as the boundary values between classes, and the effective image is layered according to the classes and their boundaries to obtain the layered images.
  • For example, suppose the pixel values corresponding to the frequency maxima in the effective image are 18, 59, 95, 118, and 153, and the pixel values corresponding to the frequency minima are 27, 65, 105, and 133.
  • From the number of frequency maxima, the pixel values of the effective image can be divided into 5 classes and the effective image into 5 layers, with the pixel values corresponding to the frequency minima as the boundary values between classes. Because the minimum pixel value is 0 and the maximum pixel value is 255, the range of each layer follows from the boundary values.
  • The layered image for the maximum at pixel value 18 covers the pixel value range [0, 27); the layered image for 59 covers [27, 65); the layered image for 95 covers [65, 105); the layered image for 118 covers [105, 133); and the layered image for 153 covers [133, 255].
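The worked example above can be reproduced with a small sketch that assigns each pixel to a layer using the boundary values 27, 65, 105, and 133. Representing each layer as a 0/1 mask is a simplification chosen here for illustration, and the function names are hypothetical.

```python
import bisect


def layer_index(pixel, boundaries):
    """Index of the layer a pixel value belongs to; boundaries are the sorted minima."""
    # bisect_right puts a value equal to a boundary into the upper layer,
    # which matches the half-open ranges [0, 27), [27, 65), ...
    return bisect.bisect_right(boundaries, pixel)


def split_into_layers(image, boundaries):
    """Return one 0/1 mask per layer, marking the pixels that fall into it."""
    n_layers = len(boundaries) + 1
    layers = [[[0] * len(row) for row in image] for _ in range(n_layers)]
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            layers[layer_index(value, boundaries)][r][c] = 1
    return layers


boundaries = [27, 65, 105, 133]
layers = split_into_layers([[18, 59], [133, 27]], boundaries)
```

With these boundaries a pixel value of 27 lands in the second layer and 133 in the fifth, consistent with the ranges listed in the example.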
  • After obtaining the layered images, the server performs binarization, erosion, and superposition on them to obtain a target image including handwritten Chinese characters.
  • Binarization sets the pixel value of each pixel of a layered image to 0 (black) or 1 (white), so that the whole layered image shows a pronounced black-and-white effect.
  • The binarized layered image is eroded to remove the background picture and retain the handwritten Chinese characters in the layered image. Because each layered image holds pixel values from a different range, the layered images must be superimposed after erosion to generate a target image containing only handwritten Chinese characters.
  • Superposition combines the layered images, each containing only the handwritten portion, into a single image, yielding a target image containing only handwritten Chinese characters.
  • The layered images are superimposed using the imadd function to obtain a target image containing only handwritten Chinese characters.
  • The imadd function is a function in a programming language for superimposing images.
  • S341 Perform binarization processing on the layered image to obtain a binarized image.
  • A binarized image is the image obtained by binarizing a layered image. Specifically, after obtaining the layered image, the server compares each sampled pixel value of the layered image with a preselected threshold, setting pixel values greater than or equal to the threshold to 1 and pixel values less than the threshold to 0.
  • A sampled pixel value is the pixel value of a pixel in the layered image.
  • The choice of threshold affects the quality of the binarization: a well-chosen threshold binarizes the layered image well, whereas a poorly chosen threshold degrades the result.
  • The threshold in this embodiment is determined by the developer based on experience. Binarizing the layered image facilitates the subsequent erosion.
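The thresholding rule described above (values greater than or equal to the threshold become 1, all others become 0) can be written as a one-line sketch. The threshold of 128 in the example is an arbitrary illustrative value, since the source leaves the choice to the developer.

```python
def binarize(image, threshold):
    """Set pixels >= threshold to 1 and all other pixels to 0."""
    return [[1 if value >= threshold else 0 for value in row] for row in image]


binary = binarize([[10, 128], [200, 127]], threshold=128)
```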
  • A connected region is the area enclosed by the adjacent pixels surrounding a specific pixel.
  • For example, if a specific pixel is 0 and its surrounding adjacent pixels are 1, the area enclosed by those adjacent pixels is used as the connected region.
  • The binarized image corresponds to a pixel matrix consisting of rows and columns.
  • Detecting the pixels in a binarized image specifically includes the following process: (1) Scan the pixel matrix row by row, group the consecutive white pixels in each row into a sequence called a cluster, and record its starting point, end point, and row number.
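Labeling the connected regions of a binarized image can be illustrated with a breadth-first flood fill. Note that this is a swapped-in technique, not the run-based row scan outlined above: it is used here only because it is short and produces the same kind of labeling. The 8-neighbor connectivity and the name `label_regions` are assumptions.

```python
from collections import deque


def label_regions(binary):
    """Label 8-connected regions of 1-pixels; returns (label matrix, region count)."""
    rows, cols = len(binary), len(binary[0])
    labels = [[0] * cols for _ in range(rows)]
    current = 0
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] != 1 or labels[r][c] != 0:
                continue
            current += 1  # start a new region at this unlabeled pixel
            labels[r][c] = current
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and binary[ny][nx] == 1 and labels[ny][nx] == 0):
                            labels[ny][nx] = current
                            queue.append((ny, nx))
    return labels, current
```

Each non-zero label identifies one connected region, which can then be eroded and measured independently.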
  • Erosion is a morphological operation that removes part of the content of an image.
  • The built-in imerode function is used to erode the connected regions of the binarized image.
  • Eroding the connected regions of the binarized image includes the following steps. First, an n × n structural element is selected. In this embodiment, the 8 elements adjacent to each element in the pixel matrix are taken as the connected region of that element, so the selected structural element is a 3 × 3 pixel matrix.
  • The structural element is an n × n pixel matrix whose elements are 0 or 1.
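Erosion with a 3 × 3 structural element can be sketched as follows. The sketch assumes an all-ones structural element and treats everything outside the image as background (0); library routines such as imerode may pad the border differently, so this is an illustration of the idea rather than a reproduction of any particular function.

```python
def erode(binary):
    """Erode a 0/1 image with a 3x3 all-ones structural element."""
    rows, cols = len(binary), len(binary[0])
    out = [[0] * cols for _ in range(rows)]
    # Border pixels stay 0 because part of their neighborhood lies outside the image.
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            # A pixel survives only if it and all 8 neighbors are 1.
            if all(binary[r + dy][c + dx] == 1
                   for dy in (-1, 0, 1) for dx in (-1, 0, 1)):
                out[r][c] = 1
    return out
```

Thin structures disappear entirely under this operation, while thicker strokes shrink by one pixel on every side.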
  • The binarized image is filtered based on a preset erosion resistance range for handwritten regions: the parts of the binarized image whose erosion resistance falls outside the range are deleted, and the parts within the range are retained.
  • A target image containing only handwritten Chinese characters is obtained by superimposing the pixel matrices corresponding to the retained parts of each binarized image.
  • The erosion resistance of a handwritten region is calculated as $p = \frac{s_1}{s_2}$, where $s_1$ is the total area remaining after erosion in the binarized image, $s_2$ is the total area before erosion in the binarized image, and $p$ is the erosion resistance of the handwritten region.
  • For example, the preset erosion resistance range of the handwritten region is [0.05, 0.8]. For each binarized image, the ratio $p = s_1 / s_2$ of the total area after erosion to the total area before erosion is calculated.
  • If the ratio p is not within the erosion resistance range of the handwritten region, that part of the binarized image is background picture rather than handwriting and must be eroded away to remove the background picture.
  • If the ratio p is within the range [0.05, 0.8], that part of the binarized image is a handwritten Chinese character and must be retained.
  • The pixel matrices corresponding to the retained binarized images are superimposed to obtain a target image containing handwritten Chinese characters.
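The filtering rule above can be sketched as a ratio computation followed by a range check. Counting 1-pixels as "area" and the function names are assumptions made for illustration; the range [0.05, 0.8] is the example value given in the text.

```python
def erosion_resistance(before, after):
    """Ratio p = s1 / s2 of the 1-pixel area after erosion to the area before."""
    s2 = sum(map(sum, before))
    s1 = sum(map(sum, after))
    return s1 / s2 if s2 else 0.0  # an empty region is treated as fully eroded


def is_handwriting(p, low=0.05, high=0.8):
    """Keep a region only if its erosion resistance lies within [low, high]."""
    return low <= p <= high


# A 3x3 stroke patch (area 9) that shrinks to a single pixel (area 1).
before = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
after = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
keep = is_handwriting(erosion_resistance(before, after))
```

Regions that vanish completely (p = 0) or barely shrink (p close to 1) fall outside the range and are discarded as background.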
  • The layered image is binarized to obtain a binarized image; the pixels in the binarized image are then detected and labeled to obtain the connected regions of the binarized image.
  • The elements in the identical pixel matrix all become 0; the parts of the binarized image whose elements are 0 are black, and the black part is the eroded part of the binarized image.
  • The ratio of the total area of the binarized image after erosion to its total area before erosion is calculated, and it is determined whether this ratio lies within the preset erosion resistance range of the handwritten region, so as to remove the background picture from each layered image and retain the handwritten Chinese characters. Finally, the layered images are superimposed to obtain the target image.
  • S40 Perform single-font cutting on the target image using a vertical projection method to obtain the single-font image to be recognized.
  • The vertical projection method projects each line of handwritten Chinese characters in the target image vertically to obtain a vertical projection histogram.
  • The vertical projection histogram reflects the number of pixels of the target image in the vertical direction.
  • Single-font cutting with the vertical projection method specifically includes the following steps: the server scans the lines of handwritten Chinese characters in the target image one by one to obtain the pixel values of each line, and builds the corresponding vertical projection histogram, from which the number of pixels corresponding to different pixel values is obtained.
  • The target image is then cut iteratively according to the vertical projection histogram to obtain the single-font images to be recognized. The pixels of each handwritten Chinese character are relatively concentrated, whereas the pixels in the gaps between characters are relatively sparse, and this difference in density is reflected in the vertical projection histogram.
  • In the vertical projection histogram, the pixel count is relatively high where there are Chinese characters and relatively low where there are none.
  • The vertical projection method can effectively perform single-font cutting on the target image to obtain the single-font image to be recognized; it is simple to implement and provides technical support for subsequent handwriting recognition.
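Cutting a text line into characters by vertical projection can be sketched as follows. The sketch assumes a binary image of a single text line in which stroke pixels are 1, and it cuts wherever a column contains no stroke pixels; real handwriting with touching characters would need a more tolerant rule. The function names are hypothetical.

```python
def column_projection(binary):
    """Number of stroke pixels (1s) in each column of the text-line image."""
    return [sum(column) for column in zip(*binary)]


def cut_columns(binary):
    """Return (start, end) column spans, end exclusive, one per character."""
    projection = column_projection(binary)
    spans, start = [], None
    for x, count in enumerate(projection):
        if count > 0 and start is None:
            start = x                  # a character begins
        elif count == 0 and start is not None:
            spans.append((start, x))   # a gap ends the character
            start = None
    if start is not None:
        spans.append((start, len(projection)))
    return spans


line = [[1, 1, 0, 0, 1, 0],
        [1, 0, 0, 1, 1, 0]]
spans = cut_columns(line)
```

Each span can then be used to slice the columns of the line image into one single-font image per character.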
  • S50 Input the single-font image to be recognized into a target handwriting recognition model based on a long short-term memory (LSTM) neural network for recognition, and obtain the handwritten Chinese character corresponding to the single-font image to be recognized.
  • The target handwriting recognition model is a handwriting recognition model trained in advance based on an LSTM neural network.
  • A long short-term memory (LSTM) network is a recurrent neural network over time, suited to processing and predicting important events in time series with relatively long intervals and delays.
  • The server inputs the single-font images to be recognized into the target handwriting recognition model, so that the model can take context into account during recognition, obtain the handwritten Chinese character corresponding to each single-font image to be recognized, and improve recognition accuracy.
  • The user may upload to the server an original image containing handwritten Chinese characters collected by the acquisition module of the computer device, so that the server obtains the original image. The server then pre-processes the original image to obtain an effective image that excludes interference factors.
  • The kernel density estimation algorithm is used to process the effective image, remove the background picture, and obtain a target image containing only handwritten Chinese characters, further eliminating interference.
  • The vertical projection method is used to perform single-font cutting on the target image to obtain the single-font image to be recognized, which is easy to implement.
  • The single-font image to be recognized is input into the LSTM-based target handwriting recognition model, which gives the single-font images a sequential character so that the model can take context into account during recognition; the handwritten Chinese character corresponding to each single-font image is obtained with improved recognition accuracy.
  • The handwritten Chinese character image recognition method further includes training the target handwriting recognition model in advance. Specifically, as shown in FIG. 6, pre-training the target handwriting recognition model includes the following steps:
  • The training handwritten Chinese character images are sample images collected in advance from an open source library for model training.
  • The training handwritten Chinese character images include N (N is a positive integer) handwriting samples for each Chinese character in the secondary (level-2) Chinese character library.
  • The secondary Chinese character library is a library of less commonly used Chinese characters, coded in order of radical and stroke.
  • N handwriting samples written by different people are collected from the open source library as training handwritten Chinese character images, so that the server obtains the training handwritten Chinese character images. Because different users have different writing habits, training on N handwriting samples greatly improves the generalization of the model.
  • S62 Use a vertical projection method to perform single-font cutting on the training handwritten Chinese character images to obtain training single-font images.
  • The process of single-font cutting of the training handwritten Chinese character images by the vertical projection method is the same as in step S40 and, to avoid repetition, is not described again here.
  • The training single-font image is a single-font image used as input for model training.
  • S63 Label the training single-font images in sequence, input the labeled training single-font images into the long short-term memory (LSTM) neural network for training, and use a stochastic gradient descent algorithm to update the network parameters of the LSTM neural network to obtain the target handwriting recognition model.
  • When updating the network parameters, the stochastic gradient descent algorithm uses one randomly selected sample (a training single-font image) per update instead of all samples, which speeds up training.
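The idea of updating parameters from one randomly chosen sample at a time can be shown on a toy problem. The sketch below fits a line y = w·x + b with squared-error loss; it is not the LSTM training procedure, and the learning rate, step count, seed, and data are all illustrative assumptions.

```python
import random


def sgd_fit(samples, lr=0.05, steps=2000, seed=0):
    """Fit y = w * x + b by stochastic gradient descent on squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for _ in range(steps):
        x, y = rng.choice(samples)   # one randomly selected sample per update
        error = (w * x + b) - y
        w -= lr * error * x          # gradient of 0.5 * error**2 w.r.t. w
        b -= lr * error              # gradient of 0.5 * error**2 w.r.t. b
    return w, b


# Noise-free points on the line y = 2x + 1.
w, b = sgd_fit([(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)])
```

Each update touches a single sample, so the cost per step does not grow with the size of the training set, which is the speed advantage the text refers to.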
  • The network parameters are the weights and biases between the layers of the LSTM neural network.
  • The LSTM neural network has a temporal memory function, so it is used to process training single-font images that carry a time-series state.
  • Training the model with an LSTM neural network adds a sequential character to the training single-font images, so that they are trained in context, which improves the accuracy of the target handwriting recognition model.
  • The output layer of the LSTM neural network uses Softmax (a regression model) to classify the output weight matrix.
  • Softmax is a classification function commonly used in neural networks. It maps the outputs of multiple neurons into the interval [0, 1], where they can be interpreted as probabilities. It is simple and convenient to calculate, which makes it well suited to multi-class output and makes the output more accurate.
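The Softmax mapping described above can be sketched as follows. Subtracting the maximum score before exponentiating is a common numerical-stability measure added here; it does not change the result.

```python
import math


def softmax(scores):
    """Map raw output scores to probabilities that sum to 1."""
    m = max(scores)                              # shift for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


probabilities = softmax([1.0, 2.0, 3.0])
predicted_class = probabilities.index(max(probabilities))
```

The class with the largest score receives the largest probability, so the predicted character is the index of the maximum.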
  • In step S63, labeling the training single-font images in sequence, inputting the labeled training single-font images into the LSTM neural network for training, and using the stochastic gradient descent algorithm to update the network parameters of the LSTM neural network to obtain the target handwriting recognition model specifically includes the following steps:
  • the training single font image is processed by using the first activation function to obtain a neuron carrying an activation state identifier.
  • each neuron in the hidden layer of the long-term and short-term memory neural network includes three gates, which are an input gate, a forgetting gate, and an output gate, respectively.
  • the forget gate determines the past information to be discarded in the neuron.
  • the input gate determines the information to be added to the neuron.
  • the output gate determines the information to be output in the neuron.
  • the first activation function is a function for activating a neuron state.
  • the state of the neuron determines the information discarded, added, and output by each gate (ie, input gate, forget gate, and output gate).
  • the activation status flag includes a pass flag and a fail flag.
  • the identifiers corresponding to the input gate, the forget gate, and the output gate in this embodiment are i, f, and o, respectively.
  • the Sigmoid (S-shaped growth curve) function is specifically selected as the first activation function.
  • the Sigmoid function is an S-shaped function common in biology. In information science, because both it and its inverse are monotonically increasing, the Sigmoid function is often used as a threshold function for neural networks, mapping variables to values between 0 and 1. Its formula is $\sigma(z) = \frac{1}{1 + e^{-z}}$, where $z$ represents the output value of the forget gate.
  • the forgetting gate includes a forgetting threshold.
  • a neuron carrying an activation state identifier as a pass identifier is obtained.
  • in the forget gate formula, $f_t$ represents the forgetting threshold (that is, the activation state), $W_f$ represents the weight matrix of the forget gate, $b_f$ represents the bias term of the forget gate, $h_{t-1}$ represents the output of the neuron at the previous moment, $x_t$ represents the input data at the current moment (that is, the training single font image), $t$ represents the current moment, and $t-1$ represents the previous moment.
  • the forgetting gate also includes the forgetting threshold.
  • applying the forget gate formula to the training single font image yields a scalar in the range 0 to 1. Based on a combined judgment of the current state and the past state, this scalar determines how much past information the neuron retains, which reduces the data, lowers the amount of computation, and improves training efficiency.
  • the output value of the hidden layer of the long-term and short-term memory neural network includes the output value of the input gate, the output value of the output gate, and the state of the neuron.
  • a second activation function is applied to the neurons whose activation state identifier is the pass identifier, to obtain the output value of the hidden layer.
  • a tanh (hyperbolic tangent) function is used as the activation function of the input gate (ie, the second activation function).
  • the tanh function adds non-linearity, which enables the trained target handwriting recognition model to solve more complex problems.
  • the activation function tanh has the advantage of fast convergence speed, which can save training time and increase training efficiency.
  • the output value of the input gate is calculated by a calculation formula of the input gate.
  • the calculation formula of the input gate further includes an input threshold.
  • $W_i$ is the weight matrix of the input gate.
  • $b_i$ represents the bias term of the input gate.
  • the network parameters of the long-term and short-term memory neural network are updated by using a stochastic gradient descent algorithm to obtain a target handwriting recognition model.
  • $J(\theta)$ is the loss function.
  • $\theta_j$ is the network parameter of the j-th layer of the long-term and short-term memory neural network.
  • $h_\theta(x)$ is the output value of the hidden layer of the long-term and short-term memory neural network, and $(x_i, y_i)$ represents the i-th training single font image.
  • the weights in the target handwriting recognition model enable it to decide which old information to discard, which new information to add, and which information to output.
  • a probability value is finally output.
  • the probability value is the probability that the training single font image is recognized as the corresponding Chinese character. The model can be widely used in handwriting recognition to accurately recognize handwritten images.
  • the original image obtaining module 10 is configured to obtain an original image, where the original image includes handwritten Chinese characters and a background picture.
  • the effective image acquisition module 20 is configured to pre-process the original image to obtain a valid image.
  • the target image acquisition module 30 is configured to process a valid image by using a kernel density estimation algorithm, remove a background picture, and obtain a target image including handwritten Chinese characters.
  • the to-be-recognized single-font image acquisition module 40 is configured to obtain a to-be-recognized single-font image by performing single-font cutting on a target image using a vertical projection method.
  • a handwritten Chinese character acquisition module 50 is configured to input a single font image to be recognized into a target handwriting recognition model based on a long-term and short-term memory neural network for recognition, and obtain a handwritten Chinese character corresponding to the single font image to be recognized.
  • the effective image acquisition module 20 includes a grayscale image acquisition unit 21 and an effective image acquisition unit 22.
  • a grayscale image acquisition unit 21 is configured to perform enlargement and grayscale processing on an original image to obtain a grayscale image.
  • the effective image obtaining unit 22 is configured to perform normalization processing on the grayscale image to obtain an effective image, wherein the formula of the normalization processing is $X' = \frac{X - M_{\min}}{M_{\max} - M_{\min}}$, where $X$ is the pixel value of the grayscale image $M$, $X'$ is the pixel value of the effective image, $M_{\min}$ is the smallest pixel value in the grayscale image $M$, and $M_{\max}$ is the largest pixel value in the grayscale image $M$.
  • the target image acquisition module 30 includes an effective image histogram acquisition unit 31, a frequency extreme value acquisition unit 32, a layered image acquisition unit 33, and a target image acquisition unit 34.
  • the effective image histogram acquisition unit 31 is configured to perform statistics on pixel values in the effective image to obtain an effective image histogram.
  • a frequency extreme value acquisition unit 32 is configured to process the effective image histogram by using a Gaussian kernel density estimation algorithm, and obtain at least one frequency maximum and at least one frequency minimum corresponding to the effective image histogram.
  • the target image acquisition unit 34 includes a binarized image acquisition subunit 341, a connected region acquisition subunit 342, and a target image acquisition subunit 343.
  • the connected region acquisition subunit 342 is configured to detect pixels in the binarized image and acquire a connected region corresponding to the binarized image.
  • the handwritten Chinese character image recognition device further includes a handwriting recognition model training module 60 for pre-training the target handwriting recognition model.
  • the handwriting recognition model training module 60 includes a training handwritten Chinese character image obtaining unit 61, a training single font image obtaining unit 62, and a target handwriting recognition model obtaining unit 63.
  • the training handwritten Chinese character image acquiring unit 61 is configured to acquire a training handwritten Chinese character image.
  • a training single font image acquisition unit 62 is configured to perform single font cutting on a training handwritten Chinese character image by using a vertical projection method to obtain a training single font image.
  • the target handwriting recognition model acquisition unit 63 is used for labeling the training single font images in sequence, inputting the labeled training single font images into the long-term and short-term memory neural network for training, and updating the network parameters of the long-term and short-term memory neural network by using the stochastic gradient descent algorithm to obtain the target handwriting recognition model.
  • the target handwriting recognition model acquisition unit 63 includes an activation state neuron acquisition subunit 631, a network output value acquisition subunit 632, and a target recognition model acquisition subunit 633.
  • the activation state neuron acquisition subunit 631 is configured to process a single font image by using a first activation function in a hidden layer of a long-term and short-term memory neural network to acquire a neuron carrying an activation state identifier.
  • the network output value acquisition subunit 632 is configured to process the neuron carrying the activation state identifier in the hidden layer of the long-term and short-term memory neural network to obtain the output value of the hidden layer of the long-term and short-term memory neural network.
  • the target recognition model acquisition subunit 633 is used to update the network parameters of the long-term and short-term memory neural network by using the stochastic gradient descent algorithm, according to the output value of the hidden layer of the long-term and short-term memory neural network, to obtain the target handwriting recognition model. In the formula of the stochastic gradient descent algorithm, $J(\theta)$ is the loss function, $\theta_j$ is the network parameter of the j-th layer of the long-term and short-term memory neural network, $h_\theta(x)$ is the output value of the hidden layer of the long-term and short-term memory neural network, and $(x_i, y_i)$ represents the i-th training single font image.
  • Each module in the above-mentioned handwritten Chinese character image recognition device may be implemented in whole or in part by software, hardware, and a combination thereof.
  • the above-mentioned modules may be embedded in the hardware in or independent of the processor in the computer device, or may be stored in the memory of the computer device in the form of software, so that the processor can call and execute the operations corresponding to the above modules.
  • a computer device is provided.
  • the computer device may be a server, and its internal structure diagram may be as shown in FIG. 9.
  • the computer device includes a processor, a memory, a network interface, and a database connected through a system bus.
  • the processor of the computer device is used to provide computing and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system, a computer program, and a database.
  • the internal memory provides an environment for running the operating system and computer programs in a non-volatile storage medium.
  • the database of the computer device is used for storing data generated or obtained during the execution of the handwritten Chinese character image recognition method, such as handwritten Chinese characters.
  • the network interface of the computer device is used to communicate with an external terminal through a network connection. When the computer-readable instructions are executed by one or more processors, the one or more processors implement a handwritten Chinese character image recognition method.
  • a computer device is provided, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor.
  • when the processor executes the computer program, the following steps are performed: obtaining an original image, the original image including handwritten Chinese characters and a background picture; pre-processing the original image to obtain an effective image; processing the effective image by using a kernel density estimation algorithm to remove the background picture and obtain a target image including the handwritten Chinese characters; performing single font cutting on the target image by using a vertical projection method to obtain a single font image to be recognized; and inputting the single font image to be recognized into a target handwriting recognition model based on a long-term and short-term memory neural network for recognition, to obtain the handwritten Chinese character corresponding to the single font image to be recognized.
  • when the processor executes the computer program, the following steps are further implemented: performing enlargement and grayscale processing on the original image to obtain a grayscale image; and performing normalization processing on the grayscale image to obtain an effective image, wherein the formula of the normalization processing is $X' = \frac{X - M_{\min}}{M_{\max} - M_{\min}}$, where $X$ is the pixel value of the grayscale image $M$, $X'$ is the pixel value of the effective image, $M_{\min}$ is the smallest pixel value in the grayscale image $M$, and $M_{\max}$ is the largest pixel value in the grayscale image $M$.
  • when the processor executes the computer program, the following steps are further implemented: counting the pixel values in the effective image to obtain an effective image histogram; processing the effective image histogram by using a Gaussian kernel density estimation algorithm to obtain at least one frequency maximum and at least one frequency minimum corresponding to the effective image histogram; performing hierarchical segmentation on the effective image based on the frequency maximum and the frequency minimum to obtain a layered image; and obtaining, based on the layered image, a target image including the handwritten Chinese characters.
  • when the processor executes the computer program, the following steps are further implemented: binarizing the layered image to obtain a binarized image; detecting and marking pixels in the binarized image to obtain the connected regions corresponding to the binarized image; and performing erosion and superposition processing on the connected regions corresponding to the binarized image to obtain a target image including the handwritten Chinese characters.
  • when the processor executes the computer program, the following steps are further implemented: obtaining a training handwritten Chinese character image; performing single font cutting on the training handwritten Chinese character image by using a vertical projection method to obtain training single font images; and labeling the training single font images in sequence, inputting the labeled training single font images into the long-term and short-term memory neural network for training, and updating the network parameters of the long-term and short-term memory neural network by using the stochastic gradient descent algorithm to obtain the target handwriting recognition model.
  • $J(\theta)$ is the loss function.
  • $\theta_j$ is the network parameter of the j-th layer of the long-term and short-term memory neural network.
  • $h_\theta(x)$ is the output value of the hidden layer of the long-term and short-term memory neural network, and $(x_i, y_i)$ represents the i-th training single font image.
  • when the processor executes the computer program, the following steps are further implemented: processing the training single font image by using a first activation function in the hidden layer of the long-term and short-term memory neural network to obtain neurons carrying an activation state identifier; processing the neurons carrying the activation state identifier by using a second activation function in the hidden layer of the long-term and short-term memory neural network to obtain the output value of the hidden layer of the long-term and short-term memory neural network; and updating the network parameters of the long-term and short-term memory neural network by using a stochastic gradient descent algorithm, according to the output value of the hidden layer, to obtain the target handwriting recognition model.
  • one or more non-volatile readable storage media storing computer-readable instructions are provided, wherein when the computer-readable instructions are executed by one or more processors, the one or more processors implement the following steps: obtaining an original image, the original image including handwritten Chinese characters and a background picture; pre-processing the original image to obtain an effective image; processing the effective image by using a kernel density estimation algorithm to remove the background picture and obtain a target image including the handwritten Chinese characters; performing single font cutting on the target image by using a vertical projection method to obtain a single font image to be recognized; and inputting the single font image to be recognized into a target handwriting recognition model based on a long-term and short-term memory neural network for recognition, to obtain the handwritten Chinese character corresponding to the single font image to be recognized.
  • when the computer-readable instructions are executed by the one or more processors, the following steps are further implemented: performing enlargement and grayscale processing on the original image to obtain a grayscale image; and performing normalization processing on the grayscale image to obtain an effective image, wherein the formula of the normalization processing is $X' = \frac{X - M_{\min}}{M_{\max} - M_{\min}}$, where $X$ is the pixel value of the grayscale image $M$, $X'$ is the pixel value of the effective image, $M_{\min}$ is the smallest pixel value in the grayscale image $M$, and $M_{\max}$ is the largest pixel value in the grayscale image $M$.
  • when the computer-readable instructions are executed by the one or more processors, the following steps are further implemented: counting the pixel values in the effective image to obtain an effective image histogram; processing the effective image histogram by using a Gaussian kernel density estimation algorithm to obtain at least one frequency maximum and at least one frequency minimum corresponding to the effective image histogram; performing hierarchical segmentation on the effective image based on the frequency maximum and the frequency minimum to obtain a layered image; and obtaining, based on the layered image, a target image including the handwritten Chinese characters.
  • when the computer-readable instructions are executed by the one or more processors, the following steps are further implemented: binarizing the layered image to obtain a binarized image; detecting and marking pixels in the binarized image to obtain the connected regions corresponding to the binarized image; and performing erosion and superposition processing on the connected regions corresponding to the binarized image to obtain a target image including the handwritten Chinese characters.
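  • when the computer-readable instructions are executed by the one or more processors, the following steps are further implemented: obtaining a training handwritten Chinese character image; performing single font cutting on the training handwritten Chinese character image by using a vertical projection method to obtain training single font images; and labeling the training single font images in sequence, inputting the labeled training single font images into the long-term and short-term memory neural network for training, and updating the network parameters of the long-term and short-term memory neural network by using the stochastic gradient descent algorithm to obtain the target handwriting recognition model.
  • $h_\theta(x)$ is the output value of the hidden layer of the long-term and short-term memory neural network, and $(x_i, y_i)$ represents the i-th training single font image.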
  • when the computer-readable instructions are executed by the one or more processors, the following steps are further implemented: processing the training single font image by using a first activation function in the hidden layer of the long-term and short-term memory neural network to obtain neurons carrying an activation state identifier; processing the neurons carrying the activation state identifier by using a second activation function in the hidden layer of the long-term and short-term memory neural network to obtain the output value of the hidden layer of the long-term and short-term memory neural network; and updating the network parameters of the long-term and short-term memory neural network by using a stochastic gradient descent algorithm, according to the output value of the hidden layer, to obtain the target handwriting recognition model.
  • Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory can include random access memory (RAM) or external cache memory.
  • RAM is available in various forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
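The gate and Softmax computations described in the list above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the formulas in the source did not survive extraction, so the code follows the common textbook LSTM formulation (Sigmoid for the forget and input gates, tanh for the candidate state), which may differ in detail from the embodiment. The helper names `gate`, `sigmoid`, `softmax` and all numeric values are hypothetical.

```python
import math
from typing import Callable, List


def sigmoid(z: float) -> float:
    """Logistic function 1 / (1 + e^(-z)), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def softmax(scores: List[float]) -> List[float]:
    """Map raw scores to probabilities in [0, 1] that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gate(weights: List[float], bias: float, h_prev: List[float],
         x_t: List[float], activation: Callable[[float], float]) -> float:
    """Compute activation(W . [h_{t-1}, x_t] + b) for one scalar gate."""
    concat = h_prev + x_t  # concatenate previous output and current input
    z = sum(w * v for w, v in zip(weights, concat)) + bias
    return activation(z)


# Hypothetical toy values: one hidden unit, two input features.
h_prev = [0.5]        # h_{t-1}: output of the neuron at the previous moment
x_t = [1.0, -1.0]     # x_t: input data at the current moment
c_prev = 0.8          # previous neuron state

f_t = gate([0.2, 0.4, 0.1], 0.0, h_prev, x_t, sigmoid)        # forget gate
i_t = gate([0.1, -0.3, 0.2], 0.1, h_prev, x_t, sigmoid)       # input gate
c_tilde = gate([0.3, 0.3, 0.3], 0.0, h_prev, x_t, math.tanh)  # candidate state
c_t = f_t * c_prev + i_t * c_tilde                            # updated state

probs = softmax([2.0, 1.0, 0.1])  # class probabilities at the output layer
```

The forget gate value `f_t` is a scalar between 0 and 1 that scales how much of the previous state is kept, which matches the description of the forgetting threshold; a trained network would learn the weights rather than use fixed numbers.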

Abstract

A method and apparatus for recognizing a handwritten Chinese character image, a computer device, and a storage medium. The method for recognizing a handwritten Chinese character image comprises: obtaining an original image, the original image comprising a handwritten Chinese character and a background picture (S10); pre-processing the original image to obtain a valid image (S20); processing the valid image using a kernel density estimation algorithm, and removing the background picture to obtain a target image comprising the handwritten Chinese character (S30); performing single character-based cutting on the target image using a vertical projection method to obtain a single character image to be recognized (S40); and inputting the single character image to be recognized to a target handwritten character recognition model of a long short-term memory neural network for recognition to obtain a handwritten Chinese character corresponding to the single character image to be recognized (S50). The method for recognizing a handwritten Chinese character image can effectively recognize similar Chinese characters having complex structures, thereby improving the accuracy of the recognition of a handwritten character image.

Description

Handwritten Chinese character image recognition method and device, computer device, and storage medium
This patent application is based on, and claims priority to, Chinese invention patent application No. 201810564691.6, filed on June 4, 2018 and entitled "Handwritten Chinese Character Image Recognition Method and Device, Computer Device, and Storage Medium".
Technical Field
The present application relates to the field of image recognition, and in particular to a handwritten Chinese character image recognition method and device, a computer device, and a storage medium.
Background
Chinese characters come in many typefaces, such as Songti, Kaiti, Yaoti, and Fangsong. Some Chinese characters have relatively complex structures, such as “魑” and “魅”, and many Chinese characters are structurally similar to one another, such as “受” and “爱”. Sentences written in a standard, simple, and regular form can be recognized by using OCR (optical character recognition) technology. For sentences composed of handwritten characters, however, each person's writing habits are different and the characters are not made up of standard horizontal, vertical, left-falling, and right-falling strokes, so recognition with OCR technology can be inaccurate. For characters that are similar to one another and are not composed of simple strokes, recognition accuracy drops, which degrades the recognition of handwritten Chinese characters.
Summary of the Invention
Based on this, it is necessary to provide a handwritten Chinese character image recognition method and device, a computer device, and a storage medium that address the above technical problems.
A handwritten Chinese character image recognition method includes:
obtaining an original image, the original image including handwritten Chinese characters and a background picture;
pre-processing the original image to obtain an effective image;
processing the effective image by using a kernel density estimation algorithm to remove the background picture and obtain a target image including the handwritten Chinese characters;
performing single font cutting on the target image by using a vertical projection method to obtain a single font image to be recognized;
inputting the single font image to be recognized into a target handwriting recognition model based on a long-term and short-term memory neural network for recognition, to obtain the handwritten Chinese character corresponding to the single font image to be recognized.
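The vertical projection cutting step can be illustrated with a minimal sketch. It assumes a binarized target image represented as a list of rows in which 1 marks an ink pixel and 0 marks background, and it cuts wherever a column contains no ink. The function name `vertical_projection_cuts` and the toy image are hypothetical and are not taken from the application.

```python
from typing import List, Tuple


def vertical_projection_cuts(binary: List[List[int]]) -> List[Tuple[int, int]]:
    """Return half-open column ranges [start, end) that contain ink (1 = ink)."""
    if not binary:
        return []
    width = len(binary[0])
    # Vertical projection: number of ink pixels in each column.
    profile = [sum(row[c] for row in binary) for c in range(width)]

    segments: List[Tuple[int, int]] = []
    start = None
    for c, count in enumerate(profile):
        if count > 0 and start is None:
            start = c                      # a character begins at this column
        elif count == 0 and start is not None:
            segments.append((start, c))    # an empty column ends the character
            start = None
    if start is not None:
        segments.append((start, width))    # character touching the right edge
    return segments


# Hypothetical 3x7 binarized image containing three separated glyphs.
image = [
    [1, 1, 0, 0, 1, 0, 1],
    [1, 0, 0, 0, 1, 0, 1],
    [1, 1, 0, 0, 1, 0, 0],
]
segments = vertical_projection_cuts(image)
```

Each returned range can then be used to crop one single font image from the target image. Cutting at every empty column can split characters whose left and right components are separated by a gap, so a practical implementation would likely add a minimum-gap or expected-width rule.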
A handwritten Chinese character image recognition device includes:
an original image acquisition module, configured to obtain an original image, the original image including handwritten Chinese characters and a background picture;
an effective image acquisition module, configured to pre-process the original image to obtain an effective image;
a target image acquisition module, configured to process the effective image by using a kernel density estimation algorithm to remove the background picture and obtain a target image that retains the handwritten Chinese characters;
a to-be-recognized single font image acquisition module, configured to perform single font cutting on the target image by using a vertical projection method to obtain a single font image to be recognized;
a handwritten Chinese character acquisition module, configured to input the single font image to be recognized into a target handwriting recognition model based on a long-term and short-term memory neural network for recognition, to obtain the handwritten Chinese character corresponding to the single font image to be recognized.
A computer device includes a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, the following steps are implemented:
obtaining an original image, the original image including handwritten Chinese characters and a background picture;
pre-processing the original image to obtain an effective image;
processing the effective image by using a kernel density estimation algorithm to remove the background picture and obtain a target image including the handwritten Chinese characters;
performing single font cutting on the target image by using a vertical projection method to obtain a single font image to be recognized;
inputting the single font image to be recognized into a target handwriting recognition model based on a long-term and short-term memory neural network for recognition, to obtain the handwritten Chinese character corresponding to the single font image to be recognized.
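The kernel density estimation step listed above can be sketched as follows. The sketch assumes normalized pixel values in [0, 1], smooths them with a Gaussian kernel, and looks for local maxima and minima of the smoothed density; a minimum between two peaks is then used as a separating threshold. The function names, the bandwidth of 0.05, and the sample pixel values are hypothetical choices for illustration only, not values from the application.

```python
import math
from typing import List, Tuple


def gaussian_kde(samples: List[float], grid: List[float], bandwidth: float) -> List[float]:
    """Evaluate a Gaussian kernel density estimate of `samples` at each grid point."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((g - s) / bandwidth) ** 2) for s in samples)
        for g in grid
    ]


def local_extrema(density: List[float]) -> Tuple[List[int], List[int]]:
    """Return the indices of strict local maxima and strict local minima."""
    maxima: List[int] = []
    minima: List[int] = []
    for i in range(1, len(density) - 1):
        if density[i] > density[i - 1] and density[i] > density[i + 1]:
            maxima.append(i)
        elif density[i] < density[i - 1] and density[i] < density[i + 1]:
            minima.append(i)
    return maxima, minima


# Hypothetical normalized pixel values: dark ink near 0.1, bright background near 0.9.
pixels = [0.10, 0.12, 0.15, 0.11, 0.13, 0.85, 0.88, 0.90, 0.87, 0.86]
grid = [i / 100 for i in range(101)]
density = gaussian_kde(pixels, grid, bandwidth=0.05)
maxima, minima = local_extrema(density)

# The minimum between the two peaks separates the ink layer from the background layer.
threshold = grid[minima[0]]
ink = [p for p in pixels if p < threshold]
```

With more than two peaks, each pair of adjacent minima would bound one layer, which is one plausible reading of the hierarchical segmentation based on frequency maxima and minima described in this document.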
One or more non-volatile readable storage media storing computer-readable instructions, wherein the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the following steps:
obtaining an original image, the original image including handwritten Chinese characters and a background picture;
pre-processing the original image to obtain an effective image;
processing the effective image by using a kernel density estimation algorithm to remove the background picture and obtain a target image including the handwritten Chinese characters;
performing single font cutting on the target image by using a vertical projection method to obtain a single font image to be recognized;
inputting the single font image to be recognized into a target handwriting recognition model based on a long-term and short-term memory neural network for recognition, to obtain the handwritten Chinese character corresponding to the single font image to be recognized.
Details of one or more embodiments of the present application are set forth in the accompanying drawings and the description below. Other features and advantages of the present application will become apparent from the description, the drawings, and the claims.
Brief Description of the Drawings
To explain the technical solutions of the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is an application scenario diagram of a handwritten Chinese character image recognition method according to an embodiment of the present application;
FIG. 2 is a flowchart of a handwritten Chinese character image recognition method according to an embodiment of the present application;
FIG. 3 is a detailed flowchart of step S20 in FIG. 2;
FIG. 4 is a detailed flowchart of step S30 in FIG. 2;
FIG. 5 is a detailed flowchart of step S34 in FIG. 4;
FIG. 6 is another flowchart of a handwritten Chinese character image recognition method according to an embodiment of the present application;
FIG. 7 is a detailed flowchart of step S63 in FIG. 6;
FIG. 8 is a schematic diagram of a handwritten Chinese character image recognition device according to an embodiment of the present application;
FIG. 9 is a schematic diagram of a computer device according to an embodiment of the present application.
具体实施方式Detailed ways
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in this application, without creative effort, fall within the protection scope of this application.
The handwritten Chinese character image recognition method provided in the embodiments of the present application can be applied in the application environment shown in FIG. 1. The application environment includes a server and a computer device, where the computer device communicates with the server over a network. The computer device is a device capable of human-computer interaction with a user, including but not limited to computers, smartphones, and tablets. The handwritten Chinese character image recognition method provided in the embodiments of the present application is applied to the server.
In an embodiment, as shown in FIG. 2, a handwritten Chinese character image recognition method is provided. The method is described using its application to the server in FIG. 1 as an example, and includes the following steps:
S10: Obtain an original image, where the original image includes handwritten Chinese characters and a background picture.
The original image is an unprocessed image containing handwritten Chinese characters, captured by an acquisition module on the computer device. The original image includes handwritten Chinese characters and a background picture. The background picture is the noise content of the original image other than the handwritten Chinese characters, that is, content that interferes with the handwritten Chinese characters. In this embodiment, the user captures an original image containing handwritten Chinese characters through the acquisition module on the computer device and uploads it to the server, so that the server obtains the original image. The acquisition module supports, without limitation, camera capture and local upload.
S20: Preprocess the original image to obtain an effective image.
The effective image is the image obtained by preprocessing the original image to exclude interference factors. Specifically, the original image may contain various interference factors, such as a wide range of colors, which hinder subsequent recognition. The original image is therefore preprocessed to obtain an effective image free of these interference factors. The effective image can be understood as the picture obtained after the background picture is excluded from the original image.
In an embodiment, as shown in FIG. 3, step S20, preprocessing the original image to obtain an effective image, specifically includes the following steps:
S21: Enlarge and grayscale the original image to obtain a grayscale image.
The grayscale image is the image obtained by enlarging and grayscaling the original image. It comprises a pixel value matrix, that is, a matrix containing the pixel value of every pixel in the original image. In this embodiment, the server uses the imread function to read the pixel value of each pixel in the original image, and then enlarges and grayscales the original image to obtain the grayscale image. The imread function reads the pixel values from an image file. A pixel value is the value the computer assigns to a pixel when the original image is digitized.
The original image may contain many colors, and color is highly sensitive to factors such as lighting. Objects of the same kind vary widely in color, so color itself rarely provides key information. The original image is therefore grayscaled to eliminate this interference and to reduce image complexity and the amount of information to process. However, when the handwritten Chinese characters in the original image are small, grayscaling directly leaves their strokes so thin that they are discarded as interference. To thicken the strokes, the original image is first enlarged and then grayscaled, which avoids this problem.
Specifically, the server enlarges the original image according to the formula $x \to x^{r}$, where $x$ is an element of the pixel value matrix $M$ and $r$ is the exponent. Each element $x$ in the pixel value matrix $M$ is replaced by the transformed element $x^{r}$.
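As an illustration only, the element-wise replacement of x by x to the power r can be sketched in Python as follows. The function name power_transform and the use of nested lists for the pixel value matrix are assumptions made for this sketch, not part of the described embodiment.

```python
def power_transform(matrix, r):
    """Return a new matrix in which every element x is replaced by x ** r."""
    return [[x ** r for x in row] for row in matrix]


# Hypothetical 2x2 pixel value matrix, raised to the power r = 2.
M = [[1, 2], [3, 4]]
transformed = power_transform(M, 2)
print(transformed)
```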
Grayscaling is processing that gives the original image a distinct black-and-white appearance. Specifically, the color of each pixel in the original image is determined by three components, R (red), G (green), and B (blue), and each component can take one of 256 values from 0 to 255 (0 is the darkest and represents black; 255 is the brightest and represents white). A grayscale image is a special color image in which the R, G, and B components are equal. In this embodiment, the server can read the original image directly with the imread function to obtain the specific values of the R, G, and B components of each pixel in the grayscale image.
S22: Standardize the grayscale image to obtain the effective image.
Standardization means applying a standard transformation to the grayscale image so that it is converted into a fixed standard form. Specifically, the pixel values in the grayscale image are widely scattered, so the data are not of a uniform order of magnitude, which would reduce the accuracy of subsequent model recognition. The grayscale image is therefore standardized to unify the order of magnitude of the data.
Specifically, the server standardizes the grayscale image with a standardization formula, which avoids the problem of scattered pixel values producing data of inconsistent magnitude. The standardization formula is $X' = \dfrac{X - M_{\min}}{M_{\max} - M_{\min}}$, where $X$ is a pixel value of the grayscale image $M$, $X'$ is the corresponding pixel value of the effective image, $M_{\min}$ is the smallest pixel value in the grayscale image $M$, and $M_{\max}$ is the largest pixel value in the grayscale image $M$.
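A minimal Python sketch of this standardization follows, assuming it is the min-max form implied by the definitions of the smallest and largest pixel values. The function name min_max_normalize and the choice to return all zeros for a constant image are assumptions made for illustration.

```python
def min_max_normalize(matrix):
    """Map each pixel value X to (X - M_min) / (M_max - M_min)."""
    flat = [x for row in matrix for x in row]
    m_min, m_max = min(flat), max(flat)
    if m_max == m_min:
        # Assumption: a constant image is mapped to all zeros to avoid dividing by zero.
        return [[0.0 for _ in row] for row in matrix]
    span = m_max - m_min
    return [[(x - m_min) / span for x in row] for row in matrix]


# Hypothetical grayscale pixel value matrix.
gray = [[0, 128], [255, 51]]
effective = min_max_normalize(gray)
```

After this step every pixel value lies in the interval [0, 1], regardless of the range of the input image.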
S30: Process the effective image with a kernel density estimation algorithm to remove the background picture and obtain a target image including the handwritten Chinese characters.
Kernel density estimation is a non-parametric method for estimating a probability density function; it studies the distribution of the data starting from the data samples themselves. The target image is the image containing only handwritten Chinese characters that is obtained by processing the effective image with the kernel density estimation algorithm. Specifically, the server processes the effective image with the kernel density estimation algorithm to eliminate interference from the background picture and obtain the target image including the handwritten Chinese characters.
Specifically, the kernel density estimation algorithm is calculated as $\hat{f}_h(x) = \dfrac{1}{nh}\sum_{i=1}^{n} K\!\left(\dfrac{x - x_i}{h}\right)$, where $K(\cdot)$ is the kernel function, $h$ is the pixel value range, $x$ is the pixel value of the pixel whose probability density is to be estimated, $x_i$ is the $i$-th pixel value within the range $h$, $n$ is the number of pixel values within the range $h$, and $\hat{f}_h(x)$ is the estimated probability density of the pixel.
In an embodiment, as shown in FIG. 4, step S30, processing the effective image with a kernel density estimation algorithm to obtain a target image including the handwritten Chinese characters, specifically includes the following steps:
S31: Count the pixel values in the effective image to obtain an effective image histogram.
The effective image histogram is the histogram obtained by counting the pixel values in the effective image. A histogram is a statistical chart that represents the distribution of data with a series of vertical bars or line segments of varying height. In this embodiment, the horizontal axis of the effective image histogram represents pixel value and the vertical axis represents the frequency with which each pixel value occurs. By counting the pixel values in the effective image, the server obtains a histogram that shows the distribution of pixel values directly and provides the basis for the subsequent Gaussian kernel density estimation.
S32: Process the effective image histogram with a Gaussian kernel density estimation algorithm to obtain at least one frequency maximum and at least one frequency minimum corresponding to the effective image histogram.
The Gaussian kernel density estimation algorithm is the kernel density estimation method whose kernel function is the Gaussian kernel function. The Gaussian kernel function is $K(x) = \dfrac{1}{\sqrt{2\pi}}\, e^{-\frac{x^{2}}{2}}$, where $K(x)$ is the Gaussian kernel function evaluated at the pixel value (independent variable) $x$, $x$ is a pixel value in the effective image, and $e$ and $\pi$ are constants. A frequency maximum is a local maximum over a frequency interval of the frequency distribution histogram. A frequency minimum is the local minimum that corresponds to a frequency maximum within the same frequency interval of the frequency distribution histogram.
Specifically, the Gaussian kernel density estimation method is applied to the frequency distribution histogram of the effective image to smooth it, yielding the Gaussian-smoothed curve for that histogram. From the frequency maxima and frequency minima on this curve, the corresponding pixel values on the horizontal axis are obtained, so that the effective image can subsequently be split into layers based on those pixel values to obtain the layered images.
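The smoothing and extremum search can be sketched in Python as follows. This is a minimal sketch that assumes the standard kernel density estimator with a standard Gaussian kernel. The function names, the bandwidth h = 5.0, and the sample pixel values are hypothetical choices for illustration only.

```python
import math


def gaussian_kernel(u):
    """Standard Gaussian kernel K(u) = exp(-u^2 / 2) / sqrt(2 * pi)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde(samples, x, h):
    """Kernel density estimate at x: (1 / (n * h)) * sum of K((x - x_i) / h)."""
    n = len(samples)
    return sum(gaussian_kernel((x - xi) / h) for xi in samples) / (n * h)


def local_extrema(values):
    """Return the indices of strict local maxima and strict local minima."""
    maxima, minima = [], []
    for i in range(1, len(values) - 1):
        if values[i - 1] < values[i] > values[i + 1]:
            maxima.append(i)
        elif values[i - 1] > values[i] < values[i + 1]:
            minima.append(i)
    return maxima, minima


# Hypothetical pixel values forming two clusters (for example, strokes and background).
pixels = [20] * 50 + [22] * 30 + [100] * 40 + [103] * 20
curve = [kde(pixels, x, h=5.0) for x in range(256)]
maxima, minima = local_extrema(curve)
```

Here each index of curve is a pixel value on the horizontal axis, so the returned indices are directly the pixel values at which the smoothed frequency peaks or dips.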
S33: Split the effective image into layers based on the frequency maxima and frequency minima to obtain layered images.
A layered image is an image obtained by splitting the effective image into layers based on the maxima and minima. The server first obtains the pixel values corresponding to the frequency maxima and frequency minima. The pixel values of the effective image are divided into as many classes as there are frequency maxima. The pixel values corresponding to the frequency minima then serve as the boundary values between classes, and the effective image is split into layers according to the classes and the boundaries between them to obtain the layered images.
For example, suppose the pixel values corresponding to the frequency maxima in the effective image are 18, 59, 95, 118, and 153, and the pixel values corresponding to the frequency minima are 27, 65, 105, and 133. The number of frequency maxima shows that the pixel values of the effective image fall into 5 classes, so the effective image is split into 5 layers, with the pixel values corresponding to the frequency minima serving as the boundary values between classes. Since the smallest pixel value is 0 and the largest is 255, the boundary values determine the layers as follows: the layered image for pixel value 18 covers pixel values [0, 27); the layered image for pixel value 59 covers [27, 65); the layered image for pixel value 95 covers [65, 105); the layered image for pixel value 118 covers [105, 133); and the layered image for pixel value 153 covers [133, 255].
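The layer assignment in this example can be sketched in Python as follows, using the boundary values 27, 65, 105, and 133 from the text. The function names and the representation of each layer as a 0/1 mask are assumptions made for illustration.

```python
from bisect import bisect_right


def layer_index(pixel, boundaries):
    """Return the layer of a pixel; each boundary value starts a new layer."""
    return bisect_right(boundaries, pixel)


def split_into_layers(matrix, boundaries):
    """Return one 0/1 mask per layer, marking the pixels that fall in that layer."""
    count = len(boundaries) + 1
    layers = [[[0] * len(row) for row in matrix] for _ in range(count)]
    for r, row in enumerate(matrix):
        for c, pixel in enumerate(row):
            layers[layer_index(pixel, boundaries)][r][c] = 1
    return layers


boundaries = [27, 65, 105, 133]          # pixel values at the frequency minima
image = [[18, 27, 64], [105, 133, 255]]  # hypothetical pixel values
layers = split_into_layers(image, boundaries)
```

With these boundaries, a pixel value of 26 falls in the first layer [0, 27) and a pixel value of 27 falls in the second layer [27, 65), matching the half-open intervals in the example.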
S34: Obtain the target image including the handwritten Chinese characters based on the layered images.
After obtaining the layered images, the server binarizes, erodes, and superimposes them to obtain the target image including the handwritten Chinese characters. Binarization sets the pixel value of each pixel in a layered image to 0 (black) or 1 (white), so that the whole layered image takes on a distinct black-and-white appearance. After a layered image is binarized, it is eroded to remove the background picture and retain the handwritten Chinese characters on the layered image. Because the pixel values of each layered image belong to a different range, the layered images must be superimposed after erosion to generate a target image containing only the handwritten Chinese characters. Superimposition combines the layered images, each retaining only its handwriting, into a single image, thereby yielding a target image containing only the handwritten Chinese characters. In this embodiment, the imadd function is used to superimpose the layered images. The imadd function is a function for superimposing images.
In an embodiment, as shown in FIG. 5, step S34, obtaining the target image including the handwritten Chinese characters based on the layered images, specifically includes the following steps:
S341: Binarize the layered image to obtain a binarized image.
A binarized image is the image obtained by binarizing a layered image. Specifically, after obtaining the layered image, the server compares each sampled pixel value of the layered image with a preselected threshold, setting pixel values greater than or equal to the threshold to 1 and pixel values below the threshold to 0. A sampled pixel value is the pixel value of a pixel in the layered image. The threshold affects the quality of the binarization: a well-chosen threshold gives a good result, while a poorly chosen threshold degrades it. To simplify operation and calculation, the threshold in this embodiment is determined empirically by the developer. Binarizing the layered image prepares it for the subsequent erosion.
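The thresholding rule can be sketched in Python as follows. The function name binarize and the sample threshold of 100 are hypothetical; the text leaves the threshold to the developer's experience.

```python
def binarize(matrix, threshold):
    """Set pixel values >= threshold to 1 and pixel values < threshold to 0."""
    return [[1 if x >= threshold else 0 for x in row] for row in matrix]


# Hypothetical layered image and threshold.
layer = [[30, 100, 101], [99, 200, 0]]
binary = binarize(layer, 100)
```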
S342: Detect and label the pixels in the binarized image to obtain the connected regions of the binarized image.
A connected region is the region enclosed by the pixels adjacent to a particular pixel. In a binarized image, a connected region is one in which a particular pixel and its adjacent pixels are 1 while the pixels surrounding them are all 0. Likewise, if a particular pixel is 0 and its surrounding adjacent pixels are 1, the region enclosed by those adjacent pixels is taken as a connected region.
Specifically, the binarized image corresponds to a pixel matrix with rows and columns. Detecting and labeling the pixels in the binarized image includes the following process: (1) Scan the pixel matrix row by row, group each sequence of consecutive white pixels in a row into a run, and record its start point, end point, and row number. (2) For each run in every row except the first: if it overlaps no run in the previous row, give it a new label; if it overlaps exactly one run in the previous row, assign it that run's label; if it overlaps two or more runs in the previous row, assign it the smallest label among those runs and record the labels of those runs as an equivalence pair, indicating that they belong to the same class. For example, if a run in the second row overlaps two runs in the previous row (labeled 1 and 2), it is assigned the smaller label, 1, and (1, 2) is recorded as an equivalence pair. An equivalence pair holds the labels of two runs that are connected to each other; for example, (1, 2) indicates that the run labeled 1 and the run labeled 2 are connected and therefore form one connected region. In this embodiment, the 8 pixels adjacent to a particular pixel in the pixel matrix are taken as the connected region of that element.
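The run-based labeling described above can be sketched in Python as follows. This is a minimal sketch, not the embodiment's implementation: it assumes white pixels have value 1, treats diagonally touching runs as overlapping (8-connectivity), and resolves equivalence pairs with a small union-find structure. All names are hypothetical.

```python
def label_runs(binary):
    """Label connected regions of 1-pixels using row runs and equivalence pairs."""
    parent = {}

    def find(a):
        # Follow parent links to the representative (smallest) label.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    def union(a, b):
        # Record that two labels are equivalent, keeping the smaller as root.
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    runs, prev, next_label = [], [], 1
    for r, row in enumerate(binary):
        current, c = [], 0
        while c < len(row):
            if row[c] == 1:
                start = c
                while c + 1 < len(row) and row[c + 1] == 1:
                    c += 1
                end = c
                # Runs in the previous row that overlap or touch diagonally.
                touching = [lab for (_, s, e, lab) in prev
                            if s <= end + 1 and e >= start - 1]
                if not touching:
                    label = next_label
                    parent[label] = label
                    next_label += 1
                else:
                    label = min(touching)
                    for lab in touching:
                        union(label, lab)
                current.append((r, start, end, label))
            c += 1
        runs.extend(current)
        prev = current

    labels = [[0] * len(row) for row in binary]
    for r, s, e, lab in runs:
        root = find(lab)
        for c in range(s, e + 1):
            labels[r][c] = root
    return labels


def count_regions(labels):
    """Count the distinct non-zero labels."""
    return len({v for row in labels for v in row if v})


# A U-shaped region: two runs in the first row merge through the last row.
u_shape = [[1, 0, 1], [1, 0, 1], [1, 1, 1]]
u_labels = label_runs(u_shape)
```

In the U-shaped example, the two runs in the first row initially receive labels 1 and 2; the run in the last row overlaps both, so the pair (1, 2) is merged and every pixel ends up with label 1.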
S343: Erode and superimpose the connected regions of the binarized image to obtain the target image including the handwritten Chinese characters.
Erosion is a morphological operation that removes part of the content of an image. The imerode function built into MATLAB is used to erode the connected regions of the binarized image. Specifically, eroding the connected regions of the binarized image includes the following steps. First, an n×n structuring element is selected. Because this embodiment takes the 8 elements adjacent to each element of the pixel matrix as that element's connected region, the selected structuring element is a 3×3 pixel matrix. A structuring element is an n×n pixel matrix whose elements are 0 or 1. The pixel matrix of the layered binarized image is scanned to find the pixels with value 1, and the 8 pixels adjacent to each such pixel are checked: if all of them are 1, nothing changes; if not all of them are 1, the 8 pixels adjacent to that pixel in the pixel matrix are set to 0 (black). The part set to 0 is the eroded part of the layered binarized image. MATLAB is application software for numerical computation in mathematical and technical fields.
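For illustration, standard binary erosion with a 3×3 structuring element of ones can be sketched in Python as follows. This sketch follows the conventional definition, in which a pixel stays 1 only if it and all 8 of its neighbors are 1, so it may differ in detail from the procedure described above. Treating pixels outside the image as 0 is an assumption, and the function name is hypothetical.

```python
def erode3x3(binary):
    """Binary erosion with a 3x3 structuring element of ones."""
    rows, cols = len(binary), len(binary[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] != 1:
                continue
            keep = True
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    inside = 0 <= rr < rows and 0 <= cc < cols
                    # Assumption: pixels outside the image count as 0.
                    if not inside or binary[rr][cc] != 1:
                        keep = False
            out[r][c] = 1 if keep else 0
    return out


# A 3x3 block of ones inside a 5x5 image; only its center survives erosion.
block = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
eroded = erode3x3(block)
```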
The binarized image is filtered against a preset erosion resistance range for handwriting regions. Parts of the binarized image outside this range are deleted, and the parts within the range are kept. Superimposing the pixel matrices of the retained parts yields a target image containing only the handwritten Chinese characters. The erosion resistance of a handwriting region is calculated as $p = \dfrac{s_1}{s_2}$, where $s_1$ is the total area of the binarized image after erosion, $s_2$ is the total area of the binarized image before erosion, and $p$ is the erosion resistance of the handwriting region.
For example, suppose the preset erosion resistance range for handwriting regions is [0.05, 0.8]. Using the formula $p = \dfrac{s_1}{s_2}$, the ratio $p$ of the total area of each binarized image after erosion to its total area before erosion is calculated. If the ratio $p$ for a region of the binarized image is outside the preset range, that region is background rather than handwriting and is removed. If the ratio $p$ for a region is within [0.05, 0.8], that region is handwritten Chinese characters and is retained. The pixel matrices of the retained binarized images are then superimposed to obtain the target image containing the handwritten Chinese characters.
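The filtering rule in this example can be sketched in Python as follows, assuming p is the ratio of the area after erosion to the area before erosion, as the text describes. The function names are hypothetical, area is taken to be the count of 1-pixels, and returning 0.0 for an empty region is an assumption.

```python
def area(binary):
    """Total area of a binarized region, counted as the number of 1-pixels."""
    return sum(sum(row) for row in binary)


def erosion_resistance(before, after):
    """Ratio p = s1 / s2 of the area after erosion to the area before erosion."""
    s2 = area(before)
    s1 = area(after)
    # Assumption: an empty region has no resistance.
    return s1 / s2 if s2 else 0.0


def is_handwriting(before, after, low=0.05, high=0.8):
    """Keep a region only if p lies within the preset range [low, high]."""
    return low <= erosion_resistance(before, after) <= high


# Hypothetical region: 9 pixels before erosion, 1 pixel after.
before = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
after = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
keep = is_handwriting(before, after)
```

In this hypothetical case p is 1/9, which lies inside [0.05, 0.8], so the region would be kept as handwriting.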
In steps S341 to S343, the layered image is binarized to obtain a binarized image, and the pixels in the binarized image are then detected and labeled to obtain the connected regions of the binarized image. Elements of the pixel matrix that do not fully match the structuring element are set to 0; the binarized image is black where the elements are 0, and this black part is the eroded part of the binarized image. The ratio p of the total area of the binarized image after erosion to its total area before erosion is calculated and checked against the preset erosion resistance range for handwriting regions, so that the background image in each layered image is removed and the handwritten Chinese characters are retained. Finally, the layered images are superimposed to obtain the target image.
S40: Cut the target image into single characters using a vertical projection method to obtain the single-character images to be recognized.
The vertical projection method projects each line of handwritten Chinese characters in the target image in the vertical direction to obtain a vertical projection histogram. The vertical projection histogram reflects the number of pixels of the target image in the vertical direction.
Specifically, cutting the target image into single characters with the vertical projection method includes the following steps: the server scans the at least one line of handwritten Chinese characters in the target image line by line and obtains the pixel values of each line; from the vertical projection histogram, it obtains the number of pixels corresponding to the different pixel values; and it cuts the target image repeatedly at the minima of the vertical projection histogram to obtain the single-character images to be recognized. It will be understood that the pixels belonging to each handwritten Chinese character are densely concentrated, while the pixels in the gaps between characters are sparse. This density is reflected in the vertical projection histogram: the pixel count is high where there are characters and low where there are none. The vertical projection method can therefore cut the target image into single characters effectively, is simple to implement, and provides the basis for the subsequent handwriting recognition.
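A minimal Python sketch of vertical projection cutting for one text line follows. It assumes stroke pixels have value 1 and that characters are separated by columns whose projection is zero, which is the simplest case of cutting at the minima of the histogram. The function names are hypothetical.

```python
def vertical_projection(binary):
    """Count the stroke pixels (value 1) in each column."""
    return [sum(column) for column in zip(*binary)]


def split_characters(binary):
    """Return (start, end) column spans of characters, with end exclusive."""
    projection = vertical_projection(binary)
    spans, start = [], None
    for c, count in enumerate(projection):
        if count > 0 and start is None:
            start = c                    # a character begins
        elif count == 0 and start is not None:
            spans.append((start, c))     # a gap ends the character
            start = None
    if start is not None:
        spans.append((start, len(projection)))
    return spans


# Hypothetical text line containing three characters separated by empty columns.
line = [
    [1, 1, 0, 0, 1, 0, 1, 1, 1],
    [1, 0, 0, 0, 1, 0, 0, 1, 0],
]
spans = split_characters(line)
```

Each returned span can then be used to slice the columns of the line image into one single-character image.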
S50: Input the single-character images to be recognized into a target handwriting recognition model based on a long short-term memory neural network for recognition, and obtain the handwritten Chinese character corresponding to each single-character image to be recognized.
The target handwriting recognition model is a model for recognizing handwriting that is trained in advance on a long short-term memory neural network. A long short-term memory (LSTM) network is a recurrent neural network over time, suited to processing and predicting important events in time series with relatively long intervals and delays. Specifically, the server inputs the single-character images to be recognized into the target handwriting recognition model, so that the model can use context during recognition, obtains the handwritten Chinese character corresponding to each single-character image, and improves recognition accuracy.
In this embodiment, the user captures an original image containing handwritten Chinese characters through the acquisition module on the computer device and uploads it to the server, so that the server obtains the original image. The server then preprocesses the original image to obtain an effective image free of interference factors. The effective image is processed with the kernel density estimation algorithm to remove the background picture and obtain a target image containing only the handwritten Chinese characters, further eliminating interference. The target image is cut into single characters with the vertical projection method, which is easy to implement, to obtain the single-character images to be recognized. These images are input into the target handwriting recognition model based on the long short-term memory neural network, which gives them a temporal ordering and allows the model to use context during recognition, so that the handwritten Chinese character corresponding to each single-character image is obtained and recognition accuracy is improved.
在一实施例中,该手写汉字图像识别方法还包括:预先训练目标手写字识别模型。具体地,如图6所示,预先训练目标手写字识别模型包括如下步骤:In one embodiment, the handwritten Chinese character image recognition method further includes: training a target handwriting recognition model in advance. Specifically, as shown in FIG. 6, pre-training the target handwriting recognition model includes the following steps:
S61:获取训练手写汉字图像。S61: Acquire a training handwritten Chinese character image.
其中,训练手写汉字图像是预先从开源库中采集的用于进行模型训练的样本图像。该训练手写汉字图像包括中文二级字库中每一中文对应的N(N为正整数)张手写字样本。中文二级字库是按汉字的部首笔划顺序编码的非常用汉字库。具体地,采集开源库中的不同人手写的N张手写字样本图像作为训练手写汉字图像,以使服务器获取训练手写汉字图像,由于不同用户的书写习惯不同,因此采用N张手写字样本(即训练手写汉字图像)进行训练,极大的提高了模型的泛化性。The training handwritten Chinese character image is a sample image collected from an open source library for model training in advance. The training handwritten Chinese character image includes N (N is a positive integer) handwriting samples corresponding to each Chinese character in the secondary Chinese character library. The Chinese secondary character library is a very useful Chinese character library that is coded in the order of radical strokes of Chinese characters. Specifically, N handwritten sample images of different people's handwriting in the open source library are collected as training handwritten Chinese character images, so that the server obtains the trained handwritten Chinese character images. Because different users have different writing habits, N handwritten samples are used (i.e. Training handwritten Chinese character images) for training, which greatly improves the generalization of the model.
S62: Use the vertical projection method to perform single-font cutting on the training handwritten Chinese character images to obtain training single-font images.
The process of cutting the training handwritten Chinese character images with the vertical projection method is the same as in step S40 and, to avoid repetition, is not described again here. A training single-font image is a single-font image that is input into the model for training.
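As a rough illustration of single-font cutting by vertical projection, the following minimal Python sketch counts the ink pixels in every column of a binarized image and cuts wherever the count drops to zero. The function names, the 1 = ink / 0 = background convention, and the zero-count cutting rule are assumptions made for this sketch; the exact procedure of step S40 is described elsewhere in the document.

```python
def vertical_projection_cut(image):
    """Return (start, end) column ranges, one per character.

    `image` is a list of rows; 1 marks an ink pixel, 0 marks background
    (an assumed convention for this sketch).
    """
    if not image:
        return []
    width = len(image[0])
    # Vertical projection: number of ink pixels in each column.
    projection = [sum(row[x] for row in image) for x in range(width)]
    segments = []
    start = None
    for x, count in enumerate(projection):
        if count > 0 and start is None:
            start = x                    # first inked column of a character
        elif count == 0 and start is not None:
            segments.append((start, x))  # blank column closes the character
            start = None
    if start is not None:
        segments.append((start, width))  # character touching the right edge
    return segments


def crop_columns(image, segment):
    """Extract the columns of one segment as a single-font image."""
    start, end = segment
    return [row[start:end] for row in image]


image = [
    [0, 1, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 1, 1],
    [0, 1, 1, 0, 0, 1, 0],
]
segments = vertical_projection_cut(image)
single_font_images = [crop_columns(image, s) for s in segments]
```

Characters whose strokes touch or overlap horizontally would not be separated by this simple rule and would need additional handling.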
S63: Label the training single-font images in sequence, input the labeled training single-font images into a long short-term memory (LSTM) neural network for training, and update the network parameters of the LSTM neural network using a stochastic gradient descent algorithm to obtain the target handwriting recognition model.
The stochastic gradient descent algorithm updates the network parameters using one randomly selected sample (a training single-font image) at each update, rather than all samples, which speeds up training. The network parameters are the weights and biases between the layers of the LSTM neural network. Because the LSTM neural network has a temporal memory function, it is used to process training single-font images that carry a time-sequence state.
The LSTM neural network has a network structure consisting of an input layer, at least one hidden layer, and an output layer. The input layer is the first layer of the network and receives external signals, that is, the training single-font images. The output layer is the last layer and outputs signals to the outside, that is, the calculation results of the network. The hidden layers are all layers other than the input layer and the output layer; they process the training single-font images to obtain the calculation results of the network. Training the model with an LSTM neural network lets the training single-font images be treated as an ordered sequence, so that they are trained in context, which improves the accuracy of the target handwriting recognition model. In this embodiment, the output layer of the LSTM neural network uses Softmax (a regression model) to classify the output weight matrix. Softmax is a classification function commonly used in neural networks. It maps the outputs of multiple neurons into the interval [0, 1], so that they can be interpreted as probabilities. It is simple to compute and is used here for multi-class output, which makes the output more accurate.
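The Softmax mapping described above can be sketched in a few lines of Python. The scores below are made-up values for three hypothetical character classes, and subtracting the maximum score before exponentiating is a common numerical-stability step that the source does not mention.

```python
import math


def softmax(scores):
    """Map raw output-layer scores to probabilities that sum to 1."""
    peak = max(scores)                       # shift for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores for three candidate characters.
probs = softmax([2.0, 1.0, 0.1])
# Index of the most probable class.
predicted = max(range(len(probs)), key=probs.__getitem__)
```

Each entry of `probs` lies between 0 and 1, and the class with the largest raw score also receives the largest probability.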
In this embodiment, training handwritten Chinese character images are first acquired and cut into training single-font images using the vertical projection method. The training single-font images are then labeled in sequence so that they carry temporal order, and the labeled images are input into the LSTM neural network for training. Using this temporal order, the LSTM neural network trains on the single-font images in context, which improves the accuracy of the target handwriting recognition model.
In one embodiment, as shown in FIG. 7, step S63 (labeling the training single-font images in sequence, inputting the labeled training single-font images into the LSTM neural network for training, and updating the network parameters of the LSTM neural network using the stochastic gradient descent algorithm to obtain the target handwriting recognition model) specifically includes the following steps:
S631: In the hidden layer of the LSTM neural network, process the training single-font images using a first activation function to obtain neurons carrying an activation state identifier.
Each neuron in the hidden layer of the LSTM neural network includes three gates: an input gate, a forget gate, and an output gate. The forget gate determines which past information is discarded in the neuron. The input gate determines which information is added to the neuron. The output gate determines which information is output from the neuron. The first activation function is the function used to activate the neuron state. The neuron state determines the information that the gates (the input gate, forget gate, and output gate) discard, add, and output. The activation state identifier is either a pass identifier or a fail identifier. In this embodiment, the input gate, forget gate, and output gate are denoted i, f, and o, respectively.
In this embodiment, the Sigmoid (S-shaped growth curve) function is selected as the first activation function. The Sigmoid function is an S-shaped function common in biology. In information science, because it is monotonically increasing and has a monotonically increasing inverse, it is often used as a threshold function in neural networks, mapping a variable to a value between 0 and 1. Its calculation formula is $\sigma(z) = \frac{1}{1 + e^{-z}}$ (Figure PCTCN2018094222-appb-000007), where $z$ represents the output value of the forget gate.
Specifically, the forget gate includes a forget threshold. The activation state of each neuron (training single-font image) is calculated to obtain the neurons whose activation state identifier is the pass identifier. The forget gate formula $f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$ is used to calculate which information the forget gate receives (that is, only neurons whose activation state identifier is the pass identifier are received), where $f_t$ represents the forget threshold (the activation state), $W_f$ the weight matrix of the forget gate, $b_f$ the bias term of the forget gate, $h_{t-1}$ the output of the neuron at the previous moment, $x_t$ the input data at the current moment (the training single-font image), $t$ the current moment, and $t-1$ the previous moment. Applying the forget gate formula to a training single-font image yields a scalar in the interval from 0 to 1. This scalar determines the proportion of past information the neuron accepts, based on a combined judgment of the current state and the past state, which reduces the dimensionality of the data, reduces the amount of computation, and improves training efficiency.
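The forget gate formula can be sketched in plain Python as follows. The weights, bias values, and vector sizes are made-up numbers for illustration only, and the helper names are hypothetical; the concatenation of the two lists stands for the bracket $[h_{t-1}, x_t]$.

```python
import math


def sigmoid(z):
    """Sigmoid activation: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def forget_gate(W_f, b_f, h_prev, x_t):
    """Compute f_t = sigmoid(W_f . [h_prev, x_t] + b_f), one value per neuron."""
    concat = h_prev + x_t                    # list concatenation: [h_{t-1}, x_t]
    return [
        sigmoid(sum(w * v for w, v in zip(row, concat)) + bias)
        for row, bias in zip(W_f, b_f)
    ]


# Illustrative values: two hidden neurons, one input feature.
h_prev = [0.5, -0.5]
x_t = [1.0]
W_f = [[0.0, 0.0, 0.0],
       [1.0, 1.0, 2.0]]
b_f = [0.0, 0.0]
f_t = forget_gate(W_f, b_f, h_prev, x_t)
```

A value of `f_t` close to 1 keeps most of the corresponding past state, while a value close to 0 discards it.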
S632: In the hidden layer of the LSTM neural network, process the neurons carrying the activation state identifier using a second activation function to obtain the output value of the hidden layer of the LSTM neural network.
The output value of the hidden layer of the LSTM neural network includes the output value of the input gate, the output value of the output gate, and the neuron state. Specifically, in the input gate of the hidden layer, the second activation function is applied to the neurons whose activation state identifier is the pass identifier to obtain the output value of the hidden layer. In this embodiment, because a linear model lacks expressive power, the tanh (hyperbolic tangent) function is used as the activation function of the input gate (the second activation function). It introduces nonlinearity so that the trained target handwriting recognition model can solve more complex problems. The tanh activation function also converges quickly, which saves training time and improves training efficiency.
Specifically, the output value of the input gate is calculated with the input gate formula. The input gate also includes an input threshold. The input gate formula is $i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$, where $W_i$ is the weight matrix of the input gate, $i_t$ represents the input threshold, and $b_i$ represents the bias term of the input gate. Applying the input gate formula to a training single-font image yields a scalar in the interval from 0 to 1 (the input threshold). This scalar controls the proportion of current information the neuron accepts, based on a combined judgment of the current state and the past state, that is, the proportion of newly input information that is accepted, which reduces the amount of computation and improves training efficiency.
The current neuron state is calculated using the neuron state formulas
Figure PCTCN2018094222-appb-000008
and
Figure PCTCN2018094222-appb-000009
where $W_c$ represents the weight matrix of the neuron state, $b_c$ represents the bias term of the neuron state, the symbol shown in Figure PCTCN2018094222-appb-000010 represents the neuron state at the previous moment, and $C_t$ represents the neuron state at the current moment. Taking the element-wise product of the neuron state with the forget threshold (and with the input threshold) lets the model output only the required information, which improves the efficiency of model learning.
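For reference, the standard LSTM formulation of the neuron (cell) state update is given below. It is assumed here, not confirmed by the extracted text, that the two formula figures correspond to these equations; $\tilde{C}_t$ (the candidate state) and $\odot$ (element-wise product) are notation introduced for this sketch.

```latex
\tilde{C}_t = \tanh\left(W_c \cdot [h_{t-1}, x_t] + b_c\right)
\qquad
C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t
```

Under this reading, $f_t$ scales how much of the previous state $C_{t-1}$ is kept and $i_t$ scales how much of the candidate state is added.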
Finally, the output gate formula $o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$ is used to calculate which information is output by the output gate, and the formula $h_t = o_t * \tanh(C_t)$ is then used to calculate the output value of the neuron at the current moment, where $o_t$ represents the output threshold, $W_o$ the weight matrix of the output gate, $b_o$ the bias term of the output gate, and $h_t$ the output value of the current neuron.
S633: According to the output value of the hidden layer of the LSTM neural network, update the network parameters of the LSTM neural network using the stochastic gradient descent algorithm to obtain the target handwriting recognition model.
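The gate formulas can be combined into one forward step of a single LSTM layer, sketched below in plain Python. The forget, input, and output gate lines follow the formulas quoted in the text; the candidate-state and cell-state lines use the standard LSTM update, which is an assumption because those formulas survive only as figure references. The parameter dictionary, its key names, and the helper functions are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, b, v):
    """Compute W . v + b for a weight matrix given as a list of rows."""
    return [sum(w * a for w, a in zip(row, v)) + bias for row, bias in zip(W, b)]


def lstm_step(params, h_prev, c_prev, x_t):
    """One time step; returns the new output h_t and new state C_t."""
    v = h_prev + x_t                                   # [h_{t-1}, x_t]
    f = [sigmoid(z) for z in affine(params["W_f"], params["b_f"], v)]
    i = [sigmoid(z) for z in affine(params["W_i"], params["b_i"], v)]
    o = [sigmoid(z) for z in affine(params["W_o"], params["b_o"], v)]
    # Assumed standard candidate state and cell-state update.
    c_tilde = [math.tanh(z) for z in affine(params["W_c"], params["b_c"], v)]
    c = [ft * cp + it * ct for ft, cp, it, ct in zip(f, c_prev, i, c_tilde)]
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]   # h_t = o_t * tanh(C_t)
    return h, c


# Illustrative parameters: one hidden neuron, one input feature, all zeros.
zeros = {"W": [[0.0, 0.0]], "b": [0.0]}
params = {f"{kind}_{gate}": zeros[kind] for gate in "fioc" for kind in ("W", "b")}
h_t, c_t = lstm_step(params, h_prev=[0.0], c_prev=[1.0], x_t=[1.0])
```

With all-zero parameters every gate evaluates to 0.5 and the candidate state to 0, so the new state is half of the previous state; trained parameters would instead make the gates depend on the input sequence.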
The calculation formulas of the stochastic gradient descent algorithm are
Figure PCTCN2018094222-appb-000011
and
Figure PCTCN2018094222-appb-000012
where $J(\theta)$ is the loss function, $m$ is the number of selected training single-font images and $m = 1$, $\theta_j$ represents the network parameters of the j-th layer of the LSTM neural network, $h_\theta(x)$ represents the output value of the hidden layer of the LSTM neural network, and $(x_i, y_i)$ represents the i-th training single-font image.
First, the loss function is constructed according to the loss function formula
Figure PCTCN2018094222-appb-000013
where $J(\theta)$ is the loss function, $m$ is the number of selected training single-font images and $m = 1$, $\theta_j$ represents the network parameters of the j-th layer of the LSTM neural network (such as $W_i$ or $b_i$), $h_\theta(x)$ represents the output value of the hidden layer of the LSTM neural network, and $(x_i, y_i)$ represents the i-th training single-font image, with $x_i$ the training single-font image and $y_i$ the true result. Because this application uses the stochastic gradient descent algorithm, in which each update of the network parameters uses one randomly selected sample (a training single-font image), $m = 1$ in the loss function formula. The partial derivatives of the loss function are then computed with the formula
Figure PCTCN2018094222-appb-000014
to update the network parameters, that is, the weights and biases between the layers. Applying the updated weights and biases of each layer to the LSTM neural network yields the target handwriting recognition model.
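A single-sample (m = 1) stochastic gradient descent update can be sketched as follows. Because the loss and update formulas survive only as figure references, this sketch assumes a squared-error loss $J(\theta) = \frac{1}{2}(h_\theta(x) - y)^2$ and a fixed learning rate, and it uses a small linear model $h_\theta(x) = \theta \cdot x$ as a stand-in for the LSTM network. All names and numbers are illustrative.

```python
import random


def sgd_step(theta, sample, learning_rate):
    """Update theta from ONE sample: theta_j -= lr * dJ/dtheta_j."""
    x, y = sample
    prediction = sum(t * xj for t, xj in zip(theta, x))   # h_theta(x)
    error = prediction - y
    # For the assumed squared-error loss, dJ/dtheta_j = error * x_j.
    return [t - learning_rate * error * xj for t, xj in zip(theta, x)]


def train(samples, steps=2000, learning_rate=0.05, seed=0):
    """Repeatedly pick one random sample and apply a single-sample update."""
    rng = random.Random(seed)
    theta = [0.0] * len(samples[0][0])
    for _ in range(steps):
        theta = sgd_step(theta, rng.choice(samples), learning_rate)
    return theta


# Toy data generated from y = 1 + 2x; the first feature is a constant 1 (bias).
samples = [([1.0, x], 1.0 + 2.0 * x) for x in (0.0, 1.0, 2.0, 3.0)]
first_update = sgd_step([0.0, 0.0], ([1.0, 2.0], 5.0), learning_rate=0.1)
theta = train(samples)
```

Each update touches only one sample, so it is cheap compared with a full pass over the training set, at the cost of noisier individual steps.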
Further, the weights in the target handwriting recognition model implement its ability to decide which old information to discard, which new information to add, and which information to output. The output layer of the target handwriting recognition model finally outputs a probability value, which is the probability that a single-font image is recognized as the corresponding Chinese character. The model can be widely applied to handwriting recognition to accurately recognize handwritten character images.
It should be understood that the sequence numbers of the steps in the above embodiments do not imply an order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of this application.
In one embodiment, FIG. 8 shows a schematic diagram of a handwritten Chinese character image recognition device that corresponds one-to-one to the handwritten Chinese character image recognition method in the above embodiments. As shown in FIG. 8, the handwritten Chinese character image recognition device includes an original image acquisition module 10, a valid image acquisition module 20, a target image acquisition module 30, a to-be-recognized single-font image acquisition module 40, and a handwritten Chinese character acquisition module 50. The functional modules are described in detail as follows:
The original image acquisition module 10 is configured to acquire an original image, where the original image includes handwritten Chinese characters and a background picture.
The valid image acquisition module 20 is configured to preprocess the original image to obtain a valid image.
The target image acquisition module 30 is configured to process the valid image using a kernel density estimation algorithm, remove the background picture, and obtain a target image including the handwritten Chinese characters.
The to-be-recognized single-font image acquisition module 40 is configured to perform single-font cutting on the target image using the vertical projection method to obtain the single-font images to be recognized.
The handwritten Chinese character acquisition module 50 is configured to input the single-font images to be recognized into a target handwriting recognition model based on a long short-term memory (LSTM) neural network for recognition, and obtain the handwritten Chinese characters corresponding to the single-font images to be recognized.
Specifically, the valid image acquisition module 20 includes a grayscale image acquisition unit 21 and a valid image acquisition unit 22.
The grayscale image acquisition unit 21 is configured to enlarge the original image and convert it to grayscale to obtain a grayscale image.
The valid image acquisition unit 22 is configured to normalize the grayscale image to obtain a valid image, where the normalization formula is $X' = \frac{X - M_{min}}{M_{max} - M_{min}}$ (Figure PCTCN2018094222-appb-000015), $X$ is a pixel value of the grayscale image $M$, $X'$ is the corresponding pixel value of the valid image, $M_{min}$ is the smallest pixel value in the grayscale image $M$, and $M_{max}$ is the largest pixel value in the grayscale image $M$.
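The normalization step can be sketched as min-max scaling over all pixels of the grayscale image. This assumes the formula in the figure is the usual min-max form implied by the definitions of $M_{min}$ and $M_{max}$; the handling of a constant image (all pixels equal) is an extra choice made for this sketch, not something the source specifies.

```python
def min_max_normalize(gray):
    """Scale every pixel of a grayscale image (list of rows) into [0, 1]."""
    flat = [p for row in gray for p in row]
    m_min, m_max = min(flat), max(flat)
    if m_max == m_min:
        # Constant image: avoid division by zero (sketch-specific choice).
        return [[0.0 for _ in row] for row in gray]
    span = m_max - m_min
    return [[(p - m_min) / span for p in row] for row in gray]


gray = [[50, 100],
        [150, 250]]
valid = min_max_normalize(gray)
```

After scaling, the darkest pixel maps to 0 and the brightest to 1, which makes images captured under different lighting comparable.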
Specifically, the target image acquisition module 30 includes a valid image histogram acquisition unit 31, a frequency extreme value acquisition unit 32, a layered image acquisition unit 33, and a target image acquisition unit 34.
The valid image histogram acquisition unit 31 is configured to count the pixel values in the valid image to obtain a valid image histogram.
The frequency extreme value acquisition unit 32 is configured to process the valid image histogram using a Gaussian kernel density estimation algorithm to obtain at least one frequency maximum and at least one frequency minimum corresponding to the valid image histogram.
The layered image acquisition unit 33 is configured to perform layered segmentation on the valid image based on the frequency maxima and frequency minima to obtain layered images.
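The work of units 32 and 33 can be illustrated with a minimal pure-Python sketch: a Gaussian kernel density estimate is evaluated on a grid of pixel values, its local maxima and minima are located, and each pixel is assigned to a layer. The bandwidth, the grid, the sample pixel values, and the rule of cutting layers at the density minima are all assumptions made for this sketch; the source does not give these details in this section.

```python
import math


def gaussian_kde(values, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate at each grid point."""
    norm = 1.0 / (len(values) * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((g - v) / bandwidth) ** 2) for v in values)
        for g in grid
    ]


def local_extrema(density):
    """Return grid indices of strict local maxima and minima."""
    maxima, minima = [], []
    for k in range(1, len(density) - 1):
        if density[k - 1] < density[k] > density[k + 1]:
            maxima.append(k)
        elif density[k - 1] > density[k] < density[k + 1]:
            minima.append(k)
    return maxima, minima


def layer_index(value, cut_points):
    """Layer number of a pixel: how many cut points lie at or below it."""
    return sum(value >= c for c in cut_points)


# Made-up normalized pixel values: a dark cluster and a bright cluster.
pixels = [0.10] * 30 + [0.12] * 20 + [0.80] * 30 + [0.85] * 20
grid = [k / 100 for k in range(101)]
density = gaussian_kde(pixels, grid, bandwidth=0.05)
maxima, minima = local_extrema(density)
cut_points = [grid[k] for k in minima]          # assumed layering rule
layers = [layer_index(p, cut_points) for p in pixels]
```

In this toy example the two density peaks correspond to the two pixel clusters, and the single minimum between them separates the pixels into two layers.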
The target image acquisition unit 34 is configured to obtain, based on the layered images, a target image including the handwritten Chinese characters.
Specifically, the target image acquisition unit 34 includes a binarized image acquisition subunit 341, a connected region acquisition subunit 342, and a target image acquisition subunit 343.
The binarized image acquisition subunit 341 is configured to binarize the layered images to obtain binarized images.
The connected region acquisition subunit 342 is configured to detect and label the pixels in each binarized image to obtain the connected regions corresponding to the binarized image.
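Detecting and labeling pixels to obtain connected regions can be sketched with a breadth-first flood fill. The 4-neighbour connectivity, the 1 = foreground convention, and the function name are assumptions made for this sketch; the source does not state which connectivity or labeling algorithm is used.

```python
from collections import deque


def label_connected_regions(binary):
    """Label 4-connected foreground regions; returns (labels, region_count)."""
    height = len(binary)
    width = len(binary[0]) if height else 0
    labels = [[0] * width for _ in range(height)]   # 0 means "no region"
    count = 0
    for y in range(height):
        for x in range(width):
            if binary[y][x] != 1 or labels[y][x] != 0:
                continue
            count += 1                               # start a new region
            labels[y][x] = count
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and binary[ny][nx] == 1 and labels[ny][nx] == 0):
                        labels[ny][nx] = count       # same region as (cy, cx)
                        queue.append((ny, nx))
    return labels, count


binary = [
    [1, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 1],
]
labels, region_count = label_connected_regions(binary)
```

Each non-zero label identifies one connected region, which later steps can process region by region.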
The target image acquisition subunit 343 is configured to perform erosion and superposition on the connected regions corresponding to the binarized images to obtain a target image including the handwritten Chinese characters.
Specifically, the handwritten Chinese character image recognition device further includes a handwriting recognition model training module 60, configured to train the target handwriting recognition model in advance.
The handwriting recognition model training module 60 includes a training handwritten Chinese character image acquisition unit 61, a training single-font image acquisition unit 62, and a target handwriting recognition model acquisition unit 63.
The training handwritten Chinese character image acquisition unit 61 is configured to acquire training handwritten Chinese character images.
The training single-font image acquisition unit 62 is configured to perform single-font cutting on the training handwritten Chinese character images using the vertical projection method to obtain training single-font images.
The target handwriting recognition model acquisition unit 63 is configured to label the training single-font images in sequence, input the labeled training single-font images into the LSTM neural network for training, and update the network parameters of the LSTM neural network using the stochastic gradient descent algorithm to obtain the target handwriting recognition model.
Specifically, the target handwriting recognition model acquisition unit 63 includes an activation state neuron acquisition subunit 631, a network output value acquisition subunit 632, and a target recognition model acquisition subunit 633.
The activation state neuron acquisition subunit 631 is configured to process the single-font images using the first activation function in the hidden layer of the LSTM neural network to obtain neurons carrying an activation state identifier.
The network output value acquisition subunit 632 is configured to process the neurons carrying the activation state identifier using the second activation function in the hidden layer of the LSTM neural network to obtain the output value of the hidden layer of the LSTM neural network.
The target recognition model acquisition subunit 633 is configured to update the network parameters of the LSTM neural network using the stochastic gradient descent algorithm, according to the output value of the hidden layer of the LSTM neural network, to obtain the target handwriting recognition model. The calculation formulas of the stochastic gradient descent algorithm are
Figure PCTCN2018094222-appb-000016
and
Figure PCTCN2018094222-appb-000017
where $J(\theta)$ is the loss function, $m$ is the number of selected training single-font images and $m = 1$, $\theta_j$ represents the network parameters of the j-th layer of the LSTM neural network, $h_\theta(x)$ represents the output value of the hidden layer of the LSTM neural network, and $(x_i, y_i)$ represents the i-th training single-font image.
For specific limitations on the handwritten Chinese character image recognition device, refer to the limitations on the handwritten Chinese character image recognition method above; they are not repeated here. Each module in the handwritten Chinese character image recognition device may be implemented in whole or in part by software, hardware, or a combination of the two. The modules may be embedded in, or independent of, the processor of a computer device in hardware form, or may be stored in the memory of the computer device in software form, so that the processor can call them and perform the operations corresponding to each module.
In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in FIG. 9. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device stores data generated or obtained while the handwritten Chinese character image recognition method is executed, such as handwritten Chinese characters. The network interface of the computer device is used to communicate with an external terminal through a network connection. When the computer-readable instructions are executed by one or more processors, the one or more processors implement a handwritten Chinese character image recognition method.
In one embodiment, a computer device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, the following steps are implemented: acquiring an original image, where the original image includes handwritten Chinese characters and a background picture; preprocessing the original image to obtain a valid image; processing the valid image using a kernel density estimation algorithm, removing the background picture, and obtaining a target image including the handwritten Chinese characters; performing single-font cutting on the target image using the vertical projection method to obtain single-font images to be recognized; and inputting the single-font images to be recognized into a target handwriting recognition model based on a long short-term memory (LSTM) neural network for recognition to obtain the handwritten Chinese characters corresponding to the single-font images to be recognized.
In one embodiment, when the processor executes the computer program, the following steps are further implemented: enlarging the original image and converting it to grayscale to obtain a grayscale image; and normalizing the grayscale image to obtain a valid image, where the normalization formula is $X' = \frac{X - M_{min}}{M_{max} - M_{min}}$ (Figure PCTCN2018094222-appb-000018), $X$ is a pixel value of the grayscale image $M$, $X'$ is the corresponding pixel value of the valid image, $M_{min}$ is the smallest pixel value in the grayscale image $M$, and $M_{max}$ is the largest pixel value in the grayscale image $M$.
In one embodiment, when the processor executes the computer program, the following steps are further implemented: counting the pixel values in the valid image to obtain a valid image histogram; processing the valid image histogram using a Gaussian kernel density estimation algorithm to obtain at least one frequency maximum and at least one frequency minimum corresponding to the valid image histogram; performing layered segmentation on the valid image based on the frequency maxima and frequency minima to obtain layered images; and obtaining, based on the layered images, a target image including the handwritten Chinese characters.
In one embodiment, when the processor executes the computer program, the following steps are further implemented: binarizing the layered images to obtain binarized images; detecting and labeling the pixels in each binarized image to obtain the connected regions corresponding to the binarized image; and performing erosion and superposition on the connected regions corresponding to the binarized images to obtain a target image including the handwritten Chinese characters.
In one embodiment, when executing the computer program, the processor further implements the following steps: obtaining training handwritten Chinese character images; cutting the training handwritten Chinese character images into single characters using a vertical projection method to obtain training single-character images; and labeling the training single-character images in sequence, inputting the labeled training single-character images into a long short-term memory (LSTM) neural network for training, and updating the network parameters of the LSTM neural network using a stochastic gradient descent algorithm to obtain the target handwriting recognition model. The calculation formulas of the stochastic gradient descent algorithm are specifically
Figure PCTCN2018094222-appb-000019
and
Figure PCTCN2018094222-appb-000020
where J(θ) is the loss function, m is the number of selected training single-character images with m = 1, θ_j denotes the network parameters of the j-th layer of the LSTM neural network, h_θ(x) denotes the output value of the hidden layer of the LSTM neural network, and (x_i, y_i) denotes the i-th training single-character image.
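The idea of updating parameters from a single sample (m = 1) can be illustrated with the sketch below. The formulas themselves appear only as figures in this text, so everything here is an assumption: the loss is taken to be a squared error, the hypothesis h is a linear stand-in rather than an LSTM hidden layer, and the learning rate alpha is a hypothetical parameter.

```python
def h(theta, x):
    """Stand-in hypothesis: a linear model, NOT an LSTM hidden layer."""
    return sum(t * xi for t, xi in zip(theta, x))

def loss(theta, x, y):
    """Assumed squared-error loss J(theta) for a single sample (m = 1)."""
    return 0.5 * (h(theta, x) - y) ** 2

def sgd_step(theta, x, y, alpha=0.1):
    """One update theta_j <- theta_j - alpha * dJ/dtheta_j from one sample."""
    error = h(theta, x) - y
    return [t - alpha * error * xi for t, xi in zip(theta, x)]
```

Because each step uses one sample instead of the whole training set, a step is cheap to compute, at the cost of a noisier descent direction.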
In one embodiment, when executing the computer program, the processor further implements the following steps: processing the single-character image with a first activation function in the hidden layer of the long short-term memory (LSTM) neural network to obtain neurons carrying an activation state identifier; processing the neurons carrying the activation state identifier with a second activation function in the hidden layer of the LSTM neural network to obtain the output value of the hidden layer of the LSTM neural network; and, according to the output value of the hidden layer of the LSTM neural network, updating the network parameters of the LSTM neural network using a stochastic gradient descent algorithm to obtain the target handwriting recognition model.
In one embodiment, one or more non-volatile readable storage media storing computer-readable instructions are provided, wherein the computer-readable instructions, when executed by one or more processors, cause the one or more processors to implement the following steps: obtaining an original image that includes handwritten Chinese characters and a background picture; preprocessing the original image to obtain an effective image; processing the effective image with a kernel density estimation algorithm to remove the background picture and obtain a target image that includes the handwritten Chinese characters; cutting the target image into single characters using a vertical projection method to obtain single-character images to be recognized; and inputting the single-character images to be recognized into a target handwriting recognition model based on a long short-term memory (LSTM) neural network for recognition, to obtain the handwritten Chinese characters corresponding to the single-character images to be recognized.
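The single-character cutting step can be sketched as below, assuming the target image has already been binarized with foreground pixels equal to 1. The function names and the rule of cutting at columns that contain no foreground pixels are illustrative assumptions, not details taken from the application.

```python
def vertical_projection(binary):
    """Count the foreground pixels in each column."""
    return [sum(col) for col in zip(*binary)]

def split_characters(binary):
    """Return (start, end) column ranges, end exclusive, one per character."""
    profile = vertical_projection(binary)
    spans, start = [], None
    for x, v in enumerate(profile):
        if v > 0 and start is None:
            start = x                     # a character begins
        elif v == 0 and start is not None:
            spans.append((start, x))      # an empty column ends it
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans
```

Each returned range can then be used to crop one single-character image from the target image before it is passed to the recognition model.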
In one embodiment, the computer-readable instructions, when executed by one or more processors, further cause the one or more processors to implement the following steps: enlarging and graying the original image to obtain a grayscale image; and normalizing the grayscale image to obtain the effective image, where the normalization formula is
Figure PCTCN2018094222-appb-000021
where X is a pixel value of the grayscale image M, X′ is the corresponding pixel value of the effective image, M_min is the smallest pixel value in the grayscale image M, and M_max is the largest pixel value in the grayscale image M.
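The normalization step can be illustrated as follows. The formula itself is only available as a figure here, so this sketch assumes the standard min-max form X′ = (X − M_min) / (M_max − M_min), which is consistent with the symbol definitions above but is not confirmed by the text; the function name and the handling of a flat image are hypothetical.

```python
def min_max_normalize(gray):
    """Rescale a grayscale image (a list of rows) to the range [0, 1]."""
    flat = [p for row in gray for p in row]
    m_min, m_max = min(flat), max(flat)
    span = m_max - m_min
    if span == 0:
        # Flat image: every pixel is identical, so map all of them to 0.0.
        return [[0.0 for _ in row] for row in gray]
    return [[(p - m_min) / span for p in row] for row in gray]
```

For example, a 2x2 image with values 50, 100, 150, and 250 maps to 0.0, 0.25, 0.5, and 1.0.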
In one embodiment, the computer-readable instructions, when executed by one or more processors, further cause the one or more processors to implement the following steps: counting the pixel values in the effective image to obtain an effective image histogram; processing the effective image histogram with a Gaussian kernel density estimation algorithm to obtain at least one frequency maximum and at least one frequency minimum corresponding to the effective image histogram; performing layered segmentation on the effective image based on the frequency maxima and frequency minima to obtain layered images; and obtaining, based on the layered images, a target image that includes the handwritten Chinese characters.
In one embodiment, the computer-readable instructions, when executed by one or more processors, further cause the one or more processors to implement the following steps: binarizing the layered image to obtain a binarized image; detecting and labeling the pixels in the binarized image to obtain the connected regions of the binarized image; and performing erosion and superposition on the connected regions of the binarized image to obtain a target image that includes the handwritten Chinese characters.
In one embodiment, the computer-readable instructions, when executed by one or more processors, further cause the one or more processors to implement the following steps: obtaining training handwritten Chinese character images; cutting the training handwritten Chinese character images into single characters using a vertical projection method to obtain training single-character images; and labeling the training single-character images in sequence, inputting the labeled training single-character images into a long short-term memory (LSTM) neural network for training, and updating the network parameters of the LSTM neural network using a stochastic gradient descent algorithm to obtain the target handwriting recognition model. The calculation formulas of the stochastic gradient descent algorithm are specifically
Figure PCTCN2018094222-appb-000022
and
Figure PCTCN2018094222-appb-000023
where J(θ) is the loss function, m is the number of selected training single-character images with m = 1, θ_j denotes the network parameters of the j-th layer of the LSTM neural network, h_θ(x) denotes the output value of the hidden layer of the LSTM neural network, and (x_i, y_i) denotes the i-th training single-character image.
In one embodiment, the computer-readable instructions, when executed by one or more processors, further cause the one or more processors to implement the following steps: processing the single-character image with a first activation function in the hidden layer of the long short-term memory (LSTM) neural network to obtain neurons carrying an activation state identifier; processing the neurons carrying the activation state identifier with a second activation function in the hidden layer of the LSTM neural network to obtain the output value of the hidden layer of the LSTM neural network; and, according to the output value of the hidden layer of the LSTM neural network, updating the network parameters of the LSTM neural network using a stochastic gradient descent algorithm to obtain the target handwriting recognition model.
Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware. The computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the method embodiments described above. Any reference to memory, storage, a database, or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
Those skilled in the art will clearly understand that, for convenience and brevity of description, the division into the functional units and modules described above is given only as an example. In practical applications, the above functions may be assigned to different functional units or modules as required; that is, the internal structure of the apparatus may be divided into different functional units or modules to perform all or part of the functions described above.
The embodiments described above are intended only to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and shall all fall within the scope of protection of the present application.

Claims (20)

  1. A method for recognizing handwritten Chinese character images, comprising:
    obtaining an original image, the original image including handwritten Chinese characters and a background picture;
    preprocessing the original image to obtain an effective image;
    processing the effective image with a kernel density estimation algorithm to remove the background picture and obtain a target image including the handwritten Chinese characters;
    cutting the target image into single characters using a vertical projection method to obtain single-character images to be recognized; and
    inputting the single-character images to be recognized into a target handwriting recognition model based on a long short-term memory (LSTM) neural network for recognition, to obtain the handwritten Chinese characters corresponding to the single-character images to be recognized.
  2. The method for recognizing handwritten Chinese character images according to claim 1, wherein preprocessing the original image to obtain an effective image comprises:
    enlarging and graying the original image to obtain a grayscale image; and
    normalizing the grayscale image to obtain the effective image, wherein the normalization formula is
    Figure PCTCN2018094222-appb-100001
    where X is a pixel value of the grayscale image M, X′ is the corresponding pixel value of the effective image, M_min is the smallest pixel value in the grayscale image M, and M_max is the largest pixel value in the grayscale image M.
  3. The method for recognizing handwritten Chinese character images according to claim 1, wherein processing the effective image with a kernel density estimation algorithm to remove the background picture and obtain a target image including the handwritten Chinese characters comprises:
    counting the pixel values in the effective image to obtain an effective image histogram;
    processing the effective image histogram with a Gaussian kernel density estimation method to obtain at least one frequency maximum and at least one frequency minimum corresponding to the effective image histogram;
    performing layered segmentation on the effective image based on the frequency maxima and frequency minima to obtain layered images; and
    obtaining, based on the layered images, the target image including the handwritten Chinese characters.
  4. The method for recognizing handwritten Chinese character images according to claim 3, wherein obtaining, based on the layered images, the target image including the handwritten Chinese characters comprises:
    binarizing the layered image to obtain a binarized image;
    detecting and labeling the pixels in the binarized image to obtain the connected regions of the binarized image; and
    performing erosion and superposition on the connected regions of the binarized image to obtain the target image including the handwritten Chinese characters.
  5. The method for recognizing handwritten Chinese character images according to claim 1, wherein the method further comprises: pre-training the target handwriting recognition model;
    wherein pre-training the target handwriting recognition model comprises:
    obtaining training handwritten Chinese character images;
    cutting the training handwritten Chinese character images into single characters using a vertical projection method to obtain training single-character images; and
    labeling the training single-character images in sequence, inputting the labeled training single-character images into a long short-term memory (LSTM) neural network for training, and updating the network parameters of the LSTM neural network using a stochastic gradient descent algorithm to obtain the target handwriting recognition model.
  6. The method for recognizing handwritten Chinese character images according to claim 5, wherein inputting the labeled training single-character images into the LSTM neural network for training and updating the network parameters of the LSTM neural network using a stochastic gradient descent algorithm to obtain the target handwriting recognition model comprises:
    processing the single-character image with a first activation function in the hidden layer of the LSTM neural network to obtain neurons carrying an activation state identifier;
    processing the neurons carrying the activation state identifier with a second activation function in the hidden layer of the LSTM neural network to obtain the output value of the hidden layer of the LSTM neural network; and
    updating, according to the output value of the hidden layer of the LSTM neural network, the network parameters of the LSTM neural network using a stochastic gradient descent algorithm to obtain the target handwriting recognition model, wherein the calculation formulas of the stochastic gradient descent algorithm are specifically
    Figure PCTCN2018094222-appb-100002
    and
    Figure PCTCN2018094222-appb-100003
    where J(θ) is the loss function, m is the number of selected training single-character images with m = 1, θ_j denotes the network parameters of the j-th layer of the LSTM neural network, h_θ(x) denotes the output value of the hidden layer of the LSTM neural network, and (x_i, y_i) denotes the i-th training single-character image.
  7. An apparatus for recognizing handwritten Chinese character images, comprising:
    an original image acquisition module, configured to obtain an original image, the original image including handwritten Chinese characters and a background picture;
    an effective image acquisition module, configured to preprocess the original image to obtain an effective image;
    a target image acquisition module, configured to process the effective image with a kernel density estimation algorithm to remove the background picture and obtain a target image including the handwritten Chinese characters;
    a to-be-recognized single-character image acquisition module, configured to cut the target image into single characters using a vertical projection method to obtain single-character images to be recognized; and
    a handwritten Chinese character acquisition module, configured to input the single-character images to be recognized into a target handwriting recognition model based on a long short-term memory (LSTM) neural network for recognition, to obtain the handwritten Chinese characters corresponding to the single-character images to be recognized.
  8. The apparatus for recognizing handwritten Chinese character images according to claim 7, wherein the target image acquisition module comprises:
    an effective image histogram acquisition unit, configured to count the pixel values in the effective image to obtain an effective image histogram;
    a frequency extremum acquisition unit, configured to process the effective image histogram with a Gaussian kernel density estimation algorithm to obtain at least one frequency maximum and at least one frequency minimum corresponding to the effective image histogram;
    a layered image acquisition unit, configured to perform layered segmentation on the effective image based on the frequency maxima and frequency minima to obtain layered images; and
    a target image acquisition unit, configured to obtain, based on the layered images, the target image including the handwritten Chinese characters.
  9. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the following steps:
    obtaining an original image, the original image including handwritten Chinese characters and a background picture;
    preprocessing the original image to obtain an effective image;
    processing the effective image with a kernel density estimation algorithm to remove the background picture and obtain a target image including the handwritten Chinese characters;
    cutting the target image into single characters using a vertical projection method to obtain single-character images to be recognized; and
    inputting the single-character images to be recognized into a target handwriting recognition model based on a long short-term memory (LSTM) neural network for recognition, to obtain the handwritten Chinese characters corresponding to the single-character images to be recognized.
  10. The computer device according to claim 9, wherein preprocessing the original image to obtain an effective image comprises:
    enlarging and graying the original image to obtain a grayscale image; and
    normalizing the grayscale image to obtain the effective image, wherein the normalization formula is
    Figure PCTCN2018094222-appb-100004
    where X is a pixel value of the grayscale image M, X′ is the corresponding pixel value of the effective image, M_min is the smallest pixel value in the grayscale image M, and M_max is the largest pixel value in the grayscale image M.
  11. The computer device according to claim 9, wherein processing the effective image with a kernel density estimation algorithm to remove the background picture and obtain a target image including the handwritten Chinese characters comprises:
    counting the pixel values in the effective image to obtain an effective image histogram;
    processing the effective image histogram with a Gaussian kernel density estimation method to obtain at least one frequency maximum and at least one frequency minimum corresponding to the effective image histogram;
    performing layered segmentation on the effective image based on the frequency maxima and frequency minima to obtain layered images; and
    obtaining, based on the layered images, the target image including the handwritten Chinese characters.
  12. The computer device according to claim 11, wherein obtaining, based on the layered images, the target image including the handwritten Chinese characters comprises:
    binarizing the layered image to obtain a binarized image;
    detecting and labeling the pixels in the binarized image to obtain the connected regions of the binarized image; and
    performing erosion and superposition on the connected regions of the binarized image to obtain the target image including the handwritten Chinese characters.
  13. The computer device according to claim 9, wherein the steps further comprise: pre-training the target handwriting recognition model;
    wherein pre-training the target handwriting recognition model comprises:
    obtaining training handwritten Chinese character images;
    cutting the training handwritten Chinese character images into single characters using a vertical projection method to obtain training single-character images; and
    labeling the training single-character images in sequence, inputting the labeled training single-character images into a long short-term memory (LSTM) neural network for training, and updating the network parameters of the LSTM neural network using a stochastic gradient descent algorithm to obtain the target handwriting recognition model.
  14. The computer device according to claim 13, wherein inputting the labeled training single-character images into the LSTM neural network for training and updating the network parameters of the LSTM neural network using a stochastic gradient descent algorithm to obtain the target handwriting recognition model comprises:
    processing the single-character image with a first activation function in the hidden layer of the LSTM neural network to obtain neurons carrying an activation state identifier;
    processing the neurons carrying the activation state identifier with a second activation function in the hidden layer of the LSTM neural network to obtain the output value of the hidden layer of the LSTM neural network; and
    updating, according to the output value of the hidden layer of the LSTM neural network, the network parameters of the LSTM neural network using a stochastic gradient descent algorithm to obtain the target handwriting recognition model, wherein the calculation formulas of the stochastic gradient descent algorithm are specifically
    Figure PCTCN2018094222-appb-100005
    and
    Figure PCTCN2018094222-appb-100006
    where J(θ) is the loss function, m is the number of selected training single-character images with m = 1, θ_j denotes the network parameters of the j-th layer of the LSTM neural network, h_θ(x) denotes the output value of the hidden layer of the LSTM neural network, and (x_i, y_i) denotes the i-th training single-character image.
  15. One or more non-volatile readable storage media storing computer-readable instructions, wherein the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the following steps:
    obtaining an original image, the original image including handwritten Chinese characters and a background picture;
    preprocessing the original image to obtain an effective image;
    processing the effective image with a kernel density estimation algorithm to remove the background picture and obtain a target image including the handwritten Chinese characters;
    cutting the target image into single characters using a vertical projection method to obtain single-character images to be recognized; and
    inputting the single-character images to be recognized into a target handwriting recognition model based on a long short-term memory (LSTM) neural network for recognition, to obtain the handwritten Chinese characters corresponding to the single-character images to be recognized.
  16. The non-volatile readable storage medium according to claim 15, wherein preprocessing the original image to obtain an effective image comprises:
    enlarging and graying the original image to obtain a grayscale image; and
    normalizing the grayscale image to obtain the effective image, wherein the normalization formula is
    Figure PCTCN2018094222-appb-100007
    where X is a pixel value of the grayscale image M, X′ is the corresponding pixel value of the effective image, M_min is the smallest pixel value in the grayscale image M, and M_max is the largest pixel value in the grayscale image M.
  17. The non-volatile readable storage medium according to claim 15, wherein processing the effective image with a kernel density estimation algorithm to remove the background picture and obtain a target image including the handwritten Chinese characters comprises:
    counting the pixel values in the effective image to obtain an effective image histogram;
    processing the effective image histogram with a Gaussian kernel density estimation method to obtain at least one frequency maximum and at least one frequency minimum corresponding to the effective image histogram;
    performing layered segmentation on the effective image based on the frequency maxima and frequency minima to obtain layered images; and
    obtaining, based on the layered images, the target image including the handwritten Chinese characters.
  18. The non-volatile readable storage medium according to claim 17, wherein obtaining, based on the layered images, the target image including the handwritten Chinese characters comprises:
    binarizing the layered image to obtain a binarized image;
    detecting and labeling the pixels in the binarized image to obtain the connected regions of the binarized image; and
    performing erosion and superposition on the connected regions of the binarized image to obtain the target image including the handwritten Chinese characters.
  19. The non-volatile readable storage medium according to claim 15, wherein the handwriting sample acquisition method further comprises: pre-training the target handwriting recognition model;
    pre-training the target handwriting recognition model comprises:
    obtaining training handwritten Chinese character images;
    cutting the training handwritten Chinese character images into single characters by a vertical projection method to obtain training single-character images;
    labeling the training single-character images in sequence, inputting the labeled training single-character images into a long short-term memory (LSTM) neural network for training, and updating the network parameters of the LSTM neural network with a stochastic gradient descent algorithm to obtain the target handwriting recognition model.
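The vertical projection cut in claim 19 can be sketched as below, assuming a binary image stored as a list of rows in which 1 marks ink. The function names are hypothetical, and splitting at every empty column is a simplification: a character whose left and right components are separated by a blank column would be cut in two.

```python
def vertical_projection(binary):
    """Count ink pixels in every column of a binary image (1 = ink)."""
    return [sum(col) for col in zip(*binary)]

def split_columns(projection):
    """Return (start, end) column ranges, end exclusive, of runs that contain ink."""
    spans, start = [], None
    for x, v in enumerate(projection):
        if v and start is None:
            start = x                      # a run of ink columns begins
        elif not v and start is not None:
            spans.append((start, x))       # the run ended at an empty column
            start = None
    if start is not None:
        spans.append((start, len(projection)))
    return spans

def cut_characters(binary):
    """Slice the image into one sub-image per run of ink columns."""
    spans = split_columns(vertical_projection(binary))
    return [[row[a:b] for row in binary] for a, b in spans]
```

The resulting sub-images come out in left-to-right order, which matches the sequential labeling described in the claim.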
  20. The non-volatile readable storage medium according to claim 19, wherein inputting the labeled training single-character images into the long short-term memory (LSTM) neural network for training, and updating the network parameters of the LSTM neural network with a stochastic gradient descent algorithm to obtain the target handwriting recognition model, comprises:
    processing the single-character images with a first activation function in the hidden layer of the LSTM neural network to obtain neurons carrying an activation state identifier;
    processing the neurons carrying the activation state identifier with a second activation function in the hidden layer of the LSTM neural network to obtain the output value of the hidden layer of the LSTM neural network;
    updating the network parameters of the LSTM neural network with the stochastic gradient descent algorithm according to the output value of the hidden layer of the LSTM neural network, to obtain the target handwriting recognition model; the calculation formulas of the stochastic gradient descent algorithm are specifically
    Figure PCTCN2018094222-appb-100008
    and
    Figure PCTCN2018094222-appb-100009
    where J(θ) is the loss function, m is the number of selected training single-character images and m = 1, θ_j is the network parameter of the j-th layer of the LSTM neural network, h_θ(x) is the output value of the hidden layer of the LSTM neural network, and (x_i, y_i) denotes the i-th training single-character image.
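The two formulas of claim 20 exist only as figures and are not reproduced in this text. As a generic illustration of stochastic gradient descent with m = 1, the sketch below uses a squared-error loss and a linear hypothesis in place of the LSTM hidden layer. The loss form, the learning rate alpha, and all function names are assumptions and should not be read as the patent's formulas.

```python
def predict(theta, x):
    """Stand-in hypothesis h_theta(x): a plain dot product, not an LSTM."""
    return sum(t * xi for t, xi in zip(theta, x))

def loss(theta, x, y):
    """Assumed squared-error loss for one sample: J = 1/2 * (h_theta(x) - y)^2."""
    return 0.5 * (predict(theta, x) - y) ** 2

def sgd_step(theta, x, y, alpha=0.1):
    """One update from a single sample (m = 1): theta_j -= alpha * dJ/dtheta_j."""
    err = predict(theta, x) - y
    return [t - alpha * err * xi for t, xi in zip(theta, x)]
```

Updating after every single sample keeps each step cheap, at the cost of noisier progress than a batch update.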
PCT/CN2018/094222 2018-06-04 2018-07-03 Method and apparatus for recognizing handwritten chinese character image, computer device, and storage medium WO2019232850A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810564691.6 2018-06-04
CN201810564691.6A CN109002756A (en) 2018-06-04 2018-06-04 Handwritten Chinese character image recognition methods, device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2019232850A1 (en) 2019-12-12

Family

ID=64574205

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/094222 WO2019232850A1 (en) 2018-06-04 2018-07-03 Method and apparatus for recognizing handwritten chinese character image, computer device, and storage medium

Country Status (2)

Country Link
CN (1) CN109002756A (en)
WO (1) WO2019232850A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111160369A (en) * 2019-12-25 2020-05-15 携程旅游信息技术(上海)有限公司 Method, system, electronic device and storage medium for cracking Chinese character verification code
CN112131834A (en) * 2020-09-24 2020-12-25 云南民族大学 West wave font generation and identification method
CN112634262A (en) * 2020-12-31 2021-04-09 浙江优学智能科技有限公司 Writing quality evaluation method based on Internet
CN113343814A (en) * 2021-05-31 2021-09-03 太原理工大学 Handwritten digital image recognition method based on single-node photon reserve pool calculation

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110751024A (en) * 2019-09-06 2020-02-04 平安科技(深圳)有限公司 User identity identification method and device based on handwritten signature and terminal equipment
CN111626284B (en) * 2020-05-26 2023-10-03 广东小天才科技有限公司 Method and device for removing handwriting fonts, electronic equipment and storage medium
CN113128470B (en) * 2021-05-13 2023-04-07 北京有竹居网络技术有限公司 Stroke recognition method and device, readable medium and electronic equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103207997A (en) * 2013-04-15 2013-07-17 浙江捷尚视觉科技有限公司 Kernel density estimation-based license plate character segmentation method
CN105184292A (en) * 2015-08-26 2015-12-23 北京云江科技有限公司 Method for analyzing and recognizing structure of handwritten mathematical formula in natural scene image
CN105512692A (en) * 2015-11-30 2016-04-20 华南理工大学 BLSTM-based online handwritten mathematical expression symbol recognition method
CN106022273A (en) * 2016-05-24 2016-10-12 华东理工大学 Handwritten form identification system of BP neural network based on dynamic sample selection strategy

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111160369A (en) * 2019-12-25 2020-05-15 携程旅游信息技术(上海)有限公司 Method, system, electronic device and storage medium for cracking Chinese character verification code
CN111160369B (en) * 2019-12-25 2024-03-05 携程旅游信息技术(上海)有限公司 Method, system, electronic equipment and storage medium for cracking Chinese character verification code
CN112131834A (en) * 2020-09-24 2020-12-25 云南民族大学 West wave font generation and identification method
CN112131834B (en) * 2020-09-24 2023-12-29 云南民族大学 West wave font generating and identifying method
CN112634262A (en) * 2020-12-31 2021-04-09 浙江优学智能科技有限公司 Writing quality evaluation method based on Internet
CN113343814A (en) * 2021-05-31 2021-09-03 太原理工大学 Handwritten digital image recognition method based on single-node photon reserve pool calculation

Also Published As

Publication number Publication date
CN109002756A (en) 2018-12-14

Similar Documents

Publication Publication Date Title
WO2019232853A1 (en) Chinese model training method, chinese image recognition method, device, apparatus and medium
WO2019232843A1 (en) Handwritten model training method and apparatus, handwritten image recognition method and apparatus, and device and medium
CN108710866B (en) Chinese character model training method, chinese character recognition method, device, equipment and medium
WO2019232850A1 (en) Method and apparatus for recognizing handwritten chinese character image, computer device, and storage medium
WO2019232852A1 (en) Handwriting training sample obtaining method and apparatus, and device and medium
WO2019232872A1 (en) Handwritten character model training method, chinese character recognition method, apparatus, device, and medium
WO2019232873A1 (en) Character model training method, character recognition method, apparatuses, device and medium
JP6831480B2 (en) Text detection analysis methods, equipment and devices
WO2021027336A1 (en) Authentication method and apparatus based on seal and signature, and computer device
WO2021017260A1 (en) Multi-language text recognition method and apparatus, computer device, and storage medium
WO2019232849A1 (en) Chinese character model training method, handwritten character recognition method, apparatuses, device and medium
WO2017020723A1 (en) Character segmentation method and device and electronic device
Cheang et al. Segmentation-free vehicle license plate recognition using ConvNet-RNN
CN110647829A (en) Bill text recognition method and system
KR101896357B1 (en) Method, device and program for detecting an object
CN109086654B (en) Handwriting model training method, text recognition method, device, equipment and medium
CN104915972A (en) Image processing apparatus, image processing method and program
JP2022532177A (en) Forged face recognition methods, devices, and non-temporary computer-readable storage media
WO2019232870A1 (en) Method for acquiring handwritten character training sample, apparatus, computer device, and storage medium
CN113011357A (en) Depth fake face video positioning method based on space-time fusion
CN112926654A (en) Pre-labeling model training and certificate pre-labeling method, device, equipment and medium
CN111723815A (en) Model training method, image processing method, device, computer system, and medium
JP2019153293A (en) Text image processing using stroke-aware max-min pooling for ocr system employing artificial neural network
CN112766218A (en) Cross-domain pedestrian re-identification method and device based on asymmetric joint teaching network
Haliassos et al. Classification and detection of symbols in ancient papyri

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18921404

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 12/03/2021)

122 Ep: pct application non-entry in european phase

Ref document number: 18921404

Country of ref document: EP

Kind code of ref document: A1