WO2019232853A1 - Chinese model training method, Chinese image recognition method, device, apparatus and medium


Info

Publication number
WO2019232853A1
Authority
WO
WIPO (PCT)
Application number
PCT/CN2018/094235
Other languages
French (fr)
Chinese (zh)
Inventor
高梁梁
周罡
Original Assignee
平安科技(深圳)有限公司
Application filed by 平安科技(深圳)有限公司


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/24 Character recognition characterised by the processing or recognition method
    • G06V30/242 Division of the character sequences into groups prior to recognition; Selection of dictionaries
    • G06V30/244 Division of the character sequences into groups prior to recognition; Selection of dictionaries using graphical properties, e.g. alphabet type or font
    • G06V30/2455 Discrimination between machine-print, hand-print and cursive writing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs

Definitions

  • The present application relates to the field of image recognition, and in particular to a Chinese model training method, a Chinese image recognition method, a device, an apparatus, and a medium.
  • a Chinese model training method includes:
  • the original handwriting recognition model is tested using the trained handwritten Chinese images in the test set, and the target handwriting recognition model is obtained when the test accuracy is greater than a preset accuracy rate.
  • a Chinese model training device includes:
  • Training handwritten Chinese image acquisition module for acquiring training handwritten Chinese images
  • a training handwritten Chinese image division module configured to divide the trained handwritten Chinese image into a training set and a test set according to a preset ratio
  • an original handwriting recognition model acquisition module, configured to sequentially label the training handwritten Chinese images in the training set, input the labeled training handwritten Chinese images into a convolutional neural network-long short-term memory neural network for training, and update the network parameters of the convolutional neural network-long short-term memory neural network using a time-series classification algorithm to obtain an original handwriting recognition model;
  • a target handwriting recognition model acquisition module is used to test the original handwriting recognition model using the trained handwritten Chinese images in the test set, and obtain a target handwriting recognition model when the test accuracy rate is greater than a preset accuracy rate.
  • a computer device includes a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, the following steps are implemented:
  • the original handwriting recognition model is tested using the trained handwritten Chinese images in the test set, and the target handwriting recognition model is obtained when the test accuracy is greater than a preset accuracy rate.
  • a non-volatile storage medium stores a computer program.
  • the computer program is executed by a processor, the following steps are implemented:
  • the original handwriting recognition model is tested using the trained handwritten Chinese images in the test set, and the target handwriting recognition model is obtained when the test accuracy is greater than a preset accuracy rate.
  • a Chinese image recognition method includes:
  • the text areas to be recognized are input into a target handwriting recognition model for recognition, and the handwritten Chinese characters corresponding to each text area to be recognized are obtained; wherein the target handwriting recognition model is obtained by using the Chinese model training method.
  • a Chinese image recognition device includes:
  • a to-be-recognized Chinese image acquisition module configured to obtain the to-be-recognized Chinese image, wherein the to-be-recognized Chinese image includes handwritten Chinese characters and a background picture;
  • An original image acquisition module configured to pre-process the Chinese image to be identified to obtain an original image
  • a target image acquisition module configured to process the original image by using a kernel density estimation algorithm, remove a background picture, and obtain a target image including the handwritten Chinese character;
  • a to-be-recognized text area acquisition module configured to use the text positioning technology to perform text positioning on the target image to obtain the to-be-recognized text area;
  • a handwritten Chinese character acquisition module, configured to input the text areas to be recognized into a target handwriting recognition model for recognition and obtain the handwritten Chinese characters corresponding to each text area to be recognized; wherein the target handwriting recognition model is obtained by using the Chinese model training method.
  • a computer device includes a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, the following steps are implemented:
  • the text area to be recognized is input into a target handwriting recognition model for recognition, and handwritten Chinese characters corresponding to each of the text area to be recognized are obtained; wherein the target handwriting recognition model is obtained by using the Chinese model training method.
  • One or more non-volatile readable storage media storing computer-readable instructions, which when executed by one or more processors, cause the one or more processors to perform the following steps:
  • the text area to be recognized is input to a target handwriting recognition model for recognition, and handwritten Chinese characters corresponding to each of the text area to be recognized are obtained; wherein the target handwriting recognition model is obtained by using the Chinese model training method.
  • FIG. 1 is an application scenario diagram of a Chinese model training method or a Chinese image recognition method in an embodiment of the present application
  • FIG. 2 is a flowchart of a Chinese model training method according to an embodiment of the present application.
  • FIG. 3 is a specific flowchart of step S13 in FIG. 2;
  • FIG. 4 is a schematic diagram of a Chinese model training device according to an embodiment of the present application.
  • FIG. 5 is a flowchart of a Chinese image recognition method according to an embodiment of the present application.
  • FIG. 6 is a specific flowchart of step S22 in FIG. 5;
  • FIG. 7 is a specific flowchart of step S23 in FIG. 5;
  • FIG. 8 is a specific flowchart of step S234 in FIG. 7;
  • FIG. 9 is a schematic diagram of a Chinese image recognition device according to an embodiment of the present application.
  • FIG. 10 is a schematic diagram of a computer device according to an embodiment of the present application.
  • the Chinese model training method provided in the embodiment of the present application can be applied in the application environment shown in FIG. 1.
  • the application environment of the Chinese model training method includes a server and a computer device, wherein the computer device communicates with the server through a network, and the computer device is a device that can interact with the user, including, but not limited to, a computer, a smart phone, and a tablet device.
  • the Chinese model training method provided in the embodiment of the present application is applied to a server.
  • a Chinese model training method is provided.
  • the method is applied to the server in FIG. 1 as an example, and includes the following steps:
  • the training handwritten Chinese image is a sample image collected from an open source library for model training in advance.
  • the training handwritten Chinese image includes N (N is a positive integer) handwriting samples corresponding to each Chinese in the Chinese secondary word library.
  • The Chinese secondary character library is a Chinese character library coded in the order of the radical strokes of the characters. Specifically, N handwriting samples written by different people are collected from the open source library so that the server obtains the training handwritten Chinese images. Because different users have different writing habits, training with these N handwriting samples (that is, the training handwritten Chinese images) greatly improves the generalization of the model.
  • The training set is a set of learning samples used to fit the classifier by adjusting its parameters, that is, the machine learning model is trained with the training samples in the training set to determine the parameters of the machine learning model.
  • a test set is used to test the discrimination capabilities of a trained machine learning model, such as accuracy.
  • the preset ratio is a preset ratio for dividing the training handwritten Chinese image.
  • the training handwritten Chinese image can be divided according to a ratio of 9:1, that is, 90% of the training handwritten Chinese image can be used as the training set, and the remaining 10% of the training handwritten Chinese image can be used as the test set.
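  • As an illustrative sketch (not part of the patent), the preset-ratio split could be implemented as follows in Python; the sample-list format, function name, and fixed seed are assumptions.
```python
import random

def split_train_test(samples, train_ratio=0.9, seed=42):
    """Divide the training handwritten Chinese image samples into a training
    set and a test set according to a preset ratio (here 9:1)."""
    samples = list(samples)
    random.Random(seed).shuffle(samples)      # shuffle for an unbiased split
    cut = int(len(samples) * train_ratio)     # e.g. 90% of the samples
    return samples[:cut], samples[cut:]       # (training set, test set)

# Hypothetical usage:
# train_set, test_set = split_train_test(all_training_images)
```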
  • S13: Sequentially label the training handwritten Chinese images in the training set, input the labeled training handwritten Chinese images into the convolutional neural network-long short-term memory neural network for training, and update the network parameters of the convolutional neural network-long short-term memory neural network using a time-series classification algorithm to obtain the original handwriting recognition model.
  • The original handwriting recognition model is a model obtained through multiple training iterations of the convolutional neural network-long short-term memory neural network.
  • A long short-term memory (LSTM) network is a kind of recurrent neural network that is suitable for processing and predicting events in a time series whose intervals and delays are relatively long.
  • A convolutional neural network (CNN) is a locally connected network. Compared with a fully connected network, its biggest features are local connectivity and weight sharing. For a given pixel p in an image, the closer another pixel is to p, the greater its influence on p; this is local connectivity.
  • The weights learned for one area of the image can also be used for another area; this is weight sharing.
  • Weight sharing can be understood as convolution kernel sharing.
  • Here CNN denotes the convolutional neural network, and CTC denotes Connectionist Temporal Classification, the time-series classification algorithm referred to in this application.
  • the server performs labeling according to the chronological order of the training handwritten Chinese images, and inputs the labeled training handwritten Chinese images into a convolutional neural network-long-term short-term memory neural network for training to obtain the original handwriting recognition model.
  • each training handwritten Chinese image is arranged in order.
  • the training handwritten Chinese image is "I am very happy today"
  • each training handwritten Chinese image can be labeled with Arabic numerals from left to right, that is, "Today (1) days (2) very (3) open (4) heart (5)”, so that the training handwritten Chinese image has timeliness, so that the original handwriting recognition model can be trained in connection with the context and improve the accuracy of the model .
  • (1), (2), (3), (4), and (5) are sequential tags.
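  • A minimal Python sketch of this sequential labeling, assuming the character images of one sample are already available as a list (the function name and data layout are illustrative only):
```python
def label_in_sequence(char_images):
    """Attach left-to-right order labels (1, 2, 3, ...) to the character images
    of one training sample, e.g. 今(1) 天(2) 很(3) 开(4) 心(5)."""
    return [(order, image) for order, image in enumerate(char_images, start=1)]

# label_in_sequence(["img_jin", "img_tian", "img_hen", "img_kai", "img_xin"])
# -> [(1, 'img_jin'), (2, 'img_tian'), (3, 'img_hen'), (4, 'img_kai'), (5, 'img_xin')]
```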
  • Long-term short-term memory neural network has three layers of network structure: input layer, hidden layer and output layer.
  • the input layer is the first layer of the long-term and short-term memory neural network, which is used to receive external signals, that is, it is responsible for receiving training handwritten Chinese images.
  • the output layer is the last layer of the long-term and short-term memory neural network, which is used to output signals to the outside world, that is, it is responsible for outputting the calculation results of the long-term and short-term memory neural network.
  • Hidden layers are the layers between the input layer and the output layer of the long short-term memory neural network. They process the Chinese image features extracted by the convolutional neural network to obtain the calculation results of the long short-term memory neural network. Understandably, using the long short-term memory neural network for model training adds timing information to the training handwritten Chinese images, so that they are trained according to their context, thereby improving the accuracy of the target handwriting recognition model.
  • In step S13, the training handwritten Chinese images in the training set are sequentially labeled, the labeled training handwritten Chinese images are input into the convolutional neural network-long short-term memory neural network, the training is performed in time series, and the time-series classification algorithm is used to update the network parameters of the convolutional neural network-long short-term memory neural network to obtain the original handwriting recognition model. This specifically includes the following steps:
  • S131 Perform feature extraction on the trained handwritten Chinese image in a convolutional neural network to obtain Chinese image features.
  • Chinese image features are image features corresponding to the training handwritten Chinese image obtained by extracting the features of the training handwritten Chinese image using a convolutional neural network.
  • the convolutional neural network model includes a convolutional layer and a pooling layer.
  • the trained handwritten Chinese image is input into the convolutional neural network model for training.
  • the output of the convolutional layer of each layer is obtained through the calculation of the convolutional layer of each layer.
  • In the pooling layer, max-pooling downsampling is used to reduce the output of the convolutional layer.
  • Pooling refers to the downsampling calculation.
  • The downsampling calculation can use the max-pooling method.
  • Max-pooling simply takes the maximum value within each m × m window of the samples.
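  • For illustration, a minimal CNN front end with convolution followed by max-pooling might look like the following PyTorch sketch; the framework choice, channel counts, and kernel sizes are assumptions rather than values given in the patent.
```python
import torch
import torch.nn as nn

class ConvFeatureExtractor(nn.Module):
    """Convolutional layers followed by max-pooling downsampling: each pooling
    step takes the maximum value in a 2 x 2 window (m = 2 here)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),        # max-pooling reduces the convolution output
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )

    def forward(self, x):           # x: (batch, 1, height, width) grayscale image
        return self.features(x)     # Chinese image features

# feats = ConvFeatureExtractor()(torch.randn(1, 1, 32, 128))
```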
  • the Chinese image feature carries an order label, and the order label of the Chinese image feature is consistent with the order label of the training handwritten Chinese image corresponding to the Chinese image feature.
  • the first activation function is used to process the features of the Chinese image to obtain the neurons carrying the identification of the activation state.
  • each neuron in the hidden layer of the long-term and short-term memory neural network includes three gates, which are an input gate, a forgetting gate, and an output gate, respectively.
  • the forget gate determines the past information to be discarded in the neuron.
  • the input gate determines the information to be added to the neuron.
  • the output gate determines the information to be output in the neuron.
  • the first activation function is a function for activating a neuron state.
  • the state of the neuron determines the information discarded, added, and output by each gate (ie, input gate, forget gate, and output gate).
  • the activation status flag includes a pass flag and a fail flag.
  • the identifiers corresponding to the input gate, the forget gate, and the output gate in this embodiment are i, f, and o, respectively.
  • the Sigmoid (S-shaped growth curve) function is specifically selected as the first activation function.
  • The Sigmoid function is an S-shaped function common in biology. In information science, because it is monotonically increasing and its inverse function is also monotonically increasing, the Sigmoid function is often used as a threshold function for neural networks, mapping variables into the interval (0, 1). Its activation formula is σ(z) = 1 / (1 + e^(-z)), where z represents the output value of the forget gate.
  • the forgetting gate includes a forgetting threshold.
  • a neuron carrying an activation state identifier as a pass identifier is obtained.
  • The calculation formula of the forget gate is f_t = σ(W_f · [h_{t-1}, x_t] + b_f), where f_t represents the forgetting threshold (that is, the activation state), W_f represents the weight matrix of the forget gate, b_f represents the bias term of the forget gate, h_{t-1} represents the output of the neuron at the previous moment, and x_t represents the input data (i.e., the Chinese image features) at time t, where t is the current time and t-1 is the previous time.
  • Calculating the Chinese image features with the forget gate formula yields a scalar in the interval [0, 1]. Based on a comprehensive judgment of the current state and the past state, this scalar determines the proportion of past information the neuron keeps, which reduces the amount of data and computation and improves training efficiency.
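  • A small NumPy sketch of the forget-gate computation described above, using the standard LSTM form of the equation; the weight shapes and function names are illustrative assumptions.
```python
import numpy as np

def sigmoid(z):
    """First activation function: maps the gate input into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def forget_gate(W_f, b_f, h_prev, x_t):
    """f_t = sigmoid(W_f . [h_{t-1}, x_t] + b_f): a value in (0, 1) deciding what
    proportion of past information the neuron keeps."""
    concat = np.concatenate([h_prev, x_t])    # [h_{t-1}, x_t]
    return sigmoid(W_f @ concat + b_f)
```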
  • a second activation function is used to process the neuron carrying the identification of the activation state to obtain the output of the long-term and short-term memory neural network output layer.
  • The second activation function performs calculations on the neurons carrying the activation state identifier to obtain the output of the hidden layer.
  • a tanh (hyperbolic tangent) function is used as the activation function of the input gate (ie, the second activation function).
  • Non-linear factors can thus be introduced, enabling the trained target handwriting recognition model to solve more complex problems.
  • the activation function tanh (hyperbolic tangent) has the advantage of fast convergence speed, which can save training time and improve training efficiency.
  • The output of the input gate is calculated by the calculation formula of the input gate, i_t = σ(W_i · [h_{t-1}, x_t] + b_i), where W_i is the weight matrix of the input gate, i_t represents the input threshold, and b_i represents the bias term of the input gate.
  • Calculating the Chinese image features with the input gate formula yields a scalar in the interval [0, 1] (that is, the input threshold). Based on a comprehensive judgment of the current state and the past state, this scalar controls the proportion of current information the neuron receives, that is, the proportion of the newly input information, which reduces the amount of computation and improves training efficiency.
  • The state of the neuron is calculated as C_t = f_t · C_{t-1} + i_t · tanh(W_c · [h_{t-1}, x_t] + b_c), where W_c represents the weight matrix of the cell state, b_c represents the bias term of the cell state, i_t represents the input threshold, C_{t-1} represents the state of the neuron at the previous moment, and C_t represents the state of the neuron at time t.
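  • Continuing the sketch, the input gate and neuron state can be computed with the standard LSTM equations, which are consistent with the symbols defined above but are an assumption rather than text taken from the patent.
```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def input_gate_and_state(W_i, b_i, W_c, b_c, h_prev, x_t, f_t, C_prev):
    """Assumed standard LSTM update, consistent with the symbols above:
       i_t  = sigmoid(W_i . [h_{t-1}, x_t] + b_i)   (input gate / input threshold)
       C~_t = tanh(W_c . [h_{t-1}, x_t] + b_c)      (candidate state, tanh activation)
       C_t  = f_t * C_{t-1} + i_t * C~_t            (new neuron state)"""
    concat = np.concatenate([h_prev, x_t])
    i_t = sigmoid(W_i @ concat + b_i)
    c_tilde = np.tanh(W_c @ concat + b_c)
    C_t = f_t * C_prev + i_t * c_tilde
    return i_t, C_t
```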
  • The target output is the output of the output layer of the long short-term memory neural network, where a denotes the forward output of the hidden layer of the long short-term memory neural network and b denotes the backward output of the hidden layer.
  • The forward output of the hidden layer of the long short-term memory neural network refers to the probability of the Chinese image feature corresponding to the u-th order label being output when the hidden layer processes the sequence in forward time order.
  • The backward output refers to the probability of the Chinese image feature corresponding to the u-th order label being output when the hidden layer processes the sequence in reverse time order. For example, in the sample "I'm in a good mood today", suppose the Chinese image feature corresponding to the u-th sequential label is the character 天, the output of the hidden layer at time t-1 is 今, and the output of the hidden layer at time t+1 is 心.
  • The input of the input layer of the long short-term memory neural network at time t is the image feature of 天.
  • The output of the hidden layer at time t is then calculated.
  • The forward output of the hidden layer refers to the probability, computed from the output of the hidden layer at time t-1 (今) and the input of the input layer at time t, that the output of the hidden layer at time t is 天.
  • Because handwritten characters can resemble one another, the candidate outputs of the hidden layer at time t may include, for example, 天, 大, and 木.
  • The backward output of the hidden layer refers to the probability, computed from the output of the hidden layer at time t+1 (心) and the input of the input layer at time t, that the output at time t is 天.
  • A time-series classification (CTC) algorithm is used to update the network parameters of the convolutional neural network-long short-term memory neural network to obtain the target handwriting recognition model.
  • The network parameters of the convolutional neural network-long short-term memory neural network are its weights and biases.
  • First, the forward output of the Chinese image feature corresponding to the u-th order label at time t in the hidden layer of the long short-term memory neural network is calculated according to the forward output formula of the hidden layer, a(t, u) = y(t, l'_u) · Σ_i a(t-1, i), where the sum runs over the order labels i allowed to precede label u, y(t, ·) is the output-layer probability at time t (in particular, the probability that the output is a space at time t), a(t-1, i) is the forward output of the i-th Chinese image feature at time t-1, and l' is the label sequence extended with spaces, whose length is determined by the number of sequential labels.
  • Then, the backward output of the Chinese image feature corresponding to the u-th order label at time t in the hidden layer is calculated according to the backward output formula of the hidden layer, b(t, u) = Σ_i b(t+1, i) · y(t+1, l'_i), where the sum runs over the order labels i allowed to follow label u, y(t+1, ·) gives in particular the probability that the output is a space at time (t+1), and b(t+1, i) is the backward output of the Chinese image feature corresponding to the i-th sequential label at time t+1 in the hidden layer. A space denotes the blank character of the output layer of the long short-term memory neural network.
  • A loss function is then constructed from the output of the output layer of the long short-term memory neural network using the time-series classification (CTC) formula.
  • The loss is E_loss = -ln p(z|x), with p(z|x) = Σ_u a(t, u) b(t, u), where p(z|x) is the probability of the label sequence z given the input x, a(t, u) represents the forward output of the Chinese image feature corresponding to the u-th sequential label at time t in the hidden layer of the long short-term memory neural network, and b(t, u) represents the backward output of the Chinese image feature corresponding to the u-th sequential label at the t-th time in the hidden layer.
  • The original handwriting recognition model is obtained by updating the network parameters of the long short-term memory neural network and the convolutional neural network using the partial derivative of E_loss.
  • The partial derivative is ∂E_loss/∂θ, where θ is a network parameter, specifically the weights and biases in the convolutional neural network and the long short-term memory neural network.
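  • As an aside, the forward-backward computation and the gradient of E_loss described above are what off-the-shelf CTC implementations provide; a minimal PyTorch sketch is shown below, where the framework choice, tensor shapes, and vocabulary size are assumptions.
```python
import torch
import torch.nn as nn

# Illustrative sizes only: T time steps, a batch of 1, and an assumed vocabulary
# of 3755 characters plus one blank ("space") class at index 0.
T, N, C = 20, 1, 3755 + 1
log_probs = torch.randn(T, N, C, requires_grad=True).log_softmax(2)  # CNN-LSTM output
targets = torch.randint(1, C, (N, 5), dtype=torch.long)  # sequential labels of one sample
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 5, dtype=torch.long)

ctc = nn.CTCLoss(blank=0)                                      # blank = the "space" output
loss = ctc(log_probs, targets, input_lengths, target_lengths)  # E_loss = -ln p(z|x)
loss.backward()   # partial derivatives of E_loss drive the weight and bias updates
```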
  • In step S14, all the training handwritten Chinese images in the test set are input into the original handwriting recognition model for testing, and the test accuracy rate is obtained (that is, the number of accurate predictions divided by the number of training handwritten Chinese images in the test set). It is then judged whether the test accuracy rate is greater than the preset accuracy rate.
  • If the test accuracy rate is greater than the preset accuracy rate, the original handwriting recognition model is considered sufficiently accurate and is used as the target handwriting recognition model; otherwise, if the test accuracy rate is not greater than the preset accuracy rate, the predictions of the original handwriting recognition model are judged not accurate enough, steps S11-S13 are repeated for training, and the model is tested again until the test accuracy rate reaches the preset accuracy rate, at which point training stops. This further improves the accuracy of the target handwriting recognition model.
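  • A short Python sketch of this accuracy test and retraining loop; predict and train_one_round are hypothetical helpers, not functions named in the patent.
```python
def test_accuracy(model, test_samples, predict):
    """Test accuracy = number of correct predictions / number of test samples."""
    correct = sum(1 for image, label in test_samples if predict(model, image) == label)
    return correct / len(test_samples)

# Hypothetical training loop: keep training until the preset accuracy is reached.
# while test_accuracy(model, test_set, predict) <= preset_accuracy:
#     model = train_one_round(model, train_set)
```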
  • In the Chinese model training method provided in this embodiment, the training handwritten Chinese images are first acquired and divided into a training set and a test set according to a preset ratio, and the training handwritten Chinese images in the training set are labeled sequentially so that they carry timing information.
  • The labeled training handwritten Chinese images are input into the convolutional neural network-long short-term memory neural network for training, so that the network trains on the handwritten Chinese images according to their context, and the time-series classification algorithm is used to update the network parameters of the convolutional neural network-long short-term memory neural network to obtain the original handwriting recognition model. This solves the time-series problem of the uncertain alignment between input features and output labels, realizes end-to-end output, and improves the generalization of the original handwriting recognition model.
  • the original handwriting recognition model is tested using the training handwritten Chinese images in the test set.
  • the test accuracy is greater than the preset accuracy rate, the target handwriting recognition model is obtained, which further improves the accuracy of the target handwriting recognition model.
  • a Chinese model training device is provided, and the Chinese model training device corresponds to the Chinese model training method in the above embodiment one-to-one.
  • The Chinese model training device includes a training handwritten Chinese image acquisition module 11, a training handwritten Chinese image division module 12, an original handwriting recognition model acquisition module 13, and a target handwriting recognition model acquisition module 14. Each functional module is described in detail as follows:
  • a training handwritten Chinese image acquisition module 11 is configured to acquire a training handwritten Chinese image.
  • the training handwritten Chinese image division module 12 is configured to divide the training handwritten Chinese image into a training set and a test set according to a preset ratio.
  • The original handwriting recognition model acquisition module 13 is configured to sequentially label the training handwritten Chinese images in the training set, input the labeled training handwritten Chinese images into the convolutional neural network-long short-term memory neural network for training, and update the network parameters of the convolutional neural network-long short-term memory neural network using the time-series classification algorithm to obtain the original handwriting recognition model.
  • the original handwriting recognition model acquisition module 13 includes a Chinese image feature acquisition unit 131, an activation state neuron acquisition unit 132, an output layer output acquisition unit 133, and a target recognition model acquisition unit 134.
  • the Chinese image feature acquiring unit 131 is configured to perform feature extraction on a trained handwritten Chinese image in a convolutional neural network to acquire Chinese image features.
  • the activation state neuron acquisition unit 132 is configured to process a Chinese image feature using a first activation function in a hidden layer of a long-term and short-term memory neural network to acquire a neuron carrying an activation state identifier.
  • the output layer output obtaining unit 133 is configured to process the neuron carrying the activation state identifier in the hidden layer of the long-term and short-term memory neural network to obtain the output of the output layer of the long-term and short-term memory neural network.
  • the target recognition model acquisition unit 134 is configured to update the network parameters of the convolutional neural network-long-term and short-term memory neural network by using a time-series classification algorithm according to the output of the long-term and short-term memory neural network output layer to obtain a target handwriting recognition model.
  • the target handwriting recognition model acquisition module 14 is used to test the original handwriting recognition model using the training handwritten Chinese images in the test set. When the test accuracy is greater than a preset accuracy rate, the target handwriting recognition model is obtained.
  • Here the loss function is E_loss = -ln p(z|x) with p(z|x) = Σ_u a(t, u) b(t, u), where a(t, u) represents the forward output of the Chinese image feature corresponding to the u-th order label at time t in the hidden layer of the long short-term memory neural network, and b(t, u) represents the corresponding backward output.
  • Each module in the aforementioned Chinese model training device may be implemented in whole or in part by software, hardware, and a combination thereof.
  • the above-mentioned modules may be embedded in the hardware in or independent of the processor in the computer device, or may be stored in the memory of the computer device in the form of software, so that the processor can call and execute the operations corresponding to the above modules.
  • a computer device is provided.
  • the computer device may be a server, and its internal structure diagram may be as shown in FIG. 10.
  • the computer device includes a processor, a memory, a network interface, and a database connected through a system bus.
  • the processor of the computer device is used to provide computing and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system, a computer program, and a database.
  • the internal memory provides an environment for running the operating system and computer programs in a non-volatile storage medium.
  • the database of the computer equipment is used to store data generated or obtained during the execution of the Chinese model training method, such as a target handwriting recognition model.
  • the network interface of the computer device is used to communicate with an external terminal through a network connection.
  • the computer program is executed by a processor to implement a Chinese model training method.
  • a computer device including a memory, a processor, and a computer program stored on the memory and executable on the processor.
  • When the processor executes the computer program, the following steps are implemented: acquiring training handwritten Chinese images; dividing the training handwritten Chinese images into a training set and a test set according to a preset ratio; sequentially labeling the training handwritten Chinese images in the training set, inputting the labeled training handwritten Chinese images into the convolutional neural network-long short-term memory neural network for training, and updating the network parameters of the convolutional neural network-long short-term memory neural network using the time-series classification algorithm to obtain the original handwriting recognition model; and testing the original handwriting recognition model using the training handwritten Chinese images in the test set, and obtaining the target handwriting recognition model when the test accuracy is greater than a preset accuracy rate.
  • When the processor executes the computer program, the following steps are further implemented: performing feature extraction on the training handwritten Chinese images in the convolutional neural network to obtain Chinese image features; processing the Chinese image features with the first activation function in the hidden layer of the long short-term memory neural network to obtain the neurons carrying the activation state identifier; processing the neurons carrying the activation state identifier with the second activation function to obtain the output of the output layer of the long short-term memory neural network; and, according to the output of the output layer of the long short-term memory neural network, updating the network parameters of the convolutional neural network-long short-term memory neural network with the time-series classification algorithm to obtain the target handwriting recognition model.
  • Here p(z|x) = Σ_u a(t, u) b(t, u), where a(t, u) represents the forward output of the Chinese image feature corresponding to the u-th order label at the t-th time in the hidden layer of the long short-term memory neural network, and b(t, u) represents the corresponding backward output.
  • In an embodiment, one or more non-volatile readable storage media storing computer-readable instructions are provided. When the computer-readable instructions are executed by one or more processors, the one or more processors perform the following steps: acquiring training handwritten Chinese images; dividing the training handwritten Chinese images into a training set and a test set according to a preset ratio; sequentially labeling the training handwritten Chinese images in the training set, inputting the labeled training handwritten Chinese images into the convolutional neural network-long short-term memory neural network for training, and updating the network parameters of the convolutional neural network-long short-term memory neural network using the time-series classification algorithm to obtain the original handwriting recognition model; and testing the original handwriting recognition model using the training handwritten Chinese images in the test set, and obtaining the target handwriting recognition model when the test accuracy is greater than a preset accuracy rate.
  • When executed by the one or more processors, the computer-readable instructions further implement the following steps: performing feature extraction on the training handwritten Chinese images in the convolutional neural network to obtain Chinese image features; processing the Chinese image features with the first activation function in the hidden layer of the long short-term memory neural network to obtain the neurons carrying the activation state identifier; processing the neurons carrying the activation state identifier with the second activation function in the hidden layer of the long short-term memory neural network to obtain the output of the output layer of the long short-term memory neural network; and, according to the output of the output layer of the long short-term memory neural network, updating the network parameters of the convolutional neural network-long short-term memory neural network with the time-series classification algorithm to obtain the target handwriting recognition model.
  • Here p(z|x) = Σ_u a(t, u) b(t, u), where a(t, u) represents the forward output of the Chinese image feature corresponding to the u-th order label at the t-th time in the hidden layer of the long short-term memory neural network, and b(t, u) represents the corresponding backward output.
  • a Chinese image recognition method is provided.
  • the method is applied to the server in FIG. 1 as an example, and includes the following steps:
  • S21 Acquire a Chinese image to be identified.
  • the Chinese image to be identified includes handwritten Chinese characters and background pictures.
  • the Chinese image to be identified is an unprocessed image containing handwritten Chinese characters collected by a collection module on a computer device.
  • the Chinese image to be recognized includes handwritten Chinese characters and background pictures.
  • the background picture is a noise picture other than handwritten Chinese characters in the Chinese image to be identified. Noise pictures are pictures that interfere with handwritten Chinese characters.
  • the user can collect the Chinese image to be recognized containing handwritten Chinese characters and upload it to the server through the acquisition module on the computer device, so that the server acquires the Chinese image to be recognized.
  • the acquisition module includes but is not limited to camera shooting and local upload.
  • the original image is an image obtained by pre-processing the Chinese image to be identified and excluding interference factors.
  • Because the Chinese image to be identified may contain multiple interference factors, such as numerous colors, that are not conducive to subsequent identification, the Chinese image to be identified needs to be pre-processed to obtain the original image with interference factors excluded.
  • the original image can be understood as the image obtained after the background image is excluded from the Chinese image to be identified.
  • step S22 the Chinese image to be recognized is pre-processed to obtain the original image, which specifically includes the following steps:
  • S221 Enlarge and grayscale the Chinese image to be recognized to obtain a grayscale image.
  • the grayscale image is a grayscale image obtained after the Chinese image to be recognized is enlarged and grayscale processed.
  • the grayed image includes a matrix of pixel values.
  • the pixel value matrix refers to a matrix containing pixel values corresponding to each pixel in a Chinese image to be identified.
  • the server uses the imread function to read the pixel value of each pixel in the Chinese image to be identified, and performs enlargement and grayscale processing on the Chinese image to be identified to obtain a grayscale image.
  • the imread function is a function in computer language for reading pixel values in an image file.
  • the pixel value is a value assigned by the computer when the original image is digitized.
  • The Chinese image to be identified may contain multiple colors, and color itself is very susceptible to factors such as lighting; the colors of similar objects vary widely, so color alone can rarely provide key information. Therefore the Chinese image to be identified is grayscaled to eliminate interference and reduce the complexity of the image and the amount of information to be processed. However, if the handwritten Chinese characters in the Chinese image to be recognized are small, directly applying grayscale processing would make the strokes of the handwritten Chinese characters too thin, and they could be excluded as interference.
  • Therefore, the server first enlarges the image according to the rule x → x_r, where x represents an element in the pixel value matrix M, r is the number of times the element is enlarged, and the changed element x_r replaces x in the pixel value matrix M.
  • the graying process is a process for rendering the Chinese image to be recognized to have a clear black and white effect.
  • performing grayscale processing on the enlarged image includes: the color of each pixel in the Chinese image to be identified is determined by three components of R (red), G (green), and B (blue), and Each component has 256 values from 0 to 255 (0 is the darkest, and 255 is the brightest, white).
  • the grayscale image is a special color image with the same three components of R, G, and B.
  • the server can directly use the imread function to read the Chinese image to be identified, and the specific values of the three components of R, G, and B corresponding to each pixel in the grayscale image can be obtained.
  • the standardization process refers to a process of performing a standard transformation process on a grayscale image to transform it into a fixed standard form. Specifically, because the pixel values of each pixel in the grayscale image are scattered, the magnitude of the data is not uniform, which will affect the accuracy of subsequent model recognition. Therefore, the grayscale image needs to be standardized to uniformize the magnitude of the data. .
  • the server standardizes the grayscale image by using a formula for normalization processing to avoid the problem that the pixel values in the grayscale image are scattered and the order of data is not uniform.
  • The standardization formula is X' = (X - M_min) / (M_max - M_min), where X is a pixel value of the grayscale image M, X' is the corresponding pixel value of the original image, M_min is the smallest pixel value in the grayscale image M, and M_max is the largest pixel value in the grayscale image M.
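  • A one-function NumPy sketch of this standardization formula (assuming the grayscale image is not constant, so the denominator is non-zero):
```python
import numpy as np

def standardize(gray):
    """X' = (X - M_min) / (M_max - M_min): rescale the pixel value matrix of the
    grayscale image M into a uniform [0, 1] range."""
    gray = gray.astype(np.float64)
    return (gray - gray.min()) / (gray.max() - gray.min())
```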
  • S23 Use the kernel density estimation algorithm to process the original image, remove the background image, and obtain a target image including handwritten Chinese characters.
  • the kernel density estimation algorithm is a non-parametric method that studies the data distribution characteristics from the data sample itself to estimate the probability density function.
  • the target image refers to an image that contains only handwritten Chinese characters by processing the original image using a kernel density estimation algorithm.
  • the server uses a kernel density estimation algorithm to process the original image to eliminate background image interference and obtain a target image including handwritten Chinese characters.
  • The kernel density estimate is f(x) = (1 / (n·h)) · Σ_{i=1}^{n} K((x - x_i) / h), where K(·) is the kernel function, h is the pixel value range (bandwidth), x is the pixel value of the pixel whose probability density is to be estimated, x_i is the i-th pixel value within the range h, and n is the number of pixel values within the range h.
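  • A plain NumPy sketch of this kernel density estimate with the Gaussian kernel of step S232 below; the bandwidth value in the usage comment is an assumption.
```python
import numpy as np

def gaussian_kernel(u):
    """K(u) = exp(-u^2 / 2) / sqrt(2 * pi), the Gaussian kernel used in step S232."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def kde(x, samples, h):
    """f(x) = (1 / (n * h)) * sum_i K((x - x_i) / h) over the n pixel values x_i."""
    samples = np.asarray(samples, dtype=np.float64)
    return gaussian_kernel((x - samples) / h).sum() / (len(samples) * h)

# density = [kde(v, original_image.ravel(), h=5.0) for v in range(256)]  # h is assumed
```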
  • step S23 the original image is processed by using a kernel density estimation algorithm to remove the background image to obtain a target image including handwritten Chinese characters, which specifically includes the following steps:
  • S231 Perform statistics on pixel values in the original image to obtain a histogram of the original image.
  • the original image histogram is a histogram obtained by statistically calculating pixel values in the original image.
  • Histogram is a kind of statistical report diagram that represents the distribution of data by a series of vertical stripes or line segments of varying heights.
  • the horizontal axis of the histogram of the original image represents pixel values
  • the vertical axis represents the appearance frequency corresponding to the pixel values.
  • the server obtains the histogram of the original image by counting the pixel values in the original image, so that it can intuitively see the distribution of the pixel values in the original image, and provides technical support for subsequent Gaussian kernel density estimation algorithms.
  • the original image histogram is processed by using a Gaussian kernel density estimation algorithm to obtain at least one frequency maximum and at least one frequency minimum corresponding to the original image histogram.
  • the Gaussian kernel density estimation algorithm refers to a kernel density estimation method in which the kernel function is a Gaussian kernel function.
  • The formula of the Gaussian kernel function is K(x) = (1 / √(2π)) · e^(-x²/2), where K(x) is the Gaussian kernel with the pixel value (independent variable) x as input, x refers to a pixel value in the effective image, and e and π are constants.
  • Frequency maxima refer to the maxima at different frequency intervals in the frequency distribution histogram.
  • the frequency minimum value refers to the minimum value corresponding to the frequency maximum value in the same frequency interval in the frequency distribution histogram.
  • The Gaussian kernel density estimation method performs Gaussian smoothing on the frequency distribution histogram corresponding to the original image to obtain a Gaussian smooth curve. Based on the frequency maxima and frequency minima on the Gaussian smooth curve, the pixel values on the horizontal axis corresponding to the frequency maxima and frequency minima are obtained, so that the original image can subsequently be hierarchically segmented according to these pixel values to obtain layered images.
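  • A sketch of this smoothing and extrema search using SciPy's gaussian_filter1d and find_peaks as stand-ins for the operations the patent describes; the sigma value and bin count are assumptions.
```python
import numpy as np
from scipy.ndimage import gaussian_filter1d
from scipy.signal import find_peaks

def frequency_extrema(hist, sigma=3.0):
    """Gaussian-smooth the 256-bin histogram of the original image and return the
    pixel values at the frequency maxima and the frequency minima."""
    smoothed = gaussian_filter1d(np.asarray(hist, dtype=float), sigma=sigma)
    maxima, _ = find_peaks(smoothed)      # peaks of the smoothed curve
    minima, _ = find_peaks(-smoothed)     # valleys are peaks of the negated curve
    return maxima, minima

# hist, _ = np.histogram(original_image.ravel(), bins=256, range=(0, 256))
# maxima, minima = frequency_extrema(hist)
```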
  • S233 Perform hierarchical segmentation processing on the original image based on the frequency maximum and frequency minimum to obtain a layered image.
  • the layered image is an image obtained by performing layered segmentation processing on the original image based on the maximum and minimum values.
  • The server first obtains the pixel values corresponding to the frequency maxima and frequency minima and processes the original image accordingly: the pixel values of the original image are divided into as many classes as there are frequency maxima; the pixel values corresponding to the frequency minima are then used as the boundary values between the classes, and the original image is layered according to the classes and the boundaries between them to obtain the layered images.
  • the pixel values corresponding to the frequency maximum in the original image are 11, 53, 95, 116, and 158, and the pixel values corresponding to the minimum frequency are 21, 63, 105, and 135, respectively.
  • Based on the number of frequency maxima in the original image, it can be determined that the pixel values of the original image fall into 5 classes, so the original image can be divided into 5 layers, with the pixel values corresponding to the frequency minima used as the boundary values; the minimum pixel value is 0 and the maximum pixel value is 255.
  • Accordingly, the layer around the pixel value 11 covers the pixel value range [0, 21); the layer around 53 covers [21, 63); the layer around 95 covers [63, 105); the layer around 116 covers [105, 135); and the layer around 158 covers [135, 255].
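  • A NumPy sketch of this layered segmentation, using the frequency-minimum pixel values as class boundaries; filling pixels outside a layer with 0 is an assumption about the representation.
```python
import numpy as np

def layer_split(original, minima):
    """Split the original image into layers using the pixel values at the frequency
    minima as class boundaries, e.g. minima [21, 63, 105, 135] give the layers
    [0, 21), [21, 63), [63, 105), [105, 135) and [135, 255]."""
    bounds = [0] + list(minima) + [256]
    layers = []
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        mask = (original >= lo) & (original < hi)    # pixels belonging to this class
        layers.append(np.where(mask, original, 0))   # keep this layer, zero elsewhere
    return layers
```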
  • S234 Obtain a target image including handwritten Chinese characters based on the layered image.
  • After obtaining the layered images, the server performs binarization, erosion, and superposition processing on them to obtain a target image including handwritten Chinese characters.
  • the binarization process refers to a process in which the pixel value of a pixel on a layered image is set to 0 (black) or 1 (white), and the entire layered image presents an obvious black and white effect.
  • the binarized layered image is corroded to remove the background image part and retain the handwritten Chinese characters on the layered image. Because the pixel values on each layered image are pixel values belonging to different ranges, after the layered image is corroded, each layered image needs to be superimposed to generate a target image containing only handwritten Chinese characters.
  • the superimposing process refers to a process of superimposing a layered image with only a handwritten portion into an image, thereby achieving the purpose of obtaining a target image containing only handwritten Chinese characters.
  • the layered image is superimposed using the imadd function to obtain a target image containing only handwritten Chinese characters.
  • the imadd function is a function in computer language for superimposing layered images.
  • step S234 that is, based on the layered image, obtaining a target image including handwritten Chinese characters, specifically includes the following steps:
  • A binarized image is an image obtained by binarizing a layered image. Specifically, after the server obtains a layered image, it compares each sampled pixel value of the layered image with a preselected threshold, sets pixel values greater than or equal to the threshold to 1, and sets pixel values less than the threshold to 0.
  • the sampled pixel value is the pixel value corresponding to each pixel point in the layered image.
  • The size of the threshold affects the result of binarizing the layered image: when the threshold is chosen properly, the binarization of the layered image works well; when the threshold is chosen poorly, the binarization result suffers.
  • the threshold in this embodiment is determined by the developer based on experience. Binarize the layered image to facilitate subsequent corrosion treatment.
  • S2342 Detect pixels in the binarized image to obtain a connected area corresponding to the binarized image.
  • the connected area refers to an area surrounded by adjacent pixels around a specific pixel.
  • For example, if a particular pixel is 1 and the neighboring pixels around it are also 1, the area enclosed by these neighboring pixels is taken as a connected area.
  • the binarized image corresponds to a pixel matrix, which includes rows and columns.
  • Detecting pixels in a binarized image specifically includes the following processes: (1) Scan the pixel matrix line by line, group consecutive white pixels in each line into a sequence called a cluster, and note its starting point, End point and line number.
  • the etching process is an operation for removing the content of a part of an image in morphology.
  • the built-in imerode function is used to etch the connected areas of the binary image.
  • Eroding the connected region corresponding to the binarized image includes the following steps: first, an n × n structural element is selected. In this embodiment, the values of the 8 elements adjacent to each element in the pixel matrix are treated as the connected region of that element, so the selected structural element is a 3 × 3 pixel matrix.
  • the structural element is an n ⁇ n pixel matrix, where the matrix elements include 0 or 1.
  • The binarized image is filtered based on the preset anti-corrosion capability range of the handwritten region: the parts of the binarized image whose anti-corrosion capability is not within the range are deleted, and the parts whose anti-corrosion capability falls within the range are retained.
  • the target pixel image containing only handwritten Chinese characters can be obtained by superimposing the pixel matrix corresponding to each binarized image portion that fits the range of the corrosion resistance of the handwritten area.
  • The anti-corrosion ability of the handwritten area can be calculated with the formula p = s1 / s2, where s1 represents the total area after erosion in the binarized image, s2 represents the total area before erosion in the binarized image, and p is the corrosion resistance of the handwritten area.
  • The preset anti-corrosion range of the handwriting area is [0.01, 0.5]; according to the formula, the ratio p between the total area of each binarized image after erosion and its total area before erosion is calculated.
  • If the ratio p of the total area after erosion to the total area before erosion in a binarized image is not within the anti-corrosion capability range of the handwritten area, the binarized image of that region is a background image rather than handwriting and needs to be eroded away to remove the background image.
  • the ratio p of the total area after erosion to the total area before erosion in the binarized image is in the range of [0.01, 0.5], it means that the binarized image of the region is a handwritten Chinese character and needs to be retained.
  • the pixel matrix corresponding to the retained binary image is superimposed to obtain a target image containing handwritten Chinese characters.
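  • A sketch of this erosion-and-retention step assuming OpenCV (cv2) and 0/1 uint8 layer images; the kernel size follows the 3 × 3 structural element mentioned above, while the helper name is illustrative.
```python
import cv2
import numpy as np

def keep_handwriting(binary_layers, p_range=(0.01, 0.5)):
    """Erode each binarized layer with a 3 x 3 structural element and keep only the
    layers whose corrosion-resistance ratio p = s1 / s2 (area after erosion divided
    by area before erosion) lies in the preset range; superimpose the kept layers."""
    kernel = np.ones((3, 3), np.uint8)
    target = None
    for layer in binary_layers:                      # layers hold 0/1 uint8 values
        eroded = cv2.erode(layer, kernel)
        s2 = float(np.count_nonzero(layer)) or 1.0   # total area before erosion
        s1 = float(np.count_nonzero(eroded))         # total area after erosion
        p = s1 / s2
        if p_range[0] <= p <= p_range[1]:            # handwriting falls in this range
            target = layer if target is None else cv2.add(target, layer)
    return target
```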
  • In this way, each layered image is binarized to obtain a binarized image, and then the pixels in the binarized image are detected and labeled to obtain the connected areas corresponding to the binarized image.
  • During erosion, elements of the covered pixel matrix that do not completely match the structural element all become 0; the parts of the binarized image with element 0 are black, and the black part is the eroded part of the binarized image.
  • The ratio of the total area of the binarized image after erosion to its total area before erosion is then calculated, and it is judged whether the ratio is within the preset anti-corrosion range of the handwriting area, so as to remove the background image in each layered image and retain the handwritten Chinese characters; finally, the layered images are superimposed to obtain the target image.
  • S24 Use text positioning technology to perform text positioning on the target image to obtain the text area to be recognized.
  • the text region to be recognized refers to a region in the target image that contains only text. Since the target image also includes a non-Chinese character area, that is, an eroded part of the target image, in order to make the recognition result more accurate and save the recognition time of the model, it is necessary to perform text positioning on the target image.
  • Text positioning technology includes, but is not limited to, text positioning using OCR technology and ctpn network (Connectionist Text Proposal Network, text detection network). Among them, the ctpn network is a commonly used network for image text detection.
  • OCR stands for Optical Character Recognition.
  • Specifically, the proximity search method is first applied to the connected areas obtained in step S2342: one connected area is randomly selected as the starting connected area, and the area distance between each remaining connected area (the connected areas other than the starting one) and the starting connected area is calculated.
  • The connected area whose area distance is less than a preset threshold is selected as the target connected area in order to determine the direction of the expansion operation (i.e., up, down, left, or right).
  • the preset threshold is a preset threshold used to determine a distance between two connected regions.
  • Proximity search method refers to starting from a starting connected area, which can find the horizontal circumscribed rectangle of the starting connected area, and expand the connected area to the entire rectangle.
  • The expansion operation is performed on this rectangle, and the expansion direction is the direction of the nearest neighboring connected area.
  • the expansion operation is performed only when the expansion direction is horizontal.
  • the area distance is the distance between the neighboring boundaries of the two connected regions, where S is the starting connected region, S' is a remaining connected region, and (x_c, y_c) is the difference between the center vectors of the two connected regions; because the distance between the two connected regions is measured between their neighboring boundaries, the region extent needs to be subtracted.
  • the region extent is (x_c', y_c'), where (w', z') is the coordinate of the lower-right corner of the remaining connected area, (x', y') is the coordinate of the upper-left corner of the remaining connected area, (w, z) is the coordinate of the lower-right corner of the starting connected region, and (x, y) is the coordinate of the upper-left corner of the starting connected region; in this embodiment, this point is used as the origin coordinate. One plausible reading of this computation is sketched below.
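The exact area-distance formula is not reproduced in the text above, so the following hedged sketch follows one plausible reading: the gap between the nearest boundaries of the two circumscribed rectangles, i.e. the centre-to-centre offset minus the half-extents of both rectangles. The function name and tuple layout are illustrative assumptions.

    import math

    def area_distance(s_box, s2_box):
        """s_box = (x, y, w, z): upper-left and lower-right corners of S.
        s2_box = (x2, y2, w2, z2): upper-left and lower-right corners of S'."""
        x, y, w, z = s_box
        x2, y2, w2, z2 = s2_box
        # centre vector difference (x_c, y_c)
        xc = (x2 + w2) / 2 - (x + w) / 2
        yc = (y2 + z2) / 2 - (y + z) / 2
        # region half-extents to subtract (x_c', y_c')
        xe = (w2 - x2) / 2 + (w - x) / 2
        ye = (z2 - y2) / 2 + (z - y) / 2
        # distance between neighbouring boundaries, clamped at zero when the boxes overlap
        dx = max(abs(xc) - xe, 0.0)
        dy = max(abs(yc) - ye, 0.0)
        return math.hypot(dx, dy)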
  • the dilation operation is the dual of the erosion operation and is a morphological process for expanding an image.
  • the built-in imdilate function may be used to dilate the connected areas of the binary image.
  • the process of dilating the initial connected region includes the following steps: an n×n structuring element is selected; in this embodiment, the values of the 8 elements adjacent to each element in the pixel matrix are taken as the neighborhood of that element, so the selected structuring element is a 3×3 pixel matrix.
  • the structure element is an n ⁇ n pixel matrix, where the matrix elements include 0 or 1.
  • the connected area is scanned in the direction of the target connected area, and a logical AND operation is performed between the structuring element and the pixels of the connected area covered by it in that direction.
  • if the results of the AND operation are all 0, the covered pixels remain unchanged; if they are not all 0, the pixels covered by the structuring element are set to 1, and the part that becomes 1 is the dilated part of the initial connected region, as in the sketch below.
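A minimal dilation sketch follows, assuming OpenCV; the directional scanning toward the target connected area is omitted and the whole region is dilated with a 3x3 structuring element, which mirrors the imdilate-style behaviour referred to above.

    import numpy as np
    import cv2

    def dilate_region(binary_image):
        kernel = np.ones((3, 3), np.uint8)               # 3x3 structuring element
        # covered pixels whose neighbourhood is not all zero become 1
        return cv2.dilate(binary_image.astype(np.uint8), kernel, iterations=1)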
  • S25 Input the text area to be recognized into the target handwriting recognition model for recognition, and obtain handwritten Chinese characters corresponding to each text area to be recognized.
  • the target handwriting recognition model is obtained by using a Chinese model training method.
  • the server inputs the text areas to be recognized into the target handwriting recognition model for recognition, so that the target handwriting recognition model can draw on the surrounding context during recognition, obtain the handwritten Chinese characters corresponding to each text area to be recognized, and improve recognition accuracy.
  • the user can collect a Chinese image to be recognized that contains handwritten Chinese characters through the acquisition module on a computer device and upload it to the server, so that the server obtains the Chinese image to be recognized. The server then preprocesses the Chinese image to be recognized to obtain an original image from which interference factors are excluded. A kernel density estimation algorithm is used to process the original image and remove the background picture, yielding a target image that contains only handwritten Chinese characters and further eliminating interference. Text positioning technology is used to locate the text in the target image and obtain the text areas to be recognized, eliminating interference from non-Chinese-character regions. Finally, the server inputs the text areas to be recognized into the target handwriting recognition model, which draws on the surrounding context during recognition, obtains the handwritten Chinese characters corresponding to each text area, and improves recognition accuracy.
  • a Chinese image recognition device is provided, and the Chinese image recognition device corresponds to the Chinese image recognition method in the embodiment described above in a one-to-one manner.
  • the Chinese image recognition device includes a to-be-recognized Chinese image acquisition module 21, an original image acquisition module 22, a target image acquisition module 23, a to-be-recognized text region acquisition module 24 and a handwritten Chinese character acquisition module 25.
  • the detailed description of each function module is as follows:
  • the to-be-recognized Chinese image acquisition module 21 is configured to obtain the to-be-recognized Chinese image, and the to-be-recognized Chinese image includes handwritten Chinese characters and background pictures.
  • the original image acquisition module 22 is configured to preprocess the Chinese image to be recognized to obtain an original image.
  • a target image acquisition module 23 is configured to process the original image by using a kernel density estimation algorithm, remove the background picture, and obtain a target image including handwritten Chinese characters.
  • the text region to be recognized acquisition module 24 is configured to perform text positioning on the target image by using text positioning technology to acquire the text region to be recognized.
  • a handwritten Chinese character acquisition module 25 is configured to input a text area to be recognized into a target handwriting recognition model for recognition, and obtain a handwritten Chinese character corresponding to each text area to be recognized.
  • the target handwriting recognition model is obtained by using the Chinese model training method in the foregoing embodiment.
  • the original image acquisition module 22 includes a grayscale image acquisition unit 221 and an original image acquisition unit 222.
  • a grayscale image acquisition unit 221 is configured to perform enlargement and graying processing on the Chinese image to be recognized to obtain a grayscale image.
  • the original image obtaining unit 222 is configured to perform normalization processing on the grayscale image to obtain the original image.
  • the formula of the normalization processing is X' = (X - M_min) / (M_max - M_min), where X is a pixel value of the grayscale image M, X' is the corresponding pixel value of the original image, M_min is the smallest pixel value in the grayscale image M, and M_max is the largest pixel value in the grayscale image M; a sketch of this preprocessing is given below.
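As an illustration of units 221 and 222, the following sketch assumes OpenCV for the enlargement and graying; the scale factor and function name are assumptions, and the last line applies the min-max normalization X' = (X - M_min) / (M_max - M_min).

    import cv2
    import numpy as np

    def preprocess(image_bgr, scale=2.0):
        enlarged = cv2.resize(image_bgr, None, fx=scale, fy=scale,
                              interpolation=cv2.INTER_LINEAR)   # enlargement
        grey = cv2.cvtColor(enlarged, cv2.COLOR_BGR2GRAY)       # graying -> grayscale image M
        m_min, m_max = float(grey.min()), float(grey.max())
        # X' = (X - M_min) / (M_max - M_min)
        return (grey.astype(np.float32) - m_min) / (m_max - m_min + 1e-8)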
  • the target image acquisition module 23 includes an original image histogram acquisition unit 231, a frequency extreme value acquisition unit 232, a layered image acquisition unit 233, and a target image acquisition unit 234.
  • the original image histogram obtaining unit 231 is configured to perform statistics on pixel values in the original image to obtain a histogram of the original image.
  • a frequency extreme value acquisition unit 232 is configured to process the histogram of the original image by using a Gaussian kernel density estimation algorithm, and obtain at least one frequency maximum and at least one frequency minimum corresponding to the histogram of the original image.
  • a layered image acquisition unit 233 is configured to perform layered segmentation processing on the original image based on the frequency maximum and frequency minimum to obtain a layered image.
  • the target image acquisition unit 234 is configured to acquire a target image including a handwritten Chinese character based on the layered image.
  • the target image acquisition unit 234 includes a binarized image acquisition subunit 2341, a connected region acquisition subunit 2342, and a target image acquisition subunit 2343.
  • a binarized image acquisition subunit 2341 is configured to perform binarization processing on the layered image to obtain a binarized image.
  • the connected region acquisition subunit 2342 is configured to detect pixels in the binarized image and obtain a connected region corresponding to the binarized image.
  • a target image acquisition subunit 2343 is configured to perform erosion and superposition processing on the connected areas corresponding to the binary image, and acquire a target image including handwritten Chinese characters.
  • Each module in the above-mentioned Chinese image recognition device may be implemented in whole or in part by software, hardware, and a combination thereof.
  • the above modules may be embedded, in the form of hardware, in or independent of the processor in the computer device, or may be stored in the memory of the computer device in the form of software, so that the processor can invoke and execute the operations corresponding to the above modules.
  • a computer device is provided.
  • the computer device may be a server, and its internal structure diagram may be as shown in FIG. 10.
  • the computer device includes a processor, a memory, a network interface, and a database connected through a system bus.
  • the processor of the computer device is used to provide computing and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system, a computer program, and a database.
  • the internal memory provides an environment for running the operating system and computer programs in a non-volatile storage medium.
  • the database of the computer device is used to store data generated or obtained during execution of the Chinese model training method or the Chinese image recognition method, such as the target handwriting recognition model or the handwritten Chinese characters.
  • the network interface of the computer device is used to communicate with an external terminal through a network connection.
  • the computer program is executed by a processor to implement a Chinese image recognition method.
  • a computer device including a memory, a processor, and a computer program stored on the memory and executable on the processor.
  • when the processor executes the computer program, the following steps are performed: acquiring a Chinese image to be recognized, the Chinese image to be recognized including handwritten Chinese characters and a background picture; preprocessing the Chinese image to be recognized to obtain an original image; processing the original image by using a kernel density estimation algorithm to remove the background picture and obtain a target image including the handwritten Chinese characters; performing text positioning on the target image by using text positioning technology to obtain text areas to be recognized; and inputting the text areas to be recognized into a target handwriting recognition model for recognition to obtain the handwritten Chinese characters corresponding to each text area to be recognized, wherein the target handwriting recognition model is obtained by using the Chinese model training method.
  • when the processor executes the computer program, the following steps are further implemented: counting the pixel values in the original image to obtain an original image histogram; processing the original image histogram by using a Gaussian kernel density estimation method to obtain at least one frequency maximum and at least one frequency minimum corresponding to the original image histogram; performing layered segmentation processing on the original image based on the frequency maximum and the frequency minimum to obtain layered images; and obtaining, based on the layered images, a target image including handwritten Chinese characters, as sketched below.
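The histogram layering steps can be sketched as follows, assuming SciPy; the number of evaluation points, the use of argrelextrema to locate the frequency maxima and minima, and the function name are illustrative choices, not details from this disclosure (for large images the pixel values may need to be subsampled first).

    import numpy as np
    from scipy.stats import gaussian_kde
    from scipy.signal import argrelextrema

    def layer_by_kde(original_image):
        pixels = original_image.ravel()
        density = gaussian_kde(pixels)                   # Gaussian kernel density estimate
        xs = np.linspace(pixels.min(), pixels.max(), 256)
        ys = density(xs)
        maxima = xs[argrelextrema(ys, np.greater)[0]]    # frequency maxima
        minima = xs[argrelextrema(ys, np.less)[0]]       # frequency minima -> split points
        bounds = [pixels.min()] + list(minima) + [pixels.max()]
        layers = [((original_image >= lo) & (original_image <= hi)).astype(np.uint8)
                  for lo, hi in zip(bounds[:-1], bounds[1:])]
        return maxima, minima, layers                    # layered images for further processing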
  • when the processor executes the computer program, the following steps are further implemented: binarizing the layered image to obtain a binarized image; detecting and labeling the pixels in the binarized image to obtain the connected area corresponding to the binarized image; and eroding and superimposing the connected areas corresponding to the binarized images to obtain a target image including handwritten Chinese characters.
  • one or more non-volatile readable storage media storing computer-readable instructions are provided, and when the computer-readable instructions are executed by one or more processors, the one or more The processors perform the following steps: obtaining a Chinese image to be recognized, which includes handwritten Chinese characters and background pictures; preprocessing the Chinese image to be recognized to obtain the original image; processing the original image using a kernel density estimation algorithm to remove the background image To obtain a target image including handwritten Chinese characters; use text positioning technology to perform text positioning on the target image to obtain the text area to be recognized; input the text area to be recognized into the target handwriting recognition model for recognition, and obtain the correspondence of each text area to be recognized Handwritten Chinese characters; of which, the target handwriting recognition model is obtained using Chinese model training methods.
  • when the computer-readable instructions are executed by the one or more processors, the following steps are further implemented: performing statistics on the pixel values in the original image to obtain an original image histogram; processing the original image histogram by using a Gaussian kernel density estimation method to obtain at least one frequency maximum and at least one frequency minimum corresponding to the original image histogram; performing layered segmentation processing on the original image based on the frequency maximum and the frequency minimum to obtain layered images; and obtaining, based on the layered images, a target image including handwritten Chinese characters.
  • when the computer-readable instructions are executed by the one or more processors, the following steps are further implemented: binarizing the layered image to obtain a binarized image; detecting and labeling the pixels in the binarized image to obtain the connected area corresponding to the binarized image; and eroding and superimposing the connected areas corresponding to the binarized images to obtain a target image including handwritten Chinese characters.
  • Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory can include random access memory (RAM) or external cache memory.
  • RAM is available in various forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), dual data rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), synchronous chain (Synchlink) DRAM (SLDRAM), memory bus (Rambus) direct RAM (RDRAM), direct memory bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM).


Abstract

Disclosed are a Chinese model training method, a Chinese image recognition method, a device, an apparatus and a medium. The Chinese model training method comprises: obtaining training handwritten Chinese images (S11); dividing the training handwritten Chinese images into a training set and a test set according to a preset ratio (S12); labeling the training handwritten Chinese images in the training set in sequence, inputting the labeled training handwritten Chinese images into a convolutional neural network-long short-term memory neural network for training, and updating the network parameters of the convolutional neural network-long short-term memory neural network by using a connectionist temporal classification algorithm to obtain an original handwriting recognition model (S13); and testing the original handwriting recognition model by using the training handwritten Chinese images in the test set, and obtaining a target handwriting recognition model when the test accuracy is higher than a preset accuracy (S14). The Chinese model training method offers high training efficiency and high recognition accuracy.

Description

中文模型训练、中文图像识别方法、装置、设备及介质Chinese model training, Chinese image recognition method, device, equipment and medium
本专利申请以2018年6月4日提交的申请号为201810563508.0,名称为“中文模型训练、中文图像识别方法、装置、设备及介质”的中国发明专利申请为基础,并要求其优先权。This patent application is based on a Chinese invention patent application filed on June 4, 2018 with the application number 201810563508.0 and entitled "Chinese Model Training, Chinese Image Recognition Method, Device, Equipment, and Medium", and claims priority.
技术领域Technical field
本申请涉及图像识别领域,尤其涉及一种中文模型训练、中文图像识别方法、装置、设备及介质。The present application relates to the field of image recognition, and in particular, to a Chinese model training, a Chinese image recognition method, a device, a device, and a medium.
背景技术Background technique
随着信息时代的发展,人工智能技术作为核心技术越来越多的被用来解决人们生活中的具体问题。目前,在对手写汉字图像进行识别时,由于传统的卷积神经网络或者循环神经网络的输出是固定长度的,并不能满足端到端的手写字识别,需要预先对训练图片中的文字进行定位分割,获取单个字体图像,再对单个字体图像进行训练,训练效率低。With the development of the information age, artificial intelligence technology is increasingly used as a core technology to solve specific problems in people's lives. At present, when recognizing handwritten Chinese character images, because the output of traditional convolutional neural networks or recurrent neural networks is a fixed length, it cannot meet the end-to-end handwriting recognition, and the positioning and segmentation of the text in the training pictures are required in advance. , Obtain a single font image, and then train a single font image, the training efficiency is low.
发明内容Summary of the Invention
基于此,有必要针对上述技术问题,提供一种解决目前手写字识别模型的训练效率低的中文模型训练方法、装置、设备及介质。Based on this, it is necessary to provide a Chinese model training method, device, device, and medium that solve the current technical problems of handwriting recognition models with low training efficiency in response to the above technical problems.
一种中文模型训练方法,包括:A Chinese model training method includes:
获取训练手写中文图像;Obtain training handwritten Chinese images;
将所述训练手写中文图像按预设比例划分成训练集和测试集;Dividing the training handwritten Chinese image into a training set and a test set according to a preset ratio;
对所述训练集中的训练手写中文图像进行顺序标注,并将标注好的训练手写中文图像输入到卷积神经网络-长短时记忆神经网络中进行训练,采用时序分类算法对所述卷积神经网络-长短时记忆神经网络的网络参数进行更新,获取原始手写字识别模型;Sequentially label the trained handwritten Chinese images in the training set, and input the labeled trained handwritten Chinese images into a convolutional neural network-long and short-term memory neural network for training, and adopt a time-series classification algorithm to the convolutional neural network -The network parameters of the long-term and short-term memory neural network are updated to obtain the original handwriting recognition model;
采用所述测试集中的训练手写中文图像对所述原始手写字识别模型进行测试,在测试准确率大于预设准确率时,获取目标手写字识别模型。The original handwriting recognition model is tested using the trained handwritten Chinese images in the test set, and the target handwriting recognition model is obtained when the test accuracy is greater than a preset accuracy rate.
一种中文模型训练装置,包括:A Chinese model training device includes:
训练手写中文图像获取模块,用于获取训练手写中文图像;Training handwritten Chinese image acquisition module for acquiring training handwritten Chinese images;
训练手写中文图像划分模块,用于将所述训练手写中文图像按预设比例划分成训练集和测试集;A training handwritten Chinese image division module, configured to divide the trained handwritten Chinese image into a training set and a test set according to a preset ratio;
原始手写字识别模型获取模块,用于对所述训练集中的训练手写中文图像进行顺序标注,并将标注好的训练手写中文图像输入到卷积神经网络-长短时记忆神经网络中进行训练,采用时序分类算法对所述卷积神经网络-长短时记忆神经网络的网络参数进行更新,获取原始手写字识别模型;The original handwriting recognition model acquisition module is used to sequentially label the trained handwritten Chinese images in the training set, and input the labeled trained handwritten Chinese images into a convolutional neural network-long-term short-term memory neural network for training. The time series classification algorithm updates the network parameters of the convolutional neural network-long-term and short-term memory neural network to obtain the original handwriting recognition model;
目标手写字识别模型获取模块,用于采用所述测试集中的训练手写中文图像对所述原始手写字识别模型进行测试,在测试准确率大于预设准确率时,获取目标手写字识别模型。A target handwriting recognition model acquisition module is used to test the original handwriting recognition model using the trained handwritten Chinese images in the test set, and obtain a target handwriting recognition model when the test accuracy rate is greater than a preset accuracy rate.
一种计算机设备,包括存储器、处理器以及存储在所述存储器中并可在所述处理器上运行的计算机程序,所述处理器执行所述计算机程序时实现如下步骤:A computer device includes a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, the following steps are implemented:
获取训练手写中文图像;Obtain training handwritten Chinese images;
将所述训练手写中文图像按预设比例划分成训练集和测试集;Dividing the training handwritten Chinese image into a training set and a test set according to a preset ratio;
对所述训练集中的训练手写中文图像进行顺序标注,并将标注好的训练手写中文图像输入到卷积神经网络-长短时记忆神经网络中进行训练,采用时序分类算法对所述卷积神经网络-长短时记忆神经网络的网络参数进行更新,获取原始手写字识别模型;Sequentially label the trained handwritten Chinese images in the training set, and input the labeled trained handwritten Chinese images into a convolutional neural network-long and short-term memory neural network for training, and adopt a time-series classification algorithm to the convolutional neural network -The network parameters of the long-term and short-term memory neural network are updated to obtain the original handwriting recognition model;
采用所述测试集中的训练手写中文图像对所述原始手写字识别模型进行测试,在测试准确率大于预设准确率时,获取目标手写字识别模型。The original handwriting recognition model is tested using the trained handwritten Chinese images in the test set, and the target handwriting recognition model is obtained when the test accuracy is greater than a preset accuracy rate.
一种非易失性存储介质,所述非易失性存储介质存储有计算机程序,所述计算机程序被处理器执 行时实现如下步骤:A non-volatile storage medium stores a computer program. When the computer program is executed by a processor, the following steps are implemented:
获取训练手写中文图像;Obtain training handwritten Chinese images;
将所述训练手写中文图像按预设比例划分成训练集和测试集;Dividing the training handwritten Chinese image into a training set and a test set according to a preset ratio;
对所述训练集中的训练手写中文图像进行顺序标注,并将标注好的训练手写中文图像输入到卷积神经网络-长短时记忆神经网络中进行训练,采用时序分类算法对所述卷积神经网络-长短时记忆神经网络的网络参数进行更新,获取原始手写字识别模型;Sequentially label the trained handwritten Chinese images in the training set, and input the labeled trained handwritten Chinese images into a convolutional neural network-long and short-term memory neural network for training, and adopt a time-series classification algorithm to the convolutional neural network -The network parameters of the long-term and short-term memory neural network are updated to obtain the original handwriting recognition model;
采用所述测试集中的训练手写中文图像对所述原始手写字识别模型进行测试,在测试准确率大于预设准确率时,获取目标手写字识别模型。The original handwriting recognition model is tested using the trained handwritten Chinese images in the test set, and the target handwriting recognition model is obtained when the test accuracy is greater than a preset accuracy rate.
基于此,有必要针对上述技术问题,提供一种解决目前手写字识别不能端到端输出的中文图像识别方法、装置、设备及介质。Based on this, it is necessary to provide a method, a device, a device and a medium for recognizing Chinese images that cannot be output end-to-end by handwriting recognition in view of the above technical problems.
一种中文图像识别方法,包括:A Chinese image recognition method includes:
获取待识别中文图像,所述待识别中文图像包括手写汉字和背景图片;Obtaining a Chinese image to be identified, where the Chinese image to be identified includes handwritten Chinese characters and background pictures;
对所述待识别中文图像进行预处理,获取原始图像;Preprocessing the Chinese image to be identified to obtain an original image;
采用核密度估计算法对所述原始图像进行处理,去除所述背景图片,获取包括所述手写汉字的目标图像;Processing the original image using a kernel density estimation algorithm, removing the background picture, and obtaining a target image including the handwritten Chinese character;
采用文字定位技术对所述目标图像进行文字定位,获取待识别文字区域;Text positioning the target image using text positioning technology to obtain the text area to be recognized;
将待识别文字区域输入到目标手写字识别模型中进行识别,获取每一所述待识别文字区域对应的手写汉字;其中,目标手写字识别模型是采用所述中文模型训练方法获取的。The text area to be recognized is input into a target handwriting recognition model for recognition, and handwritten Chinese characters corresponding to each of the text area to be recognized are obtained; wherein the target handwriting recognition model is obtained by using the Chinese model training method.
一种中文图像识别装置,包括:A Chinese image recognition device includes:
待识别中文图像获取模块,用于获取待识别中文图像,所述待识别中文图像包括手写汉字和背景图片;A to-be-recognized Chinese image acquisition module, configured to obtain the to-be-recognized Chinese image, wherein the to-be-recognized Chinese image includes handwritten Chinese characters and a background picture;
原始图像获取模块,用于对所述待识别中文图像进行预处理,获取原始图像;An original image acquisition module, configured to pre-process the Chinese image to be identified to obtain an original image;
目标图像获取模块,用于采用核密度估计算法对所述原始图像进行处理,去除背景图片,获取包括所述手写汉字的目标图像;A target image acquisition module, configured to process the original image by using a kernel density estimation algorithm, remove a background picture, and obtain a target image including the handwritten Chinese character;
待识别文字区域获取模块,用于采用文字定位技术对所述目标图像进行文字定位,获取待识别文字区域;A to-be-recognized text area acquisition module, configured to use the text positioning technology to perform text positioning on the target image to obtain the to-be-recognized text area;
手写汉字获取模块,用于将待识别文字区域输入到目标手写字识别模型中进行识别,获取每一所述待识别文字区域对应的手写汉字;其中,目标手写字识别模型是采用所述中文模型训练方法获取的。A handwritten Chinese character acquisition module is configured to input a text area to be recognized into a target handwriting recognition model for recognition, and obtain a handwritten Chinese character corresponding to each of the text area to be recognized; wherein the target handwriting recognition model adopts the Chinese model Obtained by the training method.
一种计算机设备,包括存储器、处理器以及存储在所述存储器中并可在所述处理器上运行的计算机程序,所述处理器执行所述计算机程序时实现如下步骤:A computer device includes a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, the following steps are implemented:
获取待识别中文图像,所述待识别中文图像包括手写汉字和背景图片;Obtaining a Chinese image to be identified, where the Chinese image to be identified includes handwritten Chinese characters and background pictures;
对所述待识别中文图像进行预处理,获取原始图像;Preprocessing the Chinese image to be identified to obtain an original image;
采用核密度估计算法对所述原始图像进行处理,去除所述背景图片,获取包括所述手写汉字的目标图像;Processing the original image using a kernel density estimation algorithm, removing the background picture, and obtaining a target image including the handwritten Chinese character;
采用文字定位技术对所述目标图像进行文字定位,获取待识别文字区域;Text positioning the target image using text positioning technology to obtain the text area to be recognized;
将待识别文字区域输入到目标手写字识别模型中进行识别,获取每一所述待识别文字区域对应的手写汉字;其中,目标手写字识别模型是采用所述中文模型训练方法获取的。The text area to be recognized is input into a target handwriting recognition model for recognition, and handwritten Chinese characters corresponding to each of the text area to be recognized are obtained; wherein the target handwriting recognition model is obtained by using the Chinese model training method.
一个或多个存储有计算机可读指令的非易失性可读存储介质,所述计算机可读指令被一个或多个处理器执行时,使得所述一个或多个处理器执行如下步骤:One or more non-volatile readable storage media storing computer-readable instructions, which when executed by one or more processors, cause the one or more processors to perform the following steps:
获取待识别中文图像,所述待识别中文图像包括手写汉字和背景图片;Obtaining a Chinese image to be identified, where the Chinese image to be identified includes handwritten Chinese characters and background pictures;
对所述待识别中文图像进行预处理,获取原始图像;Preprocessing the Chinese image to be identified to obtain an original image;
采用核密度估计算法对所述原始图像进行处理,去除所述背景图片,获取包括所述手写汉字的目标图像;Processing the original image using a kernel density estimation algorithm, removing the background picture, and obtaining a target image including the handwritten Chinese character;
采用文字定位技术对所述目标图像进行文字定位,获取待识别文字区域;Text positioning the target image using text positioning technology to obtain the text area to be recognized;
将待识别文字区域输入到目标手写字识别模型中进行识别,获取每一所述待识别文字区域对应的 手写汉字;其中,目标手写字识别模型是采用所述中文模型训练方法获取的。The text area to be recognized is input to a target handwriting recognition model for recognition, and handwritten Chinese characters corresponding to each of the text area to be recognized are obtained; wherein the target handwriting recognition model is obtained by using the Chinese model training method.
本申请的一个或多个实施例的细节在下面的附图及描述中提出。本申请的其他特征和优点将从说明书、附图以及权利要求书变得明显。Details of one or more embodiments of the present application are set forth in the accompanying drawings and description below. Other features and advantages of the application will become apparent from the description, the drawings, and the claims.
附图说明BRIEF DESCRIPTION OF THE DRAWINGS
为了更清楚地说明本申请实施例的技术方案,下面将对本申请实施例的描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动性的前提下,还可以根据这些附图获得其他的附图。In order to explain the technical solutions of the embodiments of the present application more clearly, the drawings used in the description of the embodiments of the application will be briefly introduced below. Obviously, the drawings in the following description are just some embodiments of the application. For those of ordinary skill in the art, other drawings can be obtained based on these drawings without paying creative labor.
图1是本申请一实施例中中文模型训练方法或中文图像识别方法的一应用场景图;FIG. 1 is an application scenario diagram of a Chinese model training method or a Chinese image recognition method in an embodiment of the present application;
图2是本申请一实施例中中文模型训练方法的一流程图;2 is a flowchart of a Chinese model training method according to an embodiment of the present application;
图3是图2中步骤S13的一具体流程图;FIG. 3 is a specific flowchart of step S13 in FIG. 2;
图4是本申请一实施例中中文模型训练装置的一示意图;4 is a schematic diagram of a Chinese model training device according to an embodiment of the present application;
图5是本申请一实施例中中文图像识别方法的一流程图;5 is a flowchart of a Chinese image recognition method according to an embodiment of the present application;
图6是图5中步骤S22的一具体流程图;6 is a specific flowchart of step S22 in FIG. 5;
图7是图5中步骤S23的一具体流程图;FIG. 7 is a specific flowchart of step S23 in FIG. 5;
图8是图7中步骤S234的一具体流程图;8 is a specific flowchart of step S234 in FIG. 7;
图9是本申请一实施例中中文图像识别装置的一示意图;9 is a schematic diagram of a Chinese image recognition device according to an embodiment of the present application;
图10是本申请一实施例中计算机设备的一示意图。FIG. 10 is a schematic diagram of a computer device according to an embodiment of the present application.
具体实施方式Detailed ways
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有作出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。In the following, the technical solutions in the embodiments of the present application will be clearly and completely described with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are part of the embodiments of the present application, but not all of the embodiments. Based on the embodiments in this application, all other embodiments obtained by a person of ordinary skill in the art without creative efforts shall fall within the protection scope of this application.
本申请实施例提供的中文模型训练方法,可应用在如图1的应用环境中。该中文模型训练方法的应用环境包括服务器和计算机设备,其中,计算机设备通过网络与服务器进行通信,计算机设备是可与用户进行人机交互的设备,包括但不限于电脑、智能手机和平板等设备。本申请实施例提供的中文模型训练方法应用于服务器。The Chinese model training method provided in the embodiment of the present application can be applied in the application environment shown in FIG. 1. The application environment of the Chinese model training method includes a server and a computer device, wherein the computer device communicates with the server through a network, and the computer device is a device that can interact with the user, including, but not limited to, a computer, a smart phone, and a tablet device. . The Chinese model training method provided in the embodiment of the present application is applied to a server.
在一实施例中,如图2所示,提供一种中文模型训练方法,以该方法应用在图1中的服务器为例进行说明,包括如下步骤:In an embodiment, as shown in FIG. 2, a Chinese model training method is provided. The method is applied to the server in FIG. 1 as an example, and includes the following steps:
S11:获取训练手写中文图像。S11: Obtain training handwritten Chinese images.
其中,训练手写中文图像是预先从开源库中采集的用于进行模型训练的样本图像。该训练手写中文图像包括中文二级字库中每一中文对应的N(N为正整数)张手写字样本。中文二级字库是按汉字的部首笔划顺序编码的非常用汉字库。具体地,采集开源库中的不同人手写的N张手写字样本,以使服务器获取训练手写中文图像,由于不同用户的书写习惯不同,因此采用N张手写字样本(即训练手写中文图像)进行训练,极大的提高了模型的泛化性。The training handwritten Chinese image is a sample image collected from an open source library for model training in advance. The training handwritten Chinese image includes N (N is a positive integer) handwriting samples corresponding to each Chinese in the Chinese secondary word library. The Chinese secondary character library is a very useful Chinese character library that is coded in the order of radical strokes of Chinese characters. Specifically, N handwriting samples handwritten by different people in the open source library are collected to enable the server to obtain training handwritten Chinese images. Because different users have different writing habits, N handwriting samples (that is, training handwritten Chinese images) are used for Training greatly improves the generalization of the model.
S12:将训练手写中文图像按预设比例划分成训练集和测试集。S12: Divide the training handwritten Chinese image into a training set and a test set according to a preset ratio.
其中,训练集(training set)是学习样本数据集,是通过匹配一些参数来建立分类器,即采用训练集中的目标训练文本数据来训练机器学习模型,以确定机器学习模型的参数。测试集(test set)是用于测试训练好的机器学习模型的分辨能力,如准确率。预设比例是预先设置的用于对训练手写中文图像进行划分的比例。本实施例中,可按照9:1的比例对训练手写中文图像进行划分,即可将90%的训练手写中文图像作为训练集,剩余10%的训练手写中文图像作为测试集。Among them, the training set is a learning sample data set, which is to establish a classifier by matching some parameters, that is, training the machine learning model using the target training text data in the training set to determine the parameters of the machine learning model. A test set is used to test the discrimination capabilities of a trained machine learning model, such as accuracy. The preset ratio is a preset ratio for dividing the training handwritten Chinese image. In this embodiment, the training handwritten Chinese image can be divided according to a ratio of 9: 1, that is, 90% of the training handwritten Chinese image can be used as the training set, and the remaining 10% of the training handwritten Chinese image can be used as the test set.
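A minimal sketch of this 9:1 split, assuming the training handwritten Chinese images are available as a list of samples; the function name and the fixed random seed are illustrative assumptions.

    import random

    def split_dataset(samples, train_ratio=0.9, seed=42):
        shuffled = samples[:]
        random.Random(seed).shuffle(shuffled)            # shuffle before splitting
        cut = int(len(shuffled) * train_ratio)
        return shuffled[:cut], shuffled[cut:]            # training set, test set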
S13:对训练集中的训练手写中文图像进行顺序标注,并将标注好的训练手写中文图像输入到卷积神经网络-长短时记忆神经网络中进行训练,采用时序分类算法对卷积神经网络-长短时记忆神经网络的网络参数进行更新,获取原始手写字识别模型。S13: Annotate the training handwritten Chinese images in the training set in sequence, and input the labeled trained handwritten Chinese images into the convolutional neural network-long-term and short-term memory neural network for training, and use a time-series classification algorithm for the convolutional neural network-length The network parameters of the memory neural network are updated to obtain the original handwriting recognition model.
其中,原始手写字识别模型是经过长短时记忆神经网络多次迭代所得到的模型。长短时记忆神经(long-short term memory,简称LSTM)网络是一种时间递归神经网络,适合于处理和预测具有时间 序列,且时间序列间隔和延迟相对较长的重要事件。卷积神经网络(Convolutional Neural Network,CNN))是局部连接网络,相对于全连接网络其最大的特点就是局部连接性和权值共享性。对于一副图像中的某个像素p来说,离像素p越近的像素对其影响也就越大,即局部连接性越大。另外,根据自然图像的统计特性,某个区域的权值也可以用于另一个区域,即权值共享性。权值共享可以理解为卷积核共享,在卷积神经网络(CNN)中,将一个卷积核对给定的图像做卷积运算就可以提取一种中文图像特征,不同的卷积核可以提取不同的中文图像特征。由于卷积神经网络的局部连接性,使得模型的复杂度降低,提高模型训练的效率;并且,由于卷积神经网络的权值共享性,因此卷积神经网络可以并行学习,进一步提高模型训练效率。时序分类算法(Connectionist temporal classification,简称CTC),用于解决输入特征和输出标签之间对齐关系不确定的时间序列问题,是一种可以端到端同时优化模型参数和对齐切分的边界的算法。Among them, the original handwriting recognition model is a model obtained through multiple iterations of long-term and short-term memory neural networks. Long-short-term memory neural (LSTM) network is a kind of time recursive neural network, which is suitable for processing and predicting important events with time series, and the time series interval and delay are relatively long. Convolutional neural network (CNN) is a locally connected network. Compared with a fully connected network, its biggest feature is local connectivity and weight sharing. For a certain pixel p in an image, the closer the pixel p to the pixel p is, the more influence it has, that is, the greater the local connectivity. In addition, according to the statistical characteristics of natural images, the weight of a certain area can also be used for another area, that is, the weight sharing. Weight sharing can be understood as convolution kernel sharing. In a convolutional neural network (CNN), a convolution operation can be performed on a given image to extract a Chinese image feature. Different convolution kernels can be extracted. Different Chinese image features. Due to the local connectivity of the convolutional neural network, the complexity of the model is reduced, and the efficiency of model training is improved; and because the weights of the convolutional neural network are shared, the convolutional neural network can learn in parallel, further improving the efficiency of model training . Temporal classification algorithm (Connectionist Temporal Classification) (CTC) is used to solve the time series problem of uncertain alignment relationship between input features and output labels. It is an algorithm that can simultaneously optimize the model parameters and the boundary of the alignment and segmentation from end to end. .
具体地,服务器按照训练手写中文图像的时间顺序进行标注,并将标注好的训练手写中文图像输入到卷积神经网络-长短时记忆神经网络中进行训练,获取原始手写字识别模型。可以理解地,每个训练手写中文图像都是按顺序排列的,例如训练手写中文图像为“今天很开心”,则可按照从左到右以阿拉伯数字对每个训练手写中文图像进行标注,即“今(1)天(2)很(3)开(4)心(5)”,以使训练手写中文图像具备时序性,使得原始手写字识别模型能够联系上下文进行训练,提高模型的准确率。其中,(1)、(2)、(3)、(4)和(5)为顺序标签。Specifically, the server performs labeling according to the chronological order of the training handwritten Chinese images, and inputs the labeled training handwritten Chinese images into a convolutional neural network-long-term short-term memory neural network for training to obtain the original handwriting recognition model. Understandably, each training handwritten Chinese image is arranged in order. For example, the training handwritten Chinese image is "I am very happy today", then each training handwritten Chinese image can be labeled with Arabic numerals from left to right, that is, "Today (1) days (2) very (3) open (4) heart (5)", so that the training handwritten Chinese image has timeliness, so that the original handwriting recognition model can be trained in connection with the context and improve the accuracy of the model . Among them, (1), (2), (3), (4), and (5) are sequential tags.
长短时记忆神经网络具有输入层、隐藏层和输出层这三层网络结构。其中,输入层是长短时记忆神经网络的第一层,用于接收外界信号,即负责接收训练手写中文图像。输出层是长短时记忆神经网络的最后一层,用于向外界输出信号,即负责输出长短时记忆神经网络的计算结果。隐藏层是长短时记忆神经网络中除输入层和输出层之外的各层,用于对卷积神经网络提取的中文图像特征进行处理,获取长短时记忆神经网络的计算结果。可以理解地,采用长短时记忆神经网络进行模型训练增加了训练手写中文图像的时序性,以便根据上下文对训练手写中文图像进行训练,从而提高了目标手写字识别模型的准确率。Long-term short-term memory neural network has three layers of network structure: input layer, hidden layer and output layer. The input layer is the first layer of the long-term and short-term memory neural network, which is used to receive external signals, that is, it is responsible for receiving training handwritten Chinese images. The output layer is the last layer of the long-term and short-term memory neural network, which is used to output signals to the outside world, that is, it is responsible for outputting the calculation results of the long-term and short-term memory neural network. Hidden layers are layers other than the input layer and the output layer of the long-term and short-term memory neural network. They are used to process the Chinese image features extracted by the convolutional neural network to obtain the calculation results of the long-term and short-term memory neural network. Understandably, using long-short-term memory neural network for model training increases the timeliness of training handwritten Chinese images, so as to train the training handwritten Chinese images according to the context, thereby improving the accuracy of the target handwriting recognition model.
在一实施例中,如图3所示,步骤S13中,即对训练集中的训练手写中文图像进行顺序标注,并将标注好的训练手写中文图像输入到卷积神经网络-长短时记忆神经网络中进行训练,采用时序分类算法对卷积神经网络-长短时记忆神经网络的网络参数进行更新,获取原始手写字识别模型,具体包括如下步骤:In an embodiment, as shown in FIG. 3, in step S13, the training handwritten Chinese images in the training set are sequentially labeled, and the labeled training handwritten Chinese images are input to a convolutional neural network-long-term and short-term memory neural network. The training is performed in time series, and the time series classification algorithm is used to update the network parameters of the convolutional neural network-long and short-term memory neural network to obtain the original handwriting recognition model. The specific steps include the following steps:
S131:在卷积神经网络中对训练手写中文图像进行特征提取,获取中文图像特征。S131: Perform feature extraction on the trained handwritten Chinese image in a convolutional neural network to obtain Chinese image features.
中文图像特征是采用卷积神经网络对训练手写中文图像进行特征提取所获取到的训练手写中文图像对应的图像特征。卷积神经网络模型包括卷积层和池化层。将训练手写中文图像输入卷积神经网络模型中进行训练,通过每一层卷积层的计算,获取每一层的卷积层的输出,卷积层的输出可以通过公式a m l=σ(z m l)=σ(a m l-1*W l+b l)计算,其中,a m l表示第l层卷积层的第m个顺序标签的输出,即中文图像特征,z m l表示未采用激活函数处理前的第m个顺序标签的输出,a m l-1表示l-1层的第m个顺序标签输出(即第m个顺序标签所对应的训练手写中文图像的中文图像特征),σ表示激活函数,对于卷积层采用的激活函数σ为ReLu(Rectified Linear Unit,线性整流函数),相比其他激活函数的效果会更好),*表示卷积运算,W l表示第l层的卷积核(权值),b l表示第l层的偏置。若第l层是池化层,则在池化层采用最大池化的下样采样对卷积层的输出进行降维处理,具体降维公式为a m l=pool(a m l-1),其中,pool是指下采样计算,该下采样计算可以选择最大池化的方法,最大池化实际上就是在m*m的样本中取最大值。可以理解地,该中文图像特征携带有顺序标签,该中文图像特征的顺序标签与该中文图像特征对应的训练手写中文图像的顺序标签一致。 Chinese image features are image features corresponding to the training handwritten Chinese image obtained by extracting the features of the training handwritten Chinese image using a convolutional neural network. The convolutional neural network model includes a convolutional layer and a pooling layer. The trained handwritten Chinese image is input into the convolutional neural network model for training. The output of the convolutional layer of each layer is obtained through the calculation of the convolutional layer of each layer. The output of the convolutional layer can be calculated by the formula a m l = σ ( z m l ) = σ (a m l-1 * W l + b l ) calculation, where a m l represents the output of the m-th sequential label of the l-th convolution layer, that is, the Chinese image feature, z m l Represents the output of the m-th sequential label before the activation function is processed, and a m l-1 indicates the output of the m-th sequential label of the layer 1-1 (that is, the Chinese image of the training handwritten Chinese image corresponding to the m-th sequential label (Characteristics), σ represents the activation function, and the activation function σ used for the convolution layer is ReLu (Rectified Linear Unit, linear rectification function), which has a better effect than other activation functions), * represents the convolution operation, and W l represents The convolution kernel (weight) of the first layer, and b l represents the offset of the first layer. If the first layer is a pooling layer, the maximum pooling downsampling is used to reduce the output of the convolution layer in the pooling layer. The specific dimension reduction formula is a m l = pool (a m l-1 ) Among them, pool refers to the downsampling calculation. The downsampling calculation can choose the maximum pooling method. The maximum pooling is actually taking the maximum value in the m * m sample. Understandably, the Chinese image feature carries an order label, and the order label of the Chinese image feature is consistent with the order label of the training handwritten Chinese image corresponding to the Chinese image feature.
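The per-layer computation quoted above, a^l = ReLU(a^(l-1) * W^l + b^l) followed by max pooling, can be sketched as follows, assuming SciPy; the kernel, bias and pooling size are illustrative assumptions rather than the network configuration of this embodiment.

    import numpy as np
    from scipy.signal import correlate2d

    def conv_relu(feature_map, kernel, bias):
        z = correlate2d(feature_map, kernel, mode="same") + bias   # z^l
        return np.maximum(z, 0.0)                                  # ReLU activation

    def max_pool(feature_map, m=2):
        h, w = feature_map.shape
        h, w = h - h % m, w - w % m                                # crop to a multiple of m
        blocks = feature_map[:h, :w].reshape(h // m, m, w // m, m)
        return blocks.max(axis=(1, 3))                             # m x m max pooling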
S132:在长短时记忆神经网络的隐藏层采用第一激活函数对中文图像特征进行处理,获取携带激 活状态标识的神经元。S132: In the hidden layer of the long-term and short-term memory neural network, the first activation function is used to process the features of the Chinese image to obtain the neurons carrying the identification of the activation state.
其中,长短时记忆神经网络的隐藏层中的每个神经元包括三个门,其分别为输入门、遗忘门和输出门。遗忘门决定了在神经元中所要丢弃的过去的信息。输入门决定了在神经元中所要增加的信息。输出门决定了在神经元中所要输出的信息。第一激活函数是用于激活神经元状态的函数。神经元状态决定了各个门(即输入门、遗忘门和输出门)的丢弃、增加和输出的信息。激活状态标识包括通过标识和不通过标识。本实施例中的输入门、遗忘门和输出门对应的标识分别为i、f和o。Among them, each neuron in the hidden layer of the long-term and short-term memory neural network includes three gates, which are an input gate, a forgetting gate, and an output gate, respectively. The forget gate determines the past information to be discarded in the neuron. The input gate determines the information to be added to the neuron. The output gate determines the information to be output in the neuron. The first activation function is a function for activating a neuron state. The state of the neuron determines the information discarded, added, and output by each gate (ie, input gate, forget gate, and output gate). The activation status flag includes a pass flag and a fail flag. The identifiers corresponding to the input gate, the forget gate, and the output gate in this embodiment are i, f, and o, respectively.
In this embodiment, the Sigmoid (S-shaped growth curve) function is selected as the first activation function. The Sigmoid function is an S-shaped function that is common in biology; in information science, because it is monotonically increasing and its inverse function is also monotonically increasing, the Sigmoid function is often used as the threshold function of a neural network to map a variable into the interval 0-1. The activation function is calculated as σ(z) = 1/(1 + e^(-z)), where z represents the output value of the forget gate.
具体地,遗忘门中包括遗忘门限,通过计算每一神经元(中文图像特征)的激活状态,以获取携带激活状态标识为通过标识的神经元。其中,采用遗忘门的计算公式f t=σ(W f·[h t-1,x t]+b f)计算遗忘门哪些信息被接收(即只接收携带激活状态标识为通过标识的神经元),f t表示遗忘门限(即激活状态),W f表示遗忘门的权重矩阵,b f表示遗忘门的权值偏置项,h t-1表示上一时刻神经元的输出,x t表示t时刻的输入数据(即中文图像特征),t表示当前时刻,t-1表示上一时刻。遗忘门中还包括遗忘门限,通过遗忘门的计算公式对中文图像特征进行计算会得到一个0-1区间的标量,此标量决定了神经元根据当前状态和过去状态的综合判断所接收过去信息的比例,以达到数据的降维,减少计算量,提高训练效率。 Specifically, the forgetting gate includes a forgetting threshold. By calculating an activation state of each neuron (Chinese image feature), a neuron carrying an activation state identifier as a pass identifier is obtained. Among them, the calculation formula of the forgetting gate is f t = σ (W f · [h t-1 , x t ] + b f ) to calculate which information of the forgetting gate is received (that is, only the neurons carrying the activation status flag as the pass flag are received). ), F t represents the forgetting threshold (that is, the activation state), W f represents the weight matrix of the forgetting gate, b f represents the weight bias term of the forgetting gate, h t-1 represents the output of the neuron at the previous moment, and x t represents Input data (ie Chinese image features) at time t, where t is the current time and t-1 is the previous time. The forgetting gate also includes the forgetting threshold. Calculating the Chinese image features through the calculating formula of the forgetting gate will obtain a 0-1 interval scalar. This scalar determines the past information received by the neuron based on the comprehensive judgment of the current state and the past state. Proportion to achieve data reduction, reduce the amount of calculation, and improve training efficiency.
S133:在长短时记忆神经网络的隐藏层采用第二激活函数对携带激活状态标识的神经元进行处理,获取长短时记忆神经网络输出层的输出。S133: In the hidden layer of the long-term and short-term memory neural network, a second activation function is used to process the neuron carrying the identification of the activation state to obtain the output of the long-term and short-term memory neural network output layer.
具体地,在长短时记忆神经网络的隐藏层中的输入门中,采用第二激活函数携带激活状态标识为通过标识的神经元进行计算,获取隐藏层的输出。本实施例中,由于线性模型的表达能力不够,因此采用tanh(双曲正切)函数作为输入门的激活函数(即第二激活函数),可加入非线性因素使得训练出的目标手写字识别模型能够解决更复杂的问题。并且,激活函数tanh(双曲正切)具有收敛速度快的优点,可以节省训练时间,提高训练效率。Specifically, in the input gate in the hidden layer of the long-term and short-term memory neural network, the second activation function is used to carry the activation state identifier to perform calculation through the identified neurons to obtain the output of the hidden layer. In this embodiment, because the expressive ability of the linear model is insufficient, a tanh (hyperbolic tangent) function is used as the activation function of the input gate (ie, the second activation function). Non-linear factors can be added to make the trained target handwriting recognition model Able to solve more complex problems. In addition, the activation function tanh (hyperbolic tangent) has the advantage of fast convergence speed, which can save training time and improve training efficiency.
具体地,通过输入门的计算公式计算输入门的输出。其中,输入门中还包括输入门限,输入门的计算公式为i t=σ(W i·[h t-1,x t]+b i),W i为输入门的权值矩阵,i t表示输入门限,b i表示输入门的偏置项,通过输入门的计算公式对中文图像特征进行计算会得到一个0-1区间的标量(即输入门限),此标量控制了神经元根据当前状态和过去状态的综合判断所接收当前信息的比例,即接收新输入的信息的比例,以减少计算量,提高训练效率。 Specifically, the output of the input gate is calculated by a calculation formula of the input gate. Wherein the input gate further includes a calculation formula input threshold, the input gate is i t = σ (W i · [h t-1, x t] + b i), W i is the weight of input gates value matrix, i t Represents the input threshold, b i represents the bias term of the input gate, and calculating the Chinese image features through the calculation formula of the input gate will obtain a 0-1 interval scalar (that is, the input threshold). This scalar controls the neuron according to the current state Comprehensively judge with the past state the proportion of the current information received, that is, the proportion of the newly input information, to reduce the amount of calculation and improve the training efficiency.
Then, the current neuron state is calculated using the candidate-state formula C̃_t = tanh(W_c·[h_(t-1), x_t] + b_c) and the state-update formula C_t = f_t⊙C_(t-1) + i_t⊙C̃_t, where W_i is the weight matrix of the input gate, W_c is the weight matrix for computing the unit state, i_t is the input threshold, b_i is the bias term of the input gate, b_c is the bias term of the unit state, C_(t-1) is the neuron state at the previous moment, and C_t is the neuron state at time t. By performing a dot-product operation between the neuron state and the forgetting threshold (input threshold), the model outputs only the required information, which improves the efficiency of model learning.
其中,长短时记忆神经网络隐藏层的前向输出是指在长短时记忆神经网络隐藏层按照时间顺序输出的第u个顺序标签对应的中文图像特征的概率。后向输出是指在长短时记忆神经网络隐藏层按照时间逆顺序输出的第u个顺序标签对应的中文图像特征的概率。如“我今天心情很好”假设第u个顺序标签对应的中文图像特征为“天”,t-1时刻长短时记忆神经网络隐藏层的输出为“今”,根据t-1时刻长短时记忆神经网络隐藏层的输出“今”和t时刻的长短时记忆神经网络输入层的输入“天”计算t时刻长短时记忆神经网络隐藏层的输出,该t时刻的输出可能包括“天、大和木”,则长短时记忆神经网络隐藏层的前向输出指t时刻长短时记忆神经网络隐藏层的输出为“天”概率。假设t+1时刻长短时记忆神经网络隐藏层的输出为“心”,根据t+1时刻长短时记忆神经网络隐藏层的输出“心”和t时刻的长短时记忆神经网络输入层的输入“天”计算t时刻长短时记忆神经网络隐藏层的的输出,该t时刻长短时记忆神经网络隐藏层的的输出可能包括“天、大和木”,则长短时记忆神经网络隐藏层的后向输出指t时刻输出为“天”概率。The forward output of the hidden layer of the long-term and short-term memory neural network refers to the probability of the Chinese image features corresponding to the u-th order labels output by the hidden layer of the long-term and short-term memory neural network in time sequence. Backward output refers to the probability of Chinese image features corresponding to the u-th order label output by the hidden layer of the memory neural network in reverse order in time. For example, "I'm in a good mood today" assuming that the Chinese image corresponding to the u-th sequential label feature is "day", and the output of the hidden layer of the memory neural network at time t-1 is "today". The output of the hidden layer of the neural network is "now" and the length of the short-term memory at time t. The input of the neural network input layer is "day". The output of the hidden layer of the memory time at time t may be calculated. ", Then the forward output of the hidden layer of the long-term memory neural network refers to the probability that the output of the hidden layer of the long-term memory neural network at time t is" day ". Assume that the output of the hidden layer of the short-term memory neural network at time t + 1 is "heart", and according to the output of the hidden layer of the short-term memory neural network at time t + 1 and the input of the input layer of the short-term memory neural network at time t " "Day" calculates the output of the hidden layer of the memory neural network at time t, and the output of the hidden layer of the memory neural network at time t may include "day, Yamato and wood", then the backward output of the hidden layer of the long-term memory neural network Refers to the probability of "day" output at time t.
S134:根据长短时记忆神经网络输出层的输出,采用时序分类算法对卷积神经网络-长短时记忆神经网络的网络参数进行更新,获取目标手写字识别模型。S134: According to the output of the long-term and short-term memory neural network output layer, a time series classification algorithm is used to update the network parameters of the convolutional neural network and the long- and short-term memory neural network to obtain a target handwriting recognition model.
The network parameters of the CNN-LSTM neural network are the weights and biases. First, the forward output a(t,u) of the Chinese image feature corresponding to the u-th sequential label at time t in the LSTM hidden layer is computed with the forward-output formula of the hidden layer (given as an equation image in the original filing); in that formula, one term denotes the probability that the output at time t is a blank, a(t-1,i) denotes the forward output of the i-th Chinese image feature at time t-1, and l' denotes the number of sequential labels. Then the backward output b(t,u) of the Chinese image feature corresponding to the u-th sequential label at time t is computed with the backward-output formula of the hidden layer (also given as an equation image); in that formula, one term denotes the probability that the output at time (t+1) is a blank, and a(t+1,i) denotes the backward output of the Chinese image feature corresponding to the i-th sequential label at time t+1 in the LSTM hidden layer. A blank here is a blank character in the output of the LSTM output layer.
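The forward-output and backward-output formulas themselves appear only as equation images in the original filing. The quantities named around them (a per-frame blank probability, a(t-1,i), the label count l', and the product p(z|x) = a(t,u)b(t,u) used below) match the standard CTC forward-backward recursions, so the following NumPy sketch transcribes those standard recursions as an assumed reading of the filing; the function name, array shapes and blank index are illustrative.

import numpy as np

def ctc_forward_backward(y, labels, blank=0):
    # y: (T, C) per-frame probabilities from the LSTM output layer
    # labels: ground-truth label indices of the character sequence
    lp = [blank]                                   # extended label sequence l' with blanks
    for c in labels:
        lp += [c, blank]
    T, U = y.shape[0], len(lp)

    alpha = np.zeros((T, U))                       # a(t, u): forward output
    alpha[0, 0] = y[0, blank]
    if U > 1:
        alpha[0, 1] = y[0, lp[1]]
    for t in range(1, T):
        for u in range(U):
            s = alpha[t - 1, u]
            if u >= 1:
                s += alpha[t - 1, u - 1]
            if u >= 2 and lp[u] != blank and lp[u] != lp[u - 2]:
                s += alpha[t - 1, u - 2]
            alpha[t, u] = y[t, lp[u]] * s          # scaled by the frame probability of label l'_u

    beta = np.zeros((T, U))                        # b(t, u): backward output
    beta[T - 1, U - 1] = 1.0
    if U > 1:
        beta[T - 1, U - 2] = 1.0
    for t in range(T - 2, -1, -1):
        for u in range(U - 1, -1, -1):
            s = beta[t + 1, u] * y[t + 1, lp[u]]   # includes the blank probability at time t+1
            if u + 1 < U:
                s += beta[t + 1, u + 1] * y[t + 1, lp[u + 1]]
            if u + 2 < U and lp[u + 2] != blank and lp[u + 2] != lp[u]:
                s += beta[t + 1, u + 2] * y[t + 1, lp[u + 2]]
            beta[t, u] = s
    return alpha, beta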
Specifically, a loss function is constructed from the output of the LSTM output layer using the formula of the temporal classification algorithm: E_loss = -ln ∏_{(x,z)∈S} p(z|x), with p(z|x) = a(t,u)·b(t,u), where p(z|x) is the probability that the output of the LSTM output layer is z when the Chinese image feature x is input, a(t,u) is the forward output of the Chinese image feature corresponding to the u-th sequential label at time t in the LSTM hidden layer, and b(t,u) is the backward output of that Chinese image feature in the LSTM hidden layer. Finally, after E_loss is obtained, the network parameters of the LSTM neural network and the convolutional neural network are updated from the partial derivatives of E_loss, that is, ∂E_loss/∂θ, where θ denotes the network parameters, namely the weights and biases of the convolutional neural network and the LSTM neural network, and the original handwriting recognition model is obtained.
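As a concrete illustration of this training step, the sketch below assembles a small CNN-LSTM recognizer and updates its weights and biases θ with a CTC loss, which corresponds to the E_loss objective above up to averaging over the batch; automatic differentiation supplies the partial derivatives of E_loss. This is a minimal sketch assuming a PyTorch implementation; the layer sizes, the class count (3755 characters plus one blank) and the helper names are illustrative and not specified in the filing.

import torch
import torch.nn as nn

class CnnLstmRecognizer(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2))
        self.lstm = nn.LSTM(input_size=64 * 16, hidden_size=256,
                            bidirectional=True, batch_first=True)
        self.fc = nn.Linear(512, num_classes)      # num_classes includes the blank label

    def forward(self, x):                          # x: (N, 1, 32, W) grayscale line images
        f = self.cnn(x)                            # (N, 64, 16, W/2)
        f = f.permute(0, 3, 1, 2).flatten(2)       # (N, T, 1024) with T = W/2 time steps
        h, _ = self.lstm(f)
        return self.fc(h).log_softmax(dim=2)       # per-frame log-probabilities of the labels

model = CnnLstmRecognizer(num_classes=3756)        # 3755 characters + blank (assumed count)
ctc = nn.CTCLoss(blank=0)                          # negative log of prod p(z|x) over the batch
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_step(images, targets, target_lengths):
    # targets: (N, S) padded sequential labels of the training handwritten Chinese images
    log_probs = model(images).permute(1, 0, 2)     # CTCLoss expects (T, N, C)
    input_lengths = torch.full((images.size(0),), log_probs.size(0), dtype=torch.long)
    loss = ctc(log_probs, targets, input_lengths, target_lengths)
    optimizer.zero_grad()
    loss.backward()                                # partial derivatives of E_loss w.r.t. θ
    optimizer.step()                               # update the weights and biases
    return loss.item()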
S14: The original handwriting recognition model is tested with the training handwritten Chinese images in the test set, and the target handwriting recognition model is obtained when the test accuracy is greater than a preset accuracy.

Specifically, in step S14, all training handwritten Chinese images in the test set are input into the original handwriting recognition model for testing, and the test accuracy is obtained (that is, the number of correct predictions divided by the total number of training handwritten Chinese images in the test set). The test accuracy is then compared with the preset accuracy. If the test accuracy is greater than the preset accuracy, the original handwriting recognition model is considered sufficiently accurate and is taken as the target handwriting recognition model; otherwise, the predictions of the original handwriting recognition model are considered not accurate enough, steps S11-S13 are used again for further training, and the model is tested again until the test accuracy reaches the preset accuracy, after which training stops, further improving the accuracy of the target handwriting recognition model.

In this embodiment, the training handwritten Chinese images are first acquired and divided into a training set and a test set at a preset ratio, so that the training handwritten Chinese images in the training set can be labelled sequentially and thereby carry timing information. The labelled training handwritten Chinese images are input into the CNN-LSTM neural network for training so that, based on the temporal order of the images, the network can train on them with reference to their context; the temporal classification algorithm is used to update the network parameters of the CNN-LSTM network to obtain the original handwriting recognition model, which resolves the sequence problem that the alignment between input features and output labels is uncertain, achieves end-to-end output, and improves the generalization of the original handwriting recognition model. Finally, the original handwriting recognition model is tested with the training handwritten Chinese images in the test set, and the target handwriting recognition model is obtained when the test accuracy is greater than the preset accuracy, further improving the accuracy of the target handwriting recognition model.
In one embodiment, a Chinese model training device is provided, and the device corresponds one-to-one to the Chinese model training method in the embodiment above. As shown in FIG. 4, the Chinese model training device includes a training handwritten Chinese image acquisition module 11, a training handwritten Chinese image division module 12, an original handwriting recognition model acquisition module 13 and a target handwriting recognition model acquisition module 14. The functional modules are described in detail as follows:

The training handwritten Chinese image acquisition module 11 is configured to acquire training handwritten Chinese images.

The training handwritten Chinese image division module 12 is configured to divide the training handwritten Chinese images into a training set and a test set at a preset ratio.

The original handwriting recognition model acquisition module 13 is configured to label the training handwritten Chinese images in the training set sequentially, input the labelled training handwritten Chinese images into the CNN-LSTM neural network for training, and update the network parameters of the CNN-LSTM network with the temporal classification algorithm to obtain the original handwriting recognition model.

Specifically, the original handwriting recognition model acquisition module 13 includes a Chinese image feature acquisition unit 131, an activation-state neuron acquisition unit 132, an output layer output acquisition unit 133 and a target recognition model acquisition unit 134.

The Chinese image feature acquisition unit 131 is configured to perform feature extraction on the training handwritten Chinese images in the convolutional neural network to obtain Chinese image features.

The activation-state neuron acquisition unit 132 is configured to process the Chinese image features with a first activation function in the hidden layer of the LSTM neural network to obtain neurons carrying an activation-state identifier.

The output layer output acquisition unit 133 is configured to process the neurons carrying the activation-state identifier with a second activation function in the hidden layer of the LSTM neural network to obtain the output of the LSTM output layer.

The target recognition model acquisition unit 134 is configured to update the network parameters of the CNN-LSTM network with the temporal classification algorithm, based on the output of the LSTM output layer, to obtain the target handwriting recognition model.

The target handwriting recognition model acquisition module 14 is configured to test the original handwriting recognition model with the training handwritten Chinese images in the test set and to obtain the target handwriting recognition model when the test accuracy is greater than the preset accuracy.

Specifically, the formula of the temporal classification algorithm is E_loss = -ln ∏_{(x,z)∈S} p(z|x), with p(z|x) = a(t,u)·b(t,u), where p(z|x) is the probability that the output of the LSTM output layer is z when the Chinese image feature x is input, a(t,u) is the forward output of the Chinese image feature corresponding to the u-th sequential label at time t in the LSTM hidden layer, and b(t,u) is the backward output of that Chinese image feature in the LSTM hidden layer.
For the specific limitations of the Chinese model training device, reference may be made to the limitations of the Chinese model training method above, which are not repeated here. Each module in the Chinese model training device may be implemented in whole or in part by software, hardware or a combination thereof. The modules may be embedded in, or independent of, the processor of a computer device in hardware form, or stored in the memory of the computer device in software form, so that the processor can invoke and execute the operations corresponding to the modules.

In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in FIG. 10. The computer device includes a processor, a memory, a network interface and a database connected through a system bus. The processor provides computing and control capabilities. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program and a database. The internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The database is used to store data generated or acquired during execution of the Chinese model training method, such as the target handwriting recognition model. The network interface is used to communicate with external terminals through a network connection. When executed by the processor, the computer program implements a Chinese model training method.

In one embodiment, a computer device is provided, including a memory, a processor and a computer program stored in the memory and executable on the processor. When executing the computer program, the processor implements the following steps: acquiring training handwritten Chinese images; dividing the training handwritten Chinese images into a training set and a test set at a preset ratio; labelling the training handwritten Chinese images in the training set sequentially, inputting the labelled training handwritten Chinese images into the CNN-LSTM neural network for training, and updating the network parameters of the CNN-LSTM network with the temporal classification algorithm to obtain the original handwriting recognition model; and testing the original handwriting recognition model with the training handwritten Chinese images in the test set and obtaining the target handwriting recognition model when the test accuracy is greater than the preset accuracy.

In one embodiment, when executing the computer program, the processor further implements the following steps: performing feature extraction on the training handwritten Chinese images in the convolutional neural network to obtain Chinese image features; processing the Chinese image features with a first activation function in the hidden layer of the LSTM neural network to obtain neurons carrying an activation-state identifier; processing the neurons carrying the activation-state identifier with a second activation function in the hidden layer of the LSTM neural network to obtain the output of the LSTM output layer; and, based on the output of the LSTM output layer, updating the network parameters of the CNN-LSTM network with the temporal classification algorithm to obtain the target handwriting recognition model.

Specifically, the formula of the temporal classification algorithm is E_loss = -ln ∏_{(x,z)∈S} p(z|x), with p(z|x) = a(t,u)·b(t,u), where p(z|x) is the probability that the output of the LSTM output layer is z when the Chinese image feature x is input, a(t,u) is the forward output of the Chinese image feature corresponding to the u-th sequential label at time t in the LSTM hidden layer, and b(t,u) is the backward output of that Chinese image feature in the LSTM hidden layer.

In one embodiment, one or more non-volatile readable storage media storing computer-readable instructions are provided. When executed by one or more processors, the computer-readable instructions cause the one or more processors to perform the following steps: acquiring training handwritten Chinese images; dividing the training handwritten Chinese images into a training set and a test set at a preset ratio; labelling the training handwritten Chinese images in the training set sequentially, inputting the labelled training handwritten Chinese images into the CNN-LSTM neural network for training, and updating the network parameters of the CNN-LSTM network with the temporal classification algorithm to obtain the original handwriting recognition model; and testing the original handwriting recognition model with the training handwritten Chinese images in the test set and obtaining the target handwriting recognition model when the test accuracy is greater than the preset accuracy.

In one embodiment, when executed by the one or more processors, the computer-readable instructions further cause the one or more processors to perform the following steps: performing feature extraction on the training handwritten Chinese images in the convolutional neural network to obtain Chinese image features; processing the Chinese image features with a first activation function in the hidden layer of the LSTM neural network to obtain neurons carrying an activation-state identifier; processing the neurons carrying the activation-state identifier with a second activation function in the hidden layer of the LSTM neural network to obtain the output of the LSTM output layer; and, based on the output of the LSTM output layer, updating the network parameters of the CNN-LSTM network with the temporal classification algorithm to obtain the target handwriting recognition model.

Specifically, the formula of the temporal classification algorithm is E_loss = -ln ∏_{(x,z)∈S} p(z|x), with p(z|x) = a(t,u)·b(t,u), where p(z|x) is the probability that the output of the LSTM output layer is z when the Chinese image feature x is input, a(t,u) is the forward output of the Chinese image feature corresponding to the u-th sequential label at time t in the LSTM hidden layer, and b(t,u) is the backward output of that Chinese image feature in the LSTM hidden layer.
In one embodiment, as shown in FIG. 5, a Chinese image recognition method is provided. The method is described by taking its application to the server in FIG. 1 as an example, and includes the following steps:

S21: A Chinese image to be recognized is acquired, the Chinese image to be recognized including handwritten Chinese characters and a background picture.

The Chinese image to be recognized is an unprocessed image containing handwritten Chinese characters collected by an acquisition module on a computer device. It includes handwritten Chinese characters and a background picture. The background picture is the noise picture in the Chinese image to be recognized other than the handwritten Chinese characters, that is, a picture that interferes with the handwritten Chinese characters. In this embodiment, the user can collect a Chinese image to be recognized containing handwritten Chinese characters through the acquisition module on the computer device and upload it to the server, so that the server acquires the image. The acquisition module includes, but is not limited to, camera shooting and local upload.

S22: The Chinese image to be recognized is preprocessed to obtain an original image.

The original image is the image obtained by preprocessing the Chinese image to be recognized to exclude interference factors. Specifically, because the Chinese image to be recognized may contain various interference factors, such as a large variety of colours, it is not well suited to subsequent recognition. The image therefore needs to be preprocessed to obtain an original image with the interference factors excluded; the original image can be understood as the picture obtained after the background picture is removed from the Chinese image to be recognized.

In one embodiment, as shown in FIG. 6, step S22 of preprocessing the Chinese image to be recognized to obtain the original image specifically includes the following steps:
S221: The Chinese image to be recognized is magnified and converted to grayscale to obtain a grayscale image.

The grayscale image is the image obtained after the Chinese image to be recognized is magnified and grayscale processed. The grayscale image corresponds to a pixel-value matrix, that is, a matrix containing the pixel value of each pixel in the Chinese image to be recognized. In this embodiment, the server uses the imread function to read the pixel value of each pixel in the Chinese image to be recognized, and magnifies and grayscale-processes the image to obtain the grayscale image. The imread function is a function in a computer language for reading the pixel values of an image file. A pixel value is the value assigned by the computer when the original image is digitized.

Because the Chinese image to be recognized may contain many colours, and colour itself is easily affected by factors such as illumination, with large colour variation among objects of the same kind, colour alone can hardly provide key information. The Chinese image to be recognized is therefore grayscale processed to exclude interference and reduce the complexity of the image and the amount of information to be processed. However, when the handwritten Chinese characters in the image are small, converting to grayscale directly would make the strokes of the handwritten characters too thin, and they would be excluded as interference. To increase the stroke thickness, the Chinese image to be recognized is first magnified and then converted to grayscale, which avoids the problem that direct grayscale conversion makes the strokes so thin that they are discarded as interference.

Specifically, the server performs the magnification according to the formula x → x^r, where x is an element of the matrix M and r is the power, and the transformed element x^r replaces x in the pixel-value matrix M.

Grayscale processing renders the Chinese image to be recognized with a clear black-and-white effect. Specifically, grayscale processing of the magnified image works as follows: the colour of each pixel in the Chinese image to be recognized is determined by its three components R (red), G (green) and B (blue), and each component can take one of 256 values from 0 to 255 (0 is darkest, representing black; 255 is brightest, representing white). A grayscale image is a special colour image in which the three components R, G and B are equal. In this embodiment, the server can directly read the Chinese image to be recognized with the imread function and obtain the specific values of the R, G and B components of each pixel of the grayscale image.
S222: The grayscale image is standardized to obtain the original image.

Standardization here means applying a standard transformation to the grayscale image so that it is converted into a fixed standard form. Specifically, because the pixel values of the pixels in the grayscale image are rather scattered, the orders of magnitude of the data are not uniform, which would affect the accuracy of subsequent model recognition; the grayscale image therefore needs to be standardized to unify the order of magnitude of the data.

Specifically, the server standardizes the grayscale image with the standardization formula, which avoids the problem that the scattered pixel values of the grayscale image leave the data on different orders of magnitude. The standardization formula is X' = (X - M_min) / (M_max - M_min), where X is a pixel value of the grayscale image M, X' is the corresponding pixel value of the original image, M_min is the smallest pixel value in the grayscale image M, and M_max is the largest pixel value in the grayscale image M.
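A minimal sketch of the preprocessing in steps S221-S222 follows, assuming a NumPy implementation: the pixel values are raised to the power r as the magnification step, the R, G and B components are merged into a grayscale image, and the result is standardized with X' = (X - M_min) / (M_max - M_min). The function name, the default value of r, the averaging of R, G and B, and the use of matplotlib's imread in place of the unspecified imread function are illustrative assumptions.

import numpy as np
from matplotlib.image import imread    # stands in for the imread function named in the text

def preprocess(path, r=2):
    img = imread(path).astype(np.float64)          # pixel-value matrix of the image
    if img.ndim == 3:                              # grayscale: make R, G and B identical
        img = img[..., :3].mean(axis=2)
    img = img ** r                                 # magnification step x -> x**r from S221
    m_min, m_max = img.min(), img.max()
    return (img - m_min) / (m_max - m_min)         # standardization formula from S222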
S23: The original image is processed with a kernel density estimation algorithm to remove the background picture and obtain a target image that includes the handwritten Chinese characters.

Kernel density estimation is a non-parametric method that studies the distribution characteristics of the data from the data samples themselves and is used to estimate a probability density function. The target image is the image obtained by processing the original image with the kernel density estimation algorithm so that it contains only handwritten Chinese characters. Specifically, the server processes the original image with the kernel density estimation algorithm to exclude the interference of the background picture and obtain the target image including the handwritten Chinese characters.

Specifically, the kernel density estimate is computed as f̂(x) = (1/(n·h)) Σ_{i=1..n} K((x - x_i)/h), where K(·) is the kernel function, h is the pixel-value range, x is the pixel value of the pixel whose probability density is to be estimated, x_i is the i-th pixel value within the range h, n is the number of pixel values within the range h, and f̂(x) is the estimated probability density of the pixel.
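The kernel density estimate as written above can be transcribed almost literally; the short NumPy sketch below does so, taking the kernel as a function argument and using the Gaussian kernel that step S232 introduces as its default. The names and the vectorized form are illustrative assumptions.

import numpy as np

def gaussian_kernel(x):
    # K(x) = exp(-x**2 / 2) / sqrt(2 * pi); e and pi are the constants named in the text
    return np.exp(-0.5 * x ** 2) / np.sqrt(2.0 * np.pi)

def kde(x, samples, h, kernel=gaussian_kernel):
    # f_hat(x) = (1 / (n * h)) * sum_i K((x - x_i) / h) over the n pixel values within range h
    samples = np.asarray(samples, dtype=np.float64)
    return kernel((x - samples) / h).sum() / (samples.size * h)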
In one embodiment, as shown in FIG. 7, step S23 of processing the original image with the kernel density estimation algorithm to remove the background picture and obtain the target image including the handwritten Chinese characters specifically includes the following steps:

S231: The pixel values in the original image are counted to obtain a histogram of the original image.

The original-image histogram is the histogram obtained by counting the pixel values in the original image. A histogram is a statistical chart in which a series of vertical bars or line segments of varying height represents the distribution of data. In this embodiment, the horizontal axis of the original-image histogram represents the pixel value and the vertical axis represents the frequency with which that pixel value occurs. By counting the pixel values in the original image, the server obtains the original-image histogram, so that the distribution of pixel values in the original image can be seen intuitively and technical support is provided for the subsequent estimation with the Gaussian kernel density estimation algorithm.

S232: The original-image histogram is processed with a Gaussian kernel density estimation algorithm to obtain at least one frequency maximum and at least one frequency minimum corresponding to the original-image histogram.

The Gaussian kernel density estimation algorithm is the kernel density estimation method in which the kernel function is the Gaussian kernel. The formula of the Gaussian kernel is K(x) = (1/√(2π))·e^(-x²/2), where K(x) is the Gaussian kernel evaluated at the pixel value x of the effective image, and e and π are constants. A frequency maximum is a local maximum of the frequency distribution histogram over a frequency interval, and a frequency minimum is the local minimum corresponding to a frequency maximum over the same frequency interval.

Specifically, the frequency distribution histogram corresponding to the original image is smoothed with the Gaussian kernel density estimation method to obtain the Gaussian smoothed curve corresponding to the histogram. Based on the frequency maxima and frequency minima on the Gaussian smoothed curve, the pixel values on the horizontal axis corresponding to these maxima and minima are obtained, so that the original image can subsequently be split into layers based on the pixel values corresponding to the obtained frequency maxima and minima.

S233: The original image is split into layers based on the frequency maxima and frequency minima to obtain layered images.

A layered image is an image obtained by splitting the original image into layers based on the maxima and minima. The server first obtains the pixel values corresponding to the frequency maxima and frequency minima and partitions the original image according to the pixel values corresponding to the frequency maxima: the pixel values of the original image are divided into as many classes as there are frequency maxima in the original image. The pixel values corresponding to the frequency minima are then taken as the boundaries between the classes, and the original image is split into layers according to the classes and the boundaries between them to obtain the layered images.

For example, suppose the pixel values corresponding to the frequency maxima in the original image are 11, 53, 95, 116 and 158, and the pixel values corresponding to the frequency minima are 21, 63, 105 and 135. From the number of frequency maxima it can be determined that the pixel values of the original image fall into 5 classes, so the original image can be split into 5 layers, with the pixel values of the frequency minima as the class boundaries. Since the smallest pixel value is 0 and the largest is 255, the class boundaries determine a layered image around pixel value 11 covering pixel values [0, 21); a layered image around pixel value 53 covering [21, 63); a layered image around pixel value 95 covering [63, 105); a layered image around pixel value 116 covering [105, 135); and a layered image around pixel value 158 covering [135, 255].
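A hedged sketch of steps S231-S233 follows: the pixel values are smoothed with a Gaussian kernel density estimate, the frequency maxima and minima are read off the smoothed curve, and the minima are used as class boundaries to split the image into layers, so that the example values above (maxima 11, 53, 95, 116 and 158; minima 21, 63, 105 and 135) would give five layers. The use of scipy's gaussian_kde and argrelextrema and the rescaling to the 0-255 range are implementation assumptions rather than details of the filing.

import numpy as np
from scipy.stats import gaussian_kde
from scipy.signal import argrelextrema

def split_into_layers(original):
    img = original * 255.0                                # back to the 0-255 range of the example
    density = gaussian_kde(img.ravel())(np.arange(256))   # Gaussian-smoothed histogram curve
    maxima = argrelextrema(density, np.greater)[0]        # pixel values at the frequency maxima
    minima = argrelextrema(density, np.less)[0]           # pixel values at the frequency minima
    bounds = np.concatenate(([0.0], minima, [256.0]))     # the minima act as class boundaries
    # e.g. minima 21, 63, 105, 135 give the layers [0,21), [21,63), [63,105), [105,135), [135,256)
    layers = [(img >= lo) & (img < hi) for lo, hi in zip(bounds[:-1], bounds[1:])]
    return maxima, layers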
S234: A target image including the handwritten Chinese characters is obtained based on the layered images.

After obtaining the layered images, the server binarizes, erodes and superimposes them to obtain the target image including the handwritten Chinese characters. Binarization sets the pixel value of each pixel of a layered image to 0 (black) or 1 (white) so that the whole layered image shows a clear black-and-white effect. After a layered image is binarized, the binarized layered image is eroded to remove the background-picture part and keep the handwritten Chinese characters in the layered image. Because the pixel values of each layered image belong to different ranges, the layered images still need to be superimposed after erosion to generate the target image containing only the handwritten Chinese characters. Superposition is the process of superimposing the layered images, in which only the handwriting parts remain, into a single image, so as to obtain the target image containing only the handwritten Chinese characters. In this embodiment, the imadd function is used to superimpose the layered images to obtain the target image containing only handwritten Chinese characters; imadd is a function in a computer language for superimposing layered images.

In one embodiment, as shown in FIG. 8, step S234 of obtaining the target image including the handwritten Chinese characters based on the layered images specifically includes the following steps:

S2341: The layered images are binarized to obtain binarized images.

A binarized image is the image obtained by binarizing a layered image. Specifically, after obtaining a layered image, the server compares the sampled pixel values of the layered image with a preselected threshold, setting pixel values greater than or equal to the threshold to 1 and pixel values smaller than the threshold to 0. A sampled pixel value is the pixel value of each pixel in the layered image. The size of the threshold affects the result of binarization: with a suitable threshold the binarization of the layered image works well, while an unsuitable threshold degrades it. For convenience of operation and to simplify the calculation, the threshold in this embodiment is determined empirically by the developers. Binarizing the layered images facilitates the subsequent erosion processing.

S2342: The pixels in the binarized image are detected and labelled to obtain the connected regions corresponding to the binarized image.

A connected region is a region enclosed by the neighbouring pixels around a particular pixel. In a binarized image, a connected region is determined from a particular pixel and its neighbouring pixels; for example, when a particular pixel is 0 and its surrounding neighbouring pixels are all 1, the region enclosed by those neighbouring pixels is taken as a connected region.

Specifically, the binarized image corresponds to a pixel matrix with rows and columns. Detecting and labelling the pixels in the binarized image specifically includes the following process. (1) The pixel matrix is scanned row by row; the consecutive white pixels in each row form a sequence called a run, and its start point, end point and row number are recorded. (2) For the runs in every row except the first, if a run has no overlap with any run in the previous row, it is given a new label; if it overlaps with exactly one run in the previous row, it is given the label of that run; if it overlaps with two or more runs in the previous row, the current run is given the smallest label of the associated runs, and the labels of those runs in the previous row are written into equivalence pairs, indicating that they belong to the same class. For example, if a run in the second row overlaps with two runs (1 and 2) in the previous row, it is given the smaller of the two labels, namely 1, and the labels of those runs in the previous row are written as the equivalence pair (1, 2). An equivalence pair records the labels of two runs that are connected to each other; for example, (1, 2) means that the run labelled 1 and the run labelled 2 are connected and form one connected region. In this embodiment, the 8 neighbouring pixels adjacent to a particular pixel in the pixel matrix are taken as the connected region of that element.
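A compact sketch of steps S2341-S2342 follows: one layered image is thresholded to a 0/1 image and its 8-connected regions are labelled. ndimage.label is used here in place of the row-run and equivalence-pair procedure described above and yields the same connected regions; the threshold value and the function names are illustrative assumptions.

import numpy as np
from scipy import ndimage

def binarize_and_label(layer_mask, image, threshold=0.5):
    # S2341: keep only this layer's pixels and threshold them to 0 (black) or 1 (white);
    # the threshold is chosen empirically in the text, 0.5 is only an illustrative value
    binary = ((image * layer_mask) >= threshold).astype(np.uint8)
    # S2342: label the 8-connected regions (3x3 structuring element = 8-neighbourhood)
    eight_connectivity = np.ones((3, 3), dtype=int)
    labels, num_regions = ndimage.label(binary, structure=eight_connectivity)
    return binary, labels, num_regions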
S2343: The connected regions corresponding to the binarized images are eroded and the results superimposed to obtain the target image including the handwritten Chinese characters.

Erosion is the morphological operation used to remove part of the content of an image. The connected regions of the binarized image are eroded with the imerode function built into MATLAB. Specifically, eroding the connected regions corresponding to a binarized image includes the following steps. First, an n×n structuring element is selected; in this embodiment, the 8 elements adjacent to each element of the pixel matrix are taken as the connected region of that element, so the selected structuring element is a 3×3 pixel matrix. A structuring element is an n×n pixel matrix whose elements are 0 or 1. The pixel matrix of the layered binarized image is scanned to find the pixels whose value is 1, that is, the pixels inside the connected regions, and the 8 neighbouring pixels of each such pixel are checked: if they are all 1, the pixel matrix is left unchanged; if they are not all 1, the 8 neighbouring pixels of that pixel are set to 0 (black). The parts set to 0 are the eroded parts of the layered binarized image. MATLAB is application software for numerical computation in mathematical and technical applications.

The binarized images are then screened against a preset range of handwriting-region corrosion resistance: the parts of a binarized image that are not within the handwriting corrosion-resistance range are deleted, and the parts within the range are kept. The target image containing only the handwritten Chinese characters is obtained by superimposing the pixel matrices of the screened binarized image parts that fall within the handwriting corrosion-resistance range. The corrosion resistance of the handwriting region can be computed as p = s1 / s2, where s1 is the total area of the binarized image after erosion, s2 is the total area of the binarized image before erosion, and p is the corrosion resistance of the handwriting region.

For example, suppose the preset handwriting corrosion-resistance range is [0.01, 0.5]. For each binarized image, the ratio p of the total area after erosion to the total area before erosion is computed from the formula above. If the ratio p of a region of the binarized image is not within the preset handwriting corrosion-resistance range, the binarized image of that region is a background image rather than handwriting, and erosion is applied to remove that background image. If the ratio p of a region is within [0.01, 0.5], the binarized image of that region is handwritten Chinese characters and is kept. The pixel matrices of the retained binarized images are superimposed to obtain the target image containing the handwritten Chinese characters.

In steps S2341-S2343, the layered images are binarized to obtain binarized images, the pixels of the binarized images are detected and labelled to obtain the corresponding connected regions, and the elements of the pixel matrix that do not fully match the structuring element are set to 0; the parts whose elements are 0 are black, and those black parts are the eroded parts of the binarized image. By computing the ratio p of the total area of a binarized image after erosion to its total area before erosion and judging whether the ratio lies within the preset handwriting corrosion-resistance range, the background image in each layered image is removed and the handwritten Chinese characters are kept; finally, the layered images are superimposed, achieving the purpose of obtaining the target image.
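The erosion and screening of step S2343 can be sketched as follows: every binarized layer is eroded with a 3×3 structuring element, the ratio p = s1 / s2 of the area after erosion to the area before erosion is computed, the layers whose ratio falls inside the preset handwriting range are kept, and the survivors are superimposed. scipy's binary_erosion stands in for MATLAB's imerode, a logical OR stands in for imadd, and the exact form of the ratio formula is an assumption, since it appears only as an equation image in the filing.

import numpy as np
from scipy import ndimage

def filter_and_merge(binary_layers, p_range=(0.01, 0.5)):
    structure = np.ones((3, 3), dtype=int)            # 3x3 structuring element (8-neighbourhood)
    target = None
    for binary in binary_layers:
        eroded = ndimage.binary_erosion(binary, structure=structure)
        s2 = float(binary.sum())                      # total white area before erosion
        s1 = float(eroded.sum())                      # total white area after erosion
        p = s1 / s2 if s2 > 0 else 0.0                # assumed corrosion-resistance ratio
        if p_range[0] <= p <= p_range[1]:             # handwriting strokes survive, noise does not
            layer = binary.astype(bool)
            target = layer if target is None else (target | layer)
    return target                                     # target image containing only the handwriting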
S24: Text localization is performed on the target image with a text localization technique to obtain the text regions to be recognized.

A text region to be recognized is a region of the target image that contains only text. Because the target image also includes non-Chinese-character regions, namely the eroded parts of the target image, text localization is performed on the target image to make the recognition result more accurate and to save model recognition time. Text localization techniques include, but are not limited to, localization with OCR technology and with a CTPN (Connectionist Text Proposal Network), a network commonly used for text detection in images. OCR (Optical Character Recognition) technology refers to the process of analysing and recognizing image files of text material to obtain text and layout information; it generally involves two steps, text localization, that is, finding the position of the text in the picture, and text recognition, that is, recognizing the text that has been found. In this embodiment, only the text localization step of OCR is used.

Specifically, taking OCR technology as an example, the text localization steps are as follows:

1. First, with a neighbour-search method, one connected region is arbitrarily selected from the connected regions obtained in step S2342 as the starting connected region, the region distances between the remaining connected regions (the connected regions other than the starting one) and the starting connected region are computed, and the connected regions whose region distance is smaller than a preset threshold are selected as target connected regions, so as to determine the direction of the dilation operation (up, down, left or right). The preset threshold is a predetermined threshold for judging the distance between two connected regions. The neighbour-search method starts from a starting connected region, finds the horizontal circumscribed rectangle of that region and extends the connected region to the whole rectangle; when the distance between the starting connected region and its nearest neighbouring region is smaller than the preset threshold, the rectangle is dilated, the dilation direction being the direction in which the nearest neighbouring region lies. The dilation operation is performed only when the dilation direction is horizontal. The region distance is computed with the distance formula (given as an equation image in the original filing), in which S is the starting connected region, S' is a remaining connected region, and (x_c, y_c) is the difference of the centre vectors of the two connected regions; because the distance between the two connected regions is computed with respect to their adjacent boundaries, the region lengths are subtracted to obtain (x_c', y_c'), where (w', z') is the coordinate of the lower-right corner of the remaining connected region, (x', y') is the coordinate of its upper-left corner, (w, z) is the coordinate of the lower-right corner of the starting connected region, and (x, y) is the coordinate of its upper-left corner, which is taken as the origin in this embodiment.

2. The direction of the dilation operation is determined from the direction of the target connected region, and the starting connected region is dilated in that direction to obtain the text region to be recognized. Dilation is the morphological operation used to expand an image. The connected regions of the binarized image are dilated with the imdilate function built into MATLAB. Specifically, dilating the starting connected region includes the following steps. An n×n structuring element is selected; in this embodiment, the 8 elements adjacent to each element of the pixel matrix are taken as the connected region of that element, so the selected structuring element is a 3×3 pixel matrix whose elements are 0 or 1. The connected region is scanned along the direction of the target connected region, and a logical AND is computed between the structuring element and the part of the connected region it covers in that direction; if the results are all 0, the pixel matrix is left unchanged; if they are not all 0, the pixels covered by the structuring element are all set to 1, and the parts set to 1 are the dilated parts of the starting connected region. The rules of the logical AND operation are 0&&0=0, 0&&1=0, 1&&0=0 and 1&&1=1, where && is the logical AND operator.
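A rough sketch of the text localization in step S24 is given below. Instead of the full neighbour-search and directional dilation described above, it simply dilates the target image horizontally so that characters closer than a gap threshold merge into one region, labels the merged regions and returns their bounding boxes as candidate text areas; this simplification, the gap threshold and the use of scipy in place of MATLAB's imdilate are assumptions made only for illustration.

import numpy as np
from scipy import ndimage

def localize_text(target, gap_threshold=20):
    # horizontal-only dilation: the dilation operation is performed in the horizontal direction
    structure = np.ones((1, 2 * gap_threshold + 1), dtype=int)
    dilated = ndimage.binary_dilation(np.asarray(target) > 0, structure=structure)
    labels, _ = ndimage.label(dilated)
    boxes = ndimage.find_objects(labels)              # one (row slice, column slice) per text area
    return [np.asarray(target)[box] for box in boxes]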
S25: The text regions to be recognized are input into the target handwriting recognition model for recognition, and the handwritten Chinese characters corresponding to each text region to be recognized are obtained.

The target handwriting recognition model is obtained with the Chinese model training method. Specifically, the server inputs the text regions to be recognized into the target handwriting recognition model for recognition, so that the model can recognize them with reference to their context, obtains the handwritten Chinese characters corresponding to each text region to be recognized, and improves the recognition accuracy.

In this embodiment, the user can collect a Chinese image to be recognized containing handwritten Chinese characters through the acquisition module on a computer device and upload it to the server, so that the server acquires the image. The server then preprocesses the Chinese image to be recognized to obtain the original image with interference factors excluded. The original image is processed with the kernel density estimation algorithm to remove the background picture and obtain the target image containing only handwritten Chinese characters, further excluding interference. Text localization is performed on the target image with the text localization technique to obtain the text regions to be recognized, so as to exclude the interference of non-Chinese-character regions. The server inputs the text regions to be recognized into the target handwriting recognition model for recognition, so that the model can recognize them with reference to their context, obtains the handwritten Chinese characters corresponding to each text region, and improves the recognition accuracy.

It should be understood that the numbering of the steps in the above embodiments does not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic and does not constitute any limitation on the implementation of the embodiments of the present application.
在一实施例中,提供一种中文图像识别装置,该中文图像识别装置与上述实施例中中文图像识别方法一一对应。如图9所示,该中文图像识别装置包括待识别中文图像获取模块21、原始图像获取模块22、目标图像获取模块23、待识别文字区域获取模块24和手写汉字获取模块25。各功能模块详细说明如下:In one embodiment, a Chinese image recognition device is provided, and the Chinese image recognition device corresponds to the Chinese image recognition method in the embodiment described above in a one-to-one manner. As shown in FIG. 9, the Chinese image recognition device includes a Chinese image acquisition module 21 to be identified, an original image acquisition module 22, a target image acquisition module 23, a text region acquisition module 24 and a handwritten Chinese character acquisition module 25. The detailed description of each function module is as follows:
待识别中文图像获取模块21,用于获取待识别中文图像,待识别中文图像包括手写汉字和背景图片。The to-be-recognized Chinese image acquisition module 21 is configured to obtain the to-be-recognized Chinese image, and the to-be-recognized Chinese image includes handwritten Chinese characters and background pictures.
原始图像获取模块22,用于对待识别中文图像进行预处理,获取原始图像。The original image acquisition module 22 is configured to preprocess the Chinese image to be recognized to obtain an original image.
目标图像获取模块23,用于采用核密度估计算法对原始图像进行处理,去除背景图片,获取包括 手写汉字的目标图像。A target image acquisition module 23 is configured to process the original image by using a kernel density estimation algorithm, remove the background picture, and obtain a target image including handwritten Chinese characters.
待识别文字区域获取模块24，用于采用文字定位技术对目标图像进行文字定位，获取待识别文字区域。The to-be-recognized text region acquisition module 24 is configured to perform text positioning on the target image by using a text positioning technology to obtain the to-be-recognized text region.
手写汉字获取模块25,用于将待识别文字区域输入到目标手写字识别模型中进行识别,获取每一待识别文字区域对应的手写汉字。其中,目标手写字识别模型是采用上述实施例中中文模型训练方法获取的。A handwritten Chinese character acquisition module 25 is configured to input a text area to be recognized into a target handwriting recognition model for recognition, and obtain a handwritten Chinese character corresponding to each text area to be recognized. The target handwriting recognition model is obtained by using the Chinese model training method in the foregoing embodiment.
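As a rough illustration of the kind of model module 25 feeds its text regions into, a convolutional network followed by a bidirectional LSTM can be sketched as below. The layer sizes, the 32-pixel input height, and the vocabulary size of 3755 classes are assumptions chosen for the example; this is not the network recited in this application.

```python
import torch.nn as nn


class CRNNSketch(nn.Module):
    """Illustrative CNN + long short-term memory recognizer (sketch only)."""

    def __init__(self, num_classes=3755):               # class count is an assumed example value
        super().__init__()
        self.cnn = nn.Sequential(                        # convolutional part extracts Chinese image features
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.lstm = nn.LSTM(128 * 8, 256, bidirectional=True, batch_first=True)
        self.fc = nn.Linear(512, num_classes + 1)        # +1 for the blank label of a CTC-style loss

    def forward(self, x):                                # x: (N, 1, 32, W)
        f = self.cnn(x)                                  # (N, 128, 8, W/4)
        f = f.permute(0, 3, 1, 2).flatten(2)             # unroll along time steps: (N, W/4, 1024)
        out, _ = self.lstm(f)                            # LSTM links each step to its context
        return self.fc(out)                              # per-time-step class scores
```

The per-time-step scores from such a network would be trained with a temporal classification loss and decoded into a character sequence for each to-be-recognized text region; this mirrors, but does not reproduce, the training procedure described in this application.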
具体地,原始图像获取模块22包括灰度化图像获取单元221和原始图像获取单元222。Specifically, the original image acquisition module 22 includes a grayscale image acquisition unit 221 and an original image acquisition unit 222.
灰度化图像获取单元221,用于对原始图像进行放大和灰度化处理,获取灰度化图像。A grayscale image acquisition unit 221 is configured to perform enlargement and grayscale processing on an original image to obtain a grayscale image.
原始图像获取单元222，用于对灰度化图像进行标准化处理，获取原始图像，其中，标准化处理的公式为 X′=(X−M_min)/(M_max−M_min)，X是灰度化图像M的像素值，X′是原始图像的像素值，M_min是灰度化图像M中最小的像素值，M_max是灰度化图像M中最大的像素值。The original image acquisition unit 222 is configured to perform normalization processing on the grayscale image to obtain the original image, where the normalization formula is X′ = (X − M_min) / (M_max − M_min), in which X is a pixel value of the grayscale image M, X′ is the corresponding pixel value of the original image, M_min is the smallest pixel value in the grayscale image M, and M_max is the largest pixel value in the grayscale image M.
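A direct NumPy rendering of this min-max formula might look as follows; it is a sketch of the formula itself, not the applicant's implementation, and the guard against a constant image is an added assumption.

```python
import numpy as np


def normalize(gray: np.ndarray) -> np.ndarray:
    """Map pixel values X of the grayscale image M to original-image values X'."""
    m_min, m_max = float(gray.min()), float(gray.max())
    if m_max == m_min:                       # avoid division by zero on a uniform image
        return np.zeros_like(gray, dtype=np.float32)
    return (gray.astype(np.float32) - m_min) / (m_max - m_min)   # X' = (X - M_min) / (M_max - M_min)
```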
具体地,目标图像获取模块23包括原始图像直方图获取单元231、频率极值获取单元232、分层图像获取单元233和目标图像获取单元234。Specifically, the target image acquisition module 23 includes an original image histogram acquisition unit 231, a frequency extreme value acquisition unit 232, a layered image acquisition unit 233, and a target image acquisition unit 234.
原始图像直方图获取单元231,用于对原始图像中的像素值进行统计,获取原始图像直方图。The original image histogram obtaining unit 231 is configured to perform statistics on pixel values in the original image to obtain a histogram of the original image.
频率极值获取单元232，用于采用高斯核密度估计算法对原始图像直方图进行处理，获取与原始图像直方图对应的至少一个频率极大值和至少一个频率极小值。The frequency extreme value acquisition unit 232 is configured to process the original image histogram by using a Gaussian kernel density estimation algorithm to obtain at least one frequency maximum and at least one frequency minimum corresponding to the original image histogram.
分层图像获取单元233,用于基于频率极大值和频率极小值对原始图像进行分层切分处理,获取分层图像。A layered image acquisition unit 233 is configured to perform layered segmentation processing on the original image based on the frequency maximum and frequency minimum to obtain a layered image.
目标图像获取单元234,用于基于分层图像,获取包括手写汉字的目标图像。The target image acquisition unit 234 is configured to acquire a target image including a handwritten Chinese character based on the layered image.
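One plausible way to carry out the processing of units 232 and 233, assuming 8-bit grayscale values and using SciPy's Gaussian kernel density estimate, is sketched below; the exact smoothing and extremum-picking procedure used by the applicant is not specified here. The resulting layers are what unit 234, described next, consumes.

```python
import numpy as np
from scipy.signal import argrelextrema
from scipy.stats import gaussian_kde


def split_into_layers(original: np.ndarray):
    """Smooth the pixel-value histogram with a Gaussian KDE and split at frequency minima (sketch)."""
    pixels = original.ravel().astype(np.float64)
    density = gaussian_kde(pixels)(np.arange(256))      # KDE-smoothed histogram of the original image
    maxima = argrelextrema(density, np.greater)[0]      # frequency maxima (central gray level of each layer)
    minima = argrelextrema(density, np.less)[0]         # frequency minima (thresholds for layered segmentation)
    bounds = [0, *minima.tolist(), 256]
    layers = [((original >= lo) & (original < hi)).astype(np.uint8) * 255
              for lo, hi in zip(bounds[:-1], bounds[1:])]
    return maxima, minima, layers                       # layered images
```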
具体地,目标图像获取单元234包括二值化图像获取子单元2341、连通区域获取子单元2342和目标图像获取子单元2343。Specifically, the target image acquisition unit 234 includes a binarized image acquisition subunit 2341, a connected region acquisition subunit 2342, and a target image acquisition subunit 2343.
二值化图像获取子单元2341,用于对分层图像进行二值化处理,获取二值化图像。A binarized image acquisition subunit 2341 is configured to perform binarization processing on the layered image to obtain a binarized image.
连通区域获取子单元2342,用于对二值化图像中的像素进行检测标记,获取二值化图像对应的连通区域。The connected region acquisition subunit 2342 is configured to detect pixels in the binarized image and obtain a connected region corresponding to the binarized image.
目标图像获取子单元2343,用于对二值化图像对应的连通区域进行腐蚀和叠加处理,获取包括手写汉字的目标图像。A target image acquisition subunit 2343 is configured to perform erosion and superposition processing on the connected areas corresponding to the binary image, and acquire a target image including handwritten Chinese characters.
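The processing of sub-units 2341-2343 can be approximated with OpenCV as below; Otsu binarization, the 3x3 erosion kernel, and a single erosion pass are illustrative assumptions rather than the parameters of this application.

```python
import cv2
import numpy as np


def layer_to_target(layer: np.ndarray) -> np.ndarray:
    """Binarize one layered image, label its connected regions, erode them, and superimpose (sketch)."""
    _, binary = cv2.threshold(layer, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)   # binarized image
    num, labels = cv2.connectedComponents(binary)                    # detect and mark connected regions
    kernel = np.ones((3, 3), np.uint8)
    target = np.zeros_like(binary)
    for i in range(1, num):                                          # label 0 is the background
        region = (labels == i).astype(np.uint8) * 255
        target = cv2.bitwise_or(target, cv2.erode(region, kernel))   # erode and superimpose
    return target                                                    # target image including the handwritten Chinese characters
```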
关于中文图像识别装置的具体限定可以参见上文中对于中文图像识别方法的限定，在此不再赘述。上述中文图像识别装置中的各个模块可全部或部分通过软件、硬件及其组合来实现。上述各模块可以硬件形式内嵌于或独立于计算机设备中的处理器中，也可以以软件形式存储于计算机设备中的存储器中，以便于处理器调用执行以上各个模块对应的操作。For the specific limitations of the Chinese image recognition device, reference may be made to the limitations of the Chinese image recognition method described above, which are not repeated here. Each module in the above Chinese image recognition device may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in hardware form in, or independent of, the processor of the computer device, or stored in software form in the memory of the computer device, so that the processor can invoke and perform the operations corresponding to each module.
在一个实施例中，提供了一种计算机设备，该计算机设备可以是服务器，其内部结构图可以如图10所示。该计算机设备包括通过系统总线连接的处理器、存储器、网络接口和数据库。其中，该计算机设备的处理器用于提供计算和控制能力。该计算机设备的存储器包括非易失性存储介质、内存储器。该非易失性存储介质存储有操作系统、计算机程序和数据库。该内存储器为非易失性存储介质中的操作系统和计算机程序的运行提供环境。该计算机设备的数据库用于存储执行中文模型训练方法或中文图像识别方法过程中生成或获取的数据，如目标手写字识别模型或手写汉字。该计算机设备的网络接口用于与外部的终端通过网络连接通信。该计算机程序被处理器执行时以实现一种中文图像识别方法。In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in FIG. 10. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The database of the computer device is configured to store data generated or obtained during execution of the Chinese model training method or the Chinese image recognition method, such as the target handwriting recognition model or the handwritten Chinese characters. The network interface of the computer device is configured to communicate with an external terminal through a network connection. When executed by the processor, the computer program implements a Chinese image recognition method.
在一个实施例中，提供了一种计算机设备，包括存储器、处理器及存储在存储器上并可在处理器上运行的计算机程序，处理器执行计算机程序时实现以下步骤：获取待识别中文图像，待识别中文图像包括手写汉字和背景图片；对待识别中文图像进行预处理，获取原始图像；采用核密度估计算法对原始图像进行处理，去除背景图片，获取包括手写汉字的目标图像；采用文字定位技术对目标图像进行文字定位，获取待识别文字区域；将待识别文字区域输入到目标手写字识别模型中进行识别，获取每一待识别文字区域对应的手写汉字；其中，目标手写字识别模型是采用中文模型训练方法获取的。In one embodiment, a computer device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, the following steps are implemented: acquiring a to-be-recognized Chinese image, where the to-be-recognized Chinese image includes handwritten Chinese characters and a background picture; preprocessing the to-be-recognized Chinese image to obtain an original image; processing the original image by using a kernel density estimation algorithm to remove the background picture and obtain a target image including the handwritten Chinese characters; performing text positioning on the target image by using a text positioning technology to obtain to-be-recognized text regions; and inputting the to-be-recognized text regions into a target handwriting recognition model for recognition to obtain the handwritten Chinese characters corresponding to each to-be-recognized text region, where the target handwriting recognition model is obtained by using the Chinese model training method.
在一个实施例中，处理器执行计算机程序时还实现以下步骤：对原始图像中的像素值进行统计，获取原始图像直方图；采用高斯核密度估算方法对原始图像直方图进行处理，获取与原始图像直方图对应的至少一个频率极大值和至少一个频率极小值；基于频率极大值和频率极小值对原始图像进行分层切分处理，获取分层图像；基于分层图像，获取包括手写汉字的目标图像。In one embodiment, when the processor executes the computer program, the following steps are further implemented: performing statistics on the pixel values in the original image to obtain an original image histogram; processing the original image histogram by using a Gaussian kernel density estimation method to obtain at least one frequency maximum and at least one frequency minimum corresponding to the original image histogram; performing layered segmentation processing on the original image based on the frequency maximum and the frequency minimum to obtain a layered image; and obtaining, based on the layered image, the target image including the handwritten Chinese characters.
在一个实施例中，处理器执行计算机程序时还实现以下步骤：对分层图像进行二值化处理，获取二值化图像；对二值化图像中的像素进行检测标记，获取二值化图像对应的连通区域；对二值化图像对应的连通区域进行腐蚀和叠加处理，获取包括手写汉字的目标图像。In one embodiment, when the processor executes the computer program, the following steps are further implemented: performing binarization processing on the layered image to obtain a binarized image; detecting and marking pixels in the binarized image to obtain connected regions corresponding to the binarized image; and performing erosion and superposition processing on the connected regions corresponding to the binarized image to obtain the target image including the handwritten Chinese characters.
在一个实施例中，提供了一个或多个存储有计算机可读指令的非易失性可读存储介质，所述计算机可读指令被一个或多个处理器执行时，使得所述一个或多个处理器执行如下步骤：获取待识别中文图像，待识别中文图像包括手写汉字和背景图片；对待识别中文图像进行预处理，获取原始图像；采用核密度估计算法对原始图像进行处理，去除背景图片，获取包括手写汉字的目标图像；采用文字定位技术对目标图像进行文字定位，获取待识别文字区域；将待识别文字区域输入到目标手写字识别模型中进行识别，获取每一待识别文字区域对应的手写汉字；其中，目标手写字识别模型是采用中文模型训练方法获取的。In one embodiment, one or more non-volatile readable storage media storing computer-readable instructions are provided. When the computer-readable instructions are executed by one or more processors, the one or more processors perform the following steps: acquiring a to-be-recognized Chinese image, where the to-be-recognized Chinese image includes handwritten Chinese characters and a background picture; preprocessing the to-be-recognized Chinese image to obtain an original image; processing the original image by using a kernel density estimation algorithm to remove the background picture and obtain a target image including the handwritten Chinese characters; performing text positioning on the target image by using a text positioning technology to obtain to-be-recognized text regions; and inputting the to-be-recognized text regions into a target handwriting recognition model for recognition to obtain the handwritten Chinese characters corresponding to each to-be-recognized text region, where the target handwriting recognition model is obtained by using the Chinese model training method.
在一个实施例中，所述计算机可读指令被一个或多个处理器执行时，使得所述一个或多个处理器执行时还实现以下步骤：对原始图像中的像素值进行统计，获取原始图像直方图；采用高斯核密度估算方法对原始图像直方图进行处理，获取与原始图像直方图对应的至少一个频率极大值和至少一个频率极小值；基于频率极大值和频率极小值对原始图像进行分层切分处理，获取分层图像；基于分层图像，获取包括手写汉字的目标图像。In one embodiment, when the computer-readable instructions are executed by the one or more processors, the one or more processors further perform the following steps: performing statistics on the pixel values in the original image to obtain an original image histogram; processing the original image histogram by using a Gaussian kernel density estimation method to obtain at least one frequency maximum and at least one frequency minimum corresponding to the original image histogram; performing layered segmentation processing on the original image based on the frequency maximum and the frequency minimum to obtain a layered image; and obtaining, based on the layered image, the target image including the handwritten Chinese characters.
在一个实施例中，所述计算机可读指令被一个或多个处理器执行时，使得所述一个或多个处理器执行时还实现以下步骤：对分层图像进行二值化处理，获取二值化图像；对二值化图像中的像素进行检测标记，获取二值化图像对应的连通区域；对二值化图像对应的连通区域进行腐蚀和叠加处理，获取包括手写汉字的目标图像。In one embodiment, when the computer-readable instructions are executed by the one or more processors, the one or more processors further perform the following steps: performing binarization processing on the layered image to obtain a binarized image; detecting and marking pixels in the binarized image to obtain connected regions corresponding to the binarized image; and performing erosion and superposition processing on the connected regions corresponding to the binarized image to obtain the target image including the handwritten Chinese characters.
本领域普通技术人员可以理解实现上述实施例方法中的全部或部分流程，是可以通过计算机程序来指令相关的硬件来完成，所述的计算机程序可存储于一非易失性计算机可读取存储介质中，该计算机程序在执行时，可包括如上述各方法的实施例的流程。其中，本申请所提供的各实施例中所使用的对存储器、存储、数据库或其它介质的任何引用，均可包括非易失性和/或易失性存储器。非易失性存储器可包括只读存储器（ROM）、可编程ROM（PROM）、电可编程ROM（EPROM）、电可擦除可编程ROM（EEPROM）或闪存。易失性存储器可包括随机存取存储器（RAM）或者外部高速缓冲存储器。作为说明而非局限，RAM以多种形式可得，诸如静态RAM（SRAM）、动态RAM（DRAM）、同步DRAM（SDRAM）、双数据率SDRAM（DDRSDRAM）、增强型SDRAM（ESDRAM）、同步链路（Synchlink）DRAM（SLDRAM）、存储器总线（Rambus）直接RAM（RDRAM）、直接存储器总线动态RAM（DRDRAM）、以及存储器总线动态RAM（RDRAM）等。Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments may be implemented by a computer program instructing relevant hardware. The computer program may be stored in a non-volatile computer-readable storage medium, and when executed, the computer program may include the processes of the above method embodiments. Any reference to memory, storage, a database, or another medium used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or an external cache memory. By way of illustration and not limitation, RAM is available in various forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
所属领域的技术人员可以清楚地了解到，为了描述的方便和简洁，仅以上述各功能单元、模块的划分进行举例说明，实际应用中，可以根据需要而将上述功能分配由不同的功能单元、模块完成，即将所述装置的内部结构划分成不同的功能单元或模块，以完成以上描述的全部或者部分功能。Those skilled in the art can clearly understand that, for convenience and brevity of description, only the division into the above functional units and modules is used as an example. In practical applications, the above functions may be allocated to different functional units or modules as needed; that is, the internal structure of the device may be divided into different functional units or modules to complete all or part of the functions described above.
以上所述实施例仅用以说明本申请的技术方案，而非对其限制；尽管参照前述实施例对本申请进行了详细的说明，本领域的普通技术人员应当理解：其依然可以对前述各实施例所记载的技术方案进行修改，或者对其中部分技术特征进行等同替换；而这些修改或者替换，并不使相应技术方案的本质脱离本申请各实施例技术方案的精神和范围，均应包含在本申请的保护范围之内。The above embodiments are only intended to describe the technical solutions of this application, not to limit them. Although this application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of this application, and shall all fall within the protection scope of this application.

Claims (20)

  1. 一种中文模型训练方法,其特征在于,包括:A Chinese model training method, comprising:
    获取训练手写中文图像;Obtain training handwritten Chinese images;
    将所述训练手写中文图像按预设比例划分成训练集和测试集;Dividing the training handwritten Chinese image into a training set and a test set according to a preset ratio;
    对所述训练集中的训练手写中文图像进行顺序标注，并将标注好的训练手写中文图像输入到卷积神经网络-长短时记忆神经网络中进行训练，采用时序分类算法对所述卷积神经网络-长短时记忆神经网络的网络参数进行更新，获取原始手写字识别模型；Sequentially labeling the training handwritten Chinese images in the training set, inputting the labeled training handwritten Chinese images into a convolutional neural network-long short-term memory neural network for training, and updating network parameters of the convolutional neural network-long short-term memory neural network by using a time-series classification algorithm to obtain an original handwriting recognition model;
    采用所述测试集中的训练手写中文图像对所述原始手写字识别模型进行测试,在测试准确率大于预设准确率时,获取目标手写字识别模型。The original handwriting recognition model is tested using the trained handwritten Chinese images in the test set, and the target handwriting recognition model is obtained when the test accuracy is greater than a preset accuracy rate.
  2. 如权利要求1所述的中文模型训练方法，其特征在于，所述将标注好的训练手写中文图像输入到卷积神经网络-长短时记忆神经网络中进行训练，采用时序分类算法对所述卷积神经网络-长短时记忆神经网络的网络参数进行更新，获取原始手写字识别模型，包括：The Chinese model training method according to claim 1, wherein the inputting the labeled training handwritten Chinese images into a convolutional neural network-long short-term memory neural network for training, and updating network parameters of the convolutional neural network-long short-term memory neural network by using a time-series classification algorithm to obtain an original handwriting recognition model comprises:
    在卷积神经网络中对所述训练手写中文图像进行特征提取,获取中文图像特征;Performing feature extraction on the trained handwritten Chinese image in a convolutional neural network to obtain Chinese image features;
    在长短时记忆神经网络的隐藏层采用第一激活函数对所述中文图像特征进行处理,获取携带激活状态标识的神经元;Processing the Chinese image features using a first activation function in a hidden layer of the long-term and short-term memory neural network to obtain a neuron carrying an activation state identifier;
    在所述长短时记忆神经网络的隐藏层采用第二激活函数对所述携带激活状态标识的神经元进行处理,获取长短时记忆神经网络输出层的输出;Applying a second activation function to the neuron carrying the activation state identifier in the hidden layer of the long-term and short-term memory neural network to obtain the output of the long-term and short-term memory neural network output layer;
    根据所述长短时记忆神经网络输出层的输出,采用时序分类算法对所述卷积神经网络-长短时记忆神经网络的网络参数进行更新,获取所述目标手写字识别模型。According to the output of the long-short-term memory neural network output layer, a time series classification algorithm is used to update network parameters of the convolutional neural network-long-short-term memory neural network to obtain the target handwriting recognition model.
  3. 如权利要求2所述的中文模型训练方法，其特征在于，所述时序分类算法的公式具体为：E_loss = -ln ∏_{(x,z)∈S} p(z|x)，p(z|x) = a(t,u)b(t,u)，其中，p(z|x)表示输入所述中文图像特征x，在所述长短时记忆神经网络输出层的输出为z的概率，a(t,u)表示第t时刻第u个顺序标签对应的所述中文图像特征在长短时记忆神经网络隐藏层的前向输出，b(t,u)表示第t时刻第u个顺序标签对应的所述中文图像特征在长短时记忆神经网络隐藏层的后向输出。The Chinese model training method according to claim 2, wherein the formula of the time-series classification algorithm is E_loss = -ln ∏_{(x,z)∈S} p(z|x), with p(z|x) = a(t,u)b(t,u), where p(z|x) denotes the probability that the output of the output layer of the long short-term memory neural network is z given the input Chinese image feature x, a(t,u) denotes the forward output, in the hidden layer of the long short-term memory neural network, of the Chinese image feature corresponding to the u-th sequential label at time t, and b(t,u) denotes the backward output, in the hidden layer of the long short-term memory neural network, of the Chinese image feature corresponding to the u-th sequential label at time t.
  4. 一种中文图像识别方法，其特征在于，包括：A Chinese image recognition method, comprising:
    获取待识别中文图像,所述待识别中文图像包括手写汉字和背景图片;Obtaining a Chinese image to be identified, where the Chinese image to be identified includes handwritten Chinese characters and background pictures;
    对所述待识别中文图像进行预处理,获取原始图像;Preprocessing the Chinese image to be identified to obtain an original image;
    采用核密度估计算法对所述原始图像进行处理,去除所述背景图片,获取包括所述手写汉字的目标图像;Processing the original image using a kernel density estimation algorithm, removing the background picture, and obtaining a target image including the handwritten Chinese character;
    采用文字定位技术对所述目标图像进行文字定位,获取待识别文字区域;Text positioning the target image using text positioning technology to obtain the text area to be recognized;
    将待识别文字区域输入到目标手写字识别模型中进行识别，获取每一所述待识别文字区域对应的手写汉字；其中，目标手写字识别模型是采用权利要求1-3任意一项所述中文模型训练方法获取的。Inputting the to-be-recognized text regions into a target handwriting recognition model for recognition to obtain the handwritten Chinese character corresponding to each to-be-recognized text region, wherein the target handwriting recognition model is obtained by using the Chinese model training method according to any one of claims 1 to 3.
  5. 如权利要求4所述的中文图像识别方法,其特征在于,采用核密度估计算法对所述原始图像进行处理,获取保留所述手写汉字的目标图像,包括:The Chinese image recognition method according to claim 4, wherein processing the original image by using a kernel density estimation algorithm to obtain a target image retaining the handwritten Chinese characters comprises:
    对所述原始图像中的像素值进行统计,获取原始图像直方图;Performing statistics on pixel values in the original image to obtain a histogram of the original image;
    采用高斯核密度估算方法对所述原始图像直方图进行处理,获取与原始图像直方图对应的至少一个频率极大值和至少一个频率极小值;Processing the original image histogram using a Gaussian kernel density estimation method to obtain at least one frequency maximum and at least one frequency minimum corresponding to the original image histogram;
    基于所述频率极大值和频率极小值对所述原始图像进行分层切分处理,获取分层图像;Performing hierarchical segmentation processing on the original image based on the frequency maximum and frequency minimum to obtain a layered image;
    基于所述分层图像,获取包括所述手写汉字的目标图像。Based on the layered image, a target image including the handwritten Chinese character is acquired.
  6. 如权利要求5所述的中文图像识别方法,其特征在于,所述基于所述分层图像,获取包括所述手写汉字的目标图像,包括:The Chinese image recognition method according to claim 5, wherein the acquiring a target image including the handwritten Chinese character based on the layered image comprises:
    对所述分层图像进行二值化处理,获取二值化图像;Performing a binarization process on the layered image to obtain a binarized image;
    对所述二值化图像中的像素进行检测标记,获取所述二值化图像对应的连通区域;Detect and mark pixels in the binarized image to obtain a connected area corresponding to the binarized image;
    对所述二值化图像对应的连通区域进行腐蚀和叠加处理,获取所述包括手写汉字的目标图像。Eroding and superimposing the connected area corresponding to the binary image to obtain the target image including handwritten Chinese characters.
  7. 一种中文模型训练装置,其特征在于,包括:A Chinese model training device, comprising:
    训练手写中文图像获取模块,用于获取训练手写中文图像;Training handwritten Chinese image acquisition module for acquiring training handwritten Chinese images;
    训练手写中文图像划分模块,用于将所述训练手写中文图像按预设比例划分成训练集和测试集;A training handwritten Chinese image division module, configured to divide the trained handwritten Chinese image into a training set and a test set according to a preset ratio;
    原始手写字识别模型获取模块，用于对所述训练集中的训练手写中文图像进行顺序标注，并将标注好的训练手写中文图像输入到卷积神经网络-长短时记忆神经网络中进行训练，采用时序分类算法对所述卷积神经网络-长短时记忆神经网络的网络参数进行更新，获取原始手写字识别模型；An original handwriting recognition model acquisition module, configured to sequentially label the training handwritten Chinese images in the training set, input the labeled training handwritten Chinese images into a convolutional neural network-long short-term memory neural network for training, and update network parameters of the convolutional neural network-long short-term memory neural network by using a time-series classification algorithm to obtain an original handwriting recognition model;
    目标手写字识别模型获取模块,用于采用所述测试集中的训练手写中文图像对所述原始手写字识别模型进行测试,在测试准确率大于预设准确率时,获取目标手写字识别模型。A target handwriting recognition model acquisition module is used to test the original handwriting recognition model using the trained handwritten Chinese images in the test set, and obtain a target handwriting recognition model when the test accuracy rate is greater than a preset accuracy rate.
  8. 一种中文图像识别装置,其特征在于,包括:A Chinese image recognition device, comprising:
    待识别中文图像获取模块,用于获取待识别中文图像,所述待识别中文图像包括手写汉字和背景图片;A to-be-recognized Chinese image acquisition module, configured to obtain the to-be-recognized Chinese image, wherein the to-be-recognized Chinese image includes handwritten Chinese characters and a background picture;
    原始图像获取模块,用于对所述待识别中文图像进行预处理,获取原始图像;An original image acquisition module, configured to pre-process the Chinese image to be identified to obtain an original image;
    目标图像获取模块,用于采用核密度估计算法对所述原始图像进行处理,去除所述背景图片,获取包括所述手写汉字的目标图像;A target image acquisition module, configured to process the original image by using a kernel density estimation algorithm, remove the background picture, and obtain a target image including the handwritten Chinese character;
    待识别文字区域获取模块,用于采用文字定位技术对所述目标图像进行文字定位,获取待识别文字区域;A to-be-recognized text area acquisition module, configured to use the text positioning technology to perform text positioning on the target image to obtain the to-be-recognized text area;
    手写汉字获取模块,用于将待识别文字区域输入到目标手写字识别模型中进行识别,获取每一所述待识别文字区域对应的手写汉字。A handwritten Chinese character acquisition module is configured to input a text area to be recognized into a target handwriting recognition model for recognition, and obtain a handwritten Chinese character corresponding to each of the text area to be recognized.
  9. 一种计算机设备,包括存储器、处理器以及存储在所述存储器中并可在所述处理器上运行的计算机程序,其特征在于,所述处理器执行所述计算机程序时实现如下步骤:A computer device includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the following steps when the computer program is executed:
    获取训练手写中文图像;Obtain training handwritten Chinese images;
    将所述训练手写中文图像按预设比例划分成训练集和测试集;Dividing the training handwritten Chinese image into a training set and a test set according to a preset ratio;
    对所述训练集中的训练手写中文图像进行顺序标注，并将标注好的训练手写中文图像输入到卷积神经网络-长短时记忆神经网络中进行训练，采用时序分类算法对所述卷积神经网络-长短时记忆神经网络的网络参数进行更新，获取原始手写字识别模型；Sequentially labeling the training handwritten Chinese images in the training set, inputting the labeled training handwritten Chinese images into a convolutional neural network-long short-term memory neural network for training, and updating network parameters of the convolutional neural network-long short-term memory neural network by using a time-series classification algorithm to obtain an original handwriting recognition model;
    采用所述测试集中的训练手写中文图像对所述原始手写字识别模型进行测试,在测试准确率大于预设准确率时,获取目标手写字识别模型。The original handwriting recognition model is tested using the trained handwritten Chinese images in the test set, and the target handwriting recognition model is obtained when the test accuracy is greater than a preset accuracy rate.
  10. 如权利要求9所述的计算机设备，其特征在于，所述将标注好的训练手写中文图像输入到卷积神经网络-长短时记忆神经网络中进行训练，采用时序分类算法对所述卷积神经网络-长短时记忆神经网络的网络参数进行更新，获取原始手写字识别模型，包括：The computer device according to claim 9, wherein the inputting the labeled training handwritten Chinese images into a convolutional neural network-long short-term memory neural network for training, and updating network parameters of the convolutional neural network-long short-term memory neural network by using a time-series classification algorithm to obtain an original handwriting recognition model comprises:
    在卷积神经网络中对所述训练手写中文图像进行特征提取,获取中文图像特征;Performing feature extraction on the trained handwritten Chinese image in a convolutional neural network to obtain Chinese image features;
    在长短时记忆神经网络的隐藏层采用第一激活函数对所述中文图像特征进行处理,获取携带激活状态标识的神经元;Processing the Chinese image features using a first activation function in a hidden layer of the long-term and short-term memory neural network to obtain a neuron carrying an activation state identifier;
    在所述长短时记忆神经网络的隐藏层采用第二激活函数对所述携带激活状态标识的神经元进行处理,获取长短时记忆神经网络输出层的输出;Applying a second activation function to the neuron carrying the activation state identifier in the hidden layer of the long-term and short-term memory neural network to obtain the output of the long-term and short-term memory neural network output layer;
    根据所述长短时记忆神经网络输出层的输出,采用时序分类算法对所述卷积神经网络-长短时记忆神经网络的网络参数进行更新,获取所述目标手写字识别模型。According to the output of the long-short-term memory neural network output layer, a time series classification algorithm is used to update network parameters of the convolutional neural network-long-short-term memory neural network to obtain the target handwriting recognition model.
  11. 如权利要求10所述的计算机设备，其特征在于，所述时序分类算法的公式具体为：E_loss = -ln ∏_{(x,z)∈S} p(z|x)，p(z|x) = a(t,u)b(t,u)，其中，p(z|x)表示输入所述中文图像特征x，在所述长短时记忆神经网络输出层的输出为z的概率，a(t,u)表示第t时刻第u个顺序标签对应的所述中文图像特征在长短时记忆神经网络隐藏层的前向输出，b(t,u)表示第t时刻第u个顺序标签对应的所述中文图像特征在长短时记忆神经网络隐藏层的后向输出。The computer device according to claim 10, wherein the formula of the time-series classification algorithm is E_loss = -ln ∏_{(x,z)∈S} p(z|x), with p(z|x) = a(t,u)b(t,u), where p(z|x) denotes the probability that the output of the output layer of the long short-term memory neural network is z given the input Chinese image feature x, a(t,u) denotes the forward output, in the hidden layer of the long short-term memory neural network, of the Chinese image feature corresponding to the u-th sequential label at time t, and b(t,u) denotes the backward output, in the hidden layer of the long short-term memory neural network, of the Chinese image feature corresponding to the u-th sequential label at time t.
  12. 一种计算机设备,包括存储器、处理器以及存储在所述存储器中并可在所述处理器上运行的计算机程序,其特征在于,所述处理器执行所述计算机程序时实现如下步骤:A computer device includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the following steps when the computer program is executed:
    获取待识别中文图像,所述待识别中文图像包括手写汉字和背景图片;Obtaining a Chinese image to be identified, where the Chinese image to be identified includes handwritten Chinese characters and background pictures;
    对所述待识别中文图像进行预处理,获取原始图像;Preprocessing the Chinese image to be identified to obtain an original image;
    采用核密度估计算法对所述原始图像进行处理,去除所述背景图片,获取包括所述手写汉字的目标图像;Processing the original image using a kernel density estimation algorithm, removing the background picture, and obtaining a target image including the handwritten Chinese character;
    采用文字定位技术对所述目标图像进行文字定位,获取待识别文字区域;Text positioning the target image using text positioning technology to obtain the text area to be recognized;
    将待识别文字区域输入到目标手写字识别模型中进行识别，获取每一所述待识别文字区域对应的手写汉字；其中，目标手写字识别模型是采用权利要求1-3任意一项所述中文模型训练方法获取的。Inputting the to-be-recognized text regions into a target handwriting recognition model for recognition to obtain the handwritten Chinese character corresponding to each to-be-recognized text region, wherein the target handwriting recognition model is obtained by using the Chinese model training method according to any one of claims 1 to 3.
  13. 如权利要求12所述的计算机设备,其特征在于,采用核密度估计算法对所述原始图像进行处理,获取保留所述手写汉字的目标图像,包括:The computer device according to claim 12, wherein processing the original image using a kernel density estimation algorithm to obtain a target image retaining the handwritten Chinese characters comprises:
    对所述原始图像中的像素值进行统计,获取原始图像直方图;Performing statistics on pixel values in the original image to obtain a histogram of the original image;
    采用高斯核密度估算方法对所述原始图像直方图进行处理,获取与原始图像直方图对应的至少一个频率极大值和至少一个频率极小值;Processing the original image histogram using a Gaussian kernel density estimation method to obtain at least one frequency maximum and at least one frequency minimum corresponding to the original image histogram;
    基于所述频率极大值和频率极小值对所述原始图像进行分层切分处理,获取分层图像;Performing hierarchical segmentation processing on the original image based on the frequency maximum and frequency minimum to obtain a layered image;
    基于所述分层图像,获取包括所述手写汉字的目标图像。Based on the layered image, a target image including the handwritten Chinese character is acquired.
  14. 如权利要求13所述的计算机设备,其特征在于,所述基于所述分层图像,获取包括所述手写汉字的目标图像,包括:The computer device according to claim 13, wherein the acquiring a target image including the handwritten Chinese character based on the layered image comprises:
    对所述分层图像进行二值化处理,获取二值化图像;Performing a binarization process on the layered image to obtain a binarized image;
    对所述二值化图像中的像素进行检测标记,获取所述二值化图像对应的连通区域;Detect and mark pixels in the binarized image to obtain a connected area corresponding to the binarized image;
    对所述二值化图像对应的连通区域进行腐蚀和叠加处理,获取所述包括手写汉字的目标图像。Eroding and superimposing the connected area corresponding to the binary image to obtain the target image including handwritten Chinese characters.
  15. 一个或多个存储有计算机可读指令的非易失性可读存储介质，其特征在于，所述计算机可读指令被一个或多个处理器执行时，使得所述一个或多个处理器执行如下步骤：One or more non-volatile readable storage media storing computer-readable instructions, wherein, when the computer-readable instructions are executed by one or more processors, the one or more processors are caused to perform the following steps:
    获取训练手写中文图像;Obtain training handwritten Chinese images;
    将所述训练手写中文图像按预设比例划分成训练集和测试集;Dividing the training handwritten Chinese image into a training set and a test set according to a preset ratio;
    对所述训练集中的训练手写中文图像进行顺序标注，并将标注好的训练手写中文图像输入到卷积神经网络-长短时记忆神经网络中进行训练，采用时序分类算法对所述卷积神经网络-长短时记忆神经网络的网络参数进行更新，获取原始手写字识别模型；Sequentially labeling the training handwritten Chinese images in the training set, inputting the labeled training handwritten Chinese images into a convolutional neural network-long short-term memory neural network for training, and updating network parameters of the convolutional neural network-long short-term memory neural network by using a time-series classification algorithm to obtain an original handwriting recognition model;
    采用所述测试集中的训练手写中文图像对所述原始手写字识别模型进行测试,在测试准确率大于预设准确率时,获取目标手写字识别模型。The original handwriting recognition model is tested using the trained handwritten Chinese images in the test set, and the target handwriting recognition model is obtained when the test accuracy is greater than a preset accuracy rate.
  16. 如权利要求15所述的非易失性可读存储介质，其特征在于，所述将标注好的训练手写中文图像输入到卷积神经网络-长短时记忆神经网络中进行训练，采用时序分类算法对所述卷积神经网络-长短时记忆神经网络的网络参数进行更新，获取原始手写字识别模型，包括：The non-volatile readable storage medium according to claim 15, wherein the inputting the labeled training handwritten Chinese images into a convolutional neural network-long short-term memory neural network for training, and updating network parameters of the convolutional neural network-long short-term memory neural network by using a time-series classification algorithm to obtain an original handwriting recognition model comprises:
    在卷积神经网络中对所述训练手写中文图像进行特征提取,获取中文图像特征;Performing feature extraction on the trained handwritten Chinese image in a convolutional neural network to obtain Chinese image features;
    在长短时记忆神经网络的隐藏层采用第一激活函数对所述中文图像特征进行处理,获取携带激活状态标识的神经元;Processing the Chinese image features using a first activation function in a hidden layer of the long-term and short-term memory neural network to obtain a neuron carrying an activation state identifier;
    在所述长短时记忆神经网络的隐藏层采用第二激活函数对所述携带激活状态标识的神经元进行处理,获取长短时记忆神经网络输出层的输出;Applying a second activation function to the neuron carrying the activation state identifier in the hidden layer of the long-term and short-term memory neural network to obtain the output of the long-term and short-term memory neural network output layer;
    根据所述长短时记忆神经网络输出层的输出,采用时序分类算法对所述卷积神经网络-长短时记忆神经网络的网络参数进行更新,获取所述目标手写字识别模型。According to the output of the long-short-term memory neural network output layer, a time series classification algorithm is used to update network parameters of the convolutional neural network-long-short-term memory neural network to obtain the target handwriting recognition model.
  17. 如权利要求16所述的非易失性可读存储介质，其特征在于，所述时序分类算法的公式具体为：E_loss = -ln ∏_{(x,z)∈S} p(z|x)，p(z|x) = a(t,u)b(t,u)，其中，p(z|x)表示输入所述中文图像特征x，在所述长短时记忆神经网络输出层的输出为z的概率，a(t,u)表示第t时刻第u个顺序标签对应的所述中文图像特征在长短时记忆神经网络隐藏层的前向输出，b(t,u)表示第t时刻第u个顺序标签对应的所述中文图像特征在长短时记忆神经网络隐藏层的后向输出。The non-volatile readable storage medium according to claim 16, wherein the formula of the time-series classification algorithm is E_loss = -ln ∏_{(x,z)∈S} p(z|x), with p(z|x) = a(t,u)b(t,u), where p(z|x) denotes the probability that the output of the output layer of the long short-term memory neural network is z given the input Chinese image feature x, a(t,u) denotes the forward output, in the hidden layer of the long short-term memory neural network, of the Chinese image feature corresponding to the u-th sequential label at time t, and b(t,u) denotes the backward output, in the hidden layer of the long short-term memory neural network, of the Chinese image feature corresponding to the u-th sequential label at time t.
  18. 一个或多个存储有计算机可读指令的非易失性可读存储介质，其特征在于，所述计算机可读指令被一个或多个处理器执行时，使得所述一个或多个处理器执行如下步骤：One or more non-volatile readable storage media storing computer-readable instructions, wherein, when the computer-readable instructions are executed by one or more processors, the one or more processors are caused to perform the following steps:
    获取待识别中文图像,所述待识别中文图像包括手写汉字和背景图片;Obtaining a Chinese image to be identified, where the Chinese image to be identified includes handwritten Chinese characters and background pictures;
    对所述待识别中文图像进行预处理,获取原始图像;Preprocessing the Chinese image to be identified to obtain an original image;
    采用核密度估计算法对所述原始图像进行处理,去除所述背景图片,获取包括所述手写汉字的目标图像;Processing the original image using a kernel density estimation algorithm, removing the background picture, and obtaining a target image including the handwritten Chinese character;
    采用文字定位技术对所述目标图像进行文字定位,获取待识别文字区域;Text positioning the target image using text positioning technology to obtain the text area to be recognized;
    将待识别文字区域输入到目标手写字识别模型中进行识别，获取每一所述待识别文字区域对应的手写汉字；其中，目标手写字识别模型是采用权利要求1-3任意一项所述中文模型训练方法获取的。Inputting the to-be-recognized text regions into a target handwriting recognition model for recognition to obtain the handwritten Chinese character corresponding to each to-be-recognized text region, wherein the target handwriting recognition model is obtained by using the Chinese model training method according to any one of claims 1 to 3.
  19. 如权利要求18所述的非易失性可读存储介质,其特征在于,采用核密度估计算法对所述原始图像进行处理,获取保留所述手写汉字的目标图像,包括:The non-volatile readable storage medium of claim 18, wherein processing the original image using a kernel density estimation algorithm to obtain a target image retaining the handwritten Chinese characters comprises:
    对所述原始图像中的像素值进行统计,获取原始图像直方图;Performing statistics on pixel values in the original image to obtain a histogram of the original image;
    采用高斯核密度估算方法对所述原始图像直方图进行处理,获取与原始图像直方图对应的至少一个频率极大值和至少一个频率极小值;Processing the original image histogram using a Gaussian kernel density estimation method to obtain at least one frequency maximum and at least one frequency minimum corresponding to the original image histogram;
    基于所述频率极大值和频率极小值对所述原始图像进行分层切分处理,获取分层图像;Performing hierarchical segmentation processing on the original image based on the frequency maximum and frequency minimum to obtain a layered image;
    基于所述分层图像,获取包括所述手写汉字的目标图像。Based on the layered image, a target image including the handwritten Chinese character is acquired.
  20. 如权利要求19所述的非易失性可读存储介质,其特征在于,所述基于所述分层图像,获取包括所述手写汉字的目标图像,包括:The non-volatile readable storage medium according to claim 19, wherein the acquiring a target image including the handwritten Chinese character based on the layered image comprises:
    对所述分层图像进行二值化处理,获取二值化图像;Performing a binarization process on the layered image to obtain a binarized image;
    对所述二值化图像中的像素进行检测标记,获取所述二值化图像对应的连通区域;Detect and mark pixels in the binarized image to obtain a connected area corresponding to the binarized image;
    对所述二值化图像对应的连通区域进行腐蚀和叠加处理,获取所述包括手写汉字的目标图像。Eroding and superimposing the connected area corresponding to the binary image to obtain the target image including handwritten Chinese characters.
PCT/CN2018/094235 2018-06-04 2018-07-03 Chinese model training method, chinese image recognition method, device, apparatus and medium WO2019232853A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810563508.0 2018-06-04
CN201810563508.0A CN109102037B (en) 2018-06-04 2018-06-04 Chinese model training and Chinese image recognition method, device, equipment and medium

Publications (1)

Publication Number Publication Date
WO2019232853A1 true WO2019232853A1 (en) 2019-12-12

Family

ID=64796652

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/094235 WO2019232853A1 (en) 2018-06-04 2018-07-03 Chinese model training method, chinese image recognition method, device, apparatus and medium

Country Status (2)

Country Link
CN (1) CN109102037B (en)
WO (1) WO2019232853A1 (en)

Cited By (33)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111190576A (en) * 2019-12-17 2020-05-22 平安医疗健康管理股份有限公司 Character recognition-based component set display method and device and computer equipment
CN111275120A (en) * 2020-01-22 2020-06-12 支付宝(杭州)信息技术有限公司 Training method and device of image recognition model, and image recognition method and device
CN111291758A (en) * 2020-02-17 2020-06-16 北京百度网讯科技有限公司 Method and device for identifying characters of seal
CN111310846A (en) * 2020-02-28 2020-06-19 平安科技(深圳)有限公司 Method, device, storage medium and server for selecting sample image
CN111310868A (en) * 2020-03-13 2020-06-19 厦门大学 Water-based handwritten character recognition method based on convolutional neural network
CN111401363A (en) * 2020-03-12 2020-07-10 上海眼控科技股份有限公司 Frame number image generation method and device, computer equipment and storage medium
CN111401375A (en) * 2020-03-09 2020-07-10 苏宁云计算有限公司 Text recognition model training method, text recognition device and text recognition equipment
CN111507929A (en) * 2020-04-15 2020-08-07 上海眼控科技股份有限公司 Meteorological cloud picture prediction method and device, computer equipment and storage medium
CN111738141A (en) * 2020-06-19 2020-10-02 首都师范大学 Hard-tipped writing calligraphy work judging method
CN111814539A (en) * 2020-05-28 2020-10-23 平安科技(深圳)有限公司 Character recognition method and device based on infrared light and ultraviolet light and computer equipment
CN111861990A (en) * 2020-06-10 2020-10-30 宜通世纪物联网研究院(广州)有限公司 Method, system and storage medium for detecting bad appearance of product
CN111860682A (en) * 2020-07-30 2020-10-30 上海高德威智能交通系统有限公司 Sequence identification method, sequence identification device, image processing equipment and storage medium
CN111881727A (en) * 2020-06-16 2020-11-03 深圳数联天下智能科技有限公司 Live body discrimination method, device and equipment based on thermal imaging and storage medium
CN112001482A (en) * 2020-08-14 2020-11-27 佳都新太科技股份有限公司 Vibration prediction and model training method and device, computer equipment and storage medium
CN112101344A (en) * 2020-08-25 2020-12-18 腾讯科技(深圳)有限公司 Video text tracking method and device
CN112183335A (en) * 2020-09-28 2021-01-05 中国人民大学 Handwritten image recognition method and system based on unsupervised learning
CN112241994A (en) * 2020-09-28 2021-01-19 北京迈格威科技有限公司 Model training method, rendering device, electronic equipment and storage medium
CN112580623A (en) * 2020-12-25 2021-03-30 北京百度网讯科技有限公司 Image generation method, model training method, related device and electronic equipment
CN112732943A (en) * 2021-01-20 2021-04-30 北京大学 Chinese character library automatic generation method and system based on reinforcement learning
CN112784845A (en) * 2021-01-12 2021-05-11 安徽淘云科技有限公司 Handwritten character detection method, electronic equipment and storage device
CN112801085A (en) * 2021-02-09 2021-05-14 沈阳麟龙科技股份有限公司 Method, device, medium and electronic equipment for recognizing characters in image
CN113204984A (en) * 2020-10-10 2021-08-03 河南中医药大学 Traditional Chinese medicine handwritten prescription identification method under small amount of labeled data
CN113269045A (en) * 2021-04-28 2021-08-17 南京大学 Chinese artistic word detection and recognition method under natural scene
CN113362249A (en) * 2021-06-24 2021-09-07 平安普惠企业管理有限公司 Text image synthesis method and device, computer equipment and storage medium
CN113378609A (en) * 2020-03-10 2021-09-10 中国移动通信集团辽宁有限公司 Method and device for identifying agent signature
CN113436222A (en) * 2021-05-31 2021-09-24 新东方教育科技集团有限公司 Image processing method, image processing apparatus, electronic device, and storage medium
CN113505784A (en) * 2021-06-11 2021-10-15 清华大学 Automatic nail annotation analysis method and device, electronic equipment and storage medium
CN113792723A (en) * 2021-09-08 2021-12-14 浙江力石科技股份有限公司 Optimization method and system for litho character recognition
CN114140796A (en) * 2021-11-30 2022-03-04 马鞍山学院 Shaft part surface character visual identification method based on linear array camera
CN114399772A (en) * 2021-12-20 2022-04-26 北京百度网讯科技有限公司 Sample generation, model training and trajectory recognition methods, devices, equipment and medium
CN114549296A (en) * 2022-04-21 2022-05-27 北京世纪好未来教育科技有限公司 Training method of image processing model, image processing method and electronic equipment
CN115424274A (en) * 2022-09-01 2022-12-02 中国海洋大学 Sea-wading picture recognition method and system based on computer vision
CN117218667A (en) * 2023-11-07 2023-12-12 华侨大学 Chinese character recognition method and system based on character roots

Families Citing this family (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109840524B (en) * 2019-01-04 2023-07-11 平安科技(深圳)有限公司 Text type recognition method, device, equipment and storage medium
CN109858409A (en) * 2019-01-18 2019-06-07 深圳壹账通智能科技有限公司 Manual figure conversion method, device, equipment and medium
CN111488877A (en) * 2019-01-29 2020-08-04 北京新唐思创教育科技有限公司 OCR recognition method, device and terminal for teaching system
CN109902678A (en) * 2019-02-12 2019-06-18 北京奇艺世纪科技有限公司 Model training method, character recognition method, device, electronic equipment and computer-readable medium
CN111626313B (en) * 2019-02-28 2023-06-02 银河水滴科技(北京)有限公司 Feature extraction model training method, image processing method and device
CN110110585B (en) * 2019-03-15 2023-05-30 西安电子科技大学 Intelligent paper reading implementation method and system based on deep learning and computer program
CN110162459A (en) * 2019-04-15 2019-08-23 深圳壹账通智能科技有限公司 Test cases generation method, device and computer readable storage medium
CN110210297B (en) * 2019-04-25 2023-12-26 上海海事大学 Method for locating and extracting Chinese characters in customs clearance image
CN112183563A (en) * 2019-07-01 2021-01-05 Tcl集团股份有限公司 Image recognition model generation method, storage medium and application server
CN112307820B (en) * 2019-07-29 2022-03-22 北京易真学思教育科技有限公司 Text recognition method, device, equipment and computer readable medium
CN110751034B (en) * 2019-09-16 2023-09-01 平安科技(深圳)有限公司 Pedestrian behavior recognition method and terminal equipment
CN111078073B (en) * 2019-12-17 2021-03-23 科大讯飞股份有限公司 Handwriting amplification method and related device
CN111368632A (en) * 2019-12-27 2020-07-03 上海眼控科技股份有限公司 Signature identification method and device
CN111310808B (en) * 2020-02-03 2024-03-22 平安科技(深圳)有限公司 Training method and device for picture recognition model, computer system and storage medium
CN111414916B (en) * 2020-02-29 2024-05-31 中国平安财产保险股份有限公司 Method and device for extracting and generating text content in image and readable storage medium
CN111898603A (en) * 2020-08-10 2020-11-06 上海瑞美锦鑫健康管理有限公司 Physical examination order recognition method and system based on deep neural network
CN112149678A (en) * 2020-09-17 2020-12-29 支付宝实验室(新加坡)有限公司 Character recognition method and device for special language and recognition model training method and device
CN112132050B (en) * 2020-09-24 2024-03-29 北京计算机技术及应用研究所 On-line handwritten Chinese character recognition algorithm and visual key stroke evaluation method
CN112990220B (en) * 2021-04-19 2022-08-05 烟台中科网络技术研究所 Intelligent identification method and system for target text in image
CN113361666B (en) * 2021-06-15 2023-10-10 浪潮金融信息技术有限公司 Handwritten character recognition method, system and medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100056457A1 (en) * 2005-08-11 2010-03-04 Barbas Iii Carlos F Zinc Finger Binding Domains for CNN
CN105574513A (en) * 2015-12-22 2016-05-11 北京旷视科技有限公司 Character detection method and device
CN107798327A (en) * 2017-10-31 2018-03-13 北京小米移动软件有限公司 Character identifying method and device
CN107832400A (en) * 2017-11-01 2018-03-23 山东大学 A kind of method that location-based LSTM and CNN conjunctive models carry out relation classification

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105184226A (en) * 2015-08-11 2015-12-23 北京新晨阳光科技有限公司 Digital identification method, digital identification device, neural network training method and neural network training device
CN106408038A (en) * 2016-09-09 2017-02-15 华南理工大学 Rotary Chinese character identifying method based on convolution neural network model
CN106531157B (en) * 2016-10-28 2019-10-22 中国科学院自动化研究所 Regularization accent adaptive approach in speech recognition

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100056457A1 (en) * 2005-08-11 2010-03-04 Barbas Iii Carlos F Zinc Finger Binding Domains for CNN
CN105574513A (en) * 2015-12-22 2016-05-11 北京旷视科技有限公司 Character detection method and device
CN107798327A (en) * 2017-10-31 2018-03-13 北京小米移动软件有限公司 Character identifying method and device
CN107832400A (en) * 2017-11-01 2018-03-23 山东大学 A kind of method that location-based LSTM and CNN conjunctive models carry out relation classification

Cited By (49)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111190576A (en) * 2019-12-17 2020-05-22 平安医疗健康管理股份有限公司 Character recognition-based component set display method and device and computer equipment
CN111190576B (en) * 2019-12-17 2022-09-23 深圳平安医疗健康科技服务有限公司 Character recognition-based component set display method and device and computer equipment
CN111275120B (en) * 2020-01-22 2022-07-26 支付宝(杭州)信息技术有限公司 Training method and device of image recognition model and image recognition method and device
CN111275120A (en) * 2020-01-22 2020-06-12 支付宝(杭州)信息技术有限公司 Training method and device of image recognition model, and image recognition method and device
CN111291758A (en) * 2020-02-17 2020-06-16 北京百度网讯科技有限公司 Method and device for identifying characters of seal
CN111310846A (en) * 2020-02-28 2020-06-19 平安科技(深圳)有限公司 Method, device, storage medium and server for selecting sample image
CN111401375A (en) * 2020-03-09 2020-07-10 苏宁云计算有限公司 Text recognition model training method, text recognition device and text recognition equipment
CN113378609B (en) * 2020-03-10 2023-07-21 中国移动通信集团辽宁有限公司 Agent proxy signature identification method and device
CN113378609A (en) * 2020-03-10 2021-09-10 中国移动通信集团辽宁有限公司 Method and device for identifying agent signature
CN111401363A (en) * 2020-03-12 2020-07-10 上海眼控科技股份有限公司 Frame number image generation method and device, computer equipment and storage medium
CN111310868A (en) * 2020-03-13 2020-06-19 厦门大学 Water-based handwritten character recognition method based on convolutional neural network
CN111507929A (en) * 2020-04-15 2020-08-07 上海眼控科技股份有限公司 Meteorological cloud picture prediction method and device, computer equipment and storage medium
CN111814539A (en) * 2020-05-28 2020-10-23 平安科技(深圳)有限公司 Character recognition method and device based on infrared light and ultraviolet light and computer equipment
CN111814539B (en) * 2020-05-28 2023-07-21 平安科技(深圳)有限公司 Character recognition method and device based on infrared light and ultraviolet light and computer equipment
CN111861990A (en) * 2020-06-10 2020-10-30 宜通世纪物联网研究院(广州)有限公司 Method, system and storage medium for detecting bad appearance of product
CN111861990B (en) * 2020-06-10 2024-02-13 广东宜通联云智能信息有限公司 Method, system and storage medium for detecting bad appearance of product
CN111881727A (en) * 2020-06-16 2020-11-03 深圳数联天下智能科技有限公司 Live body discrimination method, device and equipment based on thermal imaging and storage medium
CN111881727B (en) * 2020-06-16 2024-02-06 深圳数联天下智能科技有限公司 Living body screening method, device, equipment and storage medium based on thermal imaging
CN111738141A (en) * 2020-06-19 2020-10-02 首都师范大学 Hard-tipped writing calligraphy work judging method
CN111738141B (en) * 2020-06-19 2023-07-07 首都师范大学 Hard-tipped pen calligraphy work judging method
CN111860682A (en) * 2020-07-30 2020-10-30 上海高德威智能交通系统有限公司 Sequence identification method, sequence identification device, image processing equipment and storage medium
CN112001482B (en) * 2020-08-14 2024-05-24 佳都科技集团股份有限公司 Vibration prediction and model training method, device, computer equipment and storage medium
CN112001482A (en) * 2020-08-14 2020-11-27 佳都新太科技股份有限公司 Vibration prediction and model training method and device, computer equipment and storage medium
CN112101344A (en) * 2020-08-25 2020-12-18 腾讯科技(深圳)有限公司 Video text tracking method and device
CN112241994B (en) * 2020-09-28 2024-05-31 爱芯元智半导体股份有限公司 Model training method, rendering method, device, electronic equipment and storage medium
CN112241994A (en) * 2020-09-28 2021-01-19 北京迈格威科技有限公司 Model training method, rendering device, electronic equipment and storage medium
CN112183335A (en) * 2020-09-28 2021-01-05 中国人民大学 Handwritten image recognition method and system based on unsupervised learning
CN113204984A (en) * 2020-10-10 2021-08-03 河南中医药大学 Traditional Chinese medicine handwritten prescription identification method under small amount of labeled data
CN112580623A (en) * 2020-12-25 2021-03-30 北京百度网讯科技有限公司 Image generation method, model training method, related device and electronic equipment
CN112580623B (en) * 2020-12-25 2023-07-25 北京百度网讯科技有限公司 Image generation method, model training method, related device and electronic equipment
CN112784845A (en) * 2021-01-12 2021-05-11 安徽淘云科技有限公司 Handwritten character detection method, electronic equipment and storage device
CN112732943A (en) * 2021-01-20 2021-04-30 北京大学 Chinese character library automatic generation method and system based on reinforcement learning
CN112732943B (en) * 2021-01-20 2023-09-22 北京大学 Chinese character library automatic generation method and system based on reinforcement learning
CN112801085A (en) * 2021-02-09 2021-05-14 沈阳麟龙科技股份有限公司 Method, device, medium and electronic equipment for recognizing characters in image
CN113269045A (en) * 2021-04-28 2021-08-17 南京大学 Chinese artistic word detection and recognition method under natural scene
CN113436222A (en) * 2021-05-31 2021-09-24 新东方教育科技集团有限公司 Image processing method, image processing apparatus, electronic device, and storage medium
CN113505784A (en) * 2021-06-11 2021-10-15 清华大学 Automatic nail annotation analysis method and device, electronic equipment and storage medium
CN113362249A (en) * 2021-06-24 2021-09-07 平安普惠企业管理有限公司 Text image synthesis method and device, computer equipment and storage medium
CN113362249B (en) * 2021-06-24 2023-11-24 广州云智达创科技有限公司 Text image synthesis method, text image synthesis device, computer equipment and storage medium
CN113792723A (en) * 2021-09-08 2021-12-14 浙江力石科技股份有限公司 Optimization method and system for litho character recognition
CN113792723B (en) * 2021-09-08 2024-01-16 浙江力石科技股份有限公司 Optimization method and system for identifying stone carving characters
CN114140796A (en) * 2021-11-30 2022-03-04 马鞍山学院 Shaft part surface character visual identification method based on linear array camera
CN114399772A (en) * 2021-12-20 2022-04-26 北京百度网讯科技有限公司 Sample generation, model training and trajectory recognition methods, devices, equipment and medium
CN114399772B (en) * 2021-12-20 2024-02-27 北京百度网讯科技有限公司 Sample generation, model training and track recognition methods, devices, equipment and media
CN114549296B (en) * 2022-04-21 2022-07-12 北京世纪好未来教育科技有限公司 Training method of image processing model, image processing method and electronic equipment
CN114549296A (en) * 2022-04-21 2022-05-27 北京世纪好未来教育科技有限公司 Training method of image processing model, image processing method and electronic equipment
CN115424274A (en) * 2022-09-01 2022-12-02 中国海洋大学 Sea-wading picture recognition method and system based on computer vision
CN117218667A (en) * 2023-11-07 2023-12-12 华侨大学 Chinese character recognition method and system based on character roots
CN117218667B (en) * 2023-11-07 2024-03-08 华侨大学 Chinese character recognition method and system based on character roots

Also Published As

Publication number Publication date
CN109102037B (en) 2024-03-05
CN109102037A (en) 2018-12-28

Similar Documents

Publication Publication Date Title
WO2019232853A1 (en) Chinese model training method, chinese image recognition method, device, apparatus and medium
CN108710866B (en) Chinese character model training method, chinese character recognition method, device, equipment and medium
WO2019232843A1 (en) Handwritten model training method and apparatus, handwritten image recognition method and apparatus, and device and medium
WO2021120752A1 (en) Region-based self-adaptive model training method and device, image detection method and device, and apparatus and medium
WO2019232852A1 (en) Handwriting training sample obtaining method and apparatus, and device and medium
WO2019232873A1 (en) Character model training method, character recognition method, apparatuses, device and medium
WO2019232872A1 (en) Handwritten character model training method, chinese character recognition method, apparatus, device, and medium
WO2019232850A1 (en) Method and apparatus for recognizing handwritten chinese character image, computer device, and storage medium
Goodfellow et al. Multi-digit number recognition from street view imagery using deep convolutional neural networks
US9367766B2 (en) Text line detection in images
CN110838126B (en) Cell image segmentation method, cell image segmentation device, computer equipment and storage medium
WO2019232849A1 (en) Chinese character model training method, handwritten character recognition method, apparatuses, device and medium
Zhang et al. Road recognition from remote sensing imagery using incremental learning
Vanetti et al. Gas meter reading from real world images using a multi-net system
WO2019232870A1 (en) Method for acquiring handwritten character training sample, apparatus, computer device, and storage medium
US11144799B2 (en) Image classification method, computer device and medium
CN116596875B (en) Wafer defect detection method and device, electronic equipment and storage medium
CN109189965A (en) Pictograph search method and system
CN113780145A (en) Sperm morphology detection method, sperm morphology detection device, computer equipment and storage medium
He et al. Aggregating local context for accurate scene text detection
Yang et al. An improved algorithm for the detection of fastening targets based on machine vision
CN115908363B (en) Tumor cell statistics method, device, equipment and storage medium
CN117115824A (en) Visual text detection method based on stroke region segmentation strategy
Li et al. A pre-training strategy for convolutional neural network applied to Chinese digital gesture recognition
KR20190093752A (en) Method and system for scene text detection using deep learning

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18922025

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 11/03/2021)

122 Ep: pct application non-entry in european phase

Ref document number: 18922025

Country of ref document: EP

Kind code of ref document: A1