WO2019232874A1 - Chinese character model training method, Chinese character recognition method, apparatus, device, and medium - Google Patents

Chinese character model training method, Chinese character recognition method, apparatus, device, and medium

Info

Publication number
WO2019232874A1
WO2019232874A1 · PCT/CN2018/094405
Authority
WO
WIPO (PCT)
Prior art keywords
image
neural network
network model
recurrent neural
handwriting
Prior art date
Application number
PCT/CN2018/094405
Other languages
English (en)
French (fr)
Inventor
吴启
周罡
Original Assignee
平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology (Shenzhen) Co., Ltd. (平安科技(深圳)有限公司)
Publication of WO2019232874A1 publication Critical patent/WO2019232874A1/zh


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/30 Writer recognition; Reading and verifying signatures
    • G06V40/33 Writer recognition; Reading and verifying signatures based only on signature image, e.g. static signature recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Definitions

  • the present application relates to the field of handwriting recognition, and in particular to a Chinese character model training method, a Chinese character recognition method, an apparatus, a computer device, and a storage medium.
  • OCR (optical character recognition)
  • the training set is input into a convolutional recurrent neural network model to obtain the forward output and the backward output of the convolutional recurrent neural network model;
  • according to the forward output and the backward output of the convolutional recurrent neural network model, a back-propagation algorithm based on the connectionist temporal classification (CTC) algorithm is used to update the weights and biases in the convolutional recurrent neural network model to obtain an initial handwriting recognition model;
  • the test set is input into the initial handwriting recognition model to obtain a recognition accuracy rate; if the recognition accuracy rate is greater than a preset accuracy rate, the initial handwriting recognition model is determined to be the target handwriting recognition model.
  • a model initialization module, configured to initialize the weights and biases of the convolutional recurrent neural network model;
  • a training sample processing module, configured to obtain font image training samples, label the handwritten images in the font image training samples using the Chinese secondary font library, and divide the font image training samples into a training set and a test set according to a preset allocation rule;
  • an initial model acquisition module, configured to input the training set into a convolutional recurrent neural network model, obtain the forward output and backward output of the convolutional recurrent neural network model, and, according to the forward output and backward output, update the weights and biases in the convolutional recurrent neural network model using a back-propagation algorithm based on the connectionist temporal classification algorithm to obtain an initial handwriting recognition model;
  • a target model acquisition module, configured to input the test set into the initial handwriting recognition model to obtain a recognition accuracy rate; if the recognition accuracy rate is greater than a preset accuracy rate, the initial handwriting recognition model is determined to be the target handwriting recognition model.
  • a computer device includes a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor.
  • the processor executes the computer-readable instructions, the following steps are implemented:
  • the training set is input into a convolutional recurrent neural network model to obtain the forward output and the backward output of the convolutional recurrent neural network model;
  • according to the forward output and the backward output of the convolutional recurrent neural network model, a back-propagation algorithm based on the connectionist temporal classification algorithm is used to update the weights and biases in the convolutional recurrent neural network model to obtain an initial handwriting recognition model;
  • the test set is input into the initial handwriting recognition model to obtain a recognition accuracy rate; if the recognition accuracy rate is greater than a preset accuracy rate, the initial handwriting recognition model is determined to be the target handwriting recognition model.
  • One or more non-volatile readable storage media storing computer-readable instructions, which when executed by one or more processors, cause the one or more processors to perform the following steps:
  • the training set is input into a convolutional recurrent neural network model to obtain the forward output and the backward output of the convolutional recurrent neural network model;
  • according to the forward output and the backward output of the convolutional recurrent neural network model, a back-propagation algorithm based on the connectionist temporal classification algorithm is used to update the weights and biases in the convolutional recurrent neural network model to obtain an initial handwriting recognition model;
  • the test set is input into the initial handwriting recognition model to obtain a recognition accuracy rate; if the recognition accuracy rate is greater than a preset accuracy rate, the initial handwriting recognition model is determined to be the target handwriting recognition model. In view of the above technical problems, it is also necessary to provide a Chinese character recognition method, apparatus, device, and medium with high recognition accuracy.
  • the text line image is input to a target handwriting recognition model for recognition, and a recognition result corresponding to the text line image is obtained.
  • the target handwriting recognition model is obtained by using the above-mentioned Chinese character model training method.
  • An original image acquisition module configured to acquire an original image, where the original image includes handwriting and a background image
  • An effective image acquisition module configured to pre-process the original image to obtain an effective image
  • a target image acquisition module configured to process the effective image by using a kernel density estimation algorithm and an erosion method, remove a background image, and obtain a target image including the handwriting;
  • a text line image acquisition module configured to use text positioning technology to perform text positioning on the target image to obtain a text line image
  • a recognition result acquisition module, configured to input the text line image into a target handwriting recognition model for recognition and obtain a recognition result corresponding to the text line image; the target handwriting recognition model is obtained using the above-mentioned Chinese character model training method.
  • a computer device includes a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor.
  • the processor executes the computer-readable instructions, the following steps are implemented:
  • the text line image is input to a target handwriting recognition model for recognition, and a recognition result corresponding to the text line image is obtained.
  • the target handwriting recognition model is obtained by using the above-mentioned Chinese character model training method.
  • One or more non-volatile readable storage media storing computer-readable instructions, wherein, when executed by one or more processors, the computer-readable instructions cause the one or more processors to perform the following steps:
  • the text line image is input to a target handwriting recognition model for recognition, and a recognition result corresponding to the text line image is obtained.
  • the target handwriting recognition model is obtained by using the above-mentioned Chinese character model training method.
  • FIG. 1 is an application scenario diagram of a Chinese character model training method according to an embodiment of the present application
  • FIG. 2 is a flowchart of a Chinese character model training method according to an embodiment of the present application.
  • FIG. 3 is a specific flowchart of step S30 in FIG. 2;
  • FIG. 4 is a schematic diagram of a Chinese character model training device according to an embodiment of the present application.
  • FIG. 5 is a flowchart of a Chinese character recognition method according to an embodiment of the present application.
  • FIG. 6 is a specific flowchart of step S52 in FIG. 5;
  • FIG. 7 is a specific flowchart of step S53 in FIG. 5;
  • FIG. 8 is a specific flowchart of step S534 in FIG. 7;
  • FIG. 9 is a schematic diagram of a Chinese character recognition device according to an embodiment of the present application.
  • FIG. 10 is a schematic diagram of a computer device according to an embodiment of the present application.
  • the Chinese character model training method provided in the embodiment of the present application can be applied in the application environment shown in FIG. 1.
  • the application environment of the Chinese character model training method includes a server and a client.
  • the client communicates with the server through a network.
  • the client is a device that can interact with the user, including, but not limited to, a computer, a smartphone, and a tablet.
  • the Chinese character model training method provided in the embodiment of the present application is applied to a server.
  • a Chinese character model training method includes the following steps:
  • the Convolutional-Recurrent Neural Networks (C-RNN) model is composed of a Convolutional Neural Networks (CNN) model and a Recurrent Neural Networks (RNN) model.
  • The forward output and backward output of the convolutional recurrent neural network model are the forward output and backward output of its recurrent neural network component.
  • The weights and biases are the connection parameters between the input layer, the hidden layer, and the output layer of the convolutional recurrent neural network model.
  • Initializing the weights and biases of the convolutional recurrent neural network model is a necessary step for model training; a reasonable initialization helps speed up training.
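  • As a concrete illustration, the initialization step might be sketched as follows. This is a minimal numpy sketch: the patent does not fix a particular scheme, so the Xavier-style uniform range and the example layer sizes are assumptions.

```python
import numpy as np

def init_layer(fan_in, fan_out, rng=np.random.default_rng(0)):
    """Initialize one layer's weights and biases.

    Xavier/Glorot-style uniform initialization is one reasonable choice
    (an assumption; the patent only says the parameters are initialized).
    """
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    W = rng.uniform(-limit, limit, size=(fan_out, fan_in))  # weights
    b = np.zeros(fan_out)                                   # biases start at zero
    return W, b

# Hypothetical layer sizes for illustration only.
W, b = init_layer(256, 128)
```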
  • S20: Obtain font image training samples, use the Chinese secondary font library to label the handwritten images in the font image training samples, and divide the font image training samples into a training set and a test set according to a preset allocation rule.
  • the server obtains font image training samples from the database to provide data sources for subsequent model training.
  • the font image training sample refers to a handwriting sample used to train a neural network model, and includes multiple handwriting images.
  • the handwriting image refers to an image carrying Chinese characters written by different people.
  • The standard fonts in the Chinese secondary font library are used to label the handwritten images in the font image training samples, yielding the labelled Chinese character associated with each handwritten image.
  • A labelled Chinese character is the standard-font character from the Chinese secondary font library that matches a handwritten image. Standard fonts include, but are not limited to, Song, Kai, and imitation Song.
  • For example, if the handwritten images in the font image training samples are the characters for "forbearance", "hunger", "starving", and "hungry" written by different people, each handwritten image is labelled with the matching Song, Kai, or imitation-Song character from the Chinese secondary font library, and that character becomes the labelled Chinese character corresponding to the handwritten image.
  • a training set is data for adjusting parameters in a convolutional recurrent neural network model.
  • a test set is data used to test the recognition accuracy of a trained convolutional recurrent neural network model.
  • a ten-fold cross-validation method is used to divide the font image training samples into a training set and a test set.
  • the ten-fold cross-validation method is a commonly used method to test the accuracy of the algorithm.
  • A ten-fold cross-validation method is used to divide the font image training samples at a 9:1 ratio.
  • The font image training samples are divided into 10 groups: 9 groups are used as the training set to train the convolutional recurrent neural network model, and the remaining group is used as the test set to verify the accuracy of the trained convolutional recurrent neural network model.
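  • The 9:1 split described above can be sketched in plain Python. The interleaved grouping below is an assumption; the text only fixes the 10 groups and the 9:1 ratio.

```python
def tenfold_split(samples, test_fold=0):
    """Divide samples into 10 groups; one group becomes the test set and
    the remaining nine form the training set (the 9:1 ratio above)."""
    folds = [samples[i::10] for i in range(10)]   # 10 interleaved groups
    test = folds[test_fold]
    train = [s for i, fold in enumerate(folds) if i != test_fold for s in fold]
    return train, test

# Illustration with 100 placeholder samples.
train, test = tenfold_split(list(range(100)))
```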
  • S30: Input the training set into the convolutional recurrent neural network model to obtain its forward output and backward output, and, according to the forward output and backward output, update the weights and biases in the convolutional recurrent neural network model using a back-propagation algorithm based on the connectionist temporal classification algorithm to obtain the initial handwriting recognition model.
  • CTC (Connectionist Temporal Classification) is an algorithm for training sequence models when the alignment between the input sequence and the output labels is unknown.
  • the initial handwriting recognition model refers to a model in which training samples of a font image in a training set are input to a convolutional recurrent neural network model for training.
  • The back-propagation algorithm adjusts the weights and biases between the hidden layer and the output layer, and between the input layer and the hidden layer, in the reverse order of the time steps.
  • The server labels the handwritten characters in each training-set image sequentially, so that every character in a handwritten image carries a corresponding sequence label.
  • For example, a handwritten image in the training set contains the handwritten line 北京欢迎你 ("Beijing welcomes you"). Labelling each character sequentially, 北 carries the sequence label "110", 京 carries "111", 欢 carries "112", 迎 carries "113", and 你 carries "114".
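  • A minimal sketch of this sequence labelling, assuming the handwritten line is 北京欢迎你 and the labels start at "110" as in the example:

```python
# Assign consecutive sequence labels to each character of the line.
line = list("北京欢迎你")  # the example handwritten line from the text
sequence_labels = {char: str(110 + i) for i, char in enumerate(line)}
# 北 -> "110", 京 -> "111", 欢 -> "112", 迎 -> "113", 你 -> "114"
```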
  • The handwritten image in the training set is input into the recurrent neural network model for training, and the hidden layer computes the corresponding forward output and backward output, where the forward output is the probability of the u-th handwritten character being output in chronological order,
  • and the backward output is the probability of the u-th handwritten character being output in reverse chronological order. For example, in 北京欢迎你, suppose the u-th handwritten character is 欢 and the output at time t-1 is 京. The output at time t is computed from the output 京 at time t-1 and the input 欢 at time t,
  • and the output at time t may be any of several visually similar candidate characters; the forward output is then the probability that the output at time t is 欢. Likewise, suppose the output at time t+1 is 迎: the output at time t is computed from the output 迎 at time t+1 and the input 欢 at time t, and the backward output is the probability that the output at time t is 欢.
  • Specifically, a handwritten image in the training set is a single-line image formed by three or more handwritten characters.
  • In the convolutional recurrent neural network model, the forward output and backward output of the handwritten image are fed to the output layer of the recurrent neural network component, and the output layer computes the target output from the forward output and the backward output.
  • The convolutional recurrent neural network model then constructs an error function from the target output and the labelled Chinese characters, and takes partial derivatives of the error function to update the weights and biases in the convolutional recurrent neural network model, yielding the initial handwriting recognition model.
  • Using the back-propagation algorithm based on the connectionist temporal classification algorithm to update the weights and biases means the updates are driven by the error function constructed from the single-line handwritten images in the training set.
  • This solves the sequence problem of uncertain input-output alignment, ensures that the initial handwriting recognition model is trained according to the time sequence, and improves the accuracy of model training.
  • S40 Input the test set into the initial handwriting recognition model to obtain the recognition accuracy rate. If the recognition accuracy rate is greater than a preset accuracy rate, determine the initial handwriting recognition model as the target handwriting recognition model.
  • The target handwriting recognition model is a model whose recognition accuracy rate, determined by testing the initial handwriting recognition model on the test set, meets the preset accuracy rate; it can be used to recognize handwritten images. After training of the initial handwriting recognition model is complete, the handwritten images of the handwriting training samples in the test set are input into the initial handwriting recognition model in turn to obtain its recognition accuracy rate.
  • Step S40 specifically includes the following steps: first, the handwritten images of the handwriting training samples in the test set are input into the initial handwriting recognition model in turn, and the recognized Chinese character corresponding to each handwritten image is obtained.
  • The recognized Chinese characters are then compared with the labelled Chinese characters to compute the recognition accuracy rate.
  • The preset accuracy rate is a preset threshold used to judge whether the accuracy of the initial handwriting recognition model meets the requirement. For example, if the preset accuracy rate is 82% and the recognition accuracy rate obtained on the test set is greater than 82% (such as 85% or 90%), the recognition accuracy of the initial handwriting recognition model on the handwriting training samples meets the requirement.
  • In that case, the initial handwriting recognition model can be determined to be the target handwriting recognition model.
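  • The test-set evaluation in step S40 can be sketched as follows. The predict function and the sample data below are hypothetical stand-ins; only the 82% example threshold comes from the text.

```python
def recognition_accuracy(predict, test_set):
    """Fraction of test images whose predicted character matches its label.

    predict is any image -> character function (here a stand-in for the
    initial handwriting recognition model)."""
    correct = sum(1 for image, label in test_set if predict(image) == label)
    return correct / len(test_set)

PRESET_ACCURACY = 0.82  # the example preset accuracy rate from the text

# Hypothetical test set and perfect stand-in predictor, for illustration.
samples = [("img%d" % i, "字") for i in range(10)]
predict = lambda image: "字"
acc = recognition_accuracy(predict, samples)
is_target_model = acc > PRESET_ACCURACY  # model is accepted only above threshold
```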
  • A training set is input into the convolutional recurrent neural network model to obtain a forward output and a backward output, from which a target output is calculated.
  • A back-propagation algorithm based on the connectionist temporal classification algorithm then updates the weights and biases in the convolutional recurrent neural network model to obtain the handwriting training model, which effectively improves the accuracy of model training.
  • The test set is input into the handwriting training model for testing. If the recognition accuracy of the handwriting training model on the handwriting training samples is greater than the preset accuracy rate, its recognition accuracy meets the requirement.
  • The handwriting training model is then determined to be the target handwriting recognition model for recognizing handwritten images, so that the resulting target handwriting recognition model recognizes handwriting with high accuracy.
  • the convolutional recurrent neural network model is a neural network model composed of a convolutional neural network model and a recurrent neural network model
  • Convolutional neural network models and recurrent neural network models are used for model training.
  • the training set is input into the convolutional recurrent neural network model, and the forward output and the backward output of the convolutional recurrent neural network model are obtained.
  • S31 Input the handwritten image in the training set into the convolutional neural network model, and obtain the handwritten image feature corresponding to the handwritten image in the training set.
  • the convolutional neural network model includes multiple layers of convolutional layers and pooling layers.
  • The server inputs the handwritten images of the training-set font image training samples into the convolutional neural network model; through the computation of each convolutional layer, the output of each convolutional layer is obtained.
  • The output of the final layer gives the handwriting image features corresponding to the handwritten images.
  • If layer l is a convolutional layer, its output is Z_m^l = W^l * a_m^{l-1} + b^l followed by a_m^l = σ(Z_m^l), where:
  • Z_m^l represents the output for the m-th sequence label before the activation function is applied;
  • a_m^{l-1} represents the output for the m-th sequence label at layer l-1 (that is, the output of the previous layer);
  • σ represents the activation function;
  • the activation function σ used for the convolutional layers is ReLU (Rectified Linear Unit), which performs better here than other activation functions;
  • W^l represents the convolution kernel (weights) of layer l;
  • b^l represents the bias of layer l. If layer l is a pooling layer, max-pooling down-sampling is used to reduce the dimension of the convolutional layer's output.
  • T(m) represents the output of the output layer of the convolutional neural network model, i.e. the handwriting image feature corresponding to the m-th sequence label.
  • Each handwriting image feature carries a sequence label, and the sequence label of the handwriting image feature is consistent with that of the corresponding handwritten image.
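  • A minimal numpy sketch of one convolutional layer (Z = W convolved with the previous layer's output plus the bias, then ReLU) and one max-pooling layer, as described above. The 'valid' padding and the single kernel are simplifying assumptions, not details fixed by the text.

```python
import numpy as np

def relu(z):
    """The activation function sigma: ReLU."""
    return np.maximum(z, 0.0)

def conv_layer(a_prev, W, b):
    """One convolutional layer: Z = W (*) a_prev + b, then a = ReLU(Z).
    Minimal 'valid' 2-D cross-correlation with a single kernel W."""
    kh, kw = W.shape
    H, Wd = a_prev.shape
    Z = np.array([[np.sum(a_prev[i:i + kh, j:j + kw] * W) + b
                   for j in range(Wd - kw + 1)]
                  for i in range(H - kh + 1)])
    return relu(Z)

def max_pool(a, size=2):
    """Max-pooling down-sampling used by the pooling layers."""
    H, Wd = a.shape
    return np.array([[a[i:i + size, j:j + size].max()
                      for j in range(0, Wd - size + 1, size)]
                     for i in range(0, H - size + 1, size)])
```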
  • The handwriting image features corresponding to the handwritten images in the training set are input into the recurrent neural network model for training to obtain the forward output and backward output of the recurrent neural network model.
  • The forward output of the recurrent neural network model follows the CTC forward recursion a(t,u) = y(t, l'_u) · Σ_{i=f(u)}^{u} a(t-1,i), where a(t,u) represents the forward output corresponding to the u-th handwriting image feature at time t, y(t, l'_u) represents the probability of outputting the symbol l'_u (a Chinese character or a space) at time t, l' represents the label sequence with spaces interleaved, so that its total length covers the handwritten characters and the spaces, a(t-1,i) represents the forward output of the i-th position at time t-1, and f(u) = u-1 if l'_u is a space or l'_{u-2} = l'_u, and f(u) = u-2 otherwise.
  • The backward output of the recurrent neural network model follows the recursion b(t,u) = Σ_{i=u}^{g(u)} b(t+1,i) · y(t+1, l'_i), where b(t,u) represents the backward output corresponding to the u-th handwriting image feature at time t, y(t+1, l'_i) represents the probability of outputting the symbol l'_i at time t+1, and g(u) = u+1 if l'_u is a space or l'_{u+2} = l'_u, and g(u) = u+2 otherwise.
  • Here the space refers to the blank between adjacent Chinese characters.
  • U' represents the weights between the pooling layer and the hidden layer of the recurrent neural network model,
  • W' represents the weights between hidden layers, b' represents the bias between the input layer and the hidden layer, and T(m) represents the m-th sequence-labelled feature received by the input layer of the recurrent neural network model.
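  • The forward recursion can be sketched numerically as follows: a standard CTC forward pass over the space-extended label sequence, with spaces playing the role of CTC blanks. The per-time-step output probabilities y are an assumed input, and the label sequence is assumed non-empty.

```python
import numpy as np

def ctc_forward(y, labels, blank=0):
    """CTC forward variables a(t, u) over the blank-extended sequence l'.

    y[t, k] is the probability of symbol k at time t; labels is the
    non-empty target sequence (blanks/spaces are interleaved here)."""
    l = [blank]
    for c in labels:
        l += [c, blank]                          # interleave blanks: l'
    T, U = y.shape[0], len(l)
    alpha = np.zeros((T, U))
    alpha[0, 0] = y[0, l[0]]                     # start with a blank ...
    alpha[0, 1] = y[0, l[1]]                     # ... or the first label
    for t in range(1, T):
        for u in range(U):
            s = alpha[t - 1, u]                  # stay on l'_u
            if u >= 1:
                s += alpha[t - 1, u - 1]         # advance from l'_{u-1}
            if u >= 2 and l[u] != blank and l[u] != l[u - 2]:
                s += alpha[t - 1, u - 2]         # skip the blank in between
            alpha[t, u] = y[t, l[u]] * s
    return alpha

# p(z|x) is the sum of the last two forward variables at the final step.
```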
  • S33: Construct a loss function from the forward output and backward output of the recurrent neural network model and, according to the loss function, update the weights and biases in the recurrent neural network model and the convolutional neural network model using the back-propagation algorithm based on the connectionist temporal classification algorithm to obtain the initial handwriting recognition model.
  • The specific expression of the loss function is E_loss(x,z) = -ln Σ_{u=1}^{|z'|} a(t,u) · b(t,u), where x represents the input, z represents the output corresponding to the input x, u indexes the u-th Chinese character, |z'| represents the length of the space-interleaved label sequence, a(t,u) represents the forward output corresponding to the u-th Chinese character at time t, and b(t,u) represents the backward output corresponding to the u-th Chinese character at time t.
  • The convolutional neural network model feeds the handwriting image features into the hidden layer of the recurrent neural network model; the forward recursion gives the forward output of the handwritten image at the hidden layer, the backward recursion gives its backward output, and both are then passed to the output layer.
  • The output layer of the recurrent neural network model computes the target output of the image from the forward and backward outputs produced by the hidden layer.
  • After the target output is obtained, the target output and the labelled Chinese characters are substituted into the loss function, and the error E_loss(x,z) of the handwritten image corresponding to a single line of handwriting is obtained.
  • After E_loss(x,z) is obtained, the partial derivatives ∂E_loss(x,z)/∂θ are computed, where θ represents the set of weights and biases in the convolutional recurrent neural network model, and the weights and biases in the recurrent neural network model and the convolutional neural network model are updated accordingly to obtain the initial handwriting recognition model.
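  • The final update, which moves every weight and bias against its partial derivative of E_loss, can be sketched as plain gradient descent. The learning rate and the dictionary representation of θ are assumptions; the text only specifies that the parameters are updated from the partial derivatives.

```python
def sgd_step(theta, grad, lr=0.01):
    """One back-propagation update: theta <- theta - lr * dE_loss/dtheta
    for every weight/bias in theta (plain gradient descent)."""
    return {name: value - lr * grad[name] for name, value in theta.items()}

# Hypothetical parameters and gradients, for illustration.
theta = {"W": 1.0, "b": 0.5}
grad = {"W": 2.0, "b": -1.0}
theta = sgd_step(theta, grad, lr=0.1)
# W: 1.0 - 0.1*2.0 = 0.8, b: 0.5 - 0.1*(-1.0) = 0.6
```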
  • Steps S31-S33: the handwriting image features corresponding to the handwritten images in the training set are obtained through the convolutional neural network model; the features are then input into the recurrent neural network model for training to obtain the forward output and backward output; a loss function is constructed from the forward output, the backward output, and the labelled Chinese characters; finally, according to the loss function, the back-propagation algorithm based on the connectionist temporal classification algorithm updates the weights and biases in the recurrent neural network model and the convolutional neural network model to obtain the initial handwriting recognition model, ensuring the accuracy and speed of model training.
  • A training set is input into the convolutional recurrent neural network model: the convolutional neural network model extracts the handwriting image features corresponding to the handwritten images, and the features are then fed into the recurrent neural network model.
  • The back-propagation algorithm based on the connectionist temporal classification algorithm updates the weights and biases in the convolutional recurrent neural network model according to the time sequence of the handwritten images, so each character is recognized through its relationship with adjacent characters, which effectively improves the accuracy of the initial handwriting recognition model.
  • The test set is input into the initial handwriting recognition model for testing. If the recognition accuracy of the initial handwriting recognition model on the font image training samples is greater than the preset accuracy rate, its recognition accuracy meets the requirement.
  • The initial handwriting recognition model is then determined to be the target handwriting recognition model for recognizing handwritten images.
  • The target handwriting recognition model therefore has high recognition accuracy.
  • A Chinese character model training device corresponds one-to-one to the Chinese character model training method in the above embodiment.
  • the Chinese character model training device includes a model initialization module 10, a training sample processing module 20, an initial model acquisition module 30, and a target model acquisition module 40.
  • the functional modules are described in detail as follows:
  • a model initialization module 10 is configured to initialize weights and biases of a convolutional recurrent neural network model.
  • The training sample processing module 20 is configured to obtain font image training samples, label the handwritten images in the font image training samples using the Chinese secondary font library, and divide the font image training samples into a training set and a test set according to a preset allocation rule.
  • The initial model acquisition module 30 is configured to input the training set into the convolutional recurrent neural network model, obtain the forward output and backward output of the convolutional recurrent neural network model, and, according to the forward output and backward output, update the weights and biases in the convolutional recurrent neural network model using a back-propagation algorithm based on the connectionist temporal classification algorithm to obtain the initial handwriting recognition model.
  • the target model acquisition module 40 is configured to input the test set into the initial handwriting recognition model to obtain a recognition accuracy rate. If the recognition accuracy rate is greater than a preset accuracy rate, determine the initial handwriting recognition model as the target handwriting recognition model.
  • the convolutional recurrent neural network model includes a convolutional neural network model and a recurrent neural network model.
  • the initial model acquisition module 30 includes an image feature acquisition unit 31, a model output acquisition unit 32, and an initial model acquisition unit 33.
  • the image feature acquiring unit 31 is configured to input the handwritten image in the training set into a convolutional neural network model, and obtain the handwritten image feature corresponding to the handwritten image in the training set.
  • a model output obtaining unit 32 is configured to input handwriting image features corresponding to the handwriting image in the training set into a recurrent neural network model for training, and obtain forward and backward outputs of the recurrent neural network model.
  • the formula for the forward output of the recurrent neural network model is a(t, u) = y(t, l′_u) · Σ_{i=f(u)..u} a(t−1, i), where a(t, u) represents the forward output corresponding to the u-th handwritten image feature at time t, y(t, l′_u) represents the probability of outputting the label l′_u (a Chinese character or a space) at time t, l′ is the label sequence extended with spaces (whose total length covers the handwritten characters and the spaces), and a(t−1, i) represents the forward output of the i-th position at time t−1; the formula for the backward output of the recurrent neural network model is b(t, u) = Σ_{i=u..g(u)} b(t+1, i) · y(t+1, l′_i), where b(t, u) represents the backward output corresponding to the u-th handwritten image feature at time t, y(t+1, l′_i) represents the probability of outputting the label l′_i (a Chinese character or a space) at time t+1, and b(t+1, i) represents the backward output of the i-th position at time t+1.
  • An initial model acquisition unit 33 is configured to construct a loss function according to the forward output and the backward output of the recurrent neural network model, and, according to the loss function, use the back-propagation algorithm based on the connectionist temporal classification algorithm to update and adjust the weights and biases in the recurrent neural network model and the convolutional neural network model to obtain the initial handwriting recognition model.
  • in the loss function, x represents the input Chinese character, z represents the output corresponding to the input Chinese character x, u represents the u-th Chinese character, z′ represents the length of the Chinese character sequence, a(t, u) represents the forward output corresponding to the u-th Chinese character at time t, and b(t, u) represents the backward output corresponding to the u-th Chinese character at time t.
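The forward-backward recursion described above is the standard CTC (connectionist temporal classification) scheme. A minimal sketch of the forward pass on a toy example follows; the per-timestep probabilities `y`, the single-character label, and the function name `ctc_forward` are all hypothetical illustrations, with index 0 playing the role of the space/blank.

```python
import numpy as np

# Toy per-timestep output probabilities: rows = time t, cols = [blank, character].
# These numbers are made up for illustration.
y = np.array([[0.6, 0.4],
              [0.3, 0.7],
              [0.5, 0.5]])

def ctc_forward(y, label, blank=0):
    """Standard CTC forward recursion a(t, u) over the label extended with
    blanks, returning the total probability of the label."""
    ext = [blank]
    for s in label:                      # extended label l' = [-, s1, -, s2, -, ...]
        ext += [s, blank]
    T, U = y.shape[0], len(ext)
    a = np.zeros((T, U))
    a[0, 0] = y[0, blank]                # start with a blank ...
    a[0, 1] = y[0, ext[1]]               # ... or with the first symbol
    for t in range(1, T):
        for u in range(U):
            s = a[t - 1, u]
            if u > 0:
                s += a[t - 1, u - 1]
            # skip over a blank only between two different symbols
            if u > 1 and ext[u] != blank and ext[u] != ext[u - 2]:
                s += a[t - 1, u - 2]
            a[t, u] = y[t, ext[u]] * s
    # valid paths end in the last symbol or the trailing blank
    return a[T - 1, U - 1] + a[T - 1, U - 2]

p = ctc_forward(y, [1])   # total probability of emitting the single character
```

The backward recursion b(t, u) is built symmetrically from time T toward 1, and the loss combines products a(t, u)·b(t, u); the sketch above only illustrates the forward half.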
  • a Chinese character recognition method specifically includes the following steps:
  • S51 Acquire an original image.
  • the original image includes handwriting and a background image.
  • the original image refers to a specific image that has not undergone any processing, where the specific image refers to an image that contains handwriting.
  • the original images in this embodiment include handwriting and background images.
  • the background image refers to an image corresponding to a background pattern on the original image.
  • the method for acquiring the original image includes, but is not limited to, crawling from a webpage or acquiring it from a database connected to the server; the original images in the database may be images uploaded in advance by a terminal device.
  • the effective image refers to the image after the original image is preprocessed.
  • the specific steps for the server to obtain a valid image are: (1) determine whether the original image is a color image; if it is, perform grayscale processing on the original image to obtain a grayscale image, so that the three components R (red), G (green), and B (blue) corresponding to each pixel in the color image can be replaced with a single value, which helps to simplify the subsequent range normalization processing. Understandably, if the original image is not a color image, it is already a grayscale image and no further graying is required. (2) Perform range normalization processing on the pixel matrix corresponding to the grayscale image to obtain a valid image. Performing range normalization on the pixel matrix can preserve the relative relationships in the pixel matrix while improving the calculation speed.
  • S53 Use the kernel density estimation algorithm and the erosion method to process the effective image, remove the background image, and obtain a target image including handwriting.
  • the target image refers to an image containing only a handwritten portion.
  • Kernel density estimation algorithm is a non-parametric method that studies the data distribution characteristics from the data sample itself and is used to estimate the probability density function.
  • the specific formula of the kernel density estimation algorithm is f̂_h(x) = (1/(n·h)) · Σ_{i=1..n} K((x − x_i)/h), where f̂_h(x) represents the estimated probability density of the pixel, K(·) is the kernel function, h is the pixel range (bandwidth), x is the pixel whose probability density is to be estimated, x_i is the i-th pixel within the range h, and n is the number of pixels within the range h around x.
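As a concrete illustration, the estimator above can be evaluated directly with a Gaussian kernel; the function name `kde` and the sample values are hypothetical.

```python
import numpy as np

def kde(x, samples, h):
    """Kernel density estimate at x: (1 / (n*h)) * sum_i K((x - x_i) / h),
    here with a Gaussian kernel K."""
    u = (x - samples) / h
    k = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)
    return k.sum() / (len(samples) * h)
```

Summing the estimate over a fine grid approximates 1, as a probability density should.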
  • the etching method refers to a method of performing an etching treatment on an image, wherein the etching refers to removing a portion of the background image in the image and leaving only the handwritten portion.
  • the formula of the kernel density estimation algorithm is used to process the frequency distribution histogram corresponding to the effective image, to obtain the smooth curve corresponding to the histogram, and the effective image is layered according to the minimum and maximum values on the smoothed curve.
  • the layered images are eroded to remove the background image and keep the handwritten part.
  • the layered and etched images are superimposed to obtain a target image including handwriting.
  • the superposition processing refers to a process of superimposing the layered image with only the handwritten portion into one image, thereby achieving the purpose of obtaining a target image including handwriting.
  • S54 Use text positioning technology to perform text positioning on the target image to obtain a text line image.
  • the text positioning technology refers to a technology for positioning a text area.
  • the text localization technology includes, but is not limited to, Connectionist Text Proposal Network (CTPN) technology and Optical Character Recognition (OCR) technology.
  • CTPN refers to a commonly used network technology for detecting text in images.
  • OCR technology refers to the technology of analyzing and identifying image files of text materials to obtain text and layout information. Generally, it is divided into two steps: 1. text positioning, that is, finding the position of the text in the picture; 2. text recognition, that is, recognizing the found text. In this embodiment, the step of positioning characters in OCR technology is adopted.
  • first, the proximity search method is used to randomly select one connected area from the connected areas obtained in step S5342 as the starting connected area, and the area distance between each remaining connected area (the connected areas other than the starting connected area) and the starting connected area is calculated.
  • the connected area whose area distance is less than a preset threshold is selected as the target connected area, in order to determine the direction of the expansion operation (i.e., up, down, left, or right).
  • the preset threshold is a preset threshold used to determine a distance between two connected regions.
  • the proximity search method refers to starting from a starting connected area, finding the horizontal circumscribed rectangle of the starting connected area, and expanding the connected area to the entire rectangle.
  • the expansion operation is then performed on this rectangle, with the direction of expansion being the direction of the nearest neighbouring connected area.
  • the expansion operation is performed only when the expansion direction is horizontal.
  • the area distance refers to the distance between two connected areas.
  • the area length needs to be subtracted, that is, x′_c is calculated by x′_c = |x_c| − (w − x)/2 − (w′ − x′)/2, and y′_c by y′_c = |y_c| − (z − y)/2 − (z′ − y′)/2, giving (x′_c, y′_c). After obtaining (x′_c, y′_c), the area distance is obtained according to d(S, S′) = √(x′_c² + y′_c²), where S is the starting connected region, S′ is the remaining connected region, and (x_c, y_c) is the center vector difference between the two connected regions.
  • (x′, y′) represents the coordinate point of the upper left corner of the rectangle where the remaining connected area S′ is located, (w′, z′) represents the coordinate point of the lower right corner of the rectangle where S′ is located, (x, y) represents the coordinate point of the upper left corner of the rectangle where the starting connected area S is located, and (w, z) represents the coordinate point of the lower right corner of the rectangle where S is located.
  • the point corresponding to (x, y) (that is, the coordinate point of the upper left corner of the rectangle where the starting connected area S is located) is used as the origin.
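Under the reading above (rectangles given by upper-left and lower-right corners, centre offsets reduced by the half side lengths of both rectangles), the area distance can be sketched as follows. The function name and the clamping at zero are assumptions, since the source's exact formulas are elided.

```python
import math

def region_distance(r1, r2):
    """Distance between two axis-aligned rectangles (x, y, w, z), where
    (x, y) is the upper-left and (w, z) the lower-right corner.
    The centre offset is computed first, then half the side lengths of
    both rectangles are subtracted, so touching rectangles get distance 0.
    (A plausible reconstruction of the elided formulas.)"""
    x, y, w, z = r1
    xp, yp, wp, zp = r2
    xc = abs((x + w) / 2 - (xp + wp) / 2)      # centre vector difference, x
    yc = abs((y + z) / 2 - (yp + zp) / 2)      # centre vector difference, y
    xpc = max(0.0, xc - (w - x) / 2 - (wp - xp) / 2)   # subtract area lengths
    ypc = max(0.0, yc - (z - y) / 2 - (zp - yp) / 2)
    return math.hypot(xpc, ypc)
```

For two 10-wide rectangles separated by a 5-pixel horizontal gap, the distance is the gap itself.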
  • the expansion (dilation) process is a morphological operation for expanding an image; it is the dual of erosion.
  • the built-in imdilate function in MATLAB can be used to dilate the connected areas of the binary image.
  • the text line image refers to an image corresponding to a single line of handwriting obtained by using text positioning technology.
  • the process of expanding the initial connected region includes the following steps: selecting an n ⁇ n structural element, in this embodiment, the value of 8 elements adjacent to each element in the pixel matrix is used as the connected region of the element. Therefore, the selected structural element is a 3 ⁇ 3 pixel matrix.
  • the structure element is an n ⁇ n pixel matrix, where the matrix elements include 0 or 1.
  • the connected area is scanned in the direction of the target connected area, and a logical AND operation is performed between the structure element and the pixels it covers in that direction.
  • if the results are all 0, the covered pixels remain unchanged; if they are not all 0, the pixels covered by the structure element are all set to 1, and the part that becomes 1 is the expanded part of the initial connected region.
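A minimal sketch of dilation with a 3×3 all-ones structure element (the behaviour of MATLAB's imdilate with that element); names are illustrative.

```python
import numpy as np

def dilate3x3(img):
    """Binary dilation with a 3x3 all-ones structuring element: a pixel
    becomes 1 if any pixel in its 8-neighbourhood (or itself) is 1."""
    padded = np.pad(img, 1)                 # zero-pad the border
    out = np.zeros_like(img)
    h, w = img.shape
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            out |= padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
    return out
```

A single 1 pixel grows into a full 3×3 block after one dilation.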
  • the text line image is input to the target handwriting recognition model for recognition, and the recognition result corresponding to the text line image is obtained.
  • the target handwriting recognition model is obtained by using the above-mentioned Chinese character model training method.
  • the target handwriting recognition model is a pre-trained model for identifying handwriting.
  • the recognition result refers to a result obtained by recognizing a handwritten image with a recognition probability greater than a preset probability through a convolutional recurrent neural network model.
  • the text line image is input into the target handwriting recognition model, and the recognition probability corresponding to each text line image is obtained.
  • the recognition probability refers to the probability of the Chinese character corresponding to the text line image obtained through recognition by the target handwriting recognition model.
  • the recognition probability is compared with a preset probability. If the recognition probability is greater than the preset probability, the corresponding recognition result is obtained, which is helpful to improve the accuracy of the recognition result.
  • for example, the text line image corresponding to "Beijing Welcomes You" is input into the target handwriting recognition model, and the recognition results obtained may be "Beijing Welcomes You", "Beijing Kan Welcomes You", and "Beijing Double Welcomes You", of which the recognition probability corresponding to "Beijing Welcomes You" is 99%, and the recognition probabilities of the other two are 50% and 60%, respectively. Comparing the probabilities with a preset probability of 85%, 99% is greater than 85%, so the corresponding recognition result is "Beijing Welcomes You".
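The selection rule in the example (keep the candidate only when its probability exceeds the preset probability) can be sketched as follows; the helper name `pick_result` is hypothetical.

```python
def pick_result(candidates, preset=0.85):
    """candidates: list of (text, probability) pairs. Return the
    highest-probability text if it exceeds the preset probability,
    otherwise None (no reliable recognition result)."""
    text, prob = max(candidates, key=lambda c: c[1])
    return text if prob > preset else None
```

With the example probabilities, only the 99% candidate survives the 85% threshold.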
  • the Chinese character recognition method provided in this embodiment obtains a valid image by preprocessing the original image, and processes the valid image with a kernel density estimation algorithm and an erosion method to remove the background image and retain the target image containing only handwriting, which can save model recognition time.
  • the text positioning technology is used to position the target image to obtain the text line image, and the obtained text line image is input to the target handwriting recognition model for recognition, and the recognition result is obtained based on the recognition probability value corresponding to the text line image.
  • Using the target handwriting recognition model to recognize text line images can improve the recognition accuracy.
  • step S52 preprocessing the original image to obtain a valid image, specifically includes the following steps:
  • the size of the handwriting itself is relatively small compared to the background image.
  • the handwriting is easily mishandled; therefore, in order to ensure that the handwriting is not mistakenly cleared during grayscale processing, each pixel corresponding to the original image needs to be enlarged.
  • the size of the n-th pixel in the original image is x_n.
  • each pixel in the original image is power-magnified, so that x_n is raised to a power.
  • enlarging the pixels in the original image can effectively avoid handwriting being mistakenly processed when the original image is grayed out.
  • after the original image is enlarged, if the original image is not a grayscale image but a color image, grayscale processing needs to be performed on it to obtain a grayscale image. Understandably, if the original image is already a grayscale image, no grayscale processing is required.
  • the grayscale image is formed from sampling pixels, where R (red), G (green), and B (blue) are the three components in the original image, and a sampling pixel is the single pixel value in the grayscale image used to replace the R, G, and B components of the corresponding pixel in the color image.
  • Graying the original image as a color image effectively reduces the amount of data and computational complexity required to obtain valid images in subsequent steps.
  • S522 Perform range normalization processing on the pixel matrix corresponding to the grayscale image to obtain a valid image, where the range normalization formula is x′ = 255 · (x − M_min)/(M_max − M_min), in which x is a pixel before normalization, x′ is the corresponding pixel of the effective image after normalization, M_min is the smallest pixel in the pixel matrix M corresponding to the grayscale image, and M_max is the largest pixel in the pixel matrix M corresponding to the grayscale image.
  • the range normalization processing is a processing method that compresses data into the range (0, 1). Normalizing the range of the pixel matrix corresponding to the grayscale image and multiplying by 255 facilitates processing of the data in the pixel matrix while retaining the relationships between pixels in the pixel matrix.
  • the background image and each handwriting have their own corresponding pixel matrix. After obtaining the background image in the grayscale image and the pixel matrix corresponding to each handwriting, the pixel matrix is subjected to a range normalization process to obtain an effective image corresponding to the pixel matrix after the range normalization process. Performing the range normalization processing on the pixel matrix can improve the processing speed of obtaining a target image including handwriting.
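The range normalization formula x′ = 255 · (x − M_min)/(M_max − M_min) can be applied to a whole pixel matrix at once; the function name is illustrative.

```python
import numpy as np

def range_normalize(gray):
    """Range-normalize a grayscale pixel matrix to [0, 255]:
    x' = 255 * (x - M_min) / (M_max - M_min)."""
    m = gray.astype(np.float64)
    return 255.0 * (m - m.min()) / (m.max() - m.min())
```

Relative relationships between pixels are preserved: a pixel a third of the way up the original range lands a third of the way up [0, 255].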
  • steps S521-S522 by performing an enlargement process on the original image, it is possible to effectively avoid a situation where the handwriting is mishandled when the original image is grayed out in the next step.
  • the grayscale processing is performed on the original image, and obtaining a grayscale image can reduce the amount of data that needs to be processed in subsequent steps. Performing range normalization processing on the grayscale image can improve the processing speed of obtaining the target image including handwriting.
  • step S53 uses a kernel density estimation algorithm and an erosion method to process the effective image, removes the background image, and obtains a target image including handwriting, which specifically includes the following steps:
  • S531 Count the number of occurrences of pixels in the effective image, and obtain a frequency distribution histogram corresponding to the effective image.
  • the horizontal axis of the frequency distribution histogram represents continuous values of the sample data, and each cell on the horizontal axis corresponds to the group distance of a group as the bottom edge of a small rectangle; the vertical axis represents the ratio of frequency to group distance, and this ratio is used as the height of the small rectangle. A graph composed of multiple such small rectangles is called a frequency distribution histogram.
  • in this embodiment, the horizontal axis of the frequency distribution histogram indicates that the pixels are continuous values in (0, 255), the group distance corresponding to each small rectangle on the horizontal axis is 1, and the vertical axis indicates the ratio corresponding to each small rectangle, which is the height of that rectangle.
  • the frequency distribution histogram can vividly display the number of occurrences of pixels in the effective image, so that the distribution of the data can be reflected at a glance.
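With a group distance of 1 per grey level, the frequency distribution histogram reduces to counting pixel values; a sketch (function name assumed):

```python
import numpy as np

def frequency_histogram(img):
    """Frequency of each grey level 0..255: count of pixels with that
    value divided by the total number of pixels (group distance 1)."""
    counts, _ = np.histogram(img, bins=256, range=(0, 256))
    return counts / img.size
```

The 256 frequencies sum to 1 by construction.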
  • S532 The Gaussian kernel density estimation method is used to process the frequency distribution histogram to obtain the frequency maximum and frequency minimum corresponding to the frequency distribution histogram, and obtain corresponding pixels according to the frequency maximum and frequency minimum.
  • Gaussian kernel density estimation method refers to a kernel density estimation method whose kernel function is a Gaussian kernel.
  • the function corresponding to the Gaussian kernel is K(x) = (1/√(2π)) · e^(−x²/2), where K(x) refers to the Gaussian kernel function whose independent variable (a pixel) is x, and e and π are constants.
  • the frequency maximum value refers to the frequency value whose frequency value is the maximum value in the frequency distribution histogram; the frequency minimum value refers to the frequency value whose frequency value is the minimum value in the frequency distribution histogram.
  • a Gaussian kernel density function estimation method is used to perform Gaussian smoothing on the frequency distribution histogram corresponding to the obtained effective image, and obtain a Gaussian smooth curve corresponding to the frequency distribution histogram. Based on the frequency maximum and the frequency minimum on the Gaussian smooth curve, pixels corresponding to the frequency maximum and the frequency minimum on the horizontal axis are obtained. In this embodiment, pixels corresponding to the maximum frequency value and the minimum frequency value are acquired, which facilitates subsequent hierarchical differentiation of valid images and acquires a hierarchical image.
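A minimal sketch of the smoothing-and-extrema step: the histogram is smoothed with a Gaussian kernel and the grey levels of the local maxima and minima of the smoothed curve are returned. The bandwidth `h` and the function name are assumptions.

```python
import numpy as np

def smooth_and_extrema(freq, h=5.0):
    """Gaussian-smooth a 256-bin frequency histogram and return the grey
    levels of its local maxima and minima."""
    x = np.arange(len(freq))
    # Gaussian weights between every pair of grey levels (normalized per row)
    w = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h) ** 2)
    smooth = (w * freq[None, :]).sum(axis=1) / w.sum(axis=1)
    maxima = [i for i in range(1, len(freq) - 1)
              if smooth[i] > smooth[i - 1] and smooth[i] > smooth[i + 1]]
    minima = [i for i in range(1, len(freq) - 1)
              if smooth[i] < smooth[i - 1] and smooth[i] < smooth[i + 1]]
    return maxima, minima
```

For a bimodal histogram the maxima land near the two peaks and a minimum sits in the valley between them, which is exactly what the layering step needs.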
  • S533 Perform layer processing on the effective image based on the pixels corresponding to the maximum frequency and the minimum frequency to obtain a layered image.
  • a layered image refers to an image obtained by layering the effective image based on the frequency maxima and minima. The pixels corresponding to the frequency maxima and minima are obtained, and the effective image is layered according to the pixels corresponding to the frequency maxima: the pixels of the effective image are clustered into as many classes as there are frequency maxima, so the effective image is divided into that many layers. The pixels corresponding to the frequency minima are then used as the boundary values between the classes; according to these boundaries, the pixels corresponding to each layer of the layered image can be obtained.
  • the pixels corresponding to the maximum frequency in the effective image are 12, 54, 97, 113, 159, and 172, and the pixels corresponding to the minimum frequency are 26, 69, 104, 139, and 163.
  • the number of frequency maxima can determine that the pixels of the effective image can be divided into 6 categories, and the effective image can be divided into 6 layers.
  • the pixels corresponding to the frequency minima are used as the boundary values between the classes, with the smallest pixel being 0 and the largest pixel being 255.
  • accordingly, the layered image containing pixel 12 corresponds to the pixel range [0, 26); the layered image containing pixel 54 corresponds to [26, 69); the layered image containing pixel 97 corresponds to [69, 104); the layered image containing pixel 113 corresponds to [104, 139); the layered image containing pixel 159 corresponds to [139, 163); and the layered image containing pixel 172 corresponds to [163, 255].
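The layering rule of the example (frequency minima as class boundaries, smallest pixel 0 and largest 255) can be sketched as follows; names are illustrative.

```python
import numpy as np

def layer_image(img, minima):
    """Split an image into layers using the grey levels of the frequency
    minima as class boundaries; pixels outside a layer's range are zeroed."""
    bounds = [0] + list(minima) + [256]
    layers = []
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        layers.append(np.where((img >= lo) & (img < hi), img, 0))
    return layers
```

With the five minima of the example, six layers are produced, and adding them back together reproduces the original image.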
  • S534 Eroding and superimposing the layered image to obtain a target image including handwriting.
  • the layered image is binarized.
  • the binarization process refers to a process in which pixels on an image are set to 0 (black) or 1 (white), and the entire image presents a clear black and white effect.
  • the binarized layered image is etched to remove the background image portion and retain the handwritten portion on the layered image.
  • the etching process is an operation for removing the content of a part of an image in morphology. Because the pixels on each layered image are pixels belonging to different ranges, after the layered image is etched, each layered image also needs to be superimposed to generate a target image containing only handwriting.
  • steps S531-S534 a frequency distribution histogram corresponding to the effective image is obtained, and pixels corresponding to the maximum frequency value and the minimum frequency value are obtained according to the frequency distribution histogram, thereby obtaining a layered image. Finally, the layered image is binarized, eroded, and superimposed to complete the recognition of the handwriting and background image in the original image. The background image is removed to obtain the target image including handwriting.
  • step S534 the layered image is etched and superimposed to obtain a target image including handwriting, which specifically includes the following steps:
  • the layered binarized image refers to an image obtained by binarizing the layered image. Specifically, after obtaining the layered image, comparing the sampled pixels of the layered image with a pre-selected threshold, and setting the pixels whose sampling is greater than or equal to the threshold to 1, and the pixels less than the threshold to 0.
  • 0 represents a background pixel
  • 1 represents a target pixel (handwriting pixel).
  • This threshold can be obtained by calculating the inter-class variance of the layered image, or it can be obtained based on empirical values.
  • the size of the threshold affects the effect of binarizing the layered image: if the threshold is selected properly, the binarization effect is good; conversely, an improperly selected threshold degrades the binarization of the layered image.
  • the threshold in this embodiment is determined based on empirical values.
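One of the two ways mentioned above for choosing the threshold, maximizing the inter-class variance, is Otsu's method; a sketch follows (not necessarily the embodiment's exact procedure).

```python
import numpy as np

def otsu_threshold(img):
    """Pick a binarization threshold by maximizing the inter-class
    variance w0*w1*(mu0 - mu1)^2 over all candidate thresholds."""
    hist, _ = np.histogram(img, bins=256, range=(0, 256))
    total = img.size
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        w0 = hist[:t].sum() / total          # weight of the background class
        w1 = 1.0 - w0                        # weight of the target class
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (np.arange(t) * hist[:t]).sum() / hist[:t].sum()
        mu1 = (np.arange(t, 256) * hist[t:]).sum() / hist[t:].sum()
        var = w0 * w1 * (mu0 - mu1) ** 2
        if var > best_var:
            best_t, best_var = t, var
    return best_t
```

For a two-level image the first threshold separating the two levels is returned, and thresholding with it splits the pixels into the two classes.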
  • S5342 Detect pixels in the layered binarized image to obtain a connected area corresponding to the layered binarized image.
  • the connected area refers to an area surrounded by adjacent pixels around a specific pixel. If a certain pixel is 0 and its neighboring pixels are 1, the area surrounded by the neighboring pixels is regarded as the connected area.
  • the pixel matrix corresponding to the layered binarized image is scanned row by row, and pixels that satisfy the connectivity rule (4-neighbourhood or 8-neighbourhood connectivity) are marked with identical labels.
  • 4-neighbourhood connectivity refers to the situation where a specific pixel is the same as the pixels adjacent to it in the four directions up, down, left, and right; 8-neighbourhood connectivity refers to the situation where a specific pixel is the same as the pixels adjacent to it in the eight directions up, down, left, right, upper left, lower left, upper right, and lower right.
  • the pixel matrix includes rows and columns.
  • the specific process of detecting and labeling the pixels in the binarized image is: (1) scan the pixel matrix row by row; in each row, a sequence of consecutive 1 pixels (target pixels) is called a clique, and its start point, end point, and row number are recorded. The start point of a clique refers to its first pixel, and the end point refers to its last pixel. (2) For the cliques in each row other than the first row, compare whether the clique in a specific remaining row has overlapping regions with the cliques in the previous row.
  • if a clique in the specific remaining row has no overlapping region with any clique in the previous row, it is assigned a new label; if it overlaps with exactly one clique in the previous row, it is assigned that clique's label; if it overlaps with two or more cliques in the previous row, it is assigned the smallest label of the associated cliques, and the labels of those cliques are recorded as equivalent pairs, indicating that they belong to one class.
  • the associated cliques refer to the cliques in the previous row that overlap with the clique of the specific remaining row; an equivalent pair refers to a pair of labels on cliques that are connected to each other.
  • for example, suppose the specific remaining row in a pixel matrix is the third row, and a clique A in it overlaps with the two cliques labeled 1 and 2 in the second row.
  • the smallest label of the two cliques in the second row, i.e. 1, is assigned to clique A, so the label of clique A is 1, and the labels of the two overlapping cliques are recorded as an equivalent pair, that is, (1, 2) is recorded as an equivalent pair.
  • the cliques labeled 1 and 2 together are called one connected region.
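The two-pass clique labeling described above can be sketched with provisional labels plus union-find for the equivalent pairs; this simplified version uses 4-neighbourhood connectivity and hypothetical names.

```python
import numpy as np

def label_components(img):
    """Two-pass connected-component labeling (4-neighbourhood):
    provisional labels on the first pass, equivalent pairs merged
    with union-find, final labels resolved on the second pass."""
    parent = {}
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]   # path compression
            a = parent[a]
        return a
    def union(a, b):
        parent[find(a)] = find(b)           # record the equivalent pair
    labels = np.zeros(img.shape, dtype=int)
    nxt = 1
    h, w = img.shape
    for i in range(h):
        for j in range(w):
            if not img[i, j]:
                continue
            up = labels[i - 1, j] if i > 0 else 0
            left = labels[i, j - 1] if j > 0 else 0
            if up == 0 and left == 0:       # no overlap: new label
                labels[i, j] = nxt
                parent[nxt] = nxt
                nxt += 1
            elif up and left:               # overlap with two cliques
                labels[i, j] = min(up, left)
                union(up, left)
            else:                           # overlap with one clique
                labels[i, j] = up or left
    for i in range(h):
        for j in range(w):
            if labels[i, j]:
                labels[i, j] = find(labels[i, j])
    return labels
```

A U-shaped region whose two arms meet in the bottom row collapses into a single label once the equivalent pair is resolved.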
  • S5343 Eroding and superimposing the connected areas corresponding to the layered binary image to obtain a target image including handwriting.
  • the imerode function in MATLAB or the cvErode function in OpenCV is used to erode the connected regions of the layered binary image. Specifically, a structure element is selected; in this embodiment, the 8 pixels adjacent to each pixel in the pixel matrix are used as that pixel's connected region, so the selected structure element is a 3×3 pixel matrix. The structure element is used to scan the pixel matrix of the layered binary image, comparing whether the pixels it covers are completely consistent with the structure element.
  • if they are completely consistent, the corresponding 9 pixels in the pixel matrix all become 1; if they are not completely consistent, the corresponding 9 pixels all become 0, where the 0 (black) part is the eroded part of the layered binary image.
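A minimal sketch of erosion with a 3×3 structure element (the behaviour of MATLAB's imerode / OpenCV's erode on a binary image, simplified so that a pixel stays 1 only when its whole 3×3 neighbourhood is 1):

```python
import numpy as np

def erode3x3(img):
    """Binary erosion with a 3x3 all-ones structuring element: a pixel
    keeps the value 1 only if all 9 covered pixels are 1 (zero-padded
    at the border)."""
    padded = np.pad(img, 1)
    out = np.ones_like(img)
    h, w = img.shape
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            out &= padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
    return out
```

Thin structures (such as one-pixel-wide strokes) vanish entirely under this erosion, which is what the corrosion-resistance ratio below measures.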
  • the layered binarized images are filtered based on the preset corrosion-resistance range of the handwriting area: the parts of the layered binary images whose corrosion resistance is not within the range are deleted, leaving the areas whose corrosion resistance is within the range of the handwriting area.
  • the corrosion resistance of the handwriting area can be calculated with the formula r = s₁ / s₂, where s₁ represents the total area of the layered binary image after erosion and s₂ represents the total area of the layered binary image before erosion.
  • for example, if the preset corrosion-resistance range of the handwriting area is [0.05, 0.8], the ratio of the total area of each layered binary image after erosion to its total area before erosion is calculated according to this formula.
  • if the ratio of the total area after erosion to the total area before erosion in a layered binary image is not within the preset corrosion-resistance range of the handwriting area, it indicates that the layered binary image of that area is not handwriting and needs to be removed.
  • the ratio of the total area after erosion to the total area before erosion in the layered binarized image is in the range of [0.05,0.8], which means that the layered binarized image in the area is handwritten and needs to be retained.
  • Use the imadd function to superimpose the pixel matrix corresponding to each layered binary image to obtain the target image containing handwriting.
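The filtering-and-superposition step can be sketched as follows, assuming the kept layers are the pre-erosion binary layers and superposition is pixelwise addition (as with imadd); names are illustrative.

```python
import numpy as np

def keep_handwriting_layers(layers, eroded, lo=0.05, hi=0.8):
    """Filter layers by corrosion resistance s1/s2 (area after erosion /
    area before erosion); layers outside [lo, hi] are treated as
    background and dropped, the rest are superimposed into the target
    image."""
    kept = [lay for lay, er in zip(layers, eroded)
            if lay.sum() > 0 and lo <= er.sum() / lay.sum() <= hi]
    return sum(kept) if kept else np.zeros_like(layers[0])
```

A layer whose content erodes away completely (ratio 0) is discarded as background, while a layer retaining a quarter of its area survives.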
  • the imadd function is a MATLAB function for superimposing (adding) images.
  • Steps S5341-S5343: the layered image is binarized to obtain a layered binary image, and then the pixels in the layered binary image are detected to obtain the connected areas corresponding to the layered binary image.
  • the structure element is used to erode the connected area of each pixel: pixels in the pixel matrix that are not completely consistent with the structure element become 0, and the parts of the layered binary image with 0 pixels are black.
  • the black part is the eroded part of the layered binary image.
  • by calculating the ratio of the total area of the layered binary image after erosion to its total area before erosion and checking whether the ratio falls within the preset corrosion-resistance range of the handwriting area, the background image is removed and the handwriting is retained, achieving the purpose of obtaining a target image including handwriting.
  • the Chinese character recognition method obtains a grayscale image by enlarging and graying the original image, and then performs range normalization processing to obtain a valid image, making it convenient for subsequent steps to use the Gaussian kernel density estimation algorithm to layer, binarize, erode, and superimpose the effective image, remove the background image, and retain the target image containing only handwriting.
  • the text positioning technology is used to locate text in the target image to obtain the text line image, and the obtained text line image is input into the target handwriting recognition model for recognition; obtaining the recognition result based on the recognition probability value corresponding to the text line image can improve the handwriting recognition accuracy.
  • a Chinese character recognition device is provided, and the Chinese character recognition device corresponds one-to-one to the Chinese character recognition method in the above embodiment.
  • the Chinese character recognition device includes an original image acquisition module 51, a valid image acquisition module 52, a target image acquisition module 53, a text line image acquisition module 54, and a recognition result acquisition module 55.
  • the detailed description of each function module is as follows:
  • the original image obtaining module 51 is configured to obtain an original image, and the original image includes handwriting and a background image.
  • An effective image acquisition module 52 is configured to pre-process the original image to obtain an effective image.
  • a target image acquisition module 53 is configured to process a valid image by using a kernel density estimation algorithm and an erosion method, remove a background image, and obtain a target image including handwriting.
  • the text line image acquisition module 54 is configured to perform text positioning on a target image by using text positioning technology to obtain a text line image.
  • the recognition result acquisition module 55 is configured to input a text line image into a target handwriting recognition model for recognition, and obtain a recognition result corresponding to the text line image.
  • the target handwriting recognition model is obtained by using the foregoing Chinese character model training method.
  • the effective image acquisition module 52 includes a grayscale image acquisition unit 521 and a range normalization processing unit 522.
  • a grayscale image acquisition unit 521 is configured to perform enlargement and grayscale processing on an original image to obtain a grayscale image.
  • the range standardization processing unit 522 is configured to perform range normalization on the pixel matrix corresponding to a grayscale image to obtain an effective image, where the range normalization formula is x' = (x − M_min)/(M_max − M_min), x is the pixel of the effective image before normalization, x' is the pixel of the effective image after normalization, M_min is the smallest pixel in the pixel matrix M corresponding to the grayscale image, and M_max is the largest pixel in the pixel matrix M corresponding to the grayscale image.
  • the target image acquisition module 53 includes a first processing unit 531, a second processing unit 532, a layered image acquisition unit 533, and a layered image processing unit 534.
  • the first processing unit 531 is configured to count the number of occurrences of pixels in the effective image, and obtain a frequency distribution histogram corresponding to the effective image.
  • a second processing unit 532 is configured to process the frequency distribution histogram by using a Gaussian kernel density estimation method to obtain a frequency maximum and a frequency minimum corresponding to the frequency distribution histogram, and according to the frequency maximum and the frequency minimum Get the corresponding pixels.
  • a layered image acquisition unit 533 is configured to perform a layered processing on the effective image based on the pixels corresponding to the maximum frequency value and the minimum frequency value to obtain a layered image.
  • a layered image processing unit 534 is configured to perform erosion and superposition processing on the layered image to obtain a target image including handwriting.
  • the layered image processing unit 534 includes a binarization processing unit 5341, a connected region acquisition unit 5342, and a connected region processing unit 5343.
  • a binarization processing unit 5341 is configured to perform binarization processing on the layered image to obtain a layered binarized image.
  • the connected region obtaining unit 5342 is configured to detect pixels in the layered binarized image to obtain a connected region corresponding to the layered binarized image.
  • the connected region processing unit 5343 is configured to perform erosion and superposition processing on the connected regions corresponding to the layered binary image to obtain a target image including handwriting.
  • a computer device is provided.
  • the computer device may be a server, and the internal structure diagram may be as shown in FIG. 10.
  • the computer device includes a processor, a memory, a network interface, and a database connected through a system bus.
  • the processor of the computer device is used to provide computing and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system, computer-readable instructions, and a database.
  • the internal memory provides an environment for the operation of the operating system and computer-readable instructions in a non-volatile storage medium.
  • the database of the computer equipment is used to store the target handwriting recognition model.
  • the network interface of the computer device is used to communicate with an external terminal through a network connection.
  • the computer-readable instructions are executed by a processor to implement a Chinese character model training method.
  • a computer device including a memory, a processor, and computer-readable instructions stored on the memory and executable on the processor.
  • when the processor executes the computer-readable instructions, the following steps are implemented: initializing the weights and biases of a convolutional recurrent neural network model; obtaining font image training samples, labeling the handwritten images in the font image training samples with the level-2 Chinese character set, and dividing the font image training samples into a training set and a test set according to preset allocation rules; inputting the training set into the convolutional recurrent neural network model to obtain the forward and backward outputs of the model and, according to the forward and backward outputs, updating the weights and biases in the model with a back-propagation algorithm based on the connectionist temporal classification algorithm to obtain an initial handwriting recognition model; inputting the test set into the initial handwriting recognition model to obtain the recognition accuracy and, if the recognition accuracy is greater than the preset accuracy, determining the initial handwriting recognition model to be the target handwriting recognition model.
  • when the processor executes the computer-readable instructions, the following steps are further implemented: inputting the handwritten images in the training set into a convolutional neural network model to obtain the handwriting image features corresponding to those images; inputting the handwriting image features into the recurrent neural network model for training to obtain the forward and backward outputs of the recurrent neural network model.
  • the formula of the forward output is a(t,u) = y^t_{l'_u} · Σ_{i=f(u)}^{u} a(t−1, i), where a(t,u) represents the forward output corresponding to the feature of the u-th handwritten image at time t, y^t_b represents the probability that the output at time t is a blank (space), l'_u represents the total length of the handwritten characters and spaces, a(t−1,i) represents the forward output of the i-th Chinese character at time t−1, and f(u) is the standard CTC bound on allowed predecessor labels.
  • the formula of the backward output is b(t,u) = Σ_{i=u}^{g(u)} b(t+1, i) · y^{t+1}_{l'_i}, where b(t,u) represents the backward output corresponding to the feature of the u-th handwritten image at time t, and g(u) is the standard CTC bound on allowed successor labels.
  • x represents the input Chinese characters, z represents the output corresponding to the input x, u represents the u-th Chinese character, z' represents the length of the character sequence, a(t,u) represents the forward output corresponding to the u-th Chinese character at time t, and b(t,u) represents the backward output corresponding to the u-th Chinese character at time t.
  • one or more non-volatile storage media storing computer-readable instructions are provided; when the computer-readable instructions are executed by one or more processors, the one or more processors implement the following steps: initializing the weights and biases of the convolutional recurrent neural network model; obtaining font image training samples, labeling the handwritten images in the font image training samples with the level-2 Chinese character set, and dividing the font image training samples into a training set and a test set according to preset allocation rules; inputting the training set into the convolutional recurrent neural network model to obtain its forward and backward outputs and, according to the forward and backward outputs, updating the weights and biases in the model with a back-propagation algorithm based on the connectionist temporal classification algorithm to obtain an initial handwriting recognition model; inputting the test set into the initial handwriting recognition model to obtain the recognition accuracy and, if the recognition accuracy is greater than the preset accuracy, determining the initial handwriting recognition model to be the target handwriting recognition model.
  • the following steps are further implemented: inputting the handwritten images in the training set into a convolutional neural network model to obtain the handwriting image features corresponding to those images; inputting the handwriting image features into the recurrent neural network model for training to obtain the forward and backward outputs of the recurrent neural network model.
  • the formula of the forward output is a(t,u) = y^t_{l'_u} · Σ_{i=f(u)}^{u} a(t−1, i), where a(t,u) represents the forward output corresponding to the feature of the u-th handwritten image at time t, y^t_b represents the probability that the output at time t is a blank (space), l'_u represents the total length of the handwritten characters and spaces, a(t−1,i) represents the forward output of the i-th Chinese character at time t−1, and f(u) is the standard CTC bound on allowed predecessor labels.
  • the formula of the backward output is b(t,u) = Σ_{i=u}^{g(u)} b(t+1, i) · y^{t+1}_{l'_i}, where b(t,u) represents the backward output corresponding to the feature of the u-th handwritten image at time t, and g(u) is the standard CTC bound on allowed successor labels.
  • x represents the input Chinese characters, z represents the output corresponding to the input x, u represents the u-th Chinese character, z' represents the length of the character sequence, a(t,u) represents the forward output corresponding to the u-th Chinese character at time t, and b(t,u) represents the backward output corresponding to the u-th Chinese character at time t.
  • a computer device including a memory, a processor, and computer-readable instructions stored on the memory and executable on the processor.
  • when the processor executes the computer-readable instructions, the following steps are implemented: obtaining an original image, the original image including handwriting and a background image; preprocessing the original image to obtain an effective image; processing the effective image with a kernel density estimation algorithm and an erosion method, removing the background image, and obtaining a target image including the handwriting; performing text positioning on the target image with a text positioning technique to obtain a text line image; and inputting the text line image into a target handwriting recognition model for recognition to obtain a recognition result corresponding to the text line image.
  • the target handwriting recognition model is obtained with the above-mentioned Chinese character model training method.
  • when the processor executes the computer-readable instructions, the following steps are further implemented: enlarging and grayscaling the original image to obtain a grayscale image; performing range normalization on the pixel matrix corresponding to the grayscale image to obtain an effective image, where the range normalization formula is x' = (x − M_min)/(M_max − M_min), x is the pixel of the effective image before normalization, x' is the pixel after normalization, M_min is the smallest pixel in the pixel matrix M corresponding to the grayscale image, and M_max is the largest pixel in that matrix.
  • when the processor executes the computer-readable instructions, the following steps are further implemented: counting the occurrences of the pixels in the effective image to obtain a frequency distribution histogram corresponding to the effective image; processing the frequency distribution histogram with a Gaussian kernel density estimation method to obtain the frequency maxima and minima corresponding to the histogram, and obtaining the corresponding pixels according to the frequency maxima and minima; layering the effective image based on the pixels corresponding to the frequency maxima and minima to obtain layered images; and eroding and superimposing the layered images to obtain a target image including the handwriting.
  • when the processor executes the computer-readable instructions, the following steps are further implemented: binarizing the layered images to obtain layered binarized images; detecting and marking the pixels in the layered binarized images to obtain the connected regions corresponding to them; and eroding and superimposing the connected regions corresponding to the layered binarized images to obtain a target image including the handwriting.
  • one or more non-volatile storage media storing computer-readable instructions are provided; when the computer-readable instructions are executed by one or more processors, the one or more processors implement the following steps: obtaining an original image including handwriting and a background image; preprocessing the original image to obtain an effective image; processing the effective image with a kernel density estimation algorithm and an erosion method, removing the background image, and obtaining a target image including the handwriting; performing text positioning on the target image with a text positioning technique to obtain a text line image; and inputting the text line image into a target handwriting recognition model for recognition to obtain a recognition result corresponding to the text line image.
  • the target handwriting recognition model is obtained with the above Chinese character model training method.
  • the following steps are further implemented: enlarging and grayscaling the original image to obtain a grayscale image; performing range normalization on the pixel matrix corresponding to the grayscale image to obtain an effective image, where the range normalization formula is x' = (x − M_min)/(M_max − M_min), x is the pixel of the effective image before normalization, x' is the pixel after normalization, M_min is the smallest pixel in the pixel matrix M corresponding to the grayscale image, and M_max is the largest pixel in that matrix.
  • the following steps are also implemented: counting the occurrences of the pixels in the effective image to obtain a frequency distribution histogram corresponding to the effective image; processing the frequency distribution histogram with a Gaussian kernel density estimation method to obtain the frequency maxima and minima corresponding to the histogram, and obtaining the corresponding pixels according to them; layering the effective image based on the pixels corresponding to the frequency maxima and minima to obtain layered images; and eroding and superimposing the layered images to obtain target images including the handwriting.
  • the following steps are further implemented: binarizing the layered images to obtain layered binarized images; detecting and marking the pixels in the layered binarized images to obtain the connected regions corresponding to them; and eroding and superimposing the connected regions corresponding to the layered binarized images to obtain the target image including the handwriting.
  • Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory can include random access memory (RAM) or external cache memory.
  • RAM is available in various forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).


Abstract

This application discloses a Chinese character model training method, a Chinese character recognition method, an apparatus, a device, and a medium. The Chinese character model training method includes: obtaining font image training samples, labeling the handwritten character images in the font image training samples with the level-2 Chinese character set, and dividing the font image training samples into a training set and a test set according to a preset allocation rule; inputting the training set into a convolutional recurrent neural network model, and updating the weights and biases in the convolutional recurrent neural network model with a back-propagation algorithm based on the connectionist temporal classification algorithm to obtain an initial handwriting recognition model; and inputting the test set into the initial handwriting recognition model to obtain a recognition accuracy and, if the recognition accuracy is greater than a preset accuracy, determining the initial handwriting recognition model to be the target handwriting recognition model. The target handwriting recognition model can recognize handwritten characters fairly accurately.

Description

Chinese character model training method, Chinese character recognition method, apparatus, device, and medium
This application is based on, and claims priority from, Chinese invention patent application No. 201810563512.7, filed on June 4, 2018 and entitled "Chinese character model training method, Chinese character recognition method, apparatus, device, and medium".
Technical Field
This application relates to the field of handwriting recognition, and in particular to a Chinese character model training method, a Chinese character recognition method, an apparatus, a device, and a medium.
Background
Traditional Chinese character recognition mostly relies on OCR (Optical Character Recognition) technology. Chinese characters come in many typefaces, such as SimSun, KaiTi, YaoTi, and FangSong; some characters, such as 魑 and 魅, have complex structures; and many characters, such as 受 and 爱, are structurally similar, so recognition accuracy cannot be guaranteed. Standard, simply and neatly written sentences can be recognized with OCR technology, but sentences composed of handwritten characters are another matter: everyone's writing habits differ, and handwritten characters are not built from standard horizontal, vertical, left-falling, and right-falling strokes, so OCR recognition is often inaccurate. This greatly limits the performance of recognition systems, yields low precision, and makes the recognition results unsatisfactory.
Summary
In view of this, it is necessary to address the above technical problem by providing a Chinese character model training method, apparatus, device, and medium that can improve recognition accuracy.
A Chinese character model training method includes:
initializing the weights and biases of a convolutional recurrent neural network model;
obtaining font image training samples, labeling the handwritten character images in the font image training samples with the level-2 Chinese character set, and dividing the font image training samples into a training set and a test set according to a preset allocation rule;
inputting the training set into the convolutional recurrent neural network model, obtaining the forward output and the backward output of the model, and, based on the forward output and the backward output, updating the weights and biases in the model with a back-propagation algorithm based on the connectionist temporal classification algorithm to obtain an initial handwriting recognition model; and
inputting the test set into the initial handwriting recognition model to obtain a recognition accuracy and, if the recognition accuracy is greater than a preset accuracy, determining the initial handwriting recognition model to be the target handwriting recognition model.
A Chinese character model training apparatus includes:
a model initialization module, configured to initialize the weights and biases of a convolutional recurrent neural network model;
a training sample processing module, configured to obtain font image training samples, label the handwritten character images in the font image training samples with the level-2 Chinese character set, and divide the font image training samples into a training set and a test set according to a preset allocation rule;
an initial model acquisition module, configured to input the training set into the convolutional recurrent neural network model, obtain the forward output and the backward output of the model, and, based on the forward output and the backward output, update the weights and biases in the model with a back-propagation algorithm based on the connectionist temporal classification algorithm to obtain an initial handwriting recognition model; and
a target model acquisition module, configured to input the test set into the initial handwriting recognition model to obtain a recognition accuracy and, if the recognition accuracy is greater than a preset accuracy, determine the initial handwriting recognition model to be the target handwriting recognition model.
A computer device includes a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor. When the processor executes the computer-readable instructions, the following steps are implemented:
initializing the weights and biases of a convolutional recurrent neural network model;
obtaining font image training samples, labeling the handwritten character images in the font image training samples with the level-2 Chinese character set, and dividing the font image training samples into a training set and a test set according to a preset allocation rule;
inputting the training set into the convolutional recurrent neural network model, obtaining the forward output and the backward output of the model, and, based on the forward output and the backward output, updating the weights and biases in the model with a back-propagation algorithm based on the connectionist temporal classification algorithm to obtain an initial handwriting recognition model; and
inputting the test set into the initial handwriting recognition model to obtain a recognition accuracy and, if the recognition accuracy is greater than a preset accuracy, determining the initial handwriting recognition model to be the target handwriting recognition model.
One or more non-volatile readable storage media storing computer-readable instructions are provided. When the computer-readable instructions are executed by one or more processors, the one or more processors perform the following steps:
initializing the weights and biases of a convolutional recurrent neural network model;
obtaining font image training samples, labeling the handwritten character images in the font image training samples with the level-2 Chinese character set, and dividing the font image training samples into a training set and a test set according to a preset allocation rule;
inputting the training set into the convolutional recurrent neural network model, obtaining the forward output and the backward output of the model, and, based on the forward output and the backward output, updating the weights and biases in the model with a back-propagation algorithm based on the connectionist temporal classification algorithm to obtain an initial handwriting recognition model; and
inputting the test set into the initial handwriting recognition model to obtain a recognition accuracy and, if the recognition accuracy is greater than a preset accuracy, determining the initial handwriting recognition model to be the target handwriting recognition model. In view of this, it is also necessary to address the above technical problem by providing a Chinese character recognition method, apparatus, device, and medium with high recognition accuracy.
A Chinese character recognition method includes:
obtaining an original image, the original image including handwriting and a background image;
preprocessing the original image to obtain an effective image;
processing the effective image with a kernel density estimation algorithm and an erosion method, removing the background image, and obtaining a target image that includes the handwriting;
locating the text in the target image with a text positioning technique to obtain a text line image; and
inputting the text line image into a target handwriting recognition model for recognition to obtain a recognition result corresponding to the text line image, the target handwriting recognition model being obtained with the above Chinese character model training method.
A Chinese character recognition apparatus includes:
an original image acquisition module, configured to obtain an original image, the original image including handwriting and a background image;
an effective image acquisition module, configured to preprocess the original image to obtain an effective image;
a target image acquisition module, configured to process the effective image with a kernel density estimation algorithm and an erosion method, remove the background image, and obtain a target image that includes the handwriting;
a text line image acquisition module, configured to locate the text in the target image with a text positioning technique to obtain a text line image; and
a recognition result acquisition module, configured to input the text line image into a target handwriting recognition model for recognition and obtain a recognition result corresponding to the text line image, the target handwriting recognition model being obtained with the above Chinese character model training method.
A computer device includes a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor. When the processor executes the computer-readable instructions, the following steps are implemented:
obtaining an original image, the original image including handwriting and a background image;
preprocessing the original image to obtain an effective image;
processing the effective image with a kernel density estimation algorithm and an erosion method, removing the background image, and obtaining a target image that includes the handwriting;
locating the text in the target image with a text positioning technique to obtain a text line image; and
inputting the text line image into a target handwriting recognition model for recognition to obtain a recognition result corresponding to the text line image, the target handwriting recognition model being obtained with the above Chinese character model training method.
One or more non-volatile readable storage media storing computer-readable instructions are provided. When the computer-readable instructions are executed by one or more processors, the one or more processors perform the following steps:
obtaining an original image, the original image including handwriting and a background image;
preprocessing the original image to obtain an effective image;
processing the effective image with a kernel density estimation algorithm and an erosion method, removing the background image, and obtaining a target image that includes the handwriting;
locating the text in the target image with a text positioning technique to obtain a text line image; and
inputting the text line image into a target handwriting recognition model for recognition to obtain a recognition result corresponding to the text line image, the target handwriting recognition model being obtained with the above Chinese character model training method.
The details of one or more embodiments of this application are set forth in the drawings and the description below. Other features and advantages of this application will become apparent from the specification, the drawings, and the claims.
Brief Description of the Drawings
To describe the technical solutions of the embodiments of this application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of this application, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a diagram of an application scenario of the Chinese character model training method in an embodiment of this application;
FIG. 2 is a flowchart of the Chinese character model training method in an embodiment of this application;
FIG. 3 is a detailed flowchart of step S30 in FIG. 2;
FIG. 4 is a schematic diagram of the Chinese character model training apparatus in an embodiment of this application;
FIG. 5 is a flowchart of the Chinese character recognition method in an embodiment of this application;
FIG. 6 is a detailed flowchart of step S52 in FIG. 5;
FIG. 7 is a detailed flowchart of step S53 in FIG. 5;
FIG. 8 is a detailed flowchart of step S534 in FIG. 7;
FIG. 9 is a schematic diagram of the Chinese character recognition apparatus in an embodiment of this application;
FIG. 10 is a schematic diagram of the computer device in an embodiment of this application.
Detailed Description
The technical solutions in the embodiments of this application are described below clearly and completely with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of this application without creative effort fall within the protection scope of this application.
The Chinese character model training method provided by the embodiments of this application can be applied in the application environment shown in FIG. 1. This environment includes a server and a client, where the client communicates with the server over a network. The client is a device capable of human-computer interaction with a user, including but not limited to computers, smartphones, and tablets. The Chinese character model training method provided by the embodiments of this application is applied to the server.
In one embodiment, as shown in FIG. 2, a Chinese character model training method is provided, including the following steps:
S10: Initialize the weights and biases of a convolutional recurrent neural network model.
Here, the convolutional recurrent neural network (C-RNN) model is a neural network model composed of a convolutional neural network (CNN) model and a recurrent neural network (RNN) model. The forward output of the convolutional recurrent neural network model is simply the forward output of the recurrent neural network model. Corresponding weights and biases exist between the input, hidden, and output layers of the model. Before training, these weights and biases must be initialized: initial values are set for the weights and biases between the input layer and the hidden layer, and between the hidden layer and the output layer. Initializing the weights and biases is a necessary step of model training, and a reasonable initialization helps speed up training.
S20: Obtain font image training samples, label the handwritten character images in the font image training samples with the level-2 Chinese character set, and divide the font image training samples into a training set and a test set according to a preset allocation rule.
Specifically, the server obtains font image training samples from a database as the data source for subsequent model training. The font image training samples are handwriting samples used to train the neural network model and consist of multiple handwritten character images, each carrying Chinese characters handwritten by different people. After the samples are obtained, the handwritten character images are labeled with the standard typefaces of the level-2 Chinese character set to obtain the label characters associated with each image. A label character is the standard-typeface character from the level-2 character set that matches the handwritten character image; standard typefaces include but are not limited to SimSun, KaiTi, and FangSong. For example, if the handwritten character images are the characters 忍, 饥, 挨, and 饿 written by different people, they are labeled with the standard typefaces of the level-2 character set, and the SimSun, KaiTi, or FangSong forms of 忍, 饥, 挨, and 饿 become the label characters of the corresponding images.
The training set is the data used to adjust the parameters of the convolutional recurrent neural network model, and the test set is the data used to measure the recognition accuracy of the trained model. Specifically, the font image training samples are divided into a training set and a test set by ten-fold cross validation, a commonly used method for testing algorithm accuracy. In this embodiment, the samples are split at a ratio of 9:1: they are divided into 10 groups, of which 9 groups serve as the training set for training the convolutional recurrent neural network model, and the remaining group serves as the test set for verifying the accuracy of the trained model.
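The 9:1 allocation rule described above can be sketched as follows; this is only an illustration, and the sample list, the seed, and the helper name are hypothetical, not part of this application.

```python
import random

def split_samples(samples, train_ratio=0.9, seed=42):
    """Shuffle the font-image training samples and split them 9:1
    into a training set and a test set, per the allocation rule above."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)  # deterministic shuffle for reproducibility
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

samples = [f"img_{i}" for i in range(100)]  # hypothetical handwritten-image IDs
train_set, test_set = split_samples(samples)
```

Full ten-fold cross validation would rotate which of the 10 groups serves as the test set; the sketch shows a single fold.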
S30: Input the training set into the convolutional recurrent neural network model, obtain the forward output and the backward output of the model, and, based on the forward output and the backward output, update the weights and biases in the model with a back-propagation algorithm based on the connectionist temporal classification algorithm to obtain an initial handwriting recognition model.
Here, the connectionist temporal classification (CTC) algorithm addresses time-series problems in which the alignment between input features and output labels is uncertain; CTC can optimize the model parameters and the alignment boundaries end to end at the same time. The initial handwriting recognition model is the model obtained after the font image training samples in the training set have been fed into the convolutional recurrent neural network model for training. The back-propagation algorithm adjusts the weights and biases between the hidden layer and the output layer, and between the input layer and the hidden layer, in the reverse order of the time steps.
Specifically, after obtaining the training set, the server labels the handwritten character images sequentially so that every handwritten character carries a corresponding order label. For example, if an image in the training set contains the handwritten characters 北京欢迎你, each character is labeled in order: 北 carries the order label "110", 京 carries "111", 欢 carries "112", 迎 carries "113", and 你 carries "114". The handwritten character images in the training set are then fed into the recurrent neural network model for training, and the hidden layer computes the corresponding forward and backward outputs. The forward output is the probability of the u-th handwritten character when the sequence is traversed in time order; the backward output is the probability of the u-th handwritten character when the sequence is traversed in reverse time order. Taking 北京欢迎你 as an example, suppose the u-th character is 欢 and the output at time t−1 is 京; the output at time t is computed from the output 京 at time t−1 and the input 欢 at time t, and may include 欢, 坎, and 双. The forward output is then the probability that the output at time t is 欢. Likewise, suppose the output at time t+1 is 迎; the output at time t is computed from the output 迎 at time t+1 and the input 欢 at time t, and may include 欢, 坎, and 双. The backward output is then the probability that the output at time t is 欢.
In this embodiment, because the CTC algorithm is used to update the weights and biases of the convolutional recurrent neural network model, a handwritten character image in the training set specifically means an image of a single line of three or more handwritten characters. In the convolutional recurrent neural network model, after the forward output and the backward output of a handwritten character image reach the output layer of the recurrent neural network model, the output layer combines them to obtain the target output, computed as o = ln(a+b) = ln a + ln(1 + e^(ln b − ln a)), where a is the forward output, b is the backward output, and o is the target output.
After the target output of the training set is obtained, the convolutional recurrent neural network model builds an error function from the target output and the label characters, and updates the weights and biases by taking partial derivatives of the error function, thereby obtaining the initial handwriting recognition model. Updating the weights and biases with a CTC-based back-propagation algorithm means that the update follows the error function built from single-line handwritten character images, which solves the time-series problem of uncertain input-output alignment, guarantees that the initial handwriting recognition model is trained on time series, and improves the accuracy of model training.
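The output-layer combination o = ln(a+b) = ln a + ln(1 + e^(ln b − ln a)) is the standard log-domain trick for adding two probabilities without leaving log space; a minimal check of the identity:

```python
import math

def log_add(ln_a, ln_b):
    """Compute ln(a + b) from ln(a) and ln(b), as in
    o = ln a + ln(1 + e^(ln b - ln a))."""
    if ln_b > ln_a:
        # keep the larger term outside the exponent for numerical stability
        ln_a, ln_b = ln_b, ln_a
    return ln_a + math.log1p(math.exp(ln_b - ln_a))

a, b = 0.3, 0.2  # example forward/backward probabilities
o = log_add(math.log(a), math.log(b))  # equals ln(0.5)
```

The swap keeps the exponent non-positive, so the formula never overflows even when the two log-probabilities are very small.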
S40: Input the test set into the initial handwriting recognition model to obtain a recognition accuracy; if the recognition accuracy is greater than a preset accuracy, determine the initial handwriting recognition model to be the target handwriting recognition model.
Here, the target handwriting recognition model is the model whose recognition accuracy, as measured on the test set, meets the preset accuracy; it can be used to recognize handwritten character images. After training of the initial handwriting recognition model is complete, the handwritten character images of each sample in the test set are fed into the model one by one to obtain its recognition accuracy.
Step S40 specifically includes the following steps. First, the handwritten character images of each sample in the test set are fed into the initial handwriting recognition model one by one to obtain the recognized character for each image; in this embodiment, the recognized character is the character produced by the initial handwriting recognition model for the image. Then, the recognized character and the label character of the image are compared to judge whether the model recognized the image correctly; if so, the correct-recognition count is incremented by 1, and the recognition accuracy is computed as: recognition accuracy = correct-recognition count / number of handwritten character images in the test set. If the recognition accuracy of the initial handwriting recognition model is greater than the preset accuracy, the model is determined to be the target handwriting recognition model; otherwise, the initial model must be retrained until its recognition accuracy meets the requirement. The preset accuracy is a threshold set in advance for judging whether the accuracy of the initial handwriting recognition model meets the preset requirement. For example, if the preset accuracy is 82% and the accuracy obtained on the test set is greater than 82% (say 85% or 90%), the model's accuracy on the handwriting training samples meets the requirement and the initial model can be determined to be the target handwriting recognition model.
In the Chinese character model training method provided by this embodiment, the training set is fed into the convolutional recurrent neural network model to obtain the forward and backward outputs, the target output is computed from them, and the weights and biases in the model are updated with a CTC-based back-propagation algorithm to obtain the handwriting training model, which effectively improves training accuracy. Finally, the test set is fed into the handwriting training model for testing; if its recognition accuracy on the handwriting training samples exceeds the preset accuracy, the requirement is met, and the model is determined to be the target handwriting recognition model for recognizing handwritten character images, giving high recognition accuracy.
In one embodiment, because the convolutional recurrent neural network model is composed of a convolutional neural network model and a recurrent neural network model, both must be used when training the initial handwriting recognition model. As shown in FIG. 3, step S30 — inputting the training set into the convolutional recurrent neural network model, obtaining its forward and backward outputs, and, based on them, updating its weights and biases with a CTC-based back-propagation algorithm to obtain the initial handwriting recognition model — specifically includes the following steps:
S31: Input the handwritten character images in the training set into the convolutional neural network model to obtain the handwriting image features corresponding to those images.
Specifically, the convolutional neural network model consists of multiple convolutional and pooling layers. The server feeds the handwritten character images of the training samples into the convolutional neural network model, and the output of each convolutional layer is computed as a_m^l = σ(z_m^l) = σ(a_m^(l-1) * W^l + b^l), where a_m^l is the output for the m-th order label at convolutional layer l (i.e., the handwriting image feature to be obtained), z_m^l is the output for the m-th order label before the activation function is applied, a_m^(l-1) is the output for the m-th order label at layer l−1 (the previous layer's output), σ is the activation function — ReLU (Rectified Linear Unit) is used for the convolutional layers, as it works better here than other activation functions — * denotes convolution, W^l is the convolution kernel (weights) of layer l, and b^l is the bias of layer l. If layer l is a pooling layer, max-pooling downsampling is applied to reduce the dimensionality of the convolutional layer's output: a_m^l = pool(a_m^(l-1)), where pool denotes the downsampling computation; max pooling simply takes the maximum over an m×m window. Finally, the output of the output layer is obtained through the output-layer formula, where T^(m) denotes the output of the CNN output layer, namely the handwriting image feature of the image with the m-th order label. The feature carries an order label consistent with the order label of its handwritten character image.
S32: Input the handwriting image features corresponding to the handwritten character images in the training set into the recurrent neural network model for training, and obtain the forward output and the backward output of the recurrent neural network model. The formula of the forward output is
a(t,u) = y^t_{l'_u} · Σ_{i=f(u)}^{u} a(t−1, i),
where a(t,u) is the forward output corresponding to the u-th handwriting image feature at time t, y^t_b is the probability that the output at time t is a blank (space), l'_u is the total length of the handwritten characters and spaces, a(t−1,i) is the forward output of the i-th character at time t−1, and f(u) is the standard CTC bound on allowed predecessor labels. The formula of the backward output is
b(t,u) = Σ_{i=u}^{g(u)} b(t+1, i) · y^{t+1}_{l'_i},
where b(t,u) is the backward output corresponding to the u-th handwriting image feature at time t, y^{t+1}_b is the probability that the output at time t+1 is a blank, b(t+1,i) is the backward output of the i-th character at time t+1, and g(u) is the standard CTC bound on allowed successor labels.
Here, a blank (space) is the empty area between adjacent characters. Specifically, the handwriting image features output by the convolutional neural network model are fed into the hidden layer of the recurrent neural network model, and the hidden-layer output is obtained as h^(m) = σ'(U'T^(m-1) + W'T^(m) + b'), where h^(m) is the hidden-layer output for the m-th order label, σ' is the activation function of the hidden layer, U' is the weight between the convolutional layer of the convolutional neural network model and the hidden layer of the recurrent neural network model (if layer l is a pooling layer, U' is the weight between the pooling layer and the hidden layer), W' is the hidden-to-hidden weight, b' is the bias between the input layer and the hidden layer, and T^(m) is the handwriting image feature of the image with the m-th order label received by the input layer of the recurrent neural network model.
Then the hidden-layer output h^(m) is mapped to the input of the output layer of the recurrent neural network model as o^(m) = V'h^(m) + c', where o^(m) is the input fed to the output layer, V' is the weight between the hidden layer and the output layer, and c' is the bias between them. In the output layer, the forward output and the backward output of the recurrent neural network model are obtained from the forward and backward recursion formulas given in step S32, where a(t,u) is the forward output for the u-th character at time t and b(t,u) is the backward output for the u-th character at time t.
S33: Build a loss function from the forward output and the backward output of the recurrent neural network model, and, based on the loss function, update the weights and biases in the recurrent neural network model and the convolutional neural network model with a back-propagation algorithm based on the connectionist temporal classification algorithm to obtain the initial handwriting recognition model. The specific expression of the loss function is
E_loss(x, z) = −ln Σ_{u=1}^{|z'|} a(t,u) · b(t,u),
where x is the input character sequence, z is the output corresponding to the input x, u indexes the u-th character, z' is the length of the character sequence, a(t,u) is the forward output for the u-th character at time t, and b(t,u) is the backward output for the u-th character at time t.
Specifically, the convolutional neural network model feeds the handwriting image features into the hidden layer of the recurrent neural network model; the forward output of a handwritten character image at the hidden layer is obtained with the forward recursion formula, and its backward output with the backward recursion formula. The forward and backward outputs are then passed to the output layer, and the target output of the image at the output layer of the recurrent neural network model is obtained as o = ln(a+b) = ln a + ln(1 + e^(ln b − ln a)).
After the target output is obtained, the target output and the label characters are substituted into the loss function E_loss(x, z) given above, and the error E_loss(x, z) of the handwritten character image of a single line of handwriting is obtained. After E_loss(x, z) is obtained, the weights and biases in the recurrent neural network model and the convolutional neural network model are updated by taking partial derivatives of E_loss(x, z), yielding the initial handwriting recognition model. The partial-derivative update is θ ← θ − η · ∂E_loss(x, z)/∂θ (η being the learning rate), where θ denotes the set of weights and biases in the convolutional recurrent neural network model.
In steps S31-S33, the handwriting image features of the handwritten character images in the training set are obtained through the convolutional neural network model; the features are then fed into the recurrent neural network model for training to obtain the forward and backward outputs, and the loss function is built from the forward and backward outputs and the label characters. Finally, based on the loss function, the weights and biases in the recurrent and convolutional neural network models are updated with a CTC-based back-propagation algorithm to obtain the initial handwriting recognition model, guaranteeing the accuracy and speed of model training.
In the Chinese character model training method provided by this embodiment, the training set is fed into the convolutional recurrent neural network model; the convolutional neural network model extracts the handwriting image features of the handwritten character images, which are then fed into the recurrent neural network model, and the weights and biases of the convolutional recurrent neural network model are updated with a CTC-based back-propagation algorithm. The weights and biases are thus updated according to time-series handwritten character images, and each handwritten character is recognized through its relationship with the adjacent characters before and after it, effectively improving the accuracy of the initial handwriting recognition model. To further verify that accuracy, the test set is fed into the initial handwriting recognition model for testing; if the model's recognition accuracy on the font image training samples exceeds the preset accuracy, the requirement is met, and the initial model is determined to be the target handwriting recognition model for recognizing handwritten character images, which has high recognition accuracy.
It should be understood that the step numbers in the above embodiments do not imply an execution order; the execution order of the processes should be determined by their functions and internal logic, and should not limit the implementation of the embodiments of this application in any way.
In one embodiment, a Chinese character model training apparatus is provided, corresponding one-to-one to the Chinese character model training method in the above embodiments. As shown in FIG. 4, the apparatus includes a model initialization module 10, a training sample processing module 20, an initial model acquisition module 30, and a target model acquisition module 40. The function modules are described in detail as follows:
The model initialization module 10 is configured to initialize the weights and biases of a convolutional recurrent neural network model.
The training sample processing module 20 is configured to obtain font image training samples, label the handwritten character images in the font image training samples with the level-2 Chinese character set, and divide the font image training samples into a training set and a test set according to a preset allocation rule.
The initial model acquisition module 30 is configured to input the training set into the convolutional recurrent neural network model, obtain its forward and backward outputs, and, based on them, update the weights and biases in the model with a CTC-based back-propagation algorithm to obtain an initial handwriting recognition model.
The target model acquisition module 40 is configured to input the test set into the initial handwriting recognition model to obtain a recognition accuracy and, if the recognition accuracy is greater than a preset accuracy, determine the initial handwriting recognition model to be the target handwriting recognition model.
Specifically, the convolutional recurrent neural network model includes a convolutional neural network model and a recurrent neural network model.
The initial model acquisition module 30 includes an image feature acquisition unit 31, a model output acquisition unit 32, and an initial model acquisition unit 33.
The image feature acquisition unit 31 is configured to input the handwritten character images in the training set into the convolutional neural network model to obtain the corresponding handwriting image features.
The model output acquisition unit 32 is configured to input those handwriting image features into the recurrent neural network model for training and obtain its forward and backward outputs. The formula of the forward output is a(t,u) = y^t_{l'_u} · Σ_{i=f(u)}^{u} a(t−1, i), where a(t,u) is the forward output corresponding to the u-th handwriting image feature at time t, y^t_b is the probability that the output at time t is a blank, l'_u is the total length of the handwritten characters and blanks, a(t−1,i) is the forward output of the i-th character at time t−1, and f(u) is the standard CTC bound on allowed predecessor labels. The formula of the backward output is b(t,u) = Σ_{i=u}^{g(u)} b(t+1, i) · y^{t+1}_{l'_i}, where b(t,u) is the backward output corresponding to the u-th handwriting image feature at time t, y^{t+1}_b is the probability that the output at time t+1 is a blank, b(t+1,i) is the backward output of the i-th character at time t+1, and g(u) is the standard CTC bound on allowed successor labels.
The initial model acquisition unit 33 is configured to build a loss function from the forward and backward outputs of the recurrent neural network model and, based on it, update the weights and biases in the recurrent and convolutional neural network models with a CTC-based back-propagation algorithm to obtain the initial handwriting recognition model. The specific expression of the loss function is
E_loss(x, z) = −ln Σ_{u=1}^{|z'|} a(t,u) · b(t,u),
where x is the input character sequence, z is the output corresponding to the input x, u indexes the u-th character, z' is the length of the character sequence, a(t,u) is the forward output for the u-th character at time t, and b(t,u) is the backward output for the u-th character at time t.
In one embodiment, as shown in FIG. 5, a Chinese character recognition method is provided, specifically including the following steps:
S51: Obtain an original image; the original image includes handwriting and a background image.
Here, the original image is a specific image that has not undergone any processing and that contains handwriting. In this embodiment, the original image includes handwriting and a background image, the background image being the image corresponding to the background pattern of the original image. The original image can be obtained in ways including, but not limited to, crawling from web pages or accessing a database connected to the server; the original images in that database may be images uploaded in advance by terminal devices.
S52: Preprocess the original image to obtain an effective image.
Here, the effective image is the original image after preprocessing. The server obtains the effective image in the following steps: (1) Judge whether the original image is a color image; if so, grayscale it to obtain a grayscale image, so that the three components R (red), G (green), and B (blue) of each pixel of the color image can be replaced by a single value, which helps reduce the complexity of the subsequent range normalization. Understandably, if the original image is not a color image, it is already a grayscale image and needs no grayscaling. (2) Apply range normalization to the pixel matrix of the grayscale image to obtain the effective image. Range normalization preserves the relative relationships within the pixel matrix while speeding up computation.
S53: Process the effective image with a kernel density estimation algorithm and an erosion method, remove the background image, and obtain a target image that includes the handwriting.
The target image is the image containing only the handwriting. Kernel density estimation is a non-parametric method that studies the distribution characteristics of data from the data samples themselves and is used to estimate a probability density function. The specific formula of the kernel density estimation algorithm is
f(x) = (1/(n·h)) · Σ_{i=1}^{n} K((x − x_i)/h),
where f(x) is the estimated probability density of a pixel, K(·) is the kernel function, h is the pixel range (bandwidth), x is the pixel whose probability density is to be estimated, x_i is the i-th pixel within the range h, and n is the number of pixels within the range h. The erosion method performs erosion on the image, where erosion removes the background portion of the image and keeps only the handwriting portion.
In this embodiment, the kernel density estimation formula is applied to the frequency distribution histogram of the effective image to obtain a smooth curve; the pixels corresponding to the minima and maxima of the curve are obtained, and the effective image is layered according to those pixels. After layering, the layered images are eroded to remove the background image and keep the handwriting. Finally, the layered and eroded images are superimposed to obtain the target image that includes the handwriting; superimposition merges the layered images, in which only the handwriting remains, back into a single image, thereby achieving the purpose of obtaining the target image that includes the handwriting.
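The kernel density estimate f(x) = (1/(n·h)) Σ K((x − x_i)/h) used to smooth the pixel histogram can be sketched as follows with a Gaussian kernel; the toy pixel values and the bandwidth are illustrative assumptions, not parameters from this application.

```python
import math

def gaussian_kde(x, data, h):
    """Estimate the density at x from pixel samples `data` with bandwidth h:
    f(x) = 1/(n*h) * sum(K((x - x_i)/h)), K = standard normal kernel."""
    n = len(data)
    kernel = lambda u: math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)
    return sum(kernel((x - xi) / h) for xi in data) / (n * h)

# toy grayscale values: a dark cluster (handwriting) and a bright cluster (background)
pixels = [10, 12, 11, 200, 198, 202]
density_low = gaussian_kde(11, pixels, h=5.0)   # near the dark cluster -> high
density_mid = gaussian_kde(100, pixels, h=5.0)  # between clusters -> near zero
```

The smoothed curve's maxima sit at the cluster centers and its minima between them, which is what the layering step keys on.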
S54: Locate the text in the target image with a text positioning technique to obtain a text line image.
Here, a text positioning technique is a technique for locating text regions. Text positioning techniques include but are not limited to the Connectionist Text Proposal Network (CTPN) and Optical Character Recognition (OCR). CTPN is a network commonly used for text detection in images. OCR analyzes and recognizes image files of text materials to obtain text and layout information, generally in two steps: 1. text positioning, i.e., finding where the text is in the picture; 2. text recognition, i.e., recognizing the text that was found. This embodiment uses the text positioning step of OCR.
Specifically, taking OCR as the example, text positioning proceeds as follows:
(1) First, with a proximity search method, arbitrarily select one connected region from the connected regions obtained in step S5342 as the starting connected region, compute the distances between the remaining connected regions (the connected regions other than the starting one) and the starting region, and select the connected regions whose region distance is smaller than a preset threshold as target connected regions, so as to determine the direction of the dilation operation (up, down, left, or right). The preset threshold is a threshold set in advance for judging the distance between two connected regions. The proximity search method starts from a starting connected region, finds its horizontal bounding rectangle, and extends the region to the whole rectangle; when the distance between the starting region and its nearest neighbour is smaller than the preset threshold, the rectangle is dilated in the direction of the nearest neighbour. The dilation operation is performed only when the dilation direction is horizontal. The region distance is the distance between two connected regions; when it is computed along the adjacent boundaries, the region extents must also be subtracted: x'_c and y'_c are obtained from the components of the center vector difference (x_c, y_c) by subtracting the corresponding region extents, and the region distance is then D(S, S') = sqrt((x'_c)^2 + (y'_c)^2), where S is the starting connected region, S' is a remaining connected region, and (x_c, y_c) is the center vector difference between the two regions. Here (x', y') denotes the upper-left corner of the rectangle of the remaining connected region S', (w', z') its lower-right corner, (x, y) the upper-left corner of the rectangle of the starting connected region S, and (w, z) its lower-right corner; in this embodiment, the point corresponding to (x, y) (the upper-left corner of the rectangle of the starting connected region S) is taken as the origin.
(2) Determine the direction of the dilation operation from the direction of the target connected region, dilate the starting connected region in that direction, and obtain the text line image. Dilation, like erosion, is a morphological operation used to enlarge an image; MATLAB's built-in imdilate function is used to dilate the connected regions of the binarized image. A text line image is the image of a single line of handwriting obtained by text positioning. Specifically, dilating the starting connected region includes the following steps. Select an n×n structuring element; in this embodiment the connected region of each element of the pixel matrix is formed by its 8 neighbouring elements, so a 3×3 pixel matrix is chosen as the structuring element. A structuring element is an n×n pixel matrix whose entries are 0 or 1. Scan the connected region along the direction of the target connected region and compute the logical AND of the structuring element and the connected-region pixels it covers in that direction: if all results are 0, the pixels stay unchanged; if they are not all 0, all the pixels covered by the structuring element are set to 1, and the part set to 1 is the dilated part of the starting connected region. The rules of the logical AND operation are 0&&0=0, 0&&1=0, 1&&0=0, and 1&&1=1, where && is the logical AND operator. Locating the text in the target image and obtaining the text line images saves the model's recognition time and improves the accuracy of the recognition results.
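Both morphological operations used in this method (erosion in step S53, dilation in step S54) slide a structuring element over a binary image. A minimal erosion sketch with a 3×3 all-ones element, written in plain Python as an illustration (the test image is hypothetical):

```python
def erode(img, k=3):
    """Binary erosion with a k x k all-ones structuring element:
    a pixel survives as 1 only if every neighbour under the element is 1;
    pixels that do not completely match the element become 0."""
    h, w, r = len(img), len(img[0]), k // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            covered = all(0 <= y + dy < h and 0 <= x + dx < w and img[y + dy][x + dx]
                          for dy in range(-r, r + 1) for dx in range(-r, r + 1))
            out[y][x] = 1 if covered else 0
    return out

img = [[0, 0, 0, 0, 0],
       [0, 1, 1, 1, 0],
       [0, 1, 1, 1, 0],
       [0, 1, 1, 1, 0],
       [0, 0, 0, 0, 0]]
eroded = erode(img)  # only the central pixel of the 3x3 block survives
```

Dilation is the dual operation: a pixel becomes 1 when the element overlaps any 1 pixel, which is what grows the starting connected region toward its neighbour.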
S55:将文本行图像输入到目标手写字识别模型中进行识别,获取文本行图像对应的识别结果,目标手写字识别模型是采用上述汉字模型训练方法获取到的。
其中,目标手写字识别模型是预先训练好的用于识别手写字的模型。识别结果指识别概率大于预设概率的手写字图像经过卷积循环神经网络模型识别获取的结果。具体地,将文本行图像输入到目标手写字识别模型中,获取每一文本行图像对应的识别概率,该识别概率是指经过目标手写字模型识别获取的该文本行图像对应的汉字的概率。将识别概率和预设概率进行比较,若识别概率大于预设概率,则获取对应的识别结果,有助于提高识别结果的准确性。
如预设概率为85%，将“北京欢迎你”对应的文本行图像输入到目标手写字识别模型中，获取的识别结果可能为“北京欢迎你”、“北京坎迎你”和“北京双迎你”，其中，“北京欢迎你”对应的识别概率为99%，“北京坎迎你”和“北京双迎你”的识别概率分别为50%和60%，将识别概率和预设概率进行比较，99%大于85%，对应的识别结果则为“北京欢迎你”。
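该例中按预设概率筛选识别结果的逻辑可以用如下Python代码示意（候选文本与概率均为假设的示例数据，并非真实模型输出）：

```python
# 候选识别结果及其识别概率（示意数据）
candidates = {"北京欢迎你": 0.99, "北京坎迎你": 0.50, "北京双迎你": 0.60}

def pick_result(candidates, preset_prob=0.85):
    """取识别概率最高的候选；仅当其概率大于预设概率时才作为识别结果输出。"""
    text, prob = max(candidates.items(), key=lambda kv: kv[1])
    return text if prob > preset_prob else None

result = pick_result(candidates)   # 0.99 > 0.85，输出"北京欢迎你"
```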
本实施例所提供的汉字识别方法，通过对原始图像进行预处理，获取有效图像，并采用核密度估计算法和腐蚀方法对有效图像进行处理，去除背景图像的部分，保留仅含有手写字的目标图像，可以节省模型的识别时间。采用文字定位技术对目标图像进行文字定位，获取文本行图像，将获取的文本行图像输入到目标手写字识别模型中识别，基于文本行图像对应的识别概率值，获取识别结果。采用目标手写字识别模型对文本行图像进行识别，可以提高识别准确率。
在一实施例中,如图6所示,步骤S52,对原始图像进行预处理,获取有效图像,具体包括如下步骤:
S521:对原始图像进行放大和灰度化处理,获取灰度图像。
由于在原始图像中，手写字本身的尺寸相对于背景图像而言较小，在对原始图像进行灰度化处理时，手写字容易被误处理掉，因此，为了保证手写字不会在灰度化处理时被误清除，需要对原始图像对应的每个像素进行放大处理，如原始图像中第n个像素的大小为x n，对原始图像中的每个像素进行幂次放大处理，使得x n变为
Figure PCTCN2018094405-appb-000024
本实施例中,将原始图像中的像素进行放大处理,可以有效避免在对原始图像进行灰度化处理时,手写字被误处理掉。
在原始图像进行放大处理后,若原始图像不是灰度图像而是彩色图像时,则需要对原始图像进行灰度化处理,获取灰度图像。可以理解地,若原始图像为灰度图像,则不需要进行灰度化处理。当原始图像为彩色图像时,对原始图像进行灰度化处理的具体步骤为:采用公式Y=0.299R+0.587G+0.114B对原始图像中的每个像素进行处理,获取每个像素对应的采样像素,依据该采样像素形成灰度图像;其中,R(红色)、G(绿色)和B(蓝色)是原始图像中的三个分量, 采样像素是灰度图像中用于替换彩色图像中R、G和B三个分量对应的像素。
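上述加权灰度化公式可以用如下Python代码示意（按文中给出的 Y=0.299R+0.587G+0.114B 逐像素计算，示例图像为假设数据）：

```python
import numpy as np

def to_gray(rgb):
    """按 Y = 0.299R + 0.587G + 0.114B 对每个像素的三个分量加权求和，得到灰度图像。"""
    return 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]

# 2×2 的示意彩色图像：纯红、纯绿、纯蓝、白色
img = np.array([[[255, 0, 0], [0, 255, 0]],
                [[0, 0, 255], [255, 255, 255]]], dtype=np.float64)
gray = to_gray(img)   # 白色像素加权后仍为255，纯色像素按各自权重缩小
```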
对原始图像为彩色图像进行灰度化处理,有效减少了后续步骤获取有效图像时需要处理的数据量和计算的复杂度。
S522:对灰度图像对应的像素矩阵进行极差标准化处理,获取有效图像,其中,极差标准化处理的公式为
x' = \frac{x - M_{min}}{M_{max} - M_{min}}
x是标准化前有效图像的像素,x'是标准化后有效图像的像素,M min是灰度图像对应的像素矩阵M中最小的像素,M max是灰度图像对应的像素矩阵M中最大的像素。
其中，极差标准化处理是对数据进行处理，使数据压缩在(0,1)范围内的处理方法。对灰度图像对应的像素矩阵进行极差标准化处理并乘上255，可以方便对像素矩阵中的数据进行处理，同时保留像素矩阵中各像素的相互关系。灰度图像中，背景图像和每个手写字都有各自对应的像素矩阵。在获取灰度图像中的背景图像和每个手写字对应的像素矩阵后，对像素矩阵进行极差标准化处理，获取极差标准化处理后的像素矩阵对应的有效图像。对像素矩阵进行极差标准化处理，能够提高获取包括手写字的目标图像的处理速度。
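极差标准化可以用如下Python代码示意（直接按文中公式 x'=(x-M_min)/(M_max-M_min) 实现，示例矩阵为假设数据）：

```python
import numpy as np

def range_normalize(m):
    """极差标准化：x' = (x - M_min) / (M_max - M_min)，把像素压缩到 [0, 1] 区间。"""
    m = m.astype(np.float64)
    return (m - m.min()) / (m.max() - m.min())

pixel_matrix = np.array([[10, 60], [110, 210]])
normalized = range_normalize(pixel_matrix)
scaled = normalized * 255          # 文中提到乘以255，便于继续按像素值处理
```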
步骤S521-S522，通过对原始图像进行放大处理，可以有效避免在下一步骤对原始图像进行灰度化处理时，将手写字误处理掉的情况发生。对原始图像进行灰度化处理，获取灰度图像，可以减少后续步骤中需要处理的数据量。对灰度图像进行极差标准化处理，能够提高获取包括手写字的目标图像的处理速度。
在一实施例中,如图7所示,步骤S53,采用核密度估计算法和腐蚀方法对有效图像进行处理,去除背景图像,获取包括手写字的目标图像,具体包括如下步骤:
S531:对有效图像中的像素出现的次数进行统计,获取有效图像对应的频率分布直方图。
其中，频率分布直方图的横轴表示样本数据的连续值，横轴上的每个小区间对应一个组的组距，作为小矩形的底边；纵轴表示频率与组距的比值，并用该比值作为小矩形的高，由多个小矩形构成的一组图即称为频率分布直方图。具体地，获取有效图像后，频率分布直方图的横轴表示像素在(0,255)之间的连续值，横轴上每个小矩形对应的组距为1，纵轴表示小矩形对应的像素出现的频率与组距的比值，该比值即为对应的小矩形的高。该频率分布直方图可以形象地将有效图像中的像素出现的次数展示出来，使得数据的分布情况一目了然地反映出来。
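像素出现次数的统计可以用如下Python代码示意（组距为1，示例灰度图像为假设数据）：

```python
import numpy as np

gray = np.array([[0, 0, 255],
                 [128, 128, 128]])                    # 示意灰度图像
counts = np.bincount(gray.ravel(), minlength=256)     # 统计 0~255 每个像素值出现的次数
freq = counts / counts.sum()                          # 出现频率；组距为1时频率/组距即矩形高度
```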
S532:采用高斯核密度估算方法对频率分布直方图进行处理,获取频率分布直方图对应的频率极大值和频率极小值,并根据频率极大值和频率极小值获取对应的像素。
高斯核密度估算方法指核函数为高斯核的核密度估算方法。其中,高斯核对应的函数为
K(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}
其中，K(x)指像素（自变量）为x的高斯核函数，x指像素，e和π为常数。频率极大值指在频率分布直方图中，频率值大小为极大值的频率值；频率极小值指在频率分布直方图中，频率值大小为极小值的频率值。具体地，采用高斯核密度估算方法对获取的有效图像对应的频率分布直方图进行高斯平滑处理，获取该频率分布直方图对应的高斯平滑曲线。基于该高斯平滑曲线上的频率极大值和频率极小值，获取频率极大值和频率极小值对应横轴上的像素。本实施例中，获取频率极大值和频率极小值对应的像素，便于后续对有效图像进行分层区分，获取分层图像。
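高斯核密度估计及在平滑曲线上寻找极值的过程可以用如下Python代码示意（带宽h及像素分布均为假设的示例，仅用于说明原理）：

```python
import numpy as np

def gaussian_kernel(x):
    """高斯核 K(x) = e^(-x^2/2) / sqrt(2*pi)。"""
    return np.exp(-0.5 * x ** 2) / np.sqrt(2 * np.pi)

def kde_curve(pixels, h=8.0):
    """对像素样本做高斯核密度估计，返回 0~255 网格上的平滑曲线。"""
    grid = np.arange(256)
    z = (grid[:, None] - pixels[None, :]) / h
    return gaussian_kernel(z).sum(axis=1) / (pixels.size * h)

# 示意像素：背景集中在 40 附近，手写字集中在 200 附近
pixels = np.concatenate([np.full(200, 40), np.full(100, 200)])
curve = kde_curve(pixels)
# 在平滑曲线上寻找频率极大值对应的像素
maxima = [i for i in range(1, 255) if curve[i - 1] < curve[i] > curve[i + 1]]
```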
S533:基于频率极大值和频率极小值对应的像素对有效图像进行分层处理,获取分层图像。
分层图像指基于频率极大值和频率极小值对有效图像进行分层处理得到的图像。获取频率极大值和频率极小值对应的像素后，根据频率极大值对应的像素对有效图像进行分层处理：有效图像中有多少个频率极大值，对应的有效图像的像素就被聚类为多少类，该有效图像就会被分为多少层。然后以频率极小值对应的像素作为类之间的边界值，根据类之间的边界则可以确定每一层分层图像对应的像素。
如有效图像中的频率极大值对应的像素分别为12、54、97、113、159、172,频率极小值对应的像素分别为26、69、104、139和163,根据有效图像中的频率极大值的个数可以确定该有效图像的像素 可以被分为6类,该有效图像可以被分为6层,频率极小值对应的像素作为类之间的边界值,由于最小的像素为0,最大的像素为255,因此,根据类之间的边界值则可以确定以像素为12的分层图像,该分层图像对应的像素范围为[0,26);以像素为54的分层图像,该分层图像对应的像素范围为[26,69);以像素为97的分层图像,该分层图像对应的像素范围为[69,104);以像素为113的分层图像,该分层图像对应的像素范围为[104,139);以像素为159的分层图像,该分层图像对应的像素范围为[139,163);以像素为172的分层图像,该分层图像对应的像素范围为[163,255]。
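以频率极小值对应的像素作为层间边界的分层过程可以用如下Python代码示意（沿用文中示例的边界值，灰度图像为假设数据）：

```python
import numpy as np

# 沿用文中示例：频率极小值对应的像素作为层与层之间的边界值
minima = [26, 69, 104, 139, 163]
gray = np.array([[12, 54, 97],
                 [113, 159, 172]])

layer_index = np.digitize(gray, minima)            # 每个像素属于第几层（0~5，共6层）
layers = [np.where(layer_index == k, gray, 0)      # 每层只保留落在本层像素范围内的像素
          for k in range(len(minima) + 1)]
```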
S534:对分层图像进行腐蚀和叠加处理,获取包括手写字的目标图像。
获取分层图像后,对分层图像进行二值化处理。其中,二值化处理是指将图像上的像素设置为0(黑色)或1(白色),将整个图像呈现出明显的黑白效果的处理。对分层图像进行二值化处理后,对二值化处理后的分层图像进行腐蚀处理,去除背景图像部分,保留分层图像上的手写字部分。其中,腐蚀处理是用于形态学中去除图像的某部分的内容的操作。由于每个分层图像上的像素是属于不同范围的像素,因此,对分层图像进行腐蚀处理后,还需要将每个分层图像叠加,生成仅含有手写字的目标图像。
步骤S531-S534,通过获取有效图像对应的频率分布直方图,并根据频率分布直方图获取频率极大值和频率极小值对应的像素,从而获取分层图像。最后对分层图像进行二值化、腐蚀和叠加处理,完成对原始图像中手写字和背景图像的识别,去除背景图像,获取包括手写字的目标图像。
在一实施例中,如图8所示,步骤S534中,对分层图像进行腐蚀和叠加处理,获取包括手写字的目标图像,具体包括如下步骤:
S5341:对分层图像进行二值化处理,获取分层二值化图像。
分层二值化图像指对分层图像进行二值化处理获取的图像。具体地，获取分层图像后，将分层图像的采样像素和预先选取的阈值进行比较，将大于等于阈值的像素设置为1，小于阈值的像素设置为0。本实施例中，0代表背景像素，1代表目标像素（手写字像素）。该阈值可以通过计算分层图像的类间方差获取，也可以根据经验值获取。阈值的大小会影响分层图像二值化处理的效果，若阈值选取合适，则对分层图像进行二值化处理的效果就比较好；相应地，若阈值选取不合适，则影响分层图像二值化处理的效果。为了方便操作，简化计算过程，本实施例中的阈值根据经验值确定。
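二值化步骤可以用如下Python代码示意（阈值128为假设的经验值，示例矩阵为假设数据）：

```python
import numpy as np

def binarize(layer, threshold):
    """大于等于阈值的像素置1（目标/手写字像素），小于阈值的像素置0（背景像素）。"""
    return (layer >= threshold).astype(np.uint8)

layer = np.array([[30, 200],
                  [128, 5]])
binary = binarize(layer, 128)   # 此处阈值128仅为示意，实际可按经验值或类间方差选取
```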
S5342:对分层二值化图像中的像素进行检测标记,获取分层二值化图像对应的连通区域。
其中,连通区域是指某一特定像素周围的邻接像素所围成的区域。如某特定像素为0,其周围的邻接像素为1,则将邻接像素所围成的区域作为连通区域。
获取每个分层图像对应的分层二值化图像后，对分层二值化图像对应的像素矩阵进行逐行扫描，将符合连通规则（4邻域连通或者8邻域连通）的像素用相同的标号标记出来。4邻域连通指一个特定像素与上、下、左、右四个方向相邻的像素相同的情况；8邻域连通指一个特定像素与上、下、左、右、左上、左下、右上、右下八个方向相邻的像素相同的情况。
具体地，像素矩阵包括行和列。对二值化图像中的像素进行检测标记的具体过程为：（1）逐行扫描像素矩阵，把每行中连续为1的像素（目标像素）组成一个序列，该序列称为团，标记好该团的起点、终点以及所在的行号。团的起点指团的第一个像素，团的终点指团的最后一个像素。（2）对像素矩阵中除了第一行外的剩余行里的团，比较某一特定剩余行中的团与上一行中的所有团是否有重合区域：若没有重合区域，则给该特定剩余行中的团一个新的标号；如果该特定剩余行中的团仅与上一行中一个团有重合区域，则将上一行的该团的标号赋给它；如果该特定剩余行中的团与上一行中两个以上的团有重合区域，则给对应的团赋一个相关联团的最小标号，并将上一行的这几个团的标号写入等价对，说明它们属于一类。其中，相关联团指与特定剩余行的团有重合区域的上一行的团；等价对指相互连通的团上的标号。
例如,一像素矩阵中的特定剩余行为第三行,该第三行中有两个团(A,B),其中A团与第二行中的两个团(该两个团的标号为1,2)有重合区域,则将第二行中的两个团的最小标号1赋给该A团,A团的标号为1,并将A团、1团和2团对应的标号记为等价对,即将(1,2)记为等价对。标号为1和标号为2的团则称为一个连通区域。
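上述"临时标号+等价对"的两遍扫描思想可以用如下Python代码示意（这里按逐像素（而非逐团）的4邻域简化实现，等价对用一个简化的并查集记录，示例矩阵为假设数据）：

```python
import numpy as np

def label_4conn(binary):
    """两遍扫描的连通区域标记：第一遍赋临时标号并记录等价对，第二遍统一标号。"""
    h, w = binary.shape
    labels = np.zeros((h, w), dtype=int)
    parent = {}                      # 等价对：标号 -> 等价的更小标号

    def find(x):                     # 沿等价链找到最小代表标号
        while parent[x] != x:
            x = parent[x]
        return x

    next_label = 1
    for i in range(h):
        for j in range(w):
            if binary[i, j] == 0:
                continue
            up = labels[i - 1, j] if i > 0 else 0
            left = labels[i, j - 1] if j > 0 else 0
            neighbors = [l for l in (up, left) if l > 0]
            if not neighbors:        # 无重合：新标号
                parent[next_label] = next_label
                labels[i, j] = next_label
                next_label += 1
            else:                    # 有重合：取最小标号并写入等价对
                m = min(find(l) for l in neighbors)
                labels[i, j] = m
                for l in neighbors:
                    parent[find(l)] = m
    for i in range(h):               # 第二遍：按等价对统一标号
        for j in range(w):
            if labels[i, j]:
                labels[i, j] = find(labels[i, j])
    return labels

binary = np.array([[1, 1, 0, 1],
                   [0, 1, 0, 1],
                   [0, 0, 0, 1]])
labels = label_4conn(binary)   # 左右两块互不连通，得到两个连通区域
```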
S5343:对分层二值化图像对应的连通区域进行腐蚀和叠加处理,获取包括手写字的目标图像。
采用MATLAB中的imerode函数或者OpenCV中的cvErode函数对分层二值化图像的连通区域进行腐蚀处理。具体地，选取一个结构像素，本实施例是以像素矩阵中某个特征像素相邻的8个像素作为该特征像素的连通区域的，因此，选取的结构像素为3×3的像素矩阵。使用结构像素对分层二值化图像的像素矩阵进行扫描，比较分层二值化图像中的像素矩阵与结构像素是否完全一致：若完全一致，则像素矩阵中对应的9个像素都变为1；若不完全一致，则像素矩阵中对应的9个像素都变为0，其中，0（黑色）则为分层二值化图像被腐蚀的部分。
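3×3结构像素的腐蚀可以用如下Python代码示意（这里按标准腐蚀语义实现：邻域内9个像素全为1时中心像素保留为1，否则置0；示例图像为假设数据）：

```python
import numpy as np

def erode(binary, k=3):
    """k×k 结构像素腐蚀：仅当邻域内全为1时中心像素保留为1，否则置0（被腐蚀）。"""
    pad = k // 2
    padded = np.pad(binary, pad)          # 四周补0，边界像素必然被腐蚀
    out = np.zeros_like(binary)
    h, w = binary.shape
    for i in range(h):
        for j in range(w):
            out[i, j] = int(padded[i:i + k, j:j + k].all())
    return out

square = np.ones((4, 4), dtype=np.uint8)
eroded = erode(square)                    # 外圈一层被腐蚀掉，仅剩中心 2×2
```

实际工程中可直接调用imerode（MATLAB）或cv2.erode（OpenCV）完成同样的操作。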
基于预先设置的手写字区域抗腐蚀能力范围对分层二值化图像进行筛选,对于不在手写字区域抗腐蚀能力范围内的分层二值化图像部分删除,获取分层二值化图像中在手写字区域抗腐蚀能力范围内的部分。对筛选出的符合手写字区域抗腐蚀能力范围的每个分层二值化图像部分对应的像素矩阵进行叠加,就可以获取到仅含有手写字的目标图像。其中,手写字区域抗腐蚀能力可以采用公式:
\frac{s_1}{s_2}
计算,s 1表示分层二值化图像中被腐蚀后的总面积,s 2表示分层二值化图像中被腐蚀前的总面积。
如预先设置的手写字区域抗腐蚀能力范围为[0.05,0.8],根据公式
\frac{s_1}{s_2}
计算每个分层二值化图像被腐蚀后的总面积和被腐蚀前的总面积的比值。若分层二值化图像中某区域腐蚀后的总面积和腐蚀前的总面积的比值不在预先设置的手写字区域抗腐蚀能力范围内，则表示该区域的分层二值化图像是背景图像，需要去除；若该比值在[0.05,0.8]范围内，则表示该区域的分层二值化图像是手写字，需要保留。采用imadd函数对每个保留的分层二值化图像对应的像素矩阵进行叠加，获取含有手写字的目标图像。imadd函数是MATLAB中的一个函数，用于对分层图像进行叠加。
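按抗腐蚀能力筛选分层并叠加的过程可以用如下Python代码示意（阈值区间沿用文中的[0.05,0.8]，各分层矩阵为假设的示例数据，叠加用逻辑或代替imadd）：

```python
import numpy as np

def anti_erosion(before, after):
    """抗腐蚀能力 = 腐蚀后总面积 s1 / 腐蚀前总面积 s2（面积以像素个数计）。"""
    return after.sum() / before.sum()

def merge_layers(pairs, low=0.05, high=0.8):
    """保留比值落在 [low, high] 内的分层二值化图像（视为手写字），并叠加为目标图像。"""
    kept = [before for before, after in pairs
            if low <= anti_erosion(before, after) <= high]
    return (np.sum(kept, axis=0) > 0).astype(np.uint8)

# 示意：第一层腐蚀后保留一半（视为手写字），第二层被完全腐蚀（视为背景）
b1, a1 = np.ones((2, 2), np.uint8), np.array([[1, 1], [0, 0]], np.uint8)
b2, a2 = np.ones((2, 2), np.uint8), np.zeros((2, 2), np.uint8)
target = merge_layers([(b1, a1), (b2, a2)])   # 只有第一层被保留并叠加
```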
步骤S5341-S5343，对分层图像进行二值化处理，获取分层二值化图像；然后对分层二值化图像中的像素进行检测标记，获取分层二值化图像对应的像素矩阵中每个像素的连通区域；采用结构像素对每个像素的连通区域进行检测，将与结构像素不完全一致的像素矩阵中的像素都变为0，像素为0的分层二值化图像为黑色，该黑色部分即是分层二值化图像被腐蚀的部分；通过计算分层二值化图像被腐蚀后的总面积和被腐蚀前的总面积的比值，判断该比值是否在预先设置的手写字区域抗腐蚀能力范围内，去除背景图像，保留手写字，达到获取包括手写字的目标图像的目的。
该汉字识别方法通过对原始图像进行放大和灰度化处理，获取灰度图像，然后对灰度图像进行极差标准化处理，获取有效图像，方便后续步骤采用高斯核密度估计算法对有效图像进行分层、二值化、腐蚀和叠加处理，去除背景图像，保留只含有手写字的目标图像。采用文字定位技术对目标图像进行文字定位，获取文本行图像，将获取的文本行图像输入到目标手写字识别模型中识别，基于文本行图像对应的识别概率值，获取识别结果，可以提高手写字识别的精准度。
应理解,上述实施例中各步骤的序号的大小并不意味着执行顺序的先后,各过程的执行顺序应以其功能和内在逻辑确定,而不应对本申请实施例的实施过程构成任何限定。
在一实施例中,提供一种汉字识别装置,该汉字识别装置与上述实施例中汉字识别方法一一对应。如图9所示,该汉字识别装置包括原始图像获取模块51、有效图像获取模块52、目标图像获取模块53、文本行图像获取模块54和识别结果获取模块55。各功能模块详细说明如下:
原始图像获取模块51,用于获取原始图像,原始图像包括手写字和背景图像。
有效图像获取模块52,用于对原始图像进行预处理,获取有效图像。
目标图像获取模块53,用于采用核密度估计算法和腐蚀方法对有效图像进行处理,去除背景图像,获取包括手写字的目标图像。
文本行图像获取模块54,用于采用文字定位技术对目标图像进行文字定位,获取文本行图像。
识别结果获取模块55,用于将文本行图像输入到目标手写字识别模型中进行识别,获取文本行图像对应的识别结果,目标手写字识别模型是采用上述汉字模型训练方法获取到的。
具体地,有效图像获取模块52包括灰度图像获取单元521和极差标准化处理单元522。
灰度图像获取单元521,用于对原始图像进行放大和灰度化处理,获取灰度图像。
极差标准化处理单元522,用于对灰度图像对应的像素矩阵进行极差标准化处理,获取有效图像,其中,极差标准化处理的公式为
x' = \frac{x - M_{min}}{M_{max} - M_{min}}
x是标准化前有效图像的像素,x'是标准化后有效图像的像素,M min是灰度图像对应的像素矩阵M中最小的像素,M max是灰度图像对应的像素矩阵M中最大的像素。
具体地,目标图像获取模块53包括第一处理单元531、第二处理单元532、分层图像获取单元533和分层图像处理单元534。
第一处理单元531,用于对有效图像中的像素出现的次数进行统计,获取有效图像对应的频率分布直方图。
第二处理单元532,用于采用高斯核密度估算方法对频率分布直方图进行处理,获取频率分布直方图对应的频率极大值和频率极小值,并根据频率极大值和频率极小值获取对应的像素。
分层图像获取单元533,用于基于频率极大值和频率极小值对应的像素对有效图像进行分层处理,获取分层图像。
分层图像处理单元534,用于对分层图像进行腐蚀和叠加处理,获取包括手写字的目标图像。
具体地,分层图像处理单元534包括二值化处理单元5341、连通区域获取单元5342和连通区域处理单元5343。
二值化处理单元5341,用于对分层图像进行二值化处理,获取分层二值化图像。
连通区域获取单元5342,用于对分层二值化图像中的像素进行检测标记,获取分层二值化图像对应的连通区域。
连通区域处理单元5343,用于对分层二值化图像对应的连通区域进行腐蚀和叠加处理,获取包括手写字的目标图像。
在一实施例中,提供了一种计算机设备,该计算机设备可以是服务器,其内部结构图可以如图10所示。该计算机设备包括通过系统总线连接的处理器、存储器、网络接口和数据库。其中,该计算机设备的处理器用于提供计算和控制能力。该计算机设备的存储器包括非易失性存储介质、内存储器。该非易失性存储介质存储有操作系统、计算机可读指令和数据库。该内存储器为非易失性存储介质中的操作系统和计算机可读指令的运行提供环境。该计算机设备的数据库用于存储目标手写字识别模型。该计算机设备的网络接口用于与外部的终端通过网络连接通信。该计算机可读指令被处理器执行时以实现一种汉字模型训练方法。
在一实施例中,提供了一种计算机设备,包括存储器、处理器及存储在存储器上并可在处理器上运行的计算机可读指令,处理器执行计算机可读指令时实现以下步骤:初始化卷积循环神经网络模型的权值和偏置;获取字体图像训练样本,采用中文二级字库对字体图像训练样本中的手写字图像进行标注,并按预设分配规则将字体图像训练样本分为训练集和测试集;将训练集输入到卷积循环神经网络模型中,获取卷积循环神经网络模型的前向输出和后向输出,根据卷积循环神经网络模型的前向输出和后向输出,采用基于连续时间分类算法的反向传播算法更新卷积循环神经网络模型中的权值和偏置,获取初始手写字识别模型;将测试集输入到初始手写字识别模型中,获取识别准确率,若识别准确率大于预设准确率,则确定初始手写字识别模型为目标手写字识别模型。
在一实施例中,处理器执行计算机可读指令时还实现以下步骤:将训练集中手写字图像输入到卷积神经网络模型中,获取训练集中手写字图像对应的手写字图像特征;将训练集中手写字图像对应的手写字图像特征输入到循环神经网络模型中进行训练,获取循环神经网络模型的前向输出和后向输出,前向输出的公式为
Figure PCTCN2018094405-appb-000030
其中,a(t,u)表示第t时刻第u个手写字图像特征对应的前向输出,
Figure PCTCN2018094405-appb-000031
表示t时刻输出为空格的概率,l′ u表示手写字图像和空格的总长度, a(t-1,i)表示t-1时刻第i个汉字的前向输出;循环神经网络模型的后向输出的公式为
Figure PCTCN2018094405-appb-000032
其中,b(t,u)表示第t时刻第u个手写字图像特征对应的后向输出
Figure PCTCN2018094405-appb-000033
表示t+1时刻输出为空格的概率,a(t+1,i)表示t+1时刻第i个汉字的后向输出;根据循环神经网络模型的前向输出和后向输出,构建损失函数,并根据损失函数,采用基于连续时间分类算法的反向传播算法更新调整循环神经网络模型和卷积神经网络模型中的权值和偏置,获取初始手写字识别模型,损失函数的具体表达式为:
Figure PCTCN2018094405-appb-000034
其中,x表示输入的汉字,z表示输入的汉字x对应的输出,u表示第u个汉字,z′表示汉字的长度,a(t,u)表示第t时刻第u个汉字对应的前向输出,b(t,u)表示第t时刻第u个汉字对应的后向输出。
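上述损失函数与标准的CTC（连续时间分类）损失形式一致。按CTC文献中的通用写法（此处记号属于示意，具体形式以原文公式图为准），对任一时刻t，标签序列的概率可由前向输出与后向输出的乘积求和得到，损失取其负对数：

```latex
p(z \mid x) = \sum_{u=1}^{|z'|} a(t,u)\, b(t,u), \qquad
\mathcal{L}(x, z) = -\ln p(z \mid x)
```

最小化该损失即最大化文本行图像被解码为标注汉字序列的对数似然。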
在一实施例中,提供了一个或多个存储有计算机可读指令的非易失性可读存储介质,计算机可读指令被一个或多个处理器执行时,使得一个或多个处理器实现如下步骤:初始化卷积循环神经网络模型的权值和偏置;获取字体图像训练样本,采用中文二级字库对字体图像训练样本中的手写字图像进行标注,并按预设分配规则将字体图像训练样本分为训练集和测试集;将训练集输入到卷积循环神经网络模型中,获取卷积循环神经网络模型的前向输出和后向输出,根据卷积循环神经网络模型的前向输出和后向输出,采用基于连续时间分类算法的反向传播算法更新卷积循环神经网络模型中的权值和偏置,获取初始手写字识别模型;将测试集输入到初始手写字识别模型中,获取识别准确率,若识别准确率大于预设准确率,则确定初始手写字识别模型为目标手写字识别模型。
在一实施例中,计算机可读指令被处理器执行时还实现以下步骤:将训练集中手写字图像输入到卷积神经网络模型中,获取训练集中手写字图像对应的手写字图像特征;将训练集中手写字图像对应的手写字图像特征输入到循环神经网络模型中进行训练,获取循环神经网络模型的前向输出和后向输出,前向输出的公式为
Figure PCTCN2018094405-appb-000035
其中,a(t,u)表示第t时刻第u个手写字图像特征对应的前向输出,
Figure PCTCN2018094405-appb-000036
表示t时刻输出为空格的概率,l′ u表示手写字图像和空格的总长度,a(t-1,i)表示t-1时刻第i个汉字的前向输出;循环神经网络模型的后向输出的公式为
Figure PCTCN2018094405-appb-000037
其中,b(t,u)表示第t时刻第u个手写字图像特征对应的后向输出
Figure PCTCN2018094405-appb-000038
表示t+1时刻输出为空格的概率,a(t+1,i)表示t+1时刻第i个汉字的后向输出;根据循环神经网络模型的前向输出和后向输出,构建损失函数,并根据损失函数,采用基于连续时间分类算法的反向传播算法更新调整循环神经网络模型和卷积神经网络模型中的权值和偏置,获取初始手写字识别模型,损失函数的具体表达式为:
Figure PCTCN2018094405-appb-000039
其中,x表示输入的汉 字,z表示输入的汉字x对应的输出,u表示第u个汉字,z′表示汉字的长度,a(t,u)表示第t时刻第u个汉字对应的前向输出,b(t,u)表示第t时刻第u个汉字对应的后向输出。
在一实施例中,提供了一种计算机设备,包括存储器、处理器及存储在存储器上并可在处理器上运行的计算机可读指令,处理器执行计算机可读指令时实现以下步骤:获取原始图像,原始图像包括手写字和背景图像;对原始图像进行预处理,获取有效图像;采用核密度估计算法和腐蚀方法对有效图像进行处理,去除背景图像,获取包括手写字的目标图像;采用文字定位技术对目标图像进行文字定位,获取文本行图像;将文本行图像输入到目标手写字识别模型中进行识别,获取文本行图像对应的识别结果,目标手写字识别模型是采用上述汉字模型训练方法获取到的。
在一实施例中,处理器执行计算机可读指令时还实现以下步骤:对原始图像进行放大和灰度化处理,获取灰度图像;对灰度图像对应的像素矩阵进行极差标准化处理,获取有效图像,其中,极差标准化处理的公式为
x' = \frac{x - M_{min}}{M_{max} - M_{min}}
x是标准化前有效图像的像素,x'是标准化后有效图像的像素,M min是灰度图像对应的像素矩阵M中最小的像素,M max是灰度图像对应的像素矩阵M中最大的像素。
在一实施例中,处理器执行计算机可读指令时还实现以下步骤:对有效图像中的像素出现的次数进行统计,获取有效图像对应的频率分布直方图;采用高斯核密度估算方法对频率分布直方图进行处理,获取频率分布直方图对应的频率极大值和频率极小值,并根据频率极大值和频率极小值获取对应的像素;基于频率极大值和频率极小值对应的像素对有效图像进行分层切分,获取分层图像;对分层图像进行腐蚀和叠加处理,获取包括手写字的目标图像。
在一实施例中,处理器执行计算机可读指令时还实现以下步骤:对分层图像进行二值化处理,获取分层二值化图像;对分层二值化图像中的像素进行检测标记,获取分层二值化图像对应的连通区域;对分层二值化图像对应的连通区域进行腐蚀和叠加处理,获取包括手写字的目标图像。
在一实施例中,提供了一个或多个存储有计算机可读指令的非易失性可读存储介质,计算机可读指令被一个或多个处理器执行时,使得一个或多个处理器实现如下步骤:获取原始图像,原始图像包括手写字和背景图像;对原始图像进行预处理,获取有效图像;采用核密度估计算法和腐蚀方法对有效图像进行处理,去除背景图像,获取包括手写字的目标图像;采用文字定位技术对目标图像进行文字定位,获取文本行图像;将文本行图像输入到目标手写字识别模型中进行识别,获取文本行图像对应的识别结果,目标手写字识别模型是采用上述汉字模型训练方法获取到的。
在一实施例中,计算机可读指令被处理器执行时还实现以下步骤:对原始图像进行放大和灰度化处理,获取灰度图像;对灰度图像对应的像素矩阵进行极差标准化处理,获取有效图像,其中,极差标准化处理的公式为
x' = \frac{x - M_{min}}{M_{max} - M_{min}}
x是标准化前有效图像的像素,x'是标准化后有效图像的像素,M min是灰度图像对应的像素矩阵M中最小的像素,M max是灰度图像对应的像素矩阵M中最大的像素。
在一实施例中,计算机可读指令被处理器执行时还实现以下步骤:对有效图像中的像素出现的次数进行统计,获取有效图像对应的频率分布直方图;采用高斯核密度估算方法对频率分布直方图进行处理,获取频率分布直方图对应的频率极大值和频率极小值,并根据频率极大值和频率极小值获取对应的像素;基于频率极大值和频率极小值对应的像素对有效图像进行分层切分,获取分层图像;对分层图像进行腐蚀和叠加处理,获取包括手写字的目标图像。
在一实施例中,计算机可读指令被处理器执行时还实现以下步骤:对分层图像进行二值化处理,获 取分层二值化图像;对分层二值化图像中的像素进行检测标记,获取分层二值化图像对应的连通区域;对分层二值化图像对应的连通区域进行腐蚀和叠加处理,获取包括手写字的目标图像。
本领域普通技术人员可以理解实现上述实施例方法中的全部或部分流程,是可以通过计算机可读指令来指令相关的硬件来完成,所述的计算机可读指令可存储于一非易失性计算机可读取存储介质中,该计算机可读指令在执行时,可包括如上述各方法的实施例的流程。其中,本申请所提供的各实施例中所使用的对存储器、存储、数据库或其它介质的任何引用,均可包括非易失性和/或易失性存储器。非易失性存储器可包括只读存储器(ROM)、可编程ROM(PROM)、电可编程ROM(EPROM)、电可擦除可编程ROM(EEPROM)或闪存。易失性存储器可包括随机存取存储器(RAM)或者外部高速缓冲存储器。作为说明而非局限,RAM以多种形式可得,诸如静态RAM(SRAM)、动态RAM(DRAM)、同步DRAM(SDRAM)、双数据率SDRAM(DDRSDRAM)、增强型SDRAM(ESDRAM)、同步链路(Synchlink)DRAM(SLDRAM)、存储器总线(Rambus)直接RAM(RDRAM)、直接存储器总线动态RAM(DRDRAM)、以及存储器总线动态RAM(RDRAM)等。
所属领域的技术人员可以清楚地了解到,为了描述的方便和简洁,仅以上述各功能单元、模块的划分进行举例说明,实际应用中,可以根据需要而将上述功能分配由不同的功能单元、模块完成,即将所述装置的内部结构划分成不同的功能单元或模块,以完成以上描述的全部或者部分功能。
以上所述实施例仅用以说明本申请的技术方案,而非对其限制;尽管参照前述实施例对本申请进行了详细的说明,本领域的普通技术人员应当理解:其依然可以对前述各实施例所记载的技术方案进行修改,或者对其中部分技术特征进行等同替换;而这些修改或者替换,并不使相应技术方案的本质脱离本申请各实施例技术方案的精神和范围,均应包含在本申请的保护范围之内。

Claims (20)

  1. 一种汉字模型训练方法,其特征在于,包括:
    初始化卷积循环神经网络模型的权值和偏置;
    获取字体图像训练样本,采用中文二级字库对所述字体图像训练样本中的手写字图像进行标注,并按预设分配规则将所述字体图像训练样本分为训练集和测试集;
    将所述训练集输入到卷积循环神经网络模型中,获取卷积循环神经网络模型的前向输出和后向输出,根据所述卷积循环神经网络模型的前向输出和后向输出,采用基于连续时间分类算法的反向传播算法更新所述卷积循环神经网络模型中的权值和偏置,获取初始手写字识别模型;
    将所述测试集输入到所述初始手写字识别模型中,获取识别准确率,若所述识别准确率大于预设准确率,则确定所述初始手写字识别模型为目标手写字识别模型。
  2. 如权利要求1所述的汉字模型训练方法,其特征在于,所述卷积循环神经网络模型包括卷积神经网络模型及循环神经网络模型;
    所述将所述训练集输入到卷积循环神经网络模型中,获取卷积循环神经网络模型的前向输出和后向输出,根据所述卷积循环神经网络模型的前向输出和后向输出,采用基于连续时间分类算法的反向传播算法更新所述卷积循环神经网络模型中的权值和偏置,获取初始手写字识别模型,包括:
    将训练集中手写字图像输入到卷积神经网络模型中,获取训练集中手写字图像对应的手写字图像特征;将所述训练集中手写字图像对应的手写字图像特征输入到循环神经网络模型中进行训练,获取所述循环神经网络模型的前向输出和后向输出,所述循环神经网络模型的前向输出的公式为
    Figure PCTCN2018094405-appb-100001
    其中,a(t,u)表示第t时刻第u个所述手写字图像特征对应的前向输出,
    Figure PCTCN2018094405-appb-100002
    表示t时刻输出为空格的概率,l′ u表示手写字图像和空格的总长度,a(t-1,i)表示t-1时刻第i个汉字的前向输出;所述循环神经网络模型的后向输出的公式为
    Figure PCTCN2018094405-appb-100003
    其中,b(t,u)表示第t时刻第u个所述手写字图像特征对应的后向输出
    Figure PCTCN2018094405-appb-100004
    表示t+1时刻输出为空格的概率,a(t+1,i)表示t+1时刻第i个汉字的后向输出;
    根据所述循环神经网络模型的前向输出和后向输出,构建损失函数,并根据所述损失函数,采用基于连续时间分类算法的反向传播算法更新调整所述循环神经网络模型和所述卷积神经网络模型中的权值和偏置,获取初始手写字识别模型,所述损失函数的具体表达式为:
    Figure PCTCN2018094405-appb-100005
    其中,x表示输入的汉字,z表示输入的汉字x对应的输出,u表示第u个汉字,z′表示汉字的长度,a(t,u)表示第t时刻第u个汉字对应的前向输出,b(t,u)表示第t时刻第u个汉字对应的后向输出。
  3. 一种汉字识别方法,其特征在于,包括:
    获取原始图像,所述原始图像包括手写字和背景图像;
    对所述原始图像进行预处理,获取有效图像;
    采用核密度估计算法和腐蚀方法对所述有效图像进行处理,去除背景图像,获取包括所述手写字的目标图像;
    采用文字定位技术对所述目标图像进行文字定位,获取文本行图像;
    将所述文本行图像输入到目标手写字识别模型中进行识别,获取所述文本行图像对应的识别结果,所述目标手写字识别模型是采用权利要求1或2所述汉字模型训练方法获取到的。
  4. 如权利要求3所述的汉字识别方法,其特征在于,所述对所述原始图像进行预处理,获取有效图像,包括:
    对所述原始图像进行放大和灰度化处理,获取灰度图像;
    对所述灰度图像对应的像素矩阵进行极差标准化处理,获取有效图像,其中,所述极差标准化处理的公式为
    x' = \frac{x - M_{min}}{M_{max} - M_{min}}
    x是标准化前有效图像的像素,x'是标准化后有效图像的像素,M min是所述灰度图像对应的像素矩阵M中最小的像素,M max是所述灰度图像对应的像素矩阵M中最大的像素。
  5. 如权利要求3所述的汉字识别方法,其特征在于,所述采用核密度估计算法和腐蚀方法对所述有效图像进行处理,去除背景图像,获取包括所述手写字的目标图像,包括:
    对所述有效图像中的像素出现的次数进行统计,获取所述有效图像对应的频率分布直方图;
    采用高斯核密度估算方法对所述频率分布直方图进行处理,获取所述频率分布直方图对应的频率极大值和频率极小值,并根据所述频率极大值和频率极小值获取对应的像素;
    基于所述频率极大值和所述频率极小值对应的像素对有效图像进行分层切分,获取分层图像;
    对所述分层图像进行腐蚀和叠加处理,获取包括所述手写字的目标图像。
  6. 如权利要求5所述的汉字识别方法,其特征在于,所述对所述分层图像进行腐蚀和叠加处理,获取包括所述手写字的目标图像,包括:
    对所述分层图像进行二值化处理,获取分层二值化图像;
    对所述分层二值化图像中的像素进行检测标记,获取所述分层二值化图像对应的连通区域;
    对所述分层二值化图像对应的连通区域进行腐蚀和叠加处理,获取包括所述手写字的目标图像。
  7. 一种汉字模型训练装置,其特征在于,包括:
    模型初始化模块,用于初始化卷积循环神经网络模型的权值和偏置;
    训练样本处理模块,用于获取字体图像训练样本,采用中文二级字库对所述字体图像训练样本中的手写字图像进行标注,并按预设分配规则将所述字体图像训练样本分为训练集和测试集;
    初始模型获取模块,用于将所述训练集输入到卷积循环神经网络模型中,获取卷积循环神经网络模型的前向输出和后向输出,根据所述卷积循环神经网络模型的前向输出和后向输出,采用基于连续时间分类算法的反向传播算法更新所述卷积循环神经网络模型中的权值和偏置,获取初始手写字识别模型;
    目标模型获取模块,用于将所述测试集输入到所述初始手写字识别模型中,获取识别准确率,若所述识别准确率大于预设准确率,则确定所述初始手写字识别模型为目标手写字识别模型。
  8. 一种汉字识别装置,其特征在于,包括:
    原始图像获取模块,用于获取原始图像,所述原始图像包括手写字和背景图像;
    有效图像获取模块,用于对所述原始图像进行预处理,获取有效图像;
    目标图像获取模块,用于采用核密度估计算法和腐蚀方法对所述有效图像进行处理,去除背景图像,获取包括所述手写字的目标图像;
    文本行图像获取模块,用于采用文字定位技术对所述目标图像进行文字定位,获取文本行图像;
    识别结果获取模块,用于将所述文本行图像输入到目标手写字识别模型中进行识别,获取所述文本行图像对应的识别结果,所述目标手写字识别模型是采用权利要求1或2所述汉字模型训练方法获取到的。
  9. 一种计算机设备,包括存储器、处理器以及存储在所述存储器中并可在所述处理器上运行的计算机可读指令,其特征在于,所述处理器执行所述计算机可读指令时实现如下步骤:
    初始化卷积循环神经网络模型的权值和偏置;
    获取字体图像训练样本,采用中文二级字库对所述字体图像训练样本中的手写字图像进行标注,并 按预设分配规则将所述字体图像训练样本分为训练集和测试集;
    将所述训练集输入到卷积循环神经网络模型中,获取卷积循环神经网络模型的前向输出和后向输出,根据所述卷积循环神经网络模型的前向输出和后向输出,采用基于连续时间分类算法的反向传播算法更新所述卷积循环神经网络模型中的权值和偏置,获取初始手写字识别模型;
    将所述测试集输入到所述初始手写字识别模型中,获取识别准确率,若所述识别准确率大于预设准确率,则确定所述初始手写字识别模型为目标手写字识别模型。
  10. 如权利要求9所述的计算机设备,其特征在于,所述卷积循环神经网络模型包括卷积神经网络模型及循环神经网络模型;
    所述将所述训练集输入到卷积循环神经网络模型中,获取卷积循环神经网络模型的前向输出和后向输出,根据所述卷积循环神经网络模型的前向输出和后向输出,采用基于连续时间分类算法的反向传播算法更新所述卷积循环神经网络模型中的权值和偏置,获取初始手写字识别模型,包括:
    将训练集中手写字图像输入到卷积神经网络模型中,获取训练集中手写字图像对应的手写字图像特征;
    将所述训练集中手写字图像对应的手写字图像特征输入到循环神经网络模型中进行训练,获取所述循环神经网络模型的前向输出和后向输出,所述循环神经网络模型的前向输出的公式为
    Figure PCTCN2018094405-appb-100007
    其中,a(t,u)表示第t时刻第u个所述手写字图像特征对应的前向输出,
    Figure PCTCN2018094405-appb-100008
    表示t时刻输出为空格的概率,l′ u表示手写字图像和空格的总长度,a(t-1,i)表示t-1时刻第i个汉字的前向输出;所述循环神经网络模型的后向输出的公式为
    Figure PCTCN2018094405-appb-100009
    其中,b(t,u)表示第t时刻第u个所述手写字图像特征对应的后向输出
    Figure PCTCN2018094405-appb-100010
    表示t+1时刻输出为空格的概率,a(t+1,i)表示t+1时刻第i个汉字的后向输出;
    根据所述循环神经网络模型的前向输出和后向输出,构建损失函数,并根据所述损失函数,采用基于连续时间分类算法的反向传播算法更新调整所述循环神经网络模型和所述卷积神经网络模型中的权值和偏置,获取初始手写字识别模型,所述损失函数的具体表达式为:
    Figure PCTCN2018094405-appb-100011
    其中,x表示输入的汉字,z表示输入的汉字x对应的输出,u表示第u个汉字,z′表示汉字的长度,a(t,u)表示第t时刻第u个汉字对应的前向输出,b(t,u)表示第t时刻第u个汉字对应的后向输出。
  11. 一种计算机设备,包括存储器、处理器以及存储在所述存储器中并可在所述处理器上运行的计算机可读指令,其特征在于,所述处理器执行所述计算机可读指令时实现如下步骤:
    获取原始图像,所述原始图像包括手写字和背景图像;
    对所述原始图像进行预处理,获取有效图像;
    采用核密度估计算法和腐蚀方法对所述有效图像进行处理,去除背景图像,获取包括所述手写字的目标图像;
    采用文字定位技术对所述目标图像进行文字定位,获取文本行图像;
    将所述文本行图像输入到目标手写字识别模型中进行识别,获取所述文本行图像对应的识别结果,所述目标手写字识别模型是采用权利要求1或2所述汉字模型训练方法获取到的。
  12. 如权利要求11所述的计算机设备,其特征在于,所述对所述原始图像进行预处理,获取有效图 像,包括:
    对所述原始图像进行放大和灰度化处理,获取灰度图像;
    对所述灰度图像对应的像素矩阵进行极差标准化处理,获取有效图像,其中,所述极差标准化处理的公式为
    x' = \frac{x - M_{min}}{M_{max} - M_{min}}
    x是标准化前有效图像的像素,x'是标准化后有效图像的像素,M min是所述灰度图像对应的像素矩阵M中最小的像素,M max是所述灰度图像对应的像素矩阵M中最大的像素。
  13. 如权利要求11所述的计算机设备,其特征在于,所述采用核密度估计算法和腐蚀方法对所述有效图像进行处理,去除背景图像,获取包括所述手写字的目标图像,包括:
    对所述有效图像中的像素出现的次数进行统计,获取所述有效图像对应的频率分布直方图;
    采用高斯核密度估算方法对所述频率分布直方图进行处理,获取所述频率分布直方图对应的频率极大值和频率极小值,并根据所述频率极大值和频率极小值获取对应的像素;
    基于所述频率极大值和所述频率极小值对应的像素对有效图像进行分层切分,获取分层图像;
    对所述分层图像进行腐蚀和叠加处理,获取包括所述手写字的目标图像。
  14. 如权利要求13所述的计算机设备,其特征在于,所述对所述分层图像进行腐蚀和叠加处理,获取包括所述手写字的目标图像,包括:
    对所述分层图像进行二值化处理,获取分层二值化图像;
    对所述分层二值化图像中的像素进行检测标记,获取所述分层二值化图像对应的连通区域;
    对所述分层二值化图像对应的连通区域进行腐蚀和叠加处理,获取包括所述手写字的目标图像。
  15. 一个或多个存储有计算机可读指令的非易失性可读存储介质,其特征在于,所述计算机可读指令被一个或多个处理器执行时,使得所述一个或多个处理器执行如下步骤:
    初始化卷积循环神经网络模型的权值和偏置;
    获取字体图像训练样本,采用中文二级字库对所述字体图像训练样本中的手写字图像进行标注,并按预设分配规则将所述字体图像训练样本分为训练集和测试集;
    将所述训练集输入到卷积循环神经网络模型中,获取卷积循环神经网络模型的前向输出和后向输出,根据所述卷积循环神经网络模型的前向输出和后向输出,采用基于连续时间分类算法的反向传播算法更新所述卷积循环神经网络模型中的权值和偏置,获取初始手写字识别模型;
    将所述测试集输入到所述初始手写字识别模型中,获取识别准确率,若所述识别准确率大于预设准确率,则确定所述初始手写字识别模型为目标手写字识别模型。
  16. 如权利要求15所述的非易失性可读存储介质,其特征在于,所述卷积循环神经网络模型包括卷积神经网络模型及循环神经网络模型;
    所述将所述训练集输入到卷积循环神经网络模型中,获取卷积循环神经网络模型的前向输出和后向输出,根据所述卷积循环神经网络模型的前向输出和后向输出,采用基于连续时间分类算法的反向传播算法更新所述卷积循环神经网络模型中的权值和偏置,获取初始手写字识别模型,包括:
    将训练集中手写字图像输入到卷积神经网络模型中,获取训练集中手写字图像对应的手写字图像特征;
    将所述训练集中手写字图像对应的手写字图像特征输入到循环神经网络模型中进行训练,获取所述循环神经网络模型的前向输出和后向输出,所述循环神经网络模型的前向输出的公式为
    Figure PCTCN2018094405-appb-100013
    其中,a(t,u)表示第t时刻第u个所述手写字图像特征对应的前向输出,
    Figure PCTCN2018094405-appb-100014
    表示t时刻输出为空格的概率,l′ u表示手写字图像和空格的总长度,a(t-1,i)表示t-1时刻第i个汉字的前向输出;所述循环神经网络模型的后向输出的公式为
    Figure PCTCN2018094405-appb-100015
    其中,b(t,u)表示第t时刻第u个所述手写字图像特征对应的后向输出
    Figure PCTCN2018094405-appb-100016
    表示t+1时刻输出为空格的概率,a(t+1,i)表示t+1时刻第i个汉字的后向输出;
    根据所述循环神经网络模型的前向输出和后向输出,构建损失函数,并根据所述损失函数,采用基于连续时间分类算法的反向传播算法更新调整所述循环神经网络模型和所述卷积神经网络模型中的权值和偏置,获取初始手写字识别模型,所述损失函数的具体表达式为:
    Figure PCTCN2018094405-appb-100017
    其中,x表示输入的汉字,z表示输入的汉字x对应的输出,u表示第u个汉字,z′表示汉字的长度,a(t,u)表示第t时刻第u个汉字对应的前向输出,b(t,u)表示第t时刻第u个汉字对应的后向输出。
  17. 一个或多个存储有计算机可读指令的非易失性可读存储介质,其特征在于,所述计算机可读指令被一个或多个处理器执行时,使得所述一个或多个处理器执行如下步骤:
    获取原始图像,所述原始图像包括手写字和背景图像;
    对所述原始图像进行预处理,获取有效图像;
    采用核密度估计算法和腐蚀方法对所述有效图像进行处理,去除背景图像,获取包括所述手写字的目标图像;
    采用文字定位技术对所述目标图像进行文字定位,获取文本行图像;
    将所述文本行图像输入到目标手写字识别模型中进行识别,获取所述文本行图像对应的识别结果,所述目标手写字识别模型是采用权利要求1或2所述汉字模型训练方法获取到的。
  18. 如权利要求17所述的非易失性可读存储介质,其特征在于,所述对所述原始图像进行预处理,获取有效图像,包括:
    对所述原始图像进行放大和灰度化处理,获取灰度图像;
    对所述灰度图像对应的像素矩阵进行极差标准化处理,获取有效图像,其中,所述极差标准化处理的公式为
    x' = \frac{x - M_{min}}{M_{max} - M_{min}}
    x是标准化前有效图像的像素,x'是标准化后有效图像的像素,M min是所述灰度图像对应的像素矩阵M中最小的像素,M max是所述灰度图像对应的像素矩阵M中最大的像素。
  19. 如权利要求17所述的非易失性可读存储介质,其特征在于,所述采用核密度估计算法和腐蚀方法对所述有效图像进行处理,去除背景图像,获取包括所述手写字的目标图像,包括:
    对所述有效图像中的像素出现的次数进行统计,获取所述有效图像对应的频率分布直方图;
    采用高斯核密度估算方法对所述频率分布直方图进行处理,获取所述频率分布直方图对应的频率极大值和频率极小值,并根据所述频率极大值和频率极小值获取对应的像素;
    基于所述频率极大值和所述频率极小值对应的像素对有效图像进行分层切分,获取分层图像;
    对所述分层图像进行腐蚀和叠加处理,获取包括所述手写字的目标图像。
  20. 如权利要求19所述的非易失性可读存储介质,其特征在于,所述对所述分层图像进行腐蚀和叠加处理,获取包括所述手写字的目标图像,包括:
    对所述分层图像进行二值化处理,获取分层二值化图像;
    对所述分层二值化图像中的像素进行检测标记,获取所述分层二值化图像对应的连通区域;
    对所述分层二值化图像对应的连通区域进行腐蚀和叠加处理,获取包括所述手写字的目标图像。
PCT/CN2018/094405 2018-06-04 2018-07-04 汉字模型训练方法、汉字识别方法、装置、设备及介质 WO2019232874A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810563512.7 2018-06-04
CN201810563512.7A CN108710866B (zh) 2018-06-04 2018-06-04 汉字模型训练方法、汉字识别方法、装置、设备及介质

Publications (1)

Publication Number Publication Date
WO2019232874A1 true WO2019232874A1 (zh) 2019-12-12

Family

ID=63870377

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/094405 WO2019232874A1 (zh) 2018-06-04 2018-07-04 汉字模型训练方法、汉字识别方法、装置、设备及介质

Country Status (2)

Country Link
CN (1) CN108710866B (zh)
WO (1) WO2019232874A1 (zh)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111414917A (zh) * 2020-03-18 2020-07-14 民生科技有限责任公司 一种低像素密度文本的识别方法
CN111414916A (zh) * 2020-02-29 2020-07-14 中国平安财产保险股份有限公司 图像中文本内容提取生成方法、装置及可读存储介质
CN111539414A (zh) * 2020-04-26 2020-08-14 梁华智能科技(上海)有限公司 一种ocr图像字符识别和字符校正的方法及系统
CN112052852A (zh) * 2020-09-09 2020-12-08 国家气象信息中心 一种基于深度学习的手写气象档案资料的字符识别方法
CN112183027A (zh) * 2020-08-31 2021-01-05 同济大学 一种基于人工智能的艺术作品生成系统及方法
CN112200216A (zh) * 2020-09-03 2021-01-08 上海眼控科技股份有限公司 汉字识别方法、装置、计算机设备和存储介质
CN113362249A (zh) * 2021-06-24 2021-09-07 平安普惠企业管理有限公司 文字图像合成方法、装置、计算机设备及存储介质
CN113436222A (zh) * 2021-05-31 2021-09-24 新东方教育科技集团有限公司 图像处理方法、图像处理装置、电子设备及存储介质
CN113792851A (zh) * 2021-09-09 2021-12-14 北京百度网讯科技有限公司 字体生成模型训练方法、字库建立方法、装置及设备
CN117649672A (zh) * 2024-01-30 2024-03-05 湖南大学 基于主动学习与迁移学习的字体类别视觉检测方法和系统
CN111414916B (zh) * 2020-02-29 2024-05-31 中国平安财产保险股份有限公司 图像中文本内容提取生成方法、装置及可读存储介质

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109543777B (zh) * 2018-11-28 2020-10-27 中国科学院自动化研究所 手写汉字书写质量评价方法及系统
CN109902678A (zh) * 2019-02-12 2019-06-18 北京奇艺世纪科技有限公司 模型训练方法、文字识别方法、装置、电子设备及计算机可读介质
CN110110585B (zh) * 2019-03-15 2023-05-30 西安电子科技大学 基于深度学习的智能阅卷实现方法及系统、计算机程序
CN110033052A (zh) * 2019-04-19 2019-07-19 济南浪潮高新科技投资发展有限公司 一种ai识别手写字体的自训练方法及自训练平台
CN110135411B (zh) * 2019-04-30 2021-09-10 北京邮电大学 名片识别方法和装置
CN110321788A (zh) * 2019-05-17 2019-10-11 平安科技(深圳)有限公司 训练数据处理方法、装置、设备及计算机可读存储介质
CN111539424A (zh) * 2020-04-21 2020-08-14 北京云从科技有限公司 一种基于ocr的图像处理方法、系统、设备及介质
CN111898603A (zh) * 2020-08-10 2020-11-06 上海瑞美锦鑫健康管理有限公司 一种基于深度神经网络的体检单识别方法和系统
CN111950548B (zh) * 2020-08-10 2023-07-28 河南大学 一种引入字库文字图像进行深度模板匹配的汉字识别方法
CN112163508A (zh) * 2020-09-25 2021-01-01 中国电子科技集团公司第十五研究所 一种基于真实场景的文字识别方法、系统及ocr终端
CN112766051A (zh) * 2020-12-29 2021-05-07 有米科技股份有限公司 基于Attention的图像文字识别方法及装置
CN113903043B (zh) * 2021-12-11 2022-05-06 绵阳职业技术学院 一种基于孪生度量模型的印刷汉字字体识别方法
CN114549296B (zh) * 2022-04-21 2022-07-12 北京世纪好未来教育科技有限公司 图像处理模型的训练方法、图像处理方法及电子设备

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170364744A1 (en) * 2016-06-20 2017-12-21 Machine Learning Works, LLC Neural network based recognition of mathematical expressions
CN107590497A (zh) * 2017-09-20 2018-01-16 重庆邮电大学 基于深度卷积神经网络的脱机手写汉字识别方法
CN107943967A (zh) * 2017-11-28 2018-04-20 华南理工大学 基于多角度卷积神经网络与循环神经网络的文本分类算法

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105184226A (zh) * 2015-08-11 2015-12-23 北京新晨阳光科技有限公司 数字识别方法和装置及神经网络训练方法和装置
CN107122809B (zh) * 2017-04-24 2020-04-28 北京工业大学 基于图像自编码的神经网络特征学习方法
CN107316054A (zh) * 2017-05-26 2017-11-03 昆山遥矽微电子科技有限公司 基于卷积神经网络和支持向量机的非标准字符识别方法

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170364744A1 (en) * 2016-06-20 2017-12-21 Machine Learning Works, LLC Neural network based recognition of mathematical expressions
CN107590497A (zh) * 2017-09-20 2018-01-16 重庆邮电大学 基于深度卷积神经网络的脱机手写汉字识别方法
CN107943967A (zh) * 2017-11-28 2018-04-20 华南理工大学 基于多角度卷积神经网络与循环神经网络的文本分类算法

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111414916A (zh) * 2020-02-29 2020-07-14 中国平安财产保险股份有限公司 图像中文本内容提取生成方法、装置及可读存储介质
CN111414916B (zh) * 2020-02-29 2024-05-31 中国平安财产保险股份有限公司 图像中文本内容提取生成方法、装置及可读存储介质
CN111414917B (zh) * 2020-03-18 2023-05-12 民生科技有限责任公司 一种低像素密度文本的识别方法
CN111414917A (zh) * 2020-03-18 2020-07-14 民生科技有限责任公司 一种低像素密度文本的识别方法
CN111539414A (zh) * 2020-04-26 2020-08-14 梁华智能科技(上海)有限公司 一种ocr图像字符识别和字符校正的方法及系统
CN111539414B (zh) * 2020-04-26 2023-05-23 梁华智能科技(上海)有限公司 一种ocr图像字符识别和字符校正的方法及系统
CN112183027B (zh) * 2020-08-31 2022-09-06 同济大学 一种基于人工智能的艺术作品生成系统及方法
CN112183027A (zh) * 2020-08-31 2021-01-05 同济大学 一种基于人工智能的艺术作品生成系统及方法
CN112200216A (zh) * 2020-09-03 2021-01-08 上海眼控科技股份有限公司 汉字识别方法、装置、计算机设备和存储介质
CN112052852A (zh) * 2020-09-09 2020-12-08 国家气象信息中心 一种基于深度学习的手写气象档案资料的字符识别方法
CN112052852B (zh) * 2020-09-09 2023-12-29 国家气象信息中心 一种基于深度学习的手写气象档案资料的字符识别方法
CN113436222A (zh) * 2021-05-31 2021-09-24 新东方教育科技集团有限公司 图像处理方法、图像处理装置、电子设备及存储介质
CN113362249A (zh) * 2021-06-24 2021-09-07 平安普惠企业管理有限公司 文字图像合成方法、装置、计算机设备及存储介质
CN113362249B (zh) * 2021-06-24 2023-11-24 广州云智达创科技有限公司 文字图像合成方法、装置、计算机设备及存储介质
CN113792851A (zh) * 2021-09-09 2021-12-14 北京百度网讯科技有限公司 字体生成模型训练方法、字库建立方法、装置及设备
CN113792851B (zh) * 2021-09-09 2023-07-25 北京百度网讯科技有限公司 字体生成模型训练方法、字库建立方法、装置及设备
CN117649672A (zh) * 2024-01-30 2024-03-05 湖南大学 基于主动学习与迁移学习的字体类别视觉检测方法和系统
CN117649672B (zh) * 2024-01-30 2024-04-26 湖南大学 基于主动学习与迁移学习的字体类别视觉检测方法和系统

Also Published As

Publication number Publication date
CN108710866A (zh) 2018-10-26
CN108710866B (zh) 2024-02-20

Similar Documents

Publication Publication Date Title
WO2019232874A1 (zh) 汉字模型训练方法、汉字识别方法、装置、设备及介质
WO2019232853A1 (zh) 中文模型训练、中文图像识别方法、装置、设备及介质
WO2019232872A1 (zh) 手写字模型训练方法、汉字识别方法、装置、设备及介质
WO2019232873A1 (zh) 文字模型训练方法、文字识别方法、装置、设备及介质
WO2019232849A1 (zh) 汉字模型训练方法、手写字识别方法、装置、设备及介质
CN110569830B (zh) 多语言文本识别方法、装置、计算机设备及存储介质
WO2019232843A1 (zh) 手写模型训练、手写图像识别方法、装置、设备及介质
CN106446896B (zh) 一种字符分割方法、装置及电子设备
WO2019232852A1 (zh) 手写字训练样本获取方法、装置、设备及介质
TWI744283B (zh) 一種單詞的分割方法和裝置
Ul-Hasan et al. Offline printed Urdu Nastaleeq script recognition with bidirectional LSTM networks
US9367766B2 (en) Text line detection in images
WO2019232870A1 (zh) 手写字训练样本获取方法、装置、计算机设备及存储介质
WO2019232850A1 (zh) 手写汉字图像识别方法、装置、计算机设备及存储介质
CN110647829A (zh) 一种票据的文本识别方法及系统
CN112818812A (zh) 图像中表格信息的识别方法、装置、电子设备及存储介质
CN109740606B (zh) 一种图像识别方法及装置
He et al. Historical manuscript dating based on temporal pattern codebook
Islam et al. Text detection and recognition using enhanced MSER detection and a novel OCR technique
CN113158808A (zh) 中文古籍字符识别、组段与版面重建方法、介质和设备
CN115082934B (zh) 一种金融票据中手写汉字分割识别方法
CN115461792A (zh) 手写文本识别方法、装置和系统,手写文本搜索方法和系统,以及计算机可读存储介质
CN113158977A (zh) 改进FANnet生成网络的图像字符编辑方法
Verma et al. A novel approach for structural feature extraction: contour vs. direction
CN113361666B (zh) 一种手写字符识别方法、系统及介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18922035

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 11/03/2021)

122 Ep: pct application non-entry in european phase

Ref document number: 18922035

Country of ref document: EP

Kind code of ref document: A1