WO2019232870A1 - Method, device, computer equipment, and storage medium for obtaining handwriting training samples (手写字训练样本获取方法、装置、计算机设备及存储介质) - Google Patents

Method, device, computer equipment, and storage medium for obtaining handwriting training samples

Info

Publication number
WO2019232870A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
handwriting
target
neural network
font
Prior art date
Application number
PCT/CN2018/094345
Other languages
English (en)
French (fr)
Inventor
吴启
周罡
Original Assignee
平安科技(深圳)有限公司
Priority date
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2019232870A1 publication Critical patent/WO2019232870A1/zh

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 Character recognition
    • G06V 30/14 Image acquisition
    • G06V 30/148 Segmentation of character regions
    • G06V 30/153 Segmentation of character regions using recognition of characters or words

Definitions

  • the present application relates to the field of handwriting recognition, and in particular, to a method, device, computer equipment, and storage medium for obtaining handwriting training samples.
  • Current handwriting recognition models are usually trained on manually produced handwriting training samples.
  • Such samples require manual writing and manual labeling, and each person's writing habits differ.
  • Manually labeled training samples are therefore inefficient to produce and limited in number, which reduces the training efficiency and accuracy of the handwriting recognition model.
  • a method for obtaining handwriting training samples includes:
  • a handwriting training sample acquisition device includes:
  • An original image acquisition module configured to acquire an original image, where the original image includes handwriting and a background image
  • An effective image acquisition module configured to pre-process the original image to obtain an effective image
  • a target image acquisition module configured to process the effective image by using a kernel density estimation algorithm and an erosion method, remove a background image, and obtain a target image including the handwriting;
  • a single font image acquisition module configured to use a vertical projection method to perform single font cutting on the target image to obtain a single font image
  • a recognition result acquisition module configured to input the single font image into a target handwriting recognition model for recognition, and when the recognition probability of the single font image is greater than a preset probability, obtain a recognition result corresponding to the single font image;
  • a target Chinese character confirmation module configured to query a semantic database based on the recognition result to obtain a corresponding target Chinese character
  • a handwriting training sample acquisition module is configured to associate the single font image with a corresponding target Chinese character to obtain a handwriting training sample.
  • a computer device includes a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor.
  • the processor executes the computer-readable instructions, the following steps are implemented:
  • One or more non-volatile readable storage media storing computer-readable instructions, and when the computer-readable instructions are executed by one or more processors, the one or more processors implement the following steps:
  • FIG. 1 is an application scenario diagram of a method for obtaining handwriting training samples according to an embodiment of the present application
  • FIG. 2 is a flowchart of a method for obtaining handwriting training samples according to an embodiment of the present application
  • FIG. 3 is a specific flowchart of step S20 in FIG. 2;
  • FIG. 4 is a specific flowchart of step S30 in FIG. 2;
  • FIG. 5 is a specific flowchart of step S34 in FIG. 4;
  • FIG. 6 is another flowchart of a method for obtaining handwriting training samples according to an embodiment of the present application.
  • FIG. 7 is a specific flowchart of step S73 in FIG. 6;
  • FIG. 8 is a schematic diagram of a handwriting training sample acquisition device according to an embodiment of the present application.
  • FIG. 9 is a schematic diagram of a computer device in an embodiment of the present application.
  • the method for obtaining handwriting training samples provided in the embodiments of the present application can be applied in the application environment shown in FIG. 1.
  • the application environment of the handwriting training sample acquisition method includes a server and a client.
  • the client communicates with the server through a network.
  • The client is a device that can interact with the user, including, but not limited to, a computer, a smartphone, a tablet, and other equipment.
  • the method for obtaining handwriting training samples provided in the embodiments of the present application is applied to a server.
  • a handwriting training sample acquisition method includes the following steps:
  • The original image refers to an image that has not undergone any processing; in this application it is an image that contains handwriting.
  • the original images in this embodiment include handwriting and background images.
  • the background image refers to an image corresponding to a background pattern on the original image.
  • The method for acquiring the original image includes, but is not limited to, crawling it from a webpage or acquiring it from a database connected to the server, and the original image in the database may be an image uploaded in advance by a terminal device.
  • the effective image refers to the image after the original image is preprocessed.
  • The specific steps for the server to obtain a valid image are: (1) Determine whether the original image is a color image; if it is, perform grayscale processing on it to obtain a grayscale image, so that the three components R (red), G (green), and B (blue) of each pixel in the color image are replaced with a single value, which simplifies the subsequent range normalization processing. Understandably, if the original image is not a color image, it is already a grayscale image and no graying process is required. (2) Perform range normalization processing on the pixel matrix corresponding to the grayscale image to obtain a valid image. Range normalization of the pixel matrix preserves the relative relationships between pixels while improving the calculation speed.
  • S30 Use a kernel density estimation algorithm and an erosion method to process the effective image, remove the background image, and obtain a target image including handwriting.
  • Kernel density estimation algorithm is a non-parametric method that studies the data distribution characteristics from the data sample itself and is used to estimate the probability density function.
  • The kernel density estimate is $\hat{f}_h(x) = \frac{1}{nh}\sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right)$, where $\hat{f}_h(x)$ represents the estimated probability density of the pixel, $K(\cdot)$ is the kernel function, $h$ is the pixel range, $x$ is the pixel whose probability density is to be estimated, $x_i$ is the i-th pixel in the range $h$, and $n$ is the number of pixels in the range $h$ around $x$.
  • The erosion method refers to performing a morphological erosion operation on an image, where erosion removes unnecessary parts of the image and leaves only the necessary parts.
  • Specifically, the kernel density estimation formula is used to process the frequency distribution histogram corresponding to the effective image to obtain a smooth curve, and the effective image is divided into layered images according to the maxima and minima on the smoothed curve.
  • Each layered image is then eroded to remove the background image while keeping the handwritten part.
  • The eroded layered images are superimposed to obtain the target image.
  • The superposition processing refers to superimposing the layered images that contain only the handwritten portions into one image, thereby obtaining a target image containing only handwritten characters.
  • the vertical projection method refers to a method in which each line of handwritten characters is projected in a vertical direction to obtain a vertical projection histogram.
  • the vertical projection histogram is a graph that reflects the number of pixels of the target image in the vertical direction.
  • the abscissa axis of the vertical projection histogram represents the width of the target image, and the ordinate indicates the distribution of the number of pixels of the target image.
  • The target image is cut according to a cutting threshold to obtain single font images.
  • Single font image refers to the image corresponding to a single handwriting.
  • the cutting threshold refers to a preset value for cutting handwriting in a target image to obtain a single font.
  • For example, if the preset cutting threshold is 10, then wherever the pixel count in the vertical projection histogram of the target image is less than or equal to 10 (e.g., 0, 9, or 10), the corresponding position on the abscissa is treated as a dividing point between two adjacent handwritten characters, and the target image is cut at that dividing point to obtain the single font images corresponding to the target image. Understandably, the pixels of each handwritten character are relatively concentrated, while the pixels in the gaps between characters are sparse; in the vertical projection histogram this means that columns containing a character have a relatively high pixel count and columns without a character have a relatively low pixel count.
  • the vertical projection method can effectively perform single font cutting on the target image, obtain single font images, and provide technical support for subsequent model recognition.
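Purely as an illustration and not part of the patent, the following is a minimal Python sketch of the vertical projection cutting just described. It assumes the target image is a binarized NumPy array with handwriting pixels equal to 1 and uses the threshold value 10 from the example above; the function name and array layout are assumptions.

```python
import numpy as np

def cut_single_fonts(target_image: np.ndarray, threshold: int = 10):
    """Cut a binarized target image into single-character images by vertical projection.

    target_image: 2-D array, handwriting pixels = 1, background = 0 (assumption).
    threshold: columns whose pixel count is <= threshold are treated as gaps.
    """
    # Vertical projection: number of handwriting pixels in each column.
    projection = target_image.sum(axis=0)

    # Columns above the threshold belong to characters; contiguous runs of such
    # columns form one character, and the gaps between runs are the dividing points.
    is_char = projection > threshold
    single_fonts, start = [], None
    for col, flag in enumerate(is_char):
        if flag and start is None:
            start = col                                      # a character run begins
        elif not flag and start is not None:
            single_fonts.append(target_image[:, start:col])  # run ends at a dividing point
            start = None
    if start is not None:                                    # character touching the right edge
        single_fonts.append(target_image[:, start:])
    return single_fonts
```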
  • S50 The single font image is input into the target handwriting recognition model for recognition.
  • When the recognition probability of the single font image is greater than a preset probability, the recognition result corresponding to the single font image is obtained.
  • the target handwriting recognition model is a pre-trained model for identifying handwriting.
  • the preset probability refers to a preset value for judging whether the recognition probability meets a requirement.
  • the recognition result refers to an output with a recognition probability greater than a preset probability.
  • a single font image is input into a target handwriting recognition model, and a recognition probability corresponding to each single font image is obtained.
  • the recognition probability refers to a probability that the single font image may be a specific Chinese character.
  • the recognition probability is compared with a preset probability. If the recognition probability is greater than the preset probability, the corresponding recognition result is obtained, which is helpful to improve the accuracy of the recognition result.
  • For example, the single font image corresponding to the character "海" (sea) is input into the target handwriting recognition model to obtain the recognition results whose recognition probability is greater than the preset probability.
  • If two visually similar candidate characters both have a recognition probability greater than 85% for that single font image, both candidates are output as recognition results.
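As an illustration of the probability-threshold filtering described above (not the patent's implementation), the sketch below assumes a model that returns a softmax probability vector over a known list of candidate characters; the model object, class list, and the 0.85 threshold from the example are assumptions.

```python
import numpy as np

def recognize_single_font(model, single_font_image, classes, preset_probability=0.85):
    """Return every candidate character whose probability exceeds the preset probability.

    model: any callable returning a softmax vector aligned with `classes` (assumption).
    """
    probs = np.asarray(model(single_font_image))   # softmax output, shape (len(classes),)
    results = [(classes[i], float(p)) for i, p in enumerate(probs) if p > preset_probability]
    # Zero, one, or several candidates may pass the threshold; ambiguous cases are
    # resolved later against the semantic database.
    return results
```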
  • S60 Query the semantic database based on the recognition result to obtain the target Chinese character corresponding to the single font image.
  • the semantic database is a preset knowledge base for performing semantic analysis on the recognition results. Semantic analysis is an analysis of the context-dependent nature of the recognition results.
  • the semantic library is composed of a large number of Chinese sentences.
  • the target Chinese character is the Chinese character corresponding to the single-font image that matches the semantics after querying the semantic database.
  • For example, when the recognition results for the four single-font images "sea", "dead", "stone", and "rotten" contain an ambiguous candidate for one of the characters (for instance "rotten" versus "column"), the target Chinese character needs to be further determined according to the semantic database.
  • In that case the semantic database is queried, and the Chinese sentences it contains are used to judge which candidate is the more accurate recognition result.
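The patent does not specify how the semantic database is queried; the following sketch shows one plausible way to do it, scoring each combination of candidate characters by how often the resulting phrase appears in a corpus of Chinese sentences. The data structures and the scoring rule are illustrative assumptions.

```python
from itertools import product

def disambiguate(candidates_per_char, semantic_sentences):
    """Choose one candidate character per position so that the resulting phrase
    best matches the sentences in the semantic database (illustrative scoring)."""
    best_phrase, best_score = None, -1
    for combo in product(*candidates_per_char):
        phrase = "".join(combo)
        # Score a combination by how many sentences in the semantic database contain it.
        score = sum(1 for sentence in semantic_sentences if phrase in sentence)
        if score > best_score:
            best_phrase, best_score = phrase, score
    return best_phrase
```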
  • the handwriting training samples are training samples used for training of other models.
  • the target Chinese character is associated with a single font image and stored as a handwriting training sample in the database, so that other models can directly call the handwriting training sample in the database for training, improving the training of the model. effectiveness.
  • The original image is pre-processed to obtain a valid image, and the effective image is processed using a kernel density estimation algorithm and an erosion method to remove the background image portion, leaving only the handwritten target image, which provides a data source for the subsequent single font cutting.
  • the vertical projection method is used to cut the single font of the target image to obtain the single font image.
  • the obtained single font image is input to the target handwriting recognition model for recognition, and the recognition result is obtained based on the recognition probability value corresponding to the single font image.
  • The semantic database is queried based on the recognition results, the target Chinese characters corresponding to the single font images are obtained according to the Chinese sentences stored in the semantic database, and the acquired target Chinese characters are associated with the single font images as training samples and stored in the database, which makes it convenient for subsequent model training to call the handwriting training samples in the database and improves the efficiency of model training.
  • step S20 the original image is pre-processed to obtain a valid image, which specifically includes the following steps:
  • S21: Perform enlargement and grayscale processing on the original image to obtain a grayscale image.
  • Compared with the background image, the handwriting itself is relatively small, so it is easily mishandled; to ensure that the handwriting is not mistakenly cleared when the original image is grayed, each pixel of the original image first needs to be enlarged.
  • Specifically, if the value of the n-th pixel in the original image is x_n, the pixels of the original image are enlarged by raising x_n to a power. In this embodiment, enlarging the pixels of the original image effectively prevents handwriting from being mistakenly removed when the original image is grayed.
  • After the original image is enlarged, if it is not a grayscale image but a color image, grayscale processing is performed on it to obtain a grayscale image. Understandably, if the original image is already a grayscale image, no grayscale processing is required.
  • Graying a color original image effectively reduces the amount of data and the computational complexity required to obtain the valid image in subsequent steps.
  • S22: Perform range normalization processing on the pixel matrix corresponding to the grayscale image to obtain a valid image, where the range normalization formula is $x' = \frac{x - M_{\min}}{M_{\max} - M_{\min}}$, in which x is a pixel of the effective image before normalization, x′ is the corresponding pixel after normalization, M_min is the smallest pixel in the pixel matrix M corresponding to the grayscale image, and M_max is the largest pixel in the pixel matrix M corresponding to the grayscale image.
  • The range standardization processing is a processing method that compresses the data into the range (0, 1). Normalizing the range of the pixel matrix corresponding to the grayscale image and multiplying it by 255 facilitates processing of the data in the pixel matrix while retaining the relationships between the pixels.
  • the background image and each handwriting have their own corresponding pixel matrix. After obtaining the background image in the grayscale image and the pixel matrix corresponding to each handwriting, the pixel matrix is subjected to a range normalization process to obtain an effective image corresponding to the pixel matrix after the range normalization process. Performing the range normalization processing on the pixel matrix can improve the processing speed of obtaining the target image.
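A minimal sketch of steps S21-S22 under stated assumptions (not the patent's exact implementation): the exponent used for pixel enlargement is assumed to be 2, since the original formula is not reproduced here, and a simple channel average stands in for grayscale conversion.

```python
import numpy as np

def preprocess(original_image: np.ndarray) -> np.ndarray:
    """S21-S22 sketch: enlarge pixel values, gray the image, then range-normalize."""
    img = original_image.astype(np.float64)

    # S21: enlarge each pixel by a power transformation (exponent 2 is an assumption;
    # the exact exponent of the patent formula is not reproduced in this text).
    img = img ** 2

    # Grayscale only if the image is a color image (channel average as a stand-in).
    gray = img.mean(axis=2) if img.ndim == 3 else img

    # S22: range (min-max) normalization of the pixel matrix, then scale back to 0..255.
    m_min, m_max = gray.min(), gray.max()
    valid = (gray - m_min) / (m_max - m_min + 1e-12) * 255.0
    return valid.astype(np.uint8)
```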
  • step S30 uses a kernel density estimation algorithm and an erosion method to process the effective image, removes the background image, and obtains a target image including handwriting, which specifically includes the following steps:
  • S31 Count the number of occurrences of pixels in the effective image, and obtain a frequency distribution histogram corresponding to the effective image.
  • In general, the horizontal axis of a frequency distribution histogram represents continuous values of the sample data, each cell on the horizontal axis corresponds to the class width of one group and forms the bottom edge of a small rectangle, and the vertical axis represents the ratio of the frequency to the class width, which is used as the height of the small rectangle; the figure composed of these small rectangles is called a frequency distribution histogram.
  • In this embodiment, the horizontal axis of the frequency distribution histogram covers pixel values as continuous values in (0, 255), the class width of each small rectangle on the horizontal axis is 1, and the vertical axis gives the corresponding ratio, which is used as the height of each small rectangle.
  • the frequency distribution histogram can vividly display the number of occurrences of pixels in the effective image, so that the distribution of the data can be reflected at a glance.
  • S32 The Gaussian kernel density estimation method is used to process the frequency distribution histogram to obtain the frequency maximum and frequency minimum corresponding to the frequency distribution histogram, and obtain corresponding pixels according to the frequency maximum and frequency minimum.
  • Gaussian kernel density estimation method refers to a kernel density estimation method whose kernel function is a Gaussian kernel.
  • The Gaussian kernel is $K(x) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{x^{2}}{2}}$, where K(x) is the Gaussian kernel function evaluated at the pixel (independent variable) x, and e and π are constants.
  • The frequency maxima refer to the local maximum values of the frequency distribution, and the frequency minima refer to its local minimum values.
  • a Gaussian kernel density function estimation method is used to perform Gaussian smoothing on the frequency distribution histogram corresponding to the obtained effective image, and obtain a Gaussian smooth curve corresponding to the frequency distribution histogram. Based on the frequency maximum and the frequency minimum on the Gaussian smooth curve, pixels corresponding to the frequency maximum and the frequency minimum on the horizontal axis are obtained. In this embodiment, pixels corresponding to the maximum frequency value and the minimum frequency value are acquired, which facilitates subsequent hierarchical differentiation of valid images and acquires a hierarchical image.
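A minimal sketch of steps S31-S32 under assumptions: SciPy's gaussian_kde and argrelextrema are used as stand-ins for the Gaussian smoothing of the frequency distribution histogram; the automatic bandwidth and the subsampling are illustrative choices.

```python
import numpy as np
from scipy.stats import gaussian_kde
from scipy.signal import argrelextrema

def pixel_extrema(valid_image: np.ndarray):
    """S31-S32 sketch: smooth the pixel-value distribution with a Gaussian kernel
    density estimate and return the pixel values at the frequency maxima and minima."""
    pixels = valid_image.ravel().astype(np.float64)
    pixels = pixels[:: max(1, pixels.size // 50000)]   # subsample for speed (sketch-level shortcut)

    # S31: the raw frequency distribution over pixel values 0..255 (kept for reference).
    hist, _ = np.histogram(pixels, bins=256, range=(0, 255))

    # S32: the Gaussian kernel density estimate on the 0..255 grid is the smooth curve.
    kde = gaussian_kde(pixels)                          # bandwidth chosen automatically (assumption)
    grid = np.arange(256, dtype=np.float64)
    smooth = kde(grid)

    # Local maxima / minima of the smoothed curve and the pixel values where they occur.
    max_pixels = grid[argrelextrema(smooth, np.greater)[0]]
    min_pixels = grid[argrelextrema(smooth, np.less)[0]]
    return max_pixels, min_pixels, hist
```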
  • S33 Perform layer processing on the effective image based on the pixels corresponding to the maximum frequency and the minimum frequency to obtain a layered image.
  • A layered image is an image obtained by layering the effective image based on the frequency maxima and minima. The pixels corresponding to the frequency maxima and minima are obtained, and the effective image is layered according to the pixels corresponding to the maxima: the pixels of the effective image are clustered into as many classes as there are frequency maxima, so the effective image is divided into that many layers. The pixels corresponding to the frequency minima are then used as the boundary values between the classes, and from these boundaries the pixel range of each layered image is obtained.
  • the pixels corresponding to the maximum frequency in the effective image are 12, 54, 97, 113, 159, and 172, and the pixels corresponding to the minimum frequency are 26, 69, 104, 139, and 163.
  • the number of frequency maxima can determine that the pixels of the effective image can be divided into 6 categories, and the effective image can be divided into 6 layers.
  • The pixels corresponding to the frequency minima are used as the boundary values between the classes; in addition, the smallest possible pixel is 0 and the largest is 255.
  • Accordingly, the layered image whose frequency-maximum pixel is 12 corresponds to the pixel range [0, 26); the layered image with pixel 54 corresponds to [26, 69); the layered image with pixel 97 corresponds to [69, 104); the layered image with pixel 113 corresponds to [104, 139); the layered image with pixel 159 corresponds to [139, 163); and the layered image with pixel 172 corresponds to the pixel range [163, 255].
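A minimal sketch of step S33 under assumptions: each layer keeps only the pixels whose value falls in its range, and the class boundaries are the frequency-minimum pixels plus 0 and 255, matching the numeric example above.

```python
import numpy as np

def layer_image(valid_image: np.ndarray, min_pixels) -> list:
    """S33 sketch: split the effective image into layered images using the
    frequency-minimum pixels (plus 0 and 255) as class boundaries."""
    boundaries = [0.0] + sorted(float(p) for p in min_pixels) + [255.0]
    layers = []
    for low, high in zip(boundaries[:-1], boundaries[1:]):
        # Keep pixels in [low, high); the last range also includes 255 itself.
        if high == 255.0:
            mask = (valid_image >= low) & (valid_image <= high)
        else:
            mask = (valid_image >= low) & (valid_image < high)
        layers.append(np.where(mask, valid_image, 0).astype(np.uint8))
    return layers
```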
  • the layered image is binarized.
  • the binarization process refers to a process in which pixels on an image are set to 0 (black) or 1 (white), and the entire image presents a clear black and white effect.
  • The binarized layered images are eroded to remove the background image portion and retain the handwritten portion on each layered image.
  • The erosion process is a morphological operation for removing part of the content of an image. Because the pixels of each layered image belong to different ranges, after the layered images are eroded they also need to be superimposed to generate a target image containing only handwriting.
  • In the above steps, the original image is first enlarged to prevent mishandling due to the small size of the handwriting itself, and grayscale processing is then performed; obtaining the grayscale image reduces the amount of data that needs to be processed in subsequent steps to obtain a valid image.
  • The pixel matrix corresponding to the grayscale image is then subjected to range normalization processing, which compresses the range of the pixel matrix and improves the processing speed of obtaining an effective image.
  • a corresponding frequency distribution histogram is obtained according to the effective image, and pixels corresponding to the maximum frequency value and the minimum frequency value are obtained according to the frequency distribution histogram, thereby obtaining a layered image.
  • the layered image is binarized, eroded, and superimposed to complete the recognition of the handwriting and background image in the original image.
  • the background image is removed to obtain the target image including handwriting.
  • Step S34, in which the layered image is eroded and superimposed and the background image is removed to obtain a target image including handwriting, specifically includes the following steps:
  • S341: Perform binarization processing on the layered image to obtain a layered binarized image. The layered binarized image is the image obtained by binarizing the layered image: after the layered image is obtained, its pixels are compared with a pre-selected threshold, pixels greater than or equal to the threshold are set to 1, and pixels less than the threshold are set to 0.
  • 0 represents a background pixel
  • 1 represents a target pixel (handwriting pixel).
  • This threshold can be obtained by calculating the inter-class variance of the layered image, or it can be obtained based on empirical values.
  • The size of the threshold affects the quality of the binarization: a properly selected threshold gives a better binarization result, while an improperly selected threshold degrades it.
  • the threshold in this embodiment is determined based on empirical values.
  • S342 Detect pixels in the layered binary image to obtain a connected area corresponding to the layered binary image.
  • the connected area refers to an area surrounded by adjacent pixels around a specific pixel. If a certain pixel is 0 and its neighboring pixels are 1, the area surrounded by the neighboring pixels is regarded as the connected area.
  • The pixel matrix corresponding to the layered binarized image is scanned progressively, and pixels that satisfy the connectivity rule (4-neighbourhood or 8-neighbourhood connectivity) are marked with the same label.
  • 4 neighborhood connectivity refers to the situation where a specific pixel is the same as the pixels adjacent in the four directions of up, down, left, and right;
  • 8-neighbourhood connectivity refers to the case where a specific pixel has the same value as the adjacent pixels in the eight directions up, down, left, right, upper left, lower left, upper right, and lower right.
  • the pixel matrix includes rows and columns.
  • The specific process of detecting and labeling the pixels in the layered binarized image is: (1) Scan the pixel matrix line by line and, in each line, form each run of consecutive 1 pixels (target pixels) into a sequence called a cluster, recording the cluster's start point, end point, and line number. The start point of a cluster is its first pixel and the end point is its last pixel. (2) For the clusters in every row except the first, check whether a cluster in the current row overlaps (has a coincident region with) any cluster in the previous row.
  • An associated cluster is a cluster on the previous line that has a coincident region with the cluster on the current line; an equivalent pair records the labels of clusters that are connected to each other.
  • For example, suppose the current row of the pixel matrix is the third row, and cluster A in that row overlaps two clusters in the second row labeled 1 and 2.
  • The smallest label of the two overlapping clusters in the second row, namely 1, is assigned to cluster A, so cluster A is labeled 1, and the labels of the overlapping clusters are recorded as an equivalent pair, that is, (1, 2) is recorded as an equivalent pair.
  • The clusters labeled 1 and 2 then together form one connected region.
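As an illustration only, the sketch below uses OpenCV's built-in 8-connectivity labeling as a stand-in for the run-based two-pass procedure (cluster scan plus equivalent pairs) described above.

```python
import cv2
import numpy as np

def connected_regions(layered_binary: np.ndarray):
    """Label the connected regions of a layered binarized image whose pixels are 0 or 1."""
    img = (layered_binary > 0).astype(np.uint8)
    num_labels, labels = cv2.connectedComponents(img, connectivity=8)
    # Label 0 is the background; labels 1 .. num_labels-1 are the connected regions.
    return num_labels - 1, labels
```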
  • S343 Eroding and superimposing the connected areas corresponding to the layered binary image, removing the background image, and obtaining a target image including handwriting.
  • Specifically, the imerode function in MATLAB or the cvErode function in OpenCV can be used to erode the connected regions of the layered binarized image. A structuring element is selected; in this embodiment, the eight pixels adjacent to a given pixel in the pixel matrix form that pixel's connected neighbourhood, so the selected structuring element is a 3×3 pixel matrix. The structuring element is used to scan the pixel matrix of the layered binarized image, comparing whether the local pixel matrix of the layered binarized image is completely consistent with the structuring element.
  • If they are completely consistent, the corresponding 9 pixels in the pixel matrix all become 1; if they are not completely consistent, the corresponding 9 pixels all become 0, and the pixels set to 0 (black) are the eroded part of the layered binarized image.
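A minimal sketch of the 3×3 erosion just described, assuming OpenCV; cv2.erode is the Python counterpart of the cvErode function mentioned above.

```python
import cv2
import numpy as np

def erode_layer(layered_binary: np.ndarray) -> np.ndarray:
    """Erode a layered binarized image (0/1 pixels) with a 3x3 structuring element."""
    kernel = np.ones((3, 3), np.uint8)          # 3x3 structuring element, as in the text
    eroded = cv2.erode(layered_binary.astype(np.uint8), kernel, iterations=1)
    return eroded                                # pixels that become 0 are the eroded part
```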
  • Then, the layered binarized images are filtered based on a preset erosion-resistance range for handwritten areas: layered binarized images whose erosion resistance is not within this range are removed, leaving only the areas whose erosion resistance falls within the handwriting range.
  • The erosion resistance of a handwritten area can be calculated by the formula $r = s_1 / s_2$, where $s_1$ represents the total area of the layered binarized image after erosion and $s_2$ represents its total area before erosion.
  • For example, if the preset erosion-resistance range for handwritten areas is [0.05, 0.8], the ratio of the total area of each layered binarized image after erosion to its total area before erosion is calculated according to this formula.
  • If the ratio for a layered binarized image is not within the preset erosion-resistance range of the handwritten area, this indicates that the layered binarized image of that area is not handwriting and does not need to be kept.
  • If the ratio for a layered binarized image is within the range [0.05, 0.8], it means that the layered binarized image of that area is handwriting and needs to be retained.
  • In summary, the structuring element is used to examine the connected neighbourhood of each pixel; pixels whose neighbourhood is not completely consistent with the structuring element are set to 0, and the parts of the layered binarized image whose pixels are 0 appear black, which are the eroded parts of the layered binarized image.
  • By calculating the ratio of the total area of each layered binarized image after erosion to its total area before erosion, it is determined whether the ratio lies within the preset erosion-resistance range of the handwritten area.
  • On this basis the background image is removed and the handwriting is retained, achieving the purpose of obtaining the target image.
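The sketch below combines the erosion-resistance filter and the superposition step under the assumptions already stated (0/1 layered binarized images, the 3×3 erosion above, and the example range [0.05, 0.8]); whether the eroded or the original layer is superimposed is an interpretation, and the original layer is kept here.

```python
import numpy as np

def build_target_image(layered_binaries, erode_fn, resist_range=(0.05, 0.8)):
    """Filter layered binarized images by their erosion resistance s1/s2 and superimpose
    the layers judged to be handwriting into a single target image."""
    kept = []
    for layer in layered_binaries:
        s2 = float(layer.sum())                  # total area before erosion
        if s2 == 0:
            continue                             # empty layer, nothing to keep
        eroded = erode_fn(layer)
        s1 = float(eroded.sum())                 # total area after erosion
        ratio = s1 / s2                          # erosion resistance of this layer
        if resist_range[0] <= ratio <= resist_range[1]:
            kept.append(layer)                   # in range: treated as handwriting, keep it
    if not kept:
        return np.zeros_like(layered_binaries[0])
    # Superposition: the pixel-wise union of the kept layers is the handwriting-only target image.
    return np.clip(np.sum(kept, axis=0), 0, 1).astype(np.uint8)
```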
  • The target handwriting recognition model needs to be trained in advance.
  • the method for obtaining handwriting training samples further includes: training the target Handwriting recognition model.
  • the training target handwriting recognition model includes the following steps:
  • S71: Initialize the weights and offsets of the convolutional neural network model. That is, initial values are set for the weights and offsets between the input layer and the hidden layer of the convolutional neural network, and for the weights and offsets between the hidden layer and the output layer.
  • Initializing the weights and offsets of the convolutional neural network model is a necessary step for model training.
  • Reasonably initializing the weights and offsets of the convolutional neural network model is conducive to improving the speed of model training.
  • S72: Obtain font image training samples, label the font image training samples with the Chinese secondary character library, and divide the font image training samples into training set images and test set images according to a preset allocation rule.
  • the font image training sample refers to a training sample formed in advance by carrying a handwritten image.
  • the font image training sample is an image collected by handwriting written by different people in advance and uploaded to the server as a training sample by taking a picture.
  • The Chinese secondary character library is used to label each handwritten character in the font image training samples. For example, if 400 font image training samples written by different people are obtained and each sample contains the handwritten phrase "bear hunger and starvation", each character in the phrase is labeled with the corresponding standard character from the secondary character library, so that every handwritten character in every font image training sample carries a corresponding label.
  • Standard fonts refer to different fonts collected in the secondary Chinese font library, such as Song style, Kai style, Microsoft Yahei, or imitation Song.
  • the font image training sample is divided into a training set image and a test set image according to a preset allocation rule.
  • the preset allocation rule refers to a preset rule for assigning font image training samples
  • the training set image refers to a single font image used for training a convolutional neural network model
  • The test set image refers to a single font image used to test the trained convolutional neural network model.
  • the preset assignment rule is to use 80% of the font image training samples as training set images for training the convolutional neural network model, and 20% as test set images for testing the trained convolutional neural network model.
  • Annotating the font image training samples makes it convenient to compare the labels with the model's outputs when constructing the loss function. Dividing the font image training samples into training set images and test set images avoids the overfitting that would result from verifying the model on the training set images, and improves the accuracy of the model.
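A minimal sketch of the 80/20 split described above; the (image, label) sample structure, the shuffling, and the fixed seed are assumptions.

```python
import random

def split_training_samples(font_image_samples, train_ratio=0.8, seed=0):
    """Split labeled font image samples into training-set and test-set images (80/20 by default).

    font_image_samples: list of (single_font_image, label) pairs (assumed structure).
    """
    samples = list(font_image_samples)
    random.Random(seed).shuffle(samples)      # shuffle so the split does not depend on sample order
    cut = int(len(samples) * train_ratio)
    return samples[:cut], samples[cut:]       # training set images, test set images
```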
  • the initial handwriting recognition model refers to a convolutional neural network model trained on the training set images to recognize handwriting.
  • the convolutional neural network model includes multiple layers of convolutional layers and pooling layers.
  • After obtaining the training set images, the server inputs them into the convolutional neural network model for training. The output of each convolutional layer is computed as $a^{l} = \sigma(z^{l}) = \sigma(a^{l-1} * W^{l} + b^{l})$, where $a^{l}$ is the output of the l-th convolutional layer, $z^{l}$ is the output before the activation function is applied, $a^{l-1}$ is the output of the (l-1)-th convolutional layer (that is, the output of the previous layer), $\sigma$ is the activation function, $W^{l}$ is the weight of the l-th convolutional layer, and $b^{l}$ is the offset of the l-th convolutional layer. Max-pooling downsampling, $a^{l} = \mathrm{pool}(a^{l-1})$, is then used in the convolutional layers to perform dimension-reduction processing on the convolutional output, where pool denotes the downsampling calculation.
  • The downsampling calculation can use the maximum pooling method, which simply takes the largest value within each n × n window of samples.
  • The output of the output layer is $T = \sigma'(z^{L})$, where T represents the output of the output layer and $\sigma'$ represents the activation function of the output layer, which is generally a softmax function.
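The following PyTorch sketch illustrates the convolution, max-pooling, and softmax structure described above; it is not the patent's network, and the layer sizes, the 64×64 input, and the class count parameter are assumptions.

```python
import torch
import torch.nn as nn

class HandwritingCNN(nn.Module):
    """Illustrative convolution + max-pooling network for 1 x 64 x 64 single-character images."""

    def __init__(self, num_classes: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1),   # a^l = sigma(a^{l-1} * W^l + b^l)
            nn.ReLU(),
            nn.MaxPool2d(2),                               # a^l = pool(a^{l-1})
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(64 * 16 * 16, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        x = torch.flatten(x, 1)
        return self.classifier(x)                          # logits z^L of the output layer

    def predict_probs(self, x: torch.Tensor) -> torch.Tensor:
        # T = softmax(z^L): recognition probabilities over the candidate characters.
        return torch.softmax(self.forward(x), dim=1)
```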
  • A loss function is constructed based on the forward output of the convolutional neural network model and the labels carried by the handwriting; the weights and biases in the convolutional neural network model are then updated according to the loss function to obtain the initial handwriting recognition model.
  • S74 Obtain a recognition accuracy rate corresponding to the initial handwriting recognition model based on the test set images. If the recognition accuracy rate is greater than a preset accuracy rate, obtain a target handwriting recognition model.
  • the preset accuracy value is a value preset to determine whether the accuracy of the handwriting recognition of the initial handwriting recognition model meets the requirements. After the initial handwriting recognition model is obtained, in order to verify the accuracy of the handwriting recognition of the initial handwriting recognition model, it needs to be verified through the test set images.
  • The specific verification process is: input the single-font test set images into the initial handwriting recognition model for recognition and obtain the recognition accuracy rate of the initial handwriting recognition model. If the recognition accuracy rate is greater than the preset accuracy rate, it indicates that the accuracy of the initial handwriting recognition model meets the requirements, and the initial handwriting recognition model can be determined as the target handwriting recognition model.
  • the target handwriting recognition model can be directly used to recognize handwriting.
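A minimal sketch of the verification in step S74, assuming the PyTorch model sketched above, test samples given as (image tensor, label index) pairs, and an illustrative preset accuracy of 0.95.

```python
import torch

def evaluate_and_select(model, test_samples, preset_accuracy=0.95):
    """Compute the recognition accuracy on the test set; the initial model is accepted as
    the target handwriting recognition model only if the accuracy exceeds the preset rate."""
    model.eval()
    correct = 0
    with torch.no_grad():
        for image, label in test_samples:          # (1 x H x W tensor, int label) assumed
            logits = model(image.unsqueeze(0))      # add a batch dimension
            if int(logits.argmax(dim=1)) == label:
                correct += 1
    accuracy = correct / max(len(test_samples), 1)
    return accuracy, accuracy > preset_accuracy
```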
  • In steps S71-S74, the font image training samples are divided into training set images and test set images according to a preset allocation rule, the training set images are input into the convolutional neural network model for training, and the weights and offsets in the convolutional neural network model are adjusted to obtain the initial handwriting recognition model for recognizing single-font images.
  • the test set images are then input to the initial handwriting recognition model for recognition, and it is determined whether the recognition accuracy of the initial handwriting recognition model meets the requirements. If the requirements are met, it indicates that the initial handwriting recognition model has been trained and can be used to recognize handwriting.
  • the initial handwriting recognition model can be determined as the target handwriting recognition model. Using the target handwriting recognition model to recognize handwriting can effectively improve the recognition accuracy.
  • the training set image refers to the image containing handwriting used to obtain the initial handwriting recognition model.
  • the test set image refers to the image containing handwriting used to obtain the target handwriting recognition model.
  • the single-font image refers to an image containing handwriting when using a target handwriting recognition model for recognition.
  • step S73 based on the training set image, adjusts the weights and offsets in the convolutional neural network model to obtain the initial handwriting recognition model, which specifically includes the following steps:
  • S731 Input the training set image into the convolutional neural network model, and obtain the forward output of the convolutional neural network model.
  • S732 Construct a loss function according to the forward output of the convolutional neural network model, and calculate the partial derivative of the loss function. Reversely update the weights and offsets in the convolutional neural network model to obtain the initial handwriting recognition model.
  • The partial derivative of the loss function with respect to the weights of the convolutional neural network model is $\frac{\partial J(\theta)}{\partial W^{l}}$, and the partial derivative with respect to the bias is $\frac{\partial J(\theta)}{\partial b^{l}}$.
  • Specifically, the loss function is constructed based on the forward output of the convolutional neural network model and the labels carried by the single-font training set images.
  • In the loss function J(θ), n is the number of training samples, x_i is the i-th training set image input into the convolutional neural network model, h_θ(x_i) represents the forward output of the convolutional neural network model for the i-th training set image, y_i represents the label of the i-th training sample corresponding to x_i, and θ represents the set of weights and biases (w, b).
  • The partial derivatives of the loss function are obtained, and the weights and offsets in the convolutional neural network model are reversely updated.
  • The specific steps are as follows: based on the loss function, take the partial derivatives with respect to the weights and with respect to the offsets of the convolutional neural network model, namely $\frac{\partial J(\theta)}{\partial W^{l}}$ and $\frac{\partial J(\theta)}{\partial b^{l}}$, and use them to update the weights and offsets of the convolutional neural network model.
  • In steps S731-S732, a loss function is constructed from the forward output of the convolutional neural network model, the partial derivatives of the loss function are then obtained, and the weights and offsets in the convolutional neural network model are updated in reverse to obtain the initial handwriting recognition model and complete the training process.
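A minimal training-loop sketch for steps S731-S732 under assumptions: PyTorch's cross-entropy loss stands in for the patent's J(θ), and stochastic gradient descent applies the partial derivatives with respect to the weights and offsets; the batch size, learning rate, and epoch count are illustrative.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def train_initial_model(model, train_images, train_labels, epochs=10, lr=0.01):
    """S731-S732 sketch: forward pass, loss J(theta), reverse update of weights and offsets."""
    loader = DataLoader(TensorDataset(train_images, train_labels), batch_size=64, shuffle=True)
    criterion = nn.CrossEntropyLoss()                        # stand-in for the patent's J(theta)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)   # applies dJ/dW^l and dJ/db^l

    model.train()
    for _ in range(epochs):
        for images, labels in loader:
            optimizer.zero_grad()
            outputs = model(images)              # forward output h_theta(x_i)
            loss = criterion(outputs, labels)    # compared with the labels y_i
            loss.backward()                      # partial derivatives of the loss function
            optimizer.step()                     # reverse update of weights and offsets
    return model                                 # the initial handwriting recognition model
```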
  • This method obtains a grayscale image by enlarging and graying the original image, then performs range normalization on the grayscale image to obtain a valid image, which makes it convenient for the subsequent steps to use the kernel density estimation algorithm and the erosion method to process the effective image, remove the background image portion, and retain the target image containing only handwriting.
  • the vertical projection method is used to cut the single font of the target image to obtain the single font image.
  • the obtained single font image is input to the target handwriting recognition model for recognition, and the recognition result is obtained based on the recognition probability value corresponding to the single font image.
  • The semantic database is queried based on the recognition results, the target Chinese characters corresponding to the single font images are obtained according to the Chinese sentences stored in the semantic database, and the acquired target Chinese characters are associated with the single font images as training samples and stored in the database, which makes it convenient for subsequent model training to call the handwriting training samples in the database and improves the efficiency of model training.
  • a handwriting training sample acquisition device corresponds to the handwriting training sample acquisition method in the above embodiment in a one-to-one correspondence.
  • The handwriting training sample acquisition device includes an original image acquisition module 10, an effective image acquisition module 20, a target image acquisition module 30, a single-font image acquisition module 40, a recognition result acquisition module 50, a target Chinese character confirmation module 60, and a handwriting training sample acquisition module 70.
  • The detailed description of each functional module is as follows:
  • the original image obtaining module 10 is configured to obtain an original image, and the original image includes a handwriting and a background image.
  • the effective image acquisition module 20 is configured to pre-process the original image to obtain a valid image.
  • a target image acquisition module 30 is configured to process a valid image by using a kernel density estimation algorithm and an erosion method, remove a background image, and obtain a target image including handwriting.
  • the single-font image acquisition module 40 is configured to perform single-font cutting on a target image by using a vertical projection method to obtain a single-font image.
  • the recognition result acquisition module 50 is configured to input a single font image into a target handwriting recognition model for recognition. When the recognition probability of the single font image is greater than a preset probability, obtain a recognition result corresponding to the single font image.
  • the target Chinese character confirmation module 60 is configured to query a semantic library based on the recognition result to obtain a target Chinese character corresponding to a single font image.
  • a handwriting training sample acquisition module 70 is configured to associate a single font image with a corresponding target Chinese character to obtain a handwriting training sample.
  • the effective image acquisition module 20 includes a grayscale image acquisition unit 21 and a range normalization processing unit 22.
  • the gray image acquisition unit 21 is configured to perform enlargement and gray processing on the original image to obtain a gray image.
  • The range standardization processing unit 22 is configured to perform range standardization processing on the pixel matrix corresponding to the grayscale image to obtain a valid image, where the range standardization formula is $x' = \frac{x - M_{\min}}{M_{\max} - M_{\min}}$, in which x is a pixel of the effective image before normalization, x′ is the pixel after normalization, M_min is the smallest pixel in the pixel matrix M corresponding to the grayscale image, and M_max is the largest pixel in the pixel matrix M corresponding to the grayscale image.
  • the target image acquisition module 30 includes a first processing unit 31, a second processing unit 32, a layered image acquisition unit 33, and an erosion and superposition processing unit 34.
  • the first processing unit 31 is configured to count the number of occurrences of pixels in the effective image, and obtain a histogram of the frequency distribution corresponding to the effective image.
  • the second processing unit 32 is configured to process the frequency distribution histogram by using a Gaussian kernel density estimation method to obtain a frequency maximum value and a frequency minimum value corresponding to the frequency distribution histogram, and according to the frequency maximum value and the frequency minimum value Get the corresponding pixels.
  • a layered image acquisition unit 33 is configured to perform a layered processing on the effective image based on the pixels corresponding to the frequency maximum and the frequency minimum to obtain a layered image.
  • the erosion and superposition processing unit 34 is configured to perform an erosion and superposition process on the layered image, remove the background image, and obtain a target image including handwriting.
  • the erosion and superposition processing unit 34 includes a binarization processing unit 341, a connected area acquisition unit 342, and a connected area processing unit 343.
  • a binarization processing unit 341 is configured to perform binarization processing on the layered image to obtain a layered binarized image.
  • the connected region obtaining unit 342 is configured to detect and mark pixels in the layered binary image to obtain a connected region corresponding to the layered binary image.
  • the connected region processing unit 343 is configured to perform erosion and superposition processing on the connected regions corresponding to the layered binary image, remove the background image, and obtain a target image including handwriting.
  • The handwriting training sample acquisition device further includes a model initialization unit 71, a training sample acquisition and processing unit 72, an initial handwriting recognition model unit 73, and a target handwriting recognition model unit 74.
  • a model initialization unit 71 is configured to initialize weights and offsets of a convolutional neural network model.
  • the training sample acquisition and processing unit 72 is configured to obtain font image training samples, label the font image training samples with a secondary Chinese font library, and divide the font image training samples into a training set image and a test set image according to a preset allocation rule.
  • The initial handwriting recognition model unit 73 is used to adjust the weights and offsets in the convolutional neural network model based on the training set images to obtain the initial handwriting recognition model.
  • the target handwriting recognition model unit 74 is configured to obtain the recognition accuracy rate corresponding to the initial handwriting recognition model based on the test set images. If the recognition accuracy rate is greater than a preset accuracy rate, a target handwriting recognition model is obtained.
  • The initial handwriting recognition model unit 73 includes a forward output acquisition unit 731 and a weight and offset update unit 732.
  • A forward output acquisition unit 731 is configured to input the training set images into the convolutional neural network model and obtain the forward output of the convolutional neural network model, $a^{l} = \sigma(z^{l}) = \sigma(a^{l-1} * W^{l} + b^{l})$ and $T = \sigma'(z^{L})$, where $a^{l-1}$ represents the output of the (l-1)-th convolutional layer, $\sigma$ represents the activation function, $W^{l}$ represents the weight of the l-th convolutional layer, $b^{l}$ represents the offset of the l-th convolutional layer, T represents the output of the output layer, and $\sigma'$ represents the activation function of the output layer.
  • A weight and bias updating unit 732 is configured to construct a loss function according to the forward output of the convolutional neural network model, obtain the partial derivatives of the loss function, and reversely update the weights and biases in the convolutional neural network model to obtain the initial handwriting recognition model, where the partial derivative with respect to the weights is $\frac{\partial J(\theta)}{\partial W^{l}}$ and the partial derivative with respect to the bias is $\frac{\partial J(\theta)}{\partial b^{l}}$.
  • a computer device is provided.
  • the computer device may be a server, and the internal structure diagram may be as shown in FIG. 9.
  • the computer device includes a processor, a memory, a network interface, and a database connected through a system bus.
  • the processor of the computer device is used to provide computing and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system, computer-readable instructions, and a database.
  • the internal memory provides an environment for the operation of the operating system and computer-readable instructions in a non-volatile storage medium.
  • the database of the computer equipment is used to store the acquired handwriting training samples.
  • the network interface of the computer device is used to communicate with an external terminal through a network connection.
  • the computer-readable instructions are executed by a processor to implement a handwriting training sample acquisition method.
  • a computer device including a memory, a processor, and computer-readable instructions stored on the memory and executable on the processor.
  • When the processor executes the computer-readable instructions, the following steps are implemented: obtain an original image, where the original image includes handwriting and a background image; preprocess the original image to obtain a valid image; use a kernel density estimation algorithm and an erosion method to process the effective image, remove the background image, and obtain a target image including handwriting; use the vertical projection method to perform single-font cutting on the target image to obtain single-font images; input the single-font images into the target handwriting recognition model for recognition, and when the recognition probability of a single-font image is greater than a preset probability, obtain the recognition result corresponding to that single-font image; query the semantic database based on the recognition results to obtain the target Chinese characters corresponding to the single-font images; and associate the single-font images with the corresponding target Chinese characters to obtain handwriting training samples.
  • When the processor executes the computer-readable instructions, the following steps are further implemented: the original image is enlarged and grayed to obtain a grayscale image; the pixel matrix corresponding to the grayscale image is subjected to range normalization processing to obtain a valid image, where the range normalization formula is $x' = \frac{x - M_{\min}}{M_{\max} - M_{\min}}$, in which x is a pixel of the effective image before normalization, x′ is the pixel after normalization, M_min is the smallest pixel in the pixel matrix M corresponding to the grayscale image, and M_max is the largest pixel in the pixel matrix M corresponding to the grayscale image.
  • When the processor executes the computer-readable instructions, the following steps are further implemented: count the number of occurrences of pixels in the effective image and obtain a frequency distribution histogram corresponding to the effective image; use the Gaussian kernel density estimation method to process the frequency distribution histogram, obtain the frequency maxima and minima corresponding to the frequency distribution histogram, and obtain the corresponding pixels according to the frequency maxima and minima; perform layering processing on the effective image based on the pixels corresponding to the frequency maxima and minima to obtain layered images; and perform erosion and superposition processing on the layered images to remove the background image and obtain a target image including handwriting.
  • When the processor executes the computer-readable instructions, the following steps are further implemented: perform binarization processing on the layered images to obtain layered binarized images; detect and mark pixels in the layered binarized images to obtain the connected areas corresponding to the layered binarized images; and erode and superimpose the connected areas corresponding to the layered binarized images to remove the background image and obtain a target image including handwriting.
  • When the processor executes the computer-readable instructions, the following steps are further implemented: initialize the weights and offsets of the convolutional neural network model; obtain font image training samples, label the font image training samples using the Chinese secondary font library, and divide the font image training samples into training set images and test set images according to preset allocation rules; based on the training set images, adjust the weights and offsets in the convolutional neural network model to obtain the initial handwriting recognition model; and based on the test set images, obtain the recognition accuracy rate corresponding to the initial handwriting recognition model, and if the recognition accuracy rate is greater than a preset accuracy rate, obtain the target handwriting recognition model.
  • When the processor executes the computer-readable instructions, the following steps are further implemented: input the training set images into the convolutional neural network model and obtain the forward output of the convolutional neural network model, $a^{l} = \sigma(z^{l}) = \sigma(a^{l-1} * W^{l} + b^{l})$ and $T = \sigma'(z^{L})$, where $a^{l}$ is the output of the l-th convolutional layer, $z^{l}$ is the output before the activation function is applied, $a^{l-1}$ is the output of the (l-1)-th convolutional layer, $\sigma$ is the activation function, $W^{l}$ is the weight of the l-th convolutional layer, $b^{l}$ is the offset of the l-th convolutional layer, T is the output of the output layer, and $\sigma'$ is the activation function of the output layer; a loss function is then constructed according to the forward output of the convolutional neural network model, the partial derivatives of the loss function are calculated, and the weights and offsets in the convolutional neural network model are reversely updated.
  • One or more non-volatile storage media storing computer-readable instructions are provided, and when the computer-readable instructions are executed by one or more processors, the one or more processors implement the following steps: obtain an original image including handwriting and a background image; preprocess the original image to obtain an effective image; use the kernel density estimation algorithm and the erosion method to process the effective image, remove the background image, and obtain a target image including handwriting; cut the target image into single fonts using the vertical projection method to obtain single-font images; input the single-font images into the target handwriting recognition model for recognition, and when the recognition probability of a single-font image is greater than a preset probability, obtain the recognition result corresponding to that single-font image; query the semantic library based on the recognition result to obtain the target Chinese character corresponding to the single-font image; and associate the single-font image with the corresponding target Chinese character to obtain a handwriting training sample.
  • When the computer-readable instructions are executed by one or more processors, the one or more processors further implement the following steps: enlarge and gray the original image to obtain a grayscale image; perform range standardization processing on the pixel matrix corresponding to the grayscale image to obtain a valid image, where the range standardization formula is $x' = \frac{x - M_{\min}}{M_{\max} - M_{\min}}$, in which x is a pixel of the effective image before normalization, x′ is the pixel after normalization, M_min is the smallest pixel in the pixel matrix M corresponding to the grayscale image, and M_max is the largest pixel in the pixel matrix M corresponding to the grayscale image.
  • When the computer-readable instructions are executed by the one or more processors, the one or more processors further implement the following steps: counting the number of occurrences of the pixels in the effective image and obtaining the frequency distribution histogram corresponding to the effective image; processing the frequency distribution histogram with a Gaussian kernel density estimation method to obtain the frequency maxima and frequency minima corresponding to the frequency distribution histogram, and obtaining the corresponding pixels according to the frequency maxima and frequency minima; layering the effective image based on the pixels corresponding to the frequency maxima and frequency minima to obtain layered images; and eroding and overlaying the layered images to remove the background image and obtain the target image including the handwriting.
  • When the computer-readable instructions are executed by the one or more processors, the one or more processors further implement the following steps: performing binarization on the layered image to obtain a layered binarized image; detecting and marking the pixels in the layered binarized image to obtain the connected regions corresponding to the layered binarized image; and eroding and overlaying the connected regions corresponding to the layered binarized image to remove the background image and obtain the target image including the handwriting.
  • When the computer-readable instructions are executed by the one or more processors, the one or more processors further implement the following steps: initializing the weights and offsets of the convolutional neural network model; obtaining font image training samples, labeling the font image training samples with a Chinese secondary font library, and dividing the font image training samples into training set images and test set images according to a preset allocation rule; adjusting the weights and offsets in the convolutional neural network model based on the training set images to obtain an initial handwriting recognition model; and obtaining, based on the test set images, the recognition accuracy rate corresponding to the initial handwriting recognition model, where a target handwriting recognition model is obtained if the recognition accuracy rate is greater than a preset accuracy rate.
  • When the computer-readable instructions are executed by the one or more processors, the one or more processors further implement the following steps: inputting the training set images into the convolutional neural network model and obtaining the forward output of the convolutional neural network model, where the forward output is computed as a l = σ(z l) = σ(a l-1 * W l + b l) and T = σ'(a l), a l represents the output of the l-th convolution layer, z l represents the output before the activation function is applied, a l-1 represents the output of the (l-1)-th convolution layer, σ represents the activation function, W l represents the weight of the l-th convolution layer, b l represents the offset of the l-th convolution layer, T represents the output of the output layer, and σ′ represents the activation function of the output layer; constructing a loss function according to the forward output of the convolutional neural network model, calculating the partial derivatives of the loss function, and updating the weights and offsets in the convolutional neural network model backwards to obtain the initial handwriting recognition model.
  • Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory can include random access memory (RAM) or external cache memory.
  • RAM is available in various forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), dual data rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), synchronous chain (Synchlink) DRAM (SLDRAM), memory bus (Rambus) direct RAM (RDRAM), direct memory bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM).

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Character Discrimination (AREA)

Abstract

The present application discloses a handwriting training sample acquisition method and device, a computer device, and a storage medium. The method includes: acquiring an original image, the original image including handwriting and a background image; preprocessing the original image to obtain an effective image; processing the effective image with a kernel density estimation algorithm and an erosion method to remove the background image and obtain a target image including the handwriting; performing single-font cutting on the target image with a vertical projection method to obtain single font images; inputting the single font images into a target handwriting recognition model for recognition, and obtaining the recognition result corresponding to a single font image when its recognition probability is greater than a preset probability; querying a semantic library based on the recognition result to obtain the target Chinese character corresponding to the single font image; and associating the single font image with the corresponding target Chinese character to obtain handwriting training samples. With this method, handwriting training samples can be obtained through a simple and convenient process, improving the efficiency of model training.

Description

手写字训练样本获取方法、装置、计算机设备及存储介质
本申请以2018年6月4日提交的申请号为201810564731.7,名称为“手写字训练样本获取方法、装置、计算机设备及存储介质”的中国发明专利申请为基础,并要求其优先权。
技术领域
本申请涉及手写字识别领域,尤其涉及一种手写字训练样本获取方法、装置、计算机设备及存储介质。
背景技术
在手写字识别过程中,通常需要采用预先训练好的手写字识别模型进行识别,以获取识别结果。当前手写字识别模型通常需要采用人工手写的训练样本训练该手写字识别模型。这种人工手写的训练样本需人工书写并进行人工标注,每个人的书写习惯不相同,在手写字数量庞大的情况下,采用人工标注训练样本效率低,并且数量有限,影响手写字识别模型的训练效率和准确性。
发明内容
基于此,有必要针对上述技术问题,提供一种方便后续模型训练时直接调用手写字训练样本,提高模型训练的效率和准确性的手写字训练样本获取方法、装置、计算机设备及存储介质。
一种手写字训练样本获取方法,包括:
获取原始图像,所述原始图像包括手写字和背景图像;
对所述原始图像进行预处理,获取有效图像;
采用核密度估计算法和腐蚀方法对所述有效图像进行处理,去除背景图像,获取包括所述手写字的目标图像;
采用垂直投影方法对所述目标图像进行单字体切割,获取单字体图像;
将所述单字体图像输入到目标手写字识别模型中进行识别,当所述单字体图像的识别概率大于预设概率时,则获取所述单字体图像对应的识别结果;
基于所述识别结果查询语义库,获取所述单字体图像对应的目标汉字;
将所述单字体图像和对应的目标汉字关联,获取手写字训练样本。
一种手写字训练样本获取装置,包括:
原始图像获取模块,用于获取原始图像,所述原始图像包括手写字和背景图像;
有效图像获取模块,用于对所述原始图像进行预处理,获取有效图像;
目标图像获取模块,用于采用核密度估计算法和腐蚀方法对所述有效图像进行处理,去除背景图像,获取包括所述手写字的目标图像;
单字体图像获取模块,用于采用垂直投影方法对所述目标图像进行单字体切割,获取单字体图像;
识别结果获取模块,用于将所述单字体图像输入到目标手写字识别模型中进行识别,当所述单字体图像的识别概率大于预设概率时,则获取所述单字体图像对应的识别结果;
目标汉字确认模块,用于基于所述识别结果查询语义库,获取对应的目标汉字;
手写字训练样本获取模块,用于将所述单字体图像和对应的目标汉字关联,获取手写字训练样本。
一种计算机设备,包括存储器、处理器以及存储在所述存储器中并可在所述处理器上运行的计算机可读指令,所述处理器执行所述计算机可读指令时实现如下步骤:
获取原始图像,所述原始图像包括手写字和背景图像;
对所述原始图像进行预处理,获取有效图像;
采用核密度估计算法和腐蚀方法对所述有效图像进行处理,去除背景图像,获取包括所述手写字的目标图像;
采用垂直投影方法对所述目标图像进行单字体切割,获取单字体图像;
将所述单字体图像输入到目标手写字识别模型中进行识别,当所述单字体图像的识别概率大于预设概率时,则获取所述单字体图像对应的识别结果;
基于所述识别结果查询语义库,获取所述单字体图像对应的目标汉字;
将所述单字体图像和对应的目标汉字关联,获取手写字训练样本。
一个或多个存储有计算机可读指令的非易失性可读存储介质,所述计算机可读指令被一个或多个处理器执行时,使得所述一个或多个处理器实现如下步骤:
获取原始图像,所述原始图像包括手写字和背景图像;
对所述原始图像进行预处理,获取有效图像;
采用核密度估计算法和腐蚀方法对所述有效图像进行处理,去除背景图像,获取包括所述手写字的目标图像;
采用垂直投影方法对所述目标图像进行单字体切割,获取单字体图像;
将所述单字体图像输入到目标手写字识别模型中进行识别,当所述单字体图像的识别概率大于预设概率时,则获取所述单字体图像对应的识别结果;
基于所述识别结果查询语义库,获取所述单字体图像对应的目标汉字;
将所述单字体图像和对应的目标汉字关联,获取手写字训练样本。
本申请的一个或多个实施例的细节在下面的附图及描述中提出。本申请的其他特征和优点将从说明书、附图以及权利要求书变得明显。
附图说明
为了更清楚地说明本申请实施例的技术方案,下面将对本申请实施例的描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动性的前提下,还可以根据这些附图获得其他的附图。
图1是本申请一实施例中手写字训练样本获取方法的一应用场景图;
图2是本申请一实施例中手写字训练样本获取方法的一流程图;
图3是图2中步骤S20的一具体流程图;
图4是图2中步骤S30的一具体流程图;
图5是图4中步骤S34的一具体流程图;
图6是本申请一实施例中手写字训练样本获取方法的另一流程图;
图7是图6中步骤S73的一具体流程图;
图8是本申请一实施例中手写字训练样本获取装置的一示意图;
图9是本申请一实施例中计算机设备的一示意图。
具体实施方式
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有作出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。
本申请实施例提供的手写字训练样本获取方法,可应用在如图1的应用环境中。该手写字训练样本获取方法的应用环境包括服务器和客户端,其中,客户端通过网络与服务器进行通信,客户端是可与用户进行人机交互的设备,包括但不限于电脑、智能手机和平板等设备。本申请实施例提供的手写字训练样本获取方法应用于服务器。
在一实施例中,如图2所示,提供一种手写字训练样本获取方法,该手写字训练样本获取方法包括如下步骤:
S10:获取原始图像,原始图像包括手写字和背景图像。
其中,原始图像指没有经过任何处理的特定图像,该特定图像是指需要包括手写字的图像。本实施例中的原始图像包括手写字和背景图像。其中,背景图像是指原始图像上的背景图案对应的图像。该原始图像的获取方式包括但不限于从网页上爬取或者通过访问与服务器相连的数据库上获取,该数据库上的原始图像可以是终端设备预先上传的图像。
S20:对原始图像进行预处理,获取有效图像。
其中,有效图像指原始图像经过预处理后的图像。服务器获取有效图像的具体步骤为:(1)判 断原始图像是否为彩色图像,若原始图像为彩色图像,则对原始图像进行灰度化处理,获取灰度图像,使得彩色图像中每个像素对应的三个分量R(红色)、G(绿色)和B(蓝色)可以用一个值替代,有助于简化后续进行极差标准化处理的复杂度。可以理解地,若原始图像不为彩色图像,则原始图像为灰度图像,无需再进行灰度化处理。(2)对灰度图像对应的像素矩阵进行极差标准化处理,获取有效图像。对灰度图像对应的像素矩阵进行极差标准化处理可以在保留像素矩阵中相对关系,同时又可以提高计算速度。
S30:采用核密度估计算法和腐蚀方法对有效图像进行处理,去除背景图像,获取包括手写字的目标图像。
核密度估计算法是一种从数据样本本身出发研究数据分布特征,用于估计概率密度函数的非参数方法。核密度估计算法的具体公式为
f h(x)=(1/(nh))∑ i=1 n K((x-x i)/h),其中f h(x)
表示像素的估计概率密度,K(.)为核函数,h为像素范围,x为要估计概率密度的像素,x i为h范围内的第i个像素,n为h范围内像素为x的个数。腐蚀方法指对图像进行腐蚀处理的方法,其中,腐蚀指去除图像的不需要的部分,仅保留需要的部分。
本实施例中,采用核密度估计算法的公式对有效图像对应的频率分布直方图进行处理,获取频率分布直方图对应的平滑曲线,根据平滑曲线上的极小值和极大值,获取极小值和极大值对应的像素,然后根据极大值和极小值对应的像素对有效图像进行分层处理,在分层处理后,对分层处理后的图像进行腐蚀处理,去除背景图像,保留手写字部分。最后将经过分层和腐蚀处理的图像进行叠加处理,获取目标图像。其中,叠加处理指将分层后的仅保留有手写字部分的图像叠加成一个图像的处理过程,从而实现获取只包含手写字的目标图像的目的。
S40:采用垂直投影方法对目标图像进行单字体切割,获取单字体图像。
其中,垂直投影方法是指将每一行手写字进行垂直方向的投影,获取垂直投影直方图的方法。垂直投影直方图是指反映目标图像在垂直方向上的像素数量的图,垂直投影直方图的横坐标轴表示目标图像的宽度,纵坐标表示目标图像的像素数量分布情况。
具体地,逐行扫描目标图像中的每一行手写字并获取每一行手写字对应的像素的数量,基于像素和像素的数量形成垂直投影直方图,再根据该垂直投影直方图,按照预先设置的切割阈值对目标图像进行切割,获取单字体图像。单字体图像指单个手写字对应的图像。其中,切割阈值指预先设置好的用于切割目标图像中的手写字,获取单字体的值。当扫描到目标图像对应的垂直投影直方图中的纵坐标上的像素数量小于等于阈值时,则表示对应的横坐标的位置是两个相邻手写字之间的分隔点,在该分隔点对目标图像进行单字体切割。如预先设置的切割阈值为10,当扫描到目标图像对应的垂直投影直方图中像素数量为小于等于10时(0、9和10),则该像素数量值(0、9和10)对应的横坐标所在的位置是两个相邻手写字之间的分割点,在该分割点对目标图像进行单字体切割,获取该目标图像对应的单字体图像。可以理解地,每一个手写字对应的像素是比较集中的,汉字与汉字之间的间隙对应的像素是比较稀疏的,像素的密集程度反应在对应的垂直投影直方图中,则为有汉字的像素对应的像素数量比较高,没有汉字的像素对应的像素数量比较低,通过垂直投影方法能够有效对目标图像进行单字体切割,获取单字体图像,为后续进行模型识别提供技术支持。
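A minimal sketch of the vertical-projection cut described above, assuming the text-line image is held in a NumPy array with handwriting pixels set to 1 and using an illustrative cut threshold of 10 pixels; columns whose foreground count stays at or below the threshold are treated as gaps between adjacent characters.

```python
import numpy as np

def cut_single_fonts(line_img, threshold=10):
    """Split one text line into per-character images by vertical projection.

    line_img: 2-D array, handwriting pixels are 1, background pixels are 0.
    threshold: columns with <= threshold foreground pixels count as gaps.
    """
    projection = line_img.sum(axis=0)            # vertical projection histogram
    in_char, start, pieces = False, 0, []
    for col, count in enumerate(projection):
        if count > threshold and not in_char:    # entering a character region
            in_char, start = True, col
        elif count <= threshold and in_char:     # leaving a character region
            in_char = False
            pieces.append(line_img[:, start:col])
    if in_char:                                  # character touching the right edge
        pieces.append(line_img[:, start:])
    return pieces
```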
S50:将单字体图像输入到目标手写字识别模型中进行识别,当单字体图像的识别概率大于预设概率时,则获取单字体图像对应的识别结果。
其中,目标手写字识别模型是预先训练好的用于识别手写字的模型。预设概率指预先设置的用于判断识别概率是否满足要求的值。识别结果指识别概率大于预设概率的输出。具体地,将单字体图像输入到目标手写字识别模型中,获取每一单字体图像对应的识别概率,该识别概率是指该单字体图像可能为某一具体汉字的概率。将识别概率和预设概率进行比较,若识别概率大于预设概率,则获取对应的识别结果,有助于提高识别结果的准确性。
如预设概率为85%,将"海"对应的单字体图像输入到目标手写字识别模型中,获取识别概率大于预设概率对应的识别结果,该识别结果可能为"诲"或"海",即"海"对应的单字体图像识别为"诲"或"海"的识别概率均大于85%,因此可能输出两个识别结果"诲"或"海"。
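A sketch of the probability-threshold rule in step S50, assuming a trained classifier exposed as a `predict_proba` callable that returns a probability per candidate character; both that callable and the 85% threshold are illustrative assumptions rather than the application's actual interface.

```python
def recognize(single_font_image, predict_proba, preset_probability=0.85):
    """Return every candidate character whose probability exceeds the preset
    probability; an empty list means the image is left unrecognized."""
    probabilities = predict_proba(single_font_image)   # e.g. {'海': 0.91, '诲': 0.88, ...}
    return [char for char, p in probabilities.items() if p > preset_probability]
```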
S60:基于识别结果查询语义库,获取单字体图像对应的目标汉字。
其中,语义库是预先设置的用于对识别结果进行语义分析的知识库。语义分析是对识别结果进行上下文有关性质的分析。语义库是由大量的中文句子组成。目标汉字是查询语义库后符合语义的单字体图像所对应的汉字。
具体地,在获取识别结果后,还需要根据语义库进一步确定目标汉字,如“海”“枯”、“石”以及“烂”这四个单字体图像对应的识别结果为“诲”或“海”、“枯”、“石”以及“烂”或“栏”,为了进一步确定存在两个或两个以上识别结果对应的单字体图像的目标汉字,因此需查询语义库,根据语义库中收录的中文句子判断更加准确的识别结果。通过查询语义库“海枯石烂”符合语义,则确定每一单字体图像对应的目标汉字为“海”“枯”“石”“烂”,根据语义库确定目标汉字,可以提高对单字体图像识别的准确率。
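A sketch of the semantic-library lookup in step S60, modelling the library as a plain set of known Chinese phrases; the candidate lists come from the thresholded recognition results, and the first combination found in the library is taken as the target characters. The phrase set here is only an illustrative stand-in for the real semantic library.

```python
from itertools import product

SEMANTIC_LIBRARY = {"海枯石烂"}          # illustrative stand-in for the real semantic library

def resolve_characters(candidates):
    """candidates: list of candidate lists, e.g. [['诲', '海'], ['枯'], ['石'], ['烂', '栏']].
    Returns the first character sequence whose concatenation appears in the library."""
    for combo in product(*candidates):
        if "".join(combo) in SEMANTIC_LIBRARY:
            return list(combo)
    return [c[0] for c in candidates]     # fall back to the top candidate of each position
```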
S70:将单字体图像和对应的目标汉字关联,获取手写字训练样本。
其中,手写字训练样本是用于进行其他模型训练的训练样本。具体地,步骤S60获取的目标汉字后,将目标汉字与单字体图像关联,并作为手写字训练样本存储在数据库中,以便其他模型直接调用数据库中的手写字训练样本进行训练,提高模型训练的效率。
本申请实施例所提供的手写字训练样本获取方法中,通过对原始图像进行预处理,获取有效图像,并采用核密度估计算法和腐蚀方法对有效图像进行处理,去除背景图像部分,保留仅含有手写字的目标图像,为后续进行单字体切割提供数据来源。采用垂直投影方法对目标图像进行单字体切割,获取单字体图像,将获取的单字体图像输入到目标手写字识别模型中识别,基于单字体图像对应的识别概率值,获取识别结果。基于识别结果查询语义库,根据语义库中存储的中文句子获取单字体图像对应的目标汉字,将获取的目标汉字和单字体图像关联起来作为训练样本并存储在数据库中,方便后续模型训练时直接调用数据库中的手写字训练样本进行训练,提高模型训练的效率。
在一实施例中,如图3所示,步骤S20,对原始图像进行预处理,获取有效图像,具体包括如下步骤:
S21:对原始图像进行放大和灰度化处理,获取灰度图像。
由于在原始图像中,手写字本身的尺寸相对于背景图像而言较小,在对原始图像进行灰度化处理时,手写字容易被误处理掉,因此,为了保证手写字不会再灰度化处理时被误清除,需要对原始图像对应的每个像素进行放大处理,如原始图像中第n个像素的大小为x n,对原始图像中的像素进行幂次放大处理,使得x n变为
Figure PCTCN2018094345-appb-000002
本实施例中,将原始图像中的像素进行放大处理,可以有效避免在对原始图像进行灰度化处理时,手写字被误处理掉。
在原始图像进行放大处理后,若原始图像不是灰度图像而是彩色图像时,则需要对原始图像进行灰度化处理,获取灰度图像。可以理解地,若原始图像为灰度图像,则不需要进行灰度化处理。当原始图像为彩色图像时,对原始图像进行灰度化处理的具体步骤为:采用公式Y=0.299R+0.587G+0.114B对原始图像中的每个像素进行处理,获取每个像素对应的采样像素,依据该采样像素形成灰度图像;其中,R(红色)、G(绿色)和B(蓝色)是原始图像中的三个分量,采样像素是灰度图像中用于替换彩色图像中R、G和B三个分量对应的像素。
对原始图像为彩色图像进行灰度化处理,有效减少了后续步骤获取有效图像时需要处理的数据量和计算的复杂度。
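A sketch of the enlargement and graying step, using the stated weighting Y = 0.299R + 0.587G + 0.114B; the text does not specify the exponent of the power amplification, so the element-wise square used here is only an assumption, and the later range standardization rescales the values anyway.

```python
import numpy as np

def to_gray(original, power=2):                      # `power` is an assumed exponent
    img = original.astype(np.float64) ** power       # power amplification of each pixel
    if img.ndim == 2:                                 # already a grayscale image
        return img
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    return 0.299 * r + 0.587 * g + 0.114 * b          # weighted graying of R, G, B
```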
S22:对灰度图像对应的像素矩阵进行极差标准化处理,获取有效图像,其中,极差标准化处理的公式为
x'=(x-M min)/(M max-M min),其中
x是标准化前有效图像的像素,x'是标准化后有效图像的像素, M min是灰度图像对应的像素矩阵M中最小的像素,M max是灰度图像对应的像素矩阵M中最大的像素。
其中,极差标准化处理是对数据进行处理,使数据压缩在(0,1)范围内的处理方法。对灰度图像对应的像素矩阵进行极差标准化处理并乘上255,可以方便对像素矩阵中的数据进行处理,同时保留像素矩阵中各像素的相互关系。灰度图像中,背景图像和每个手写字都有各自对应的像素矩阵。在获取灰度图像中的背景图像和每个手写字对应的像素矩阵后,对像素矩阵进行极差标准化处理,获取极差标准化处理后的像素矩阵对应的有效图像。对像素矩阵进行极差标准化处理,能够提高获取目标图像的处理速度。
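A sketch of the range standardization applied to the grayscale pixel matrix, compressing values into (0, 1) and optionally rescaling by 255 as mentioned above.

```python
import numpy as np

def range_standardize(gray, rescale_255=True):
    m_min, m_max = gray.min(), gray.max()
    std = (gray - m_min) / (m_max - m_min)    # x' = (x - M_min) / (M_max - M_min)
    return std * 255 if rescale_255 else std
```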
在一实施例中,如图4所示,步骤S30,采用核密度估计算法和腐蚀方法对有效图像进行处理,去除背景图像,获取包括手写字的目标图像,具体包括如下步骤:
S31:对有效图像中的像素出现的次数进行统计,获取有效图像对应的频率分布直方图。
其中,频率分布直方图的横轴表示样本数据的连续值,横轴上的每个小区间对应一个组的组距,作为小矩形的底边;纵轴表示频率与组距的比值,并用该比值作为小矩形的高,以多个小矩形构成的一组图称为频率直方图。具体地,获取有效图像后,在频率直方图的横轴表示像素为(0,255)之间的连续值,横轴上每个小矩形对应的组距为1,纵轴表示小矩形对应的像素出现的频率与组距的比值,该比值即为对应的小矩形的高。该频率分布直方图可以形象地将有效图像中的像素出现的次数展示出来,使得数据的分布情况一目了然地反映出来。
S32:采用高斯核密度估算方法对频率分布直方图进行处理,获取频率分布直方图对应的频率极大值和频率极小值,并根据频率极大值和频率极小值获取对应的像素。
高斯核密度估算方法指核函数为高斯核的核密度估算方法。其中,高斯核对应的函数为
K(x)=(1/√(2π))e^(-x²/2)
其中,K (x)指像素(自变量)为x的高斯核函数,x指像素,e和π为常数。频率极大值指在频率分布直方图中,频率值大小为极大值的频率值;频率极小值指在频率分布直方图中,频率值大小为极小值的频率值。具体地,采用高斯核密度函数估算方法对获取的有效图像对应的频率分布直方图进行高斯平滑处理,获取该频率分布直方图对应的高斯平滑曲线。基于该高斯平滑曲线上的频率极大值和频率极小值,获取频率极大值和频率极小值对应横轴上的像素。本实施例中,获取频率极大值和频率极小值对应的像素,便于后续对有效图像进行分层区分,获取分层图像。
S33:基于频率极大值和频率极小值对应的像素对有效图像进行分层处理,获取分层图像。
分层图像指基于频率极大值和频率极小值对有效图像进行分层处理得到的图像。获取频率极大值和频率极小值对应的像素,根据频率极大值对应的像素对有效图像进行分层处理,有效图像中有多少个频率极大值,对应的有效图像的像素就被聚类为多少类,该有效图像就会被分为几层。然后以频率极小值对应的像素作为类之间的边界值,根据类之间的边界则可以每一层分层图像对应的像素。
如有效图像中的频率极大值对应的像素分别为12、54、97、113、159、172,频率极小值对应的像素分别为26、69、104、139和163,根据有效图像中的频率极大值的个数可以确定该有效图像的像素可以被分为6类,该有效图像可以被分为6层,频率极小值对应的像素作为类之间的边界值,由于最小的像素为0,最大的像素为255,因此,根据类之间的边界值则可以确定以像素为12的分层图像,该分层图像对应的像素范围为[0,26);以像素为54的分层图像,该分层图像对应的像素范围为[26,69);以像素为97的分层图像,该分层图像对应的像素范围为[69,104);以像素为113的分层图像,该分层图像对应的像素范围为[104,139);以像素为159的分层图像,该分层图像对应的像素范围为[139,163);以像素为172的分层图像,该分层图像对应的像素范围为[163,255]。
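A sketch of the layering in step S33, which uses the frequency-minimum pixels as class boundaries exactly as in the worked example above (boundaries 26, 69, 104, 139, 163 splitting [0, 255] into six layers).

```python
import numpy as np

def split_layers(effective_img, minima):
    """Return one boolean mask per layer; layer k holds the pixels falling
    between consecutive minima, which act as class boundaries."""
    bounds = [0] + sorted(minima) + [256]
    layers = []
    for low, high in zip(bounds[:-1], bounds[1:]):
        layers.append((effective_img >= low) & (effective_img < high))
    return layers
```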
S34:对分层图像进行腐蚀和叠加处理,去除背景图像,获取包括手写字的目标图像。
获取分层图像后,对分层图像进行二值化处理。其中,二值化处理是指将图像上的像素设置为0(黑色)或1(白色),将整个图像呈现出明显的黑白效果的处理。对分层图像进行二值化处理后,对二值化处理后的分层图像进行腐蚀处理,去除背景图像部分,保留分层图像上的手写字部分。其中,腐 蚀处理是用于形态学中去除图像的某部分的内容的操作。由于每个分层图像上的像素是属于不同范围的像素,因此,对分层图像进行腐蚀处理后,还需要将每个分层图像叠加,生成仅含有手写字的目标图像。
本申请实施例所提供的手写字训练样本获取方法中,对原始图像进行放大处理,可以防止由于手写字本身的尺寸过小被误处理掉的情况发生,然后进行灰度化处理,获取灰度图像,可以减少后续步骤获取有效图像时需要处理的数据量。然后对灰度图像对应的像素矩阵进行极差标准化处理,压缩像素矩阵的范围,提高获取有效图像的处理速度。然后根据有效图像获取对应的频率分布直方图,并根据频率分布直方图获取频率极大值和频率极小值对应的像素,从而获取分层图像。最后对分层图像进行二值化、腐蚀和叠加处理,完成对原始图像中手写字和背景图像的识别,去除背景图像,获取包括手写字的目标图像。
在一实施例中,如图5所示,步骤S34中,对分层图像进行腐蚀和叠加处理,去除背景图像,获取包括手写字的目标图像,具体包括如下步骤:
S341:对分层图像进行二值化处理,获取分层二值化图像。
分层二值化图像指对分层图像进行二值化处理获取的图像。具体地,获取分层图像后,基于分层图像的采样像素和预先选取的阈值进行比较,将采样大于等于阈值的像素设置为1,小于阈值的像素设置为0的过程。本实施例中,0代表背景像素,1代表目标像素(手写字像素)。该阈值可以通过计算分层图像的类间方差获取,也可以根据经验值获取。阈值的大小会影响分层图像二值化处理的效果,若阈值选取合适,则对分层图像进行二值化处理的效果就比较好,相应地,若阈值选取不合适,则影响分层图像二值化处理的效果。为了方便操作,简化计算过程,本实施例中的阈值根据经验值确定。
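A sketch of the binarization in step S341, following the rule above that pixels at or above the threshold become 1 (target pixels) and the rest become 0; the threshold value itself is an assumed empirical constant.

```python
import numpy as np

def binarize(layer_img, threshold=128):       # empirically chosen threshold, assumed value
    return (layer_img >= threshold).astype(np.uint8)   # 1 = handwriting pixel, 0 = background
```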
S342:对分层二值化图像中的像素进行检测标记,获取分层二值化图像对应的连通区域。
其中,连通区域是指某一特定像素周围的邻接像素所围成的区域。如某特定像素为0,其周围的邻接像素为1,则将邻接像素所围成的区域作为连通区域。
获取每个分层图像对应的分层二值化图像后,对分层二值化图像对应的像素矩阵进行逐行扫描,将符合连通规则(4邻域连通或者8邻域连通)的像素向相同的标号标记出来。4邻域连通指一个特定像素与上、下、左、右四个方向相邻的像素相同的情况;8邻域连通指一个特定像素上、下、左、右、左上、左下、右上、右下八个方向相邻的像素相同的情况。
具体地,像素矩阵包括行和列。对二值化图像中的像素进行检测标记的具体过程为:(1)逐行扫描像素矩阵,把每一行中连续为1的像素(目标像素)组成一个序列,该序列称为团,标记好该团的起点、终点以及所在的行号。团的起点指团的第一个像素,团的终点指团的最后一个像素。(2)对像素矩阵中除了第一行外的剩余行里的团,比较某一特定剩余行中的团与前一行中的所有团是否有重合区域,若没有重合区域,则给该特定剩余行中的团一个新的标号;如果该特定剩余行中的团仅与上一行中一个团有重合区域,则将上一行的该团的标号赋给它;如果该特定剩余行与上一行中有两个以上的团有重合区域,则给对应的团赋一个相关联团的最小标号,并将上一行的这几个团中的标记写入等价对,说明它们属于一类。其中,相关联团指与特定剩余行的团有重合区域的上一行的团;等价对指相互连通的团上的标号。
例如,一像素矩阵中的特定剩余行为第三行,该第三行中有两个团(A,B),其中A团与第二行中的两个团(该两个团的标号为1,2)有重合区域,则将第二行中的两个团的最小标号1赋给该A团,A团的标号为1,并将A团、1团和2团对应的标号记为等价对,即将(1,2)记为等价对。标号为1和标号为2的团则称为一个连通区域。
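The run-and-equivalence procedure of step S342 is essentially two-pass connected-component labeling. A sketch using SciPy's ready-made labeling routine with 8-neighbourhood connectivity, standing in for the manual run/equivalence-pair bookkeeping described above.

```python
import numpy as np
from scipy import ndimage

def connected_regions(binary_img):
    structure = np.ones((3, 3), dtype=int)        # 8-neighbourhood connectivity
    labels, count = ndimage.label(binary_img, structure=structure)
    return labels, count                          # labels[i, j] = region id, 0 = background
```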
S343:对分层二值化图像对应的连通区域进行腐蚀和叠加处理,去除背景图像,获取包括手写字的目标图像。
采用MATLAB中的imerode函数或者Open CV中的cvErode函数对分层二值化图像的连通区域进行腐蚀处理。具体地,选取一个结构像素,本实施例是以像素矩阵中某个特征像素相邻的8个像素值作为该特征像素的连通区域的,因此,选取的结构像素为3×3的像素矩阵。使用结构像素对分层二值化图像的像素矩阵进行扫描,比较分层二值化图像中的像素矩阵与结构像素是否完全一致,若完全一致,则像素矩阵中对应的9个像素都变为1;若不完全一致,则像素矩阵中对应的9个像素都变为0,其中,0(黑色)则为分层二值化图像被腐蚀的部分。
基于预先设置的手写字区域抗腐蚀能力范围对分层二值化图像进行筛选,对于不在手写字区域抗腐蚀能力范围内的分层二值化图像部分删除,获取分层二值化图像中在手写字区域抗腐蚀能力范围内的部分。对筛选出的符合手写字区域抗腐蚀能力范围的每个分层二值化图像部分对应的像素矩阵进行叠加,就可以获取到仅含有手写字的目标图像。其中,手写字区域抗腐蚀能力可以采用公式:
s 1/s 2
计算,s 1表示分层二值化图像中被腐蚀后的总面积,s 2表示分层二值化图像中被腐蚀前的总面积。
如预先设置的手写字区域抗腐蚀能力范围为[0.05,0.8],根据公式
s 1/s 2
计算每个分层二值化图像被腐蚀后的总面积和分层二值化图像被腐蚀前的总面积的比值。通过计算,分层二值化图像中某区域腐蚀后的总面积和腐蚀前的总面积的比值不在预先设置的手写字区域抗腐蚀能力范围内,则表示该区域的分层二值化图像是背景图像,需要去除。分层二值化图像中的某区域腐蚀后的总面积和腐蚀前的总面积的比值在[0.05,0.8]范围内,则表示该区域的分层二值化图像是手写字,需要保留。对每个分层二值化图像对应的像素矩阵进行叠加,则可以获取含有手写字的目标图像。
对分层图像进行二值化处理,获取分层二值化图像,然后对分层二值化图像中的像素进行检测标记,获取分层二值化图像对应的像素矩阵中每个像素的连通区域,采用结构像素对每个像素的连通区域进行检测,对与结构像素不完全一致的像素矩阵中的像素都变为0,像素为0的分层二值化图像为黑色,该黑色部分则是分层二值化图像被腐蚀的部分,通过计算分层二值化图像被腐蚀后的总面积和分层二值化图像被腐蚀前的总面积的比值,判断该比值是否在预先设置的手写字区域抗腐蚀能力范围,去除背景图像,保留手写字,达到获取目标图像的目的。
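A sketch of step S343: erode each layered binarized image with a 3×3 structuring element, keep only the regions whose anti-erosion ratio s1/s2 falls inside the preset range, and overlay the survivors. OpenCV's `cv2.erode` is used here in place of MATLAB's imerode, and the [0.05, 0.8] range follows the example above; the text is read as keeping regions whose ratio lies inside that range.

```python
import cv2
import numpy as np

def filter_and_overlay(layer_masks, low=0.05, high=0.8):
    """layer_masks: list of binarized layer images (handwriting pixels = 1)."""
    kernel = np.ones((3, 3), np.uint8)                   # 3x3 structuring element
    target = np.zeros(layer_masks[0].shape, dtype=bool)
    for mask in layer_masks:
        mask = mask.astype(np.uint8)
        eroded = cv2.erode(mask, kernel).astype(bool)
        count, labels = cv2.connectedComponents(mask)    # connected regions of this layer
        for region_id in range(1, count):
            region = labels == region_id
            s2 = region.sum()                            # total area before erosion
            s1 = (eroded & region).sum()                 # area surviving erosion
            if s2 and low <= s1 / s2 <= high:            # anti-erosion ratio of handwriting
                target |= region                         # overlay the kept region
    return target.astype(np.uint8)
```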
在一实施例中,将单字体图像输入到目标手写字识别模型中进行识别,首先需要预先训练好目标手写自识别模型,如图6所示,该手写字训练样本获取方法还包括:训练目标手写字识别模型,其中,训练目标手写字识别模型,具体包括如下步骤:
S71:初始化卷积神经网络模型的权值和偏置。
卷积神经网络的输入层和隐藏层之间存在对应的权值和偏置,隐藏层和输出层之间存在有对应的权值和偏置,在模型训练时,首先需要对卷积神经网络模型中的权值和偏置进行初始化设置,即给卷积神经网络中的输入层与隐藏层之间的权值和偏置设置初始值,并给隐藏层和输出层之间的权值和偏置设置初始值。初始化卷积神经网络模型的权值和偏置是进行模型训练的一个必要步骤,对卷积神经网络模型的权值和偏置进行合理的初始化设置,有利于提高模型训练速度。
S72:获取字体图像训练样本,采用中文二级字库对字体图像训练样本进行标注,并按预设分配规则将字体图像训练样本分为训练集图像和测试集图像。
字体图像训练样本指预先获取的携带有手写字的图像形成的训练样本。该字体图像训练样本是预先收集的不同人写的手写字,通过拍照上传给服务器作为训练样本的图像。
获取字体图像训练样本后,采用中文二级字库对字体图像训练样本中的每个手写字进行标注。如获取400个不同人写的字体图像训练样本,每个字体图像训练样本都写有“忍饥挨饿”,用中文二级字库中的标准字体分别对400个字体图像训练样本中的“忍”“饥”“挨”“饿”进行标注,使得每个字体图像训练样本中的每个手写字都有对应的标签。标准字体是指中文二级字库中收集的不同字体,如宋体、楷体、微软雅黑或仿宋等字体。
对字体图像训练样本中的每个手写字标注完成后,按照预设分配规则将字体图像训练样本分为训练集图像和测试集图像。其中,预设分配规则指预先设置好的用于分配字体图像训练样本的规则,训练集图像指用于对卷积神经网络模型进行训练的单字体图像,测试集图像指用于对训练好的卷积神经网络模型进行测试的单字体图像。如预设分配规则为将字体图像训练样本中的80%作为训练集图像用于对卷积 神经网络模型进行训练,20%作为测试集图像用于对训练好的卷积神经网络模型进行测试。
对字体图像训练样本进行标注便于在模型输出训练结果时,和输出结果进行比较,构建损失函数。将字体图像训练样本分为训练集图像和测试集图像可以避免采用训练集图像对模型进行验证时出现过拟合情况,提高模型的准确性。
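A sketch of the 80/20 split of labeled font-image samples from step S72, assuming each sample has already been paired with its standard-font label; the split ratio follows the example above.

```python
import random

def split_samples(labeled_samples, train_ratio=0.8, seed=42):
    """labeled_samples: list of (image, label) pairs."""
    samples = list(labeled_samples)
    random.Random(seed).shuffle(samples)
    cut = int(len(samples) * train_ratio)
    return samples[:cut], samples[cut:]      # training set images, test set images
```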
S73:基于训练集图像,对卷积神经网络模型中的权值和偏置进行调整,获取初始手写字识别模型。
其中,初始手写字识别模型指通过训练集图像训练,用于识别手写字的卷积神经网络模型。具体地,卷积神经网络模型包括多层卷积层和池化层。服务器在获取训练集图像后,将该训练集图像输入卷积神经网络模型中进行训练,通过每一层卷积层的计算,获取每一层的卷积层的输出,然后在卷积层采用最大池化下样采样对卷积层的输出进行降维处理,具体公式为a l=pool(a l-1),其中,a l表示第l层输出层的输出,a l-1表示l-1层卷积层的输出(即上一层的输出),pool指下采样计算,该下采样计算可以选择最大池化的方法,最大池化实际上就是在n*n的样本中取最大值。最后将降维处理后的卷积层的输出输入到输出层通过公式T=σ'(a l),进行计算,获取对应的卷积神经网络模型的前向输出。其中,T表示输出层的输出,σ表示输出层的激活函数,一般为softmax函数。根据卷积神经网络模型的前向输出和手写字携带的标签构建损失函数,通过损失函数更新卷积神经网络模型中的权值和偏置,获取初始手写字识别模型。
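A minimal NumPy sketch of the forward pass described above: one convolution layer a l = σ(a l-1 * W l + b l), 2×2 max-pooling downsampling a l = pool(a l-1), and a softmax output layer T = σ'(a l). The single-layer architecture, the sigmoid activation, and the shapes of the weights are illustrative assumptions, not the model actually trained by the application.

```python
import numpy as np
from scipy.signal import correlate2d

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(image, W_conv, b_conv, W_out, b_out):
    z = correlate2d(image, W_conv, mode="valid") + b_conv   # z^l = a^{l-1} * W^l + b^l
    a = sigmoid(z)                                          # a^l = sigma(z^l)
    h, w = a.shape
    pooled = a[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))  # 2x2 max pooling
    logits = pooled.ravel() @ W_out + b_out                 # input to the output layer
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()                                  # T = softmax output
```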
S74:基于测试集图像,获取初始手写字识别模型对应的识别准确率,若识别准确率大于预设准确率,则获取目标手写字识别模型。
预设准确率值预先设置的用于判断初始手写字识别模型对手写字识别的准确性是否满足要求的值。获取初始手写字识别模型后,为了验证初始手写字识别模型的对手写字识别的准确性,需要通过测试集图像进行验证。具体验证过程为:将单字体的测试集图像输入到初始手写字识别模型中进行识别,获取初始手写字识别模型的识别准确率,若识别准确率大与预设准确率,则表示该初始手写字识别模型的准确性满足要求,该初始手写字识别模型可以确定为目标手写字识别模型。该目标手写字识别模型可以直接用于识别手写字。
步骤S71-S74,通过将字体图像训练样本按照预设分配规则分为训练集图像和测试集图像,将训练集图像输入到卷积神经网络模型进行训练,调整卷积神经网络模型中的权值和偏置,获取初始手写字识别模型,用于识别单字体图像。然后将测试集图像输入到初始手写字识别模型进行识别,确定初始手写字识别模型的识别准确率是否满足要求,若满足要求,则表示初始手写字识别模型已经训练完成,可以用于识别手写字,该初始手写字识别模型可以确定为目标手写字识别模型。使用目标手写字识别模型识别手写字可以有效提高识别准确率。训练集图像是指获取初始手写字识别模型使用的含有手写字的图像。测试集图像是指获取目标手写字识别模型使用的含有手写字的图像。单字体图像指使用目标手写字识别模型进行识别时的含有手写字的图像。
在一实施例中,如图7所示,步骤S73,基于训练集图像,对卷积神经网络模型中的权值和偏置进行调整,获取初始手写字识别模型,具体包括如下步骤:
S731:将训练集图像输入到卷积神经网络模型中,获取卷积神经网络模型的前向输出,卷积神经网络模型的前向输出的计算公式为a l=σ(z l)=σ(a l-1*W l+b l)和T=σ'(a l),其中,a l表示第l层卷积层的输出,z l表示未采用激活函数处理前的输出,a l-1表示第l-1层卷积层的输出,σ表示激活函数,W l表示第l层卷积层的权值,b l表示第l层卷积层的偏置,T表示输出层的输出,σ′表示输出层的激活函数。
将训练集图像输入到卷积神经网络模型中,根据公式a l=σ(z l)=σ(a l-1*W l+b l)对训练集图 像进行处理,获取卷积层的输出,然后通过公式a l=pool(a l-1)在池化层对卷积层的输出进行降维处理,获取池化层的输出,最后通过公式T=σ'(a l)获取输出层的卷积神经网络模型的前向输出。
S732:根据卷积神经网络模型的前向输出构建损失函数,并对损失函数求偏导,反向更新卷积神经网络模型中的权值和偏置,获取初始手写字识别模型,其中,对卷积神经网络模型中的权值求偏导的公式为
Figure PCTCN2018094345-appb-000007
对卷积神经网络模型中的偏置求偏导的公式为
Figure PCTCN2018094345-appb-000008
具体地,根据卷积神经网络模型的前向输出和单字体的测试集图像携带的标签构建损失函数,该损失函数具体表示为
Figure PCTCN2018094345-appb-000009
其中,J (θ)为损失函数,n表示训练样本的个数,x i表示第i个训练集图像输入卷积神经网络模型的值,h θ表示卷积神经网络模型的权值和偏置对第i个训练集图像处理的参数,h θx i表示第i个训练集图像经过卷积神经网络模型处理的卷积神经网络模型的前向输出,y i表示与x i相对应的第i个训练样本的标签,θ表示权值和偏置的集合(w,b)。
具体地,对损失函数求偏导,反向更新卷积神经网络模型中的权值和偏置具体包括如下步骤:基于损失函数,分别对卷积神经网络模型中的权值和偏置求偏导,更新卷积神经网络模型的权值和偏置。具体地,根据
Figure PCTCN2018094345-appb-000010
对卷积神经网络模型中的偏置求偏导,根据
Figure PCTCN2018094345-appb-000011
对卷积神经网络模型中的权值求偏导。
步骤S731-S732通过卷积神经网络模型的前向输出构建损失函数,然后通过损失函数求偏导,反向更新卷积神经网络模型中的权值和偏置,获取初始手写字识别模型,完成训练过程。
该方法通过对原始图像进行放大和灰度化处理,获取灰度图像,然后对灰度图像进行价差标准化处理,获取有效图像。方便后续步骤采用核密度估计算法和腐蚀方法对有效图像进行处理,去除背景图像部分,保留只含有手写字的目标图像。采用垂直投影方法对目标图像进行单字体切割,获取单字体图像,将获取的单字体图像输入到目标手写字识别模型中识别,基于单字体图像对应的识别概率值,获取识别结果。基于识别结果查询语义库,根据语义库中存储的中文句子获取单字体图像对应的目标汉字,将获取的目标汉字和单字体图像关联起来作为训练样本并存储在数据库中,方便后续模型训练时直接调用数据库中的手写字训练样本进行训练,提高模型训练的效率。
应理解,上述实施例中各步骤的序号的大小并不意味着执行顺序的先后,各过程的执行顺序应以其功能和内在逻辑确定,而不应对本申请实施例的实施过程构成任何限定。
在一实施例中,提供一种手写字训练样本获取装置,该手写字训练样本获取装置与上述实施例中手写字训练样本获取方法一一对应。如图8所示,该手写字训练样本获取装置包括原始图像获取模块10、有效图像获取模块20、目标图像获取模块30、单字体图像获取模块40、识别结果获取模块50、目标汉字确认模块60和手写字训练样本获取模块70。各功能模块详细说明如下:
原始图像获取模块10,用于获取原始图像,原始图像包括手写字和背景图像。
有效图像获取模块20,用于对原始图像进行预处理,获取有效图像。
目标图像获取模块30,用于采用核密度估计算法和腐蚀方法对有效图像进行处理,去除背景图像,获取包括手写字的目标图像。
单字体图像获取模块40,用于采用垂直投影方法对目标图像进行单字体切割,获取单字体图像。
识别结果获取模块50,用于将单字体图像输入到目标手写字识别模型中进行识别,当单字体图像的识别概率大于预设概率时,则获取单字体图像对应的识别结果。
目标汉字确认模块60,用于基于识别结果查询语义库,获取单字体图像对应的目标汉字。
手写字训练样本获取模块70,用于将单字体图像和对应的目标汉字关联,获取手写字训练样本。
具体地,有效图像获取模块20包括灰度图像获取单元21和极差标准化处理单元22。
灰度图像获取单元21,用于对原始图像进行放大和灰度化处理,获取灰度图像。
极差标准化处理单元22,用于对灰度图像对应的像素矩阵进行极差标准化处理,获取有效图像,其中,极差标准化处理的公式为
x'=(x-M min)/(M max-M min),其中
x是标准化前有效图像的像素,x'是标准化后有效图像的像素,M min是灰度图像对应的像素矩阵M中最小的像素,M max是灰度图像对应的像素矩阵M中最大的像素。
具体地,目标图像获取模块30包括第一处理单元31、第二处理单元32、分层图像获取单元33和腐蚀和叠加处理单元34。
第一处理单元31,用于对有效图像中的像素出现的次数进行统计,获取有效图像对应的频率分布直方图。
第二处理单元32,用于采用高斯核密度估算方法对频率分布直方图进行处理,获取频率分布直方图对应的频率极大值和频率极小值,并根据频率极大值和频率极小值获取对应的像素。
分层图像获取单元33,用于基于频率极大值和频率极小值对应的像素对有效图像进行分层处理,获取分层图像。
腐蚀和叠加处理单元34,用于对分层图像进行腐蚀和叠加处理,去除背景图像,获取包括手写字的目标图像。
具体地,腐蚀和叠加处理单元34包括二值化处理单元341、连通区域获取单元342和连通区域处理单元343。
二值化处理单元341,用于对分层图像进行二值化处理,获取分层二值化图像。
连通区域获取单元342,用于对分层二值化图像中的像素进行检测标记,获取分层二值化图像对应的连通区域。
连通区域处理单元343,用于对分层二值化图像对应的连通区域进行腐蚀和叠加处理,去除背景图像,获取包括手写字的目标图像。
具体地,手写字训练样本获取装置还包括模型初始化单元71、训练样本获取和处理单元72、初始手写字识别模型73和目标手写字识别模型单元74。
模型初始化单元71,用于初始化卷积神经网络模型的权值和偏置。
训练样本获取和处理单元72,用于获取字体图像训练样本,采用中文二级字库对字体图像训练样本进行标注,并按预设分配规则将字体图像训练样本分为训练集图像和测试集图像。
初始手写字识别模型73,用于基于训练集图像,对卷积神经网络模型中的权值和偏置进行调整,获取初始手写字识别模型。
目标手写字识别模型单元74,用于基于测试集图像,获取初始手写字识别模型对应的识别准确率,若识别准确率大于预设准确率,则获取目标手写字识别模型。
具体地,初始手写字识别模型73包括前向输出获取单元731和权值和偏置更新单元732。
前向输出获取单元731,用于将训练集图像输入到卷积神经网络模型中,获取卷积神经网络模型的前向输出,卷积神经网络模型的前向输出的计算公式为a l=σ(z l)=σ(a l-1*W l+b l)和T=σ'(a l),其中,a l表示第l层卷积层的输出,z l表示未采用激活函数处理前的输出,a l-1表示第 l-1层卷积层的输出,σ表示激活函数,W l表示第l层卷积层的权值,b l表示第l层卷积层的偏置,T表示输出层的输出,σ′表示输出层的激活函数。
权值和偏置更新单元732,用于根据卷积神经网络模型的前向输出构建损失函数,并对损失函数求偏导,反向更新卷积神经网络模型中的权值和偏置,获取初始手写字识别模型,其中,对卷积神经网络模型中的权值求偏导的公式为
Figure PCTCN2018094345-appb-000013
对卷积神经网络模型中的偏置求偏导的公式为
Figure PCTCN2018094345-appb-000014
在一实施例中,提供了一种计算机设备,该计算机设备可以是服务器,其内部结构图可以如图9所示。该计算机设备包括通过系统总线连接的处理器、存储器、网络接口和数据库。其中,该计算机设备的处理器用于提供计算和控制能力。该计算机设备的存储器包括非易失性存储介质、内存储器。该非易失性存储介质存储有操作系统、计算机可读指令和数据库。该内存储器为非易失性存储介质中的操作系统和计算机可读指令的运行提供环境。该计算机设备的数据库用于存储获取的手写字训练样本。该计算机设备的网络接口用于与外部的终端通过网络连接通信。该计算机可读指令被处理器执行时以实现一种手写字训练样本获取方法。
在一个实施例中,提供了一种计算机设备,包括存储器、处理器及存储在存储器上并可在处理器上运行的计算机可读指令,处理器执行计算机可读指令时实现以下步骤:获取原始图像,原始图像包括手写字和背景图像;对原始图像进行预处理,获取有效图像;采用核密度估计算法和腐蚀方法对有效图像进行处理,去除背景图像,获取包括手写字的目标图像;采用垂直投影方法对目标图像进行单字体切割,获取单字体图像;将单字体图像输入到目标手写字识别模型中进行识别,当单字体图像的识别概率大于预设概率时,则获取单字体图像对应的识别结果;基于识别结果查询语义库,获取单字体图像对应的目标汉字;将单字体图像和对应的目标汉字关联,获取手写字训练样本。
在一实施例中,处理器执行计算机可读指令时还实现以下步骤:对原始图像进行放大和灰度化处理,获取灰度图像;对灰度图像对应的像素矩阵进行极差标准化处理,获取有效图像,其中,极差标准化处理的公式为
x'=(x-M min)/(M max-M min),其中
x是标准化前有效图像的像素,x'是标准化后有效图像的像素,M min是灰度图像对应的像素矩阵M中最小的像素,M max是灰度图像对应的像素矩阵M中最大的像素。
在一实施例中,处理器执行计算机可读指令时还实现以下步骤:对有效图像中的像素出现的次数进行统计,获取有效图像对应的频率分布直方图;采用高斯核密度估算方法对频率分布直方图进行处理,获取频率分布直方图对应的频率极大值和频率极小值,并根据频率极大值和频率极小值获取对应的像素;基于频率极大值和频率极小值对应的像素对有效图像进行分层处理,获取分层图像;对分层图像进行腐蚀和叠加处理,去除背景图像,获取包括手写字的目标图像。
在一实施例中,处理器执行计算机可读指令时还实现以下步骤:对分层图像进行二值化处理,获取分层二值化图像;对分层二值化图像中的像素进行检测标记,获取分层二值化图像对应的连通区域;对分层二值化图像对应的连通区域进行腐蚀和叠加处理,去除背景图像,获取包括手写字的目标图像。
在一实施例中,处理器执行计算机可读指令时还实现以下步骤:初始化卷积神经网络模型的权值和偏置;获取字体图像训练样本,采用中文二级字库对字体图像训练样本进行标注,并按预设分配规则将字体图像训练样本分为训练集图像和测试集图像;基于训练集图像,对卷积神经网络模型中的权值和偏置进行调整,获取初始手写字识别模型;基于测试集图像,获取初始手写字识别模型对应的识别准确 率,若识别准确率大于预设准确率,则获取目标手写字识别模型。
在一实施例中,处理器执行计算机可读指令时还实现以下步骤:将训练集图像输入到卷积神经网络模型中,获取卷积神经网络模型的前向输出,卷积神经网络模型的前向输出的计算公式为a l=σ(z l)=σ(a l-1*W l+b l)和T=σ'(a l),其中,a l表示第l层卷积层的输出,z l表示未采用激活函数处理前的输出,a l-1表示第l-1层卷积层的输出,σ表示激活函数,W l表示第l层卷积层的权值,b l表示第l层卷积层的偏置,T表示输出层的输出,σ′表示输出层的激活函数;根据卷积神经网络模型的前向输出构建损失函数,并对损失函数求偏导,反向更新卷积神经网络模型中的权值和偏置,获取初始手写字识别模型,其中,对卷积神经网络模型中的权值求偏导的公式为
Figure PCTCN2018094345-appb-000016
对卷积神经网络模型中的偏置求偏导的公式为
Figure PCTCN2018094345-appb-000017
在一个实施例中,提供了一个或多个存储有计算机可读指令的非易失性可读存储介质,计算机可读指令被一个或多个处理器执行时,使得一个或多个处理器实现如下步骤::获取原始图像,原始图像包括手写字和背景图像;对原始图像进行预处理,获取有效图像;采用核密度估计算法和腐蚀方法对有效图像进行处理,去除背景图像,获取包括手写字的目标图像;采用垂直投影方法对目标图像进行单字体切割,获取单字体图像;将单字体图像输入到目标手写字识别模型中进行识别,当单字体图像的识别概率大于预设概率时,则获取单字体图像对应的识别结果;基于识别结果查询语义库,获取单字体图像对应的目标汉字;将单字体图像和对应的目标汉字关联,获取手写字训练样本。
在一实施例中,计算机可读指令被一个或多个处理器执行时,使得一个或多个处理器还实现以下步骤:对原始图像进行放大和灰度化处理,获取灰度图像;对灰度图像对应的像素矩阵进行极差标准化处理,获取有效图像,其中,极差标准化处理的公式为
x'=(x-M min)/(M max-M min),其中
x是标准化前有效图像的像素,x'是标准化后有效图像的像素,M min是灰度图像对应的像素矩阵M中最小的像素,M max是灰度图像对应的像素矩阵M中最大的像素。
在一实施例中,计算机可读指令被一个或多个处理器执行时,使得一个或多个处理器还实现以下步骤:对有效图像中的像素出现的次数进行统计,获取有效图像对应的频率分布直方图;采用高斯核密度估算方法对频率分布直方图进行处理,获取频率分布直方图对应的频率极大值和频率极小值,并根据频率极大值和频率极小值获取对应的像素;基于频率极大值和频率极小值对应的像素对有效图像进行分层处理,获取分层图像;对分层图像进行腐蚀和叠加处理,去除背景图像,获取包括手写字的目标图像。
在一实施例中,计算机可读指令被一个或多个处理器执行时,使得一个或多个处理器还实现以下步骤:对分层图像进行二值化处理,获取分层二值化图像;对分层二值化图像中的像素进行检测标记,获取分层二值化图像对应的连通区域;对分层二值化图像对应的连通区域进行腐蚀和叠加处理,去除背景图像,获取包括手写字的目标图像。
在一实施例中,计算机可读指令被一个或多个处理器执行时,使得一个或多个处理器还实现以下步骤:初始化卷积神经网络模型的权值和偏置;获取字体图像训练样本,采用中文二级字库对字体图像训练样本进行标注,并按预设分配规则将字体图像训练样本分为训练集图像和测试集图像;基于训练集图像,对卷积神经网络模型中的权值和偏置进行调整,获取初始手写字识别模型;基于测试集图像,获取初始手写字识别模型对应的识别准确率,若识别准确率大于预设准确率,则获取目标手写字识别模型。
在一实施例中,计算机可读指令被一个或多个处理器执行时,使得一个或多个处理器还实现以下步骤:将训练集图像输入到卷积神经网络模型中,获取卷积神经网络模型的前向输出,卷积神经网络模型的前向输出的计算公式为a l=σ(z l)=σ(a l-1*W l+b l)和T=σ'(a l),其中,a l表示第l层卷积层的输出,z l表示未采用激活函数处理前的输出,a l-1表示第l-1层卷积层的输出,σ表示激活函数,W l表示第l层卷积层的权值,b l表示第l层卷积层的偏置,T表示输出层的输出,σ′表示输出层的激活函数;根据卷积神经网络模型的前向输出构建损失函数,并对损失函数求偏导,反向更新卷积神经网络模型中的权值和偏置,获取初始手写字识别模型,其中,对卷积神经网络模型中的权值求偏导的公式为
Figure PCTCN2018094345-appb-000019
对卷积神经网络模型中的偏置求偏导的公式为
Figure PCTCN2018094345-appb-000020
本领域普通技术人员可以理解实现上述实施例方法中的全部或部分流程,是可以通过计算机可读指令来指令相关的硬件来完成,所述的计算机可读指令可存储于一非易失性存储介质中,该计算机可读指令在执行时,可包括如上述各方法的实施例的流程。其中,本申请所提供的各实施例中所使用的对存储器、存储、数据库或其它介质的任何引用,均可包括非易失性和/或易失性存储器。非易失性存储器可包括只读存储器(ROM)、可编程ROM(PROM)、电可编程ROM(EPROM)、电可擦除可编程ROM(EEPROM)或闪存。易失性存储器可包括随机存取存储器(RAM)或者外部高速缓冲存储器。作为说明而非局限,RAM以多种形式可得,诸如静态RAM(SRAM)、动态RAM(DRAM)、同步DRAM(SDRAM)、双数据率SDRAM(DDRSDRAM)、增强型SDRAM(ESDRAM)、同步链路(Synchlink)DRAM(SLDRAM)、存储器总线(Rambus)直接RAM(RDRAM)、直接存储器总线动态RAM(DRDRAM)、以及存储器总线动态RAM(RDRAM)等。
所属领域的技术人员可以清楚地了解到,为了描述的方便和简洁,仅以上述各功能单元、模块的划分进行举例说明,实际应用中,可以根据需要而将上述功能分配由不同的功能单元、模块完成,即将所述装置的内部结构划分成不同的功能单元或模块,以完成以上描述的全部或者部分功能。
以上所述实施例仅用以说明本申请的技术方案,而非对其限制;尽管参照前述实施例对本申请进行了详细的说明,本领域的普通技术人员应当理解:其依然可以对前述各实施例所记载的技术方案进行修改,或者对其中部分技术特征进行等同替换;而这些修改或者替换,并不使相应技术方案的本质脱离本申请各实施例技术方案的精神和范围,均应包含在本申请的保护范围之内。

Claims (20)

  1. 一种手写字训练样本获取方法,其特征在于,包括:
    获取原始图像,所述原始图像包括手写字和背景图像;
    对所述原始图像进行预处理,获取有效图像;
    采用核密度估计算法和腐蚀方法对所述有效图像进行处理,去除背景图像,获取包括所述手写字的目标图像;
    采用垂直投影方法对所述目标图像进行单字体切割,获取单字体图像;
    将所述单字体图像输入到目标手写字识别模型中进行识别,当所述单字体图像的识别概率大于预设概率时,则获取所述单字体图像对应的识别结果;
    基于所述识别结果查询语义库,获取所述单字体图像对应的目标汉字;
    将所述单字体图像和对应的目标汉字关联,获取手写字训练样本。
  2. 如权利要求1所述的手写字训练样本获取方法,其特征在于,所述对所述原始图像进行预处理,获取有效图像,包括:
    对所述原始图像进行放大和灰度化处理,获取灰度图像;
    对所述灰度图像对应的像素矩阵进行极差标准化处理,获取有效图像,其中,所述极差标准化处理的公式为
x'=(x-M min)/(M max-M min),其中
    x是标准化前有效图像的像素,x'是标准化后有效图像的像素,M min是所述灰度图像对应的像素矩阵M中最小的像素,M max是所述灰度图像对应的像素矩阵M中最大的像素。
  3. 如权利要求1所述的手写字训练样本获取方法,其特征在于,所述采用核密度估计算法和腐蚀方法对所述有效图像进行处理,去除背景图像,获取包括所述手写字的目标图像,包括:
    对所述有效图像中的像素出现的次数进行统计,获取所述有效图像对应的频率分布直方图;
    采用高斯核密度估算方法对所述频率分布直方图进行处理,获取所述频率分布直方图对应的频率极大值和频率极小值,并根据所述频率极大值和频率极小值获取对应的像素;
    基于所述频率极大值和频率极小值对应的像素对有效图像进行分层处理,获取分层图像;
    对所述分层图像进行腐蚀和叠加处理,去除背景图像,获取包括所述手写字的目标图像。
  4. 如权利要求3所述的手写字训练样本获取方法,其特征在于,所述对所述分层图像进行腐蚀和叠加处理,去除背景图像,获取包括所述手写字的目标图像,包括:
    对所述分层图像进行二值化处理,获取分层二值化图像;
    对所述分层二值化图像中的像素进行检测标记,获取所述分层二值化图像对应的连通区域;
    对所述分层二值化图像对应的连通区域进行腐蚀和叠加处理,去除背景图像,获取包括所述手写字的目标图像。
  5. 如权利要求1所述的手写字训练样本获取方法,其特征在于,所述手写字训练样本获取方法还包括:
    初始化卷积神经网络模型的权值和偏置;
    获取字体图像训练样本,采用中文二级字库对所述字体图像训练样本进行标注,并按预设分配规则将所述字体图像训练样本分为训练集图像和测试集图像;
    基于所述训练集图像,对所述卷积神经网络模型中的权值和偏置进行调整,获取初始手写字识别模型;
    基于所述测试集图像,获取所述初始手写字识别模型对应的识别准确率,若所述识别准确率大于预设准确率,则获取目标手写字识别模型。
  6. 如权利要求5所述的手写字训练样本获取方法,其特征在于,所述基于所述训练集图像,对所述 卷积神经网络模型中的权值和偏置进行调整,获取初始手写字识别模型,包括:
    将所述训练集图像输入到所述卷积神经网络模型中,获取所述卷积神经网络模型的前向输出,所述卷积神经网络模型的前向输出的计算公式为a l=σ(z l)=σ(a l-1*W l+b l)和T=σ'(a l),其中,a l表示第l层卷积层的输出,z l表示未采用激活函数处理前的输出,a l-1表示第l-1层卷积层的输出,σ表示激活函数,W l表示第l层卷积层的权值,b l表示第l层卷积层的偏置,T表示输出层的输出,σ′表示输出层的激活函数;
    根据所述卷积神经网络模型的前向输出构建损失函数,并对损失函数求偏导,反向更新所述卷积神经网络模型中的权值和偏置,获取初始手写字识别模型,其中,对所述卷积神经网络模型中的权值求偏导的公式为
    Figure PCTCN2018094345-appb-100002
    对所述卷积神经网络模型中的偏置求偏导的公式为
    Figure PCTCN2018094345-appb-100003
  7. 一种手写字训练样本获取装置,其特征在于,包括:
    原始图像获取模块,用于获取原始图像,所述原始图像包括手写字和背景图像;
    有效图像获取模块,用于对所述原始图像进行预处理,获取有效图像;
    目标图像获取模块,用于采用核密度估计算法和腐蚀方法对所述有效图像进行处理,去除背景图像,获取包括所述手写字的目标图像;
    单字体图像获取模块,用于采用垂直投影方法对所述目标图像进行单字体切割,获取单字体图像;
    识别结果获取模块,用于将所述单字体图像输入到目标手写字识别模型中进行识别,当所述单字体图像的识别概率大于预设概率时,则获取所述单字体图像对应的识别结果;
    目标汉字确认模块,用于基于所述识别结果查询语义库,获取所述单字体图像对应的目标汉字;
    手写字训练样本获取模块,用于将所述单字体图像和对应的目标汉字关联,获取手写字训练样本。
  8. 如权利要求7所述的手写字训练样本获取装置,其特征在于,所述手写字训练样本获取装置还包括:
    模型初始化单元,用于初始化卷积神经网络模型的权值和偏置;
    训练样本获取和处理单元,用于获取字体图像训练样本,采用中文二级字库对所述字体图像训练样本进行标注,并按预设分配规则将所述字体图像训练样本分为训练集图像和测试集图像;
    初始手写字识别模型,用于基于所述训练集图像,对所述卷积神经网络模型中的权值和偏置进行调整,获取初始手写字识别模型;
    目标手写字识别模型单元,用于基于所述测试集图像,获取所述初始手写字识别模型对应的识别准确率,若所述识别准确率大于预设概率,则获取目标手写字识别模型。
  9. 一种计算机设备,包括存储器、处理器以及存储在所述存储器中并可在所述处理器上运行的计算机可读指令,其特征在于,所述处理器执行所述计算机可读指令时实现如下步骤:
    获取原始图像,所述原始图像包括手写字和背景图像;
    对所述原始图像进行预处理,获取有效图像;
    采用核密度估计算法和腐蚀方法对所述有效图像进行处理,去除背景图像,获取包括所述手写字的目标图像;
    采用垂直投影方法对所述目标图像进行单字体切割,获取单字体图像;
    将所述单字体图像输入到目标手写字识别模型中进行识别,当所述单字体图像的识别概率大于预设概率时,则获取所述单字体图像对应的识别结果;
    基于所述识别结果查询语义库,获取所述单字体图像对应的目标汉字;
    将所述单字体图像和对应的目标汉字关联,获取手写字训练样本。
  10. 如权利要求9所述的计算机设备,其特征在于,所述对所述原始图像进行预处理,获取有效图像,包括:
    对所述原始图像进行放大和灰度化处理,获取灰度图像;
    对所述灰度图像对应的像素矩阵进行极差标准化处理,获取有效图像,其中,所述极差标准化处理的公式为
x′=(x-M min)/(M max-M min),其中
    x是标准化前有效图像的像素,x′是标准化后有效图像的像素,M min是所述灰度图像对应的像素矩阵M中最小的像素,M max是所述灰度图像对应的像素矩阵M中最大的像素。
  11. 如权利要求9所述的计算机设备,其特征在于,所述采用核密度估计算法和腐蚀方法对所述有效图像进行处理,去除背景图像,获取包括所述手写字的目标图像,包括:
    对所述有效图像中的像素出现的次数进行统计,获取所述有效图像对应的频率分布直方图;
    采用高斯核密度估算方法对所述频率分布直方图进行处理,获取所述频率分布直方图对应的频率极大值和频率极小值,并根据所述频率极大值和频率极小值获取对应的像素;
    基于所述频率极大值和频率极小值对应的像素对有效图像进行分层处理,获取分层图像;
    对所述分层图像进行腐蚀和叠加处理,去除背景图像,获取包括所述手写字的目标图像。
  12. 如权利要求11所述的计算机设备,其特征在于,所述对所述分层图像进行腐蚀和叠加处理,去除背景图像,获取包括所述手写字的目标图像,包括:
    对所述分层图像进行二值化处理,获取分层二值化图像;
    对所述分层二值化图像中的像素进行检测标记,获取所述分层二值化图像对应的连通区域;
    对所述分层二值化图像对应的连通区域进行腐蚀和叠加处理,去除背景图像,获取包括所述手写字的目标图像。
  13. 如权利要求9所述的计算机设备,其特征在于,所述处理器执行所述计算机可读指令时还实现如下步骤:
    初始化卷积神经网络模型的权值和偏置;
    获取字体图像训练样本,采用中文二级字库对所述字体图像训练样本进行标注,并按预设分配规则将所述字体图像训练样本分为训练集图像和测试集图像;
    基于所述训练集图像,对所述卷积神经网络模型中的权值和偏置进行调整,获取初始手写字识别模型;
    基于所述测试集图像,获取所述初始手写字识别模型对应的识别准确率,若所述识别准确率大于预设准确率,则获取目标手写字识别模型。
  14. 如权利要求13所述的计算机设备,其特征在于,所述基于所述训练集图像,对所述卷积神经网络模型中的权值和偏置进行调整,获取初始手写字识别模型,包括:
    将所述训练集图像输入到所述卷积神经网络模型中,获取所述卷积神经网络模型的前向输出,所述卷积神经网络模型的前向输出的计算公式为a l=σ(z l)=σ(a l-1*W l+b l)和T=σ'(a l),其中,a l表示第l层卷积层的输出,z l表示未采用激活函数处理前的输出,a l-1表示第l-1层卷积层的输出,σ表示激活函数,W l表示第l层卷积层的权值,b l表示第l层卷积层的偏置,T表示输出层的输出,σ′表示输出层的激活函数;
    根据所述卷积神经网络模型的前向输出构建损失函数,并对损失函数求偏导,反向更新所述卷积神经网络模型中的权值和偏置,获取初始手写字识别模型,其中,对所述卷积神经网络模型中的权值求 偏导的公式为
    Figure PCTCN2018094345-appb-100005
    对所述卷积神经网络模型中的偏置求偏导的公式为
    Figure PCTCN2018094345-appb-100006
  15. 一个或多个存储有计算机可读指令的非易失性可读存储介质,其特征在于,所述计算机可读指令被一个或多个处理器执行时,使得所述一个或多个处理器实现如下步骤:
    获取原始图像,所述原始图像包括手写字和背景图像;
    对所述原始图像进行预处理,获取有效图像;
    采用核密度估计算法和腐蚀方法对所述有效图像进行处理,去除背景图像,获取包括所述手写字的目标图像;
    采用垂直投影方法对所述目标图像进行单字体切割,获取单字体图像;
    将所述单字体图像输入到目标手写字识别模型中进行识别,当所述单字体图像的识别概率大于预设概率时,则获取所述单字体图像对应的识别结果;
    基于所述识别结果查询语义库,获取所述单字体图像对应的目标汉字;
    将所述单字体图像和对应的目标汉字关联,获取手写字训练样本。
  16. 如权利要求15所述的非易失性可读存储介质,其特征在于,所述对所述原始图像进行预处理,获取有效图像,包括:
    对所述原始图像进行放大和灰度化处理,获取灰度图像;
    对所述灰度图像对应的像素矩阵进行极差标准化处理,获取有效图像,其中,所述极差标准化处理的公式为
x'=(x-M min)/(M max-M min),其中
    x是标准化前有效图像的像素,x'是标准化后有效图像的像素,M min是所述灰度图像对应的像素矩阵M中最小的像素,M max是所述灰度图像对应的像素矩阵M中最大的像素。
  17. 如权利要求15所述的非易失性可读存储介质,其特征在于,所述采用核密度估计算法和腐蚀方法对所述有效图像进行处理,去除背景图像,获取包括所述手写字的目标图像,包括:
    对所述有效图像中的像素出现的次数进行统计,获取所述有效图像对应的频率分布直方图;
    采用高斯核密度估算方法对所述频率分布直方图进行处理,获取所述频率分布直方图对应的频率极大值和频率极小值,并根据所述频率极大值和频率极小值获取对应的像素;
    基于所述频率极大值和频率极小值对应的像素对有效图像进行分层处理,获取分层图像;
    对所述分层图像进行腐蚀和叠加处理,去除背景图像,获取包括所述手写字的目标图像。
  18. 如权利要求17所述的非易失性可读存储介质,其特征在于,所述对所述分层图像进行腐蚀和叠加处理,去除背景图像,获取包括所述手写字的目标图像,包括:
    对所述分层图像进行二值化处理,获取分层二值化图像;
    对所述分层二值化图像中的像素进行检测标记,获取所述分层二值化图像对应的连通区域;
    对所述分层二值化图像对应的连通区域进行腐蚀和叠加处理,去除背景图像,获取包括所述手写字的目标图像。
  19. 如权利要求15所述的非易失性可读存储介质,其特征在于,所述计算机可读指令被一个或多个处理器执行时,使得所述一个或多个处理器还实现如下步骤:
    初始化卷积神经网络模型的权值和偏置;
    获取字体图像训练样本,采用中文二级字库对所述字体图像训练样本进行标注,并按预设分配规则将所述字体图像训练样本分为训练集图像和测试集图像;
    基于所述训练集图像,对所述卷积神经网络模型中的权值和偏置进行调整,获取初始手写字识别模型;
    基于所述测试集图像,获取所述初始手写字识别模型对应的识别准确率,若所述识别准确率大于预 设准确率,则获取目标手写字识别模型。
  20. 如权利要求19所述的非易失性可读存储介质,其特征在于,所述基于所述训练集图像,对所述卷积神经网络模型中的权值和偏置进行调整,获取初始手写字识别模型,包括:
    将所述训练集图像输入到所述卷积神经网络模型中,获取所述卷积神经网络模型的前向输出,所述卷积神经网络模型的前向输出的计算公式为a l=σ(z l)=σ(a l-1*W l+b l)和T=σ'(a l),其中,a l表示第l层卷积层的输出,z l表示未采用激活函数处理前的输出,a l-1表示第l-1层卷积层的输出,σ表示激活函数,W l表示第l层卷积层的权值,b l表示第l层卷积层的偏置,T表示输出层的输出,σ′表示输出层的激活函数;
    根据所述卷积神经网络模型的前向输出构建损失函数,并对损失函数求偏导,反向更新所述卷积神经网络模型中的权值和偏置,获取初始手写字识别模型,其中,对所述卷积神经网络模型中的权值求偏导的公式为
    Figure PCTCN2018094345-appb-100008
    对所述卷积神经网络模型中的偏置求偏导的公式为
    Figure PCTCN2018094345-appb-100009
PCT/CN2018/094345 2018-06-04 2018-07-03 手写字训练样本获取方法、装置、计算机设备及存储介质 WO2019232870A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810564731.7 2018-06-04
CN201810564731.7A CN109063720A (zh) 2018-06-04 2018-06-04 手写字训练样本获取方法、装置、计算机设备及存储介质

Publications (1)

Publication Number Publication Date
WO2019232870A1 true WO2019232870A1 (zh) 2019-12-12

Family

ID=64820319

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/094345 WO2019232870A1 (zh) 2018-06-04 2018-07-03 手写字训练样本获取方法、装置、计算机设备及存储介质

Country Status (2)

Country Link
CN (1) CN109063720A (zh)
WO (1) WO2019232870A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111275051A (zh) * 2020-02-28 2020-06-12 上海眼控科技股份有限公司 字符识别方法、装置、计算机设备和计算机可读存储介质
CN112990175A (zh) * 2021-04-01 2021-06-18 深圳思谋信息科技有限公司 手写中文字符的识别方法、装置、计算机设备和存储介质
CN113792851A (zh) * 2021-09-09 2021-12-14 北京百度网讯科技有限公司 字体生成模型训练方法、字库建立方法、装置及设备

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109784342B (zh) * 2019-01-24 2021-03-12 厦门商集网络科技有限责任公司 一种基于深度学习模型的ocr识别方法及终端
CN110136103B (zh) * 2019-04-24 2024-05-28 平安科技(深圳)有限公司 医学影像解释方法、装置、计算机设备及存储介质

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070005537A1 (en) * 2005-06-02 2007-01-04 Microsoft Corporation Handwriting recognition using a comparative neural network
CN102254196A (zh) * 2011-06-22 2011-11-23 江苏奥博洋信息技术有限公司 计算机鉴别手写汉字的方法
CN107844740A (zh) * 2017-09-05 2018-03-27 中国地质调查局西安地质调查中心 一种脱机手写、印刷汉字识别方法及系统

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070005537A1 (en) * 2005-06-02 2007-01-04 Microsoft Corporation Handwriting recognition using a comparative neural network
CN102254196A (zh) * 2011-06-22 2011-11-23 江苏奥博洋信息技术有限公司 计算机鉴别手写汉字的方法
CN107844740A (zh) * 2017-09-05 2018-03-27 中国地质调查局西安地质调查中心 一种脱机手写、印刷汉字识别方法及系统

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WANG, YAN: "Study on Algrithoms of Offline Handwritten Chinese Identification and Recognition", CHINESE DOCTORAL DISSERTATIONS FULL-TEXT DATABASE, INFORMATION SCIENCE AND TECHNOLOGY, no. 03, 15 March 2015 (2015-03-15), ISSN: 1674-022X *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111275051A (zh) * 2020-02-28 2020-06-12 上海眼控科技股份有限公司 字符识别方法、装置、计算机设备和计算机可读存储介质
CN112990175A (zh) * 2021-04-01 2021-06-18 深圳思谋信息科技有限公司 手写中文字符的识别方法、装置、计算机设备和存储介质
CN113792851A (zh) * 2021-09-09 2021-12-14 北京百度网讯科技有限公司 字体生成模型训练方法、字库建立方法、装置及设备
CN113792851B (zh) * 2021-09-09 2023-07-25 北京百度网讯科技有限公司 字体生成模型训练方法、字库建立方法、装置及设备

Also Published As

Publication number Publication date
CN109063720A (zh) 2018-12-21

Similar Documents

Publication Publication Date Title
WO2019232874A1 (zh) 汉字模型训练方法、汉字识别方法、装置、设备及介质
WO2019232873A1 (zh) 文字模型训练方法、文字识别方法、装置、设备及介质
WO2019232872A1 (zh) 手写字模型训练方法、汉字识别方法、装置、设备及介质
WO2019232853A1 (zh) 中文模型训练、中文图像识别方法、装置、设备及介质
WO2019232849A1 (zh) 汉字模型训练方法、手写字识别方法、装置、设备及介质
WO2019232870A1 (zh) 手写字训练样本获取方法、装置、计算机设备及存储介质
WO2019232843A1 (zh) 手写模型训练、手写图像识别方法、装置、设备及介质
CN110569830B (zh) 多语言文本识别方法、装置、计算机设备及存储介质
CN109726643B (zh) 图像中表格信息的识别方法、装置、电子设备及存储介质
WO2019232852A1 (zh) 手写字训练样本获取方法、装置、设备及介质
WO2017020723A1 (zh) 一种字符分割方法、装置及电子设备
WO2019232850A1 (zh) 手写汉字图像识别方法、装置、计算机设备及存储介质
CN110647829A (zh) 一种票据的文本识别方法及系统
CN110619274A (zh) 基于印章和签名的身份验证方法、装置和计算机设备
CN109740606B (zh) 一种图像识别方法及装置
CN109840524B (zh) 文字的类型识别方法、装置、设备及存储介质
CN114612469B (zh) 产品缺陷检测方法、装置、设备及可读存储介质
CN113158808A (zh) 中文古籍字符识别、组段与版面重建方法、介质和设备
CN113221956B (zh) 基于改进的多尺度深度模型的目标识别方法及装置
CN113569968B (zh) 模型训练方法、目标检测方法、装置、设备及存储介质
CN115239644B (zh) 混凝土缺陷识别方法、装置、计算机设备和存储介质
CN116596875B (zh) 晶圆缺陷检测方法、装置、电子设备及存储介质
US20220254148A1 (en) Defect detecting method based on dimensionality reduction of data, electronic device, and storage medium
Julca-Aguilar et al. Text/non-text classification of connected components in document images
CN108985294B (zh) 一种轮胎模具图片的定位方法、装置、设备及存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18921379

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 11/03/2021)

122 Ep: pct application non-entry in european phase

Ref document number: 18921379

Country of ref document: EP

Kind code of ref document: A1