CN110472632B - Character segmentation method and device based on character features and computer storage medium - Google Patents

Character segmentation method and device based on character features and computer storage medium

Info

Publication number
CN110472632B
CN110472632B (application CN201910702665.XA)
Authority
CN
China
Prior art keywords
character
image
features
segmentation
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910702665.XA
Other languages
Chinese (zh)
Other versions
CN110472632A (en)
Inventor
刘晋 (Liu Jin)
高珍喻 (Gao Zhenyu)
李云辉 (Li Yunhui)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Maritime University
Original Assignee
Shanghai Maritime University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Maritime University filed Critical Shanghai Maritime University
Priority to CN201910702665.XA priority Critical patent/CN110472632B/en
Publication of CN110472632A publication Critical patent/CN110472632A/en
Application granted granted Critical
Publication of CN110472632B publication Critical patent/CN110472632B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region by performing operations on regions, e.g. growing, shrinking or watersheds

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Character Input (AREA)
  • Character Discrimination (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a character segmentation method based on character features, applied in the field of image processing, which comprises the following steps: acquiring an image to be processed; binarizing the image to be processed to obtain a binarized image; extracting features from the binarized image with a basic feature extraction network; extracting features of the character shapes from the extracted features to obtain first features, and extracting features of the character count to obtain second features; fusing the first features and the second features with a semantic segmentation network to generate a semantic segmentation map; and determining the character segmentation positions from the semantic segmentation map. The invention also discloses a character segmentation device based on character features and a computer storage medium.

Description

Character segmentation method and device based on character features and computer storage medium
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to a character segmentation method and apparatus based on character features, and a computer storage medium.
Background
Character segmentation is the basis and precondition of extracting character information from images; the characters must therefore be segmented reasonably and correctly.
The earliest methods proposed for character segmentation are projection-based segmentation (vertical projection) and connected-component-based segmentation. Neither was designed for touching characters, so for the unclear, missing, or adhered strokes common in real-scene pictures they cannot separate the individual characters well. Moreover, because of the top-bottom and left-right structural forms of Chinese characters, a single character is often split into several parts. The water-drop and clustering-based segmentation methods do take the morphology of characters into account, but they only optimize local character features and separate adhered strokes by simple gravity-like or clustering rules; this improves results to a degree, yet segmentation of heavily adhered strokes remains unsatisfactory.
The prior art therefore lacks an effective character segmentation method.
Disclosure of Invention
In view of the above shortcomings of the prior art, it is an object of the present invention to provide a character segmentation method and apparatus based on character features, which overcome the shortcomings of conventional methods by adding character information, such as character width and character count, as a basis for segmentation. Meanwhile, a convolutional neural network is adopted to extract the character information, and a fully convolutional network is applied for semantic segmentation, so the problems of segmenting characters with missing or adhered strokes can be effectively solved.
To achieve the above and other related objects, the present invention provides a character segmentation method based on character features, the method comprising:
acquiring an image to be processed;
carrying out binarization processing on the image to be processed to obtain a binarized image;
extracting the features of the binary image by adopting a basic feature extraction network;
performing feature extraction on the forms of the characters according to the extracted features to acquire first features, and performing feature extraction on the number of the characters to acquire second features;
fusing the first characteristic and the second characteristic by adopting a semantic segmentation network to generate a semantic segmentation graph;
and determining the segmentation position of the character according to the semantic segmentation graph.
In one implementation manner, the step of performing binarization processing on the image to be processed to obtain a binarized image includes:
generating a gray level histogram according to the image to be processed;
acquiring a foreground peak and a background peak corresponding to the gray level histogram;
acquiring gray values corresponding to the troughs of the foreground peak and the background peak;
the acquired gray value is taken as a binarization threshold value.
In one implementation manner, the step of performing feature extraction on the binarized image by using a basic feature extraction network includes:
and carrying out feature extraction on the binary image by adopting a Convolutional Neural Network (CNN).
In one implementation, the step of fusing the first feature and the second feature by using a semantic segmentation network to generate a semantic segmentation map includes:
receiving the first feature and the second feature, and restoring the size of the data through deconvolution and upsampling operations until the size of the image to be processed is reached;
and carrying out Softmax classification on the restored image, and taking the classified image as a semantic segmentation map.
In one implementation, the step of performing Softmax classification on the restored image and using the classified image as a semantic segmentation map includes:
classifying each pixel point in the restored image;
acquiring the probability of each pixel point corresponding to the character class;
and performing segmentation according to the obtained probability.
In one implementation, the step of training the convolutional neural network CNN includes:
constructing a training data set, wherein the data set comprises 30000 pictures with stroke-adhesion conditions prepared from the 3755 Chinese characters specified in the GB2312 first-level national standard; the size of the pictures is 512 x 512, the size of the characters in the pictures is between [70px, 80px], and the number of characters is between [2, 5];
randomly adding white noise and interference textures to the data set to obtain an enhanced image;
and performing Convolutional Neural Network (CNN) training based on the enhanced image.
The invention also discloses a character segmentation device based on character features, which comprises a processor and a memory connected with the processor through a communication bus; wherein:
the memory is used for storing a character segmentation program based on character features;
the processor is configured to execute the character feature-based character segmentation program to implement any one of the character feature-based character segmentation steps.
Also disclosed is a computer storage medium storing one or more programs, the one or more programs being executable by one or more processors to cause the one or more processors to perform any of the character-feature-based character segmentation steps.
As described above, embodiments of the present invention provide a character segmentation method, apparatus and computer storage medium based on character features, which overcome the shortcomings of conventional methods by adding character information, such as character width and character count, as a basis for segmentation. Meanwhile, a convolutional neural network is adopted to extract the character information, and a fully convolutional network is applied for semantic segmentation, so the problems of segmenting characters with missing or adhered strokes can be effectively solved.
Drawings
Fig. 1 is a schematic flow chart of a character segmentation method based on character features according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of an application of a character segmentation method based on character features according to an embodiment of the present invention.
Fig. 3 is a schematic diagram of an application of a character segmentation method based on character features according to an embodiment of the present invention.
Fig. 4 is a schematic diagram of an application of a character segmentation method based on character features according to an embodiment of the present invention.
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention.
Please refer to fig. 1-4. It should be noted that the drawings provided in the present embodiment are only for illustrating the basic idea of the present invention, and the components related to the present invention are only shown in the drawings rather than drawn according to the number, shape and size of the components in actual implementation, and the type, quantity and proportion of the components in actual implementation may be changed freely, and the layout of the components may be more complicated.
As shown in fig. 1, an embodiment of the present invention provides a character segmentation method based on character features, where the method includes:
and S101, acquiring an image to be processed.
The image to be processed is an image including text characters and requiring segmentation processing.
And S102, carrying out binarization processing on the image to be processed to obtain a binarized image.
On the gray-level histogram generated from the image, two peaks represent the target image and the background image respectively, and the gray value at the valley between the two peaks is selected as the binarization threshold T:

$$g(x,y)=\begin{cases}255, & f(x,y)\ge T\\ 0, & f(x,y)<T\end{cases}$$

where f(x, y) is the gray value of the grayscale image and g(x, y) is the image after binarization.
Specifically, other threshold-selection methods may also be used to generate the binarized image, such as the P-parameter method, the maximum-entropy threshold method, and the maximum between-class variance (Otsu) method.
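As a minimal illustrative sketch (not the patented implementation itself), the valley-based threshold selection above could be prototyped in Python with OpenCV as follows; the function name and the 5-point histogram smoothing are assumptions for illustration:

    import numpy as np
    import cv2

    def valley_threshold_binarize(gray: np.ndarray) -> np.ndarray:
        """Binarize a grayscale image using the valley between the two
        histogram peaks (target and background) as the threshold T."""
        hist = cv2.calcHist([gray], [0], None, [256], [0, 256]).ravel()
        hist = np.convolve(hist, np.ones(5) / 5, mode="same")  # smooth the histogram

        # Find the two highest local maxima (the foreground and background peaks).
        peaks = [i for i in range(1, 255)
                 if hist[i] >= hist[i - 1] and hist[i] >= hist[i + 1]]
        p1, p2 = sorted(sorted(peaks, key=lambda i: hist[i])[-2:])

        # The threshold T is the gray value at the valley between the peaks.
        T = p1 + int(np.argmin(hist[p1:p2 + 1]))

        # g(x, y) = 255 if f(x, y) >= T else 0
        return np.where(gray >= T, 255, 0).astype(np.uint8)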
And S103, extracting the features of the binary image by adopting a basic feature extraction network.
Specifically, the basic feature extraction network is a convolutional neural network (CNN), and a trained CNN is required for the feature extraction.
In the CNN training process, a training-set picture meeting the requirements is first adopted: the data set comprises 30000 pictures with stroke-adhesion conditions constructed from the 3755 Chinese characters specified in the GB2312 first-level national standard. The pictures used in the experiment are 512 x 512 in size, the characters in the pictures are between [70px, 80px] in size, and the number of characters is between [2, 5]. White noise and interference textures are randomly added for data enhancement when the character pictures are constructed. Character-information labels, namely character-width and character-count information, are generated by manual or semi-manual semantic annotation of the generated sample images. The key parameter configuration for constructing the data set is shown in Fig. 2, and sample pictures from the data set are shown in Fig. 3.
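For illustration only, a data set of this kind could be synthesized along the following lines; the font path, noise level, and spacing values are assumptions, not the configuration shown in Fig. 2:

    import random
    import numpy as np
    from PIL import Image, ImageDraw, ImageFont

    FONT_PATH = "simhei.ttf"  # assumed path to a GB2312-covering Chinese font

    def make_sample(charset, size=512):
        """Render 2-5 characters of 70-80 px with overlapping (adhered) strokes,
        then add white noise, following the data-set description above."""
        img = Image.new("L", (size, size), color=255)
        draw = ImageDraw.Draw(img)
        x, y = random.randint(10, 60), size // 2 - 40
        for _ in range(random.randint(2, 5)):
            px = random.randint(70, 80)
            font = ImageFont.truetype(FONT_PATH, px)
            draw.text((x, y), random.choice(charset), font=font, fill=0)
            x += px - random.randint(5, 15)  # negative spacing causes stroke adhesion
        arr = np.array(img, dtype=np.float32)
        arr += np.random.normal(0, 8, arr.shape)  # white noise
        return Image.fromarray(np.clip(arr, 0, 255).astype(np.uint8))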
In one embodiment of the present invention, the preprocessed image is subjected to multiple convolution and pooling operations; the basic structure of each convolution operation unit comprises 3 convolution layers, 3 activation layers and 1 pooling layer. After several rounds of convolution and pooling, the multi-dimensional data is converted into one-dimensional data through a Flatten layer to facilitate feature fusion, the forward-propagation data volume is reduced through a Dropout layer, and the final output result is obtained through a fully connected layer.
Alternatively, after the convolution and pooling processing, the multi-dimensional data is converted into one-dimensional data through a Flatten layer to facilitate feature fusion, a concatenate operation is performed through a fusion layer to fuse the character-morphology features and the character-count features, the forward-propagation data volume is reduced through a Dropout layer, and the final output result is obtained through a fully connected layer.
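A minimal Keras sketch of such a convolution operation unit is given below, assuming TensorFlow/Keras as the framework; the filter counts and output width are illustrative assumptions, not the patent's actual configuration:

    from tensorflow import keras
    from tensorflow.keras import layers

    def conv_unit(x, filters):
        # One convolution operation unit: 3 convolution layers,
        # 3 activation layers and 1 pooling layer.
        for _ in range(3):
            x = layers.Conv2D(filters, 3, padding="same")(x)
            x = layers.Activation("relu")(x)
        return layers.MaxPooling2D(2)(x)

    inputs = keras.Input(shape=(512, 512, 1))
    x = conv_unit(inputs, 32)       # filter counts are assumptions
    x = conv_unit(x, 64)
    x = layers.Flatten()(x)         # multi-dimensional -> one-dimensional
    x = layers.Dropout(0.5)(x)      # reduces forward-propagation data volume
    outputs = layers.Dense(128)(x)  # final fully connected output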
For layer $l$ of the neural network, the output of the layer is denoted $y^l$. For the $i$-th neuron in layer $l+1$, $w_i^{l+1}$ denotes its corresponding weights and $b_i^{l+1}$ its corresponding bias. The usual two-dimensional convolutional neural network computation is:

$$z_i^{l+1} = w_i^{l+1} y^l + b_i^{l+1}$$
$$y_i^{l+1} = f\left(z_i^{l+1}\right)$$

where $z_i^{l+1}$ is the computed (pre-activation) value of the $i$-th neuron in layer $l+1$, and $y_i^{l+1}$ is the output of that neuron after the computed value passes through the activation function $f(\cdot)$.

With the Dropout mechanism, the computation becomes:

$$r_j^l = \mathrm{Bernoulli}(p)$$
$$\tilde{y}^l = r^l \odot y^l$$
$$z_i^{l+1} = w_i^{l+1} \tilde{y}^l + b_i^{l+1}$$
$$y_i^{l+1} = f\left(z_i^{l+1}\right)$$

where $\mathrm{Bernoulli}(p)$ randomly generates, for the $j$-th neuron in layer $l$, a 0/1 value $r_j^l$ that equals 1 with probability $p$; the resulting 0/1 vector $r^l$ masks the outputs $y^l$ of layer $l$, and the masked result is denoted $\tilde{y}^l$.

After training with the Dropout mechanism is completed, in the stage where the neural network performs prediction, the prediction of the $i$-th neuron in layer $l+1$ is computed with the weights scaled by $p$:

$$z_i^{l+1} = p\, w_i^{l+1} y^l + b_i^{l+1}$$
$$y_i^{l+1} = f\left(z_i^{l+1}\right)$$
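For concreteness, a NumPy sketch of the training-time and prediction-time formulas above follows; the helper name dropout_forward and the ReLU activation are assumptions for illustration:

    import numpy as np

    rng = np.random.default_rng(0)

    def dropout_forward(y_l, W, b, p, training=True):
        # Training: r_j^l ~ Bernoulli(p); y~^l = r^l * y^l; z = W y~ + b; y = f(z).
        # Prediction: the weights are scaled by p instead of masking.
        relu = lambda z: np.maximum(z, 0)  # f(.) assumed to be ReLU here
        if training:
            r = rng.binomial(1, p, size=y_l.shape)  # 0/1 mask, 1 with probability p
            return relu(W @ (r * y_l) + b)
        return relu(p * (W @ y_l) + b)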
S104, extracting features of the character shapes from the extracted features to obtain first features, and extracting features of the character count to obtain second features.
In one embodiment of the invention, the image to be semantically segmented is preprocessed. 1) Character-morphology feature extraction sub-network: the feature map obtained by passing the preprocessing result through 1 convolution layer and 1 pooling layer is denoted Fa1_1; the feature map obtained by passing Fa1_1 through 3 convolution layers and 1 pooling layer is Fa1_2; the feature map obtained by passing Fa1_2 through 3 convolution layers and 1 pooling layer is Fa1_3; the feature map obtained by passing Fa1_3 through 1 convolution layer and 1 Flatten layer is Fa1_4; and the output of the character-morphology feature extraction network is obtained by passing Fa1_4 through 4 dense layers and 3 dropout layers. The segmentation result is shown in Fig. 4.
2) Character-count feature extraction sub-network: the feature map obtained by passing the preprocessing result through 1 convolution layer and 1 pooling layer is denoted Fa2_1; the feature map obtained by passing Fa2_1 through 3 convolution layers and 1 pooling layer is Fa2_2; the feature map obtained by passing Fa2_2 through 3 convolution layers and 1 pooling layer is Fa2_3; the feature map obtained by passing Fa2_3 through 1 convolution layer and 1 Flatten layer is Fa2_4; Fa2_4 and Fa1_4 are then processed through one fusion layer, and the fused result is passed through 4 dense layers and 3 dropout layers to obtain the output of the character-count feature extraction network. A sketch of the two branches follows.
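Under the same Keras assumption as above, the two branches and their fusion could be wired with the functional API as follows; all layer widths, the count-class range, and the name preprocessed are assumptions:

    from tensorflow import keras
    from tensorflow.keras import layers

    preprocessed = keras.Input(shape=(512, 512, 1))  # preprocessed binarized image

    def branch(x, name):
        # 1 conv + 1 pool (Fa_1), two blocks of 3 convs + 1 pool (Fa_2, Fa_3),
        # then 1 conv + Flatten (Fa_4), mirroring the structure described above.
        x = layers.Conv2D(32, 3, padding="same", activation="relu")(x)
        x = layers.MaxPooling2D(2)(x)
        for filters in (64, 128):
            for _ in range(3):
                x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
            x = layers.MaxPooling2D(2)(x)
        x = layers.Conv2D(128, 3, padding="same", activation="relu")(x)
        return layers.Flatten(name=name)(x)

    fa1_4 = branch(preprocessed, "Fa1_4")         # character-morphology branch
    fa2_4 = branch(preprocessed, "Fa2_4")         # character-count branch
    fused = layers.Concatenate()([fa1_4, fa2_4])  # fusion layer

    x = fused
    for units in (1024, 512, 256):                # 4 dense + 3 dropout layers in total
        x = layers.Dense(units, activation="relu")(x)
        x = layers.Dropout(0.5)(x)
    count_out = layers.Dense(4, activation="softmax", name="char_count")(x)  # counts 2-5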
In a specific embodiment of the present invention, the basic feature extraction network and the character-information feature extraction network serve as the base networks; the outputs of the two networks are processed through 1 fusion layer, the fused result is processed through 3 deconvolution-and-upsampling operations, each operation unit comprising 3 deconvolution layers and one upsampling layer, and the output of the network is finally obtained through 4 deconvolution layers.
The neural network training described above uses Adam (Adaptive Moment Estimation) as the optimizer. The loss weight of the neural network output of the character-information extraction part is set to 0.5, and the loss weight of the neural network output of the semantic segmentation part is set to 1.0. In addition, the character-information extraction output uses the 'categorical_crossentropy' loss, and the semantic segmentation output uses the 'binary_crossentropy' loss.
The initial learning rate learning_rate in the Adam parameters is set to 0.0001 (1e-4), the exponential decay rate of the first-order moment estimate beta_1 is set to 0.9, and the exponential decay rate of the second-order moment estimate beta_2 is set to 0.999. To prevent division by zero in the computation, epsilon is set to 1e-08, while decay is set to 0.0. A compilation sketch under these settings follows.
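Assuming a Keras-style API and continuing the sketch above, these settings correspond to a compilation step like the following, where seg_out stands for a hypothetical semantic-segmentation output layer named "segmentation":

    from tensorflow import keras

    # Two-head model: "char_count" (character information) and "segmentation".
    model = keras.Model(inputs=preprocessed, outputs=[count_out, seg_out])

    model.compile(
        optimizer=keras.optimizers.Adam(
            learning_rate=1e-4,  # initial learning rate
            beta_1=0.9,          # decay rate of the first-order moment estimate
            beta_2=0.999,        # decay rate of the second-order moment estimate
            epsilon=1e-08,       # prevents division by zero (decay stays at 0.0)
        ),
        loss={"char_count": "categorical_crossentropy",  # character information
              "segmentation": "binary_crossentropy"},    # semantic segmentation
        loss_weights={"char_count": 0.5, "segmentation": 1.0},
    )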
And S105, fusing the first characteristic and the second characteristic by adopting a semantic segmentation network to generate a semantic segmentation graph.
Multiple deconvolution and upsampling operations are then employed to restore the data to the same size as the input image. Each pixel of the resulting data is then classified with a softmax classification function; the value at each pixel represents the probability that the pixel belongs to the character class, and the semantic segmentation map is generated.
softmax function:

$$\sigma(z)_j = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}}, \qquad j = 1, \dots, K$$

which maps the K-dimensional real vector z obtained from the image data to a K-dimensional real vector σ(z) whose components lie between 0 and 1.
And S106, determining the segmentation position of the character according to the semantic segmentation graph.
The semantic segmentation map may be grayed and binarized, the result processed with morphological opening and closing operations, the minimal closed regions found, the largest rectangle found within each region, and the coordinates of that rectangle obtained. The segmentation positions of the characters are then determined from the character width and character count obtained in S104 and S105. A post-processing sketch follows.
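A sketch of this post-processing with OpenCV, assuming an 8-bit single-channel segmentation map and a hypothetical helper name:

    import cv2
    import numpy as np

    def char_boxes(seg_map: np.ndarray):
        """Gray + binarize the segmentation map, clean it with opening and
        closing, then return the bounding rectangle of each closed region."""
        gray = seg_map if seg_map.ndim == 2 else cv2.cvtColor(seg_map, cv2.COLOR_BGR2GRAY)
        _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
        kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
        binary = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)   # remove specks
        binary = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)  # close small gaps
        contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        # boundingRect gives the enclosing rectangle (x, y, w, h) of each region.
        return sorted(cv2.boundingRect(c) for c in contours)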
Convolutional neural network: convolutional neural networks (CNNs) were proposed inspired by visual neuroscience. Their structure mainly comprises convolution layers and pooling layers. The earliest convolutional neural network model was the LeNet-5 model proposed by Yann LeCun in 1998. In that model, the original image is converted into several feature maps by convolution and subsampling layers; through convolution operations with convolution kernels, these feature maps map low-level local-region features onto higher-level global features. Since then, successive improvements on the basic convolutional architecture have achieved strong results in ImageNet competitions.
Projection-based segmentation: scan the image horizontally and vertically and count the number of black pixels in each direction; the columns and rows containing no black pixels are the segmentation positions. A minimal sketch follows.
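A minimal NumPy sketch of this projection rule, assuming black pixels are 0 in the binarized image:

    import numpy as np

    def projection_cuts(binary: np.ndarray, axis: int = 0):
        # Count black pixels per column (axis=0) or per row (axis=1);
        # positions with zero black pixels are segmentation positions.
        black = (binary == 0).sum(axis=axis)
        return np.flatnonzero(black == 0)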
Clustering-based segmentation: according to features such as gray level, color, texture and shape, the image is divided into several non-overlapping regions such that features are similar within a region and differ markedly between regions; that is, a clustering algorithm is used to group pixels and thereby segment the image.
The fully convolutional network (FCN) is a neural network architecture proposed by Jonathan Long et al. in 2015, aimed primarily at image segmentation tasks. It converts the fully connected layers of a traditional convolutional neural network (CNN) into convolution layers and classifies the image at the pixel level in an end-to-end manner, thereby addressing semantic-level image segmentation.
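As a small illustration of that conversion (a sketch under the Keras assumption, not the FCN authors' code), a Dense classification head becomes a 1x1 convolution so every spatial position gets its own class scores:

    from tensorflow.keras import layers

    # Instead of: layers.Dense(num_classes)(layers.Flatten()(features))
    def fcn_head(features, num_classes: int):
        scores = layers.Conv2D(num_classes, kernel_size=1)(features)  # FC -> conv
        return layers.Softmax(axis=-1)(scores)  # per-pixel class probabilities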
In engineering applications, character recognition suffers a bottleneck: the recognition accuracy is hard to improve because accurate segmentation of the characters in an image is difficult to obtain. Hard cases include, for example, a single Chinese character with a left-right structure being cut apart, a character with missing strokes being cut into several parts, and, for the common case of adhered strokes, several characters being merged into one character region. Adding character information such as character width and character count as a basis for segmentation therefore overcomes the disadvantages of conventional methods. Meanwhile, a convolutional neural network is adopted to extract the character information, and a fully convolutional network is applied for semantic segmentation, so the problems of segmenting characters with missing or adhered strokes can be effectively solved.
The invention also discloses a character segmentation device based on character features, which comprises a processor and a memory connected with the processor through a communication bus; wherein:
the memory is used for storing a character segmentation program based on character features;
the processor is configured to execute the character feature-based character segmentation program to implement any one of the character feature-based character segmentation steps.
Also disclosed is a computer storage medium storing one or more programs, the one or more programs being executable by one or more processors to cause the one or more processors to perform any of the character-feature-based character segmentation steps.
The foregoing embodiments merely illustrate the principles and utilities of the present invention and are not intended to limit it. Those skilled in the art can modify or change the above embodiments without departing from the spirit and scope of the present invention. Accordingly, all equivalent modifications or changes made by those skilled in the art without departing from the spirit and technical ideas of the present invention shall be covered by the claims of the present invention.

Claims (8)

1. A character segmentation method based on character features is characterized by comprising the following steps:
acquiring an image to be processed;
carrying out binarization processing on the image to be processed to obtain a binarized image;
extracting the features of the binary image by adopting a basic feature extraction network;
performing feature extraction on the forms of the characters according to the extracted features to acquire first features, and performing feature extraction on the number of the characters to acquire second features;
fusing the first characteristic and the second characteristic by adopting a semantic segmentation network to generate a semantic segmentation graph;
and determining the segmentation position of the character according to the semantic segmentation graph.
2. The character feature-based character segmentation method according to claim 1, wherein the step of performing binarization processing on the image to be processed to obtain a binarized image comprises:
generating a gray level histogram according to the image to be processed;
acquiring a foreground peak and a background peak corresponding to the gray level histogram;
acquiring gray values corresponding to the troughs of the foreground peak and the background peak;
the acquired gray value is taken as a binarization threshold value.
3. The character feature-based character segmentation method according to claim 1 or 2, wherein the step of performing feature extraction on the binarized image by using a basic feature extraction network comprises:
and carrying out feature extraction on the binary image by adopting a Convolutional Neural Network (CNN).
4. The character segmentation method based on character features according to claim 3, wherein the step of fusing the first features and the second features by using a semantic segmentation network to generate a semantic segmentation graph comprises:
receiving the first feature and the second feature, and restoring the size of the data through deconvolution and upsampling operations until the size of the image to be processed is reached;
and carrying out Softmax classification on the restored image, and taking the classified image as a semantic segmentation map.
5. The character segmentation method based on character features according to claim 4, wherein the step of performing Softmax classification on the restored image and taking the classified image as a semantic segmentation map comprises the steps of:
classifying each pixel point in the restored image;
acquiring the probability of each pixel point corresponding to the character class;
and performing segmentation according to the obtained probability.
6. The character feature-based character segmentation method according to claim 5, wherein the step of training the convolutional neural network CNN comprises:
constructing a training data set, wherein the data set comprises 30000 pictures with stroke-adhesion conditions prepared from the 3755 Chinese characters specified in the GB2312 first-level national standard, the size of the pictures is 512 x 512, the size of the characters in the pictures is between [70px, 80px], and the number of characters is between [2, 5];
randomly adding white noise and interference textures to the data set to obtain an enhanced image;
and performing Convolutional Neural Network (CNN) training based on the enhanced image.
7. An apparatus for character segmentation based on character features, the apparatus comprising a processor, and a memory coupled to the processor via a communication bus; wherein:
the memory is used for storing a character segmentation program based on character features;
the processor, configured to execute the character feature-based character segmentation program to implement the steps of the character feature-based character segmentation method according to any one of claims 1 to 6.
8. A computer storage medium storing one or more programs, the one or more programs being executable by one or more processors to cause the one or more processors to perform the steps of the character feature based character segmentation method as claimed in any one of claims 1 to 6.
CN201910702665.XA 2019-07-31 2019-07-31 Character segmentation method and device based on character features and computer storage medium Active CN110472632B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910702665.XA CN110472632B (en) 2019-07-31 2019-07-31 Character segmentation method and device based on character features and computer storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910702665.XA CN110472632B (en) 2019-07-31 2019-07-31 Character segmentation method and device based on character features and computer storage medium

Publications (2)

Publication Number Publication Date
CN110472632A CN110472632A (en) 2019-11-19
CN110472632B (en) 2022-09-30

Family

ID=68509348

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910702665.XA Active CN110472632B (en) 2019-07-31 2019-07-31 Character segmentation method and device based on character features and computer storage medium

Country Status (1)

Country Link
CN (1) CN110472632B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111626283B (en) * 2020-05-20 2022-12-13 北京字节跳动网络技术有限公司 Character extraction method and device and electronic equipment
CN111723815B (en) * 2020-06-23 2023-06-30 中国工商银行股份有限公司 Model training method, image processing device, computer system and medium
CN112270370B (en) * 2020-11-06 2023-06-02 北京环境特性研究所 Vehicle apparent damage assessment method

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106650721A (en) * 2016-12-28 2017-05-10 吴晓军 Industrial character identification method based on convolution neural network

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5875637B2 (en) * 2013-12-19 2016-03-02 キヤノン株式会社 Image processing apparatus and image processing method

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106650721A (en) * 2016-12-28 2017-05-10 吴晓军 Industrial character identification method based on convolution neural network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
A general CAPTCHA recognition method based on image segmentation; Bai Peirui et al.; Journal of Shandong University of Science and Technology (Natural Science Edition); 2018-05-09 (No. 03); full text *

Also Published As

Publication number Publication date
CN110472632A (en) 2019-11-19

Similar Documents

Publication Publication Date Title
CN110119728B (en) Remote sensing image cloud detection method based on multi-scale fusion semantic segmentation network
CN110176027B (en) Video target tracking method, device, equipment and storage medium
CN108509978B (en) Multi-class target detection method and model based on CNN (CNN) multi-level feature fusion
Zhao et al. Pixelated semantic colorization
Wu et al. Semantic segmentation of high-resolution remote sensing images using fully convolutional network with adaptive threshold
Zhuo et al. Cloud classification of ground-based images using texture–structure features
CN103049763B (en) Context-constraint-based target identification method
CN110472632B (en) Character segmentation method and device based on character features and computer storage medium
CN110598600A (en) Remote sensing image cloud detection method based on UNET neural network
CN111652317B (en) Super-parameter image segmentation method based on Bayes deep learning
CN108510499B (en) Image threshold segmentation method and device based on fuzzy set and Otsu
CN109800817B (en) Image classification method based on fusion semantic neural network
CN109886271B (en) Image accurate segmentation method integrating deep learning network and improving edge detection
CN113705371B (en) Water visual scene segmentation method and device
CN110399895A (en) The method and apparatus of image recognition
CN110136162B (en) Unmanned aerial vehicle visual angle remote sensing target tracking method and device
CN110991257B (en) Polarized SAR oil spill detection method based on feature fusion and SVM
Mukherjee et al. Enhancement of image resolution by binarization
CN110827312A (en) Learning method based on cooperative visual attention neural network
US11367206B2 (en) Edge-guided ranking loss for monocular depth prediction
CN111507337A (en) License plate recognition method based on hybrid neural network
CN114511576A (en) Image segmentation method and system for scale self-adaptive feature enhanced deep neural network
CN115713632A (en) Feature extraction method and device based on multi-scale attention mechanism
CN112149526A (en) Lane line detection method and system based on long-distance information fusion
CN115033721A (en) Image retrieval method based on big data

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information
CB03 Change of inventor or designer information

Inventor after: Liu Jin

Inventor after: Gao Zhenyu

Inventor after: Li Yunhui

Inventor before: Liu Jin

Inventor before: Gao Zhenyu

GR01 Patent grant
GR01 Patent grant