CN109271882B - Method for extracting color-distinguished handwritten Chinese characters

Info

Publication number: CN109271882B
Application number: CN201810984203.7A
Authority: CN
Inventors: 彭艺; 尹玉梅; 祁俊辉
Original assignee: Kunming University of Science and Technology
Current assignee: Kunming University of Science and Technology
Priority date: 2018-08-28
Filing date: 2018-08-28
Publication date: 2020-05-15
Anticipated expiration: 2038-08-28
Also published as: CN109271882A

Abstract

The invention relates to a method for extracting color-distinguished handwritten Chinese characters and belongs to the technical field of image and character processing. After a picture of handwritten Chinese characters in several colors is acquired, graying and binarization are first applied to remove redundant marks. Mean filtering is then used to remove additive white Gaussian noise, and a second binarization removes the background. Next, the characters are segmented into rows and columns with a threshold approximation method, each character is normalized and thinned, and its features are extracted. Finally, the Mahalanobis distance between the feature vector of the character to be recognized and the feature vectors of the characters in a standard handwriting sample database is computed, and the character with the minimum distance is output as the recognition result. The invention improves the effectiveness and accuracy of computer recognition of handwritten Chinese characters written in several colors.

Description

Method for extracting color-distinguished handwritten Chinese characters

Technical Field

The invention relates to a method for extracting color-distinguished handwritten Chinese characters, belonging to the technical field of image and character processing.

Background

Students write compositions in the course of their studies, and teachers correct the good ones. Parents who consider such a composition worth keeping may want to turn it into an electronic file, but doing so by hand is inefficient and error-prone. If computer technology could remove the teacher's correction marks from the composition and recognize the remaining text for storage as an electronic file, it would be of great practical significance.

At present, redundant marks are removed from pictures mainly with Photoshop, which is time-consuming and tedious and demands considerable skill from the operator, so it cannot be widely adopted. An automated approach can avoid these problems. Handwritten Chinese character recognition technology is maturing and its fields of application keep widening; applying it to daily life would bring considerable convenience.

Disclosure of Invention

The technical problem to be solved by the invention is to overcome the limitations and deficiencies of the prior art. The invention provides a method for extracting color-distinguished handwritten Chinese characters that addresses the poor pertinence and low efficiency of the prior art when recognizing handwritten Chinese characters in several colors, and that improves the effectiveness and accuracy of computer recognition of such characters.

The technical scheme of the invention is as follows: a method for extracting color-differentiated handwritten Chinese characters, characterized in that, after a picture of handwritten Chinese characters in several colors is acquired, graying and binarization are first applied to remove redundant marks; mean filtering is then used to remove additive white Gaussian noise from the picture, and a second binarization removes the background; next, the characters are segmented into rows and columns with a threshold approximation method, each character is normalized and thinned, and its features are extracted; finally, the Mahalanobis distance between the feature vector of the character to be recognized and the feature vectors of the characters in the standard handwriting sample database is computed, and the character with the minimum distance is selected and output as the recognition result.

The method comprises the following specific steps:

Step1: collect pictures {P₁, P₂, …, P_N} of handwritten Chinese compositions after the teacher's correction, generate a database P, and store it on the local computer;

Step2: gray each picture P_i, i∈[1,N], then binarize it with a threshold M1 to remove the teacher's correction marks, obtaining a new picture P′_i, i∈[1,N], and generating a new database P′;

Step3: preprocess each picture P′_i, i∈[1,N], including smoothing and denoising, binarization, line and character segmentation, normalization and thinning, as detailed in Step3.1-Step3.5;

Step3.1: smoothing and denoising; apply mean filtering to each picture P′_i, i∈[1,N], to filter out its medium- and high-frequency components;
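The mean filtering of Step3.1 can be illustrated with a minimal sketch. The function name `mean_filter`, the 3 × 3 window and the handling of border pixels are assumptions made for illustration; the text does not specify them.

```python
def mean_filter(gray, k=3):
    """Replace each pixel by the mean of its k x k neighbourhood.

    Border pixels average only over the neighbours that exist
    (an assumption of this sketch).
    """
    h, w = len(gray), len(gray[0])
    r = k // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [
                gray[j][i]
                for j in range(max(0, y - r), min(h, y + r + 1))
                for i in range(max(0, x - r), min(w, x + r + 1))
            ]
            out[y][x] = sum(window) // len(window)
    return out


# A single bright noise pixel is spread over its neighbourhood.
noisy = [
    [0, 0, 0],
    [0, 90, 0],
    [0, 0, 0],
]
print(mean_filter(noisy)[1][1])  # 10
```

Averaging attenuates isolated spikes, which is why it suppresses the higher-frequency content of the picture at the cost of some blurring of stroke edges.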

Step3.2: binarization; binarize each picture P′_i, i∈[1,N], with a threshold M2 so that the Chinese characters in the picture are retained and the background is removed;

Step3.3: line and character segmentation; lines are segmented with a threshold approximation algorithm, and characters are segmented on that basis. Because handwriting may overlap or break apart, three algorithms are used to realize character segmentation: a segmentation-line thinning algorithm, an over-segmentation elimination algorithm, and an overlapped-character stroke-breaking algorithm. The result is the single-character data Q: {q₁, q₂, …, q_n, …, q_m};
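As a rough illustration of splitting a page into lines and then characters, the sketch below uses plain projection profiles with a fixed threshold. It is a simplified stand-in, not the threshold approximation algorithm named above, and it omits the three refinement algorithms for overlapping or broken handwriting. The names `runs_above` and `segment` and the convention 1 = ink are assumptions.

```python
def runs_above(profile, threshold):
    """Return (start, end) index pairs of maximal runs where profile > threshold."""
    runs, start = [], None
    for i, v in enumerate(profile):
        if v > threshold and start is None:
            start = i
        elif v <= threshold and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def segment(binary, threshold=0):
    """Split a 0/1 image (1 = ink) into text lines, then each line into characters.

    Returns (top, bottom, left, right) boxes with exclusive end indices.
    """
    boxes = []
    row_profile = [sum(row) for row in binary]
    for top, bottom in runs_above(row_profile, threshold):
        line = binary[top:bottom]
        col_profile = [sum(col) for col in zip(*line)]
        for left, right in runs_above(col_profile, threshold):
            boxes.append((top, bottom, left, right))
    return boxes


page = [
    [1, 1, 0, 1, 0],
    [1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1],
]
print(segment(page))
# [(0, 2, 0, 2), (0, 2, 3, 4), (3, 4, 1, 3), (3, 4, 4, 5)]
```

Pure projection splitting fails when neighbouring characters touch or when one character has a vertical gap, which is the situation the three additional algorithms are meant to handle.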

Step3.4: normalization; after line and character segmentation the resulting character blocks differ in size, so a normalization operation is applied to unify the sizes of the block images, giving the normalized single-character data Q′: {q₁′, q₂′, …, q_n′, …, q_m′};

Step3.5: thinning; thin the data Q′ with a table look-up method to obtain the thinned single-character data Q″: {q₁″, q₂″, …, q_n″, …, q_m″};

Step4: perform elastic-grid-based feature extraction on the Chinese characters in the data Q″, comprising coarse peripheral feature extraction, outer contour feature extraction, inner contour feature extraction and directional pixel feature extraction, as detailed in Step4.1-Step4.4;

Step4.1: coarse peripheral feature extraction; first fill the interior regions of the skeleton image in Q″: if a pixel P is white, check whether strokes exist above, below, to the left and to the right of it; if they do, the pixel belongs to an interior region and is changed from white to black; all white pixels are processed in turn. Then divide the filled image into 4 × 4 = 16 blocks and count the black pixels in each block to form a 16-dimensional coarse peripheral feature;
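The filling and block counting of Step4.1 can be sketched as follows. The sketch assumes that a white pixel counts as interior only when ink is found in all four directions, encodes ink as 1, and uses a uniform 4 × 4 split; the function names are hypothetical.

```python
def fill_interior(img):
    """Blacken every white pixel that has ink somewhere above, below, left and right."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            if img[y][x] == 0:
                up = any(img[j][x] for j in range(y))
                down = any(img[j][x] for j in range(y + 1, h))
                left = any(img[y][i] for i in range(x))
                right = any(img[y][i] for i in range(x + 1, w))
                if up and down and left and right:
                    out[y][x] = 1
    return out


def coarse_peripheral(img, grid=4):
    """Return grid * grid ink counts of the filled image, in row-major block order."""
    filled = fill_interior(img)
    h, w = len(filled), len(filled[0])
    feats = []
    for by in range(grid):
        for bx in range(grid):
            feats.append(sum(
                filled[y][x]
                for y in range(by * h // grid, (by + 1) * h // grid)
                for x in range(bx * w // grid, (bx + 1) * w // grid)
            ))
    return feats


closed_box = [
    [1, 1, 1, 1],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
]
print(coarse_peripheral(closed_box))  # sixteen 1s: the hole is filled
```

A shape that is open on one side keeps its hole, so the feature separates enclosed from open structures.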

Step4.2: outer contour feature extraction; scan the image in Q″ from the top, bottom, left and right, divide the scanned region with the elastic grid, and measure in each region the area traversed before the scan first touches a stroke. Each direction is divided into 4 regions, and the shaded area of each region is a 1-dimensional feature, giving 4 × 4 = 16 dimensions after processing;
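A minimal sketch of the outer contour feature follows. For simplicity it divides each direction into uniform zones rather than elastic-grid zones, and it treats a scan line without any ink as contributing its full length; both choices, and the function names, are assumptions.

```python
def first_hit_depths(lines):
    """For each scan line, count the white pixels before the first ink pixel."""
    depths = []
    for line in lines:
        d = 0
        for v in line:
            if v:
                break
            d += 1
        depths.append(d)
    return depths


def outer_contour(img, zones=4):
    """Return 4 * zones features: summed first-hit depth per zone for each direction."""
    cols = [list(c) for c in zip(*img)]
    scans = [
        cols,                      # from the top, one scan line per column
        [c[::-1] for c in cols],   # from the bottom
        img,                       # from the left, one scan line per row
        [r[::-1] for r in img],    # from the right
    ]
    feats = []
    for lines in scans:
        depths = first_hit_depths(lines)
        n = len(depths)
        for z in range(zones):
            feats.append(sum(depths[z * n // zones:(z + 1) * n // zones]))
    return feats


blob = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
print(outer_contour(blob))  # [4, 1, 1, 4] repeated for the four directions
```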

Step4.3: inner contour feature extraction; scan the image in Q″ from the top, bottom, left and right in the same way as for the outer contour features, and measure the area between the point where the scan first passes through a stroke and the point where it touches a stroke for the second time, forming a 16-dimensional inner contour feature;
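The inner contour measurement can be sketched per scan line as the white gap between the first stroke and the second one. Returning 0 when no second stroke exists is an assumption of this sketch, as are the uniform zones and the function names.

```python
def second_gap(line):
    """White pixels between the end of the first ink run and the next ink pixel.

    Returns 0 when the scan line has no second ink run (an assumption).
    """
    i, n = 0, len(line)
    while i < n and not line[i]:   # leading background
        i += 1
    while i < n and line[i]:       # first stroke
        i += 1
    gap = 0
    while i < n and not line[i]:   # background behind the first stroke
        gap += 1
        i += 1
    return gap if i < n else 0


def inner_contour(img, zones=4):
    """Return 4 * zones features: summed second-gap length per zone and direction."""
    cols = [list(c) for c in zip(*img)]
    scans = [cols, [c[::-1] for c in cols], img, [r[::-1] for r in img]]
    feats = []
    for lines in scans:
        gaps = [second_gap(line) for line in lines]
        n = len(gaps)
        for z in range(zones):
            feats.append(sum(gaps[z * n // zones:(z + 1) * n // zones]))
    return feats


hollow = [
    [1, 1, 1, 1],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
]
print(inner_contour(hollow))  # [0, 2, 2, 0] repeated for the four directions
```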

Step4.4: directional pixel feature extraction; apply a first-order difference operation to the image in Q″ to obtain the outer contour line image of the Chinese character; divide the contour line image into 8 × 8 = 64 blocks and count, in each block, the cumulative number of effective pixels along each direction line;
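The text does not define the direction lines of Step4.4 in detail, so the following is only one possible reading, stated as an assumption: a contour pixel is an ink pixel that differs from at least one of its 4-neighbours, and each contour pixel is counted for every one of four directions (horizontal, vertical, two diagonals) in which it has a contour neighbour. All names are hypothetical.

```python
DIRECTIONS = {
    "horizontal": [(0, -1), (0, 1)],
    "vertical": [(-1, 0), (1, 0)],
    "diagonal": [(-1, -1), (1, 1)],
    "anti_diagonal": [(-1, 1), (1, -1)],
}
FOUR = ((-1, 0), (1, 0), (0, -1), (0, 1))


def ink(img, y, x):
    """Pixel value, with everything outside the image treated as background."""
    return img[y][x] if 0 <= y < len(img) and 0 <= x < len(img[0]) else 0


def contour(img):
    """Keep ink pixels whose first-order difference to some 4-neighbour is non-zero."""
    return [
        [1 if img[y][x] and any(not ink(img, y + dy, x + dx) for dy, dx in FOUR) else 0
         for x in range(len(img[0]))]
        for y in range(len(img))
    ]


def direction_features(img, grid=8):
    """Per block, count contour pixels that continue along each of four directions."""
    edge = contour(img)
    h, w = len(edge), len(edge[0])
    feats = []
    for by in range(grid):
        for bx in range(grid):
            counts = dict.fromkeys(DIRECTIONS, 0)
            for y in range(by * h // grid, (by + 1) * h // grid):
                for x in range(bx * w // grid, (bx + 1) * w // grid):
                    if edge[y][x]:
                        for name, steps in DIRECTIONS.items():
                            if any(ink(edge, y + dy, x + dx) for dy, dx in steps):
                                counts[name] += 1
            feats.extend(counts.values())
    return feats


stroke = [
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
print(direction_features(stroke, grid=1))  # [4, 0, 0, 0]
```

With a 64 × 64 character and the default 8 × 8 grid, this reading yields 64 × 4 = 256 values.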

Step5: recognition output; in the feature space, a minimum distance classifier takes a reference template as the representative of a pattern class, uses the distance between the feature vector of the sample to be identified and the reference template as the basis for classification, and takes the class i whose reference sample lies at the minimum distance from the sample as the recognition result. The distance is measured with the Mahalanobis distance d_m(X, μ_i), whose expression is given as formula (1);

where X = (x₁, x₂, …, x_n)^T is the feature vector of the sample to be identified, μ_i = (μ_i1, μ_i2, …, μ_in)^T is the mean vector of the i-th pattern class, and σ_i = (σ_i1, σ_i2, …, σ_in)^T is the mean square error of the i-th pattern class.
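The minimum distance classification of Step5 can be sketched under an assumption about the distance: since σ_i is defined as a vector with one entry per feature, the sketch uses the diagonal-covariance form d_m(X, μ_i) = sqrt(Σ_k ((x_k − μ_ik) / σ_ik)²). This form is assumed for illustration and may differ from formula (1). The template values and function names are hypothetical.

```python
import math


def mahalanobis_diag(x, mu, sigma):
    """Distance of x from a class with per-feature mean mu and deviation sigma."""
    return math.sqrt(sum(((a - m) / s) ** 2 for a, m, s in zip(x, mu, sigma)))


def classify(x, templates):
    """Return the label whose template is nearest to x.

    `templates` maps a label to a (mu, sigma) pair of equal-length sequences.
    """
    return min(templates, key=lambda label: mahalanobis_diag(x, *templates[label]))


# Two illustrative classes with two features each (made-up numbers).
templates = {
    "大": ([1.0, 5.0], [1.0, 1.0]),
    "太": ([2.0, 5.0], [0.1, 1.0]),
}
print(classify([1.5, 5.0], templates))  # 大
```

In this example the sample is equally far from both means in Euclidean terms; scaling by the per-feature deviation is what separates the two classes.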

Further, in Step2, the threshold parameter M1 is adjusted according to the binarization threshold method, and the teacher's handwriting and the student's handwriting are binarized so that the pixel value of the teacher's handwriting is 0 and that of the student's handwriting is 255; the threshold parameter M1 may be selected according to the actual situation.
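A minimal sketch of the graying and M1 thresholding follows. It assumes that the student writes in dark ink and the teacher in a color with a higher gray level, such as red, so that a single gray threshold separates them. The BT.601 luminance weights, the value M1 = 60, the sample colors and the function names are illustrative assumptions.

```python
def to_gray(rgb):
    """Convert one (R, G, B) pixel to a 0-255 gray level (ITU-R BT.601 weights)."""
    r, g, b = rgb
    return int(round(0.299 * r + 0.587 * g + 0.114 * b))


def remove_corrections(image, m1):
    """Binarize an RGB image given as rows of (R, G, B) tuples.

    Pixels darker than m1 are assumed to be the student's ink and become 255;
    every other pixel (teacher's marks, paper) becomes 0.
    """
    return [[255 if to_gray(px) < m1 else 0 for px in row] for row in image]


BLACK, RED, WHITE = (20, 20, 20), (220, 30, 30), (255, 255, 255)
page = [
    [WHITE, BLACK, RED],
    [BLACK, RED, WHITE],
]
print(remove_corrections(page, m1=60))  # [[0, 255, 0], [255, 0, 0]]
```

In practice M1 would have to be tuned to the scanner and the inks actually used, which is consistent with leaving the parameter open.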

Further, in Step3.2, the threshold parameter M2 is adjusted according to the binarization threshold method so that the Chinese characters in the picture are retained and the background is removed; the threshold parameter M2 may be selected according to the actual situation.

Further, in Step3.4, the image is subjected in sequence to coordinate centering, X-sharpening normalization, scaling normalization and rotation normalization, finally giving a Chinese character dot matrix of size 64 × 64.
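Two of the four normalization operations, position and scale, can be sketched as cropping to the ink bounding box and resampling to 64 × 64. The sketch leaves out the X-direction and rotation normalizations, uses nearest-neighbour sampling, and assumes the image contains at least one ink pixel; the function name is hypothetical.

```python
def normalize(char, size=64):
    """Crop a 0/1 character image to its ink bounding box and rescale to size x size.

    Uses nearest-neighbour sampling; assumes at least one ink pixel is present.
    """
    rows = [y for y, row in enumerate(char) if any(row)]
    cols = [x for x in range(len(char[0])) if any(row[x] for row in char)]
    top, bottom = rows[0], rows[-1] + 1
    left, right = cols[0], cols[-1] + 1
    h, w = bottom - top, right - left
    return [
        [char[top + (y * h) // size][left + (x * w) // size] for x in range(size)]
        for y in range(size)
    ]


small = [
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]
big = normalize(small)
print(len(big), len(big[0]))  # 64 64
```

Stretching the bounding box to the full square discards the aspect ratio of the character, which is a design choice of this sketch rather than something stated in the text.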

Further, in Step3.5, image thinning means removing some points from the original image while keeping its original shape. Whether a point can be removed is judged from the state of its 8 neighboring points, subject to the following conditions: an interior point cannot be removed; an isolated point cannot be removed; a line end point cannot be removed; and if P is a boundary point, P can be removed provided that removing it does not increase the number of connected components.
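The four conditions above lend themselves to a 256-entry look-up table indexed by the pattern of the 8 neighbours. The sketch below builds such a table under stated assumptions: an interior point is taken to be one whose four edge neighbours are all black, a line end point is one with exactly one black neighbour, and connectivity is preserved when the black neighbours form a single 8-connected group. The names `RING`, `removable` and `TABLE` are hypothetical.

```python
# Ring order of the 8 neighbours as (dy, dx): N, NE, E, SE, S, SW, W, NW.
RING = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]


def neighbour_components(present):
    """Count 8-connected groups among the neighbour offsets listed in `present`."""
    unvisited, groups = set(present), 0
    while unvisited:
        groups += 1
        stack = [unvisited.pop()]
        while stack:
            cy, cx = stack.pop()
            linked = {p for p in unvisited
                      if abs(p[0] - cy) <= 1 and abs(p[1] - cx) <= 1}
            unvisited -= linked
            stack.extend(linked)
    return groups


def removable(code):
    """Decide whether a black pixel with neighbour bit pattern `code` may be deleted."""
    present = [RING[k] for k in range(8) if (code >> k) & 1]
    if len(present) == 0:                              # isolated point
        return False
    if len(present) == 1:                              # line end point
        return False
    if all((code >> k) & 1 for k in (0, 2, 4, 6)):     # interior: N, E, S, W black
        return False
    return neighbour_components(present) == 1          # must not split the stroke


TABLE = [removable(code) for code in range(256)]
print(TABLE[0b00000111], TABLE[0b00010001])  # True False
```

A thinning pass would then encode the neighbourhood of each boundary pixel as a bit pattern and consult `TABLE` instead of re-evaluating the conditions.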

Further, in Step4, the elastic grid is a non-uniform grid obtained by dividing the Chinese character with non-uniform grid lines according to the stroke distribution of the character image; the non-uniform grid lines are determined from the histogram projections of the character image in the horizontal and vertical directions.
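One common way to derive non-uniform grid lines from a projection histogram is to place them so that each band holds roughly the same amount of ink. The sketch below follows that idea as an assumption; the text does not state the exact placement rule, and the function name is hypothetical.

```python
def elastic_cuts(projection, parts):
    """Place parts - 1 cut positions so each band holds about the same ink mass.

    `projection` is the ink count per row (or per column) of the character image
    and is assumed to contain at least one non-zero entry.
    """
    total = sum(projection)
    cuts, running, target = [], 0, 1
    for i, v in enumerate(projection):
        running += v
        # Cut after index i each time the cumulative mass reaches target/parts.
        while target < parts and running * parts >= target * total:
            cuts.append(i + 1)
            target += 1
    return cuts


# Ink concentrated at both ends: a uniform grid would cut at 2, 4, 6.
column_ink = [0, 4, 4, 0, 0, 0, 4, 4]
print(elastic_cuts(column_ink, parts=4))  # [2, 3, 7]
```

The bands [0, 2), [2, 3), [3, 7) and [7, 8) each contain 4 ink pixels, so dense regions get narrow cells and sparse regions get wide ones.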

Further, in Step5, the template library used is the national "863" standard handwritten Chinese character sample database HCL2000.

The beneficial effects of the invention are as follows: the method addresses the poor pertinence and low efficiency of the prior art when recognizing handwritten Chinese characters in several colors, and improves the effectiveness and accuracy of computer recognition of such characters.

Drawings

FIG. 1 is a schematic of the overall flow of the present invention;

FIG. 2 is a schematic diagram of image pre-processing according to the present invention;

FIG. 3 is a schematic diagram of a character feature extraction process according to the present invention;

FIG. 4 is a schematic diagram of character recognition according to the present invention.

Detailed Description

The invention is further described with reference to the following drawings and detailed description.

Example 1: as shown in FIGS. 1-4, a method for extracting color-differentiated handwritten Chinese characters, characterized in that, after a picture of handwritten Chinese characters in several colors is acquired, graying and binarization are first applied to remove redundant marks; mean filtering is then used to remove additive white Gaussian noise from the picture, and a second binarization removes the background; next, the characters are segmented into rows and columns with a threshold approximation method, each character is normalized and thinned, and its features are extracted; finally, the Mahalanobis distance between the feature vector of the character to be recognized and the feature vectors of the characters in the standard handwriting sample database is computed, and the character with the minimum distance is selected and output as the recognition result.

The method comprises the following specific steps:

Step3.3: line and character segmentation; lines are segmented with a threshold approximation algorithm, and characters are segmented on that basis. Because handwriting may overlap or break apart, three algorithms are used to realize character segmentation: a segmentation-line thinning algorithm, an over-segmentation elimination algorithm, and an overlapped-character stroke-breaking algorithm. The result is the single-character data Q: {q₁, q₂, …, q_n, …, q_m};

Step3.5: thinning; thin the data Q′ with a table look-up method to obtain the thinned single-character data Q″: {q₁″, q₂″, …, q_n″, …, q_m″};

While the present invention has been described in detail with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, and various changes can be made without departing from the spirit and scope of the present invention.

Claims

1. A method for extracting color-differentiated handwritten Chinese characters is characterized by comprising the following steps: after acquiring a picture of the handwritten Chinese character with various colors, firstly carrying out graying and binarization processing to remove redundant traces; then, removing Gaussian additive white noise in the picture by using mean filtering, and removing background information in the picture by using binarization again; then, the threshold approximation method is used for carrying out row segmentation and column segmentation on the Chinese characters, single character normalization processing and thinning processing are carried out on the Chinese characters, and feature extraction is carried out on the Chinese characters; finally, the distance between the feature vector of the character to be recognized and the feature vector of the character in the standard handwriting sample database is calculated by applying a Mahalanobis distance formula, and the character with the minimum corresponding distance is selected for recognition and output;

the method comprises the following specific steps:

Step3.1: smoothing and denoising; applying mean filtering to each picture P′_i, i∈[1,N], to filter out its medium- and high-frequency components;

Step3.3: line and character segmentation; segmenting lines with a threshold approximation algorithm and segmenting characters on that basis, wherein, because handwriting may overlap or break apart, character segmentation is realized with three algorithms, namely a segmentation-line thinning algorithm, an over-segmentation elimination algorithm and an overlapped-character stroke-breaking algorithm, to obtain the single-character data Q: {q₁, q₂, …, q_n, …, q_m};

2. The method for extracting color-differentiated handwritten Chinese characters as claimed in claim 1, characterized in that: in Step2, adjusting a threshold parameter M1 according to a binarization threshold method, and binarizing the teacher's handwriting and the student's handwriting so that the pixel value of the teacher's handwriting is 0 and the pixel value of the student's handwriting is 255.

3. The method for extracting color-differentiated handwritten Chinese characters as claimed in claim 1, characterized in that: in the step Step3.2, the threshold parameter M2 is adjusted according to a binarization threshold method, so that the Chinese characters in the picture are reserved and the background in the picture is removed.

4. The method for extracting color-differentiated handwritten Chinese characters as claimed in claim 1, characterized in that: in Step3.4, the image is subjected in sequence to coordinate centering, X-sharpening normalization, scaling normalization and rotation normalization, finally giving a Chinese character dot matrix of size 64 × 64.

5. The method for extracting color-differentiated handwritten Chinese characters as claimed in claim 1, characterized in that: in Step3.5, image thinning means removing some points from the original image while keeping its original shape; whether a point can be removed is judged from the state of its 8 neighboring points, subject to the following conditions: an interior point cannot be removed; an isolated point cannot be removed; a line end point cannot be removed; and if P is a boundary point, P can be removed provided that removing it does not increase the number of connected components.

6. The method for extracting color-differentiated handwritten Chinese characters as claimed in claim 1, characterized in that: in Step4, the elastic grid is a non-uniform grid obtained by dividing Chinese characters by non-uniform grid lines according to the stroke distribution of the Chinese character image; and the non-uniform reticle is determined according to the histogram projection of the Chinese character image in the horizontal direction and the vertical direction.

7. The method for extracting color-differentiated handwritten Chinese characters as claimed in claim 1, characterized in that: in Step5, the template library used is the national "863" standard handwritten Chinese character sample database HCL2000.