CN110232337A - Chinese character image stroke extraction method and system based on fully convolutional neural networks - Google Patents


Info

Publication number
CN110232337A
Authority
CN
China
Prior art keywords
stroke
overlapping region
chinese character
character image
extraction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910454930.7A
Other languages
Chinese (zh)
Other versions
CN110232337B (en)
Inventor
刘成林
王铁强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science
Priority to CN201910454930.7A
Publication of CN110232337A
Application granted
Publication of CN110232337B
Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/22 Character recognition characterised by the type of writing
    • G06V30/226 Character recognition characterised by the type of writing of cursive writing
    • G06V30/2268 Character recognition characterised by the type of writing of cursive writing using stroke segmentation

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Character Discrimination (AREA)

Abstract

The invention belongs to the field of computer vision and pattern recognition, and in particular relates to a Chinese character image stroke extraction method and system based on fully convolutional neural networks, intended to solve the problem that strokes of freely written handwritten characters are difficult to extract. The method comprises: extracting regions from the acquired Chinese character image; skeletonizing the overlapping regions and the non-overlapping regions; computing the coherence between any two stroke segments of the skeletonized overlapping regions; and connecting the stroke segments in an overlapping region that belong to the same stroke, then merging them with the directly connected stroke segments in the non-overlapping regions into complete skeleton-form strokes. On the one hand, the invention can still extract the strokes of handwritten Chinese characters when the strokes of freely written characters overlap; on the other hand, it obtains training samples by character synthesis, with the different annotation information needed by the different tasks attached, which greatly reduces labor cost.

Description

Chinese character image stroke extraction method and system based on fully convolutional neural networks
Technical field
The invention belongs to the field of computer vision and pattern recognition, and in particular relates to a Chinese character image stroke extraction method and system based on fully convolutional neural networks.
Background art
Stroke extraction from Chinese character images plays an important role in character recognition research based on structural analysis and in related applications. Classification of isolated handwritten and printed Chinese characters based on deep learning has already reached quite high accuracy. In many applications, however, people care not only about the class of a character but also about stroke interpretation, writing quality evaluation, shape beautification, font design and similar problems, all of which require the strokes in the character image to be segmented and extracted.
For stroke extraction from offline Chinese characters, existing algorithms fall into two main classes: direct extraction methods, and extraction methods based on the character skeleton. Direct extraction is mainly used for printed characters and works well when the character image has smooth edges, simple stroke shapes, a fixed stroke width and clear relationships between strokes. For example, Tseng and Chuang [1] summarized general rules from the character structures of several printed fonts and extracted strokes with heuristic rules; Cao and Tan [2] cut printed characters into stroke segments (three types in total) according to similar rules, then screened these segments and recombined them into independent strokes; Lee and Wu [3] represented the printed character image as a graph and inferred the connections between stroke segments from contour features in the stroke overlapping regions; Chen et al. [4] learned a two-dimensional manifold from standard fonts and then used the template character corresponding to the manifold (whose strokes had already been extracted) to guide stroke extraction on real printed samples.
When processing offline handwritten character images, freely written characters show high diversity and complexity in stroke shape and in the relationships between strokes, so extracting strokes directly with heuristic rules rarely achieves satisfactory results. Most existing work on stroke extraction for offline handwritten characters therefore operates on the character skeleton, which reduces stroke extraction at the level of connected regions to extraction at the level of lines [5]. The rules used by most existing skeleton-based methods are similar to the corresponding parts of the direct extraction methods. Skeleton-based stroke extraction faces skeleton distortion (especially in stroke overlapping regions) and ambiguity in how stroke segments should be connected, and no good solution to these problems exists so far.
In general, although researchers have proposed many methods for stroke extraction from printed and handwritten Chinese character images, these methods focus mainly on relatively regular characters. For freely written handwritten characters, stroke shapes and positions vary widely and stroke overlapping regions are extremely complex, which poses a great challenge to stroke extraction; existing methods have not yet produced satisfactory results.
The following documents are technical background material related to the present invention:
[1] Lin Yu Tseng and Chen-Tsun Chuang. "An efficient knowledge-based stroke extraction method for multi-font Chinese characters." Pattern Recognition, 25(12):1445-1458, 1992.
[2] Ruini Cao and Chew Lim Tan. "A model of stroke extraction from Chinese character images." In Proceedings of the 15th International Conference on Pattern Recognition, 2000.
[3] Chungnan Lee and Bohom Wu. "A Chinese-character-stroke-extraction algorithm based on contour information." Pattern Recognition, 31(6):651-663, 1998.
[4] Xudong Chen, Zhouhui Lian, Yingmin Tang, and Jianguo Xiao. "An automatic stroke extraction method using manifold learning." In Proceedings of Eurographics, 2017.
[5] Cheng-Lin Liu, In-Jung Kim, and Jin H. Kim. "Model-based stroke extraction and matching for handwritten Chinese character recognition." Pattern Recognition, 34(12):2339-2352, 2001.
[6] Tie-Qiang Wang and Cheng-Lin Liu. "Fully convolutional network based skeletonization for handwritten Chinese characters." AAAI Conference on Artificial Intelligence, 2018.
[7] Byungsoo Kim, Oliver Wang, A. Cengiz and Markus Gross. "Semantic segmentation for line drawing vectorization using neural networks." In Proceedings of Eurographics, 2017.
Summary of the invention
To solve the above problem in the prior art, namely that strokes of freely written handwritten characters are difficult to extract, the present invention provides a Chinese character image stroke extraction method based on fully convolutional neural networks, the extraction method comprising:
Step S10: obtain a Chinese character image as the input image;
Step S20: extract the overlapping region map of the strokes in the input image; the part of the input image that remains after removing the overlapping regions is the non-overlapping region map;
Step S30: skeletonize the overlapping region map and the non-overlapping region map respectively, obtaining a set of skeleton-form stroke segments for the overlapping regions and a set of skeleton-form stroke segments for the non-overlapping regions;
Step S40: based on the set of skeleton-form stroke segments of the overlapping regions, compute the coherence matrix between any two stroke segments; two stroke segments belong to the same stroke if every element of their coherence matrix is greater than or equal to a preset threshold;
Step S50: connect the stroke segments in the overlapping region that belong to the same stroke, and merge them with the directly connected stroke segments in the non-overlapping regions into complete skeleton-form strokes.
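The decision rule of step S40 and the grouping implied by step S50 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `group_segments`, the dictionary layout of the coherence matrices, and the use of union-find (which merges segments transitively) are assumptions made for the example.

```python
def group_segments(num_segments, coherence, threshold):
    """Group stroke segments of one overlapping region into strokes.

    coherence maps a pair (a, b) of segment indices to their N x N
    coherence matrix (hypothetical layout). Two segments are merged only
    if EVERY element of the matrix is >= threshold, as in step S40.
    """
    parent = list(range(num_segments))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for (a, b), matrix in coherence.items():
        if all(v >= threshold for row in matrix for v in row):
            parent[find(a)] = find(b)

    groups = {}
    for i in range(num_segments):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Toy example: segments 0 and 1 are coherent, segment 2 is not.
coherence = {
    (0, 1): [[0.90, 0.80], [0.95, 0.85]],
    (0, 2): [[0.90, 0.20], [0.10, 0.30]],
    (1, 2): [[0.10, 0.10], [0.20, 0.20]],
}
strokes = group_segments(3, coherence, threshold=0.5)
```

Each returned group lists the segment indices that would be connected into one stroke; the threshold value 0.5 is arbitrary here, since the text only calls it a preset threshold.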
In some preferred embodiments, "obtain a Chinese character image as the input image" in step S10 is carried out as follows:
Obtain the captured Chinese character image, remove its background with a global binarization algorithm based on the OTSU method or with a locally adaptive binarization algorithm to obtain the foreground image of the Chinese character image, and use the foreground image as the input image.
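The global variant can be sketched in pure Python. This is an illustrative implementation of Otsu's method, assuming an 8-bit grayscale image stored as a list of rows and dark ink on a light background; the function names are hypothetical and the patent does not prescribe this code.

```python
def otsu_threshold(gray):
    """Return the gray level t that maximizes the between-class variance
    of the two classes {pixels <= t} and {pixels > t}."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels in the dark class so far
    sum0 = 0    # sum of gray levels in the dark class so far
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2  # proportional to between-class variance
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(gray):
    """Foreground (ink) = 1, background = 0; assumes dark strokes on light paper."""
    t = otsu_threshold(gray)
    return [[1 if v <= t else 0 for v in row] for row in gray]


gray = [[250, 20, 245],
        [240, 30, 255]]
foreground = binarize(gray)
```

For light strokes on a dark background the comparison in `binarize` would be reversed.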
In some preferred embodiments, "extract the overlapping region map of the strokes in the input image; the part of the input image that remains after removing the overlapping regions is the non-overlapping region map" in step S20 is carried out as follows:
Step S201: based on the input image, extract the features of the input image through the contracting path of the overlapping region extraction network;
Step S202: based on the features of the input image, perform reverse generation through the expanding path, which is symmetric to the contracting path of the overlapping region extraction network, to obtain the stroke overlapping region map; the part of the input image that remains after removing the overlapping regions is the non-overlapping region map;
Here, the overlapping region extraction network is a network built on a fully convolutional neural network for extracting the overlapping region map of the strokes in the input image.
In some preferred embodiments, "compute the coherence matrix between any two stroke segments" in step S40 is carried out as follows:
Step S401: choose any two stroke segments from the set of skeleton-form stroke segments of the overlapping regions, denoted S1 and S2 respectively;
Step S402: choose N points uniformly on each of the stroke segments S1 and S2, giving one set of N points per segment;
Step S403: using the conditional fully convolutional network, compute the probability that any point of the first set and any point of the second set belong to the same stroke, obtaining N × N probabilities, which form the coherence matrix between stroke segments S1 and S2.
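Steps S401 to S403 can be sketched as below. The sketch assumes that each stroke segment is given as a polyline of at least two (x, y) points and that uniform sampling means equal arc-length spacing; `same_stroke_prob` is a hypothetical stand-in for the conditional fully convolutional network, which is not reproduced here.

```python
import math


def sample_uniform(polyline, n):
    """Return n points evenly spaced by arc length along a polyline."""
    seg_lengths = [math.dist(p, q) for p, q in zip(polyline, polyline[1:])]
    total = sum(seg_lengths)
    points = []
    for k in range(n):
        target = total * k / (n - 1) if n > 1 else 0.0
        walked = 0.0
        for i, seg_len in enumerate(seg_lengths):
            last = i == len(seg_lengths) - 1
            if walked + seg_len >= target or last:
                # Interpolate inside this piece; clamp against rounding error.
                t = min(1.0, (target - walked) / seg_len) if seg_len else 0.0
                p, q = polyline[i], polyline[i + 1]
                points.append((p[0] + t * (q[0] - p[0]),
                               p[1] + t * (q[1] - p[1])))
                break
            walked += seg_len
    return points


def coherence_matrix(seg1, seg2, n, same_stroke_prob):
    """N x N matrix of pairwise same-stroke probabilities between two segments."""
    pts1 = sample_uniform(seg1, n)
    pts2 = sample_uniform(seg2, n)
    return [[same_stroke_prob(p, q) for q in pts2] for p in pts1]


# Toy probability model (assumption): points on the same horizontal line
# are treated as belonging to the same stroke.
toy_prob = lambda p, q: 1.0 if p[1] == q[1] else 0.0
matrix = coherence_matrix([(0, 0), (4, 0)], [(6, 0), (10, 0)], 3, toy_prob)
```

In the method described above, the probabilities would come from the trained network rather than from a geometric rule such as `toy_prob`.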
In some preferred embodiments, the training samples of the skeleton extraction network are obtained as follows:
Step B10: use the stroke coordinate point sequences of an online handwritten character as the skeleton of a synthetic character image, and set a stroke width;
Step B20: based on the skeleton of the synthetic character image, expand the skeleton into strokes of the set width to obtain the synthetic character image; the synthetic character image and its corresponding skeleton form a training sample of the skeleton extraction network.
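Steps B10 and B20 can be illustrated with a small rasterizer. The sketch assumes that each stroke is a polyline of (x, y) points taken from the online trajectory and that expanding the skeleton means marking every pixel within half the stroke width of the polyline; the function names are hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the line segment a-b."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    denom = dx * dx + dy * dy
    t = 0.0 if denom == 0 else max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / denom))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def render_stroke(points, width, size):
    """Rasterize one stroke: pixels within width/2 of the polyline become 1."""
    h, w = size
    mask = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            d = min(point_segment_distance((x, y), a, b)
                    for a, b in zip(points, points[1:]))
            if d <= width / 2:
                mask[y][x] = 1
    return mask


def synthesize(strokes, width, size):
    """Return the synthetic character image and the per-stroke masks."""
    masks = [render_stroke(s, width, size) for s in strokes]
    h, w = size
    image = [[int(any(m[y][x] for m in masks)) for x in range(w)] for y in range(h)]
    return image, masks


# One horizontal stroke from (1, 3) to (5, 3), width 3, on a 7 x 7 canvas.
image, masks = synthesize([[(1, 3), (5, 3)]], width=3, size=(7, 7))
```

The input polylines serve as the ground-truth skeleton, so the pair (image, polylines) is one training sample in the sense of step B20. The per-stroke masks are kept because they make it easy to derive further labels.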
In some preferred embodiments, the training samples of the overlapping region extraction network are obtained as follows:
Step G10: obtain a synthetic character image using steps B10 to B20 of the above Chinese character image stroke extraction method based on fully convolutional neural networks;
Step G20: based on the stroke coordinate point sequence information of the synthetic character image, compute the stroke overlapping regions of the synthetic character image; the synthetic character image and its corresponding stroke overlapping regions form a training sample of the overlapping region extraction network.
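Because each synthetic stroke is rendered separately, the overlap label of step G20 can be computed directly. The sketch below assumes per-stroke binary masks of equal size (lists of rows) and treats a pixel as overlapping when at least two strokes cover it; the function names are illustrative.

```python
def overlap_region(stroke_masks):
    """Pixels covered by two or more strokes form the overlapping region."""
    h, w = len(stroke_masks[0]), len(stroke_masks[0][0])
    return [[1 if sum(m[y][x] for m in stroke_masks) >= 2 else 0
             for x in range(w)] for y in range(h)]


def non_overlap_region(stroke_masks):
    """Foreground pixels covered by exactly one stroke."""
    h, w = len(stroke_masks[0]), len(stroke_masks[0][0])
    return [[1 if sum(m[y][x] for m in stroke_masks) == 1 else 0
             for x in range(w)] for y in range(h)]


# A horizontal and a vertical stroke crossing at the centre of a 5 x 5 image.
horizontal = [[1 if y == 2 else 0 for x in range(5)] for y in range(5)]
vertical = [[1 if x == 2 else 0 for x in range(5)] for y in range(5)]
overlap = overlap_region([horizontal, vertical])
rest = non_overlap_region([horizontal, vertical])
```

Here only the crossing pixel is labeled as overlap, and the remaining eight stroke pixels form the non-overlapping region.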
In some preferred embodiments, "obtaining a set of skeleton-form stroke segments for the overlapping regions and a set of skeleton-form stroke segments for the non-overlapping regions" in step S30 is followed by an optimization step for the skeleton-form stroke segments, carried out as follows:
Compute the center of gravity of each overlapping region in the overlapping region map, obtain all skeleton points adjacent to the region corresponding to that center of gravity, and connect the center of gravity to each adjacent skeleton point in the skeleton map, obtaining the optimized set of skeleton-form stroke segments for the overlapping regions;
Based on the non-overlapping regions of the strokes, recall skeleton pixels by clustering, obtaining the optimized set of skeleton-form stroke segments for the non-overlapping regions.
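The first optimization step can be sketched as follows. The text does not say how the center of gravity is connected to the adjacent skeleton points; this example assumes straight pixel lines drawn with Bresenham's algorithm, and it assumes the region pixels and the adjacent skeleton points are already known. All names are hypothetical.

```python
def bresenham(p, q):
    """Integer pixel coordinates on the straight line from p to q (inclusive)."""
    x0, y0 = p
    x1, y1 = q
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    pts = []
    while True:
        pts.append((x0, y0))
        if (x0, y0) == (x1, y1):
            break
        e2 = 2 * err
        if e2 >= dy:
            err += dy
            x0 += sx
        if e2 <= dx:
            err += dx
            y0 += sy
    return pts


def centroid(pixels):
    """Center of gravity of a pixel set, rounded to the nearest pixel."""
    n = len(pixels)
    return (round(sum(x for x, _ in pixels) / n),
            round(sum(y for _, y in pixels) / n))


def connect_to_centroid(region_pixels, adjacent_skeleton_points):
    """Replace the skeleton inside one overlapping region by straight links
    from the region's center of gravity to each adjacent skeleton point."""
    c = centroid(region_pixels)
    skeleton = set()
    for p in adjacent_skeleton_points:
        skeleton.update(bresenham(c, p))
    return c, skeleton


# A 3 x 3 overlapping region with three skeleton branches arriving at it.
region = [(x, y) for x in range(1, 4) for y in range(1, 4)]
center, links = connect_to_centroid(region, [(0, 2), (5, 2), (2, 0)])
```

Replacing the skeleton inside the region in this way sidesteps the skeleton distortion that thinning tends to produce where strokes cross.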
In some preferred embodiments, "connect the stroke segments in the overlapping region that belong to the same stroke, and merge them with the directly connected stroke segments in the non-overlapping regions into complete skeleton-form strokes" in step S50 is followed by a step that restores the original stroke shapes, carried out as follows:
Associate the pixels of the input image with the complete skeleton-form strokes to obtain the strokes of the Chinese character image, which are the original stroke shapes of the input image.
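The text does not specify how pixels are associated with skeleton strokes. One simple possibility, shown here purely as an assumption, is to assign each foreground pixel to the stroke whose skeleton is nearest; `assign_pixels` and its data layout are hypothetical.

```python
def assign_pixels(foreground, skeleton_strokes):
    """Map each foreground pixel (x, y) to the index of the nearest skeleton stroke.

    skeleton_strokes is a list of strokes, each a list of (x, y) skeleton pixels.
    """
    labels = {}
    for (x, y) in foreground:
        labels[(x, y)] = min(
            range(len(skeleton_strokes)),
            key=lambda k: min((x - sx) ** 2 + (y - sy) ** 2
                              for sx, sy in skeleton_strokes[k]),
        )
    return labels


# Two horizontal skeleton strokes at y = 1 and y = 5.
skeletons = [[(0, 1), (1, 1), (2, 1)], [(0, 5), (1, 5), (2, 5)]]
labels = assign_pixels([(1, 0), (1, 2), (1, 4), (1, 6)], skeletons)
```

A nearest-skeleton rule gives each pixel exactly one stroke label. Pixels in an overlapping region belong to several strokes at once, so a fuller treatment would let such pixels carry more than one label.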
Another aspect of the present invention proposes a Chinese character image stroke extraction system based on fully convolutional neural networks, comprising an input module, a region extraction module, a skeletonization module, a stroke judgment module, a skeleton connection module and an output module;
The input module is configured to obtain a Chinese character image as the input image and feed it in;
The region extraction module is configured to extract the overlapping region map of the strokes in the input image; the part of the input image that remains after removing the overlapping regions is the non-overlapping region map;
The skeletonization module is configured to skeletonize the overlapping region map and the non-overlapping region map, obtaining a set of skeleton-form stroke segments for the overlapping regions and a set of skeleton-form stroke segments for the non-overlapping regions;
The stroke judgment module is configured to compute, based on the set of skeleton-form stroke segments of the overlapping regions, the coherence matrix between any two stroke segments; two stroke segments belong to the same stroke if every element of their coherence matrix is greater than or equal to a preset threshold;
The skeleton connection module is configured to connect the stroke segments in the overlapping region that belong to the same stroke, and to merge them with the directly connected stroke segments in the non-overlapping regions into complete skeleton-form strokes;
The output module is configured to output the complete skeleton-form strokes obtained.
A third aspect of the present invention proposes a storage device storing a plurality of programs, the programs being adapted to be loaded and executed by a processor to implement the above Chinese character image stroke extraction method based on fully convolutional neural networks.
A fourth aspect of the present invention proposes a processing device comprising a processor and a storage device; the processor is adapted to execute programs; the storage device is adapted to store a plurality of programs; the programs are adapted to be loaded and executed by the processor to implement the above Chinese character image stroke extraction method based on fully convolutional neural networks.
Beneficial effects of the present invention:
(1) In the Chinese character image stroke extraction method based on fully convolutional neural networks of the present invention, the conditional fully convolutional neural network adequately describes the stroke shapes, the stroke positions and the spatial relationships between strokes in printed Chinese characters, with no additional post-processing needed.
(2) To cope with the variable writing styles and stroke widths of handwritten Chinese characters, the method uses a skeleton extraction module and performs stroke extraction on a skeleton one pixel wide; this preserves the structure of the handwritten character and significantly reduces computation.
(3) The method detects the stroke overlapping regions in the Chinese character image and handles them separately: it obtains the set of all stroke segments that converge on the same overlapping region, analyzes the segments in this set pairwise, describes the relationship between two stroke segments with their stroke coherence matrix, and decides whether the two segments connect into one stroke, which avoids omitted or redundant strokes in handwritten character stroke extraction.
(4) The method provides a character image synthesis procedure for training that can automatically generate millions of offline handwritten Chinese character images with the different annotation information required by the different tasks attached, which greatly reduces labor cost.
Brief description of the drawings
Other features, objects and advantages of the application will become more apparent from the following detailed description of non-limiting embodiments, read with reference to the accompanying drawings:
Fig. 1 is a flow diagram of the Chinese character image stroke extraction method based on fully convolutional neural networks of the present invention;
Fig. 2 is a schematic diagram of the structure of the fully convolutional neural network used by the overlapping region extraction network and the skeletonization network in one embodiment of the method;
Fig. 3 is a schematic diagram of the training data flow for the region extraction network and the skeleton extraction network in one embodiment of the method;
Fig. 4 is a schematic diagram of the post-processing applied after handwritten character skeletonization in one embodiment of the method;
Fig. 5 is a schematic diagram of the training method of the conditional fully convolutional network in one embodiment of the method;
Fig. 6 is an example of stroke extraction based on stroke coherence in one embodiment of the method.
Detailed description of the embodiments
The application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts relevant to the invention.
It should be noted that, where no conflict arises, the embodiments of the application and the features in those embodiments may be combined with each other. The application is described in detail below with reference to the drawings and in conjunction with the embodiments.
A Chinese character image stroke extraction method based on fully convolutional neural networks according to the present invention comprises:
Step S10: obtain a Chinese character image as the input image;
Step S20: extract the overlapping region map of the strokes in the input image; the part of the input image that remains after removing the overlapping regions is the non-overlapping region map;
Step S30: skeletonize the overlapping region map and the non-overlapping region map, obtaining a set of skeleton-form stroke segments for the overlapping regions and a set of skeleton-form stroke segments for the non-overlapping regions;
Step S40: based on the set of skeleton-form stroke segments of the overlapping regions, compute the coherence matrix between any two stroke segments; two stroke segments belong to the same stroke if every element of their coherence matrix is greater than or equal to a preset threshold;
Step S50: connect the stroke segments in the overlapping region that belong to the same stroke, and merge them with the directly connected stroke segments in the non-overlapping regions into complete skeleton-form strokes.
To explain the Chinese character image stroke extraction method based on fully convolutional neural networks more clearly, each step of the method embodiment is described in detail below with reference to Fig. 1.
The Chinese character image stroke extraction method based on fully convolutional neural networks of one embodiment of the present invention comprises steps S10 to S50, each described in detail as follows:
Step S10: obtain a Chinese character image as the input image.
Character recognition is an important branch of pattern recognition and one of the most difficult problems in the recognition field. Stroke extraction from Chinese characters plays an important role in character recognition research based on structural analysis and in related applications. Character recognition divides into two broad classes, printed character recognition and handwritten character recognition. Because printed text has a regular format and clear strokes, research on printed character recognition has advanced considerably. Handwriting, by contrast, varies widely in style, its strokes are often joined, and its deformations produce many characters that look highly similar, so recognizing handwriting is considerably harder.
"Obtain a Chinese character image as the input image" in step S10 is carried out as follows:
Obtain the captured Chinese character image, remove its background with a global binarization algorithm based on the OTSU method or with a locally adaptive binarization algorithm to obtain the foreground image of the Chinese character image, and use the foreground image as the input image. The method of the present invention can extract strokes not only from handwritten Chinese character images but also, with good results, from printed Chinese character images.
The purpose of image binarization is to eliminate the influence of the gray values of stroke pixels on subsequent stroke extraction. The global binarization algorithm based on the OTSU method is generally used for Chinese character images with relatively uniform illumination, while the locally adaptive binarization algorithm is generally used for images with uneven illumination. Many other image binarization methods exist, and a suitable one can be chosen according to the characteristics of the image; they are not enumerated here.
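The contrast between the two approaches can be illustrated with a locally adaptive rule. The text does not name a specific local algorithm; the sketch below assumes a simple local-mean threshold, and `radius` and `offset` are illustrative parameters, not values from the patent.

```python
def adaptive_binarize(gray, radius=1, offset=15):
    """Mark a pixel as ink (1) if it is darker than the mean of its
    (2*radius+1)^2 neighbourhood by more than `offset` gray levels."""
    h, w = len(gray), len(gray[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [gray[j][i]
                      for j in range(max(0, y - radius), min(h, y + radius + 1))
                      for i in range(max(0, x - radius), min(w, x + radius + 1))]
            local_mean = sum(window) / len(window)
            out[y][x] = 1 if gray[y][x] < local_mean - offset else 0
    return out


# Background fades from 200 (left) to 100 (right); ink is 70 levels darker
# than the local background. The ink pixel on the left (110) is brighter
# than the background on the right (100), so no single global threshold
# can separate ink from background here.
gray = [[200, 180, 160, 140, 120, 100],
        [200, 110, 160, 140,  50, 100],
        [200, 180, 160, 140, 120, 100]]
ink = adaptive_binarize(gray)
```

Comparing each pixel with its own neighbourhood lets the rule follow the illumination gradient, which is why local methods suit unevenly lit images.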
Step S20: extract the overlapping region map of the strokes in the input image; the part of the input image that remains after removing the overlapping regions is the non-overlapping region map.
In this embodiment of the invention, an overlapping region extraction network built on a fully convolutional neural network extracts the overlapping regions of the input image; the part of the input image outside the overlapping regions is the non-overlapping region. The overlapping region extraction network consists of two symmetric parts: a contracting path and an expanding path. The contracting path extracts image features. A feature is an (essential) characteristic, or a set of such characteristics, that distinguishes one class of objects from other classes; features are data that can be extracted through measurement or processing. The expanding path performs reverse generation from the extracted features and produces the stroke overlapping region map of the character.
Step S201: based on the input image, extract the features of the input image through the contracting path of the overlapping region extraction network.
Step S202: based on the features of the input image, perform reverse generation through the expanding path, which is symmetric to the contracting path of the overlapping region extraction network, to obtain the stroke overlapping region map; the part of the input image that remains after removing the overlapping regions is the non-overlapping region map.
Here, the overlapping region extraction network is a network built on a fully convolutional neural network for extracting the overlapping region map of the strokes in the input image.
Step S30: skeletonize the overlapping region map and the non-overlapping region map, obtaining a set of skeleton-form stroke segments for the overlapping regions and a set of skeleton-form stroke segments for the non-overlapping regions.
In this embodiment of the invention, the stroke overlapping region map and the non-overlapping region map are skeletonized with a fully convolutional neural network.
Fig. 2 is a schematic diagram of the structure of the fully convolutional neural network used by the overlapping region extraction network in one embodiment of the method. In the figure, the picture with a character in the upper left corner is the input to the network, and the picture with two dots in the upper right corner is the overlapping region image output by the network; the remaining blocks represent convolutional computing units. For ease of understanding and description, each computing unit in Fig. 2 and in this example consists of three convolutional layers. The downward arrows on the left represent downsampling in the contracting path, the upward arrows on the right represent upsampling in the expanding path, and the rightward arrows between positions of equal depth in the two symmetric paths represent residual-style connections.
Training the fully convolutional neural networks used by the overlapping region extraction network and the skeleton extraction network requires millions of handwritten character samples, each annotated with its stroke overlapping regions; collecting handwritten character images and labeling them manually can hardly yield training samples in such numbers. The present invention instead synthesizes character images from online handwritten characters. Because online handwritten characters carry stroke trajectory information (stroke coordinate point sequences), the stroke overlapping regions are easy to compute, the burden of manual annotation is avoided, and a large number of training samples can be obtained in a short time. The stroke point sequences of an online handwritten character are used as the skeleton of the character image, each stroke is assigned a stroke width, and the skeleton is expanded into strokes of that width, which yields a synthetic character image consistent with real character images; the intersection of the pixels of different strokes is the stroke overlapping region.
The training samples of the skeleton extraction network are obtained as follows:
Step B10: use the stroke coordinate point sequences of an online handwritten character as the skeleton of a synthetic character image, and set a stroke width.
Step B20: based on the skeleton of the synthetic character image, expand the skeleton into strokes of the set width to obtain the synthetic character image; the synthetic character image and its corresponding skeleton form a training sample of the skeleton extraction network.
The training samples of the overlapping region extraction network are obtained as follows:
Step G10: obtain a synthetic character image using steps B10 to B20 of the above Chinese character image stroke extraction method based on fully convolutional neural networks;
Step G20: based on the stroke coordinate point sequence information of the synthetic character image, compute the stroke overlapping regions of the synthetic character image; the synthetic character image and its corresponding stroke overlapping regions form a training sample of the overlapping region extraction network.
Fig. 3 is a schematic diagram of the training process of the overlapping-region extraction network and the skeleton extraction network in one embodiment of the Chinese character image stroke extraction method based on a fully convolutional neural network of the present invention. The modules in the first column are the training sample input; the second column represents four groups of convolution units; the convolution units in the third, fourth, fifth and sixth columns are conventional convolution branches at different scales; and the seventh column is the output of network training. When convolution and pooling operate together, the image size is reduced by multiples, so upsampling layers are needed to restore it. The embodiment of the present invention uses learnable upsampling layers in place of bilinear interpolation upsampling, so that the model can recover more image detail. In the final multi-scale feature fusion stage, the present invention fuses the last predicted images directly with a convolution operation to obtain the final output image. This makes full use of the additional local information in a larger receptive field to infer whether the center point of the current receptive field should be judged a skeleton point.
After "obtaining the overlapping-region skeleton-form stroke segment set and the non-overlapping-region skeleton-form stroke segment set" in step S30, an optimization step for the skeleton-form stroke segments is further provided, as follows:
Compute the center of gravity of each overlapping region in the overlapping-region map, obtain all skeleton points adjacent to the region corresponding to that center of gravity, and connect the center of gravity to each adjacent skeleton point in the skeleton map, obtaining the optimized overlapping-region skeleton-form stroke segment set;
Based on the non-overlapping regions, recall skeleton pixels by clustering, obtaining the optimized non-overlapping-region skeleton-form stroke segment set.
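The clustering-based recall of skeleton pixels can be sketched as a one-dimensional K-means over per-pixel skeleton scores. This is an illustrative sketch under assumptions: the input is taken to be the network's per-pixel score in [0, 1], three clusters are used, and the function name and the deterministic initialization are hypothetical choices, not details given in the source.

```python
import numpy as np

def classify_pixels(prob_map, k=3, iters=50):
    """Cluster pixel scores into k classes ranked by cluster center.

    With k = 3 the returned map holds 0 (non-skeleton), 1 (transition)
    and 2 (skeleton). Initial centers are spread evenly over the score
    range so the result is deterministic (an assumption of this sketch).
    """
    values = np.asarray(prob_map, dtype=float).ravel()
    centers = np.linspace(values.min(), values.max(), k)
    for _ in range(iters):
        # Assign every pixel to its nearest center.
        labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
        # Recompute centers; an empty cluster keeps its previous center.
        updated = np.array([
            values[labels == j].mean() if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(updated, centers):
            break
        centers = updated
    # Map cluster indices to ranks so that a higher class means a higher score.
    rank = np.empty(k, dtype=int)
    rank[np.argsort(centers)] = np.arange(k)
    return rank[labels].reshape(np.shape(prob_map))
```

Which classes are kept before thinning is a design choice; keeping classes 1 and 2 recalls more skeleton points at the cost of more redundant points to thin away.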
Fig. 4 is a schematic diagram of the post-processing of handwritten character skeletonization in one embodiment of the Chinese character image stroke extraction method based on a fully convolutional neural network of the present invention. In the figure, K-means is the K-means clustering method, bwmorph_thin@Matlab is a traditional thinning algorithm, and sigmoid denotes the sigmoid function, also known as the S-shaped growth curve. K-means clustering roughly divides the pixels of the image into three classes: skeleton points, non-skeleton points, and transition points between the two. This operation recalls most skeleton points but inevitably introduces a small number of redundant non-skeleton points. In one example of the invention, the simple rules of a traditional thinning algorithm are therefore used to delete these redundant points, ensuring that the lines in the skeleton map are a single pixel wide. Next, the overlapping region outlined by a rectangle in the figure is represented by its center of gravity (the starting point of the four arrows in the figure). This region has four abutting points with the skeleton points (the end points of the four arrows), so connecting the center of gravity to the four abutting points in the skeleton map gives an ideal skeletonization result for this region.
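The center-of-gravity step for one overlapping region can be sketched as below. The helper names, the 8-neighbourhood definition of "adjacent", the straight-line connection and the choice to discard skeleton pixels inside the region are assumptions of this example rather than details stated in the source.

```python
import numpy as np

def draw_line(canvas, p0, p1):
    """Mark the pixels on the straight segment p0 -> p1, given as (row, col)."""
    (r0, c0), (r1, c1) = p0, p1
    n = max(abs(r1 - r0), abs(c1 - c0)) + 1
    rows = np.rint(np.linspace(r0, r1, n)).astype(int)
    cols = np.rint(np.linspace(c0, c1, n)).astype(int)
    canvas[rows, cols] = True

def bridge_overlap(skeleton, region_mask):
    """Connect the centroid of one overlapping region to adjacent skeleton points."""
    rr, cc = np.nonzero(region_mask)
    centroid = (int(round(rr.mean())), int(round(cc.mean())))
    # Grow the region by one pixel (8-neighbourhood) using padded shifts.
    h, w = region_mask.shape
    padded = np.pad(region_mask, 1)
    dilated = np.zeros_like(region_mask)
    for dr in range(3):
        for dc in range(3):
            dilated |= padded[dr:dr + h, dc:dc + w]
    border = dilated & ~region_mask
    adjacent = np.argwhere(border & skeleton)  # skeleton points touching the region
    # Assumption: skeleton pixels inside the region are replaced by the bridges.
    out = skeleton & ~region_mask
    for r, c in adjacent:
        draw_line(out, centroid, (int(r), int(c)))
    return out, centroid, adjacent
```

For a region crossed by two strokes this produces four bridges meeting at the centroid, matching the four arrows described for Fig. 4.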
Step S40: based on the overlapping-region skeleton-form stroke segment set, compute the coherence matrix between any two stroke segments; two stroke segments whose coherence matrix has all elements greater than or equal to a preset threshold belong to the same stroke.
In the embodiment of the present invention, the coherence matrix between any two stroke segments is computed with a conditional fully convolutional network. The conditional fully convolutional network is built on a fully convolutional neural network and trained with a base-point-guided conditional method:
The network input has two channels and the output has one channel. The first input channel is the handwritten character image; the second is a mask of the base point. The mask has the same size as the character image; its value is 1 only at the coordinates of the base point and 0 everywhere else. The mask serves as the conditional input of the fully convolutional network, and the network output is the complete stroke that contains the base point in the mask. The VDSR unit is a standard convolution unit with a very simple structure, consisting only of a convolutional layer, an activation layer and a batch normalization layer. In this part of the model, the number of VDSR units depends on two variables, the convolution kernel size and the image size: VDSR units are stacked one after another until the receptive field of the model completely covers the whole image.
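The two-channel conditional input and the stacking rule can be sketched as follows. The function names are hypothetical, and the unit count assumes one stride-1, undilated convolution per VDSR unit and reads "covers the whole image" as a receptive field at least as wide as the longer image side; the source does not fix either point.

```python
import math
import numpy as np

def make_conditional_input(char_image, base_point):
    """Stack the character image with a one-hot base-point mask: (2, H, W)."""
    h, w = char_image.shape
    mask = np.zeros((h, w), dtype=np.float32)
    r, c = base_point
    mask[r, c] = 1.0  # 1 at the base point, 0 everywhere else
    return np.stack([char_image.astype(np.float32), mask], axis=0)

def units_to_cover(image_size, kernel_size=3):
    """Smallest n such that n stacked k x k convolutions see image_size pixels.

    Stacking n stride-1 convolutions of kernel size k gives a receptive
    field of 1 + n * (k - 1) pixels along each axis.
    """
    return math.ceil((image_size - 1) / (kernel_size - 1))
```

For example, under these assumptions a 64-pixel-wide image with 3 x 3 kernels needs `units_to_cover(64, 3)` stacked units before the receptive field reaches 65 pixels.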
Fig. 5 is a schematic diagram of the training method of the conditional fully convolutional network in one embodiment of the Chinese character image stroke extraction method based on a fully convolutional neural network of the present invention. The left side is the two-channel network input, consisting of the handwritten character image and a mask of the base point; the right side is the single-channel network output, which is the complete stroke containing the base point in the mask. In the middle, VDSR UNITS are standard convolution units, conv denotes the convolutional layer, ReLU denotes the activation layer, and BatchNorm denotes the batch normalization layer.
Step S401: select any two stroke segments in the overlapping-region skeleton-form stroke segment set, denoted S1 and S2 respectively;
Step S402: uniformly select N points on each of the stroke segments S1 and S2, giving one point set per segment;
Step S403: use the conditional fully convolutional network to compute the probability that any two points, one from each of the two point sets, belong to the same stroke; the resulting N × N probabilities constitute the coherence matrix between stroke segments S1 and S2.
The probability that two points, one from each of the two sets, belong to the same stroke is calculated as shown in formula (1):
where p_nn is the probability that a pair of points, one from each set, belongs to the same stroke, and f_vdsr(·) denotes the conditional fully convolutional network.
The coherence matrix between S1 and S2 is shown in formula (2):
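Formulas (1) and (2) can be written in a form consistent with steps S401 to S403, sketched below. The symbols a_i, b_j, I and M(·), and the reading of the network output at point b_j, are notation assumed here for illustration and may differ from the patent's own.

```latex
% Assumed notation: a_i (i = 1..N) are the points sampled on S_1,
% b_j (j = 1..N) the points sampled on S_2, I the character image,
% and M(a_i) the base-point mask that is 1 at a_i and 0 elsewhere.
% (1) probability that a_i and b_j lie on the same stroke:
p_{ij} = \bigl[ f_{vdsr}\bigl(I,\, M(a_i)\bigr) \bigr](b_j),
\qquad i, j = 1, \dots, N
% (2) coherence matrix between S_1 and S_2:
C(S_1, S_2) =
\begin{pmatrix}
p_{11} & \cdots & p_{1N} \\
\vdots & \ddots & \vdots \\
p_{N1} & \cdots & p_{NN}
\end{pmatrix}
```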
In one embodiment of the invention, if all values of the matrix are greater than or equal to the preset threshold 0.5, stroke segments S1 and S2 belong to the same stroke.
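Steps S401 to S403 and the threshold test can be sketched as follows. `predict_stroke` is a hypothetical stand-in for the trained conditional fully convolutional network; the sketch assumes it returns a per-pixel probability map for the stroke containing the base point, and that stroke segments are given as ordered lists of (row, col) pixels.

```python
import numpy as np

def sample_points(segment, n):
    """Pick n points evenly spaced by index along an ordered pixel list."""
    seg = np.asarray(segment)
    idx = np.rint(np.linspace(0, len(seg) - 1, n)).astype(int)
    return seg[idx]

def coherence_matrix(image, seg1, seg2, predict_stroke, n=5):
    """N x N matrix of same-stroke probabilities between two stroke segments."""
    pts1, pts2 = sample_points(seg1, n), sample_points(seg2, n)
    matrix = np.zeros((n, n))
    for i, (r, c) in enumerate(pts1):
        mask = np.zeros_like(image, dtype=float)
        mask[r, c] = 1.0                        # base-point mask for this point
        prob_map = predict_stroke(image, mask)  # assumed per-pixel stroke map
        matrix[i] = prob_map[pts2[:, 0], pts2[:, 1]]
    return matrix

def same_stroke(matrix, threshold=0.5):
    """True when every element reaches the preset threshold."""
    return bool(np.all(matrix >= threshold))
```

Requiring every element to pass, rather than the mean, makes the decision conservative: a single low-probability pair is enough to keep two segments apart.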
Fig. 6 shows an example of stroke extraction based on stroke coherence in one embodiment of the Chinese character image stroke extraction method based on a fully convolutional neural network of the present invention. The upper left is a handwritten Chinese character sample; the upper right is the skeleton map of that sample; the points in the lower-left image are examples of points selected on stroke segments in the overlapping regions; and the lower right is the stroke extraction result output for the handwritten Chinese character sample.
Step S50: connect the stroke segments in the overlapping regions that belong to the same stroke, and merge them with the stroke segments in the non-overlapping regions that are directly connected to them, forming complete skeleton-form strokes.
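One way to realize the merging in step S50 is a union-find pass over segment indices, sketched below. The source does not name a data structure; the function name, the integer segment indices and the transitive treatment of the "same stroke" relation are assumptions of this example.

```python
def merge_segments(num_segments, linked_pairs):
    """Group segment indices into strokes given pairwise links.

    `linked_pairs` holds (i, j) pairs that should end up in one stroke:
    overlapping-region segments judged to belong to the same stroke, and
    segments directly connected across a region boundary.
    """
    parent = list(range(num_segments))

    def find(i):
        # Follow parents to the root, halving the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in linked_pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    groups = {}
    for i in range(num_segments):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Each returned group lists the segments whose pixels together form one complete skeleton-form stroke.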
After "connecting the stroke segments in the overlapping regions that belong to the same stroke, and merging them with the directly connected stroke segments in the non-overlapping regions into complete skeleton-form strokes" in step S50, a step of restoring the original stroke form is further provided, as follows:
Associate the pixels of the input image with the complete skeleton-form strokes to obtain the Chinese character image strokes, which are the original stroke forms of the input image.
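The association of image pixels with skeleton strokes is not detailed in the text. A minimal sketch, assuming a nearest-skeleton rule and a hypothetical function name, is given below; note that this rule gives each pixel to exactly one stroke, so pixels in overlapping regions would need separate handling.

```python
import numpy as np

def assign_pixels(foreground, stroke_skeletons):
    """Label each foreground pixel with the index of its nearest skeleton stroke.

    `foreground` is a boolean image; `stroke_skeletons` is a list of
    (row, col) point lists, one per complete skeleton-form stroke.
    Background pixels are labeled -1.
    """
    fg = np.argwhere(foreground)              # (P, 2) foreground coordinates
    best = np.full(len(fg), np.inf)           # smallest distance found so far
    label = np.full(len(fg), -1)
    for k, pts in enumerate(stroke_skeletons):
        pts = np.asarray(pts, dtype=float)
        # Distance from every foreground pixel to the closest point of stroke k.
        d = np.linalg.norm(fg[:, None, :] - pts[None, :, :], axis=-1).min(axis=1)
        closer = d < best
        best[closer] = d[closer]
        label[closer] = k
    out = np.full(foreground.shape, -1)
    out[fg[:, 0], fg[:, 1]] = label
    return out
```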
The Chinese character image stroke extraction system based on a fully convolutional neural network according to the second embodiment of the present invention comprises an input module, a region extraction module, a skeletonization module, a stroke judgment module, a skeleton connection module and an output module;
The input module is configured to obtain a Chinese character image as the input image and feed it in;
The region extraction module is configured to extract the overlapping-region map of the strokes in the input image; the part of the input image remaining after the overlapping regions are removed is the non-overlapping-region map;
The skeletonization module is configured to perform a skeletonization operation on the overlapping-region map and the non-overlapping-region map, obtaining the overlapping-region skeleton-form stroke segment set and the non-overlapping-region skeleton-form stroke segment set;
The stroke judgment module is configured to compute, based on the overlapping-region skeleton-form stroke segment set, the coherence matrix between any two stroke segments; two stroke segments whose coherence matrix has all elements greater than or equal to a preset threshold belong to the same stroke;
The skeleton connection module is configured to connect the stroke segments in the overlapping regions that belong to the same stroke, and to merge them with the directly connected stroke segments in the non-overlapping regions into complete skeleton-form strokes;
The output module is configured to output the obtained complete skeleton-form strokes.
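The data flow between the modules can be sketched as a plain composition of callables. The class and field names below are hypothetical, and each callable stands in for a module whose internals are described in the method embodiment.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class StrokeExtractionSystem:
    """Illustrative wiring of the modules; every field is a stand-in callable."""
    acquire: Callable[[Any], Any]               # input module
    extract_regions: Callable[[Any], tuple]     # region extraction module
    skeletonize: Callable[[Any, Any], tuple]    # skeletonization module
    judge: Callable[[Any], Any]                 # stroke judgment module
    connect: Callable[[Any, Any, Any], Any]     # skeleton connection module

    def run(self, raw):
        image = self.acquire(raw)
        overlap_map, non_overlap_map = self.extract_regions(image)
        seg_overlap, seg_non = self.skeletonize(overlap_map, non_overlap_map)
        same_stroke = self.judge(seg_overlap)
        # The output module simply returns the connected strokes.
        return self.connect(seg_overlap, seg_non, same_stroke)
```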
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working process and related explanations of the system described above may refer to the corresponding processes in the foregoing method embodiment, and are not repeated here.
It should be noted that the Chinese character image stroke extraction system based on a fully convolutional neural network provided by the above embodiment is illustrated only by the above division of functional modules. In practical applications, the above functions may be assigned to different functional modules as needed; that is, the modules or steps in the embodiments of the present invention may be further decomposed or combined. For example, the modules of the above embodiment may be merged into one module, or further split into multiple sub-modules, to complete all or part of the functions described above. The names of the modules and steps involved in the embodiments of the present invention serve only to distinguish the modules or steps and are not to be regarded as improper limitations of the present invention.
A storage device according to the third embodiment of the present invention stores a plurality of programs, the programs being adapted to be loaded and executed by a processor to implement the above Chinese character image stroke extraction method based on a fully convolutional neural network.
A processing device according to the fourth embodiment of the present invention comprises a processor and a storage device; the processor is adapted to execute programs; the storage device is adapted to store a plurality of programs; and the programs are adapted to be loaded and executed by the processor to implement the above Chinese character image stroke extraction method based on a fully convolutional neural network.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working process and related explanations of the storage device and processing device described above may refer to the corresponding processes in the foregoing method embodiment, and are not repeated here.
Those skilled in the art should recognize that the modules and method steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. The programs corresponding to software modules and method steps may be placed in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. To illustrate the interchangeability of electronic hardware and software clearly, the composition and steps of each example have been described above in general terms of their functions. Whether these functions are performed in electronic hardware or in software depends on the specific application and design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the present invention.
The term "comprising" or any other similar term is intended to cover a non-exclusive inclusion, so that a process, method, article or device/apparatus comprising a series of elements includes not only those elements but also other elements not explicitly listed, or further includes elements inherent to such a process, method, article or device/apparatus.
The technical solution of the present invention has thus been described with reference to the preferred embodiments shown in the drawings. However, those skilled in the art will readily understand that the protection scope of the present invention is obviously not limited to these specific embodiments. Without departing from the principle of the present invention, those skilled in the art may make equivalent changes or substitutions to the relevant technical features, and the technical solutions after such changes or substitutions will fall within the protection scope of the present invention.

Claims (11)

1. A Chinese character image stroke extraction method based on a fully convolutional neural network, characterized in that the extraction method comprises:
Step S10: obtain a Chinese character image as the input image;
Step S20: extract the overlapping-region map of the strokes in the input image; the part of the input image remaining after the overlapping regions are removed is the non-overlapping-region map;
Step S30: perform a skeletonization operation on the overlapping-region map and the non-overlapping-region map, obtaining the overlapping-region skeleton-form stroke segment set and the non-overlapping-region skeleton-form stroke segment set;
Step S40: based on the overlapping-region skeleton-form stroke segment set, compute the coherence matrix between any two stroke segments; two stroke segments whose coherence matrix has all elements greater than or equal to a preset threshold belong to the same stroke;
Step S50: connect the stroke segments in the overlapping regions that belong to the same stroke, and merge them with the directly connected stroke segments in the non-overlapping regions into complete skeleton-form strokes.
2. The Chinese character image stroke extraction method based on a fully convolutional neural network according to claim 1, characterized in that "obtaining a Chinese character image as the input image" in step S10 is performed as follows:
Obtain the captured Chinese character image, remove its background by a global binarization algorithm based on the OTSU method or by a locally adaptive binarization algorithm to obtain the foreground image of the Chinese character image, and use the foreground image as the input image.
3. The Chinese character image stroke extraction method based on a fully convolutional neural network according to claim 1, characterized in that "extracting the overlapping-region map of the strokes in the input image; the part of the input image remaining after the overlapping regions are removed is the non-overlapping-region map" in step S20 is performed as follows:
Step S201: based on the input image, extract the features of the input image through the contracting path of the overlapping-region extraction network;
Step S202: based on the features of the input image, generate in reverse through the expanding path that is symmetric to the contracting path of the overlapping-region extraction network, obtaining the overlapping-region map; the part of the input image remaining after the overlapping regions are removed is the non-overlapping-region map;
wherein the overlapping-region extraction network is a network built on a fully convolutional neural network for extracting the overlapping-region map of the strokes in the input image.
4. The Chinese character image stroke extraction method based on a fully convolutional neural network according to claim 1, characterized in that "computing the coherence matrix between any two stroke segments" in step S40 is performed as follows:
Step S401: select any two stroke segments in the overlapping-region skeleton-form stroke segment set, denoted S1 and S2 respectively;
Step S402: uniformly select N points on each of the stroke segments S1 and S2, giving one point set per segment;
Step S403: use the conditional fully convolutional network to compute the probability that any two points, one from each of the two point sets, belong to the same stroke; the resulting N × N probabilities constitute the coherence matrix between stroke segments S1 and S2.
5. The Chinese character image stroke extraction method based on a fully convolutional neural network according to claim 1, characterized in that the training samples of the skeleton extraction network are obtained as follows:
Step B10: take the stroke coordinate point sequences of an online handwritten character as the skeleton of a synthesized character image, and set a stroke width;
Step B20: based on the skeleton of the synthesized character image and the set stroke width, expand the skeleton into strokes with width to obtain the synthesized character image; the synthesized character image and its corresponding skeleton form a training sample of the skeleton extraction network.
6. The Chinese character image stroke extraction method based on a fully convolutional neural network according to claim 5, wherein the training samples of the overlapping-region extraction network are obtained as follows:
Step G10: obtain a synthesized character image using steps B10 to B20 of the Chinese character image stroke extraction method based on a fully convolutional neural network according to claim 5;
Step G20: based on the stroke coordinate point sequence information of the synthesized character image, compute the stroke overlapping regions of the synthesized character image; the synthesized character image and its corresponding stroke overlapping regions form a training sample of the overlapping-region extraction network.
7. The Chinese character image stroke extraction method based on a fully convolutional neural network according to claim 1, wherein after "obtaining the overlapping-region skeleton-form stroke segment set and the non-overlapping-region skeleton-form stroke segment set" in step S30, an optimization step for the skeleton-form stroke segments is further provided, as follows:
Compute the center of gravity of each overlapping region among the stroke overlapping regions, obtain all skeleton points adjacent to the region corresponding to that center of gravity, and connect the center of gravity to each adjacent skeleton point in the skeleton map, obtaining the optimized overlapping-region skeleton-form stroke segment set;
Based on the non-overlapping regions of the strokes, recall skeleton pixels by clustering, obtaining the optimized non-overlapping-region skeleton-form strokes.
8. The Chinese character image stroke extraction method based on a fully convolutional neural network according to any one of claims 1 to 7, wherein after "connecting the stroke segments in the overlapping regions that belong to the same stroke, and merging them with the directly connected stroke segments in the non-overlapping regions into complete skeleton-form strokes" in step S50, a step of restoring the original stroke form is further provided, as follows:
Associate the pixels of the obtained character image with the complete skeleton-form strokes to obtain the Chinese character image strokes, which are the original stroke forms of the input image.
9. A Chinese character image stroke extraction system based on a fully convolutional neural network, characterized by comprising an input module, a region extraction module, a skeletonization module, a stroke judgment module, a skeleton connection module and an output module;
The input module is configured to obtain a Chinese character image as the input image and feed it in;
The region extraction module is configured to extract the overlapping-region map of the strokes in the input image; the part of the input image remaining after the overlapping regions are removed is the non-overlapping-region map;
The skeletonization module is configured to perform a skeletonization operation on the stroke overlapping-region map and the non-overlapping-region map, obtaining the overlapping-region skeleton-form stroke segment set and the non-overlapping-region skeleton-form stroke segment set;
The stroke judgment module is configured to compute, based on the overlapping-region skeleton-form stroke segment set, the coherence matrix between any two stroke segments; two stroke segments whose coherence matrix has all elements greater than or equal to a preset threshold belong to the same stroke;
The skeleton connection module is configured to connect the stroke segments in the overlapping regions that belong to the same stroke, and to merge them with the directly connected stroke segments in the non-overlapping regions into complete skeleton-form strokes;
The output module is configured to output the obtained complete skeleton-form strokes.
10. A storage device storing a plurality of programs, characterized in that the programs are adapted to be loaded and executed by a processor to implement the Chinese character image stroke extraction method based on a fully convolutional neural network according to any one of claims 1 to 8.
11. A processing device, comprising
a processor adapted to execute programs; and
a storage device adapted to store a plurality of programs;
characterized in that the programs are adapted to be loaded and executed by the processor to implement:
the Chinese character image stroke extraction method based on a fully convolutional neural network according to any one of claims 1 to 8.
CN201910454930.7A 2019-05-29 2019-05-29 Chinese character image stroke extraction method and system based on full convolution neural network Active CN110232337B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910454930.7A CN110232337B (en) 2019-05-29 2019-05-29 Chinese character image stroke extraction method and system based on full convolution neural network

Publications (2)

Publication Number Publication Date
CN110232337A true CN110232337A (en) 2019-09-13
CN110232337B CN110232337B (en) 2021-02-02

Family

ID=67858809

Country Status (1)

Country Link
CN (1) CN110232337B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1271140A (en) * 1999-04-21 2000-10-25 中国科学院自动化研究所 Handwriting identifying method based on grain analysis
CN101853126A (en) * 2010-05-12 2010-10-06 中国科学院自动化研究所 Real-time identification method for on-line handwriting sentences
CN105205448A (en) * 2015-08-11 2015-12-30 中国科学院自动化研究所 Character recognition model training method based on deep learning and recognition method thereof
US20170017835A1 (en) * 2013-06-09 2017-01-19 Apple Inc. Multi-script handwriting recognition using a universal recognizer
CN108229397A (en) * 2018-01-04 2018-06-29 华南理工大学 Method for text detection in image based on Faster R-CNN
CN108345850A (en) * 2018-01-23 2018-07-31 哈尔滨工业大学 The scene text detection method of the territorial classification of stroke feature transformation and deep learning based on super-pixel
CN108345833A (en) * 2018-01-11 2018-07-31 深圳中兴网信科技有限公司 The recognition methods of mathematical formulae and system and computer equipment
CN106446896B (en) * 2015-08-04 2020-02-18 阿里巴巴集团控股有限公司 Character segmentation method and device and electronic equipment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
BAOTIAN HU 等: "Stroke Sequence-Dependent Deep Convolutional Neural", 《ARXIV:1610.04057V1》 *
TIE-QIANG WANG 等: "DeepAD: A Deep Learning Based Approach to Stroke-Level Abnormality Detection in Handwritten Chinese Character Recognition", 《2018 IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM)》 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110660019A (en) * 2019-09-29 2020-01-07 华北电力大学 Small data set simplified stroke generation method based on BPL
CN110969681A (en) * 2019-11-29 2020-04-07 山东浪潮人工智能研究院有限公司 Method for generating handwriting characters based on GAN network
CN110969681B (en) * 2019-11-29 2023-08-29 山东浪潮科学研究院有限公司 Handwriting word generation method based on GAN network
CN112862025A (en) * 2021-03-08 2021-05-28 成都字嗅科技有限公司 Chinese character stroke filling method, system, terminal and medium based on computer

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant