CN109299726A - A Chinese character glyph similarity algorithm based on feature vectors and stroke-order coding - Google Patents


Info

Publication number
CN109299726A
Authority
CN
China
Prior art keywords
chinese
character
sim
similarity
chinese character
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810860010.0A
Other languages
Chinese (zh)
Inventor
龙华
祁俊辉
邵玉斌
彭艺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kunming University of Science and Technology
Original Assignee
Kunming University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kunming University of Science and Technology
Priority to CN201810860010.0A
Publication of CN109299726A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/751 Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G06V20/63 Scene text, e.g. street names
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/28 Character recognition specially adapted to the type of the alphabet, e.g. Latin alphabet
    • G06V30/287 Character recognition specially adapted to the type of the alphabet, e.g. Latin alphabet of Kanji, Hiragana or Katakana characters

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Controls And Circuits For Display Device (AREA)
  • Document Processing Apparatus (AREA)
  • Character Discrimination (AREA)

Abstract

The present invention relates to a Chinese character glyph similarity algorithm based on feature vectors and stroke-order coding, and belongs to the technical field of Chinese information processing. The invention uses features of Chinese characters such as structure, outline, strokes, and writing order to build a Chinese character feature vector database and a stroke-order code database. For any two Chinese characters, their feature vectors and stroke-order code strings are retrieved; a glyph similarity based on the feature vectors is computed by a difference algorithm, and a glyph similarity based on the stroke-order codes is computed by the Jaro-Winkler Distance algorithm. The two similarities reflect the degree of similarity between the characters from different aspects, and they are fused to combine the advantages of both algorithms and obtain the final similarity. Compared with the prior art, the invention mainly addresses the poor accuracy and poor flexibility of existing methods and improves the accuracy of computer-based Chinese character glyph similarity calculation.

Description

A Chinese character glyph similarity algorithm based on feature vectors and stroke-order coding
Technical field
The present invention relates to a Chinese character glyph similarity algorithm based on feature vectors and stroke-order coding, and belongs to the technical field of Chinese information processing.
Background art
Text is the main tool for human information exchange, but many Chinese characters are similar in shape and are therefore misrecognized or misread. Correctly distinguishing these easily confused similar-shaped characters is of great significance for Chinese teaching, Chinese editing and typesetting, machine recognition of Chinese, Chinese broadcasting, and related fields.
At present, algorithms for Chinese character glyph similarity fall into two main classes. The first obtains basic information about the character, such as glyph structure, stroke count, and stroke order, converts this data into a mathematical expression according to some coding rule, and then applies a specific algorithm to the expression to obtain the glyph similarity. The second uses image processing techniques to extract character features and compares their differences. Both classes have their own defects: the first requires setting several coefficients to balance the final output, and the second gives poor similarity results for some compound characters.
Summary of the invention
The technical problem to be solved by the present invention is to address the limitations and deficiencies of the prior art by providing a Chinese character glyph similarity algorithm based on feature vectors and stroke-order coding, which addresses the poor accuracy and poor flexibility of the prior art and aims to improve the accuracy of computer-based Chinese character glyph similarity calculation.
The technical solution of the invention is a Chinese character glyph similarity algorithm based on feature vectors and stroke-order coding, with the following specific steps:
Step0.1: Extract the picture corresponding to each Chinese character from a TTC font file; the picture size is l × w (in pixels), N pixels in total. Taking the character picture as the input source, generate the character matrix I_{l×w} corresponding to the character, where each element is the gray value of the corresponding pixel. Define ξ as the gray-value binarization threshold and binarize the matrix as shown in formula (1); then read the matrix I_{l×w} from left to right and top to bottom to generate the feature vector {x_1, x_2, …, x_N} of the character. Store all characters and their generated feature vectors in a database to build the Chinese character feature vector database;
Step0.2: Following the five-stroke writing-order rules of Chinese characters, encode the horizontal, vertical, left-falling, right-falling, and turning strokes as the letters a, b, c, d, e, and generate the stroke-order code string x_1x_2…x_z of the character, where z is the stroke count of the character, x_i is its i-th stroke, x_i ∈ {a, b, c, d, e}, and i ∈ [1, z]. Store all characters and their generated stroke-order code strings in a database to build the stroke-order code database;
Step1: Let X and Y be the two Chinese characters whose glyph similarity is to be calculated. Retrieve their feature vectors X: {x_1, x_2, …, x_N} and Y: {y_1, y_2, …, y_N} from the feature vector database, and retrieve their stroke-order code strings str_x and str_y from the stroke-order code database;
Step2: Taking the feature vectors X: {x_1, x_2, …, x_N} and Y: {y_1, y_2, …, y_N} as input, obtain the feature-vector-based glyph similarity Sim_1(X, Y) between X and Y by the difference algorithm;
Step2.1: Define z_i = x_i − y_i, i ∈ [1, N], and generate the difference feature vector {z_1, z_2, …, z_N} corresponding to X and Y;
Step2.2: Obtain the feature-vector-based glyph similarity Sim_1(X, Y) between X and Y by the difference calculation formula (2);
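Formula (2) is not reproduced in this text, so Step2 can only be sketched under an assumption. The minimal Python sketch below builds the difference vector of Step2.1 and then assumes, purely for illustration, that the similarity is one minus the fraction of positions where the two binary vectors differ. The function names `difference_vector` and `sim1_assumed` are hypothetical and do not come from the patent.

```python
from typing import List


def difference_vector(x: List[int], y: List[int]) -> List[int]:
    """Step2.1: z_i = x_i - y_i for two feature vectors of equal length N."""
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length N")
    return [a - b for a, b in zip(x, y)]


def sim1_assumed(x: List[int], y: List[int]) -> float:
    """ASSUMED stand-in for formula (2): 1 - (sum of |z_i|) / N.

    For binary vectors this is the share of pixels on which the two
    glyph pictures agree, so the result lies in [0, 1].
    """
    z = difference_vector(x, y)
    if not z:
        raise ValueError("feature vectors must not be empty")
    return 1.0 - sum(abs(v) for v in z) / len(z)


if __name__ == "__main__":
    # Two toy 4-pixel vectors that differ in two positions.
    print(sim1_assumed([0, 1, 1, 0], [0, 1, 0, 1]))
```

Whatever form formula (2) actually takes, it has to keep Sim_1 inside [0, 1], as formula (10) requires; the assumed form above does so for binary inputs.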
Step3: Taking the stroke-order code strings str_x and str_y as input, obtain the stroke-order-based glyph similarity Sim_2(X, Y) between X and Y by the Jaro-Winkler Distance algorithm;
Step3.1: Obtain the lengths len_x and len_y of the stroke-order code strings str_x and str_y, and generate the detection matrix I(X, Y)_{len_x × len_y};
Step3.2: Calculate the match window value MW according to formula (3);
Step3.3: From the detection matrix I(X, Y)_{len_x × len_y} and the match window value MW, calculate the number of matching characters m and the number of matching-character transpositions n according to the matching rules, and calculate the Jaro Distance between str_x and str_y according to formula (4);
Step3.4: Obtain the longest common substring str_xy of str_x and str_y and its length len_xy, then calculate the Jaro-Winkler Distance between str_x and str_y according to formula (5); this value is the stroke-order-based glyph similarity Sim_2(X, Y) between X and Y;
where b_t is the threshold that determines whether this further calculation is needed, and p is the scaling factor;
Step4: Let α and β be the weights of the similarities calculated in Step2 and Step3, with α + β = 1. From the feature-vector-based glyph similarity Sim_1(X, Y) with weight α and the stroke-order-based glyph similarity Sim_2(X, Y) with weight β, calculate the final glyph similarity Sim(X, Y) between X and Y by the similarity fusion algorithm, formula (6):
Sim(X, Y) = Sim_1(X, Y)·α + Sim_2(X, Y)·β    (6)
Further, in Step0.1, the picture size l × w is determined by the font size of the Chinese character extracted from the font file, and the element values I(i, j) of the character matrix I_{l×w} and the binarization threshold ξ satisfy formula (7).
0 ≤ I(i, j), ξ ≤ 255, i ∈ [1, l], j ∈ [1, w]    (7)
Further, the lengths len_x and len_y of the stroke-order code strings str_x and str_y in Step3.1 should satisfy formula (8).
len_x, len_y ∈ N⁺    (8)
Further, for the calculation of the number of matching characters m in Step3.3: if the distance between the positions of identical characters in str_x and str_y is less than the match window value MW, the characters are considered matched. Note that during matching, characters that have already been matched must be excluded; once a match is found, the current search stops and matching moves on to the next character. For the number of matching-character transpositions n, check whether the matched characters appear in the same order in str_x and str_y; if not, half the number of out-of-order characters is the transposition count n. In addition, m and n should satisfy formula (9).
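Formulas (3) and (4) are not reproduced in this text. The sketch below therefore assumes the conventional Jaro definitions: a match window of floor(max(len_x, len_y) / 2) − 1 and a distance of (m/len_x + m/len_y + (m − n)/m) / 3. These are assumptions, although they do reproduce the value 0.9394 reported in the embodiment for the strings caaaebecd and caaaebeacda. The window test uses "within MW positions" (the conventional reading); the text's "less than" could also be read as strict. The function name `jaro_distance` is hypothetical.

```python
def jaro_distance(s1: str, s2: str) -> float:
    """Conventional Jaro similarity (assumed form of formulas (3) and (4))."""
    if not s1 or not s2:
        return 0.0
    # Assumed formula (3): match window value MW.
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    used1 = [False] * len(s1)
    used2 = [False] * len(s2)
    m = 0  # number of matching characters
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len(s2), i + window + 1)
        for j in range(lo, hi):
            # Skip characters of s2 that were already matched.
            if not used2[j] and s2[j] == ch:
                used1[i] = used2[j] = True
                m += 1
                break  # leave this search and move to the next character
    if m == 0:
        return 0.0
    # Matched characters in their original order within each string.
    matched1 = [c for c, u in zip(s1, used1) if u]
    matched2 = [c for c, u in zip(s2, used2) if u]
    # n is half the number of matched characters that are out of order.
    n = sum(a != b for a, b in zip(matched1, matched2)) // 2
    # Assumed formula (4).
    return (m / len(s1) + m / len(s2) + (m - n) / m) / 3


if __name__ == "__main__":
    print(jaro_distance("caaaebecd", "caaaebeacda"))
```

For the two example strings this gives m = 9 and n = 0, so the result is (1 + 9/11 + 1) / 3.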
Further, the threshold b_t in Step3.4 is usually 0.7 and can be adjusted slightly according to actual test results, mainly to improve detection accuracy. The scaling factor p is usually 0.1 and can also be adjusted slightly according to actual test results, mainly to prevent the final result from exceeding 1. However, because this method adds the reciprocal of the longest length among the code strings str_x and str_y to improve the calculation formula, the value of p has little influence on the final result.
Further, the feature-vector-based glyph similarity Sim_1(X, Y) obtained in Step2, the stroke-order-based glyph similarity Sim_2(X, Y) obtained in Step3, and the final glyph similarity Sim(X, Y) obtained in Step4 should satisfy formula (10); that is, Sim_1(X, Y), Sim_2(X, Y), and Sim(X, Y) each express the degree of similarity between two Chinese characters as a value in [0, 1], and a larger value indicates a higher degree of similarity.
0 ≤ Sim_1(X, Y), Sim_2(X, Y), Sim(X, Y) ≤ 1    (10)
The beneficial effects of the invention are that it addresses the poor accuracy and poor flexibility of the prior art and improves the accuracy of computer-based Chinese character glyph similarity calculation.
Brief description of the drawings
Fig. 1 is a flow diagram of the invention;
Fig. 2 is a flow diagram of building the databases;
Fig. 3 is a schematic diagram of a Microsoft YaHei Chinese character picture generated by the invention.
Detailed description of the embodiments
The invention is further described below with reference to the drawings and a specific embodiment.
Embodiment 1: As shown in Figs. 1-3, a Chinese character glyph similarity algorithm based on feature vectors and stroke-order coding uses features of Chinese characters such as structure, outline, strokes, and writing order to build a Chinese character feature vector database and a stroke-order code database. For any two Chinese characters, their feature vectors and stroke-order code strings are retrieved; a glyph similarity based on the feature vectors is computed by a difference algorithm, and a glyph similarity based on the stroke-order codes is computed by the Jaro-Winkler Distance algorithm. The two similarities reflect the degree of similarity between the characters from different aspects, and they are fused to combine the advantages of both algorithms and obtain the final similarity.
The method specifically includes the following steps:
Step0.1: Extract the picture corresponding to each Chinese character from a TTC font file; the picture size is l × w (in pixels), N pixels in total. Taking the character picture as the input source, generate the character matrix I_{l×w} corresponding to the character, where each element is the gray value of the corresponding pixel. Define ξ as the gray-value binarization threshold and binarize the matrix as shown in formula (1); then read the matrix I_{l×w} from left to right and top to bottom to generate the feature vector {x_1, x_2, …, x_N} of the character. Store all characters and their generated feature vectors in a database to build the Chinese character feature vector database;
Specifically: the Microsoft YaHei TTC font is used as the input source, the extracted character pictures are 64 × 64 pixels, N = 4096 pixels in total, and the binarization threshold is ξ = 1;
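The binarization and flattening in Step0.1 can be sketched as follows. Formula (1) is not reproduced in this text, so the direction of the threshold is an assumption: pixels with a gray value of at least ξ are mapped to 1 and all others to 0. The function name `binarize_and_flatten` is hypothetical, and the gray matrix is passed in as a plain list of rows rather than rendered from a font file.

```python
from typing import List


def binarize_and_flatten(gray: List[List[int]], xi: int = 1) -> List[int]:
    """Turn an l x w gray-value matrix into a 0/1 feature vector of length N.

    ASSUMPTION: a pixel becomes 1 when its gray value is >= xi, else 0.
    Rows are read top to bottom and each row left to right (row-major),
    matching the reading order described in Step0.1.
    """
    for row in gray:
        for px in row:
            if not 0 <= px <= 255:  # range required by formula (7)
                raise ValueError("gray values must lie in [0, 255]")
    return [1 if px >= xi else 0 for row in gray for px in row]


if __name__ == "__main__":
    tiny = [[0, 5],
            [255, 0]]
    print(binarize_and_flatten(tiny, xi=1))
```

With the embodiment's settings, a 64 × 64 picture would yield a vector of N = 4096 entries.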
Step0.2: Following the five-stroke writing-order rules of Chinese characters, encode the horizontal, vertical, left-falling, right-falling, and turning strokes as the letters a, b, c, d, e, and generate the stroke-order code string x_1x_2…x_z of the character, where z is the stroke count of the character, x_i is its i-th stroke, x_i ∈ {a, b, c, d, e}, and i ∈ [1, z]. Store all characters and their generated stroke-order code strings in a database to build the stroke-order code database;
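The stroke-order coding of Step0.2 amounts to a lookup from stroke category to letter. The sketch below assumes the strokes of a character have already been classified into the five categories and listed in writing order; the category keys (`heng`, `shu`, `pie`, `na`, `zhe`) and the function name `encode_strokes` are hypothetical labels chosen for illustration.

```python
from typing import Dict, Iterable

# Five stroke categories mapped to the letters used in Step0.2.
STROKE_CODES: Dict[str, str] = {
    "heng": "a",  # horizontal
    "shu": "b",   # vertical
    "pie": "c",   # left-falling
    "na": "d",    # right-falling
    "zhe": "e",   # turning
}


def encode_strokes(strokes: Iterable[str]) -> str:
    """Encode a stroke sequence (in writing order) as a string over a..e."""
    try:
        return "".join(STROKE_CODES[s] for s in strokes)
    except KeyError as err:
        raise ValueError(f"unknown stroke category: {err}") from None


if __name__ == "__main__":
    # A stroke sequence that yields the example string str_x of the embodiment.
    seq = ["pie", "heng", "heng", "heng", "zhe", "shu", "zhe", "pie", "na"]
    print(encode_strokes(seq))
```

The length of the resulting string equals the stroke count z of the character.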
Step1: Let X and Y be the two Chinese characters whose glyph similarity is to be calculated. Retrieve their feature vectors X: {x_1, x_2, …, x_N} and Y: {y_1, y_2, …, y_N} from the feature vector database, and retrieve their stroke-order code strings str_x and str_y from the stroke-order code database;
Specifically: let character X be "钢" (steel) and character Y be "铟" (indium), and retrieve the feature vectors of the two characters from the feature vector database, i.e.
X = {0,0,0,…,1,0,0,…,1,1,0,…,0,0,0}
Y = {0,0,0,…,0,1,0,…,1,0,1,…,0,0,0}
In addition, retrieve the stroke-order code strings of the two characters from the stroke-order code database: str_x = caaaebecd, str_y = caaaebeacda;
Step2: Taking the feature vectors X: {x_1, x_2, …, x_N} and Y: {y_1, y_2, …, y_N} as input, obtain the feature-vector-based glyph similarity Sim_1(X, Y) between X and Y by the difference algorithm;
Step2.1: Define z_i = x_i − y_i, i ∈ [1, N], and generate the difference feature vector {z_1, z_2, …, z_N} corresponding to X and Y;
Specifically:
Step2.2: Obtain the feature-vector-based glyph similarity Sim_1(X, Y) between X and Y by the difference calculation formula (2);
Specifically:
Step3: Taking the stroke-order code strings str_x and str_y as input, obtain the stroke-order-based glyph similarity Sim_2(X, Y) between X and Y by the Jaro-Winkler Distance algorithm;
Step3.1: Obtain the lengths len_x and len_y of the stroke-order code strings str_x and str_y, and generate the detection matrix I(X, Y)_{len_x × len_y};
Specifically:
Step3.2: Calculate the match window value MW according to formula (3);
Specifically:
Step3.3: From the detection matrix I(X, Y)_{len_x × len_y} and the match window value MW, calculate the number of matching characters m and the number of matching-character transpositions n according to the matching rules, and calculate the Jaro Distance between str_x and str_y according to formula (4);
Specifically:
Dis_j = 0.9394
Step3.4: Obtain the longest common substring str_xy of str_x and str_y and its length len_xy, then calculate the Jaro-Winkler Distance between str_x and str_y according to formula (5); this value is the stroke-order-based glyph similarity Sim_2(X, Y) between X and Y;
where b_t is the threshold that determines whether this further calculation is needed, and p is the scaling factor;
Specifically: take b_t = 0.7 and p = 0.1; the longest common substring is str_xy = caaaebe, with length len_xy = 7;
Sim_2(X, Y) = Dis_jw = 0.9779
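The longest common substring used in Step3.4 can be found with a standard dynamic-programming scan. The sketch below covers only that sub-step; it does not attempt formula (5), which is not reproduced in this text. The function name `longest_common_substring` is hypothetical.

```python
def longest_common_substring(s1: str, s2: str) -> str:
    """Return the longest contiguous substring shared by s1 and s2.

    prev[j] holds the length of the common suffix of s1[:i-1] and s2[:j];
    ties are resolved in favor of the earliest occurrence in s1.
    """
    best_len = 0
    best_end = 0  # end index (exclusive) of the best substring in s1
    prev = [0] * (len(s2) + 1)
    for i in range(1, len(s1) + 1):
        cur = [0] * (len(s2) + 1)
        for j in range(1, len(s2) + 1):
            if s1[i - 1] == s2[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len = cur[j]
                    best_end = i
        prev = cur
    return s1[best_end - best_len:best_end]


if __name__ == "__main__":
    str_xy = longest_common_substring("caaaebecd", "caaaebeacda")
    print(str_xy, len(str_xy))
```

For the example strings the shared run happens to start at the first stroke, so it coincides with their common prefix.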
Step4: Let α and β be the weights of the similarities calculated in Step2 and Step3, with α + β = 1. From the feature-vector-based glyph similarity Sim_1(X, Y) with weight α and the stroke-order-based glyph similarity Sim_2(X, Y) with weight β, calculate the final glyph similarity Sim(X, Y) between X and Y by the similarity fusion algorithm, formula (6):
Sim(X, Y) = Sim_1(X, Y)·α + Sim_2(X, Y)·β    (6)
Taking the weights α = 0.5 and β = 0.5, the fused final similarity is:
Sim(X, Y) = 0.8596 × 0.5 + 0.9779 × 0.5 ≈ 0.9188
These results show that the final glyph similarity calculated for the characters "钢" and "铟" is 0.9188. Compared with the similarity obtained from the feature vector alone (0.8596), it is less coarse and more reasonable; compared with the similarity obtained from stroke-order coding alone (0.9779), it is less exaggerated and closer to human visual judgment.
In addition, the weights α and β for Sim_1(X, Y) and Sim_2(X, Y) should be chosen according to the actual situation, set to reasonable values after repeated testing, and adjusted as appropriate.
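The weighted fusion of formula (6) is simple enough to check directly against the embodiment's numbers. The sketch below is a minimal illustration; the function name `fuse_similarity` and its input validation are hypothetical additions, not part of the patent.

```python
def fuse_similarity(sim1: float, sim2: float,
                    alpha: float = 0.5, beta: float = 0.5) -> float:
    """Formula (6): Sim = Sim_1 * alpha + Sim_2 * beta, with alpha + beta = 1."""
    if abs(alpha + beta - 1.0) > 1e-9:
        raise ValueError("weights must satisfy alpha + beta = 1")
    for s in (sim1, sim2):
        if not 0.0 <= s <= 1.0:  # range required by formula (10)
            raise ValueError("similarities must lie in [0, 1]")
    return sim1 * alpha + sim2 * beta


if __name__ == "__main__":
    # Values from the embodiment: Sim_1 = 0.8596, Sim_2 = 0.9779.
    print(fuse_similarity(0.8596, 0.9779))
```

With equal weights the two embodiment values average to about 0.91875, which the text reports as 0.9188. Because the weights sum to 1 and both inputs lie in [0, 1], the fused value stays in [0, 1] as well.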
The embodiment of the invention has been explained in detail above with reference to the drawings, but the invention is not limited to this embodiment; within the knowledge of a person of ordinary skill in the art, various changes can be made without departing from the concept of the invention.

Claims (5)

1. A Chinese character glyph similarity algorithm based on feature vectors and stroke-order coding, characterized by:
Step0.1: Extract the picture corresponding to each Chinese character from a TTC font file; the picture size is l × w, in pixels, N pixels in total. Taking the character picture as the input source, generate the character matrix I_{l×w} corresponding to the character, where each element is the gray value of the corresponding pixel. Define ξ as the gray-value binarization threshold and binarize the matrix as shown in formula (1); then read the matrix I_{l×w} from left to right and top to bottom to generate the feature vector {x_1, x_2, …, x_N} of the character. Store all characters and their generated feature vectors in a database to build the Chinese character feature vector database;
Step0.2: Following the five-stroke writing-order rules of Chinese characters, encode the horizontal, vertical, left-falling, right-falling, and turning strokes as the letters a, b, c, d, e, and generate the stroke-order code string x_1x_2…x_z of the character, where z is the stroke count of the character, x_i is its i-th stroke, x_i ∈ {a, b, c, d, e}, and i ∈ [1, z]. Store all characters and their generated stroke-order code strings in a database to build the stroke-order code database;
Step1: Let X and Y be the two Chinese characters whose glyph similarity is to be calculated. Retrieve their feature vectors X: {x_1, x_2, …, x_N} and Y: {y_1, y_2, …, y_N} from the feature vector database, and retrieve their stroke-order code strings str_x and str_y from the stroke-order code database;
Step2: Taking the feature vectors X: {x_1, x_2, …, x_N} and Y: {y_1, y_2, …, y_N} as input, obtain the feature-vector-based glyph similarity Sim_1(X, Y) between X and Y by the difference algorithm;
Step2.1: Define z_i = x_i − y_i, i ∈ [1, N], and generate the difference feature vector {z_1, z_2, …, z_N} corresponding to X and Y;
Step2.2: Obtain the feature-vector-based glyph similarity Sim_1(X, Y) between X and Y by the difference calculation formula (2);
Step3: Taking the stroke-order code strings str_x and str_y as input, obtain the stroke-order-based glyph similarity Sim_2(X, Y) between X and Y by the Jaro-Winkler Distance algorithm;
Step3.1: Obtain the lengths len_x and len_y of the stroke-order code strings str_x and str_y, and generate the detection matrix I(X, Y)_{len_x × len_y};
Step3.2: Calculate the match window value MW according to formula (3);
Step3.3: From the detection matrix I(X, Y)_{len_x × len_y} and the match window value MW, calculate the number of matching characters m and the number of matching-character transpositions n according to the matching rules, and calculate the Jaro Distance between str_x and str_y according to formula (4);
Step3.4: Obtain the longest common substring str_xy of str_x and str_y and its length len_xy, then calculate the Jaro-Winkler Distance between str_x and str_y according to formula (5); this value is the stroke-order-based glyph similarity Sim_2(X, Y) between X and Y;
where b_t is the threshold that determines whether this further calculation is needed, and p is the scaling factor;
Step4: Let α and β be the weights of the similarities calculated in Step2 and Step3, with α + β = 1. From the feature-vector-based glyph similarity Sim_1(X, Y) with weight α and the stroke-order-based glyph similarity Sim_2(X, Y) with weight β, calculate the final glyph similarity Sim(X, Y) between X and Y by the similarity fusion algorithm, formula (6):
Sim(X, Y) = Sim_1(X, Y)·α + Sim_2(X, Y)·β    (6).
2. The Chinese character glyph similarity algorithm based on feature vectors and stroke-order coding according to claim 1, characterized in that: in Step0.1, the picture size l × w is determined by the font size of the Chinese character extracted from the font file, and the element values I(i, j) of the character matrix I_{l×w} and the binarization threshold ξ satisfy formula (7);
0 ≤ I(i, j), ξ ≤ 255, i ∈ [1, l], j ∈ [1, w]    (7).
3. The Chinese character glyph similarity algorithm based on feature vectors and stroke-order coding according to claim 1, characterized in that: the lengths len_x and len_y of the stroke-order code strings str_x and str_y in Step3.1 should satisfy formula (8):
len_x, len_y ∈ N⁺    (8).
4. The Chinese character glyph similarity algorithm based on feature vectors and stroke-order coding according to claim 1, characterized in that: for the calculation of the number of matching characters m in Step3.3, if the distance between the positions of identical characters in str_x and str_y is less than the match window value MW, the characters are considered matched, and the number of matching characters m and the number of matching-character transpositions n should satisfy formula (9):
5. The Chinese character glyph similarity algorithm based on feature vectors and stroke-order coding according to claim 1, characterized in that: the feature-vector-based glyph similarity Sim_1(X, Y) obtained in Step2, the stroke-order-based glyph similarity Sim_2(X, Y) obtained in Step3, and the final glyph similarity Sim(X, Y) obtained in Step4 should satisfy formula (10); that is, Sim_1(X, Y), Sim_2(X, Y), and Sim(X, Y) each express the degree of similarity between two Chinese characters as a value in [0, 1], and a larger value indicates a higher degree of similarity;
0 ≤ Sim_1(X, Y), Sim_2(X, Y), Sim(X, Y) ≤ 1    (10).
CN201810860010.0A 2018-08-01 2018-08-01 A kind of Chinese character pattern Similarity algorithm based on feature vector and stroke order coding Pending CN109299726A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810860010.0A CN109299726A (en) 2018-08-01 2018-08-01 A kind of Chinese character pattern Similarity algorithm based on feature vector and stroke order coding

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810860010.0A CN109299726A (en) 2018-08-01 2018-08-01 A kind of Chinese character pattern Similarity algorithm based on feature vector and stroke order coding

Publications (1)

Publication Number Publication Date
CN109299726A true CN109299726A (en) 2019-02-01

Family

ID=65172733

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810860010.0A Pending CN109299726A (en) 2018-08-01 2018-08-01 A kind of Chinese character pattern Similarity algorithm based on feature vector and stroke order coding

Country Status (1)

Country Link
CN (1) CN109299726A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110097002A (en) * 2019-04-30 2019-08-06 北京达佳互联信息技术有限公司 Nearly word form determines method, apparatus, computer equipment and storage medium
CN111160369A (en) * 2019-12-25 2020-05-15 携程旅游信息技术(上海)有限公司 Method, system, electronic device and storage medium for cracking Chinese character verification code
CN112990353A (en) * 2021-04-14 2021-06-18 中南大学 Chinese character confusable set construction method based on multi-mode model

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103428307A (en) * 2013-08-09 2013-12-04 中国科学院计算机网络信息中心 Method and equipment for detecting counterfeit domain names
CN106815593A (en) * 2015-11-27 2017-06-09 北京国双科技有限公司 The determination method and apparatus of Chinese text similarity
US20180114097A1 (en) * 2015-10-06 2018-04-26 Adobe Systems Incorporated Font Attributes for Font Recognition and Similarity
CN108154167A (en) * 2017-12-04 2018-06-12 昆明理工大学 A kind of Chinese character pattern similarity calculating method

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110097002A (en) * 2019-04-30 2019-08-06 北京达佳互联信息技术有限公司 Nearly word form determines method, apparatus, computer equipment and storage medium
CN111160369A (en) * 2019-12-25 2020-05-15 携程旅游信息技术(上海)有限公司 Method, system, electronic device and storage medium for cracking Chinese character verification code
CN111160369B (en) * 2019-12-25 2024-03-05 携程旅游信息技术(上海)有限公司 Method, system, electronic equipment and storage medium for cracking Chinese character verification code
CN112990353A (en) * 2021-04-14 2021-06-18 中南大学 Chinese character confusable set construction method based on multi-mode model

Similar Documents

Publication Publication Date Title
Wick et al. Fully convolutional neural networks for page segmentation of historical document images
CN108154167A (en) A kind of Chinese character pattern similarity calculating method
Zitnick Handwriting beautification using token means
Saady et al. Amazigh handwritten character recognition based on horizontal and vertical centerline of character
CN107610200B (en) Character library rapid generation method based on characteristic template
He et al. Configuration-transition-based connected-component labeling
CN110162789B (en) Word representation method and device based on Chinese pinyin
CN109299726A (en) A kind of Chinese character pattern Similarity algorithm based on feature vector and stroke order coding
Sundaram et al. Attention-feedback based robust segmentation of online handwritten isolated Tamil words
CN117058266B (en) Handwriting word generation method based on skeleton and outline
CN112784531A (en) Chinese font and word stock generation method based on deep learning and part splicing
Liu et al. Handwritten text generation via disentangled representations
Zhang et al. SSNet: Structure-Semantic Net for Chinese typography generation based on image translation
Noubigh et al. Densely connected layer to improve VGGnet-based CRNN for Arabic handwriting text line recognition
CN111738167A (en) Method for recognizing unconstrained handwritten text image
Inunganbi et al. Recognition of handwritten Meitei Mayek script based on texture feature
Liu et al. FontTransformer: Few-shot high-resolution Chinese glyph image synthesis via stacked transformers
Baltatzis et al. Neural Sign Actors: A diffusion model for 3D sign language production from text
CN113408418A (en) Calligraphy font and character content synchronous identification method and system
CN115620314A (en) Text recognition method, answer text verification method, device, equipment and medium
Zhang et al. Visual analysis of inscriptions in the Tang Dynasty: a case study on the calligraphy style of Wang Xizhi
Neri et al. A methodology for character recognition and revision of the linear equations solving procedure
Zhu et al. Visual normalization of handwritten Chinese characters based on generative adversarial networks
Sampath Quantifying scribal behavior: a novel approach to digital paleography
Zhuo et al. A Novel Data Augmentation Method for Chinese Character Spatial Structure Recognition by Normalized Deformable Convolutional Networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190201