CN109299726A - A kind of Chinese character pattern Similarity algorithm based on feature vector and stroke order coding - Google Patents
- Publication number
- CN109299726A
- Authority
- CN
- China
- Prior art keywords
- chinese
- character
- sim
- similarity
- chinese character
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/75—Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
- G06V10/751—Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/60—Type of objects
- G06V20/62—Text, e.g. of license plates, overlay texts or captions on TV images
- G06V20/63—Scene text, e.g. street names
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/28—Character recognition specially adapted to the type of the alphabet, e.g. Latin alphabet
- G06V30/287—Character recognition specially adapted to the type of the alphabet, e.g. Latin alphabet of Kanji, Hiragana or Katakana characters
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Multimedia (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Computing Systems (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- Evolutionary Computation (AREA)
- Databases & Information Systems (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- Controls And Circuits For Display Device (AREA)
- Document Processing Apparatus (AREA)
- Character Discrimination (AREA)
Abstract
The invention relates to a Chinese character glyph similarity algorithm based on feature vectors and stroke-order coding, and belongs to the technical field of Chinese information processing. Using features of Chinese characters such as structure, outline, strokes, and writing order, the invention builds a character feature vector database and a stroke-order code database. For any two Chinese characters, it retrieves their feature vectors and stroke-order code strings, computes a feature-vector-based glyph similarity with a difference algorithm, and computes a stroke-order-based glyph similarity with the Jaro-Winkler distance algorithm. The two similarities reflect how alike the characters are from different angles; the algorithm combines the strengths of both and fuses them into a final similarity. Compared with the prior art, the invention mainly addresses the poor accuracy and poor flexibility of existing methods and improves the accuracy of computer-based Chinese character glyph similarity calculation.
Description
Technical field
The invention relates to a Chinese character glyph similarity algorithm based on feature vectors and stroke-order coding, and belongs to the technical field of Chinese information processing.
Background art
Text is the main tool for human information exchange. However, many Chinese characters have similar shapes and are therefore easily misrecognized or confused, so correctly distinguishing these confusable near-form characters is of great significance for Chinese teaching, Chinese editing and typesetting, Chinese machine recognition, Chinese broadcasting, and similar work.
At present, algorithms for Chinese character glyph similarity fall into two broad classes. The first obtains basic information about a character, such as glyph structure, stroke count, and stroke order, converts these data into a mathematical expression according to some coding rule, and then applies a specific algorithm to that expression to obtain the glyph similarity. The second uses image processing techniques to extract character features and compares the features that differ. Both classes have drawbacks: the first requires several coefficients to be set in order to balance the final output, and the second gives poor similarity results for some compound characters.
Summary of the invention
The technical problem to be solved by the invention is to address the limitations and shortcomings of the prior art by providing a Chinese character glyph similarity algorithm based on feature vectors and stroke-order coding. The algorithm targets the poor accuracy and poor flexibility of the prior art and aims to improve the accuracy of computer-based Chinese character glyph similarity calculation.
The technical solution of the invention is a Chinese character glyph similarity algorithm based on feature vectors and stroke-order coding, with the following specific steps:
Step0.1: Extract the picture corresponding to each Chinese character from a TTC font file. The picture size is l × w (in pixels), N pixels in total. Taking the character picture as input, generate the character matrix I_{l×w}, whose element values are the gray values of the corresponding pixels. Define ξ as the gray-value binarization threshold and binarize the matrix as shown in formula (1); then read the matrix I_{l×w} from left to right and top to bottom to generate the character's feature vector {x_1, x_2, …, x_N}. Store all characters and their feature vectors in a database to build the character feature vector database;
Step0.2: Following the five-stroke stroke-order rule for Chinese characters, encode the horizontal, vertical, left-falling, right-falling, and turning strokes as the letters a, b, c, d, e, and generate the character's stroke-order code string x_1x_2…x_z, where z is the number of strokes in the character, x_i is its i-th stroke, x_i ∈ {a, b, c, d, e}, and i ∈ [1, z]. Store all characters and their stroke-order code strings in a database to build the stroke-order code database;
Step1: Let X and Y be the two Chinese characters whose glyph similarity is to be computed. Retrieve their feature vectors X: {x_1, x_2, …, x_N} and Y: {y_1, y_2, …, y_N} from the character feature vector database, and retrieve their stroke-order code strings str_x and str_y from the stroke-order code database;
Step2: Taking the feature vectors X: {x_1, x_2, …, x_N} and Y: {y_1, y_2, …, y_N} as input, use the difference algorithm to obtain the feature-vector-based glyph similarity Sim_1(X, Y) between characters X and Y;
Step2.1: Define z_i = x_i - y_i, i ∈ [1, N], and generate the difference feature vector {z_1, z_2, …, z_N} corresponding to characters X and Y;
Step2.2: Use the difference formula (2) to obtain the feature-vector-based glyph similarity Sim_1(X, Y) between characters X and Y;
Step3: Taking the stroke-order code strings str_x and str_y as input, use the Jaro-Winkler distance algorithm to obtain the stroke-order-based glyph similarity Sim_2(X, Y) between characters X and Y;
Step3.1: Obtain the lengths len_x and len_y of the stroke-order code strings str_x and str_y, and generate the detection matrix I(X, Y)_{len_x × len_y};
Step3.2: Compute the match window value MW according to formula (3);
Step3.3: From the detection matrix I(X, Y)_{len_x × len_y} and the match window value MW, compute the number of matching characters m and the number of matching-character transpositions n according to the relevant rules, and compute the Jaro distance between str_x and str_y according to formula (4);
Step3.4: Obtain the longest common substring str_xy of str_x and str_y and its length len_xy, then further compute the Jaro-Winkler distance between str_x and str_y according to formula (5); this value is the stroke-order-based glyph similarity Sim_2(X, Y) between characters X and Y;
where b_t is the threshold that determines whether the further calculation is needed, and p is the scaling factor;
Step4: Let α and β be the weights of the similarities computed in Step2 and Step3, with α + β = 1. From the feature-vector-based glyph similarity Sim_1(X, Y) with weight α and the stroke-order-based glyph similarity Sim_2(X, Y) with weight β, compute the final glyph similarity Sim(X, Y) between characters X and Y with the similarity fusion algorithm, i.e. formula (6);
Sim(X, Y) = Sim_1(X, Y)·α + Sim_2(X, Y)·β    (6)
Further, in step Step0.1, the picture size l × w is determined by the size of the character glyphs extracted from the font file, and the element values I(i, j) of the character matrix I_{l×w} and the gray-value binarization threshold ξ satisfy formula (7).
0 ≤ I(i, j), ξ ≤ 255, i ∈ [1, l], j ∈ [1, w]    (7)
Further, in step Step3.1, the lengths len_x and len_y of the stroke-order code strings str_x and str_y should satisfy formula (8).
len_x, len_y ∈ N^+    (8)
Further, the number of matching characters m in step Step3.3 is computed as follows: if identical characters in the stroke-order code strings str_x and str_y are separated by a distance less than the match window value MW, they are counted as a match. Note that characters that have already been matched must be excluded during matching, and once a match is found for a character, the search for that character stops and matching moves on to the next character. To compute the number of matching-character transpositions n, check whether the matched characters appear in the same order in str_x and str_y; if they do not, n is half the number of out-of-order positions. In addition, m and n should satisfy the requirement of formula (9).
Further, the threshold b_t in step Step3.4 usually takes the value 0.7 and can be adjusted slightly according to actual detection results, mainly to improve detection accuracy. The scaling factor p usually takes the value 0.1 and can likewise be adjusted slightly according to actual detection results, mainly to prevent the final result from exceeding 1. However, this method improves the calculation formula by introducing the reciprocal of the greater of the lengths of the code strings str_x and str_y, so the value of p has little influence on the final result.
Further, the feature-vector-based glyph similarity Sim_1(X, Y) obtained in Step2, the stroke-order-based glyph similarity Sim_2(X, Y) obtained in Step3, and the final glyph similarity Sim(X, Y) obtained in Step4 should satisfy formula (10). That is, Sim_1(X, Y), Sim_2(X, Y), and Sim(X, Y) each express the similarity between two characters as a value in [0, 1], and a larger value indicates a higher degree of similarity.
0 ≤ Sim_1(X, Y), Sim_2(X, Y), Sim(X, Y) ≤ 1    (10)
The beneficial effects of the invention are that it addresses the poor accuracy and poor flexibility of the prior art and improves the accuracy of computer-based Chinese character glyph similarity calculation.
Brief description of the drawings
Fig. 1 is a flow diagram of the invention;
Fig. 2 is a flow diagram of the database construction in the invention;
Fig. 3 is a schematic diagram of the Microsoft YaHei character pictures generated by the invention.
Detailed description of embodiments
The invention is further described below with reference to the accompanying drawings and specific embodiments.
Embodiment 1: As shown in Figs. 1-3, a Chinese character glyph similarity algorithm based on feature vectors and stroke-order coding uses features of Chinese characters such as structure, outline, strokes, and writing order to build a character feature vector database and a stroke-order code database. For any two Chinese characters, it retrieves their feature vectors and stroke-order code strings, computes a feature-vector-based glyph similarity with a difference algorithm, and computes a stroke-order-based glyph similarity with the Jaro-Winkler distance algorithm. The two similarities reflect how alike the characters are from different angles; the strengths of both are combined by fusing them into a final similarity.
The algorithm specifically comprises the following steps:
Step0.1: Extract the picture corresponding to each Chinese character from a TTC font file. The picture size is l × w (in pixels), N pixels in total. Taking the character picture as input, generate the character matrix I_{l×w}, whose element values are the gray values of the corresponding pixels. Define ξ as the gray-value binarization threshold and binarize the matrix as shown in formula (1); then read the matrix I_{l×w} from left to right and top to bottom to generate the character's feature vector {x_1, x_2, …, x_N}. Store all characters and their feature vectors in a database to build the character feature vector database;
Specifically: using the Microsoft YaHei TTC font as the input source, the extracted character pictures are 64 × 64 pixels, N = 4096 pixels in total, and the gray-value binarization threshold is set to ξ = 1;
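As a rough illustration of Step0.1, the sketch below binarizes a small gray-value matrix and flattens it row by row. Formula (1) is not reproduced in this text, so the comparison direction (gray value at or above ξ becomes 1) is an assumption, and the function name `binarize_and_flatten` and the 3 × 4 sample matrix are hypothetical.

```python
def binarize_and_flatten(gray, threshold=1):
    """Turn an l x w gray-value matrix (values 0..255) into a 0/1 feature vector."""
    # ASSUMPTION: values >= threshold map to 1, all others to 0.
    # The patent's formula (1) is missing here, so this direction is a guess.
    return [1 if value >= threshold else 0
            for row in gray        # rows are visited top to bottom
            for value in row]      # pixels in a row are visited left to right


# Hypothetical 3 x 4 picture standing in for a 64 x 64 glyph image.
glyph = [
    [0, 0, 0, 0],
    [0, 200, 255, 0],
    [0, 0, 90, 0],
]
features = binarize_and_flatten(glyph, threshold=1)
print(features)  # [0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0]
```

With a 64 × 64 picture the same function would return a vector of length 4096.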
Step0.2: Following the five-stroke stroke-order rule for Chinese characters, encode the horizontal, vertical, left-falling, right-falling, and turning strokes as the letters a, b, c, d, e, and generate the character's stroke-order code string x_1x_2…x_z, where z is the number of strokes in the character, x_i is its i-th stroke, x_i ∈ {a, b, c, d, e}, and i ∈ [1, z]. Store all characters and their stroke-order code strings in a database to build the stroke-order code database;
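The encoding in Step0.2 amounts to a lookup from stroke class to letter. The following is a minimal sketch; the English stroke-class names, the function `encode_strokes`, and the hand-written nine-stroke list are illustrative assumptions. The list was chosen so that it reproduces the code string caaaebecd that the embodiment gives for character X.

```python
# Letter codes for the five basic stroke classes named in Step0.2.
STROKE_CODES = {
    "horizontal": "a",
    "vertical": "b",
    "left-falling": "c",
    "right-falling": "d",
    "turning": "e",
}


def encode_strokes(strokes):
    """Map a stroke sequence (in writing order) to its stroke-order code string."""
    return "".join(STROKE_CODES[stroke] for stroke in strokes)


# Hypothetical stroke list for a nine-stroke character (an assumption,
# not data taken from the patent's database).
strokes_x = [
    "left-falling", "horizontal", "horizontal", "horizontal", "turning",
    "vertical", "turning", "left-falling", "right-falling",
]
code_x = encode_strokes(strokes_x)
print(code_x)  # caaaebecd
```

An unknown stroke name raises `KeyError`, which is a convenient way to catch data-entry mistakes while building the database.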
Step1: Let X and Y be the two Chinese characters whose glyph similarity is to be computed. Retrieve their feature vectors X: {x_1, x_2, …, x_N} and Y: {y_1, y_2, …, y_N} from the character feature vector database, and retrieve their stroke-order code strings str_x and str_y from the stroke-order code database;
Specifically: let character X be “钢” (steel) and character Y be “铟” (indium). Retrieve the feature vectors of the two characters from the character feature vector database, i.e.
X = {0, 0, 0, …, 1, 0, 0, …, 1, 1, 0, …, 0, 0, 0}
Y = {0, 0, 0, …, 0, 1, 0, …, 1, 0, 1, …, 0, 0, 0}
In addition, retrieve the stroke-order code strings of the two characters from the stroke-order code database: str_x = caaaebecd, str_y = caaaebeacda;
Step2: Taking the feature vectors X: {x_1, x_2, …, x_N} and Y: {y_1, y_2, …, y_N} as input, use the difference algorithm to obtain the feature-vector-based glyph similarity Sim_1(X, Y) between characters X and Y;
Step2.1: Define z_i = x_i - y_i, i ∈ [1, N], and generate the difference feature vector {z_1, z_2, …, z_N} corresponding to characters X and Y;
Specifically: {z_1, z_2, …, z_N} = {0, 0, 0, …, 1, -1, 0, …, 0, 1, -1, …, 0, 0, 0}
Step2.2: Use the difference formula (2) to obtain the feature-vector-based glyph similarity Sim_1(X, Y) between characters X and Y;
Specifically: Sim_1(X, Y) = 0.8596
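To make Step2 concrete, here is a small sketch of the difference vector and of one possible similarity built on it. The difference vector follows the definition z_i = x_i - y_i from Step2.1. The similarity function is only an assumption: formula (2) is not reproduced in this text, so "one minus the fraction of differing pixels" is a stand-in, not the patent's formula, and the eight-element vectors are made up.

```python
def difference_vector(x, y):
    """Element-wise difference z_i = x_i - y_i of two equal-length 0/1 vectors."""
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    return [xi - yi for xi, yi in zip(x, y)]


def difference_similarity(x, y):
    """ASSUMED similarity: 1 minus the share of positions where x and y differ."""
    z = difference_vector(x, y)
    return 1 - sum(abs(zi) for zi in z) / len(z)


# Hypothetical 8-pixel feature vectors (real ones have N = 4096 entries).
x = [0, 1, 1, 0, 1, 0, 0, 1]
y = [0, 1, 0, 0, 1, 1, 0, 1]
z = difference_vector(x, y)
sim1 = difference_similarity(x, y)
print(z)     # [0, 0, 1, 0, 0, -1, 0, 0]
print(sim1)  # 0.75
```

Because the vectors are binary, each z_i is -1, 0, or 1, so the sum of absolute values simply counts the pixels where the two glyphs disagree.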
Step3: Taking the stroke-order code strings str_x and str_y as input, use the Jaro-Winkler distance algorithm to obtain the stroke-order-based glyph similarity Sim_2(X, Y) between characters X and Y;
Step3.1: Obtain the lengths len_x and len_y of the stroke-order code strings str_x and str_y, and generate the detection matrix I(X, Y)_{len_x × len_y};
Specifically: len_x = 9 and len_y = 11, so the detection matrix is I(X, Y)_{9×11};
Step3.2: Compute the match window value MW according to formula (3);
Specifically: taking the standard Jaro window MW = ⌊max(len_x, len_y)/2⌋ - 1, MW = ⌊11/2⌋ - 1 = 4;
Step3.3: From the detection matrix I(X, Y)_{len_x × len_y} and the match window value MW, compute the number of matching characters m and the number of matching-character transpositions n according to the relevant rules, and compute the Jaro distance between str_x and str_y according to formula (4);
Specifically: all nine characters of str_x are matched in order, so m = 9 and n = 0, and the standard Jaro formula gives
Dis_j = (1/3)·(m/len_x + m/len_y + (m - n)/m) = (1/3)·(9/9 + 9/11 + 9/9) ≈ 0.9394
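The Jaro value for this pair can be checked with a short sketch of the standard Jaro similarity. Formulas (3) and (4) are not reproduced in this text, so the window definition and the formula below are the textbook ones, taken as an assumption; the function name `jaro` is hypothetical. The window here is inclusive, whereas the description words the condition as "less than MW"; for these two strings the largest offset between matched characters is 1, so either reading gives the same result.

```python
def jaro(s, t):
    """Standard Jaro similarity; returns (score, matches m, transpositions n)."""
    if not s or not t:
        return 0.0, 0, 0
    # ASSUMPTION: textbook match window, floor(max length / 2) - 1.
    window = max(max(len(s), len(t)) // 2 - 1, 0)
    s_matched = [False] * len(s)
    t_matched = [False] * len(t)
    m = 0
    for i, ch in enumerate(s):
        lo = max(0, i - window)
        hi = min(len(t), i + window + 1)
        for j in range(lo, hi):
            # Skip characters of t that were already matched.
            if not t_matched[j] and t[j] == ch:
                s_matched[i] = t_matched[j] = True
                m += 1
                break  # move on to the next character of s
    if m == 0:
        return 0.0, 0, 0
    # Compare the matched characters in order; half the mismatches are transpositions.
    s_seq = [c for c, hit in zip(s, s_matched) if hit]
    t_seq = [c for c, hit in zip(t, t_matched) if hit]
    n = sum(a != b for a, b in zip(s_seq, t_seq)) // 2
    score = (m / len(s) + m / len(t) + (m - n) / m) / 3
    return score, m, n


score, m, n = jaro("caaaebecd", "caaaebeacda")
print(round(score, 4), m, n)  # 0.9394 9 0
```

The boolean lists play the role of the detection matrix: they record which positions have already been paired so that no character is matched twice.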
Step3.4: Obtain the longest common substring str_xy of str_x and str_y and its length len_xy, then further compute the Jaro-Winkler distance between str_x and str_y according to formula (5); this value is the stroke-order-based glyph similarity Sim_2(X, Y) between characters X and Y;
where b_t is the threshold that determines whether the further calculation is needed, and p is the scaling factor;
Specifically: take b_t = 0.7 and p = 0.1; the longest common substring is str_xy = caaaebe, with length len_xy = 7;
Sim_2(X, Y) = Dis_jw = 0.9779
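The longest common substring used in Step3.4 can be found with a standard dynamic-programming pass, sketched below. The function name `longest_common_substring` is hypothetical, and the sketch stops at the substring itself: formula (5) is not reproduced in this text, so the final Jaro-Winkler adjustment is deliberately left out rather than guessed.

```python
def longest_common_substring(s, t):
    """Return the longest contiguous substring shared by s and t (first one found)."""
    best_len, best_end = 0, 0
    prev = [0] * (len(t) + 1)
    for i in range(1, len(s) + 1):
        curr = [0] * (len(t) + 1)
        for j in range(1, len(t) + 1):
            if s[i - 1] == t[j - 1]:
                # Extend the common run that ended at s[i-2], t[j-2].
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return s[best_end - best_len:best_end]


common = longest_common_substring("caaaebecd", "caaaebeacda")
print(common, len(common))  # caaaebe 7
```

For this pair the longest common substring happens to coincide with the common prefix, which is the quantity the conventional Jaro-Winkler measure uses.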
Step4: Let α and β be the weights of the similarities computed in Step2 and Step3, with α + β = 1. From the feature-vector-based glyph similarity Sim_1(X, Y) with weight α and the stroke-order-based glyph similarity Sim_2(X, Y) with weight β, compute the final glyph similarity Sim(X, Y) between characters X and Y with the similarity fusion algorithm, i.e. formula (6);
Sim(X, Y) = Sim_1(X, Y)·α + Sim_2(X, Y)·β    (6)
Taking the weights α = 0.5 and β = 0.5, the final similarity after fusion is:
Sim(X, Y) = 0.8596 × 0.5 + 0.9779 × 0.5 ≈ 0.9188
The results above show that the final glyph similarity computed for the characters “钢” (steel) and “铟” (indium) is 0.9188. Compared with the similarity obtained from the feature vector alone (0.8596), it is less coarse and more reasonable; compared with the similarity obtained from stroke-order coding alone (0.9779), it is less inflated and agrees better with human visual judgment.
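The fusion in formula (6) is a plain weighted sum, which the sketch below applies to the two values reported in this embodiment. The function name `fuse` and the tolerance used to check α + β = 1 are assumptions added for illustration.

```python
def fuse(sim1, sim2, alpha=0.5, beta=0.5):
    """Weighted fusion per formula (6): Sim = Sim_1 * alpha + Sim_2 * beta."""
    # The weights must sum to 1; a small tolerance absorbs floating-point error.
    if abs(alpha + beta - 1.0) > 1e-9:
        raise ValueError("alpha + beta must equal 1")
    return sim1 * alpha + sim2 * beta


# Values reported in the embodiment for the two partial similarities.
final = fuse(0.8596, 0.9779)
print(final)  # close to 0.91875, which the text reports as 0.9188
```

Since both inputs lie in [0, 1] and the weights sum to 1, the fused value stays in [0, 1], in line with formula (10).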
In addition, the weights α and β for the similarities Sim_1(X, Y) and Sim_2(X, Y) should be set to reasonable values after repeated testing under actual conditions and adjusted as appropriate.
The embodiments of the invention have been explained in detail above with reference to the accompanying drawings, but the invention is not limited to these embodiments; within the knowledge of a person of ordinary skill in the art, various changes can be made without departing from the spirit of the invention.
Claims (5)
1. A Chinese character glyph similarity algorithm based on feature vectors and stroke-order coding, characterized by:
Step0.1: Extract the picture corresponding to each Chinese character from a TTC font file. The picture size is l × w, in pixels, N pixels in total. Taking the character picture as input, generate the character matrix I_{l×w}, whose element values are the gray values of the corresponding pixels. Define ξ as the gray-value binarization threshold and binarize the matrix as shown in formula (1); then read the matrix I_{l×w} from left to right and top to bottom to generate the character's feature vector {x_1, x_2, …, x_N}. Store all characters and their feature vectors in a database to build the character feature vector database;
Step0.2: Following the five-stroke stroke-order rule for Chinese characters, encode the horizontal, vertical, left-falling, right-falling, and turning strokes as the letters a, b, c, d, e, and generate the character's stroke-order code string x_1x_2…x_z, where z is the number of strokes in the character, x_i is its i-th stroke, x_i ∈ {a, b, c, d, e}, and i ∈ [1, z]. Store all characters and their stroke-order code strings in a database to build the stroke-order code database;
Step1: Let X and Y be the two Chinese characters whose glyph similarity is to be computed. Retrieve their feature vectors X: {x_1, x_2, …, x_N} and Y: {y_1, y_2, …, y_N} from the character feature vector database, and retrieve their stroke-order code strings str_x and str_y from the stroke-order code database;
Step2: Taking the feature vectors X: {x_1, x_2, …, x_N} and Y: {y_1, y_2, …, y_N} as input, use the difference algorithm to obtain the feature-vector-based glyph similarity Sim_1(X, Y) between characters X and Y;
Step2.1: Define z_i = x_i - y_i, i ∈ [1, N], and generate the difference feature vector {z_1, z_2, …, z_N} corresponding to characters X and Y;
Step2.2: Use the difference formula (2) to obtain the feature-vector-based glyph similarity Sim_1(X, Y) between characters X and Y;
Step3: Taking the stroke-order code strings str_x and str_y as input, use the Jaro-Winkler distance algorithm to obtain the stroke-order-based glyph similarity Sim_2(X, Y) between characters X and Y;
Step3.1: Obtain the lengths len_x and len_y of the stroke-order code strings str_x and str_y, and generate the detection matrix I(X, Y)_{len_x × len_y};
Step3.2: Compute the match window value MW according to formula (3);
Step3.3: From the detection matrix I(X, Y)_{len_x × len_y} and the match window value MW, compute the number of matching characters m and the number of matching-character transpositions n according to the relevant rules, and compute the Jaro distance between str_x and str_y according to formula (4);
Step3.4: Obtain the longest common substring str_xy of str_x and str_y and its length len_xy, then further compute the Jaro-Winkler distance between str_x and str_y according to formula (5); this value is the stroke-order-based glyph similarity Sim_2(X, Y) between characters X and Y;
where b_t is the threshold that determines whether the further calculation is needed, and p is the scaling factor;
Step4: Let α and β be the weights of the similarities computed in Step2 and Step3, with α + β = 1. From the feature-vector-based glyph similarity Sim_1(X, Y) with weight α and the stroke-order-based glyph similarity Sim_2(X, Y) with weight β, compute the final glyph similarity Sim(X, Y) between characters X and Y with the similarity fusion algorithm, i.e. formula (6);
Sim(X, Y) = Sim_1(X, Y)·α + Sim_2(X, Y)·β    (6).
2. The Chinese character glyph similarity algorithm based on feature vectors and stroke-order coding according to claim 1, characterized in that: in step Step0.1, the picture size l × w is determined by the size of the character glyphs extracted from the font file, and the element values I(i, j) of the character matrix I_{l×w} and the gray-value binarization threshold ξ satisfy formula (7);
0 ≤ I(i, j), ξ ≤ 255, i ∈ [1, l], j ∈ [1, w]    (7).
3. The Chinese character glyph similarity algorithm based on feature vectors and stroke-order coding according to claim 1, characterized in that: in step Step3.1, the lengths len_x and len_y of the stroke-order code strings str_x and str_y should satisfy formula (8):
len_x, len_y ∈ N^+    (8).
4. The Chinese character glyph similarity algorithm based on feature vectors and stroke-order coding according to claim 1, characterized in that: in computing the number of matching characters m in step Step3.3, if identical characters in the stroke-order code strings str_x and str_y are separated by a distance less than the match window value MW, they are counted as a match, and the number of matching characters m and the number of matching-character transpositions n should satisfy the requirement of formula (9).
5. The Chinese character glyph similarity algorithm based on feature vectors and stroke-order coding according to claim 1, characterized in that: the feature-vector-based glyph similarity Sim_1(X, Y) obtained in Step2, the stroke-order-based glyph similarity Sim_2(X, Y) obtained in Step3, and the final glyph similarity Sim(X, Y) obtained in Step4 should satisfy formula (10); that is, Sim_1(X, Y), Sim_2(X, Y), and Sim(X, Y) each express the similarity between two characters as a value in [0, 1], and a larger value indicates a higher degree of similarity;
0 ≤ Sim_1(X, Y), Sim_2(X, Y), Sim(X, Y) ≤ 1    (10).
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810860010.0A CN109299726A (en) | 2018-08-01 | 2018-08-01 | A kind of Chinese character pattern Similarity algorithm based on feature vector and stroke order coding |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810860010.0A CN109299726A (en) | 2018-08-01 | 2018-08-01 | A kind of Chinese character pattern Similarity algorithm based on feature vector and stroke order coding |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109299726A true CN109299726A (en) | 2019-02-01 |
Family
ID=65172733
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810860010.0A Pending CN109299726A (en) | 2018-08-01 | 2018-08-01 | A kind of Chinese character pattern Similarity algorithm based on feature vector and stroke order coding |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109299726A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110097002A (en) * | 2019-04-30 | 2019-08-06 | 北京达佳互联信息技术有限公司 | Nearly word form determines method, apparatus, computer equipment and storage medium |
CN111160369A (en) * | 2019-12-25 | 2020-05-15 | 携程旅游信息技术(上海)有限公司 | Method, system, electronic device and storage medium for cracking Chinese character verification code |
CN112990353A (en) * | 2021-04-14 | 2021-06-18 | 中南大学 | Chinese character confusable set construction method based on multi-mode model |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103428307A (en) * | 2013-08-09 | 2013-12-04 | 中国科学院计算机网络信息中心 | Method and equipment for detecting counterfeit domain names |
CN106815593A (en) * | 2015-11-27 | 2017-06-09 | 北京国双科技有限公司 | The determination method and apparatus of Chinese text similarity |
US20180114097A1 (en) * | 2015-10-06 | 2018-04-26 | Adobe Systems Incorporated | Font Attributes for Font Recognition and Similarity |
CN108154167A (en) * | 2017-12-04 | 2018-06-12 | 昆明理工大学 | A kind of Chinese character pattern similarity calculating method |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103428307A (en) * | 2013-08-09 | 2013-12-04 | 中国科学院计算机网络信息中心 | Method and equipment for detecting counterfeit domain names |
US20180114097A1 (en) * | 2015-10-06 | 2018-04-26 | Adobe Systems Incorporated | Font Attributes for Font Recognition and Similarity |
CN106815593A (en) * | 2015-11-27 | 2017-06-09 | 北京国双科技有限公司 | The determination method and apparatus of Chinese text similarity |
CN108154167A (en) * | 2017-12-04 | 2018-06-12 | 昆明理工大学 | A kind of Chinese character pattern similarity calculating method |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110097002A (en) * | 2019-04-30 | 2019-08-06 | 北京达佳互联信息技术有限公司 | Nearly word form determines method, apparatus, computer equipment and storage medium |
CN111160369A (en) * | 2019-12-25 | 2020-05-15 | 携程旅游信息技术(上海)有限公司 | Method, system, electronic device and storage medium for cracking Chinese character verification code |
CN111160369B (en) * | 2019-12-25 | 2024-03-05 | 携程旅游信息技术(上海)有限公司 | Method, system, electronic equipment and storage medium for cracking Chinese character verification code |
CN112990353A (en) * | 2021-04-14 | 2021-06-18 | 中南大学 | Chinese character confusable set construction method based on multi-mode model |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Wick et al. | Fully convolutional neural networks for page segmentation of historical document images | |
CN108154167A (en) | A kind of Chinese character pattern similarity calculating method | |
Zitnick | Handwriting beautification using token means | |
Saady et al. | Amazigh handwritten character recognition based on horizontal and vertical centerline of character | |
CN107610200B (en) | Character library rapid generation method based on characteristic template | |
He et al. | Configuration-transition-based connected-component labeling | |
CN110162789B (en) | Word representation method and device based on Chinese pinyin | |
CN109299726A (en) | A kind of Chinese character pattern Similarity algorithm based on feature vector and stroke order coding | |
Sundaram et al. | Attention-feedback based robust segmentation of online handwritten isolated Tamil words | |
CN117058266B (en) | Handwriting word generation method based on skeleton and outline | |
CN112784531A (en) | Chinese font and word stock generation method based on deep learning and part splicing | |
Liu et al. | Handwritten text generation via disentangled representations | |
Zhang et al. | SSNet: Structure-Semantic Net for Chinese typography generation based on image translation | |
Noubigh et al. | Densely connected layer to improve VGGnet-based CRNN for Arabic handwriting text line recognition | |
CN111738167A (en) | Method for recognizing unconstrained handwritten text image | |
Inunganbi et al. | Recognition of handwritten Meitei Mayek script based on texture feature | |
Liu et al. | FontTransformer: Few-shot high-resolution Chinese glyph image synthesis via stacked transformers | |
Baltatzis et al. | Neural Sign Actors: A diffusion model for 3D sign language production from text | |
CN113408418A (en) | Calligraphy font and character content synchronous identification method and system | |
CN115620314A (en) | Text recognition method, answer text verification method, device, equipment and medium | |
Zhang et al. | Visual analysis of inscriptions in the Tang Dynasty: a case study on the calligraphy style of Wang Xizhi | |
Neri et al. | A methodology for character recognition and revision of the linear equations solving procedure | |
Zhu et al. | Visual normalization of handwritten Chinese characters based on generative adversarial networks | |
Sampath | Quantifying scribal behavior: a novel approach to digital paleography | |
Zhuo et al. | A Novel Data Augmentation Method for Chinese Character Spatial Structure Recognition by Normalized Deformable Convolutional Networks |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190201 |