CN110097002B - Shape and proximity word determining method and device, computer equipment and storage medium - Google Patents
- Publication number
- Publication number: CN110097002B (application CN201910359360.3A)
- Authority
- CN
- China
- Prior art keywords
- character
- similarity
- image
- determining
- stroke
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
- G06V10/443—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/32—Digital ink
- G06V30/333—Preprocessing; Feature extraction
- G06V30/347—Sampling; Contour coding; Stroke extraction
Abstract
The disclosure relates to a shape-similar word determining method and apparatus, a computer device, and a storage medium, which relate to the technical field of networks. The method includes the following steps: acquiring a first character and a second character; acquiring a first structural feature of the first character and a second structural feature of the second character; determining a structural similarity between the first character and the second character according to the first structural feature and the second structural feature; determining an image similarity between the first character and the second character based on a first character image of the first character and a second character image of the second character; determining the similarity between the first character and the second character according to the structural similarity and the image similarity; and acquiring a shape-similar word determination result for the first character and the second character according to the similarity between the first character and the second character. The similarity between the characters is determined from multiple angles by integrating the character structure with the image display, thereby improving the accuracy of shape-similar word determination.
Description
Technical Field
The present disclosure relates to the field of network technologies, and in particular, to a shape-similar word determining method and apparatus, a computer device, and a storage medium.
Background
With the development of network technology, there are many scenarios in which a terminal performs shape-similar word recognition based on the similarity between characters, for example, recognizing variant words in web reviews, recognizing user handwriting, or recognizing text in images. In these scenarios, a terminal generally needs to find a character's shape-similar words based on the similarity between two characters.
In the related art, the process of determining character similarity may include: the terminal binarizes the two pictures corresponding to the two characters whose similarity is to be determined, and then determines the similarity of the two characters by counting the maximum number of matching pixels between the two pictures.
In the above process, the similarity determination is actually performed based on the pictures in which the characters appear. However, font styles and rendering modes differ across device systems, so even pictures corresponding to the same character may differ, resulting in low accuracy when determining character similarity.
Disclosure of Invention
The present disclosure provides a shape-similar word determining method and apparatus, a computer device, and a storage medium, which can solve the technical problem of the low accuracy of character similarity determination in the related art.
According to a first aspect of the embodiments of the present disclosure, there is provided a shape-similar word determining method, including:
acquiring a first character and a second character;
acquiring a first structural feature of the first character and a second structural feature of the second character;
determining a structural similarity between the first character and the second character according to the first structural feature and the second structural feature;
determining an image similarity between the first character and the second character based on a first character image of the first character and a second character image of the second character;
determining the similarity between the first character and the second character according to the structural similarity and the image similarity;
and acquiring a shape-similar word determination result of the first character and the second character according to the similarity between the first character and the second character, wherein the shape-similar word determination result is used for indicating whether the first character and the second character are shape-similar words or not.
In one possible implementation manner, the structural features of a character include the strokes, stroke order, structure type, and four-corner code of the character, and the obtaining the first structural feature of the first character and the second structural feature of the second character includes:
inquiring a storage address of the first character and a storage address of the second character from a character information base according to the character identifier of the first character and the character identifier of the second character respectively;
and acquiring a first stroke, a first stroke sequence, a first structure type and a first four-corner code of the first character from the storage address of the first character, and acquiring a second stroke, a second stroke sequence, a second structure type and a second four-corner code of the second character from the storage address of the second character.
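The character information base described above can be sketched as a simple lookup table keyed by character identifier. All concrete values below are illustrative assumptions, not data from the patent: the stroke-order string uses the conventional five-category stroke encoding (1 = horizontal, 2 = vertical, 3 = left-falling, 4 = right-falling, 5 = turning), and the four-corner codes are standard five-digit codes.

```python
# Hypothetical character information base. Each record stores the four
# structural features named in the patent: strokes, stroke order,
# structure type, and four-corner code. Values are illustrative.
CHAR_INFO = {
    "木": {"strokes": ["一", "丨", "丿", "丶"],
           "stroke_order": "1234",
           "structure_type": "single-component",
           "four_corner": "40900"},
    "本": {"strokes": ["一", "丨", "丿", "丶", "一"],
           "stroke_order": "12341",
           "structure_type": "single-component",
           "four_corner": "50230"},
}

def get_structural_features(char: str) -> dict:
    """Query the stored record for a character by its identifier."""
    return CHAR_INFO[char]
```

In a production system the lookup would return a storage address to read from, as the claim describes; a direct dictionary lookup stands in for that here.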
In one possible implementation, the determining the structural similarity between the first character and the second character according to the first structural feature and the second structural feature includes:
respectively counting the number of first strokes of the first character and the number of second strokes of the second character according to the first strokes and the second strokes, and determining the stroke number similarity between the first character and the second character according to the number of the first strokes and the number of the second strokes;
determining stroke order similarity between the first character and the second character according to the first stroke order and the second stroke order;
determining the structure type similarity between the first character and the second character according to the first structure type and the second structure type;
determining the four-corner coding similarity between the first character and the second character according to the first four-corner coding and the second four-corner coding;
determining the structural similarity between the first character and the second character according to the stroke number similarity, the stroke sequence similarity, the structural type similarity, the four-corner coding similarity, the first weight, the second weight, the third weight and the fourth weight between the first character and the second character;
the first weight is the weight of stroke number similarity, the second weight is the weight of stroke sequence similarity, the third weight is the weight of structure type similarity, and the fourth weight is the weight of four-corner coding similarity.
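The weighted fusion of the four component similarities can be sketched as follows. The patent does not fix the weight values or the stroke-count formula, so the equal weights and the ratio-based stroke-count similarity below are assumptions:

```python
def stroke_count_similarity(n1: int, n2: int) -> float:
    # Assumed formula: the closer the two stroke counts, the higher the score.
    return 1 - abs(n1 - n2) / max(n1, n2)

def structural_similarity(stroke_count_sim: float, stroke_order_sim: float,
                          type_sim: float, four_corner_sim: float,
                          w1: float = 0.25, w2: float = 0.25,
                          w3: float = 0.25, w4: float = 0.25) -> float:
    # Weighted sum over the four component similarities, using the
    # first to fourth weights named in the claim.
    return (w1 * stroke_count_sim + w2 * stroke_order_sim
            + w3 * type_sim + w4 * four_corner_sim)
```

With equal weights, four perfect component scores give a structural similarity of 1.0; in practice the weights would presumably be tuned on labelled shape-similar word pairs.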
In one possible implementation, the determining stroke order similarity between the first character and the second character according to the first stroke order and the second stroke order includes:
determining a stroke order edit distance between the first character and the second character according to a plurality of stroke order codes included in the first stroke order and a plurality of stroke order codes included in the second stroke order, wherein the stroke order edit distance is the minimum number of edit operations required to convert the strokes of the first character into the strokes of the second character;
determining stroke order similarity between the first character and the second character according to the stroke order edit distance between the first character and the second character, the first stroke number of the first character and the second stroke number of the second character.
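The stroke order edit distance is a standard Levenshtein distance over the two stroke-order code sequences. A minimal sketch follows; normalising by the larger stroke count is an assumption, since the patent only says the similarity is derived from the edit distance and the two stroke counts:

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two stroke-order code strings."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution
    return dp[m][n]

def stroke_order_similarity(order1: str, order2: str) -> float:
    # Assumed normalisation: divide by the larger stroke count so the
    # result lies in [0, 1].
    d = edit_distance(order1, order2)
    return 1 - d / max(len(order1), len(order2))
```

For example, converting the stroke-order string "1234" (e.g. 木) into "12341" (e.g. 本) requires one insertion, giving an edit distance of 1.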
In one possible implementation, the determining an image similarity between the first character and the second character based on a first character image of the first character and a second character image of the second character includes:
acquiring a first character image of the first character and a second character image of the second character, wherein the first character image and the second character image are respectively used for representing screen display styles of the first character and the second character;
and determining the image similarity between the first character and the second character according to the first character image and the second character image.
In one possible implementation, the determining, from the first character image and the second character image, an image similarity between the first character and the second character includes:
extracting a plurality of first feature points in the first character image and a plurality of second feature points in the second character image, and acquiring first descriptors of the plurality of first feature points and second descriptors of the plurality of second feature points;
determining a second feature point matched with each first feature point according to the first descriptors of the plurality of first feature points and the second descriptors of the plurality of second feature points to obtain a plurality of matched feature point pairs which are matched with each other;
and determining the image similarity between the first character image and the second character image according to the number of the plurality of matched characteristic point pairs and the number of the first characteristic points.
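The matching step can be sketched with toy binary descriptors compared by Hamming distance. In practice the descriptors might be ORB- or SIFT-style feature descriptors; the greedy nearest-neighbour strategy and the distance cutoff below are assumptions made for illustration:

```python
def hamming(d1: str, d2: str) -> int:
    """Hamming distance between two equal-length binary descriptors."""
    return sum(b1 != b2 for b1, b2 in zip(d1, d2))

def match_feature_points(desc1, desc2, max_dist=2):
    """Greedily match each first-image descriptor to its nearest unused
    second-image descriptor, keeping matches within max_dist."""
    pairs, used = [], set()
    for i, d in enumerate(desc1):
        best_j, best_dist = None, max_dist + 1
        for j, e in enumerate(desc2):
            if j in used:
                continue
            dist = hamming(d, e)
            if dist < best_dist:
                best_j, best_dist = j, dist
        if best_j is not None:
            pairs.append((i, best_j))
            used.add(best_j)
    return pairs

def image_similarity(desc1, desc2) -> float:
    # Ratio of matched pairs to the number of first-image feature points,
    # following the claim text.
    return len(match_feature_points(desc1, desc2)) / len(desc1)
```

Here two of the three first-image descriptors find a close match, so the image similarity is 2/3.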
In one possible implementation, the acquiring the first character image of the first character and the second character image of the second character includes any one of:
acquiring a first character image when the first character is displayed and a second character image when the second character is displayed on equipment with the same system; and
acquiring a plurality of display images of the first character on a plurality of devices and determining a fused image of the plurality of display images of the first character as the first character image, and acquiring a plurality of display images of the second character on the plurality of devices and determining a fused image of the plurality of display images of the second character as the second character image, wherein the plurality of devices are configured with different device systems.
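The second option, fusing per-device renderings into one canonical character image, can be sketched as a pixel-wise average of equally sized greyscale bitmaps. Averaging is an assumed fusion method; the patent does not specify how the display images are fused:

```python
def fuse_images(images):
    """Fuse display images of one character rendered on several device
    systems into a single character image by pixel-wise averaging.
    `images` is a list of equally sized greyscale bitmaps (rows of pixels)."""
    n = len(images)
    height, width = len(images[0]), len(images[0][0])
    return [[sum(img[y][x] for img in images) / n
             for x in range(width)] for y in range(height)]
```

Averaging smooths out per-system font rendering differences, which is precisely the problem with comparing single-device renderings that the Background section identifies.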
In one possible implementation manner, the obtaining, according to the similarity between the first character and the second character, a determination result of the shape-similar word of the first character and the second character includes:
when the similarity of the first character and the second character is larger than a first preset threshold value, acquiring a first result of the first character and the second character, wherein the first result is used for indicating that the first character and the second character are similar characters;
when the similarity between the first character and the second character is not larger than the first preset threshold, acquiring a first root character to which the first character belongs and a second root character to which the second character belongs, and acquiring a second result for the first character and the second character according to the first root character, the second root character, and the structural similarity and image similarity between the first character and the second character, wherein the second result is used for indicating whether the first character and the second character are shape-similar words.
In one possible implementation manner, the obtaining, according to the first root character, the second root character, and the structural similarity and the image similarity between the first character and the second character, a second result of the first character and the second character includes:
when the first root character and the second root character are the same, the structural similarity is larger than a third preset threshold value, and the image similarity is larger than a fourth preset threshold value, determining that the first character and the second character are similar characters;
when the first root character and the second root character are the same and the structural similarity is not larger than a fifth preset threshold, determining that the first character and the second character are not shape-similar words;
and when the first root character and the second root character are the same, the structural similarity is greater than a fifth preset threshold, the image similarity is greater than a sixth preset threshold, and the similarity between the first character and the second character is greater than a seventh preset threshold, determining that the first character and the second character are similar characters.
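The cascade above can be sketched as a single decision function. All threshold values are placeholders, since the patent leaves every "preset threshold" unspecified, and the fall-through to "not shape-similar" for cases the claims do not cover is an assumption:

```python
def shape_similar_words(sim, struct_sim, img_sim, root1, root2,
                        t1=0.8, t3=0.7, t4=0.7, t5=0.5, t6=0.6, t7=0.6):
    """Return True if the two characters are judged shape-similar words.
    Thresholds t1 and t3..t7 mirror the first and third to seventh
    preset thresholds in the claims; their values are illustrative."""
    if sim > t1:                       # first result: overall similarity is high
        return True
    if root1 == root2:                 # same root character: inspect components
        if struct_sim > t3 and img_sim > t4:
            return True
        if struct_sim <= t5:
            return False
        if img_sim > t6 and sim > t7:
            return True
    return False                       # assumed default for uncovered cases
```

For instance, a pair whose overall similarity clears the first threshold is accepted immediately; a same-root pair with low structural similarity is rejected without consulting the image similarity.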
According to a second aspect of the embodiments of the present disclosure, there is provided a shape-similar word determining apparatus, including:
a first obtaining module configured to obtain a first character and a second character;
a second obtaining module configured to obtain a first structural feature of the first character and a second structural feature of the second character;
a first determination module configured to determine a structural similarity between the first character and the second character based on the first structural feature and the second structural feature;
a second determination module configured to determine an image similarity between the first character and the second character based on a first character image of the first character and a second character image of the second character;
a third determination module configured to determine a similarity between the first character and the second character according to the structural similarity and the image similarity;
a fourth determining module configured to obtain a shape-similar word determining result of the first character and the second character according to the similarity between the first character and the second character, wherein the shape-similar word determining result is used for indicating whether the first character and the second character are shape-similar words.
In a possible implementation manner, the structural features of the character include a stroke, a stroke order, a structural type, and a four-corner code of the character, and the second obtaining module is configured to query, from a character information base, a storage address of the first character and a storage address of the second character according to the character identifier of the first character and the character identifier of the second character, respectively; and acquiring a first stroke, a first stroke sequence, a first structure type and a first four-corner code of the first character from the storage address of the first character, and acquiring a second stroke, a second stroke sequence, a second structure type and a second four-corner code of the second character from the storage address of the second character.
In one possible implementation manner, the first determining module is configured to count a first stroke number of the first character and a second stroke number of the second character according to the first stroke and the second stroke, respectively, and determine a stroke number similarity between the first character and the second character according to the first stroke number and the second stroke number; determining stroke order similarity between the first character and the second character according to the first stroke order and the second stroke order; determining the structure type similarity between the first character and the second character according to the first structure type and the second structure type; determining the four-corner coding similarity between the first character and the second character according to the first four-corner coding and the second four-corner coding; determining the structural similarity between the first character and the second character according to the stroke number similarity, the stroke sequence similarity, the structural type similarity, the four-corner coding similarity, the first weight, the second weight, the third weight and the fourth weight between the first character and the second character;
the first weight is the weight of stroke number similarity, the second weight is the weight of stroke sequence similarity, the third weight is the weight of structure type similarity, and the fourth weight is the weight of four-corner coding similarity.
In one possible implementation,
the first determining module is configured to determine a stroke order edit distance between the first character and the second character according to a plurality of stroke order encodings included in the first stroke order and a plurality of stroke order encodings included in the second stroke order, where the stroke order edit distance is a minimum number of operation units required for transforming the stroke of the first character into the second character by performing an edit operation on the stroke of the first character; determining stroke order similarity between the first character and the second character according to the stroke order edit distance between the first character and the second character, the first stroke number of the first character and the second stroke number of the second character.
In one possible implementation manner, the second determining module includes:
an acquisition unit configured to acquire a first character image of the first character and a second character image of the second character, the first character image and the second character image being used to represent screen display styles of the first character and the second character, respectively;
a determination unit configured to determine an image similarity between the first character and the second character from the first character image and the second character image.
In a possible implementation manner, the determining unit is configured to extract a plurality of first feature points in the first character image and a plurality of second feature points in the second character image, and obtain a first descriptor of the plurality of first feature points and a second descriptor of the plurality of second feature points; determining a second feature point matched with each first feature point according to the first descriptors of the plurality of first feature points and the second descriptors of the plurality of second feature points to obtain a plurality of matched feature point pairs which are matched with each other; and determining the image similarity between the first character image and the second character image according to the number of the plurality of matched characteristic point pairs and the number of the first characteristic points.
In one possible implementation, the obtaining unit is further configured to any one of:
acquiring a first character image when the first character is displayed and a second character image when the second character is displayed on equipment with the same system; and
acquiring a plurality of display images of the first character on a plurality of devices, and determining a fused image of the display images of the first character as the first character image; and acquiring a plurality of display images of the second character on the plurality of devices, and determining a fused image of the plurality of display images of the second character as the second character image, wherein the plurality of devices are configured with different systems.
In one possible implementation, the fourth determining module is further configured to: when the similarity between the first character and the second character is larger than a first preset threshold, acquire a first result for the first character and the second character, wherein the first result is used for indicating that the first character and the second character are shape-similar words; and when the similarity between the first character and the second character is not larger than the first preset threshold, acquire a first root character to which the first character belongs and a second root character to which the second character belongs, and determine a second result for the first character and the second character according to the first root character, the second root character, and the structural similarity and image similarity between the first character and the second character, wherein the second result is used for indicating whether the first character and the second character are shape-similar words.
In one possible implementation manner, the fourth determining module is further configured to determine that the first character and the second character are shape-similar words when the first root character and the second root character are the same, the structural similarity is greater than a third preset threshold, and the image similarity is greater than a fourth preset threshold; determine that the first character and the second character are not shape-similar words when the first root character and the second root character are the same and the structural similarity is not larger than a fifth preset threshold; and determine that the first character and the second character are shape-similar words when the first root character and the second root character are the same, the structural similarity is greater than the fifth preset threshold, the image similarity is greater than a sixth preset threshold, and the similarity between the first character and the second character is greater than a seventh preset threshold.
According to a third aspect of the embodiments of the present disclosure, there is provided a computer device comprising one or more processors and one or more memories, the one or more memories storing at least one instruction that is loaded and executed by the one or more processors to implement the operations performed by the shape-similar word determining method according to the first aspect.
According to a fourth aspect of the embodiments of the present disclosure, there is provided a non-transitory computer-readable storage medium storing at least one instruction that is loaded and executed by a processor to implement the operations performed by the shape-similar word determining method according to the first aspect.
According to a fifth aspect of the embodiments of the present disclosure, there is provided an application program comprising one or more instructions which, when executed by a processor of a computer device, enable the computer device to perform the operations performed by the shape-similar word determining method according to the first aspect.
The technical scheme provided by the embodiment of the disclosure can have the following beneficial effects:
the method comprises the steps of obtaining the first structural feature and the second structural feature, determining structural similarity between the first character and the second character, evaluating the similarity of the two characters from the structural angle of the characters, and determining the image similarity between the first character and the second character based on a first character image of the first character and a second character image of the second character so as to determine the similarity from the image display angle. Then, the computer device integrates the character structure and the similarity degree of two angles displayed by the image, accurately evaluates the similarity degree of the first character and the second character, and further determines whether the first character and the second character are similar characters, so that the accuracy of determining the similar characters is improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
FIG. 1 is a flow diagram illustrating a shape-similar word determining method according to an exemplary embodiment;
FIG. 2 is a flow diagram illustrating a shape-similar word determining method according to an exemplary embodiment;
FIG. 3 is a schematic diagram of a character image shown in accordance with an exemplary embodiment;
FIG. 4 is a diagram illustrating a feature point matching in accordance with an exemplary embodiment;
FIG. 5 is a flow diagram illustrating a similarity-based shape-similar word determining method according to an exemplary embodiment;
FIG. 6 is a block diagram illustrating a shape-similar word determining apparatus according to an exemplary embodiment;
fig. 7 shows a block diagram of a terminal provided in an exemplary embodiment of the present disclosure;
fig. 8 is a schematic structural diagram of a server according to an embodiment of the present disclosure.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
Fig. 1 is a flow chart illustrating a shape-similar word determining method according to an exemplary embodiment. As shown in Fig. 1, the method is used in a computer device and includes the following steps.
101. Acquire a first character and a second character.
102. Acquire a first structural feature of the first character and a second structural feature of the second character.
103. Determine a structural similarity between the first character and the second character according to the first structural feature and the second structural feature.
104. Determine an image similarity between the first character and the second character based on a first character image of the first character and a second character image of the second character.
105. Determine the similarity between the first character and the second character according to the structural similarity and the image similarity.
106. Acquire a shape-similar word determination result for the first character and the second character according to the similarity between them, the result indicating whether the first character and the second character are shape-similar words.
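As an illustrative sketch only, steps 105 and 106 can be outlined in Python. The weighted-sum fusion and the fixed threshold are assumptions for illustration; the flow above only says the similarity is determined "according to" the structural and image similarities.

```python
def fuse_and_decide(structural_sim, image_sim,
                    w_struct=0.5, w_image=0.5, threshold=0.8):
    """Steps 105-106 sketched: fuse the structural and image similarities
    into one score (a weighted sum is an assumed fusion) and compare the
    score with a preset threshold to decide whether the two characters
    are shape-similar words."""
    similarity = w_struct * structural_sim + w_image * image_sim  # step 105
    return similarity, similarity > threshold                     # step 106
```

Any other monotone fusion and threshold could be substituted without changing the overall flow.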
In one possible implementation, the structural features of the character include a stroke, a stroke order, a structure type, and a four-corner code of the character, and the obtaining of the first structural feature of the first character and the second structural feature of the second character includes:
inquiring the storage address of the first character and the storage address of the second character from a character information base according to the character identifier of the first character and the character identifier of the second character respectively;
and acquiring a first stroke, a first stroke sequence, a first structure type and a first four-corner code of the first character from the storage address of the first character, and acquiring a second stroke, a second stroke sequence, a second structure type and a second four-corner code of the second character from the storage address of the second character.
In one possible implementation, the determining the structural similarity between the first character and the second character based on the first structural feature and the second structural feature includes:
respectively counting the first stroke number of the first character and the second stroke number of the second character according to the first stroke and the second stroke, and determining the stroke number similarity between the first character and the second character according to the first stroke number and the second stroke number;
determining stroke order similarity between the first character and the second character according to the first stroke order and the second stroke order;
determining the structure type similarity between the first character and the second character according to the first structure type and the second structure type;
determining the four-corner coding similarity between the first character and the second character according to the first four-corner coding and the second four-corner coding;
determining the structural similarity between the first character and the second character according to the stroke number similarity, the stroke sequence similarity, the structural type similarity, the four-corner coding similarity, the first weight, the second weight, the third weight and the fourth weight between the first character and the second character;
the first weight is the weight of stroke number similarity, the second weight is the weight of stroke order similarity, the third weight is the weight of structure type similarity, and the fourth weight is the weight of four-corner coding similarity.
In one possible implementation, the determining stroke order similarity between the first character and the second character based on the first stroke order and the second stroke order includes:
determining a stroke order edit distance between the first character and the second character according to a plurality of stroke order codes included in the first stroke order and a plurality of stroke order codes included in the second stroke order, wherein the stroke order edit distance is the minimum number of operation units required to convert the stroke order of the first character into that of the second character through editing operations;
determining stroke order similarity between the first character and the second character according to a stroke order edit distance between the first character and the second character, a first stroke number of the first character and a second stroke number of the second character.
In one possible implementation, the determining an image similarity between the first character and the second character based on a first character image of the first character and a second character image of the second character includes:
acquiring a first character image of the first character and a second character image of the second character, wherein the first character image and the second character image are respectively used for representing screen display styles of the first character and the second character;
and determining the image similarity between the first character and the second character according to the first character image and the second character image.
In one possible implementation, the determining the image similarity between the first character and the second character according to the first character image and the second character image includes:
extracting a plurality of first feature points in the first character image and a plurality of second feature points in the second character image, and acquiring a first descriptor of the plurality of first feature points and a second descriptor of the plurality of second feature points;
determining a second feature point matched with each first feature point according to the first descriptors of the plurality of first feature points and the second descriptors of the plurality of second feature points to obtain a plurality of matched feature point pairs which are matched with each other;
and determining the image similarity between the first character image and the second character image according to the number of the plurality of matched characteristic point pairs and the number of the first characteristic points.
In one possible implementation, the acquiring the first character image of the first character and the second character image of the second character includes any one of:
acquiring a first character image when the first character is displayed and a second character image when the second character is displayed on devices of the same system; or
the method comprises the steps of obtaining a plurality of display images of a first character on a plurality of devices, determining a fused image of the display images of the first character as the first character image, obtaining a plurality of display images of a second character on the plurality of devices, and determining a fused image of the display images of the second character as the second character image, wherein device systems configured on the plurality of devices are different.
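As a sketch of the multi-device fusion alternative described above: the hypothetical helper below averages equally sized grayscale renderings pixel by pixel. Pixel-wise averaging is only one possible image fusion algorithm; the text leaves the fusion method unspecified.

```python
def fuse_display_images(images):
    """Fuse display images of one character rendered on several device
    systems into a single character image. Each image is a list of rows
    of 0-255 grayscale values, all the same size; the fused image is the
    pixel-wise average (an assumed, simple fusion)."""
    h, w = len(images[0]), len(images[0][0])
    return [[sum(img[y][x] for img in images) / len(images)
             for x in range(w)] for y in range(h)]
```

In practice, each input would be a screenshot of the character rendered with that system's default typeface.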
In one possible implementation manner, the obtaining the determination result of the shape-similar word of the first character and the second character according to the similarity between the first character and the second character includes:
when the similarity of the first character and the second character is larger than a first preset threshold value, acquiring a first result of the first character and the second character, wherein the first result is used for indicating that the first character and the second character are similar characters;
when the similarity between the first character and the second character is not larger than the first preset threshold, acquiring a first root character to which the first character belongs and a second root character to which the second character belongs, and acquiring a second result for the first character and the second character according to the first root character, the second root character, and the structural similarity and image similarity between the first character and the second character, wherein the second result is used for indicating whether the first character and the second character are shape-similar characters.
In one possible implementation manner, the obtaining a second result of the first character and the second character according to the first root character, the second root character, and the structural similarity and the image similarity between the first character and the second character includes:
when the first root character and the second root character are the same, the structural similarity is larger than a third preset threshold value, and the image similarity is larger than a fourth preset threshold value, determining that the first character and the second character are similar characters;
when the first root character and the second root character are the same and the structural similarity is not larger than a fifth preset threshold, determining that the first character and the second character are not shape-similar characters;
and when the first root character and the second root character are the same, the structural similarity is greater than a fifth preset threshold value, the image similarity is greater than a sixth preset threshold value, and the similarity between the first character and the second character is greater than a seventh preset threshold value, determining that the first character and the second character are similar characters.
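A hypothetical transcription of the fallback decision rules above. All threshold values here are placeholders; the text only refers to them as preset thresholds and does not fix their values.

```python
def second_result(roots_same, s_struct, s_img, similarity,
                  t3=0.8, t4=0.8, t5=0.6, t6=0.7, t7=0.75):
    """Sketch of the second-result rules: True means shape-similar,
    False means not, None means the listed rules do not decide the pair.
    Threshold values t3-t7 are illustrative placeholders."""
    if roots_same and s_struct > t3 and s_img > t4:
        return True
    if roots_same and s_struct <= t5:
        return False
    if roots_same and s_struct > t5 and s_img > t6 and similarity > t7:
        return True
    return None  # not determined by the rules listed above
```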
In the embodiments of the present disclosure, the computer device may acquire the first structural feature and the second structural feature and determine the structural similarity between the first character and the second character, evaluating how similar the two characters are from the perspective of the characters' own structure; it may further determine the image similarity between the first character and the second character based on the first character image of the first character and the second character image of the second character, evaluating similarity from the perspective of image display. The computer device then integrates the similarity from both the character-structure and image-display perspectives to accurately determine the similarity between the first character and the second character, and further determines whether they are shape-similar characters, thereby improving the accuracy of shape-similar word determination.
FIG. 2 is a flowchart illustrating a shape-similar word determination method according to an exemplary embodiment. As shown in FIG. 2, the method is used in a computer device and includes the following steps.
201. The computer device obtains a first character and a second character.
The first character and the second character may be Chinese characters. The computer device can determine multiple groups of shape-similar words and construct a shape-similar character table based on them, wherein each group includes a first character and multiple characters whose glyphs are similar to it. In one possible implementation scenario, the first character may be a character for which shape-similar characters currently need to be searched. A character information base is pre-stored on the server and includes multiple characters; the server may obtain multiple second characters from the character information base and determine, based on the similarity between the first character and each second character in turn, whether that second character is a shape-similar character of the first character. Here, shape-similar characters refer to two characters whose glyphs are similar. In this step, the server may obtain the first character for which shape-similar characters are to be searched, and obtain a second character from the character information base.
In another possible implementation, the computer device may also obtain the first character and the second character in order, starting from the first character in the character sequence of the character information base. The step may then be: the computer device acquires, from the multiple characters and according to their character order, a first character and a second character located after the two characters acquired last time. For example, if the characters are ABCDEF, the first pair taken is AB, the next pair may be AC (the pair following AB in this order), then AD, and so on.
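The pair enumeration described above behaves like ordered 2-combinations over the character sequence; a minimal sketch:

```python
from itertools import combinations

def character_pairs(chars):
    """Enumerate character pairs in the order described: for ABCDEF the
    pairs come out as AB, AC, AD, ..., so iteration can resume after the
    last pair taken."""
    return list(combinations(chars, 2))
```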
202. The computer device obtains a first structural feature of the first character and a second structural feature of the second character.
In the disclosed embodiment, the structural features of a character include the stroke, stroke order, structural type, and four corner coding of the character.
The strokes of a character refer to the one or more basic strokes composing the character; for example, the basic strokes may include the 5 types specified in the modern Chinese general-purpose character list: horizontal, vertical, left-falling, dot, and turning. Stroke order refers to the sequence of the basic strokes that make up the character, in the order in which the character is written. It should be noted that each basic stroke may correspond to a stroke identifier, and the stroke order of a character may be represented by a combination of stroke identifiers. In one possible example, the stroke identifier corresponding to each basic stroke may be a stroke order code; for example, the stroke order codes corresponding to the 5 basic stroke types of horizontal, vertical, left-falling, dot, and turning may be the numbers 1, 2, 3, 4, and 5. In one specific example, when the stroke sequence of the character "好" ("good") is expressed using stroke order codes, the stroke order of the character may be represented as "531521".
Through structural analysis, Chinese characters can be divided structurally into single-component characters and compound characters. A single-component character is composed of a single component and is also called a single structure; a compound character is assembled from multiple components. For compound characters, according to the positional relationship among the character-forming components, the structure may be: left-right, up-down, left-middle-right, up-middle-down, upper-right surrounding, upper-left surrounding, lower-left surrounding, half surrounding, full surrounding, or nested. For example, the characters "happy" and "metaphor" are both left-right structures, and the characters "debate" and "petal" are both left-middle-right structures. Thus, the structure types of characters may include: single structure, left-right, up-down, left-middle-right, up-middle-down, upper-right surrounding, upper-left surrounding, lower-left surrounding, half surrounding, full surrounding, and nested.
The four-corner code refers to the codes corresponding to the single- or multi-stroke shapes at the four corners of a character: the upper-left, upper-right, lower-left, and lower-right corners. In one possible example, there are 10 classes of single- or multi-stroke shapes at the corners, which may be represented by the digits 0 to 9, with each corner corresponding to one code; for example, the four-corner codes of "debate" and "petal" are both "00441". The fifth digit is the character's "auxiliary code", meaning that the shape of one stroke just above the lower-right corner is added to the four-corner code as a supplementary digit.
The computer device can pre-store a character information base of a plurality of characters, wherein the character information base comprises information such as strokes, stroke sequences, structure types, four-corner codes and the like corresponding to each character. The computer device queries the storage address of the first character and the storage address of the second character from the character information base according to the character identifier of the first character and the character identifier of the second character, acquires the first stroke, the first stroke sequence, the first structure type and the first four-corner code of the first character from the storage address of the first character, and acquires the second stroke, the second stroke sequence, the second structure type and the second four-corner code of the second character from the storage address of the second character.
203. The computer device determines a structural similarity between the first character and the second character based on the first structural feature and the second structural feature.
The structural similarity includes a stroke number similarity between the first stroke and the second stroke, a stroke order similarity between the first stroke order and the second stroke order, a structure type similarity between the first structure type and the second structure type, and a four-corner code similarity between the first four-corner code and the second four-corner code.
For the stroke number similarity, the computer device respectively counts a first stroke number of a first character and a second stroke number of a second character according to the first stroke and the second stroke, and determines the stroke number similarity according to the first stroke number and the second stroke number.
In one possible example, the computer device may determine the stroke number similarity between the first character and the second character according to the first stroke number and the second stroke number by the following formula one:
The formula one: sbh = min(n, m)/max(n, m);
wherein sbh is the stroke number similarity, n is the first stroke number of the first character, and m is the second stroke number of the second character.
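A minimal sketch of the stroke number similarity; taking the ratio of the smaller to the larger stroke count is an assumed reconstruction of formula one, consistent with the variables n and m named above and with the worked values discussed later (identical counts give 1.0).

```python
def stroke_number_similarity(n, m):
    """Reconstructed formula one: the stroke number similarity between a
    character with n strokes and one with m strokes, as the ratio of the
    smaller count to the larger (an assumption, not the patent's literal
    formula, which is garbled in this translation)."""
    return min(n, m) / max(n, m)
```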
For the stroke order similarity, the computer device determines the stroke order similarity between the first character and the second character according to the first stroke order and the second stroke order. In one possible example, the computer device may use the edit distance to measure the difference in stroke order between the two characters. When the stroke order is represented by stroke order codes, the computer device determines the stroke order edit distance between the first character and the second character according to the stroke order codes included in the first stroke order and those included in the second stroke order, where the stroke order edit distance refers to the minimum number of operation units required to convert the first character into the second character through editing operations on the first character; the editing operations may include delete, insert, and replace, and the minimum operation unit is a single stroke operated on at a time. The computer device then determines the stroke order similarity between the first character and the second character according to this stroke order edit distance.
In one possible example, the computer device may determine the stroke order edit distance between the first character and the second character, according to the stroke order codes included in the first stroke order and those included in the second stroke order, by the following formula two:
The formula two: dij = min( d(i-1)j + edel, di(j-1) + eins, d(i-1)(j-1) + esub·[ai ≠ bj] );
wherein edel, eins and esub represent the delete, insert and replace operation distances, respectively, and typically take the value 1, with [ai ≠ bj] equal to 1 when ai and bj differ and 0 otherwise; the stroke order code of the first character is a = a1a2…an; the stroke order code of the second character is b = b1b2…bm; dij denotes the stroke order edit distance between the first i stroke order codes of the first character and the first j stroke order codes of the second character, with d00 = 0, di0 = i and d0j = j; and the edit distance between a = a1a2…an and b = b1b2…bm is dmn.
In one possible example, the computer device may determine the stroke order similarity between the first character and the second character according to the stroke order edit distance between them by the following formula three:
The formula three: sbs = 1 - dmn/max(n, m);
wherein sbs is the stroke order similarity, dmn is the stroke order edit distance between the first character and the second character, n is the first stroke number of the first character, and m is the second stroke number of the second character.
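The stroke order edit distance and the similarity derived from it can be sketched as follows. The unit-cost Levenshtein recurrence matches formula two; the normalization 1 - d/max(n, m) is an assumed reconstruction of formula three, consistent with the worked values quoted in the examples that follow.

```python
def edit_distance(a, b):
    """Levenshtein distance over two stroke-order code strings, with
    unit delete/insert/replace costs (the recurrence of formula two),
    computed with a rolling row."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,          # delete
                                     dp[j - 1] + 1,      # insert
                                     prev + (ca != cb))  # replace
    return dp[len(b)]

def stroke_order_similarity(a, b):
    """Reconstructed formula three: 1 - d / max(n, m); the exact
    normalization is an assumption consistent with the quoted values."""
    return 1 - edit_distance(a, b) / max(len(a), len(b))
```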
It should be noted that, in general, shape-similar words have high stroke number and stroke order similarity. For example, for "end" and "not", sbs and sbh are both 1.0; for "wolf" and "tough", sbs and sbh are both 0.9. But for some shape-similar words this does not hold: for "spin" and "travel", sbs is 0.73; for "debate" and "petal", sbs is 0.79, so a high stroke number and stroke order similarity cannot be obtained. Observation shows that shape-similar characters, on the one hand, share the same character-forming structure and, on the other hand, have similar stroke distributions at the four corners of the glyph. Therefore, the structure type and the four-corner code of the Chinese characters also need to be considered when defining glyph similarity.
For structure type similarity, the computer device may determine a structure type similarity between the first character and the second character based on a first structure type of the first character and a second structure type of the second character.
In one possible example, the computer device may determine a structure type similarity between the first character and the second character according to a first structure type of the first character and a second structure type of the second character by the following formula four;
the formula four is as follows: sjg=(|sta-stb|);
Wherein s isjgIs the structural type similarity; staA first structure type, st, representing a first characterbA second structure type, st, representing a second characterjgIndicating a structural type similarity between the first character and the second character. Where (-) is the impulse function, if sta=stbThen sjgIs 1, otherwise is 0.
For the four-corner code similarity, the computer device determines the four-corner code similarity between the first four-corner code and the second four-corner code. The computer device may use the edit distance to measure the difference between the four-corner codes of the two characters: it determines the four-corner code sequence edit distance between the first character and the second character according to the codes included in the first four-corner code and those included in the second four-corner code, where this edit distance refers to the minimum number of operation units required to convert the first code into the second through delete, insert, and replace operations. The computer device then determines the four-corner code similarity between the first character and the second character according to this four-corner code sequence edit distance.
The computer device may determine the four-corner code similarity according to the four-corner code sequence edit distance between the first character and the second character by the following formula five:
The formula five: ssj = 1 - d55/5;
wherein ssj is the four-corner code similarity; the first four-corner code of the first character may be denoted as u = u1u2u3u4u5, and the second four-corner code of the second character as v = v1v2v3v4v5; and d55 represents the four-corner code sequence edit distance between the first character and the second character.
The process of determining the edit distance of the four-corner coding sequence by the computer device is the same as the process of determining the edit distance of the stroke sequence, and is not repeated herein.
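A sketch of the four-corner code similarity, reusing the same edit distance recurrence as for stroke orders; the normalization by the fixed code length 5 is an assumed reconstruction of formula five.

```python
from functools import lru_cache

def four_corner_similarity(u, v):
    """Reconstructed formula five: 1 - d55/5, where d55 is the edit
    distance between two five-digit four-corner codes (dividing by the
    code length 5 is an assumption). The inner recursion is a plain
    memoized Levenshtein distance."""
    @lru_cache(maxsize=None)
    def d(i, j):
        if i == 0 or j == 0:
            return i or j  # deleting/inserting the remaining digits
        return min(d(i - 1, j) + 1, d(i, j - 1) + 1,
                   d(i - 1, j - 1) + (u[i - 1] != v[j - 1]))
    return 1 - d(len(u), len(v)) / 5
```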
In one possible implementation, the computer device may obtain a first weight of stroke number similarity, a second weight of stroke order similarity, a third weight of structure type similarity, and a fourth weight of four corner coding similarity, respectively; the computer equipment determines the structural similarity between the first character and the second character according to the stroke number similarity, the stroke sequence similarity, the structural type similarity, the four-corner coding similarity and the corresponding first weight, second weight, third weight and fourth weight between the first character and the second character through the following formula six;
formula six: s1=w1sbh+w2sbs+w3sjg+w4ssj;
Wherein s is1Is the structural similarity of the first character and the second character, w1、w2、w3、w4Respectively a first weight, a second weight, a third weight and a fourth weight, w1+w2+w3+w4=1;sbhAs stroke number similarity, sbsIs the stroke order similarity; sjgIs the structural type similarity; ssjThe similarity is encoded for four corners.
In one possible implementation, the computer device may store in advance the first, second, third and fourth weights corresponding to the stroke number similarity, the stroke order similarity, the structure type similarity and the four-corner code similarity, obtain these weights in real time, and perform the structural similarity determination process according to formula six above. In one possible example, considering that the stroke order and the four-corner code have a large impact on character similarity, the weights w2 and w4 may take the value 0.3, and w1 and w3 may take the value 0.2.
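The weighted combination of formula six can be sketched as follows. The equal default weights here are placeholders for illustration; any four weights summing to 1, such as the example values mentioned above, may be substituted.

```python
def structural_similarity(s_bh, s_bs, s_jg, s_sj,
                          w=(0.25, 0.25, 0.25, 0.25)):
    """Formula six: s1 = w1*sbh + w2*sbs + w3*sjg + w4*ssj. The default
    equal weights are placeholders; the method only requires that the
    four weights sum to 1."""
    assert abs(sum(w) - 1.0) < 1e-9  # weights must sum to 1
    w1, w2, w3, w4 = w
    return w1 * s_bh + w2 * s_bs + w3 * s_jg + w4 * s_sj
```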
It should be noted that the computer device obtains structural features of the first character and the second character from multiple angles, such as strokes, stroke order, four-corner code, and structure type, and comprehensively determines the structural similarity between the two characters from these features. Because the four-corner code represents the stroke distribution and structure of a character from its four corners and has a fixed code length, it characterizes glyph similarity well. The computer device also comprehensively considers features such as stroke number, stroke order, and structure type, so that the weighted structural similarity is more accurate and comprehensive, improving the accuracy of the structural similarity.
204. The computer device obtains a first character image of the first character and a second character image of the second character.
The computer equipment acquires a first character image of the first character and a second character image of the second character, wherein the first character image and the second character image are respectively used for indicating screen display styles of the first character and the second character, and the screen display styles refer to display forms of the characters on equipment screens of different systems. In one possible embodiment, the computer device may acquire a first character image when displaying a first character on the screen, and a second character image when displaying a second character on the screen.
In one possible implementation, the font rendering of the first character differs across devices of different systems, and different renderings of the same character yield different character images. Therefore, the server can acquire the first character image when the first character is displayed, and the second character image when the second character is displayed, on devices of the same system. In one possible example, the multiple devices may include a first device running the Android system, a second device running the iOS system, a third device running the Windows system, and so on. As shown in fig. 3, taking the character "good" as an example: in fig. 3 (a), the character displayed on a Windows device generally uses the system default SimSun (Song) typeface; in fig. 3 (b), the character displayed on an Android device generally uses the system default sans-serif (Heiti) typeface; and in fig. 3 (c), the character displayed on an iOS device generally uses the system default PingFang typeface.
In another possible implementation, in order to avoid the character image on the device of one system being nonstandard or inaccurate and eliminate the noise of the image of a single system device, the computer device can also synthesize the character images of the devices of multiple systems to obtain the standard character image of the character. That is, the computer device may acquire a plurality of display images of the first character when the first character is displayed on devices of different systems, and acquire the first character image based on the plurality of display images. This step may include: the computer equipment acquires a plurality of display images of the first character on a plurality of equipment, determines a fused image of the display images of the first character as the first character image, and the equipment systems configured on the equipment are different; the computer device acquires a plurality of display images of the second character on the plurality of devices, and determines a fused image of the plurality of display images of the second character as the second character image. Wherein, the device systems configured on the plurality of devices are different. The fusion image of the display images comprises the image characteristics of the display images. The computer device can fuse the image features of the plurality of display images based on an image fusion algorithm to obtain a fused image of the plurality of display images.
In one possible example, the computer device may send a display instruction to the plurality of devices, the display instruction carrying the first character and the second character, the plurality of devices performing display of the first character and the second character respectively, and sending a plurality of display images when the first character and the second character are displayed to the computer device, the computer device receiving the plurality of display images of the first character and the plurality of display images of the second character sent by the plurality of devices.
205. The computer device determines an image similarity between the first character and the second character based on the first character image and the second character image.
The computer device may obtain the similarity between the first character image and the second character image through a target algorithm.
In a possible implementation manner, the computer device may detect a plurality of first feature points in the first character image and a plurality of second feature points in the second character image through a target algorithm, obtain first descriptors of the plurality of first feature points and second descriptors of the plurality of second feature points, determine, according to these descriptors, the second feature point matching each first feature point, thereby obtaining a plurality of matched feature point pairs, and determine the image similarity between the first character image and the second character image according to the number of matched feature point pairs and the number of first feature points. A feature point is a point in the image whose pixel features are more significant than those of the surrounding pixel points, for example, a pixel point on the boundary between the region where the character is located and the background region.
In a possible example, the computer device may employ SIFT (Scale-invariant feature transform) to extract a plurality of first feature points in the first character image and a first descriptor of the plurality of first feature points, and a plurality of second feature points in the second character image and a second descriptor of the plurality of second feature points, respectively, and the process may include the following steps (1) - (4):
Step (1): the computer device constructs a first scale space by the following formula seven,

Formula seven: L(x, y, σ) = I(x, y) ∗ G(x, y, σ);

where I(x, y) represents the first image or the second image, G(x, y, σ) represents the scale-space Gaussian filter kernel, ∗ denotes convolution, and L(x, y, σ) represents the first scale space, which may be, for example, a Gaussian scale space.
The computer device then constructs a second scale space by the following formula eight,

Formula eight: D(x, y, σ) = L(x, y, kσ) − L(x, y, σ);

where D(x, y, σ) represents the difference-of-Gaussians (DoG) scale space and k is the scale factor between adjacent images in the scale space; for example, the second scale space may be a difference-of-Gaussians scale space.
Step (2): the computer device detects extreme points along a plurality of preset directions in the second scale space, and obtains the coordinate positions of the plurality of first feature points in the first image and the coordinate positions of the plurality of second feature points in the second image according to the following formula nine.

Formula nine: F(x, y, σ) = maximum/minimum{D(x ± 1, y ± 1, (k ± 1)σ)};

where maximum/minimum denotes the maximum or minimum extreme point, and D(x ± 1, y ± 1, (k ± 1)σ) denotes the neighborhood points of D(x, y, kσ).
It should be noted that the plurality of preset directions may include: horizontal direction, vertical direction and dimension direction. The plurality of first feature points may include a maximum extreme point and a minimum extreme point in the first image. The plurality of second feature points may include a maximum extreme point and a minimum extreme point in the second image.
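The extremum test behind formula nine can be sketched as follows: a point of the DoG scale space is kept as a candidate feature point when it is strictly larger (or strictly smaller) than all 26 neighbors in its 3×3×3 scale-space neighborhood (horizontal, vertical, and scale directions). The list-of-rows layer format and function name are illustrative assumptions.

```python
# Sketch of the 26-neighbor extremum test in the DoG scale space.
# below/current/above are three adjacent scale layers (lists of rows).

def is_scale_space_extremum(below, current, above, r, c):
    """Return True if current[r][c] is a strict max or min of its 26 neighbors."""
    center = current[r][c]
    neighbors = []
    for layer in (below, current, above):
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if layer is current and dr == 0 and dc == 0:
                    continue  # skip the center point itself
                neighbors.append(layer[r + dr][c + dc])
    return all(center > n for n in neighbors) or all(center < n for n in neighbors)
```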
Step (3): the computer device determines the amplitude of each first feature point and each second feature point by the following formula ten. For each of the plurality of first feature points and the plurality of second feature points, the computer device counts an amplitude histogram and an argument histogram of the pixel points in a region centered on the feature point with a preset radius, determines the principal direction of the feature point from these histograms by the following formula eleven, and thereby obtains the amplitude and principal direction of each first feature point and each second feature point.

Formula ten: m(x, y) = √[(L(x+1, y) − L(x−1, y))² + (L(x, y+1) − L(x, y−1))²];

Formula eleven: θ(x, y) = arctan[(L(x, y+1) − L(x, y−1)) / (L(x+1, y) − L(x−1, y))];

where m(x, y) represents the amplitude of the first feature point or the second feature point, and θ(x, y) represents the principal direction of the first feature point or the second feature point.
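Formulas ten and eleven are the standard SIFT gradient amplitude and orientation of a pixel in the scale-space image L. A minimal sketch on a grayscale image stored as a list of rows (function names are illustrative; `atan2` is used instead of a raw arctangent to avoid division by zero):

```python
import math

# Sketch of formula ten (gradient amplitude) and formula eleven (gradient
# orientation) at pixel (x, y) of a smoothed image L given as a list of rows.

def gradient_magnitude(L, x, y):
    dx = L[y][x + 1] - L[y][x - 1]
    dy = L[y + 1][x] - L[y - 1][x]
    return math.hypot(dx, dy)  # sqrt(dx**2 + dy**2)

def gradient_orientation(L, x, y):
    dx = L[y][x + 1] - L[y][x - 1]
    dy = L[y + 1][x] - L[y - 1][x]
    return math.atan2(dy, dx)  # arctan(dy / dx), safe when dx == 0
```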
Step (4): for each first feature point, the computer device rotates a statistical region of a target radius centered on the first feature point along its principal direction, according to the amplitude and principal direction of the first feature point; divides the statistical region into a plurality of equally spaced sub-regions; counts a gradient direction histogram of the pixel points in each sub-region; determines a plurality of directional gradient strength values of the first feature point in each sub-region; and takes the directional gradient strength values of the first feature point over all sub-regions as the descriptor of the first feature point. For example, if the statistical region is divided into n × n equally spaced sub-regions and m directional gradient strength values are acquired in each sub-region, an n × n × m-dimensional feature descriptor of the first feature point is obtained. The second feature points are processed in the same manner as the first feature points, and the details are not repeated here.
Then, for each first feature point, the computer device detects whether any of the plurality of second feature points matches it. If a matching second feature point is detected, the first feature point and that second feature point are regarded as a matched feature point pair; if none of the plurality of second feature points matches the first feature point, the first feature point has no corresponding matching second feature point, and the computer device continues to detect a second feature point matching the next first feature point. A plurality of matched feature point pairs is thereby determined.
For each first feature point and each second feature point, the computer device may obtain a third feature point in the first image that is closest to the first feature point, obtain a fourth feature point in the second image that is closest to the second feature point, and determine whether the first feature point and the second feature point match by the following formula twelve;

Formula twelve: ‖v1 − v4‖ / ‖v1 − v3‖ < ρ;

where v1 is a first feature point in the first image and v3 is the feature point in the first image closest to v1; v2 is a second feature point in the second image and v4 is the feature point in the second image closest to v2; and ρ represents a second preset threshold. That is, when the first feature point, the second feature point, the third feature point, and the fourth feature point satisfy formula twelve, the first feature point and the second feature point are determined to match each other; otherwise, they are determined not to match. In a possible example, the second preset threshold ρ may be set as needed, which is not particularly limited by the embodiments of the present disclosure; for example, the second preset threshold ρ may be 0.8.
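In the same spirit as formula twelve, the widely used variant is Lowe's ratio test: a candidate match is accepted only when its descriptor distance is clearly smaller than that of the runner-up. The sketch below implements that variant, not the patent's exact distance choice; function names and the Euclidean metric are illustrative assumptions.

```python
import math

# Hedged sketch of ratio-test matching with threshold rho (an illustrative
# variant of formula twelve, not the claimed method).

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def match_descriptors(first_desc, second_desc, rho=0.8):
    """Return pairs (i, j) of matched descriptor indices."""
    pairs = []
    for i, d1 in enumerate(first_desc):
        order = sorted(range(len(second_desc)),
                       key=lambda j: euclidean(d1, second_desc[j]))
        if len(order) < 2:
            continue  # need a runner-up for the ratio test
        best, runner_up = order[0], order[1]
        if euclidean(d1, second_desc[best]) < rho * euclidean(d1, second_desc[runner_up]):
            pairs.append((i, best))
    return pairs
```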
In one possible embodiment, the computer device determines, using the first image as a reference map, the second feature points in the second image that match the first feature points of the first image, and determines, using the second image as a reference map, the first feature points in the first image that match the second feature points of the second image. The computer device then determines the image similarity between the first image and the second image from these two sets of matched feature points by the following formula thirteen;
Formula thirteen: s2 = (K + L) / (N + M);

where s2 represents the image similarity between the first character image and the second character image, and N, M, K, L are the numbers of elements of the sets P, Q, X, Y, respectively. The set P = (p1, p2, …, pN) represents the plurality of first feature points in the first character image, and the set Q = (q1, q2, …, qM) represents the plurality of second feature points in the second character image; N and M thus denote the numbers of first feature points and second feature points, respectively.

Taking the second character image as the reference image, the first feature points in the first character image that match the second feature points are determined, yielding the set of matched feature point pairs X = (x1, x2, …, xK); taking the first character image as the reference image, the second feature points in the second character image that match the first feature points are determined, yielding the set of matched feature point pairs Y = (y1, y2, …, yL). K and L denote the numbers of matched feature point pairs obtained with the second character image and the first character image as the reference image, respectively.
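Reading formula thirteen as the ratio of matched pairs (counted in both matching directions) to the total number of detected feature points, the computation reduces to one line; the function name and the zero-feature-point fallback are assumptions.

```python
# Sketch of formula thirteen: s2 = (K + L) / (N + M).

def image_similarity(num_first_pts, num_second_pts,
                     matches_second_as_ref, matches_first_as_ref):
    """N, M: feature point counts; K, L: matched pair counts per direction."""
    total = num_first_pts + num_second_pts
    if total == 0:
        return 0.0  # no feature points detected in either image
    return (matches_second_as_ref + matches_first_as_ref) / total
```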
In a possible implementation manner, the above process is only an example of extracting feature points and determining descriptors of a character image by the SIFT algorithm; of course, the computer device may also extract feature points and determine descriptors of the character image by other algorithms, which is not particularly limited in the embodiments of the present disclosure. For example, the computer device may also perform this process using the Speeded-Up Robust Features (SURF) algorithm.
Because the computer device performs scale- and rotation-invariant feature extraction on the character images through the SIFT algorithm, the embodiments of the present disclosure are applicable to recognizing translational, scaled, symmetric, and rotational shape-near characters. As shown in fig. 4, taking images of characters rendered in the Apple square black font on a device of the iOS system as an example, and taking the second preset threshold ρ as 0.8, fig. 4 shows the feature point matching distribution maps of the translational shape-near characters "e" and "he", the symmetric shape-near characters "concave" and "convex", the rotational shape-near characters "region" and "murder", and the non-shape-near characters "e" and "region", respectively.
It should be noted that, in this step, the computer device extracts scale- and rotation-invariant features from the font pictures via the SIFT algorithm, so the SIFT-based picture matching of the embodiments of the present disclosure is applicable to the translational, scaled, symmetric, and rotational shape-near characters shown in fig. 4; the picture similarity is calculated from the number of matched feature points, thereby matching character pictures more accurately and improving the accuracy of the character similarity.
206. The computer device determines a similarity between the first character and the second character according to the structural similarity and the image similarity.
The computer device determines the similarity between the first character and the second character from the structural similarity and the image similarity by the following formula fourteen;

Formula fourteen: s = s1·s2 / (s1·s2 + (1 − s1)(1 − s2));

where s is the similarity between the first character and the second character, s1 is the structural similarity, and s2 is the image similarity.

It should be noted that some fonts have a higher structural similarity while others have a higher image similarity. In the embodiments of the present disclosure, according to the information fusion criterion, for any two metrics x and y on which a quantity depends, the symmetric sum of the two metrics may be defined as γ(x, y) = f(x, y) / (f(x, y) + f(1 − x, 1 − y)), where f(x, y) is a sum function of the two metrics. If f(x, y) is set to xy, then γ(x, y) = xy / (xy + (1 − x)(1 − y)). If x = 0 and y = 0, then γ(x, y) = 0; if x = 1 and y = 1, then γ(x, y) = 1; if x = 0.5 and y = 0.6, then γ(x, y) = 0.6, so γ(x, y) achieves an enhanced fusion of the two metrics. Therefore, in this step, the computer device can fuse the structural similarity and the image similarity through the information fusion criterion, combining the structural characteristics of the characters with their display characteristics in the images to judge similarity and improve the accuracy of character recognition.
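The symmetric sum with f(x, y) = xy can be sketched directly; the handling of the degenerate denominator (which occurs only when one metric is exactly 0 and the other exactly 1) is an assumption, as the text does not cover that case.

```python
# Sketch of the symmetric-sum fusion gamma(x, y) = xy / (xy + (1-x)(1-y)):
# it amplifies agreement between the two similarity metrics.

def fuse_similarity(s1, s2):
    num = s1 * s2
    den = num + (1 - s1) * (1 - s2)
    if den == 0:  # only for (s1, s2) = (0, 1) or (1, 0); assumed fallback
        return 0.5
    return num / den
```

Note how the worked examples from the text fall out: fuse_similarity(0, 0) = 0, fuse_similarity(1, 1) = 1, and fuse_similarity(0.5, 0.6) = 0.6.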
For example, as shown in fig. 4, taking images of characters rendered in the Apple square black font on a device of the iOS system as an example and taking the second preset threshold ρ as 0.8, fig. 4 shows the feature point matching distribution maps of the translational shape-near characters "e" and "he", the symmetric shape-near characters "concave" and "convex", the rotational shape-near characters "zone" and "fierce", and the non-shape-near characters "e" and "zone", respectively; the corresponding picture similarities s2 are 0.67, 0.59, 0.61, and 0.25. When the character similarity is determined by the method provided in the embodiments of the present disclosure, the similarity of shape-near characters is far higher than that of non-shape-near characters, so the judgment of shape-near characters can be carried out accurately.
The computer device then obtains a shape-near word determination result of the first character and the second character according to the similarity between the first character and the second character, where the shape-near word determination result is used to indicate whether the first character and the second character are shape-near words. In one possible implementation manner, the computer device may determine whether the first character and the second character are shape-near characters through the following steps 207 to 209.
207. When the similarity of the first character and the second character is larger than a first preset threshold value, the computer equipment obtains a first result of the first character and the second character.
The first result is used for indicating that the first character and the second character are shape-near characters. As shown in fig. 5, the first preset threshold may be 0.9; when the similarity between the first character and the second character is not greater than the first preset threshold, the computer device continues to perform the determination process of steps 208 and 209. Of course, the first preset threshold may also take other values, and its specific value is not limited in the embodiments of the present disclosure; for example, the first preset threshold may also be 0.8, 0.95, and so on.
208. When the similarity of the first character and the second character is not greater than the first preset threshold, the computer device acquires a first root character to which the first character belongs and a second root character to which the second character belongs.
The character information base stored in the computer device further includes a root character of each character, so that the computer device can search the first root character to which the first character belongs and the second root character to which the second character belongs from the character information base.
209. And the computer equipment acquires a second result of the first character and the second character according to the first root character, the second root character, the structural similarity and the image similarity of the first character.
The second result is used to indicate whether the first character and the second character are shape-near characters. As shown in fig. 5, the computer device may determine whether the first root character and the second root character are the same. If the first root character and the second root character are different, that is, the first character and the second character are not same-root characters, the computer device determines whether the structural similarity is greater than a third preset threshold and whether the image similarity is greater than a fourth preset threshold; if the structural similarity is greater than the third preset threshold and the image similarity is greater than the fourth preset threshold, it determines that the first character and the second character are shape-near characters, otherwise it determines that they are not.
If the first root character and the second root character are the same, that is, the first character and the second character are same-root characters, the computer device determines whether the structural similarity is greater than a fifth preset threshold, and determines that the first character and the second character are not shape-near characters when the structural similarity is not greater than the fifth preset threshold. When the structural similarity is greater than the fifth preset threshold, the computer device further judges whether the image similarity is greater than a sixth preset threshold and whether the similarity of the first character and the second character is greater than a seventh preset threshold; when both conditions hold, the computer device determines that the first character and the second character are shape-near characters; otherwise, it determines that they are not.
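The decision logic of steps 207 to 209 can be sketched as a small decision tree. Only the first threshold (0.9) is given as an example in the text; the remaining default threshold values below are illustrative assumptions.

```python
# Hedged sketch of the decision tree of steps 207-209. Threshold defaults
# other than t1 = 0.9 are illustrative assumptions.

def is_shape_near_word(similarity, structural_sim, image_sim, same_root,
                       t1=0.9, t3=0.7, t4=0.7, t5=0.6, t6=0.6, t7=0.6):
    # Step 207: high overall similarity -> shape-near word directly.
    if similarity > t1:
        return True
    # Steps 208-209: root-character-aware rules.
    if not same_root:  # different root characters
        return structural_sim > t3 and image_sim > t4
    if structural_sim <= t5:  # same root, structure too dissimilar
        return False
    return image_sim > t6 and similarity > t7
```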
In a specific example, table 1 below shows the recognition results of shape-near words based on the mixed similarity. The shape-near words of some common characters listed in table 1 were detected based on the similarity between the first character and the second character determined in steps 201 to 206 of the embodiments of the present disclosure, and compared with detection results based only on the similarity of strokes or stroke order; the observed results indicate that the shape-near words determined by the method of the embodiments of the present disclosure are more accurate.
TABLE 1
Character | Shape-near words (the disclosed embodiment) | Shape-near words (similarity of strokes or stroke order only) |
O | Supple and graceful Actinidia yao | Supple and graceful has a main body |
King (Chinese character of 'Wang') | Dried Yushi-Niantu | Main work of nonyl jade |
Soil for soil | Shi worker ten king lower stopper | Shi Gong Yi (the cun of Shi Gong) |
Liu (traditional Chinese medicine) | Agent for treating cancer | Criminal agent carving |
Exercise with exercising function | Cunning knife | Cun-wan |
Is not limited to | Lower part | Gangster |
Is prepared from | Leishui Jumei rice | Lei Mo Dong He |
Has already got | Has already been paid | Has already had a bow |
An ancient type of spoon | Erqibi (medicine for treating infantile eczema) | Children's table |
In the embodiments of the present disclosure, the computer device may acquire the first structural feature and the second structural feature and determine the structural similarity between the first character and the second character, evaluating the degree of similarity between the two characters from the angle of the characters' own structure; it further determines the image similarity between the first character and the second character based on the first character image of the first character and the second character image of the second character, evaluating the degree of similarity from the angle of image display. The computer device then integrates the degrees of similarity from these two angles, character structure and image display, to accurately determine the similarity of the first character and the second character and thus whether they are shape-near characters, improving the accuracy of shape-near word determination.
FIG. 6 is a block diagram illustrating a shape-near word determining apparatus according to an example embodiment. Referring to fig. 6, the apparatus includes a first obtaining module 601, a second obtaining module 602, a first determining module 603, a second determining module 604, a third determining module 605, and a fourth determining module 606.
A first obtaining module 601 configured to obtain a first character and a second character;
a second obtaining module 602 configured to obtain a first structural feature of the first character and a second structural feature of the second character;
a first determining module 603 configured to determine a structural similarity between the first character and the second character according to the first structural feature and the second structural feature;
a second determination module 604 configured to determine an image similarity between the first character and the second character based on a first character image of the first character and a second character image of the second character;
a third determining module 605 configured to determine a similarity of the first character and the second character according to the structural similarity and the image similarity;
a fourth determining module 606 configured to obtain a shape-near word determining result of the first character and the second character according to the similarity between the first character and the second character, where the shape-near word determining result is used to indicate whether the first character and the second character are shape-near words.
In a possible implementation manner, the structural features of the character include a stroke, a stroke order, a structural type, and a four-corner code of the character, and the second obtaining module is configured to query a storage address of the first character and a storage address of the second character from a character information base according to the character identifier of the first character and the character identifier of the second character, respectively; and acquiring a first stroke, a first stroke sequence, a first structure type and a first four-corner code of the first character from the storage address of the first character, and acquiring a second stroke, a second stroke sequence, a second structure type and a second four-corner code of the second character from the storage address of the second character.
In one possible implementation manner, the first determining module is configured to count a first stroke number of the first character and a second stroke number of the second character according to the first stroke and the second stroke, respectively, and determine a stroke number similarity between the first character and the second character according to the first stroke number and the second stroke number; determining stroke order similarity between the first character and the second character according to the first stroke order and the second stroke order; determining the structure type similarity between the first character and the second character according to the first structure type and the second structure type; determining the four-corner coding similarity between the first character and the second character according to the first four-corner coding and the second four-corner coding; determining the structural similarity between the first character and the second character according to the stroke number similarity, the stroke sequence similarity, the structural type similarity, the four-corner coding similarity, the first weight, the second weight, the third weight and the fourth weight between the first character and the second character;
the first weight is the weight of stroke number similarity, the second weight is the weight of stroke order similarity, the third weight is the weight of structure type similarity, and the fourth weight is the weight of four-corner coding similarity.
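Reading the description above as a weighted combination of the four component similarities, a minimal sketch follows; the example weights are illustrative assumptions (the text only requires four weights, not specific values or that they sum to 1).

```python
# Sketch of the weighted structural similarity from the four component
# similarities. Equal example weights are an illustrative assumption.

def structural_similarity(stroke_sim, order_sim, type_sim, corner_sim,
                          w1=0.25, w2=0.25, w3=0.25, w4=0.25):
    return w1 * stroke_sim + w2 * order_sim + w3 * type_sim + w4 * corner_sim
```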
In one possible implementation, the first determining module is configured to determine a stroke order edit distance between the first character and the second character according to the plurality of stroke order encodings included in the first stroke order and the plurality of stroke order encodings included in the second stroke order, where the stroke order edit distance is the minimum number of edit operations required to transform the stroke order of the first character into the stroke order of the second character; and to determine the stroke order similarity between the first character and the second character according to the stroke order edit distance between the first character and the second character, the first stroke number of the first character, and the second stroke number of the second character.
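The edit distance described here is the classic Levenshtein distance over stroke-code sequences. A sketch, where the normalization 1 − d / max(n1, n2) used to turn the distance into a similarity is an illustrative assumption (the text only says the similarity depends on the distance and the two stroke counts):

```python
# Sketch of the stroke-order edit distance (Levenshtein) and a derived
# similarity. The normalization choice is an illustrative assumption.

def edit_distance(a, b):
    """Minimum number of insert/delete/substitute operations turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete from a
                           cur[j - 1] + 1,              # insert into a
                           prev[j - 1] + (ca != cb)))   # substitute
        prev = cur
    return prev[-1]

def stroke_order_similarity(order1, order2):
    if not order1 and not order2:
        return 1.0
    return 1 - edit_distance(order1, order2) / max(len(order1), len(order2))
```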
In one possible implementation, the second determining module includes:
an acquisition unit configured to acquire a first character image of the first character and a second character image of the second character, the first character image and the second character image being used to represent screen display styles of the first character and the second character, respectively;
a determination unit configured to determine an image similarity between the first character and the second character based on the first character image and the second character image.
In one possible implementation manner, the determining unit is configured to extract a plurality of first feature points in the first character image and a plurality of second feature points in the second character image, and obtain a first descriptor of the plurality of first feature points and a second descriptor of the plurality of second feature points; determining a second feature point matched with each first feature point according to the first descriptors of the plurality of first feature points and the second descriptors of the plurality of second feature points to obtain a plurality of matched feature point pairs which are matched with each other; and determining the image similarity between the first character image and the second character image according to the number of the plurality of matched characteristic point pairs and the number of the first characteristic points.
In one possible implementation, the target algorithm is the scale- and rotation-invariant SIFT image matching algorithm.
In one possible implementation, the obtaining unit is further configured to any one of:
acquiring a first character image when the first character is displayed and a second character image when the second character is displayed on equipment with the same system; and
acquiring a plurality of display images of the first character on a plurality of devices, and determining a fused image of the display images of the first character as the first character image; and acquiring a plurality of display images of the second character on the plurality of devices, and determining a fused image of the plurality of display images of the second character as the second character image, wherein the plurality of devices are configured with different systems.
In one possible implementation, the fourth determining module is further configured to: when the similarity of the first character and the second character is greater than a first preset threshold, acquire a first result of the first character and the second character, the first result being used for indicating that the first character and the second character are shape-near characters; and when the similarity between the first character and the second character is not greater than the first preset threshold, acquire a first root character to which the first character belongs and a second root character to which the second character belongs, and determine a second result of the first character and the second character according to the first root character, the second root character, the structural similarity between the first character and the second character, and the image similarity, the second result being used to indicate whether the first character and the second character are shape-near characters.
In one possible implementation manner, the fourth determining module is further configured to: determine that the first character and the second character are shape-near characters when the first root character and the second root character are different, the structural similarity is greater than a third preset threshold, and the image similarity is greater than a fourth preset threshold; determine that the first character and the second character are not shape-near characters when the first root character and the second root character are the same and the structural similarity is not greater than a fifth preset threshold; and determine that the first character and the second character are shape-near characters when the first root character and the second root character are the same, the structural similarity is greater than the fifth preset threshold, the image similarity is greater than a sixth preset threshold, and the similarity between the first character and the second character is greater than a seventh preset threshold.
In the embodiments of the present disclosure, by acquiring the first structural feature and the second structural feature, the structural similarity between the first character and the second character is determined, evaluating the degree of similarity of the two characters from the angle of the characters' own structure; the image similarity between the first character and the second character is also determined based on the first character image of the first character and the second character image of the second character, evaluating the degree of similarity from the angle of image display. The computer device then integrates the degrees of similarity from these two angles, character structure and image display, to accurately determine the similarity of the first character and the second character and thus whether they are shape-near characters, improving the accuracy of shape-near word determination.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
It should be noted that: in the above embodiment, when determining the similarity of characters, the device for determining shape and proximity characters provided by the above embodiment is only illustrated by the division of the above functional modules, and in practical applications, the above function distribution may be completed by different functional modules according to needs, that is, the internal structure of the computer device is divided into different functional modules to complete all or part of the above described functions. In addition, the apparatus for determining a shape-similar word provided by the above embodiment and the method for determining a shape-similar word belong to the same concept, and the specific implementation process thereof is described in the method embodiment and is not described herein again.
Fig. 7 shows a block diagram of a terminal 700 according to an exemplary embodiment of the present disclosure. The terminal 700 may be a smart phone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a notebook computer, or a desktop computer. The terminal 700 may also be referred to by other names such as user equipment, portable terminal, laptop terminal, or desktop terminal.
In general, terminal 700 includes: a processor 701 and a memory 702.
The processor 701 may include one or more processing cores, such as a 4-core or 8-core processor. The processor 701 may be implemented in at least one hardware form of a DSP (Digital Signal Processing), an FPGA (Field-Programmable Gate Array), or a PLA (Programmable Logic Array). The processor 701 may also include a main processor and a coprocessor: the main processor, also called a Central Processing Unit (CPU), processes data in the awake state, while the coprocessor is a low-power processor that processes data in the standby state. In some embodiments, the processor 701 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 701 may further include an AI (Artificial Intelligence) processor for handling computing operations related to machine learning.
In some embodiments, the terminal 700 may further optionally include: a peripheral interface 703 and at least one peripheral. The processor 701, the memory 702, and the peripheral interface 703 may be connected by buses or signal lines. Various peripheral devices may be connected to peripheral interface 703 via a bus, signal line, or circuit board. Specifically, the peripheral device includes: at least one of radio frequency circuitry 704, touch screen display 705, camera 706, audio circuitry 707, positioning components 708, and power source 709.
The peripheral interface 703 may be used to connect at least one peripheral related to I/O (Input/Output) to the processor 701 and the memory 702. In some embodiments, processor 701, memory 702, and peripheral interface 703 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 701, the memory 702, and the peripheral interface 703 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
The radio frequency circuit 704 is used for receiving and transmitting RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuit 704 communicates with communication networks and other communication devices via electromagnetic signals, converting an electrical signal into an electromagnetic signal for transmission, or converting a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 704 includes an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so on. The radio frequency circuit 704 may communicate with other terminals via at least one wireless communication protocol, including, but not limited to, metropolitan area networks, mobile communication networks of various generations (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 704 may also include NFC (Near Field Communication) related circuits, which is not limited in this application.
The display screen 705 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 705 is a touch display screen, the display screen 705 also has the ability to capture touch signals on or over its surface. The touch signal may be input to the processor 701 as a control signal for processing. At this point, the display 705 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there may be one display 705, provided on the front panel of the terminal 700; in other embodiments, there may be at least two displays 705, respectively disposed on different surfaces of the terminal 700 or in a folded design; in still other embodiments, the display 705 may be a flexible display disposed on a curved or folded surface of the terminal 700. The display 705 may even be arranged in a non-rectangular irregular pattern, i.e., a shaped screen. The display 705 may be made of LCD (Liquid Crystal Display), OLED (Organic Light-Emitting Diode), or the like.
The camera assembly 706 is used to capture images or video. Optionally, camera assembly 706 includes a front camera and a rear camera. Generally, a front camera is disposed at a front panel of the terminal, and a rear camera is disposed at a rear surface of the terminal. In some embodiments, the number of the rear cameras is at least two, and each rear camera is any one of a main camera, a depth-of-field camera, a wide-angle camera and a telephoto camera, so that the main camera and the depth-of-field camera are fused to realize a background blurring function, and the main camera and the wide-angle camera are fused to realize panoramic shooting and VR (Virtual Reality) shooting functions or other fusion shooting functions. In some embodiments, camera assembly 706 may also include a flash. The flash lamp can be a monochrome temperature flash lamp or a bicolor temperature flash lamp. The double-color-temperature flash lamp is a combination of a warm-light flash lamp and a cold-light flash lamp, and can be used for light compensation at different color temperatures.
The audio circuitry 707 may include a microphone and a speaker. The microphone is used for collecting sound waves of a user and the environment, converting the sound waves into electric signals, and inputting the electric signals to the processor 701 for processing or inputting the electric signals to the radio frequency circuit 704 to realize voice communication. For the purpose of stereo sound collection or noise reduction, a plurality of microphones may be provided at different portions of the terminal 700. The microphone may also be an array microphone or an omni-directional pick-up microphone. The speaker is used to convert electrical signals from the processor 701 or the radio frequency circuit 704 into sound waves. The loudspeaker can be a traditional film loudspeaker or a piezoelectric ceramic loudspeaker. When the speaker is a piezoelectric ceramic speaker, the speaker can be used for purposes such as converting an electric signal into a sound wave audible to a human being, or converting an electric signal into a sound wave inaudible to a human being to measure a distance. In some embodiments, the audio circuitry 707 may also include a headphone jack.
The positioning component 708 is used to locate the current geographic location of the terminal 700 for navigation or LBS (Location Based Service). The positioning component 708 may be a positioning component based on the GPS (Global Positioning System) of the United States, the Beidou system of China, the GLONASS system of Russia, or the Galileo system of the European Union.
In some embodiments, terminal 700 also includes one or more sensors 710. The one or more sensors 710 include, but are not limited to: acceleration sensor 711, gyro sensor 712, pressure sensor 713, fingerprint sensor 714, optical sensor 715, and proximity sensor 716.
The acceleration sensor 711 can detect the magnitude of acceleration in three coordinate axes of a coordinate system established with the terminal 700. For example, the acceleration sensor 711 may be used to detect components of the gravitational acceleration in three coordinate axes. The processor 701 may control the touch screen 705 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal collected by the acceleration sensor 711. The acceleration sensor 711 may also be used for acquisition of motion data of a game or a user.
The gyro sensor 712 may detect a body direction and a rotation angle of the terminal 700, and the gyro sensor 712 may cooperate with the acceleration sensor 711 to acquire a 3D motion of the terminal 700 by the user. From the data collected by the gyro sensor 712, the processor 701 may implement the following functions: motion sensing (such as changing the UI according to a user's tilting operation), image stabilization at the time of photographing, game control, and inertial navigation.
Pressure sensors 713 may be disposed on a side bezel of terminal 700 and/or an underlying layer of touch display 705. When the pressure sensor 713 is disposed on a side frame of the terminal 700, a user's grip signal on the terminal 700 may be detected, and the processor 701 performs right-left hand recognition or shortcut operation according to the grip signal collected by the pressure sensor 713. When the pressure sensor 713 is disposed at a lower layer of the touch display 705, the processor 701 controls the operability control on the UI interface according to the pressure operation of the user on the touch display 705. The operability control comprises at least one of a button control, a scroll bar control, an icon control and a menu control.
The fingerprint sensor 714 is used for collecting a fingerprint of a user, and the processor 701 identifies the identity of the user according to the fingerprint collected by the fingerprint sensor 714, or the fingerprint sensor 714 identifies the identity of the user according to the collected fingerprint. When the user identity is identified as a trusted identity, the processor 701 authorizes the user to perform relevant sensitive operations, including unlocking a screen, viewing encrypted information, downloading software, paying, changing settings, and the like. The fingerprint sensor 714 may be disposed on the front, back, or side of the terminal 700. When a physical button or a vendor Logo is provided on the terminal 700, the fingerprint sensor 714 may be integrated with the physical button or the vendor Logo.
The optical sensor 715 is used to collect the ambient light intensity. In one embodiment, the processor 701 may control the display brightness of the touch display 705 based on the ambient light intensity collected by the optical sensor 715. Specifically, when the ambient light intensity is high, the display brightness of the touch display screen 705 is increased; when the ambient light intensity is low, the display brightness of the touch display 705 is turned down. In another embodiment, processor 701 may also dynamically adjust the shooting parameters of camera assembly 706 based on the ambient light intensity collected by optical sensor 715.
A proximity sensor 716, also referred to as a distance sensor, is typically disposed on a front panel of the terminal 700. The proximity sensor 716 is used to collect the distance between the user and the front surface of the terminal 700. In one embodiment, when the proximity sensor 716 detects that the distance between the user and the front surface of the terminal 700 gradually decreases, the processor 701 controls the touch display 705 to switch from the bright-screen state to the off-screen state; when the proximity sensor 716 detects that the distance between the user and the front surface of the terminal 700 gradually increases, the processor 701 controls the touch display 705 to switch from the off-screen state to the bright-screen state.
Those skilled in the art will appreciate that the configuration shown in fig. 7 is not intended to be limiting of terminal 700 and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components may be used.
Fig. 8 is a schematic structural diagram of a server according to an embodiment of the present disclosure. The server 800 may vary considerably in configuration or performance, and may include one or more processors (CPUs) 801 and one or more memories 802, where the memory 802 stores at least one instruction that is loaded and executed by the processor 801 to implement the shape-near word determining method provided by each of the method embodiments. Of course, the server may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface for performing input and output, and may include other components for implementing device functions, which are not described herein again.
In an exemplary embodiment, a computer-readable storage medium, such as a memory, is also provided that includes instructions executable by a processor in a computer device to perform the method for determining shape-near words in the embodiments described above. For example, the computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
In an exemplary embodiment, an application is provided that includes one or more instructions that, when executed by a processor of a computer device, enable the computer device to perform the method for determining shape-near words.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.
Claims (18)
1. A method for determining a shape near word, comprising:
acquiring a first character and a second character;
acquiring a first structural feature of the first character and a second structural feature of the second character;
determining a structural similarity between the first character and the second character according to the first structural feature and the second structural feature;
determining an image similarity between the first character and the second character based on a first character image of the first character and a second character image of the second character, wherein the determining the image similarity between the first character and the second character based on the first character image of the first character and the second character image of the second character comprises: acquiring a first character image of the first character and a second character image of the second character, wherein the first character image and the second character image are respectively used for representing screen display styles of the first character and the second character, the screen display styles refer to display forms of characters on equipment screens of different systems, and the image similarity between the first character and the second character is determined according to the first character image and the second character image;
determining the similarity between the first character and the second character according to the structural similarity and the image similarity through a formula, wherein the formula is as follows:
wherein s is the similarity between the first character and the second character, s1 is the structural similarity, and s2 is the image similarity;
and acquiring a shape-similar word determination result of the first character and the second character according to the similarity between the first character and the second character, wherein the shape-similar word determination result is used for indicating whether the first character and the second character are shape-similar words or not.
2. The method according to claim 1, wherein the structural features of the character include stroke, stroke order, structural type and four corner coding of the character, and the obtaining the first structural feature of the first character and the second structural feature of the second character comprises:
inquiring a storage address of the first character and a storage address of the second character from a character information base according to the character identifier of the first character and the character identifier of the second character respectively;
and acquiring a first stroke, a first stroke sequence, a first structure type and a first four-corner code of the first character from the storage address of the first character, and acquiring a second stroke, a second stroke sequence, a second structure type and a second four-corner code of the second character from the storage address of the second character.
3. The method of claim 2, wherein determining the structural similarity between the first character and the second character based on the first structural feature and the second structural feature comprises:
respectively counting the number of first strokes of the first character and the number of second strokes of the second character according to the first strokes and the second strokes, and determining the stroke number similarity between the first character and the second character according to the number of the first strokes and the number of the second strokes;
determining stroke order similarity between the first character and the second character according to the first stroke order and the second stroke order;
determining the structure type similarity between the first character and the second character according to the first structure type and the second structure type;
determining the four-corner coding similarity between the first character and the second character according to the first four-corner coding and the second four-corner coding;
determining the structural similarity between the first character and the second character according to the stroke number similarity, the stroke sequence similarity, the structural type similarity, the four-corner coding similarity, the first weight, the second weight, the third weight and the fourth weight between the first character and the second character;
the first weight is the weight of stroke number similarity, the second weight is the weight of stroke sequence similarity, the third weight is the weight of structure type similarity, and the fourth weight is the weight of four-corner coding similarity.
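The weighted combination in claim 3 can be sketched as follows. The claim names first through fourth weights but does not fix their values, and it leaves the four-corner comparison metric open, so the equal weights and the digit-wise match ratio below are assumptions for illustration:

```python
def four_corner_similarity(code_a: str, code_b: str) -> float:
    """Digit-wise match ratio of two four-corner codes (an assumed
    metric; the claim does not fix the comparison)."""
    matches = sum(a == b for a, b in zip(code_a, code_b))
    return matches / max(len(code_a), len(code_b))

def structural_similarity(stroke_count_sim: float, stroke_order_sim: float,
                          structure_type_sim: float, four_corner_sim: float,
                          weights=(0.25, 0.25, 0.25, 0.25)) -> float:
    """Weighted sum of the four component similarities of claim 3.
    Equal weights are assumed here for illustration."""
    w1, w2, w3, w4 = weights
    return (w1 * stroke_count_sim + w2 * stroke_order_sim
            + w3 * structure_type_sim + w4 * four_corner_sim)
```

Identical characters score 1.0 on every component and hence 1.0 overall; tuning the four weights shifts how much each structural cue contributes.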
4. The method of claim 3, wherein determining stroke order similarity between the first character and the second character based on the first stroke order and the second stroke order comprises:
determining a stroke order edit distance between the first character and the second character according to a plurality of stroke order codes included in the first stroke order and a plurality of stroke order codes included in the second stroke order, wherein the stroke order edit distance is the minimum number of edit operations required to convert the stroke sequence of the first character into that of the second character;
determining stroke order similarity between the first character and the second character according to the stroke order edit distance between the first character and the second character, the first stroke number of the first character and the second stroke number of the second character.
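Claim 4's edit distance over stroke-code sequences is a standard Levenshtein computation; a minimal sketch follows. The normalization in `stroke_order_similarity` (dividing by the longer stroke count) is one plausible reading of the claim, which names the edit distance and the two stroke counts but not the exact formula:

```python
def edit_distance(a, b) -> int:
    """Levenshtein distance between two stroke-code sequences:
    the minimum number of insert/delete/substitute operations."""
    m, n = len(a), len(b)
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev[j] + 1,        # delete
                         cur[j - 1] + 1,     # insert
                         prev[j - 1] + cost) # substitute
        prev = cur
    return prev[n]

def stroke_order_similarity(order_a, order_b) -> float:
    """Normalize the edit distance by the larger stroke count
    (an assumed normalization, not fixed by the claim)."""
    d = edit_distance(order_a, order_b)
    return 1.0 - d / max(len(order_a), len(order_b))
```

With stroke codes as digits (e.g. 1-5 for the basic stroke classes), two characters whose stroke sequences differ by one stroke out of three score 1 - 1/3 ≈ 0.67.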
5. The method according to claim 1, wherein the determining the image similarity between the first character and the second character from the first character image and the second character image comprises:
extracting a plurality of first feature points in the first character image and a plurality of second feature points in the second character image, and acquiring first descriptors of the plurality of first feature points and second descriptors of the plurality of second feature points;
determining a second feature point matched with each first feature point according to the first descriptors of the plurality of first feature points and the second descriptors of the plurality of second feature points to obtain a plurality of matched feature point pairs which are matched with each other;
and determining the image similarity between the first character image and the second character image according to the number of the plurality of matched characteristic point pairs and the number of the first characteristic points.
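The matching scheme of claim 5 can be sketched with binary (ORB-style) descriptors compared by Hamming distance. The descriptor type, the nearest-neighbor matching rule, and the `max_dist` threshold are assumptions for illustration; the claim only specifies matching first descriptors to second descriptors and taking the ratio of matched pairs to first feature points:

```python
def hamming(d1: int, d2: int) -> int:
    """Hamming distance between two binary descriptors stored as ints."""
    return bin(d1 ^ d2).count("1")

def image_similarity(desc_a, desc_b, max_dist: int = 10) -> float:
    """For each first-image descriptor, find its nearest second-image
    descriptor; count it as a matched pair when the distance is small.
    Returns matched pairs / number of first feature points (claim 5)."""
    if not desc_a or not desc_b:
        return 0.0
    matched = 0
    for d1 in desc_a:
        best = min(hamming(d1, d2) for d2 in desc_b)
        if best <= max_dist:
            matched += 1
    return matched / len(desc_a)
```

In practice a feature library such as OpenCV's ORB detector with a brute-force Hamming matcher would supply the descriptors; the ratio computed at the end is the part the claim specifies.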
6. The method according to claim 1, wherein the acquiring of the first character image of the first character and the second character image of the second character comprises any one of:
acquiring a first character image when the first character is displayed and a second character image when the second character is displayed on equipment with the same system; and
the method comprises the steps of obtaining a plurality of display images of a first character on a plurality of devices, determining a fused image of the display images of the first character as the first character image, obtaining a plurality of display images of a second character on the plurality of devices, and determining a fused image of the display images of the second character as the second character image, wherein the systems configured on the devices are different.
7. The method for determining the shape-near word according to claim 1, wherein the obtaining of the shape-near word determination result of the first character and the second character according to the similarity between the first character and the second character comprises:
when the similarity of the first character and the second character is larger than a first preset threshold value, acquiring a first result of the first character and the second character, wherein the first result is used for indicating that the first character and the second character are similar characters;
when the similarity between the first character and the second character is not larger than the first preset threshold, acquiring a first root character to which the first character belongs and a second root character to which the second character belongs, and acquiring a second result of the first character and the second character according to the first root character, the second root character, the structural similarity between the first character and the second character, and the image similarity, wherein the second result is used for indicating whether the first character and the second character are shape-near words.
8. The method according to claim 7, wherein the obtaining a second result of the first character and the second character according to the first root character, the second root character, and the structural similarity and the image similarity between the first character and the second character comprises:
when the first root character and the second root character are the same, the structural similarity is larger than a third preset threshold value, and the image similarity is larger than a fourth preset threshold value, determining that the first character and the second character are similar characters;
when the first root character and the second root character are the same and the structural similarity is not larger than a fifth preset threshold, determining that the first character and the second character are not shape-near words;
and when the first root character and the second root character are the same, the structural similarity is greater than a fifth preset threshold, the image similarity is greater than a sixth preset threshold, and the similarity between the first character and the second character is greater than a seventh preset threshold, determining that the first character and the second character are similar characters.
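The decision cascade of claims 7-8 can be sketched as follows. The claims name first through seventh preset thresholds without fixing their values, so the numbers below are illustrative placeholders:

```python
def shape_near_decision(sim: float, structural_sim: float, image_sim: float,
                        root_a: str, root_b: str,
                        t1=0.9, t3=0.8, t4=0.8,
                        t5=0.6, t6=0.7, t7=0.75) -> bool:
    """Threshold cascade of claims 7-8. All threshold values are
    assumptions; the claims only name the preset thresholds."""
    if sim > t1:                      # claim 7: first result
        return True
    if root_a == root_b:              # claim 8: same root character
        if structural_sim > t3 and image_sim > t4:
            return True
        if structural_sim <= t5:
            return False
        if image_sim > t6 and sim > t7:
            return True
    return False
```

A pair that clears the overall-similarity bar is accepted immediately; otherwise the root-character branch applies the finer structural and image tests.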
9. A shape-near word determination apparatus, comprising:
a first obtaining module configured to obtain a first character and a second character;
a second obtaining module configured to obtain a first structural feature of the first character and a second structural feature of the second character;
a first determination module configured to determine a structural similarity between the first character and the second character based on the first structural feature and the second structural feature;
a second determination module configured to determine an image similarity between a first character image of the first character and a second character image of the second character based on the first character image and the second character image; wherein the second determining module comprises: an acquisition unit configured to acquire a first character image of the first character and a second character image of the second character, the first character image and the second character image being used to represent screen display styles of the first character and the second character, respectively, a determination unit configured to determine an image similarity between the first character and the second character according to the first character image and the second character image;
a third determining module configured to determine the similarity between the first character and the second character according to the structural similarity and the image similarity by using a formula:
wherein s is the similarity between the first character and the second character, s1 is the structural similarity, and s2 is the image similarity;
a fourth determining module configured to obtain a shape-similar word determining result of the first character and the second character according to the similarity between the first character and the second character, wherein the shape-similar word determining result is used for indicating whether the first character and the second character are shape-similar words.
10. The shape near word determining apparatus according to claim 9, wherein the structural characteristics of the character include a stroke, a stroke order, a structural type and a four-corner coding of the character,
the second obtaining module is configured to query a storage address of the first character and a storage address of the second character from a character information base according to the character identifier of the first character and the character identifier of the second character respectively; and acquiring a first stroke, a first stroke sequence, a first structure type and a first four-corner code of the first character from the storage address of the first character, and acquiring a second stroke, a second stroke sequence, a second structure type and a second four-corner code of the second character from the storage address of the second character.
11. The shape-near word determining apparatus according to claim 10,
the first determining module is configured to count a first stroke number of the first character and a second stroke number of the second character according to the first stroke and the second stroke, respectively, and determine stroke number similarity between the first character and the second character according to the first stroke number and the second stroke number; determining stroke order similarity between the first character and the second character according to the first stroke order and the second stroke order; determining the structure type similarity between the first character and the second character according to the first structure type and the second structure type; determining the four-corner coding similarity between the first character and the second character according to the first four-corner coding and the second four-corner coding; determining the structural similarity between the first character and the second character according to the stroke number similarity, the stroke sequence similarity, the structural type similarity, the four-corner coding similarity, the first weight, the second weight, the third weight and the fourth weight between the first character and the second character;
the first weight is the weight of stroke number similarity, the second weight is the weight of stroke sequence similarity, the third weight is the weight of structure type similarity, and the fourth weight is the weight of four-corner coding similarity.
12. The apparatus according to claim 11, wherein the first determining module is configured to:
determining a stroke order edit distance between the first character and the second character according to a plurality of stroke order codes included in the first stroke order and a plurality of stroke order codes included in the second stroke order, wherein the stroke order edit distance is the minimum number of edit operations required to convert the stroke sequence of the first character into that of the second character;
determining stroke order similarity between the first character and the second character according to the stroke order edit distance between the first character and the second character, the first stroke number of the first character and the second stroke number of the second character.
13. The device according to claim 9, wherein the determination unit is configured to extract a plurality of first feature points in the first character image and a plurality of second feature points in the second character image, and obtain a first descriptor of the plurality of first feature points and a second descriptor of the plurality of second feature points; determining a second feature point matched with each first feature point according to the first descriptors of the plurality of first feature points and the second descriptors of the plurality of second feature points to obtain a plurality of matched feature point pairs which are matched with each other; and determining the image similarity between the first character image and the second character image according to the number of the plurality of matched characteristic point pairs and the number of the first characteristic points.
14. The apparatus according to claim 13, wherein the obtaining unit is further configured to any one of:
acquiring a first character image when the first character is displayed and a second character image when the second character is displayed on equipment with the same system; and
acquiring a plurality of display images of the first character on a plurality of devices, and determining a fused image of the display images of the first character as the first character image; and acquiring a plurality of display images of the second character on the plurality of devices, and determining a fused image of the plurality of display images of the second character as the second character image, wherein the plurality of devices are configured with different systems.
15. The shape-near word determining apparatus according to claim 9,
the fourth determination module further configured to: when the similarity of the first character and the second character is larger than a first preset threshold, acquire a first result of the first character and the second character, wherein the first result is used for indicating that the first character and the second character are shape-near words; when the similarity of the first character and the second character is not larger than the first preset threshold, acquire a first root character to which the first character belongs and a second root character to which the second character belongs, and determine a second result of the first character and the second character according to the first root character, the second root character, the structural similarity and the image similarity between the first character and the second character, wherein the second result is used for indicating whether the first character and the second character are shape-near words.
16. The near word shape determining apparatus according to claim 15,
the fourth determining module is further configured to: determine that the first character and the second character are similar characters when the first root character and the second root character are the same, the structural similarity is greater than a third preset threshold, and the image similarity is greater than a fourth preset threshold; determine that the first character and the second character are not similar characters when the first root character and the second root character are the same and the structural similarity is not greater than a fifth preset threshold; and determine that the first character and the second character are similar characters when the first root character and the second root character are the same, the structural similarity is greater than the fifth preset threshold, the image similarity is greater than a sixth preset threshold, and the similarity between the first character and the second character is greater than a seventh preset threshold.
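The threshold cascade of claims 15 and 16 can be sketched as a plain decision function. This is a reading of the claim language, not the patent's implementation: all threshold values `t1`–`t7` are tuning parameters the patent leaves unspecified, and the `None` return (no rule applies) is an assumption of this sketch.

```python
def judge_similar(sim, root1, root2, struct_sim, img_sim,
                  t1, t3, t4, t5, t6, t7):
    """Decision cascade sketched from claims 15-16.
    Returns True (similar), False (not similar), or None (no rule applies)."""
    if sim > t1:                        # claim 15: overall similarity high enough
        return True
    if root1 != root2:                  # claim 16's rules all require a shared root
        return None
    if struct_sim > t3 and img_sim > t4:
        return True                     # claim 16, first rule
    if struct_sim <= t5:
        return False                    # claim 16, second rule
    if img_sim > t6 and sim > t7:
        return True                     # claim 16, third rule
    return None

# Overall similarity alone decides the easy case (claim 15, first branch).
print(judge_similar(0.9, "木", "木", 0.0, 0.0,
                    0.8, 0.7, 0.7, 0.3, 0.6, 0.8))  # True
```

The cascade is cheap to evaluate: the scalar similarity screens out clear positives first, and the root, structural, and image comparisons are consulted only for borderline pairs.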
17. A computer device, comprising one or more processors and one or more memories, the one or more memories storing at least one instruction that is loaded and executed by the one or more processors to perform the operations of the near word determining method according to any one of claims 1 to 8.
18. A non-transitory computer-readable storage medium storing at least one instruction, the instruction being loaded and executed by a processor to perform the operations of the near word determining method according to any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910359360.3A CN110097002B (en) | 2019-04-30 | 2019-04-30 | Shape and proximity word determining method and device, computer equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110097002A CN110097002A (en) | 2019-08-06 |
CN110097002B true CN110097002B (en) | 2020-12-11 |
Family
ID=67446432
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910359360.3A Active CN110097002B (en) | 2019-04-30 | 2019-04-30 | Shape and proximity word determining method and device, computer equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110097002B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111222590B (en) * | 2019-12-31 | 2024-04-12 | 咪咕文化科技有限公司 | Shape-near-word determining method, electronic device, and computer-readable storage medium |
CN111242219A (en) * | 2020-01-14 | 2020-06-05 | 北大方正集团有限公司 | Character similarity determining method and device, electronic equipment and storage medium |
CN112766236B (en) * | 2021-03-10 | 2023-04-07 | 拉扎斯网络科技(上海)有限公司 | Text generation method and device, computer equipment and computer readable storage medium |
CN114972817A (en) * | 2022-04-25 | 2022-08-30 | 深圳创维-Rgb电子有限公司 | Image similarity matching method, device and storage medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106598920A (en) * | 2016-11-28 | 2017-04-26 | 昆明理工大学 | Similar Chinese character classification method combining stroke codes with Chinese character dot matrixes |
CN108154167A (en) * | 2017-12-04 | 2018-06-12 | 昆明理工大学 | A kind of Chinese character pattern similarity calculating method |
CN109190615A (en) * | 2018-07-26 | 2019-01-11 | 徐庆 | Nearly word form identification decision method, apparatus, computer equipment and storage medium |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5909509A (en) * | 1996-05-08 | 1999-06-01 | Industrial Technology Research Inst. | Statistical-based recognition of similar characters |
CN104239882B (en) * | 2013-06-14 | 2017-05-03 | 富士通株式会社 | Image similarity determining device and method and image feature obtaining device and method |
CN103927330A (en) * | 2014-03-19 | 2014-07-16 | 北京奇虎科技有限公司 | Method and device for determining characters with similar forms in search engine |
CN105608462A (en) * | 2015-12-10 | 2016-05-25 | 小米科技有限责任公司 | Character similarity judgment method and device |
CN106874947B (en) * | 2017-02-07 | 2019-03-12 | 第四范式(北京)技术有限公司 | Method and apparatus for determining text shape recency |
CN109299726A (en) * | 2018-08-01 | 2019-02-01 | 昆明理工大学 | A kind of Chinese character pattern Similarity algorithm based on feature vector and stroke order coding |
Also Published As
Publication number | Publication date |
---|---|
CN110097002A (en) | 2019-08-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110097002B (en) | Shape and proximity word determining method and device, computer equipment and storage medium | |
CN111079576B (en) | Living body detection method, living body detection device, living body detection equipment and storage medium | |
CN110807361B (en) | Human body identification method, device, computer equipment and storage medium | |
CN110083791B (en) | Target group detection method and device, computer equipment and storage medium | |
CN110222789B (en) | Image recognition method and storage medium | |
CN110650379B (en) | Video abstract generation method and device, electronic equipment and storage medium | |
CN109815150B (en) | Application testing method and device, electronic equipment and storage medium | |
CN110059652B (en) | Face image processing method, device and storage medium | |
CN109522863B (en) | Ear key point detection method and device and storage medium | |
CN110991457B (en) | Two-dimensional code processing method and device, electronic equipment and storage medium | |
CN113038165B (en) | Method, apparatus and storage medium for determining encoding parameter set | |
CN112084811A (en) | Identity information determining method and device and storage medium | |
CN110738185B (en) | Form object identification method, form object identification device and storage medium | |
CN112396076A (en) | License plate image generation method and device and computer storage medium | |
CN111353946A (en) | Image restoration method, device, equipment and storage medium | |
CN110675473B (en) | Method, device, electronic equipment and medium for generating GIF dynamic diagram | |
CN110503159B (en) | Character recognition method, device, equipment and medium | |
CN111586279B (en) | Method, device and equipment for determining shooting state and storage medium | |
CN114741559A (en) | Method, apparatus and storage medium for determining video cover | |
CN112989198B (en) | Push content determination method, device, equipment and computer-readable storage medium | |
CN110728167A (en) | Text detection method and device and computer readable storage medium | |
CN111428551B (en) | Density detection method, density detection model training method and device | |
CN110232417B (en) | Image recognition method and device, computer equipment and computer readable storage medium | |
CN110163192B (en) | Character recognition method, device and readable medium | |
CN113343709B (en) | Method for training intention recognition model, method, device and equipment for intention recognition |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||