CN106874947B - Method and apparatus for determining text shape recency - Google Patents

Publication number
CN106874947B
Authority
CN
China
Prior art keywords
picture
text
recency
shape
pixel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710067064.7A
Other languages
Chinese (zh)
Other versions
CN106874947A (en)
Inventor
王珵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
4Paradigm Beijing Technology Co Ltd
Original Assignee
4Paradigm Beijing Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 4Paradigm Beijing Technology Co Ltd
Priority to CN201710067064.7A
Publication of CN106874947A
Application granted
Publication of CN106874947B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/751 Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures

Abstract

A method and apparatus for determining text shape recency are provided. The method includes: (A) performing picture format conversion on a first text and a second text to be compared, respectively, to obtain a corresponding first picture and second picture; (B) determining the pixel matching situation between the first picture and the second picture; and (C) determining the shape recency between the first text and the second text based on the determined pixel matching situation. In the method and apparatus for determining text shape recency according to exemplary embodiments of the present invention, the shape recency of texts (that is, how closely their glyphs resemble one another) is judged from the degree of pixel matching after the texts are converted into picture format, so the approach is no longer bound by any coding rule and serves as a general way of determining similar-shaped characters.

Description

Method and apparatus for determining text shape recency
Technical field
The present invention relates generally to computer information processing, and more particularly to a method and apparatus for determining text shape recency based on computer information processing.
Background art
In languages such as Chinese, Japanese, and Korean, there are a large number of texts (characters) whose glyphs are similar. The existence of these similar-shaped characters easily leads to errors in text use.
In the past, misused similar-shaped characters were mostly found manually. Because similar-shaped characters are inherently easy to confuse, this not only consumes a great deal of manpower but also inevitably results in errors and omissions.
With the development of computer technology, similar-shaped characters are increasingly found by automatic identification. In general, when texts are encoded based on their glyphs (for example, converted according to a coding rule into code words consisting of 1 to 5 characters, such as letters or digits), the shape recency between texts can be determined based on the edit distance between code words. Here, as an example, the edit distance can be calculated from the insertions, deletions, or changes of characters between code words. For example, Chinese patent application publication No. CN103399907A, "Method and device for calculating Chinese character string similarity based on edit distance", discloses a method that uses the four-corner code to convert the Chinese characters in a character string into four-corner codes and then calculates the similarity of the Chinese characters based on edit distance.
The above text encoding depends heavily on the coding rule, yet these coding rules cannot effectively capture the similarity between radicals. Taking Chinese characters as an example, the Wubi code, the Cangjie code, the four-corner code, and the like are based on components such as the radicals and/or strokes of Chinese characters, but the edit distance between coding results often fails to reflect the degree of similarity between Chinese characters. For example, the Wubi code of the character "space" is "PGFJ" and the Wubi code of the character "looking up with eyes wide open" is "HGFJ"; the edit distance between the two is only 1, but the two characters are not visually similar. As another example, the four-corner code of the character "behaviour" is 56094 and the four-corner code of the character "dry" is 96894; the two code words have only 2 identical characters and their edit distance is relatively large, yet the two characters are in fact highly similar. In addition, for texts with especially few strokes, for example "two" and "three" or "dry" and "in", it is also difficult to judge similarity through the edit distance between code words.
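The code-word comparison discussed above can be illustrated with a minimal sketch of the classic Levenshtein edit distance, applied to the two pairs of code words quoted in the text. The function name `edit_distance` is an assumption made for illustration; the prior-art application may compute the distance differently.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute ca with cb
        prev = curr
    return prev[-1]


# Code words quoted in the text above.
wubi_distance = edit_distance("PGFJ", "HGFJ")
four_corner_distance = edit_distance("56094", "96894")
```

Under this definition the visually dissimilar pair of Wubi codes ends up closer together than the visually similar pair of four-corner codes, which is the mismatch the text describes.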
As can be seen that necessarily by the limit of coding rule when judging text approximation based on the editing distance between code word System, that is to say, that the part text feature information lost when coding leads to be difficult to the shape short range degree for effectively measuring text.
Summary of the invention
Exemplary embodiments of the present invention aim to overcome the defect in the prior art that similar-shaped texts are difficult to identify accurately.
According to an exemplary embodiment of the present invention, a method for determining text shape recency is provided, comprising: (A) performing picture format conversion on a first text and a second text to be compared, respectively, to obtain a corresponding first picture and second picture; (B) determining the pixel matching situation between the first picture and the second picture; and (C) determining the shape recency between the first text and the second text based on the determined pixel matching situation.
Optionally, in the method, in step (A), picture format conversion is performed on the first text and the second text respectively according to a raster font.
Optionally, in the method, in step (B), the pixel matching situation between the first picture and the second picture is determined in units of single pixels.
Optionally, in the method, step (B) includes: (b1) binarizing the pixels of the first picture and the second picture, respectively; and (b2) determining the pixel matching situation between the binarized first picture and the binarized second picture.
Optionally, in the method, step (B) includes: (b3) performing at least one geometric transformation on the first picture and the second picture, respectively; and (b4) determining the pixel matching situation between each of the first picture before geometric transformation and the first picture after each geometric transformation, on the one hand, and each of the second picture before geometric transformation and the second picture after each geometric transformation, on the other. Further, step (C) includes: (c1) determining a candidate shape recency between the first text and the second text based on each determined pixel matching situation, respectively; and (c2) determining the highest candidate shape recency among the determined candidate shape recencies as the shape recency between the first picture and the second picture.
Optionally, in the method, the at least one geometric transformation includes at least one of a translation transformation, a scaling transformation, a row-deletion transformation, and a column-deletion transformation.
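Steps (b3), (b4), (c1), and (c2) can be sketched as follows. This is a minimal illustration, not the patented implementation. It makes several assumptions: pictures are small 0/1 grids; only one-pixel translations and row deletions are tried; a deleted row is padded with a blank row so sizes stay comparable; and the candidate shape recency is the fraction of matching non-blank pixel pairs. All names (`translate`, `delete_row`, `match_ratio`, `variants`, `best_recency`) are hypothetical.

```python
from itertools import product

Picture = list[list[int]]  # assumed encoding: 1 = stroke pixel, 0 = blank pixel


def translate(pic: Picture, dx: int, dy: int) -> Picture:
    """Shift the picture by (dx, dy); pixels shifted out of frame are dropped."""
    h, w = len(pic), len(pic[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                out[ny][nx] = pic[y][x]
    return out


def delete_row(pic: Picture, r: int) -> Picture:
    """Remove row r and pad a blank row at the bottom (padding is an assumption)."""
    return pic[:r] + pic[r + 1:] + [[0] * len(pic[0])]


def match_ratio(a: Picture, b: Picture) -> float:
    """Fraction of equal pixel pairs among pairs where at least one is a stroke."""
    pairs = [(p, q) for ra, rb in zip(a, b) for p, q in zip(ra, rb) if p or q]
    if not pairs:
        return 1.0
    return sum(p == q for p, q in pairs) / len(pairs)


def variants(pic: Picture):
    """Step (b3): the untransformed picture plus each transformed version."""
    yield pic
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        yield translate(pic, dx, dy)
    for r in range(len(pic)):
        yield delete_row(pic, r)


def best_recency(a: Picture, b: Picture) -> float:
    """Steps (b4), (c1), (c2): score every pairing and keep the highest candidate."""
    return max(match_ratio(va, vb) for va, vb in product(variants(a), variants(b)))


top_bar = [[1, 1, 1], [0, 0, 0], [0, 0, 0]]
middle_bar = [[0, 0, 0], [1, 1, 1], [0, 0, 0]]
direct = match_ratio(top_bar, middle_bar)
best = best_recency(top_bar, middle_bar)
```

In this toy case the two bars do not overlap when compared directly, but a one-pixel translation aligns them, so taking the maximum over candidates recovers the similarity.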
Optionally, in the method, in step (B), the pixel matching situation between the first picture and the second picture is determined by considering the matching situation between non-blank pixels.
Optionally, the method further includes step (D) and step (E), wherein the method executes step (D) before step (A): determining the stroke counts of the first text and the second text to be compared. If it is determined in step (D) that the stroke count of at least one of the first text and the second text is less than a preset threshold, the method executes step (A), step (B), and step (C); otherwise, the method executes step (E): determining the shape recency between the first text and the second text based on the edit distance between the first text and the second text.
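The branching between steps (A) to (C) and step (E) can be sketched as a small dispatcher. The stroke counts, the default threshold of 5, and the two scoring callables below are all hypothetical stand-ins chosen for illustration; the source does not specify them at this point.

```python
from typing import Callable


def shape_recency(first: str, second: str,
                  stroke_count: Callable[[str], int],
                  pixel_recency: Callable[[str, str], float],
                  edit_recency: Callable[[str, str], float],
                  threshold: int = 5) -> float:
    """Step (D): pick the pixel-based or edit-distance-based path by stroke count."""
    # "At least one text has fewer strokes than the threshold" is equivalent
    # to the smaller of the two stroke counts being below the threshold.
    if min(stroke_count(first), stroke_count(second)) < threshold:
        return pixel_recency(first, second)   # steps (A), (B), (C)
    return edit_recency(first, second)        # step (E)


# Toy stand-ins (hypothetical values, for illustration only).
STROKES = {"a": 2, "b": 3, "c": 12, "d": 14}
calls = []


def toy_pixel(x: str, y: str) -> float:
    calls.append("pixel")
    return 0.9


def toy_edit(x: str, y: str) -> float:
    calls.append("edit")
    return 0.4


r_few = shape_recency("a", "b", STROKES.__getitem__, toy_pixel, toy_edit)
r_many = shape_recency("c", "d", STROKES.__getitem__, toy_pixel, toy_edit)
```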
Optionally, in the method, for a plurality of first texts and a plurality of second texts, the shape recency between each first text and each second text is determined, and the method further includes: (F) for each first text, selecting at least one second text having a relatively high shape recency with that first text as a similar-shaped character of that first text, and establishing a similar-shaped character library composed of the plurality of first texts and their similar-shaped characters.
According to another exemplary embodiment of the present invention, a method of performing text proofreading is provided, comprising: determining a text to be proofread; searching for similar-shaped characters of the text to be proofread in a similar-shaped character library established according to an exemplary embodiment of the present invention; and outputting the similar-shaped characters found for the text to be proofread.
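Step (F) and the proofreading lookup can be sketched together. The sketch assumes the library is a plain dictionary, that the `top_k` highest-scoring second texts are kept for each first text, and that a text is not listed as similar to itself; the names `build_library`, `lookup`, and the toy scores are hypothetical.

```python
def build_library(first_texts, second_texts, recency, top_k=1):
    """Step (F): map each first text to its top_k most similar second texts."""
    library = {}
    for first in first_texts:
        ranked = sorted((s for s in second_texts if s != first),
                        key=lambda s: recency(first, s), reverse=True)
        library[first] = ranked[:top_k]
    return library


def lookup(library, text):
    """Proofreading step: return the similar-shaped characters found for text."""
    return library.get(text, [])


# Toy symmetric scores between placeholder texts (hypothetical values).
TOY_SCORES = {frozenset("ab"): 0.9, frozenset("ac"): 0.2, frozenset("bc"): 0.5}


def toy_recency(x, y):
    return TOY_SCORES[frozenset((x, y))]


library = build_library("abc", "abc", toy_recency, top_k=1)
```

A proofreading tool could then call `lookup(library, text)` for each text under review and present the returned candidates to the user.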
According to another exemplary embodiment of the present invention, an apparatus for determining text shape recency is provided, comprising: a conversion device for performing picture format conversion on a first text and a second text to be compared, respectively, to obtain a corresponding first picture and second picture; a matching determining device for determining the pixel matching situation between the first picture and the second picture; and a shape recency determining device for determining the shape recency between the first text and the second text based on the determined pixel matching situation.
Optionally, in the apparatus, the conversion device performs picture format conversion on the first text and the second text respectively according to a raster font.
Optionally, in the apparatus, the matching determining device determines the pixel matching situation between the first picture and the second picture in units of single pixels.
Optionally, in the apparatus, the matching determining device includes: a binarization unit for binarizing the pixels of the first picture and the second picture, respectively; and a matching unit for determining the pixel matching situation between the binarized first picture and the binarized second picture.
Optionally, in the apparatus, the matching determining device includes: a transformation unit for performing at least one geometric transformation on the first picture and the second picture, respectively; and a matching unit for determining the pixel matching situation between each of the first picture before geometric transformation and the first picture after each geometric transformation, on the one hand, and each of the second picture before geometric transformation and the second picture after each geometric transformation, on the other. Further, the shape recency determining device includes: a candidate shape recency determining unit for determining a candidate shape recency between the first text and the second text based on each determined pixel matching situation, respectively; and a highest shape recency determining unit for determining the highest candidate shape recency among the determined candidate shape recencies as the shape recency between the first picture and the second picture.
Optionally, in the apparatus, the at least one geometric transformation includes at least one of a translation transformation, a scaling transformation, a row-deletion transformation, and a column-deletion transformation.
Optionally, in the apparatus, the matching determining device determines the pixel matching situation between the first picture and the second picture by considering the matching situation between non-blank pixels.
Optionally, the apparatus further includes a stroke determining device for determining the stroke counts of the first text and the second text to be compared. If the stroke determining device determines that the stroke count of at least one of the first text and the second text is less than a preset threshold, the conversion device performs the picture format conversion, the matching determining device determines the pixel matching situation, and the shape recency determining device determines the shape recency between the first text and the second text based on the determined pixel matching situation; otherwise, the shape recency determining device determines the shape recency between the first text and the second text based on the edit distance between the first text and the second text.
Optionally, the apparatus determines, for a plurality of first texts and a plurality of second texts, the shape recency between each first text and each second text, and the apparatus further includes a character library establishing device for selecting, for each first text, at least one second text having a relatively high shape recency with that first text as a similar-shaped character of that first text, and for establishing a similar-shaped character library composed of the plurality of first texts and their similar-shaped characters.
According to another exemplary embodiment of the present invention, an apparatus for performing text proofreading is provided, comprising: a text determining device for determining a text to be proofread; a similar-shaped character search device for searching for similar-shaped characters of the text to be proofread in a similar-shaped character library established by the apparatus according to an exemplary embodiment of the present invention; and a similar-shaped character output device for outputting the similar-shaped characters found for the text to be proofread.
According to another exemplary embodiment of the present invention, a computer-readable medium for determining text shape recency is provided, wherein a computer program for executing the following steps is recorded on the computer-readable medium: (A) performing picture format conversion on a first text and a second text to be compared, respectively, to obtain a corresponding first picture and second picture; (B) determining the pixel matching situation between the first picture and the second picture; and (C) determining the shape recency between the first text and the second text based on the determined pixel matching situation.
Optionally, in the computer-readable medium, in step (A), picture format conversion is performed on the first text and the second text respectively according to a raster font.
Optionally, in the computer-readable medium, in step (B), the pixel matching situation between the first picture and the second picture is determined in units of single pixels.
Optionally, in the computer-readable medium, step (B) includes: (b1) binarizing the pixels of the first picture and the second picture, respectively; and (b2) determining the pixel matching situation between the binarized first picture and the binarized second picture.
Optionally, in the computer-readable medium, step (B) includes: (b3) performing at least one geometric transformation on the first picture and the second picture, respectively; and (b4) determining the pixel matching situation between each of the first picture before geometric transformation and the first picture after each geometric transformation, on the one hand, and each of the second picture before geometric transformation and the second picture after each geometric transformation, on the other. Further, step (C) includes: (c1) determining a candidate shape recency between the first text and the second text based on each determined pixel matching situation, respectively; and (c2) determining the highest candidate shape recency among the determined candidate shape recencies as the shape recency between the first picture and the second picture.
Optionally, in the computer-readable medium, the at least one geometric transformation includes at least one of a translation transformation, a scaling transformation, a row-deletion transformation, and a column-deletion transformation.
Optionally, in the computer-readable medium, in step (B), the pixel matching situation between the first picture and the second picture is determined by considering the matching situation between non-blank pixels.
Optionally, in the computer-readable medium, the computer program further executes step (D) and step (E), wherein the computer program executes step (D) before step (A): determining the stroke counts of the first text and the second text to be compared. If it is determined in step (D) that the stroke count of at least one of the first text and the second text is less than a preset threshold, the computer program executes step (A), step (B), and step (C); otherwise, the computer program executes step (E): determining the shape recency between the first text and the second text based on the edit distance between the first text and the second text.
Optionally, in the computer-readable medium, the computer program determines, for a plurality of first texts and a plurality of second texts, the shape recency between each first text and each second text, and the computer program further executes: (F) for each first text, selecting at least one second text having a relatively high shape recency with that first text as a similar-shaped character of that first text, and establishing a similar-shaped character library composed of the plurality of first texts and their similar-shaped characters.
According to another exemplary embodiment of the present invention, a computer-readable medium for performing text proofreading is provided, wherein a computer program for executing the following steps is recorded on the computer-readable medium: determining a text to be proofread; searching for similar-shaped characters of the text to be proofread in a similar-shaped character library established by the computer-readable medium according to an exemplary embodiment of the present invention; and outputting the similar-shaped characters found for the text to be proofread.
According to another exemplary embodiment of the present invention, a computing device for determining text shape recency is provided, including a storage component and a processor, wherein a set of computer-executable instructions is stored in the storage component, and when the set of computer-executable instructions is executed by the processor, the following steps are executed: (A) performing picture format conversion on a first text and a second text to be compared, respectively, to obtain a corresponding first picture and second picture; (B) determining the pixel matching situation between the first picture and the second picture; and (C) determining the shape recency between the first text and the second text based on the determined pixel matching situation.
Optionally, in the computing device, in step (A), picture format conversion is performed on the first text and the second text respectively according to a raster font.
Optionally, in the computing device, in step (B), the pixel matching situation between the first picture and the second picture is determined in units of single pixels.
Optionally, in the computing device, step (B) includes: (b1) binarizing the pixels of the first picture and the second picture, respectively; and (b2) determining the pixel matching situation between the binarized first picture and the binarized second picture.
Optionally, in the computing device, step (B) includes: (b3) performing at least one geometric transformation on the first picture and the second picture, respectively; and (b4) determining the pixel matching situation between each of the first picture before geometric transformation and the first picture after each geometric transformation, on the one hand, and each of the second picture before geometric transformation and the second picture after each geometric transformation, on the other. Further, step (C) includes: (c1) determining a candidate shape recency between the first text and the second text based on each determined pixel matching situation, respectively; and (c2) determining the highest candidate shape recency among the determined candidate shape recencies as the shape recency between the first picture and the second picture.
Optionally, in the computing device, the at least one geometric transformation includes at least one of a translation transformation, a scaling transformation, a row-deletion transformation, and a column-deletion transformation.
Optionally, in the computing device, in step (B), the pixel matching situation between the first picture and the second picture is determined by considering the matching situation between non-blank pixels.
Optionally, in the computing device, the set of computer-executable instructions further executes step (D) and step (E), wherein the set of computer-executable instructions executes step (D) before step (A): determining the stroke counts of the first text and the second text to be compared. If it is determined in step (D) that the stroke count of at least one of the first text and the second text is less than a preset threshold, the set of computer-executable instructions executes step (A), step (B), and step (C); otherwise, the set of computer-executable instructions executes step (E): determining the shape recency between the first text and the second text based on the edit distance between the first text and the second text.
Optionally, in the computing device, the set of computer-executable instructions determines, for a plurality of first texts and a plurality of second texts, the shape recency between each first text and each second text, and the set of computer-executable instructions further executes: (F) for each first text, selecting at least one second text having a relatively high shape recency with that first text as a similar-shaped character of that first text, and establishing a similar-shaped character library composed of the plurality of first texts and their similar-shaped characters.
According to another exemplary embodiment of the present invention, a computing device for performing text proofreading is provided, including a storage component and a processor, wherein a set of computer-executable instructions is stored in the storage component, and when the set of computer-executable instructions is executed by the processor, the following steps are executed: determining a text to be proofread; searching for similar-shaped characters of the text to be proofread in a similar-shaped character library established by the computing device according to an exemplary embodiment of the present invention; and outputting the similar-shaped characters found for the text to be proofread.
In the method and apparatus for determining text shape recency according to exemplary embodiments of the present invention, the shape recency of texts can be judged from the degree of pixel matching after the texts are converted into picture format, so the approach is no longer bound by any coding rule and serves as a general way of determining similar-shaped characters.
Brief description of the drawings
These and/or other aspects and advantages of the present invention will become clearer and easier to understand from the following detailed description of embodiments of the present invention with reference to the accompanying drawings, in which:
Fig. 1 shows a block diagram of an apparatus for determining text shape recency according to an exemplary embodiment of the present invention;
Fig. 2 shows a block diagram of the matching determining device in the apparatus for determining text shape recency according to an exemplary embodiment of the present invention;
Fig. 3 shows a block diagram of the matching determining device in the apparatus for determining text shape recency according to another exemplary embodiment of the present invention;
Fig. 4 shows a block diagram of the similar-shaped character determining device in the apparatus for determining text shape recency according to an exemplary embodiment of the present invention;
Fig. 5 shows a block diagram of an apparatus for determining text shape recency according to another exemplary embodiment of the present invention;
Fig. 6 shows a block diagram of an apparatus for determining text shape recency according to another exemplary embodiment of the present invention;
Fig. 7 shows a flowchart of a method for determining text shape recency according to an exemplary embodiment of the present invention;
Fig. 8 shows a flowchart of a method for determining text shape recency according to another exemplary embodiment of the present invention;
Fig. 9 shows a flowchart of a method for determining text shape recency according to another exemplary embodiment of the present invention;
Fig. 10 shows a flowchart of a method for determining text shape recency according to another exemplary embodiment of the present invention;
Fig. 11 shows a flowchart of a method for determining text shape recency according to another exemplary embodiment of the present invention;
Fig. 12 shows a flowchart of a method for determining text shape recency according to another exemplary embodiment of the present invention;
Fig. 13 shows a block diagram of an apparatus for performing text proofreading according to an exemplary embodiment of the present invention; and
Fig. 14 shows a flowchart of a method of performing text proofreading according to an exemplary embodiment of the present invention.
Detailed description of the embodiments
In order to enable those skilled in the art to better understand the present invention, exemplary embodiments of the present invention are described in further detail below with reference to the accompanying drawings and specific embodiments.
In exemplary embodiments of the present invention, different texts are each converted into picture format, and the shape recency between the texts is then determined based on the matching situation of the picture pixels. The text described here is not limited to Chinese characters; it may be the text of any language, such as Japanese or Korean, in which different characters may resemble one another in shape.
Fig. 1 shows a block diagram of an apparatus for determining text shape recency according to an exemplary embodiment of the present invention. Specifically, the system shown in Fig. 1 may be implemented entirely in software by a computer program, may be implemented by dedicated hardware, or may be implemented by a combination of software and hardware. Accordingly, each device making up the apparatus shown in Fig. 1 may be a virtual module that realizes the corresponding function purely through a computer program, a general-purpose or dedicated device that realizes the function through a hardware structure, or a hardware device on which the corresponding computer program runs.
As shown in Fig. 1, the conversion device 100 is used to perform picture format conversion on the first text and the second text to be compared, respectively, to obtain a corresponding first picture and second picture.
Specifically, the first text and the second text denote any two texts whose glyphs are to be compared. As an example, the two texts may come from the same language or, depending on the application scenario, from different languages.
Here, the first text and the second text may be represented as character strings under a predetermined text encoding rule. The conversion device 100 may read the character string corresponding to the first text and the character string corresponding to the second text, and then convert the character strings into a first picture and a second picture in picture format, respectively. Here, the picture format may be a bitmap format or any other format from which the value of each pixel can be obtained. The conversion device 100 may perform the picture format conversion according to default picture format parameters, and may also set and/or adjust relevant parameters, for example, the picture resolution after conversion. Here, the conversion device 100 may perform picture format conversion on the first text and the second text according to the same parameters, so that the converted first picture and second picture have the same specification.
As an example, in the converted first picture and second picture, the black portions may correspond to the strokes of the text and the white portions may correspond to the blank, non-stroke area. As an example, to further ensure the effect of the subsequent pixel matching, the conversion device 100 may set the font of the text when performing picture format conversion. For example, the conversion device 100 may perform picture format conversion on the first text and the second text respectively according to a raster font. It should be noted that the conversion device 100 may also perform picture format conversion according to a non-raster font, for example, a font relatively close to raster form such as the Song typeface, or any other font.
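The conversion described above can be sketched with a toy raster font. The dictionary `RASTER_FONT`, its 5x5 dot patterns, and the function `to_picture` are hypothetical stand-ins invented for illustration; a real system would render glyphs from an actual font file at a chosen resolution.

```python
# Hypothetical 5x5 dot-matrix glyphs: "#" marks a stroke dot, "." a blank dot.
RASTER_FONT = {
    "二": [".....",
           ".###.",
           ".....",
           "#####",
           "....."],
    "三": ["#####",
           ".....",
           ".###.",
           ".....",
           "#####"],
}

BLACK, WHITE = 0, 255  # brightness values: strokes are black, blank area is white


def to_picture(text: str) -> list[list[int]]:
    """Convert one text into a grid of per-pixel brightness values."""
    return [[BLACK if dot == "#" else WHITE for dot in row]
            for row in RASTER_FONT[text]]


first_picture = to_picture("二")
second_picture = to_picture("三")
```

Because both glyphs come from the same font grid, the two pictures share the same dimensions, which mirrors the requirement that both conversions use identical parameters.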
Matching determining device 200 is used to determine the pixel matching situation between the first picture and second picture.As described above, After picture format conversion, the black portions in picture can correspond to the stroke of text, and the white portion in picture It can correspond to the white space of non-stroke.It, may be all by black for each pixel of composition picture Stroke occupy, it is also possible to all by white non-stroke occupy.In addition, pixel is also possible to partly be occupied by stroke, also It is to say, both having there is stroke part within a single pixel, there is also non-stroke parts.In this case, the value of each pixel Can be indicate pixel entirety black and white degree brightness value, as an example, the value of each pixel can be by 0 value or 1 value of binaryzation are also possible to indicate the brightness value (for example, value between 0 to 255) of actual grey.
Here, as an example, matching determining device 200 can determine the first picture and second as unit of single pixel Pixel matching situation between picture.That is, matching determining device 200 can determine in the first picture and second picture one by one Match condition between corresponding pixel points.Here, matching determining device 200 can determine the matching feelings between whole corresponding pixel points Condition, optionally, matching determining device 200 can also determine match condition only for the one part of pixel in picture, for example, matching Determining device 200 can determine the picture between the first picture and second picture by considering the match condition between non-empty white pixel Plain match condition, that is to say, that for the blank respective pixel in the first picture and second picture (here, if certain specific seat A pair of of pixel in cursor position is not related to any stroke in the first picture and second picture, then they constitute a pair of of blank pair Answer pixel), matching determining device 200 can not need to determine the match condition between them.
When determining the match condition of corresponding pixels, if the values of the corresponding pixels in the two pictures are identical or close, the two pixels can be considered to match each other; if the values differ markedly, the two pixels can be considered not to match. Correspondingly, the matching determining device 200 can record the matching results of the corresponding pixels concerned, for example, the number of pixels that match each other. In addition, as an example, the matching determining device 200 can also record other matching result information, for example, the positions of the matched pixels.
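The per-pixel comparison described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `count_matching_pixels`, the `tolerance` that decides when two brightness values count as "close", and the convention that pure white (255) marks a blank pixel are all hypothetical choices, not values given in this description.

```python
def count_matching_pixels(pic1, pic2, tolerance=16, blank=255):
    """Count corresponding pixels whose brightness values are equal or close.

    pic1, pic2: equally sized lists of rows of brightness values (0..255).
    A pair that is blank (pure white) in both pictures is skipped.
    Returns (matched, compared).
    """
    matched = compared = 0
    for row1, row2 in zip(pic1, pic2):
        for v1, v2 in zip(row1, row2):
            if v1 == blank and v2 == blank:
                continue  # blank corresponding pair: no stroke in either picture
            compared += 1
            if abs(v1 - v2) <= tolerance:  # assumed notion of "identical or close"
                matched += 1
    return matched, compared


# Toy 2 x 3 grayscale pictures (not real glyphs).
first = [[0, 255, 10], [255, 255, 200]]
second = [[5, 255, 250], [0, 255, 255]]
matched, compared = count_matching_pixels(first, second)
```

Returning both counts keeps the later proportion calculation separate from the matching step, which mirrors the split between the matching determining device 200 and the shape recency determining device 300.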
It should be noted that, besides comparing in units of single pixels, the pixel matching situation between the first picture and the second picture can also be compared in units of matrices or blocks made up of multiple pixels. Here, the pixels within the same matrix or block can share a single matching result.
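A block-based comparison could look like the sketch below. The description does not say how a block's matching result is derived, so comparing the mean brightness of each block against a `tolerance` is purely an assumption made for illustration, as are the function names and the requirement that the picture dimensions be divisible by the block size.

```python
def block_means(pic, size):
    """Mean brightness of each size x size block (dimensions assumed divisible by size)."""
    means = []
    for top in range(0, len(pic), size):
        row_means = []
        for left in range(0, len(pic[0]), size):
            block = [pic[r][c]
                     for r in range(top, top + size)
                     for c in range(left, left + size)]
            row_means.append(sum(block) / len(block))
        means.append(row_means)
    return means


def count_matching_blocks(pic1, pic2, size=2, tolerance=32):
    """Count blocks whose mean brightness differs by at most `tolerance` (assumed rule)."""
    means1, means2 = block_means(pic1, size), block_means(pic2, size)
    return sum(abs(a - b) <= tolerance
               for r1, r2 in zip(means1, means2)
               for a, b in zip(r1, r2))


# Toy 4 x 4 grayscale pictures (not real glyphs).
pic_a = [[0, 0, 255, 255], [0, 0, 255, 255], [255, 255, 0, 0], [255, 255, 0, 0]]
pic_b = [[0, 0, 255, 255], [0, 255, 255, 255], [255, 255, 255, 255], [255, 255, 255, 255]]
```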
In addition, to further improve computational efficiency when matching is executed, the pixels of the pictures can be binarized before the specific matching operation is executed. Correspondingly, Fig. 2 shows a block diagram of the matching determining device 200 in the equipment for determining text shape recency according to an exemplary embodiment of the present invention.
As shown in Fig. 2, the binarization processing unit 210 is used to binarize the pixels of the first picture and the second picture respectively. After binarization, the value of a single pixel is one of the two values 0 and 1. Here, a threshold on pixel brightness can be set in advance (the threshold can be a global threshold or a local threshold). For a pixel whose brightness is higher than the corresponding threshold (that is, a pixel in which the non-stroke area is relatively large), the value can be set to one of 1 and 0; for a pixel whose brightness is lower than the corresponding threshold (that is, a pixel in which the stroke area is relatively large), the value can be set to the other of 1 and 0. The threshold can be the median of the pixel brightness range, or some other value. For example, assume that the brightness range of a pixel is [0, 255], where 0 corresponds to minimum brightness and 255 corresponds to maximum brightness; correspondingly, a pixel of minimum brightness is entirely occupied by a stroke, and a pixel of maximum brightness contains no stroke area at all. In this case, the threshold can be set to 50% of the maximum brightness value (that is, 127): if the brightness of a pixel is higher than 127, the pixel is classified as a white pixel with higher brightness; otherwise, it is classified as a black pixel with lower brightness. In addition, according to an exemplary embodiment of the present invention, to obtain a better matching effect, the threshold can also be set to 75% of the maximum brightness (that is, 191), so that only pixels whose brightness is higher than 191 are classified as white pixels and all other pixels are classified as black pixels.
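The global-threshold case can be sketched in a few lines. The thresholds 127 and 191 come from the example above; the function name `binarize` and the choice to encode black (stroke) pixels as 1 and white pixels as 0 are assumptions, since the description leaves the assignment of 0 and 1 open.

```python
def binarize(gray, threshold=127):
    """Binarize a grayscale picture with a single global threshold.

    gray: list of rows of brightness values in [0, 255].
    Brightness above the threshold becomes white (0); everything else
    becomes black (1). The 1 = black encoding is an assumed convention.
    """
    return [[0 if value > threshold else 1 for value in row] for row in gray]


row = [[0, 127, 128, 150, 191, 192, 255]]
half = binarize(row)                           # 50% threshold
three_quarters = binarize(row, threshold=191)  # 75% threshold
```

With the higher threshold, mid-gray pixels such as 150 are kept as black, so partially covered edge pixels still count as stroke pixels.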
The matching unit 220 is used to determine the pixel matching situation between the binarized first picture and the binarized second picture. Here, the matching unit 220 can determine and record the match condition between pixels in the manner described above. As an example, the matching unit 220 can determine the pixel matching situation for all corresponding pixels after binarization; alternatively, it can determine the pixel matching situation only for the corresponding black pixels after binarization (that is, pairs of corresponding pixels in which at least one pixel is black).
Referring back to Fig. 1, the shape recency determining device 300 is used to determine the shape recency between the first text and the second text based on the determined pixel matching situation. Specifically, after the matching determining device 200 has determined the pixel matching situation between the first picture and the second picture, the relevant match information (for example, the number, positions, or matching degree of the matched pixel pairs) can be provided to the shape recency determining device 300. Correspondingly, the shape recency determining device 300 can determine the shape recency between the first text and the second text by considering the overall proportion of pixels that match each other. Here, the determination can be qualitative or quantitative. Specifically, if the overall proportion of matched pixels is relatively high, the shape recency between the first text and the second text is determined to be high (or a shape recency value can be calculated accordingly), or the first text and the second text are determined to be nearly word forms; if the overall proportion of matched pixels is relatively low, the shape recency between the first text and the second text is determined to be low (or a shape recency value can be calculated accordingly), or the first text and the second text are determined not to be nearly word forms.
As an example, the shape recency determining device 300 can determine the shape recency value between the first picture and the second picture by calculating the ratio of the number of corresponding pixels that match each other to the total number of corresponding pixels concerned. For example, for a binarized first picture and second picture, the shape recency determining device 300 can calculate the ratio of the number of matched corresponding black pixels between the two pictures to the total number of black pixels in the two pictures, and determine the shape recency value between the two pictures based on that ratio.
Although the matching determining device 200 and the shape recency determining device 300 are shown as separate devices, this only distinguishes the two by logical function and does not limit the relationship between them. The two devices cooperate with each other and can be implemented in a single processing unit.
According to an exemplary embodiment of the present invention, to further improve accuracy when determining shape recency, various transformations can be applied to the first picture and the second picture, and the shape recency can be calculated separately for each transformation.
Fig. 3 shows a block diagram of the matching determining device 200 in the equipment for determining text shape recency according to another exemplary embodiment of the present invention.
As shown in Fig. 3, the converter unit 230 is used to apply at least one geometric transformation to the first picture and to the second picture respectively. Here, the converter unit 230 can apply geometric transformations such as translation, scaling, and rotation to the whole or a part of each picture, to generate several modified pictures. As an example, the at least one geometric transformation may include at least one of a translation transformation, a scaling transformation, a row-deletion transformation, and a column-deletion transformation.
Correspondingly, the matching unit 220 is used to determine the pixel matching situation between each of the first picture before geometric transformation and the first picture after each geometric transformation on the one hand, and each of the second picture before geometric transformation and the second picture after each geometric transformation on the other. Here, the matching unit 220 shown in Fig. 3 can operate in a manner similar to the matching unit 220 shown in Fig. 2, except that the objects it processes include, besides the original first picture and second picture, the first picture and second picture after geometric transformation.
It can be seen that the matching determining device 200 shown in Fig. 3 can provide multiple pixel matching situations. Correspondingly, Fig. 4 shows a block diagram of the shape recency determining device 300 in the equipment for determining text shape recency according to an exemplary embodiment of the present invention.
As shown in Fig. 4, the candidate shape recency determination unit 310 is used to determine a candidate shape recency between the first text and the second text based on each pixel matching situation determined by the matching determining device 200 shown in Fig. 3. Here, for every pixel matching situation output by the matching determining device 200 shown in Fig. 3, the candidate shape recency determination unit 310 can determine a corresponding candidate shape recency. That is, for any pair consisting of an original or transformed first picture and an original or transformed second picture, the candidate shape recency determination unit 310 can determine one corresponding candidate shape recency, so that multiple candidate shape recencies are obtained in total.
Correspondingly, the highest shape recency determination unit 320 is used to determine the highest candidate shape recency among the determined candidate shape recencies as the shape recency between the first picture and the second picture. Here, the highest shape recency determination unit 320 can sort the candidate shape recencies determined by the candidate shape recency determination unit 310 to obtain the highest one, and use that highest shape recency as the final shape recency between the first picture and the second picture.
It should be understood that the above examples only explain exemplary embodiments of the present invention and are not limiting in any sense. In fact, those skilled in the art can modify or improve the above examples in any way according to the content of this disclosure, and such modifications or improvements still fall within the protection scope of the present invention. For example, the matching determining devices 200 shown in Fig. 2 and Fig. 3 can be merged into one, which then includes the binarization processing unit 210 shown in Fig. 2, the converter unit 230 shown in Fig. 3, and the matching unit 220. That is, to determine the pixel matching situation between the first picture and the second picture, the first picture and the second picture can first be binarized, the binarized first picture and second picture can then each undergo one or more geometric transformations, and the pixel matching situation can then be determined pairwise between the original and each transformed first picture on the one hand and the original and each transformed second picture on the other.
As described above, the equipment for determining text shape recency according to an exemplary embodiment of the present invention can judge the shape recency between different texts based on picture format, and is therefore no longer limited by the coding rules of the text itself.
However, exemplary embodiments of the present invention do not exclude the traditional way of determining text shape recency based on the editing distance between text codes; the picture-based text comparison can be combined with any other approach.
Fig. 5 shows a block diagram of the equipment for determining text shape recency according to another exemplary embodiment of the present invention. As shown in Fig. 5, the equipment for determining text shape recency may include the conversion equipment 100, the matching determining device 200, the shape recency determining device 300, and the stroke determining device 400.
It can be seen that, compared with the equipment shown in Fig. 1, the equipment in Fig. 5 can combine picture comparison and editing distance comparison to determine the shape recency between texts. As an example, when the first text and the second text to be compared include a text with few strokes, picture comparison can be used; when the first text and the second text are both texts with many strokes, editing distance comparison can be used.
Specifically, the stroke determining device 400 is used to determine the stroke quantity of the first text and the second text to be compared. Here, as an example, the stroke determining device 400 can determine the stroke quantity of the first text and the stroke quantity of the second text respectively by querying a table of text stroke counts.
Here, if the stroke determining device 400 determines that the stroke quantity of at least one of the first text and the second text is less than a preset threshold, the conversion equipment 100 can perform picture format conversion, the matching determining device 200 can determine the pixel matching situation, and the shape recency determining device 300 can determine the shape recency between the first text and the second text based on the determined pixel matching situation. Otherwise, the shape recency determining device 300 can determine the shape recency between the first text and the second text based on the editing distance between them. Here, the shape recency determining device 300 can determine the editing distance between the first text and the second text from their text codes. According to an exemplary embodiment of the present invention, any appropriate coding scheme can be used for the text codes; preferably, the shape recency determining device 300 can determine the editing distance between the Cangjie codes of the first text and the second text, and determine the shape recency between the first text and the second text based on that editing distance.
That is, as long as the stroke quantity of the first text or the second text is less than the preset threshold, in other words, as long as the texts to be compared include a text with few strokes, the conversion equipment 100, the matching determining device 200, and the shape recency determining device 300 can operate in the manner described with reference to Fig. 1 to Fig. 4, which is not repeated here. On the other hand, if the first text and the second text are both texts with many strokes, the shape recency determining device 300 can obtain the editing distance between the code of the first text and the code of the second text, and determine the shape recency between the first text and the second text based on that editing distance; here, the larger the editing distance, the lower the shape recency between the texts. In addition, the shape recency determining device 300 can bring the text shape recency determined from the editing distance and the text shape recency determined from the pixel matching situation into the same numerical range, for example by expressing text shape recency as a percentage, so that the same value indicates an equivalent degree of similarity. For example, an 80% shape recency determined from the editing distance can be regarded as the same degree of glyph similarity as an 80% shape recency determined from the pixel matching situation.
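The switch between the two comparison modes can be sketched as below. Several parts are assumptions made only for illustration: the editing distance is taken to be the Levenshtein distance, the mapping to a 0..1 value (one minus the distance divided by the longer code length) is one possible way to put both measures on a common scale, the threshold of 6 strokes is arbitrary, and the code strings in the example are placeholders rather than real Cangjie codes.

```python
def edit_distance(a, b):
    """Levenshtein distance between two code strings (assumed notion of editing distance)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def recency_from_codes(code1, code2):
    """Map editing distance to [0, 1]; larger distance gives lower recency (assumed scaling)."""
    longest = max(len(code1), len(code2))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(code1, code2) / longest


def shape_recency(strokes1, strokes2, code1, code2, pixel_recency, stroke_threshold=6):
    """Use picture comparison if either text has few strokes, else editing distance.

    pixel_recency: zero-argument callable that runs the picture-based comparison.
    """
    if strokes1 < stroke_threshold or strokes2 < stroke_threshold:
        return pixel_recency()
    return recency_from_codes(code1, code2)


# Placeholder codes and stroke counts, not data for real characters.
few_strokes = shape_recency(2, 10, "AB", "AC", pixel_recency=lambda: 0.9)
many_strokes = shape_recency(8, 10, "ABCD", "ABCE", pixel_recency=lambda: 0.9)
```

Passing the picture comparison as a callable means the picture conversion and pixel matching are only carried out when that branch is actually taken.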
Fig. 6 shows a block diagram of the equipment for determining text shape recency according to another exemplary embodiment of the present invention. As shown in Fig. 6, the equipment for determining text shape recency may include the conversion equipment 100, the matching determining device 200, the shape recency determining device 300, and the character library establishing device 500. Optionally, the equipment shown in Fig. 6 may also include a stroke determining device (not shown).
It can be seen that, compared with the equipment shown in Fig. 1, the equipment in Fig. 6 further includes a device for building a nearly word form library. Specifically, the conversion equipment 100, the matching determining device 200, and the shape recency determining device 300 (optionally together with the stroke determining device) determine, for multiple first texts and multiple second texts, the shape recency between each first text and each second text. On that basis, the character library establishing device 500 is used to select, for each first text, at least one second text with relatively high shape recency to that first text as the nearly word form of that first text, and to establish a nearly word form library made up of the multiple first texts and their nearly word forms.
As an example, the nearly word form library established by the character library establishing device 500 can take the form of a mapping table, in which a mapping relation is formed between each text and its nearly word forms. Correspondingly, by searching the nearly word form library, it is easy to determine whether one text and another text are nearly word forms, or to find which nearly word forms a given text has. Based on such search results, further processing such as wrong-character proofreading can be performed.
The method for determining text shape recency according to an exemplary embodiment of the present invention is described below with reference to Fig. 7 to Fig. 12. These methods, or certain steps of them, can be implemented by the devices shown in Fig. 1 to Fig. 6, or can be implemented entirely in software by a computer program.
Fig. 7 shows according to an exemplary embodiment of the present invention for determining the flow chart of the method for text shape recency.
Referring to Fig. 7, in step S100, the first text and the second text to be compared are each subjected to picture format conversion, to obtain the corresponding first picture and second picture. Here, the first text and the second text, encoded according to a specific coding scheme, can be read, and the first text and the second text that have been read can each be converted from the coded format to picture format. Optionally, when the format conversion is executed, the font of the text in the picture can be specified, for example, a raster font or the Song typeface.
Next, in step S200, the pixel matching situation between the first picture and the second picture is determined. For example, assuming that the first picture and the second picture are each a 100 × 100 pixel matrix, there are 100 × 100 pixel pairs in total, where each pair consists of the two pixels at the same coordinate position and its match condition is to be determined. In step S200, it can be determined whether all pixel pairs match, or their matching degree; alternatively, it can be determined only whether the pixel pairs where the strokes in the two pictures are located (that is, the non-blank pixel pairs) match, or their matching degree. Here, matching means that the two pixels have the same or similar brightness, which indicates a high probability that the two pixels are both stroke pixels or both non-stroke pixels.
Next, in step S300, the shape recency between the first text and the second text is determined based on the determined pixel matching situation. As described above, after the match condition between all the pixels concerned has been determined, the proportion of matched pixels can be further determined, and the shape recency of the first text and the second text can be determined according to that proportion.
It can be seen that, in the above embodiment, because the text is converted to picture format, characteristic information such as all of its strokes and its structure is retained. On this basis, the shape similarity between texts can be determined more comprehensively, and the information loss caused by text coding is avoided.
It should be noted that exemplary embodiments of the present invention can adopt any specific implementation; the following examples only illustrate such specific implementations and are not intended to limit the protection scope of the present invention in any way.
Fig. 8 shows the flow chart of the method for determining text shape recency of another exemplary embodiment according to the present invention.
Referring to Fig. 8, in step S100, the first text and the second text to be compared are each subjected to picture format conversion, to obtain the corresponding first picture and second picture.
Next, in step S210, the pixels of the first picture and the second picture are binarized respectively. After binarization, each picture consists of black pixels where strokes are located and white pixels where there are no strokes. Here, the division threshold used in binarization can be preset or adjusted to obtain a more effective matching result.
In step S220, the pixel matching situation between the binarized first picture and the binarized second picture is determined. Here, as an example, the pixel matching situation between the first picture and the second picture can be determined only for black pixels. Specifically, for each black pixel of either one of the first picture and the second picture, it can be determined whether the corresponding pixel of the other picture is also a black pixel; if it is, the two pixels are determined to be black pixels that match each other. Correspondingly, the total number of black pixels that match each other between the first picture and the second picture can be counted. It should be noted that the above way of determining the pixel matching situation is only an example; those skilled in the art can also determine whether corresponding pixels constitute matched pixels for all pixels, for the white pixels in the two pictures, or for the black pixels or white pixels of a particular picture (for example, only the first picture or only the second picture).
Next, in step S300, the shape recency between the first text and the second text is determined based on the determined pixel matching situation. Specifically, based on the information about the match condition determined in step S200, the proportion of matched pixels among the total pixels can be further calculated, so as to determine the shape recency of the texts.
For example, assume that the information about the match condition specifies the total number of black pixels that match each other between the first picture and the second picture. As an example, the shape recency between the first text and the second text can then be calculated according to the following formula:
Shape recency = 2 × (number of black pixels that match each other) / (number of black pixels in the first picture + number of black pixels in the second picture).
The above example can judge the shape similarity between texts with few strokes more effectively. For example, assume that the first text is the "fourth" character and the second text is the "people" character. If all pixels, including white pixels, were considered when determining the matching relationship instead of only black pixels, then, because both characters have very few strokes (that is, the strokes occupy few black pixels in the first picture and the second picture) and the white pixels that occupy most of each picture match each other, the calculated result would wrongly indicate that the shape recency of the two characters is very high.
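The formula above, and the reason for restricting it to black pixels, can be illustrated with a short sketch. The function names and the 1 = black encoding are assumptions, and the two 5 × 5 pictures are toy shapes (a horizontal bar and a vertical bar), not renderings of real characters.

```python
def black_pixel_recency(bin1, bin2):
    """2 x (matched black pixels) / (black pixels in picture 1 + black pixels in picture 2).

    bin1, bin2: equally sized binarized pictures, 1 = black (stroke), 0 = white.
    """
    matched = black1 = black2 = 0
    for row1, row2 in zip(bin1, bin2):
        for p1, p2 in zip(row1, row2):
            black1 += p1
            black2 += p2
            matched += p1 & p2  # black in both pictures
    total = black1 + black2
    return 2 * matched / total if total else 0.0  # two empty pictures: 0.0 by assumption


def all_pixel_agreement(bin1, bin2):
    """Fraction of corresponding pixels with equal values, white pixels included."""
    pairs = [(p1, p2) for row1, row2 in zip(bin1, bin2) for p1, p2 in zip(row1, row2)]
    return sum(p1 == p2 for p1, p2 in pairs) / len(pairs)


bar = [[1, 1, 1, 1, 1]] + [[0] * 5 for _ in range(4)]  # horizontal stroke
pole = [[0, 0, 1, 0, 0] for _ in range(5)]             # vertical stroke
black_only = black_pixel_recency(bar, pole)
with_white = all_pixel_agreement(bar, pole)
```

The two shapes share a single black pixel, so the black-pixel formula gives a low value, while the all-pixel measure is dominated by the matching white background and gives a much higher one.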
Fig. 9 shows the flow chart of the method for determining text shape recency of another exemplary embodiment according to the present invention.
Referring to Fig. 9, in step S100, the first text and the second text to be compared are each subjected to picture format conversion, to obtain the corresponding first picture and second picture.
Next, in step S230, at least one geometric transformation is applied to the first picture and to the second picture respectively. Here, one or more transformations such as a translation transformation, a scaling transformation, a row-deletion transformation, and a column-deletion transformation can be applied to the first picture and the second picture respectively. As an example, the same kinds of geometric transformation can be executed for the first picture and the second picture. According to an exemplary embodiment of the present invention, the above geometric transformations can be executed on the entire picture or on a local part of the picture.
When a translation transformation is executed on the first picture or the second picture, the whole or a part of the picture can be moved in parallel along a predetermined direction (for example, upward, downward, to the left, to the right, or toward the upper left, lower left, upper right, or lower right).
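A whole-picture translation can be sketched as follows. The function name, the use of pixel offsets `dx` and `dy`, and the choice to fill vacated positions with white and discard pixels moved outside the frame are assumptions; the description does not specify how the picture border is handled.

```python
def translate(binary, dx, dy):
    """Shift a binarized picture dx columns to the right and dy rows down.

    binary: list of rows, 1 = black (stroke), 0 = white.
    Vacated positions become white; pixels shifted out of the frame are dropped.
    """
    height, width = len(binary), len(binary[0])
    shifted = [[0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            nr, nc = r + dy, c + dx
            if 0 <= nr < height and 0 <= nc < width:
                shifted[nr][nc] = binary[r][c]
    return shifted


dot = [[1, 0], [0, 0]]
moved_right = translate(dot, 1, 0)
moved_diagonally = translate(dot, 1, 1)  # toward the lower right
```

Negative offsets move the picture to the left or upward, so the eight directions listed above correspond to the sign combinations of `dx` and `dy`.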
In addition, when a scaling transformation is executed on the first picture or the second picture, the whole or a part of the picture can be enlarged or reduced according to a predetermined ratio. Considering the layout of the text within the picture, as an example, the picture can be enlarged or reduced by a ratio in the range [0.8, 1.2].
In addition, when a row-deletion or column-deletion transformation is executed on the first picture or the second picture, rows or columns that contain few stroke pixels can be deleted. Here, the number of deleted rows or columns can be kept within a certain range. This transformation helps to identify matches such as that between the "encouraged" character and the "nail" character.
Next, in step S220, the pixel matching situation is determined between each of the first picture before geometric transformation and the first picture after each geometric transformation on the one hand, and each of the second picture before geometric transformation and the second picture after each geometric transformation on the other. Here, the pixel matching situation between each pair of first picture and second picture can be determined in the manner described above.
Specifically, assume that the first picture undergoes m kinds of geometric transformation (m is a positive integer), yielding m transformed first pictures, and that the second picture undergoes n kinds of geometric transformation (n is a positive integer), yielding n transformed second pictures. Here, the m kinds of geometric transformation and the n kinds of geometric transformation can be the same or different in number or in type. Correspondingly, in step S220, the pixel matching situation can be determined between each of the original first picture and the m transformed first pictures and each of the original second picture and the n transformed second pictures, so that (m+1) × (n+1) pixel matching situations are obtained in total.
In step S310, a candidate shape recency between the first text and the second text is determined based on each determined pixel matching situation. Here, the shape recency between the texts can be determined from a pixel matching situation in the manner described above. Because multiple pixel matching situations have been determined in step S220, a corresponding candidate shape recency can be obtained for each of them, so that (m+1) × (n+1) candidate shape recencies are obtained in total in step S310.
Next, in step S320, the highest candidate shape recency among the determined candidate shape recencies is determined as the shape recency between the first picture and the second picture. That is, the (m+1) × (n+1) candidate shape recencies obtained in step S310 can be sorted, and the maximum among them chosen as the final shape recency between the first picture and the second picture.
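Steps S230 to S320 can be sketched end to end as below. For brevity this illustration assumes that only one-pixel translations are used as the geometric transformations and that the same four are applied to both pictures (so m = n = 4); the helper names and the black-pixel recency used for each candidate are likewise assumptions.

```python
def black_pixel_recency(bin1, bin2):
    """2 x (matched black pixels) / (total black pixels in both pictures)."""
    matched = sum(p1 & p2 for r1, r2 in zip(bin1, bin2) for p1, p2 in zip(r1, r2))
    total = sum(map(sum, bin1)) + sum(map(sum, bin2))
    return 2 * matched / total if total else 0.0


def translate(binary, dx, dy):
    """Shift right by dx and down by dy, filling vacated positions with white (0)."""
    height, width = len(binary), len(binary[0])
    shifted = [[0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            if 0 <= r + dy < height and 0 <= c + dx < width:
                shifted[r + dy][c + dx] = binary[r][c]
    return shifted


def best_candidate_recency(bin1, bin2, shifts=((1, 0), (-1, 0), (0, 1), (0, -1))):
    """Compare every original/transformed pair and keep the highest candidate."""
    variants1 = [bin1] + [translate(bin1, dx, dy) for dx, dy in shifts]  # m + 1 pictures
    variants2 = [bin2] + [translate(bin2, dx, dy) for dx, dy in shifts]  # n + 1 pictures
    candidates = [black_pixel_recency(v1, v2) for v1 in variants1 for v2 in variants2]
    return max(candidates), len(candidates)


# Toy pictures: the same vertical stroke, one column apart.
left = [[0, 1, 0, 0] for _ in range(4)]
right = [[0, 0, 1, 0] for _ in range(4)]
best, count = best_candidate_recency(left, right)
```

Without any transformation the two strokes do not overlap at all, yet one of the 25 candidate pairs aligns them exactly, which is the effect the maximum in step S320 is meant to capture. Taking `max` directly avoids a full sort when only the top value is needed.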
As described above, the exemplary embodiments of the present invention are not intended to limit the scope of the invention, and certain steps of different embodiments can be combined with each other to form new specific embodiments. For example, Fig. 10 shows a flow chart of the method for determining text shape recency according to another exemplary embodiment of the present invention. The exemplary embodiment shown in Fig. 10 is introduced briefly below; the execution details of the relevant steps are not repeated here.
Referring to Fig. 10, in step S100, the first text and the second text to be compared are each subjected to picture format conversion, to obtain the corresponding first picture and second picture.
In step S210, the pixels of the first picture and the second picture are binarized respectively.
In step S230, at least one geometric transformation is applied to the binarized first picture and to the binarized second picture respectively.
Next, in step S220, the pixel matching situation is determined between each of the original first picture, the binarized first picture, and the first picture after each geometric transformation on the one hand, and each of the original second picture, the binarized second picture, and the second picture after each geometric transformation on the other.
In step S310, a candidate shape recency between the first text and the second text is determined based on each determined pixel matching situation.
Then, in step S320, the highest candidate shape recency among the determined candidate shape recencies is determined as the shape recency between the first picture and the second picture.
Fig. 11 shows a flow chart of the method for determining text shape recency according to another exemplary embodiment of the present invention. In the exemplary embodiment shown in Fig. 11, determining text shape recency based on picture comparison is combined with determining text shape recency based on editing distance.
As shown in Fig. 11, in step S400, the stroke quantity of the first text and the second text to be compared is determined. Here, the stroke quantity of the first text and the stroke quantity of the second text can both be determined, or only the stroke quantity of the first text or of the second text can be determined.
In step S410, it is determined whether the stroke quantity of the first text or the second text determined in step S400 is less than a preset threshold. Here, the preset threshold can be a default stroke quantity, or a stroke quantity that is set and adjusted manually.
If it is determined in step S410 that the stroke quantity of the first text or the second text determined in step S400 is less than the preset threshold, the method executes steps S100 to S300 to carry out the shape recency determination shown in Fig. 7 to Fig. 10. Otherwise, the method executes step S330, in which the shape recency between the first text and the second text is determined based on the editing distance between the first text and the second text.
An exemplary embodiment of the present invention can effectively determine the shape recency between text, on this basis, can be into One step establishes the character library about nearly word form.
Particularly, Figure 12 shows the method for determining text shape recency of another exemplary embodiment according to the present invention Flow chart.By executing method shown in Figure 12, can establish by multiple first texts and close with each first text shape The second text composition nearly word form library.It will be introduced briefly exemplary embodiment shown in Figure 12 below, walked about correlation Rapid execution details will not be described in great detail here.
As shown in Figure 12, in step S100, picture format conversion is performed on the first text and the second text to be compared, respectively, to obtain a corresponding first picture and second picture. Here, in order to obtain the nearly word forms of a specific first text, picture format conversion may first be performed on that first text and on a specific second text to be compared with it.
Next, in step S200, the pixel matching situation between the first picture and the second picture is determined. Then, in step S300, the shape recency between the first text and the second text is determined based on the determined pixel matching situation.
Then, in step S340, it is determined whether there are other second texts still to be compared with the specific first text. If it is determined in step S340 that such a second text exists, step S110 is executed, in which picture format conversion is performed on the next second text to be compared. The method then re-executes step S200 and the subsequent steps to determine the shape recency between the specific first text and the next second text.
If it is determined in step S340 that the last second text to be compared with the specific first text has been reached, the method proceeds to step S500. In step S500, for the specific first text, at least one second text having a higher shape recency with the specific first text is selected, from among all second texts for which a shape recency has been determined, as the nearly word form of the specific first text, and the specific first text and its determined nearly word forms are stored in the nearly word form library.
Next, in step S510, it is determined whether there are other first texts whose nearly word forms still need to be determined. If it is determined in step S510 that such a first text exists, step S520 is executed, in which the next first text and the second texts to be compared with it are obtained. The method then executes step S100 and its subsequent steps again for that first text and second text.
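The two nested loops of Figure 12 can be sketched as follows. This is a minimal sketch, not the described implementation: `build_library`, the `recency` callable (standing in for steps S100 to S300), the `top_k` parameter and the choice to skip comparing a text with itself are all assumptions introduced for illustration.

```python
from typing import Callable, Dict, List, Sequence


def build_library(first_texts: Sequence[str],
                  second_texts: Sequence[str],
                  recency: Callable[[str, str], float],
                  top_k: int = 2) -> Dict[str, List[str]]:
    """Build a nearly word form library: first text -> closest second texts."""
    library: Dict[str, List[str]] = {}
    for first in first_texts:                 # outer loop: S510 / S520
        scored = []
        for second in second_texts:           # inner loop: S340 / S110
            if second == first:               # assumption: skip self-comparison
                continue
            scored.append((recency(first, second), second))  # S100 to S300
        # S500: keep the second texts with the highest shape recency.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        library[first] = [text for _, text in scored[:top_k]]
    return library
```

How many nearly word forms to keep per first text is left open by the description ("at least one"), so `top_k` could equally be replaced by a minimum-recency cutoff.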
The methods shown in Fig. 7 to Figure 12 are merely exemplary. Certain steps are not necessarily executed in the order shown; they may be executed in reverse order or simultaneously. In addition, different figures may be combined with each other to produce new exemplary embodiments; for example, certain steps in Figure 12 may be replaced by certain steps shown in Fig. 8 to Figure 11. That is, those skilled in the art may vary and modify the above method embodiments in any appropriate manner.
In this way, a nearly word form database that is convenient to query can be established, so that the shape recency need not be computed for a text each time, which saves computing resources such as time and hardware. A device and a method for executing text proofreading according to exemplary embodiments of the present invention are described below with reference to Figure 13 and Figure 14.
In particular, Figure 13 shows a block diagram of a device for executing text proofreading according to an exemplary embodiment of the present invention. As shown in Figure 13, the device includes a text determining device 1000, a nearly word form searcher 2000 and a nearly word form output device 3000. Figure 14 shows a flowchart of a method for executing text proofreading according to an exemplary embodiment of the present invention. For convenience, it is assumed below that the method of Figure 14 is executed by the device of Figure 13; note, however, that the method of Figure 14 may also be executed by other devices or implemented entirely by a computer program.
In particular, in step S1000, the text to be proofread is determined by the text determining device 1000. Here, the text determining device 1000 may determine the text that serves as the proofreading target; as an example, the text may be any of various texts to be proofread, such as a keyword, an important word, an easily mistaken character or a sensitive word.
Next, in step S2000, the nearly word form searcher 2000 searches for the nearly word forms of the text to be proofread in a nearly word form library established according to an exemplary embodiment of the present invention. As described above, the nearly word form library may include each text together with one or more nearly word forms similar to it. The mapping between a text and its nearly word forms is established based on the pixel matching situation between the pictures obtained by performing picture format conversion on the texts. Accordingly, the nearly word form searcher 2000 can search the nearly word form library for the one or more nearly word forms corresponding to the text to be proofread.
Then, in step S3000, the nearly word form output device 3000 outputs the nearly word forms found for the text to be proofread. Here, as an example, the nearly word form output device 3000 may provide the text to be proofread together with its nearly word forms to a user or to an automatic proofreading device (not shown), so that this group of nearly word forms can be used or processed later. For example, prompted by the output nearly word forms, a user can more easily determine whether the text to be proofread is a wrong character, or further determine whether it is necessary to replace the text to be proofread with one of its nearly word forms. Alternatively, the automatic proofreading device may automatically determine, according to the context of the text to be proofread, whether it is necessary to replace that text with one of its nearly word forms; optionally, the automatic proofreading device may display a replacement prompt or the replacement result to the user, so that the user finally confirms whether to execute the replacement.
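Steps S1000 to S3000 amount to a lookup in a previously built library, which can be sketched as follows. The function name `proofread`, the dictionary layout and the sample entries are hypothetical and serve only as an illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical nearly word form library: text -> its nearly word forms.
LIBRARY: Dict[str, List[str]] = {
    "己": ["已", "巳"],
    "日": ["目", "曰"],
}


def proofread(text: str, library: Dict[str, List[str]]) -> Tuple[str, List[str]]:
    """Return the text to be proofread together with its nearly word forms.

    S1000: the caller supplies the text to be proofread.
    S2000: look the text up in the nearly word form library.
    S3000: output the text together with whatever was found (possibly nothing).
    """
    return text, library.get(text, [])
```

The returned pair could then be shown to a user or passed to an automatic proofreading step that decides, from context, whether a replacement is needed.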
It should be understood that the above examples serve only to explain exemplary embodiments of the present invention and are not intended to limit its scope. Here, on the basis of the nearly word forms output by the nearly word form output device 3000 for the text to be proofread, any appropriate further processing or decision may be carried out to realize operations such as wrong-character proofreading and replacement of non-standard simplified forms.
The device and method for determining text shape recency according to exemplary embodiments of the present invention have been described above with reference to Fig. 1 to Figure 12. It should be understood that the above method for determining text shape recency may be implemented by a program recorded on a computer-readable medium. Accordingly, according to an exemplary embodiment of the present invention, a computer-readable medium for determining text shape recency may be provided, on which a computer program for executing the following method steps is recorded: (A) performing picture format conversion on the first text and the second text to be compared, respectively, to obtain a corresponding first picture and second picture; (B) determining the pixel matching situation between the first picture and the second picture; and (C) determining the shape recency between the first text and the second text based on the determined pixel matching situation.
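Steps (A) to (C) can be sketched as follows. This is only a sketch under stated assumptions: the hard-coded 5 × 5 bitmaps in `GLYPHS` are hypothetical stand-ins for rendering a text with a font, and the ratio used in `pixel_match` (stroke pixels shared by both pictures divided by stroke pixels present in either picture) is one possible way to quantify a pixel matching situation, not necessarily the one intended by the description.

```python
from typing import Dict, List

Picture = List[List[int]]  # 1 = stroke pixel, 0 = blank pixel

# Hypothetical 5x5 bitmaps; a real system would render each text with a font.
GLYPHS: Dict[str, List[str]] = {
    "口": ["#####", "#...#", "#...#", "#...#", "#####"],
    "日": ["#####", "#...#", "#####", "#...#", "#####"],
    "一": [".....", ".....", "#####", ".....", "....."],
}


def to_picture(text: str) -> Picture:
    """Step (A): convert a text into a binary picture."""
    return [[1 if ch == "#" else 0 for ch in row] for row in GLYPHS[text]]


def pixel_match(p1: Picture, p2: Picture) -> float:
    """Step (B): compare the pictures pixel by pixel, ignoring blank-blank pairs."""
    both = either = 0
    for row1, row2 in zip(p1, p2):
        for a, b in zip(row1, row2):
            both += a & b
            either += a | b
    return both / either if either else 0.0


def shape_recency(first: str, second: str) -> float:
    """Step (C): here the match ratio itself is used as the shape recency."""
    return pixel_match(to_picture(first), to_picture(second))
```

With these toy bitmaps, "口" scores higher against "日" than against "一", which is the qualitative behaviour the method aims for.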
The computer program on the above computer-readable medium may run in an environment deployed on computer equipment such as a client, a host, an agent device or a server. It should be noted that the computer program may also be used to execute additional steps beyond the above steps, or to execute more specific processing when executing the above steps; these additional steps and further processing have been described with reference to Fig. 1 to Figure 14 and are not repeated here.
Accordingly, the above device for determining text shape recency may also rely entirely on the running of a computer program to realize the corresponding functions; that is, each device corresponds to a step in the functional architecture of the computer program, so that the whole system is invoked through a dedicated software package (for example, a lib library) to realize the corresponding functions.
On the other hand, each device shown in Fig. 1 to Fig. 6, as well as related devices not shown, may also be implemented by hardware, software, firmware, middleware, microcode or any combination thereof. When implemented in software, firmware, middleware or microcode, the program code or code segments for executing the corresponding operations may be stored in a computer-readable medium such as a storage medium, so that a processor can execute the corresponding operations by reading and running the corresponding program code or code segments.
Here, an exemplary embodiment of the present invention may also be implemented as a computing device for determining text shape recency. The computing device includes a storage unit and a processor; a set of computer-executable instructions is stored in the storage unit, and when the set of computer-executable instructions is executed by the processor, the above method for determining text shape recency is executed.
In particular, the computing device may be deployed in a server or a client, or on a node device in a distributed network environment. In addition, the computing device may be a PC, a tablet device, a personal digital assistant, a smartphone, a web application or any other device capable of executing the above instruction set.
Here, the computing device need not be a single computing device; it may be any aggregate of devices or circuits capable of executing the above instructions (or instruction set) individually or jointly. The computing device may also be part of an integrated control system or system manager, or may be configured as a portable electronic device that interfaces locally or remotely (for example, via wireless transmission).
In the computing device, the processor may include a central processing unit (CPU), a graphics processing unit (GPU), a programmable logic device, a dedicated processor system, a microcontroller or a microprocessor. By way of example and not limitation, the processor may also include an analog processor, a digital processor, a microprocessor, a multi-core processor, a processor array, a network processor and the like.
Certain operations described above for the method for determining text shape recency may be implemented in software, certain operations may be implemented in hardware, and these operations may also be implemented by a combination of software and hardware.
The processor may run instructions or code stored in one of the storage units, where the storage unit may also store data. Instructions and data may also be sent and received over a network via a network interface device, where the network interface device may use any known transport protocol.
The storage unit may be integrated with the processor, for example, with RAM or flash memory arranged within an integrated circuit microprocessor. In addition, the storage unit may include a separate device, such as an external disk drive, a storage array, or any other storage device usable by a database system. The storage unit and the processor may be operatively coupled, or may communicate with each other, for example, through an I/O port or a network connection, so that the processor can read files stored in the storage unit.
In addition, the computing device may also include a video display (such as a liquid crystal display) and a user interaction interface (such as a keyboard, a mouse or a touch input device). All components of the computing device may be connected to each other via a bus and/or a network.
The operations involved in the above method for determining text shape recency may be described as various interconnected or coupled functional blocks or functional diagrams. However, these functional blocks or functional diagrams may equally be integrated into a single logic device or operate according to non-exact boundaries.
In particular, as described above, the computing device for determining text shape recency according to an exemplary embodiment of the present invention may include a storage unit and a processor; a set of computer-executable instructions is stored in the storage unit, and when the set of computer-executable instructions is executed by the processor, the following steps are executed: (A) performing picture format conversion on the first text and the second text to be compared, respectively, to obtain a corresponding first picture and second picture; (B) determining the pixel matching situation between the first picture and the second picture; and (C) determining the shape recency between the first text and the second text based on the determined pixel matching situation.
It should be noted that the processing details of determining text shape recency according to exemplary embodiments of the present invention have been described above with reference to Fig. 1 to Figure 12; the processing details when the computing device executes each step are not repeated here.
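The refinement of steps (B) and (C) recited in the claims, in which each picture is compared both in its original form and after several geometric transformations and the highest candidate is kept, can be sketched as follows. This is an illustrative sketch only: the helper names `translate`, `match` and `best_recency` are hypothetical, translation is used as the only geometric transformation, and the match ratio is an assumed way of scoring a pixel matching situation.

```python
from itertools import product
from typing import Callable, List, Sequence, Tuple

Picture = List[List[int]]  # 1 = stroke pixel, 0 = blank pixel
Transform = Callable[[Picture], Picture]


def translate(pic: Picture, dx: int, dy: int) -> Picture:
    """Shift a picture by (dx, dy); pixels shifted out of the frame are dropped."""
    h, w = len(pic), len(pic[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                out[ny][nx] = pic[y][x]
    return out


def match(p1: Picture, p2: Picture) -> float:
    """Assumed score: shared stroke pixels over stroke pixels in either picture."""
    both = sum(a & b for r1, r2 in zip(p1, p2) for a, b in zip(r1, r2))
    either = sum(a | b for r1, r2 in zip(p1, p2) for a, b in zip(r1, r2))
    return both / either if either else 0.0


def best_recency(p1: Picture, p2: Picture,
                 transforms1: Sequence[Transform],
                 transforms2: Sequence[Transform]) -> Tuple[float, int]:
    """Return the highest candidate and the number of candidates compared."""
    variants1 = [p1] + [t(p1) for t in transforms1]   # original + m transformed
    variants2 = [p2] + [t(p2) for t in transforms2]   # original + n transformed
    # (m + 1) x (n + 1) pixel matching situations, one candidate per pair.
    candidates = [match(a, b) for a, b in product(variants1, variants2)]
    return max(candidates), len(candidates)
```

Taking the maximum makes the result tolerant of small offsets between the two rendered pictures, at the cost of (m + 1) × (n + 1) comparisons per pair of texts.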
Exemplary embodiments of the present invention have been described above. It should be understood that the foregoing description is merely exemplary and not exhaustive, and that the present invention is not limited to the disclosed exemplary embodiments. Many modifications and changes will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the present invention. Therefore, the protection scope of the present invention shall be defined by the scope of the claims.

Claims (36)

1. A method for determining text shape recency, comprising:
(A) performing picture format conversion on a first text and a second text to be compared, respectively, to obtain a corresponding first picture and second picture;
(B) determining a pixel matching situation between the first picture and the second picture; and
(C) determining the shape recency between the first text and the second text based on the determined pixel matching situation;
wherein step (B) includes:
(b3) performing at least one geometric transformation on the first picture and on the second picture, respectively, specifically including: performing m kinds of geometric transformation on the first picture to obtain m transformed first pictures, and performing n kinds of geometric transformation on the second picture to obtain n transformed second pictures, where m and n are positive integers, and the m kinds of geometric transformation and the n kinds of geometric transformation may be the same or different in quantity or type; and
(b4) determining the pixel matching situation between each of the first picture before geometric transformation and the m transformed first pictures, and each of the second picture before geometric transformation and the n transformed second pictures, respectively, obtaining (m+1) × (n+1) determined pixel matching situations in total;
and step (C) includes:
(c1) determining a candidate shape recency between the first text and the second text based on each determined pixel matching situation, respectively; and
(c2) determining the highest candidate shape recency among the determined candidate shape recencies as the shape recency between the first picture and the second picture.
2. The method of claim 1, wherein in step (A), picture format conversion is performed on the first text and the second text, respectively, according to a raster font.
3. The method of claim 1, wherein in step (B), the pixel matching situation between the first picture and the second picture is determined in units of single pixels.
4. The method of claim 1, wherein
step (B) further includes: (b1) performing binarization on the pixels of the first picture and the second picture, respectively;
step (b3) includes: performing at least one geometric transformation on the binarized first picture and the binarized second picture, respectively;
step (b4) includes: determining the pixel matching situation between each of the original first picture, the binarized first picture and the first picture after each kind of geometric transformation, and each of the original second picture, the binarized second picture and the second picture after each kind of geometric transformation, respectively.
5. The method of claim 1, wherein the at least one geometric transformation includes at least one of a translation transformation, a scaling transformation, a row-deletion transformation and a column-deletion transformation.
6. The method of claim 1, wherein in step (B), the pixel matching situation between the first picture and the second picture is determined by considering the matching situation between non-blank pixel pairs;
wherein a non-blank pixel pair is a pair of pixels, in the first picture and the second picture, at which strokes are located.
7. The method of claim 1, further comprising step (D) and step (E), wherein
the method executes step (D) before step (A): determining the stroke quantity of the first text and the second text to be compared,
wherein, if it is determined in step (D) that the stroke quantity of at least one of the first text and the second text is less than a preset threshold, the method executes step (A), step (B) and step (C); otherwise, the method executes step (E): determining the shape recency between the first text and the second text based on the edit distance between the first text and the second text.
8. The method of any one of claims 1 to 7, wherein the method determines, for multiple first texts and multiple second texts, the shape recency between each first text and each second text, respectively, and the method further comprises:
(F) for each first text, selecting at least one second text having a higher shape recency with that first text as the nearly word form of that first text, and establishing a nearly word form library consisting of the multiple first texts and their nearly word forms.
9. A method for executing text proofreading, comprising:
determining a text to be proofread;
searching for the nearly word form of the text to be proofread in a nearly word form library established according to claim 8; and
outputting the found nearly word form of the text to be proofread.
10. A device for determining text shape recency, comprising:
a conversion device, for performing picture format conversion on a first text and a second text to be compared, respectively, to obtain a corresponding first picture and second picture;
a matching determining device, for determining a pixel matching situation between the first picture and the second picture; and
a shape recency determining device, for determining the shape recency between the first text and the second text based on the determined pixel matching situation;
wherein the matching determining device includes:
a transformation unit, for performing at least one geometric transformation on the first picture and on the second picture, respectively, and specifically for performing m kinds of geometric transformation on the first picture to obtain m transformed first pictures, and performing n kinds of geometric transformation on the second picture to obtain n transformed second pictures, where m and n are positive integers, and the m kinds of geometric transformation and the n kinds of geometric transformation may be the same or different in quantity or type; and
a matching unit, for determining the pixel matching situation between each of the first picture before geometric transformation and the m transformed first pictures, and each of the second picture before geometric transformation and the n transformed second pictures, respectively, obtaining (m+1) × (n+1) determined pixel matching situations in total;
and the shape recency determining device includes:
a candidate shape recency determination unit, for determining a candidate shape recency between the first text and the second text based on each determined pixel matching situation, respectively; and
a highest shape recency determination unit, for determining the highest candidate shape recency among the determined candidate shape recencies as the shape recency between the first picture and the second picture.
11. The device of claim 10, wherein the conversion device performs picture format conversion on the first text and the second text, respectively, according to a raster font.
12. The device of claim 10, wherein the matching determining device determines the pixel matching situation between the first picture and the second picture in units of single pixels.
13. The device of claim 10, wherein
the matching determining device further includes: a binarization unit, for performing binarization on the pixels of the first picture and the second picture, respectively;
the transformation unit is for performing at least one geometric transformation on the binarized first picture and the binarized second picture, respectively;
the matching unit is for determining the pixel matching situation between each of the original first picture, the binarized first picture and the first picture after each kind of geometric transformation, and each of the original second picture, the binarized second picture and the second picture after each kind of geometric transformation, respectively.
14. The device of claim 10, wherein the at least one geometric transformation includes at least one of a translation transformation, a scaling transformation, a row-deletion transformation and a column-deletion transformation.
15. The device of claim 10, wherein the matching determining device determines the pixel matching situation between the first picture and the second picture by considering the matching situation between non-blank pixel pairs;
wherein a non-blank pixel pair is a pair of pixels, in the first picture and the second picture, at which strokes are located.
16. The device of claim 10, further comprising:
a stroke determining device, for determining the stroke quantity of the first text and the second text to be compared;
wherein, if the stroke determining device determines that the stroke quantity of at least one of the first text and the second text is less than a preset threshold, the conversion device executes the picture format conversion, the matching determining device determines the pixel matching situation, and the shape recency determining device determines the shape recency between the first text and the second text based on the determined pixel matching situation; otherwise, the shape recency determining device determines the shape recency between the first text and the second text based on the edit distance between the first text and the second text.
17. The device of any one of claims 10 to 16, wherein the device determines, for multiple first texts and multiple second texts, the shape recency between each first text and each second text, respectively, and the device further comprises:
a character library establishing device, for selecting, for each first text, at least one second text having a higher shape recency with that first text as the nearly word form of that first text, and establishing a nearly word form library consisting of the multiple first texts and their nearly word forms.
18. A device for executing text proofreading, comprising:
a text determining device, for determining a text to be proofread;
a nearly word form searcher, for searching for the nearly word form of the text to be proofread in a nearly word form library established by the device of claim 17; and
a nearly word form output device, for outputting the found nearly word form of the text to be proofread.
19. A computer-readable medium for determining text shape recency, wherein a computer program for executing the following steps is recorded on the computer-readable medium:
(A) performing picture format conversion on a first text and a second text to be compared, respectively, to obtain a corresponding first picture and second picture;
(B) determining a pixel matching situation between the first picture and the second picture; and
(C) determining the shape recency between the first text and the second text based on the determined pixel matching situation;
wherein step (B) includes:
(b3) performing at least one geometric transformation on the first picture and on the second picture, respectively, specifically including: performing m kinds of geometric transformation on the first picture to obtain m transformed first pictures, and performing n kinds of geometric transformation on the second picture to obtain n transformed second pictures, where m and n are positive integers, and the m kinds of geometric transformation and the n kinds of geometric transformation may be the same or different in quantity or type; and
(b4) determining the pixel matching situation between each of the first picture before geometric transformation and the m transformed first pictures, and each of the second picture before geometric transformation and the n transformed second pictures, respectively, obtaining (m+1) × (n+1) determined pixel matching situations in total;
and step (C) includes:
(c1) determining a candidate shape recency between the first text and the second text based on each determined pixel matching situation, respectively; and
(c2) determining the highest candidate shape recency among the determined candidate shape recencies as the shape recency between the first picture and the second picture.
20. The computer-readable medium of claim 19, wherein in step (A), picture format conversion is performed on the first text and the second text, respectively, according to a raster font.
21. The computer-readable medium of claim 19, wherein in step (B), the pixel matching situation between the first picture and the second picture is determined in units of single pixels.
22. The computer-readable medium of claim 19, wherein
step (B) further includes: (b1) performing binarization on the pixels of the first picture and the second picture, respectively;
step (b3) includes: performing at least one geometric transformation on the binarized first picture and the binarized second picture, respectively;
step (b4) includes: determining the pixel matching situation between each of the original first picture, the binarized first picture and the first picture after each kind of geometric transformation, and each of the original second picture, the binarized second picture and the second picture after each kind of geometric transformation, respectively.
23. The computer-readable medium of claim 19, wherein the at least one geometric transformation includes at least one of a translation transformation, a scaling transformation, a row-deletion transformation and a column-deletion transformation.
24. The computer-readable medium of claim 19, wherein in step (B), the pixel matching situation between the first picture and the second picture is determined by considering the matching situation between non-blank pixel pairs;
wherein a non-blank pixel pair is a pair of pixels, in the first picture and the second picture, at which strokes are located.
25. The computer-readable medium of claim 19, wherein the computer program also executes step (D) and step (E), wherein
the computer program executes step (D) before step (A): determining the stroke quantity of the first text and the second text to be compared,
wherein, if it is determined in step (D) that the stroke quantity of at least one of the first text and the second text is less than a preset threshold, the computer program executes step (A), step (B) and step (C); otherwise, the computer program executes step (E): determining the shape recency between the first text and the second text based on the edit distance between the first text and the second text.
26. The computer-readable medium of any one of claims 19 to 25, wherein the computer program determines, for multiple first texts and multiple second texts, the shape recency between each first text and each second text, respectively, and the computer program also executes:
(F) for each first text, selecting at least one second text having a higher shape recency with that first text as the nearly word form of that first text, and establishing a nearly word form library consisting of the multiple first texts and their nearly word forms.
27. A computer-readable medium for executing text proofreading, wherein a computer program for executing the following steps is recorded on the computer-readable medium:
determining a text to be proofread;
searching for the nearly word form of the text to be proofread in a nearly word form library established by the computer-readable medium of claim 26; and
outputting the found nearly word form of the text to be proofread.
28. A computing device for determining text shape recency, including a storage unit and a processor, wherein a set of computer-executable instructions is stored in the storage unit, and when the set of computer-executable instructions is executed by the processor, the following steps are executed:
(A) performing picture format conversion on a first text and a second text to be compared, respectively, to obtain a corresponding first picture and second picture;
(B) determining a pixel matching situation between the first picture and the second picture; and
(C) determining the shape recency between the first text and the second text based on the determined pixel matching situation; wherein step (B) includes:
(b3) performing at least one geometric transformation on the first picture and on the second picture, respectively, specifically including: performing m kinds of geometric transformation on the first picture to obtain m transformed first pictures, and performing n kinds of geometric transformation on the second picture to obtain n transformed second pictures, where m and n are positive integers, and the m kinds of geometric transformation and the n kinds of geometric transformation may be the same or different in quantity or type; and
(b4) determining the pixel matching situation between each of the first picture before geometric transformation and the m transformed first pictures, and each of the second picture before geometric transformation and the n transformed second pictures, respectively, obtaining (m+1) × (n+1) determined pixel matching situations in total;
and step (C) includes:
(c1) determining a candidate shape recency between the first text and the second text based on each determined pixel matching situation, respectively; and
(c2) determining the highest candidate shape recency among the determined candidate shape recencies as the shape recency between the first picture and the second picture.
29. computing device as claimed in claim 28, wherein in step (A), according to raster font by the first text and Two texts carry out picture format conversion respectively.
30. computing device as claimed in claim 28, wherein in step (B), determine first as unit of single pixel Pixel matching situation between picture and second picture.
31. computing device as claimed in claim 28, wherein
Step (B) further include: (b1) carries out binary conversion treatment to the pixel of the first picture and second picture respectively;
The step (b3) include: to Jing Guo binary conversion treatment the first picture and second picture carry out at least one geometry respectively Transformation;
The step (b4) comprises determining that every kind of original first picture, the first picture by binary conversion treatment and process are several What transformed first picture is respectively with original second picture, the second picture by binary conversion treatment and by every kind of geometry Pixel matching situation between transformed second picture.
32. The computing device as claimed in claim 28, wherein the at least one geometric transformation includes at least one of a translation transformation, a scaling transformation, a row-deletion transformation, and a column-deletion transformation.
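The four kinds of geometric transformation named in claim 32 can be illustrated on a small 0/1 bitmap. This is a minimal sketch under stated assumptions: the function names are hypothetical, translation pads with blank pixels, and scaling uses nearest-neighbour sampling, which the claim does not specify.

```python
def translate(img, dx, dy):
    """Shift the picture by (dx, dy) pixels, filling vacated cells with blanks."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            nr, nc = r + dy, c + dx
            if 0 <= nr < h and 0 <= nc < w:
                out[nr][nc] = img[r][c]
    return out


def scale(img, new_h, new_w):
    """Resize with nearest-neighbour sampling (an assumed interpolation rule)."""
    h, w = len(img), len(img[0])
    return [[img[r * h // new_h][c * w // new_w] for c in range(new_w)]
            for r in range(new_h)]


def delete_row(img, index):
    """Remove one row of pixels."""
    return img[:index] + img[index + 1:]


def delete_column(img, index):
    """Remove one column of pixels."""
    return [row[:index] + row[index + 1:] for row in img]


img = [[1, 0], [0, 1]]
moved = translate(img, 1, 0)
bigger = scale(img, 4, 4)
```

Row and column deletion change the picture size, so a practical comparison would need to pad or rescale the result before matching it pixel by pixel against another picture.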
33. The computing device as claimed in claim 28, wherein, in step (B), the pixel matching situation between the first picture and the second picture is determined by considering the matching situation between non-blank pixel pairs;
wherein a non-blank pixel pair is a pixel pair at which a stroke is located in the first picture and the second picture.
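One way to restrict matching to pixel pairs where a stroke is located is sketched below. The interpretation is an assumption: a pair counts when at least one of its two pixels is a stroke pixel, and the score is the share of such pairs in which both pixels are stroke pixels. The name `stroke_match_ratio` is hypothetical.

```python
def stroke_match_ratio(a, b):
    """Score two equal-sized 0/1 bitmaps using only pairs that contain a stroke pixel."""
    both = either = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            if pa or pb:          # skip pairs where both pixels are blank
                either += 1
                if pa and pb:     # stroke present at this position in both pictures
                    both += 1
    # Two entirely blank pictures are treated as identical (an assumption).
    return both / either if either else 1.0


a = [[1, 1, 0], [0, 0, 0]]
b = [[1, 0, 0], [0, 0, 0]]
ratio = stroke_match_ratio(a, b)
```

In this example the two pictures agree on five of six pixels overall, but only one of the two stroke-bearing pairs matches, so the score is 0.5. Ignoring blank background keeps large empty areas from inflating the match.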
34. The computing device as claimed in claim 28, wherein the computer-executable instruction set further executes step (D) and step (E), wherein
before step (A), the computer-executable instruction set executes step (D): determining the stroke quantities of the first text and the second text to be compared,
wherein, if it is determined in step (D) that the stroke quantity of at least one of the first text and the second text is less than a preset threshold, the computer-executable instruction set executes step (A), step (B), and step (C); otherwise, the computer-executable instruction set executes step (E): determining the shape recency between the first text and the second text based on the edit distance between the first text and the second text.
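The branching between the picture-based steps and the edit-distance step can be sketched as follows. Several details are assumptions made for the example: the texts are treated as plain sequences, the edit distance is the Levenshtein distance, it is turned into a score by dividing by the longer length, and `stroke_count` and `picture_based` are hypothetical callables supplied by the caller.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute or keep
        prev = cur
    return prev[-1]


def shape_recency(first, second, stroke_count, threshold, picture_based):
    """Step (D): choose the method from the stroke quantities of both texts."""
    if min(stroke_count(first), stroke_count(second)) < threshold:
        # At least one text has few strokes: use steps (A) to (C).
        return picture_based(first, second)
    # Otherwise step (E): assumed normalisation of the edit distance to a 0-1 score.
    longest = max(len(first), len(second)) or 1
    return 1 - edit_distance(first, second) / longest


# Hypothetical stroke counts and a stand-in for the picture-based pipeline.
counts = {"ab": 2, "abc": 3, "kitten": 20, "sitting": 22}
fake_picture_score = lambda a, b: 0.9
low = shape_recency("ab", "abc", counts.__getitem__, 10, fake_picture_score)
high = shape_recency("kitten", "sitting", counts.__getitem__, 10, fake_picture_score)
```

The `min` call encodes the condition "at least one of the two stroke quantities is below the threshold"; only when both texts meet or exceed the threshold does the edit-distance branch run.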
35. The computing device as claimed in any one of claims 28 to 34, wherein, for multiple first texts and multiple second texts, the computer-executable instruction set determines the shape recency between each first text and each second text respectively, and the computer-executable instruction set further executes:
(F) for each first text, selecting at least one second text having a relatively high shape recency with that first text as a nearly word form (that is, a similarly shaped word) of that first text, and establishing a nearly word form library composed of the multiple first texts and their nearly word forms.
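Step (F) can be sketched as a ranking over all candidate second texts. The function name `build_library`, the `top_k` cut-off, and the exclusion of a text from its own candidate list are assumptions; the scores in the example are made-up values for illustration, not measured results.

```python
def build_library(first_texts, second_texts, recency, top_k=2):
    """Map each first text to its `top_k` second texts with the highest shape recency."""
    library = {}
    for first in first_texts:
        # Assumption: a text is not listed as a nearly word form of itself.
        candidates = [s for s in second_texts if s != first]
        ranked = sorted(candidates, key=lambda s: recency(first, s), reverse=True)
        library[first] = ranked[:top_k]
    return library


# Made-up pairwise scores; any shape recency function could be plugged in instead.
scores = {
    frozenset(("日", "曰")): 0.95,
    frozenset(("日", "目")): 0.80,
    frozenset(("目", "曰")): 0.70,
}
recency = lambda a, b: scores.get(frozenset((a, b)), 0.0)

texts = ["日", "目", "曰", "木"]
library = build_library(texts, texts, recency, top_k=2)
```

A fixed `top_k` is only one possible selection rule; a minimum score threshold would equally satisfy "at least one second text with a relatively high shape recency".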
36. A computing device for performing text proofreading, including a storage unit and a processor, the storage unit storing a computer-executable instruction set which, when executed by the processor, performs the following steps:
determining a text to be proofread;
searching for a nearly word form of the text to be proofread in the nearly word form library established by the computing device as claimed in claim 35; and
outputting the found nearly word form of the text to be proofread.
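The lookup described in claim 36 can be sketched as a dictionary query. The library layout (a mapping from a character to a list of its nearly word forms) and the name `proofread` are assumptions for illustration, as is the choice to look up each character of the input separately.

```python
def proofread(text, library):
    """Return (character, nearly word forms) for each character found in the library."""
    return [(ch, list(library[ch])) for ch in text if ch in library]


# Hypothetical library content, in the shape a library-building step might produce.
library = {"日": ["曰", "目"]}
suggestions = proofread("日木", library)
```

Characters without an entry are simply skipped, so the output lists only those positions where a similarly shaped alternative is available for review.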
CN201710067064.7A 2017-02-07 2017-02-07 Method and apparatus for determining text shape recency Active CN106874947B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710067064.7A CN106874947B (en) 2017-02-07 2017-02-07 Method and apparatus for determining text shape recency

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710067064.7A CN106874947B (en) 2017-02-07 2017-02-07 Method and apparatus for determining text shape recency

Publications (2)

Publication Number Publication Date
CN106874947A CN106874947A (en) 2017-06-20
CN106874947B true CN106874947B (en) 2019-03-12

Family

ID=59165795

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710067064.7A Active CN106874947B (en) 2017-02-07 2017-02-07 Method and apparatus for determining text shape recency

Country Status (1)

Country Link
CN (1) CN106874947B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107633250B (en) * 2017-09-11 2023-04-18 畅捷通信息技术股份有限公司 Character recognition error correction method, error correction system and computer device
CN110069753A (en) * 2018-01-24 2019-07-30 北京京东尚科信息技术有限公司 A kind of method and apparatus generating similarity information
CN110414496B (en) * 2018-04-26 2022-05-27 百度在线网络技术(北京)有限公司 Similar word recognition method and device, computer equipment and storage medium
CN109190615B (en) * 2018-07-26 2021-12-03 徐庆 Shape-near word recognition determination method, device, computer device and storage medium
CN110928915A (en) * 2018-08-31 2020-03-27 北京京东金融科技控股有限公司 Method, device and equipment for fuzzy matching of Chinese names and readable storage medium
CN110032920A (en) * 2018-11-27 2019-07-19 阿里巴巴集团控股有限公司 Text region matching process, equipment and device
CN110097002B (en) * 2019-04-30 2020-12-11 北京达佳互联信息技术有限公司 Shape and proximity word determining method and device, computer equipment and storage medium

Citations (2)

Publication number Priority date Publication date Assignee Title
CN104484397A (en) * 2014-12-16 2015-04-01 上海合合信息科技发展有限公司 Automatic image document sequencing method and automatic image document sequencing device
CN104504579A (en) * 2014-12-09 2015-04-08 北京奇虎科技有限公司 Information checking method and device

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
CN101551859B (en) * 2008-03-31 2012-01-04 夏普株式会社 Image recognition device and image retrieval device
CN102375807B (en) * 2010-08-27 2014-01-15 汉王科技股份有限公司 Method and device for proofing characters

Also Published As

Publication number Publication date
CN106874947A (en) 2017-06-20

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant