CN103996021A - Fusion method of multiple character identification results - Google Patents


Info

Publication number
CN103996021A
Authority
CN
China
Prior art keywords
character
characters
Legal status
Pending
Application number
CN201410191507.XA
Other languages
Chinese (zh)
Inventor
吕岳
陈圣昌
吕淑静
Current Assignee
East China Normal University
Original Assignee
East China Normal University
Application filed by East China Normal University
Priority to CN201410191507.XA
Publication of CN103996021A


Landscapes

  • Character Input (AREA)

Abstract

The invention discloses a method for fusing the results of multiple character recognizers. The method comprises the following steps: at least two strings, each comprising multiple characters, are obtained from at least two character recognizers; the identical characters in two strings are aligned by an optimal alignment algorithm based on the minimum edit distance; all the strings are aligned on these identical characters, so that the identical characters of the multiple strings are aligned; the strings are segmented at the aligned identical characters to obtain segment-aligned links; and an optimal link path is selected from the segment-aligned links to obtain the fusion result. The method uses a character-based statistical model to determine the most probably correct result in the parts where the recognition results differ, thereby selecting the optimal link path and achieving good results.

Description

Fusion method of multi-character recognition results
Technical Field
The invention relates to character recognition technology, and in particular to a method for fusing the recognition results of multiple character recognizers.
Background
Automatic mail sorting is an important component of postal automation. One automatic sorting technique captures mail images, segments the recipient's postal code area and address area, recognizes the digits and Chinese characters in the segmented areas, and sorts the mail automatically according to the recognition results. Correct recognition of the mail address is therefore an important basis for correct sorting.
In practice, the address area of a mail item is often unclear or otherwise degraded, which causes many errors in the output of a character recognizer. These errors are of two main types: in the first, the address block is segmented correctly, but the single-character recognition accuracy is not high enough; in the second, the address block is segmented incorrectly, which makes the recognition result wrong. Fusing the results of multiple character recognizers, as proposed here, reduces the impact of errors made by any single recognizer and can greatly improve the accuracy of the final result.
Error correction for Chinese character recognizers belongs to the post-processing stage of a recognition system, in which the erroneous output of the recognizer is corrected using the semantics and word meanings of natural language. In the prior art, post-processing mainly operates on the output of a single character recognizer, and error correction of that output follows one of two main approaches: statistics-based or rule-based. Rule-based methods use a rule set together with exact dictionary information; statistics-based methods typically use a language model built from linguistic knowledge and from knowledge extracted by analyzing a corpus. For the output of a single recognizer, errors caused by incorrect character segmentation are difficult to correct with either approach.
Disclosure of Invention
The invention overcomes the shortcomings of the prior art by providing a method for fusing the results of multiple character recognizers.
The method provided by the invention comprises the following steps:
step one: obtaining at least 2 character strings from at least 2 character recognizers, each character string comprising a plurality of characters;
step two: aligning the identical characters in two character strings by an optimal alignment algorithm based on the minimum edit distance;
step three: aligning all the character strings on these identical characters, so that the identical characters of the multiple strings are aligned;
step four: segmenting the multiple strings at the aligned identical characters to obtain segment-aligned links;
step five: selecting the optimal link path in the segment-aligned links to obtain the fusion result.
In the method provided by the invention, step two comprises:
step a: computing the minimum edit distance between the two character strings to generate an edit distance matrix;
step b: obtaining the cells reachable by the minimum edit distance backtrace path in the edit distance matrix, and computing the attribute tuple of each such cell;
step c: obtaining the optimal alignment from these cells according to their attribute tuples;
step d: repeating steps a to c until the two character strings are aligned.
In the method for fusing the results of multiple character recognizers, the minimum edit distance between the two strings is computed as follows:
$$
\begin{cases}
\mathrm{distance}[0,0] = 0 \\
\mathrm{distance}[i,0] = \mathrm{distance}[i-1,0] + 1, & 1 \le i \le m \\
\mathrm{distance}[0,j] = \mathrm{distance}[0,j-1] + 1, & 1 \le j \le n
\end{cases}
$$
$$
\mathrm{distance}[i,j] = \min
\begin{cases}
\mathrm{distance}[i-1,j] + \text{ins-cost}(B_{j-1}) \\
\mathrm{distance}[i-1,j-1] + \text{subst-cost}(A_{i-1}, B_{j-1}) \\
\mathrm{distance}[i,j-1] + \text{del-cost}(A_{i-1})
\end{cases}
$$
where
$$
\text{ins-cost}(B_j) = 1, \qquad \text{del-cost}(A_i) = 1, \qquad
\text{subst-cost}(A_i, B_j) =
\begin{cases}
2, & A_i \ne B_j \\
0, & A_i = B_j
\end{cases}
$$
Here distance[i, j] is the minimum edit distance; i is the character index in the target string and m the number of characters in the target string; j is the character index in the source string and n the number of characters in the source string; ins-cost(B_j) is the cost of inserting a character, del-cost(A_i) the cost of deleting a character, and subst-cost(A_i, B_j) the cost of substituting one character for another.
In the method for fusing the results of multiple character recognizers, the optimal path is obtained through the following steps:
step b1: for two address strings of lengths m and n, construct an edit distance matrix with m + 1 rows and n + 1 columns, take cell [m, n] as the starting point and cell [0, 0] as the end point, and let the path run from the starting point to the end point;
step b2: establish, for each cell of the edit distance matrix, a tuple characterizing the cell's attributes, the tuple comprising:
element target_ij, the maximum number of identical characters from the starting point to the cell;
element tag_ij, which, if true, indicates that the character in row i is the same as the character in column j;
element sub_ij, the maximum number of replacement operations from the end point to the cell;
element left_ij, which, if true, indicates that the cell has a horizontal neighbor on a minimum edit distance backtrace path;
element down_ij, which, if true, indicates that the cell has a vertical neighbor on such a path;
element oblique_ij, which, if true, indicates that the cell has a diagonal neighbor on such a path;
step b3: according to the tuples, if the horizontal cell of the starting point exists and its maximum number of replacement operations equals that of the starting point, the path goes from the starting point to the horizontal cell; otherwise, if the vertical cell of the starting point exists and its maximum number of replacement operations equals that of the starting point, the path goes to the vertical cell; otherwise, if the diagonal cell of the starting point exists, the path goes to the diagonal cell; after each move, the direction of the path continues to be chosen from the tuple of the current cell until the path reaches the end point;
step b4: for every cell on the path whose tag_ij is true, take the corresponding identical character of the two strings, and align the two strings on these identical characters.
In the method for fusing the results of multiple character recognizers, in step four the characters of the aligned strings are grouped by position, the probability of moving from one character to the next is computed for the characters of each group in turn, and the path formed by the characters with the highest probability is marked as the path of correct characters.
In the method for fusing the results of multiple character recognizers, the probability is expressed by the following formula:
In the formula, r_{k1}, r_{k2} and r_{k3} are weights; pr(a_k | a_{k+1}) is the probability that character a_k occurs given that character a_{k+1} has occurred; pr(b_k | b_{k+1}) is the probability that character b_k occurs given that b_{k+1} has occurred; pr(c_k | c_{k+1}) is the probability that character c_k occurs given that c_{k+1} has occurred; pr(L_A) is the probability of the entire string of segment L_A = {a_1, a_2, ..., a_m}, pr(L_B) that of L_B = {b_1, b_2, ..., b_n}, and pr(L_C) that of L_C = {c_1, c_2, ..., c_p}.
The beneficial effects of the invention include the following. The invention adopts an optimal alignment method based on the minimum edit distance: by computing, for each path cell, the maximum number of identical characters and the maximum number of replacement operations, it selects, among the paths that align the maximum number of identical characters, the one with the most replacement operations, thereby maximizing the expected alignment. A character-based statistical model is then used to determine the most probable correct result in the parts where the recognizers disagree, so that the optimal link path is selected, which achieves good results.
Drawings
FIG. 1 is a flow chart of a method for fusing multiple character recognition results according to the present invention.
FIG. 2 is a diagram of an edit distance matrix in one embodiment.
FIG. 3 is a diagram of a segment-aligned link in one embodiment.
FIG. 4 is a flow chart of marking the cell attribute tuples.
FIG. 5 is a flow chart of obtaining the optimal alignment of two address strings from the attribute tuples of the path cells.
FIG. 6 is a flow chart of the probability calculation used for path selection.
Detailed Description
The invention is described in further detail below with reference to specific embodiments and the accompanying drawings. Except for the content specifically mentioned below, the procedures, conditions and experimental methods for carrying out the invention are common general knowledge in the art, and the invention is not particularly limited in these respects.
Using a statistical model of the character recognition results, the invention determines the most probable correct characters in the parts where the multiple character strings differ, and thereby selects the optimal link path, which yields good recognition results. The invention is suitable for recognizing Chinese and English characters in images, and is particularly suitable for recognizing Chinese addresses that contain Chinese characters, English letters and digits. As shown in FIG. 1, the method for fusing the results of multiple character recognizers comprises the following steps:
step one: obtaining at least 2 character strings from at least 2 character recognizers, each character string comprising a plurality of characters;
step two: aligning the identical characters in two character strings by an optimal alignment algorithm based on the minimum edit distance;
step three: aligning all the character strings on these identical characters, so that the identical characters of the multiple strings are aligned;
step four: segmenting the multiple strings at the aligned identical characters to obtain segment-aligned links;
step five: selecting the optimal link path in the segment-aligned links to obtain the fusion result.
The invention can fuse the results of a plurality of character recognizers and can effectively improve the performance of a recognition system.
The following example uses the character strings output by three character recognizers, each of which contains recognition errors. Each string was produced by a recognizer from a target image whose correct address is "Xinmin Shopping, Xinmin Network, 41st Building, No. 755 Weihai Road, Shanghai".
OCR A: the squashed canoe minor of the New people 11, chorea No. 755, of New people
OCR B: xinminjiu Jun of Xinmin network, Wei Hai Lu 755 # 41G Ba
OCR C: new people shopping with Weihailu No. 7S5 straight-building new people network
Because different character recognizers differ greatly in how they segment and recognize address strings, their outputs contain many segmentation or recognition errors. A segmentation error may merge several characters into one, or split one character into several, so the address strings output by different recognizers do not necessarily have the same length or the same character positions. Therefore, when fusing the results of multiple character recognizers, the correctly recognized identical characters must first be aligned. The invention adopts an alignment method based on the edit distance, which can effectively select the expected optimal path.
The edit distance of two strings is the minimum cost required to convert one string into the other using three editing operations: insertion (I), deletion (D) and substitution (S), each with its own cost.
For aligning the address strings output by multiple character recognizers, the method uses edit-distance-based address string alignment, which mainly comprises the following steps:
step a: computing the minimum edit distance between the two character strings to generate an edit distance matrix;
step b: obtaining the cells reachable by the minimum edit distance backtrace path in the edit distance matrix, and computing the attribute tuple of each such cell;
step c: obtaining the optimal alignment from these cells according to their attribute tuples;
step d: repeating steps a to c until the two character strings are aligned.
To compute the edit distance between two character strings, let A = {a_1, a_2, ..., a_m} be the target string and B = {b_1, b_2, ..., b_n} the source string. distance[i, j] denotes the edit distance between the prefixes {a_1, a_2, ..., a_i} (1 ≤ i ≤ m) and {b_1, b_2, ..., b_j} (1 ≤ j ≤ n). The value of each cell of the edit distance matrix is the minimum of the costs of the three paths that can reach the cell, computed as follows:
$$
\begin{cases}
\mathrm{distance}[0,0] = 0 \\
\mathrm{distance}[i,0] = \mathrm{distance}[i-1,0] + 1, & 1 \le i \le m \\
\mathrm{distance}[0,j] = \mathrm{distance}[0,j-1] + 1, & 1 \le j \le n
\end{cases}
$$
$$
\mathrm{distance}[i,j] = \min
\begin{cases}
\mathrm{distance}[i-1,j] + \text{ins-cost}(B_{j-1}) \\
\mathrm{distance}[i-1,j-1] + \text{subst-cost}(A_{i-1}, B_{j-1}) \\
\mathrm{distance}[i,j-1] + \text{del-cost}(A_{i-1})
\end{cases}
$$
where
$$
\text{ins-cost}(B_j) = 1, \qquad \text{del-cost}(A_i) = 1, \qquad
\text{subst-cost}(A_i, B_j) =
\begin{cases}
2, & A_i \ne B_j \\
0, & A_i = B_j
\end{cases}
$$
Here distance[i, j] is the minimum edit distance; i is the character index in the target string and m the number of characters in the target string; j is the character index in the source string and n the number of characters in the source string; ins-cost(B_j) is the cost of inserting a character, del-cost(A_i) the cost of deleting a character, and subst-cost(A_i, B_j) the cost of substituting one character for another.
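The recurrence above can be checked with a short Python sketch. The function name `edit_distance_matrix` is hypothetical. The costs (insertion and deletion 1, substitution 2, match 0) are the ones given in the text, and Python's 0-based string indexing plays the role of the A_{i-1}, B_{j-1} subscripts. Assuming the two example strings are 中山北一路 (target) and 宝山铁山路 (source), the sketch reproduces the values of Table 1.

```python
def edit_distance_matrix(target: str, source: str) -> list[list[int]]:
    """Fill the (m+1) x (n+1) edit distance matrix; row i covers target[:i], column j covers source[:j]."""
    m, n = len(target), len(source)
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):          # distance[i, 0] = distance[i-1, 0] + 1
        dist[i][0] = dist[i - 1][0] + 1
    for j in range(1, n + 1):          # distance[0, j] = distance[0, j-1] + 1
        dist[0][j] = dist[0][j - 1] + 1
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            subst = 0 if target[i - 1] == source[j - 1] else 2
            dist[i][j] = min(
                dist[i - 1][j] + 1,          # distance[i-1, j] + ins-cost
                dist[i - 1][j - 1] + subst,  # distance[i-1, j-1] + subst-cost
                dist[i][j - 1] + 1,          # distance[i, j-1] + del-cost
            )
    return dist


if __name__ == "__main__":
    # Print with row 0 at the bottom, as in Table 1.
    for row in reversed(edit_distance_matrix("中山北一路", "宝山铁山路")):
        print(row)
```

The bottom-right value of the printed matrix (top-right in the printout) is the edit distance of the two full strings, 6 for this example.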
For example, applying the algorithm above to the target string "Zhongshan Bei Yi Lu" (中山北一路) and the source string "Baoshan Tieshan Lu" (宝山铁山路) gives the edit distance matrix in Table 1. The row and column marked # stand for the empty prefix, and row 0 is at the bottom.
TABLE 1 Edit distance matrix for the strings "Zhongshan Bei Yi Lu" and "Baoshan Tieshan Lu"
路 (lu)      5  6  5  6  7  6
一 (yi)      4  5  4  5  6  7
北 (bei)     3  4  3  4  5  6
山 (shan)    2  3  2  3  4  5
中 (zhong)   1  2  3  4  5  6
#            0  1  2  3  4  5
             #  宝  山  铁  山  路
Backtracking from the minimum edit distance does not yield a unique optimal path, and different paths can give noticeably different alignments. Table 2 shows two alignments with the same minimum edit distance and the same (maximum) number of identical characters, but different numbers of replacement operations; the first alignment is more likely to be correct than the second. The invention therefore provides a method for selecting the optimal path: besides achieving the minimum edit distance, it chooses, among the paths with the maximum number of identical characters, the one with the most replacement operations, which noticeably improves alignment accuracy.
TABLE 2 Two alignments of the strings "Zhongshan Bei Yi Lu" and "Baoshan Tieshan Lu"
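Under the costs used here (insertion and deletion 1, substitution 2, match 0), a short derivation shows what the tied paths have in common. Let V, H and D be the numbers of vertical, horizontal and diagonal steps on a path between [0, 0] and [m, n], and split the D diagonal steps into M matches and S substitutions; these symbols are introduced only for this derivation.

```latex
\begin{align*}
V + D &= m, \qquad H + D = n, \qquad D = M + S,\\
\mathrm{cost} &= V + H + 2S = (m - D) + (n - D) + 2(D - M) = m + n - 2M.
\end{align*}
```

A path therefore has minimum cost exactly when it contains as many matches as possible, namely the length of the longest common subsequence of the two strings. For the Table 1 strings the common subsequence 山路 has length 2, giving 5 + 5 − 2 × 2 = 6, the value of cell [m, n]. Minimum-cost paths can still differ in D (and hence in S), which is the freedom the optimal-path selection resolves.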
Searching every path to select the optimal one has time complexity O(3^n), where n is the larger of the source and target string lengths. To avoid this cost, the following path search method is proposed.
Each cell [i, j] of the edit distance matrix is described by an attribute tuple (target_ij, sub_ij, tag_ij, left_ij, down_ij, oblique_ij), which comprises:
element target_ij, the maximum number of identical characters from the starting point to the cell;
element tag_ij, which, if true, indicates that the character of row i is the same as the character of column j;
element sub_ij, the maximum number of replacement operations from the end point to the cell;
element left_ij, which, if true, indicates that the cell has a horizontal neighbor on a minimum edit distance backtrace path;
element down_ij, which, if true, indicates that the cell has a vertical neighbor on such a path;
element oblique_ij, which, if true, indicates that the cell has a diagonal neighbor on such a path.
Each cell of the edit distance matrix has 3 possible path directions: horizontal, vertical and diagonal. The direction attributes of each path cell are marked according to the minimum edit distance backtrace path.
Referring to FIG. 4, take cell [0, 0] as the end point and cell [i, j] as the starting point. For cell [i, j]: if cell [i, j-1] exists and distance[i, j-1] < distance[i, j], then left_ij = true; if cell [i-1, j] exists and distance[i-1, j] < distance[i, j], then down_ij = true; if the diagonal cell [i-1, j-1] exists and either distance[i-1, j-1] < distance[i, j], or distance[i-1, j-1] == distance[i, j] and tag_ij is true, then oblique_ij = true. As also shown in FIG. 4, the optimal path can equally be searched with cell [0, 0] as the starting point and cell [i, j] as the end point, in which case the cell directions are the reverse of those above.
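A minimal Python sketch of the direction marking, assuming the same costs as the recurrence. The names `edit_distance_matrix` and `direction_flags`, and the representation of each attribute as a dict keyed by (i, j), are illustrative choices, not part of the described method.

```python
def edit_distance_matrix(a: str, b: str) -> list[list[int]]:
    # Same recurrence as in the text: insertion/deletion 1, substitution 2, match 0.
    m, n = len(a), len(b)
    dist = [[i + j if i == 0 or j == 0 else 0 for j in range(n + 1)] for i in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            subst = 0 if a[i - 1] == b[j - 1] else 2
            dist[i][j] = min(dist[i - 1][j] + 1, dist[i - 1][j - 1] + subst, dist[i][j - 1] + 1)
    return dist


def direction_flags(a: str, b: str):
    """Return (tag, left, down, oblique) dicts keyed by cell (i, j).

    Row i refers to a[i - 1] and column j to b[j - 1]; row/column 0 is the empty prefix.
    """
    dist = edit_distance_matrix(a, b)
    tag, left, down, oblique = {}, {}, {}, {}
    for i in range(len(a) + 1):
        for j in range(len(b) + 1):
            tag[i, j] = i > 0 and j > 0 and a[i - 1] == b[j - 1]
            left[i, j] = j > 0 and dist[i][j - 1] < dist[i][j]
            down[i, j] = i > 0 and dist[i - 1][j] < dist[i][j]
            oblique[i, j] = i > 0 and j > 0 and (
                dist[i - 1][j - 1] < dist[i][j]
                or (dist[i - 1][j - 1] == dist[i][j] and tag[i, j])
            )
    return tag, left, down, oblique
```

With these particular costs every matrix entry has the same parity as i + j, so a strict "<" comparison with a neighbor already implies that the step is an exact minimum-cost backtrace step (one less for a horizontal or vertical step, two less for a diagonal substitution).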
For a path cell [i, j], target_ij is the largest number of identical characters encountered on the way from cell [m, n] to cell [i, j]. If the characters corresponding to cell [i, j] are equal, tag_ij is true. With row 0 at the bottom of the matrix, as in Table 1, cell [i, j] can only be reached from three directions: from above, from diagonally above, and from the right. Therefore target_ij = max(target_{i+1,j}, target_{i,j+1}, target_{i+1,j+1} + tag_{i+1,j+1}).
For a path cell [i, j], sub_ij is the maximum number of replacement operations performed on the way from cell [0, 0] to cell [i, j]. Cell [i, j] can be reached from the left, from below or from diagonally below, and the maximum over these directions is taken: sub_ij = max(sub_{i,j-1}, sub_{i-1,j}, sub_{i-1,j-1} + 1).
Using the attribute tuples, the path from cell [m, n] to cell [0, 0] that performs the most replacement operations while matching the most identical characters is selected. For cell [i, j]: if left_ij is true and sub_{i,j-1} = sub_ij, move to cell [i, j-1]; otherwise, if down_ij is true and sub_{i-1,j} = sub_ij, move to cell [i-1, j]; otherwise, move to cell [i-1, j-1]. Repeating this from the starting point until the end point is reached yields the optimal path. Every cell on the optimal path whose tag_ij is true gives a pair of identical characters, and aligning the two strings on these characters gives the optimal alignment.
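The selection rule can be sketched end to end in Python. This is an illustrative reading, not the patented implementation: the helper names are hypothetical; sub_ij is maximized only over moves allowed by the direction flags, so only minimum-cost steps count; every diagonal step counts as a replacement operation, as in the sub_ij formula; and target_ij is not computed, on the assumption that with insertion/deletion cost 1 and substitution cost 2 every minimum-cost path already contains the same maximal number of matches.

```python
def edit_distance_matrix(a: str, b: str) -> list[list[int]]:
    # Same recurrence as in the text: insertion/deletion 1, substitution 2, match 0.
    m, n = len(a), len(b)
    dist = [[i + j if i == 0 or j == 0 else 0 for j in range(n + 1)] for i in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            subst = 0 if a[i - 1] == b[j - 1] else 2
            dist[i][j] = min(dist[i - 1][j] + 1, dist[i - 1][j - 1] + subst, dist[i][j - 1] + 1)
    return dist


def optimal_alignment(a: str, b: str) -> list[tuple[int, int]]:
    """Return 0-based (i, j) pairs with a[i] == b[j] on the selected path."""
    dist = edit_distance_matrix(a, b)
    m, n = len(a), len(b)

    def tag(i, j):
        return i > 0 and j > 0 and a[i - 1] == b[j - 1]

    def left(i, j):
        return j > 0 and dist[i][j - 1] < dist[i][j]

    def down(i, j):
        return i > 0 and dist[i - 1][j] < dist[i][j]

    def oblique(i, j):
        if i == 0 or j == 0:
            return False
        d = dist[i - 1][j - 1]
        return d < dist[i][j] or (d == dist[i][j] and tag(i, j))

    # sub[i][j]: most diagonal (replacement) steps on a flagged path from [0, 0] to [i, j].
    sub = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        for j in range(n + 1):
            options = [0]
            if left(i, j):
                options.append(sub[i][j - 1])
            if down(i, j):
                options.append(sub[i - 1][j])
            if oblique(i, j):
                options.append(sub[i - 1][j - 1] + 1)
            sub[i][j] = max(options)

    # Walk from [m, n] to [0, 0]: prefer left, then down, otherwise diagonal.
    i, j, pairs = m, n, []
    while i > 0 or j > 0:
        if left(i, j) and sub[i][j - 1] == sub[i][j]:
            j -= 1
        elif down(i, j) and sub[i - 1][j] == sub[i][j]:
            i -= 1
        else:
            if tag(i, j):
                pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
    return pairs[::-1]
```

For 中山北一路 and 宝山铁山路 the walk takes five diagonal steps and aligns 山 with the first 山 of the source and 路 with 路, so `optimal_alignment("中山北一路", "宝山铁山路")` returns `[(1, 1), (4, 4)]`.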
Table 3 shows the attributes of a cell, and FIG. 2 shows the cell attribute tuples generated when aligning the two strings "Zhongshan Bei Yi Lu" (中山北一路) and "Baoshan Tieshan Lu" (宝山铁山路), together with the selected optimal path.
TABLE 3 Attribute representation of a cell
After the strings have been aligned pairwise by this method, the identical characters of each pair of address strings are known. These pairwise alignments are combined into a multi-address alignment by searching for matching indices, which yields the identical characters aligned across all the addresses. Referring to FIG. 5, the steps for aligning three addresses (OCRA, OCRB, OCRC) are: 1. Denote the indices of a pair of identical characters aligned between OCRA and OCRB as i and j respectively. 2. In the alignment of OCRA and OCRC, if the i-th character of OCRA is aligned with the k-th character of OCRC, go to step 3; otherwise return to step 1. 3. In the alignment of OCRB and OCRC, if the j-th character of OCRB is aligned with the k-th character of OCRC, the character is aligned across all the addresses. After recording the k-th character, return to step 1 to search for the next such character, until all identical characters aligned across the multiple addresses have been found.
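Steps 1 to 3 amount to intersecting the three pairwise alignments. A sketch in Python, assuming each pairwise alignment is given as a list of 0-based (index, index) pairs of identical characters, such as the pairs produced by a two-string alignment; `multi_align` is a hypothetical name.

```python
def multi_align(ab, ac, bc):
    """Return (i, j, k) triples of characters aligned in all three pairwise alignments.

    ab, ac, bc: lists of aligned index pairs for OCRA/OCRB, OCRA/OCRC and OCRB/OCRC.
    Within one pairwise alignment each index occurs at most once.
    """
    a_to_c = dict(ac)       # OCRA index -> OCRC index
    bc_pairs = set(bc)
    triples = []
    for i, j in ab:         # step 1: identical characters of OCRA and OCRB
        k = a_to_c.get(i)   # step 2: is OCRA[i] also aligned with some OCRC[k]?
        if k is None:
            continue
        if (j, k) in bc_pairs:  # step 3: is OCRB[j] aligned with that same OCRC[k]?
            triples.append((i, j, k))
    return triples
```

For example, `multi_align([(0, 0), (2, 1), (3, 3)], [(0, 1), (3, 2)], [(0, 1), (3, 2)])` returns `[(0, 0, 1), (3, 3, 2)]`: the pair (2, 1) is dropped because OCRA[2] has no partner in OCRC.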
Fusing the strings of multiple character recognizers is an optimal path selection problem: the strings are aligned, segmented at the aligned identical characters to form segment-aligned links, and the optimal link path is finally selected with a character-based statistical language model. Because written Chinese has no spaces between words and the recognition results contain errors, path selection based on dictionary matching and rules is difficult to apply. A character-based statistical model is therefore used to determine the most probable correct result in the parts where the recognizers disagree, which works well.
Segmenting the strings at the identical characters turns the outputs of multiple character recognizers into several aligned segments, which form a segment-aligned link. In this link, each group of aligned identical characters can be treated as a single character, and multiple candidate paths run through the differing characters between them; the path formed by the highest-probability differing characters together with the identical characters is the optimal link path. If the aligned segments differ in length, the most probable whole segment is selected; in this case there are only intra-segment paths and no inter-segment paths. If the segments have the same length, the most probable of the corresponding single characters is selected position by position in reverse order; this case includes both intra-segment and inter-segment paths. FIG. 3 shows the segment-aligned link formed after string alignment.
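Building a segment-aligned link can be sketched as follows, assuming the characters aligned across all recognizers are given as increasing index tuples (one index per string). The function name `segment_links` and the ("same", ...) / ("diff", ...) representation are illustrative choices.

```python
def segment_links(strings, anchors):
    """Split the recognizer outputs at the characters aligned in all of them.

    strings: one output string per recognizer.
    anchors: increasing index tuples, one index per string, for shared characters.
    Returns a list of ("same", char) items for shared characters and
    ("diff", (seg_0, seg_1, ...)) items holding each recognizer's text between them.
    """
    links = []
    prev = [-1] * len(strings)
    ends = tuple(len(s) for s in strings)  # sentinel just past the end of every string
    for anchor in list(anchors) + [ends]:
        # each recognizer's text strictly between the previous anchor and this one
        segs = tuple(s[p + 1:q] for s, p, q in zip(strings, prev, anchor))
        if any(segs):
            links.append(("diff", segs))
        if anchor != ends:
            links.append(("same", strings[0][anchor[0]]))
        prev = list(anchor)
    return links
```

For the outputs `("abXd", "abd", "abYZd")` with anchors `[(0, 0, 0), (1, 1, 1), (3, 2, 4)]`, the link is `[("same", "a"), ("same", "b"), ("diff", ("X", "", "YZ")), ("same", "d")]`. The differing segment has unequal lengths, so it would be resolved by choosing a whole segment.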
The maximum-probability path is selected with a character-based statistical language model. A statistical language model uses probability and statistics to capture the regularities of natural language; in effect it is a probability distribution that assigns a probability to every possible string. Any string is acceptable to the model, only with different degrees of acceptability. For example, for a string w_1, w_2, ..., w_i (where i is the string length), the probability of occurrence is:
$$
\mathrm{pr}(w_1, w_2, \ldots, w_i) = \mathrm{pr}(w_1)\,\mathrm{pr}(w_2 \mid w_1) \cdots \mathrm{pr}(w_i \mid w_1, w_2, \ldots, w_{i-1})
$$
where pr(w_1), pr(w_2 | w_1), ..., pr(w_i | w_1, w_2, ..., w_{i-1}) are estimated from corpus statistics. Estimating pr(w_i | w_1, w_2, ..., w_{i-1}) directly, however, easily runs into data sparseness because no corpus is complete, so a Markov-chain assumption, i.e. an N-gram model, is usually made. For example, unigram: pr(w_i | w_1, w_2, ..., w_{i-1}) = pr(w_i); bigram: pr(w_i | w_1, w_2, ..., w_{i-1}) = pr(w_i | w_{i-1}). The invention uses the bigram model for probability calculation.
The bigram probability pr(a_k | a_{k+1}) is estimated by maximum likelihood estimation (MLE):
$$
\mathrm{pr}(a_k \mid a_{k+1}) = \frac{\#(a_k, a_{k+1})}{\#(a_{k+1})}
$$
where #(a_{k+1}) is the number of occurrences of a_{k+1} in the corpus and #(a_k, a_{k+1}) is the number of occurrences of (a_k, a_{k+1}) in the corpus.
Because the corpus is incomplete, many entries do not occur in it at all or occur only rarely. For entries that do not occur, a simple Laplace smoothing algorithm is used. For the small-probability problem of the bigram model, interpolation is used: the interpolated estimate of pr(a_k | a_{k+1}) is λ1 pr(a_k | a_{k+1}) + λ2 pr(a_k), where Σ_i λ_i = 1.
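A sketch of the backward bigram estimate in Python. The text does not give the exact Laplace formula or the interpolation weights, so this sketch assumes add-one smoothing on the unigram term and placeholder weights lambda1 = 0.7 and lambda2 = 0.3. The class name `BackwardBigram` is hypothetical, and the corpus is assumed to be a list of strings.

```python
from collections import Counter


class BackwardBigram:
    """Character bigram model pr(x | y), where y is the character right after x.

    Assumptions (not given in the text): add-one (Laplace) smoothing of the
    unigram term and placeholder interpolation weights lambda1 / lambda2.
    """

    def __init__(self, corpus, lambda1=0.7, lambda2=0.3):
        if abs(lambda1 + lambda2 - 1.0) > 1e-9:
            raise ValueError("interpolation weights must sum to 1")
        self.uni = Counter()   # #(x): occurrences of each character
        self.pair = Counter()  # #(x, y): x immediately followed by y
        for text in corpus:
            self.uni.update(text)
            self.pair.update(zip(text, text[1:]))
        self.total = sum(self.uni.values())
        self.vocab = len(self.uni) + 1  # one extra slot for unseen characters
        self.l1, self.l2 = lambda1, lambda2

    def unigram(self, x):
        # add-one smoothed pr(x): never zero, even for characters absent from the corpus
        return (self.uni[x] + 1) / (self.total + self.vocab)

    def mle(self, x, y):
        # maximum-likelihood pr(x | y) = #(x, y) / #(y); 0.0 if y never occurs
        return self.pair[x, y] / self.uni[y] if self.uni[y] else 0.0

    def prob(self, x, y):
        # interpolated estimate: lambda1 * pr_MLE(x | y) + lambda2 * pr(x)
        return self.l1 * self.mle(x, y) + self.l2 * self.unigram(x)
```

For a toy corpus `["上海市", "海市"]`, `mle("海", "市")` is 2 / 2 = 1.0 and `prob("海", "市")` is 0.7 × 1.0 + 0.3 × 3/9, about 0.8. Because the smoothed unigram term is never zero, `prob` stays positive for unseen pairs, which keeps the logarithms used later finite.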
The probability of the characters on a path can be computed in either direction, left to right or right to left. Right to left is chosen because it yields more information. The correct result can be selected more reliably by adding the following rules to the probability calculation:
1. In Chinese address recognition, a keyword such as 'Chao', 'number', 'building', 'layer', 'building', 'multi-span' or 'room' may appear. When it does, the character in front of it is much more likely to be a digit than a Chinese character;
2. In Chinese address recognition, many keywords are much more probable than ordinary words. Examples are "province", "city", "district", "county", "town", "road", "village", "fidu", "number", "building", "layer", "building" and "room", which are more likely to be correct than ordinary words. Therefore, a weight $r_k$ is preferably introduced into the probability calculation. $r_k$ is set by a rule-dependent value when condition 1 or 2 is satisfied, and $r_k=1$ otherwise;
3. Single digits occur very frequently in the training samples, and there are only ten of them (0 to 9), so their calculated probabilities are large. The occurrence count of a single digit is therefore capped at a limit N (e.g., N = 50);
4. For the same reason as rule 3, the occurrence count of two consecutive digits is capped at a limit M (e.g., M = 1000).
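Rules 1 to 4 can be sketched in Python as follows. Several parts are assumptions rather than content from the text above:

- The keyword sets are hypothetical: they are assumed Chinese forms of a few of the translated keywords.
- `boost`, the weight used when rule 1 or 2 fires, is an assumed value; the text above does not state one.
- Following rules 3 and 4, `capped_count` would be applied to the corpus counts before the bigram estimates are formed.

```python
DIGITS = frozenset("0123456789")
# Hypothetical keyword sets: assumed Chinese forms of some translated keywords.
NUMBER_BEFORE = frozenset({"号", "楼", "层", "室"})  # rule 1: number, building, layer, room
ADDRESS_KEYWORDS = frozenset({"省", "市", "区", "县", "镇", "路", "村"}) | NUMBER_BEFORE  # rule 2


def rule_weight(a_k: str, a_next: str, boost: float = 0.5) -> float:
    """Weight r_k applied to log pr(a_k | a_{k+1}); `boost` is a hypothetical value.

    Log-probabilities are negative, so a weight below 1 moves the score
    toward 0, i.e. favours the candidate character.
    """
    rule1 = a_next in NUMBER_BEFORE and a_k in DIGITS  # digit in front of a keyword
    rule2 = a_k in ADDRESS_KEYWORDS                    # candidate is itself a keyword
    return boost if (rule1 or rule2) else 1.0


def capped_count(token: str, count: int, n_single: int = 50, m_double: int = 1000) -> int:
    """Rules 3 and 4: cap corpus counts of one digit at N and of two consecutive digits at M."""
    if token and all(ch in DIGITS for ch in token):
        if len(token) == 1:
            return min(count, n_single)
        if len(token) == 2:
            return min(count, m_double)
    return count
```

Multiplying the negative log-probability by a weight below 1 is one way to make the rules favour a candidate. How the patent's own $r_k$ values are chosen is not stated here.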
Referring to Fig. 6, let OCR A, OCR B and OCR C be the address strings output by three character recognizers. Let the corresponding segments obtained after segmentation be $L_A=\{a_1,a_2,\ldots,a_m\}$, $L_B=\{b_1,b_2,\ldots,b_n\}$ and $L_C=\{c_1,c_2,\ldots,c_p\}$. If the segments contain the same number of characters, i.e. $m=n=p$, the character with the highest probability is selected position by position:
$$\max\left(r_{k1}\log pr(a_k|a_{k+1}),\ r_{k2}\log pr(b_k|b_{k+1}),\ r_{k3}\log pr(c_k|c_{k+1})\right),\quad 1\le k\le m$$
Otherwise, the whole segment with the highest probability is selected:
$$\max\left(\log pr(L_A),\ \log pr(L_B),\ \log pr(L_C)\right)$$
Here, $\log pr(L_A)$ is calculated as follows:
$$pr(L_A) = pr(a_1,a_2,\ldots,a_m) = pr(a_m)\,pr(a_{m-1}|a_m)\cdots pr(a_1|a_2)$$
For convenience of calculation, taking the logarithm of both sides gives
$$\log pr(L_A) = \log pr(a_m) + \sum_{k=1}^{m-1}\log pr(a_k|a_{k+1})$$
$L_A$, $L_B$ and $L_C$ may differ in length. To avoid a bias caused by the different lengths, the sum is averaged over the segment length and the rule weights are added, i.e.:
$$\log pr(L_A) = \frac{1}{m}\left(r_{m1}\log pr(a_m) + \sum_{k=1}^{m-1} r_{k1}\log pr(a_k|a_{k+1})\right)$$
Similarly, $\log pr(L_B)$ and $\log pr(L_C)$ are calculated.
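The selection rule and the length-normalized segment score can be sketched together in Python.

- **Conditioning.** Following the worked example below, each character of an equal-length segment is conditioned on the character already chosen to its right. The first choice is conditioned on the aligned character just after the segment.
- **Scoring callable.** `char_score(a_k, a_next)` is a hypothetical callable returning $r_k \log pr(a_k|a_{k+1})$. The value `a_next == ""` stands in for the unconditional term $r_m \log pr(a_m)$.
- **Names.** The function names are illustrative.

```python
import math
from typing import Callable, List, Sequence

# char_score(a_k, a_next) -> r_k * log pr(a_k | a_{k+1}); a_next == "" means "no right neighbour"
CharScore = Callable[[str, str], float]


def segment_score(segment: str, char_score: CharScore) -> float:
    """Length-normalized, rule-weighted log pr(L) for one segment."""
    m = len(segment)
    if m == 0:
        return -math.inf  # an empty segment is never preferred
    total = char_score(segment[-1], "")  # r_m * log pr(a_m)
    for k in range(m - 1):
        total += char_score(segment[k], segment[k + 1])  # r_k * log pr(a_k | a_{k+1})
    return total / m


def fuse_segment(segments: Sequence[str], right_anchor: str, char_score: CharScore) -> str:
    """Fuse one aligned segment produced by several recognizers."""
    lengths = {len(s) for s in segments}
    if len(lengths) != 1:
        # Unequal lengths: keep the whole segment with the highest normalized score
        return max(segments, key=lambda s: segment_score(s, char_score))
    chosen: List[str] = []
    right = right_anchor  # aligned character just after the segment
    for k in range(lengths.pop() - 1, -1, -1):
        # Right to left: condition on the character chosen in the previous step
        best = max((s[k] for s in segments), key=lambda c: char_score(c, right))
        chosen.append(best)
        right = best
    return "".join(reversed(chosen))


if __name__ == "__main__":
    # Toy scores; any (a_k, a_{k+1}) pair not listed gets log(0.01)
    log_p = {("b", "8"): math.log(0.5), ("y", "b"): math.log(0.4)}

    def toy(a: str, right: str) -> float:
        return log_p.get((a, right), math.log(0.01))

    print(fuse_segment(["xa", "yb", "zb"], "8", toy))  # prints "yb"
```

Ties go to the first recognizer's candidate, because Python's `max` returns the first maximal item.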
The following example uses the character strings output by three character recognizers for an image whose correct address is "Zhangjiang Harley road 898 and 8":
OCR A: Zhang Jiang Ha Le 898, 8;
OCR B: Zhangjiang thunderbolt 898 neon 8;
OCR C: Zhang Jiang Harley 8 in 8 No. 8;
Selecting the optimal link path for these three address strings involves both segments of equal length and segments of different lengths.
For segments of different lengths, $L_A$ = {98 do}, $L_B$ = {98 neon} and $L_C$, the segment probabilities are calculated with the length-normalized formula above.
Since $\log pr(L_A) > \log pr(L_C) > \log pr(L_B)$, the selected segment is "98 do".
For segments of equal length, $L_A$ = (Lei) Lei, $L_B$ = Thunderway and $L_C$, the link probabilities are calculated and the characters selected as follows:
Since $r(\text{chore},8)\log pr(\text{chore}|8) < r(\text{way},8)\log pr(\text{way}|8)$, the selected character is "way".
The next character is conditioned on the character selected in the previous step, i.e. "way". Its probabilities are calculated and compared as follows:
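$$r(\text{thunder},\text{way})\log pr(\text{thunder}|\text{way}) = 1\times\left(\log pr(\text{thunder}) + \log pr(\text{thunder}|\text{way})\right) = -10.6176$$
$$r(\text{thunderbolt},\text{way})\log pr(\text{thunderbolt}|\text{way}) = 1\times\left(\log pr(\text{thunderbolt}) + \log pr(\text{thunderbolt}|\text{way})\right) = -38.0009$$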
Since $r(\text{thunder},\text{way})\log pr(\text{thunder}|\text{way}) > r(\text{thunderbolt},\text{way})\log pr(\text{thunderbolt}|\text{way})$, the selected character is "thunder". The characters selected for the whole segment are therefore "thunderroad".
The scope of protection of the present invention is not limited to the above embodiments. Variations and advantages that may occur to those skilled in the art without departing from the spirit and scope of the inventive concept are included in the invention. The scope of protection is defined by the appended claims.

Claims (6)

1. A method for fusing multiple character recognition results, characterized by comprising the following steps:
step one: obtaining at least two character strings from at least two character recognizers, each character string comprising a plurality of characters;
step two: aligning the identical characters in two character strings by an optimal alignment algorithm based on the minimum edit distance;
step three: aligning all character strings by their identical characters, so that identical characters are aligned across the multiple character strings;
step four: segmenting at the aligned identical characters of the multiple character strings to obtain segment-aligned links;
step five: selecting the optimal link path among the segment-aligned links to obtain the fusion result.
2. The method for fusing multiple character recognition results according to claim 1, wherein step two comprises the following steps:
step a: calculating the minimum edit distance between the two character strings to generate an edit distance matrix;
step b: obtaining the cells of the edit distance matrix that can be reached by a minimum-edit-distance backtrace path, and calculating an attribute tuple for each such cell;
step c: obtaining the optimal alignment from the cells according to their attribute tuples;
step d: repeating steps a to c until the two character strings are aligned.
3. The method for fusing multiple character recognition results according to claim 2, wherein the minimum edit distance between the character strings is expressed by the following formulas:
$$\mathrm{distance}[0,0]=0,\qquad \mathrm{distance}[i,0]=\mathrm{distance}[i-1,0]+1 \text{ for } 1\le i\le m,\qquad \mathrm{distance}[0,j]=\mathrm{distance}[0,j-1]+1 \text{ for } 1\le j\le n;$$
$$\mathrm{distance}[i,j]=\min\begin{cases}\mathrm{distance}[i-1,j]+\text{ins-cost}(B_{j-1})\\ \mathrm{distance}[i-1,j-1]+\text{subst-cost}(A_{i-1},B_{j-1})\\ \mathrm{distance}[i,j-1]+\text{del-cost}(A_{i-1})\end{cases};$$
where
$$\text{ins-cost}(B_j)=1,\qquad \text{del-cost}(A_i)=1,\qquad \text{subst-cost}(A_i,B_j)=\begin{cases}2, & A_i\ne B_j\\ 0, & A_i=B_j\end{cases}$$
The symbols are defined as follows:
- $\mathrm{distance}[i,j]$ is the minimum edit distance;
- $i$ indexes the characters of the target character string, and $m$ is the total number of characters in the target character string;
- $j$ indexes the characters of the source character string, and $n$ is the total number of characters in the source character string;
- ins-cost$(B_j)$ is the distance cost of inserting a character;
- del-cost$(A_i)$ is the distance cost of deleting a character;
- subst-cost$(A_i,B_j)$ is the distance cost of substituting a character.
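The edit distance matrix of claim 3 can be built with the usual dynamic programme. This Python sketch uses the costs stated in the claim: 1 for a step along either string, 2 for substituting different characters and 0 for identical ones. The function name is illustrative.

```python
from typing import List


def edit_distance_matrix(target: str, source: str) -> List[List[int]]:
    """Build the (m+1) x (n+1) edit distance matrix with the costs of claim 3."""
    m, n = len(target), len(source)
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dist[i][0] = dist[i - 1][0] + 1
    for j in range(1, n + 1):
        dist[0][j] = dist[0][j - 1] + 1
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            subst = 0 if target[i - 1] == source[j - 1] else 2
            dist[i][j] = min(
                dist[i - 1][j] + 1,          # step along the target string (cost 1)
                dist[i - 1][j - 1] + subst,  # match (0) or substitution (2)
                dist[i][j - 1] + 1,          # step along the source string (cost 1)
            )
    return dist
```

With these costs a substitution never beats a deletion plus an insertion, so the distance equals $m + n - 2\cdot\mathrm{LCS}$. For example, `edit_distance_matrix("kitten", "sitting")[-1][-1]` is 5, since the longest common subsequence is "ittn".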
4. The method for fusing multiple character recognition results according to claim 2, wherein obtaining the optimal path comprises the following steps:
step b1: for two address strings of lengths m and n respectively, constructing an edit distance matrix with m+1 rows and n+1 columns, taking cell [m, n] as the starting point and cell [0, 0] as the end point, the path running from the starting point to the end point;
step b2: establishing, for each cell of the edit distance matrix, a tuple characterizing its attributes, the tuple comprising:
element $target_{ij}$, the maximum number of identical characters from the starting point to the cell;
element $tag_{ij}$, which, if true, indicates that the character of the i-th row is identical to the character of the j-th column;
element $sub_{ij}$, the maximum number of substitution operations from the end point to the cell;
element $left_{ij}$, which, if true, indicates that the cell has a horizontal cell;
element $down_{ij}$, which, if true, indicates that the cell has a vertical cell;
element $oblique_{ij}$, which, if true, indicates that the cell has a diagonal cell;
step b3: proceeding from the starting point according to the tuples, the path moves as follows:
- if the horizontal cell of the current cell exists and its maximum number of substitution operations equals that of the current cell, the path moves to the horizontal cell;
- otherwise, if the vertical cell exists and its maximum number of substitution operations equals that of the current cell, the path moves to the vertical cell;
- otherwise, if the diagonal cell exists, the path moves to the diagonal cell;
- after each move, the next direction is determined from the tuples in the same way, until the path reaches the end point;
step b4: taking the cells on the path whose $tag_{ij}$ is true to obtain the identical characters of the two character strings, and aligning the two character strings according to these identical characters.
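A simplified Python sketch of the backtrace in steps b1 to b4 follows. It works under these stated assumptions:

- `sub` is computed as the maximum number of substitutions on any minimum-cost path from [0, 0] to each cell.
- Horizontal, vertical and diagonal moves are allowed only to neighbouring cells that lie on such a path.
- As an interpretation, only identical characters reached by a diagonal move are recorded as aligned.
- The $target_{ij}$ element is not needed by the move rule and is omitted.
- Function and variable names are hypothetical.

```python
from typing import List, Tuple


def align_identical(target: str, source: str) -> List[Tuple[int, int]]:
    """Return index pairs (i, j) with target[i] == source[j] aligned by the backtrace."""
    m, n = len(target), len(source)

    def subst(i: int, j: int) -> int:
        # Claim 3 costs: 0 for identical characters, 2 otherwise
        return 0 if target[i - 1] == source[j - 1] else 2

    # Edit distance matrix (steps along either string cost 1)
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dist[i][0] = i
    for j in range(1, n + 1):
        dist[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            dist[i][j] = min(dist[i - 1][j] + 1,
                             dist[i - 1][j - 1] + subst(i, j),
                             dist[i][j - 1] + 1)

    # left/down/oblique: the neighbouring cell lies on a minimum-cost path to (i, j)
    def left(i: int, j: int) -> bool:
        return j > 0 and dist[i][j - 1] + 1 == dist[i][j]

    def down(i: int, j: int) -> bool:
        return i > 0 and dist[i - 1][j] + 1 == dist[i][j]

    def oblique(i: int, j: int) -> bool:
        return i > 0 and j > 0 and dist[i - 1][j - 1] + subst(i, j) == dist[i][j]

    # sub[i][j]: maximum number of substitutions on a minimum-cost path from [0, 0]
    sub = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        for j in range(n + 1):
            options = []
            if left(i, j):
                options.append(sub[i][j - 1])
            if down(i, j):
                options.append(sub[i - 1][j])
            if oblique(i, j):
                options.append(sub[i - 1][j - 1] + (1 if subst(i, j) else 0))
            sub[i][j] = max(options, default=0)

    # Step b3: walk from [m, n] to [0, 0], preferring horizontal, then vertical, then diagonal
    pairs: List[Tuple[int, int]] = []
    i, j = m, n
    while i > 0 or j > 0:
        if left(i, j) and sub[i][j - 1] == sub[i][j]:
            j -= 1
        elif down(i, j) and sub[i - 1][j] == sub[i][j]:
            i -= 1
        else:
            # Diagonal move; record it only when tag_ij is true (identical characters)
            if subst(i, j) == 0:
                pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
    pairs.reverse()
    return pairs
```

Every move goes to a cell on a minimum-cost path. The recorded pairs therefore form a longest common subsequence: for example, `align_identical("abc", "axc")` returns `[(0, 0), (2, 2)]`.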
5. The method for fusing multiple character recognition results according to claim 1, comprising the following in step four:
- the characters of the aligned character strings are grouped by position;
- the probability values of the characters in successive groups are calculated one character at a time;
- the path formed by the characters with the maximum probability values is marked as the path of the correct characters.
6. The method for fusing multiple character recognition results according to claim 5, wherein the probability is expressed by the following formula:
in the formula:
- $r_{k1}$, $r_{k2}$ and $r_{k3}$ are weights;
- $pr(a_k|a_{k+1})$ is the probability that character $a_k$ occurs given that character $a_{k+1}$ has occurred;
- $pr(b_k|b_{k+1})$ is the probability that character $b_k$ occurs given that character $b_{k+1}$ has occurred;
- $pr(c_k|c_{k+1})$ is the probability that character $c_k$ occurs given that character $c_{k+1}$ has occurred;
- $pr(L_A)$ is the probability of the whole string of segment $L_A=\{a_1,a_2,\ldots,a_m\}$;
- $pr(L_B)$ is the probability of the whole string $L_B=\{b_1,b_2,\ldots,b_n\}$;
- $pr(L_C)$ is the probability of the whole string $L_C=\{c_1,c_2,\ldots,c_p\}$.
CN201410191507.XA 2014-05-08 2014-05-08 Fusion method of multiple character identification results Pending CN103996021A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410191507.XA CN103996021A (en) 2014-05-08 2014-05-08 Fusion method of multiple character identification results

Publications (1)

Publication Number Publication Date
CN103996021A (en) 2014-08-20

Family

ID=51310182

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410191507.XA Pending CN103996021A (en) 2014-05-08 2014-05-08 Fusion method of multiple character identification results

Country Status (1)

Country Link
CN (1) CN103996021A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101950560A (en) * 2010-09-10 2011-01-19 中国科学院声学研究所 Continuous voice tone identification method
EP2309487A1 (en) * 2009-09-11 2011-04-13 Honda Research Institute Europe GmbH Automatic speech recognition system integrating multiple sequence alignment for model bootstrapping
CN103680499A (en) * 2013-11-29 2014-03-26 北京中科模识科技有限公司 High-precision recognition method and high-precision recognition system on basis of voice and subtitle synchronization

Non-Patent Citations (1)

Title
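Quan Zhonghua (全中华) et al.: "Special point matching and fast exclusion of freely forged signatures based on string matching" (基于串匹配的特殊点匹配和自由伪造签名快速排除法), Journal of Applied Sciences (《应用科学学报》) *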

Cited By (13)

Publication number Priority date Publication date Assignee Title
CN105653517A (en) * 2015-11-05 2016-06-08 乐视致新电子科技(天津)有限公司 Recognition rate determining method and apparatus
CN107220639A (en) * 2017-04-14 2017-09-29 北京捷通华声科技股份有限公司 The correcting method and device of OCR recognition results
CN107609592B (en) * 2017-09-15 2020-10-23 桂林电子科技大学 Graph editing distance method for letter recognition
CN107609592A (en) * 2017-09-15 2018-01-19 桂林电子科技大学 A kind of figure edit distance approach towards Letter identification
CN107967303A (en) * 2017-11-10 2018-04-27 传神语联网网络科技股份有限公司 The method and device that language material is shown
CN107967303B (en) * 2017-11-10 2021-03-26 传神语联网网络科技股份有限公司 Corpus display method and apparatus
CN108052609A (en) * 2017-12-13 2018-05-18 武汉烽火普天信息技术有限公司 A kind of address matching method based on dictionary and machine learning
CN108647319A (en) * 2018-05-10 2018-10-12 思派(北京)网络科技有限公司 A kind of labeling system and its method based on short text clustering
CN108647319B (en) * 2018-05-10 2021-07-06 思派(北京)网络科技有限公司 Labeling system and method based on short text clustering
CN111832554A (en) * 2019-04-15 2020-10-27 顺丰科技有限公司 Image detection method, device and storage medium
CN112257703A (en) * 2020-12-24 2021-01-22 北京世纪好未来教育科技有限公司 Image recognition method, device, equipment and readable storage medium
CN112784125A (en) * 2021-01-14 2021-05-11 辽宁工程技术大学 Mode identification method and device for input information
CN112784125B (en) * 2021-01-14 2024-07-05 辽宁工程技术大学 Method and device for identifying mode of input information

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20140820