CN1645374A - Digit marking character string searching technology - Google Patents

Digit marking character string searching technology

Publication number
CN1645374A
Authority
CN
China
Prior art keywords
character string
character
bit
place value
base
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CNA2005100233835A
Other languages
Chinese (zh)
Inventor
徐文新
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to CNA2005100233835A priority Critical patent/CN1645374A/en
Publication of CN1645374A publication Critical patent/CN1645374A/en
Priority to CN200510057491.4A priority patent/CN101488127B/en
Priority to PCT/CN2005/001642 priority patent/WO2006074586A1/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures

Abstract

A bit-marked character string retrieval method: the base characters that make up character strings are divided into m groups, and the base-character information of each string is marked with bitwise OR operations to record a per-string value called its "bit value". Bit-value operations select a preliminary result set R1 from the database records, and an ordinary character-by-character comparison then serves as a secondary search to obtain the final result set R2.

Description

Bit mark character string retrieval technique
Technical field
The present invention is a fuzzy search technique for character strings whose purpose is to speed up fuzzy string search in databases. The base characters from which strings are composed are divided into m groups, and a datum W of m bits marks the base-character information of a string. If base character C1 of string S belongs to group n, the n-th bit of W counted from the right (counting from the left works equally well) is set to 1; W is marked in the same way according to the groups of the other base characters C2, C3, C4, ... The marking can be done with the bitwise OR operation. Once every base character has been marked, W records information about S and is called the "place value" of S.
To search, the "place value" Wn of a stored string Sn and the "place value" Wt of the string T to be retrieved are combined with a bitwise AND, and the result is called Wg. If Wg equals Wt, then Wn equals or contains Wt. Because different strings can have the same place value, the "place value" operation is used only to screen the database records into a preliminary result set R1; an ordinary character-by-character comparison then performs a secondary search and yields the final result set R2.
Background technology
At present, fuzzy string search in databases is carried out by character-by-character comparison. For example, to decide whether the string bdopfqew contains the character f, the computer compares f against bdopfqew one character at a time from beginning to end, which is inefficient.
On October 19, 2004 I filed a patent application titled "prime number replacing character string search technology" (application number 200410067258.X). That method effectively improves the speed of fuzzy string search, but for long strings the product of primes has to be stored in integers spread over several fields, which requires more storage. To speed up fuzzy string search while reducing the storage requirement, the present invention marks the composition of a string with a small number of bits. After the database has been marked, a bitwise AND performs a preliminary screening of the records, and character-by-character comparison within the preliminary results retrieves the final result.
Summary of the invention
The present invention is a fuzzy search technique for character strings: the base characters from which strings are composed are divided into m groups, and a datum W of m bits marks the base-character information of a string. The process has two steps: first, strings are marked using the bitwise OR operation; second, retrieval uses the bitwise AND operation. The principle is explained below.
The 21,000 Chinese characters and other symbols included in the GBK range all have internal codes. Using these codes, all Chinese characters and other symbols are divided into 31 groups, and group n is assigned the value 2^(n-1). In binary, each group's value has a 1 at the n-th bit counted from the right and 0 at every other bit; this value is called the "basic place value".
Group  Value        Basic place value
1      1            00000000000000000000000000000001
2      2            00000000000000000000000000000010
3      4            00000000000000000000000000000100
4      8            00000000000000000000000000001000
5      16           00000000000000000000000000010000
6      32           00000000000000000000000000100000
7      64           00000000000000000000000001000000
8      128          00000000000000000000000010000000
9      256          00000000000000000000000100000000
10     512          00000000000000000000001000000000
11     1024         00000000000000000000010000000000
12     2048         00000000000000000000100000000000
13     4096         00000000000000000001000000000000
14     8192         00000000000000000010000000000000
15     16384        00000000000000000100000000000000
16     32768        00000000000000001000000000000000
17     65536        00000000000000010000000000000000
18     131072       00000000000000100000000000000000
19     262144       00000000000001000000000000000000
20     524288       00000000000010000000000000000000
21     1048576      00000000000100000000000000000000
22     2097152      00000000001000000000000000000000
23     4194304      00000000010000000000000000000000
24     8388608      00000000100000000000000000000000
25     16777216     00000001000000000000000000000000
26     33554432     00000010000000000000000000000000
27     67108864     00000100000000000000000000000000
28     134217728    00001000000000000000000000000000
29     268435456    00010000000000000000000000000000
30     536870912    00100000000000000000000000000000
31     1073741824   01000000000000000000000000000000
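The table above can be regenerated mechanically. The following Python sketch is only an illustration (the names `M`, `basic_place_values` and `as_bits` are assumptions introduced here, not part of the patent); it builds the 31 "basic place values" by shifting a single 1 bit.

```python
M = 31  # number of groups, as in the table above

# Group n (1-based) gets the value 2^(n-1): a single 1 at the n-th bit from the right.
basic_place_values = [1 << (n - 1) for n in range(1, M + 1)]


def as_bits(value: int, width: int = 32) -> str:
    """Render a value as a zero-padded binary string, as printed in the table."""
    return format(value, f"0{width}b")


if __name__ == "__main__":
    for n, value in enumerate(basic_place_values, start=1):
        print(n, value, as_bits(value))
```

Printing 32 digits mirrors the patent's use of a 32-bit long integer whose top (sign) bit is left unused.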
Take the character string "大漠孤烟直，长河落日圆" as an example:
Character  Internal code  Group  Value     Basic place value
大         22823          8      128       00000000000000000000000010000000
漠         28448          22     2097152   00000000001000000000000000000000
孤         23396          23     4194304   00000000010000000000000000000000
烟         28895          4      8         00000000000000000000000000001000
直         30452          11     1024      00000000000000000000010000000000
长         27265          17     65536     00000000000000010000000000000000
河         27827          21     1048576   00000000000100000000000000000000
落         31683          2      2         00000000000000000000000000000010
日         26085          15     16384     00000000000000000100000000000000
圆         22278          21     1048576   00000000000100000000000000000000
Sum of the values                8471690
Place value of the whole string  7423114   00000000011100010100010010001010
Applying the bitwise OR to the basic place values of the ten characters 大, 漠, 孤, 烟, 直, 长, 河, 落, 日, 圆 gives the "place value" of the whole string: 00000000011100010100010010001010.
Put another way, the values of the ten characters sum to 8471690; removing the one repeated value 1048576 shared by 河 and 圆 leaves 7423114, which corresponds to 00000000011100010100010010001010.
The "place value" of any string can be obtained in this way. The "place value" of "白云千载空悠悠" is 00100010000001001010000000010000,
and the "place value" of "长河落日" is 00000000000100010100000000000010.
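The marking step can be sketched in Python as follows. This is a sketch under stated assumptions, not the patent's own code: the example strings are written in Chinese as recovered from the internal codes in the table, the helper names (`vb_ascw`, `group_index`, `place_value`) are introduced here, and the internal code is assumed to be the signed 16-bit value that VB6's `AscW` returns, which is what makes the listed codes 27265 and 31683 come out as absolute values.

```python
def vb_ascw(ch: str) -> int:
    """Mimic VB6 AscW (assumption): code points >= 0x8000 come back negative."""
    code = ord(ch)
    return code - 0x10000 if code >= 0x8000 else code


def group_index(ch: str, m: int = 31) -> int:
    """0-based group index: absolute internal code modulo m."""
    return abs(vb_ascw(ch)) % m


def place_value(s: str, m: int = 31) -> int:
    """OR together the basic place values of every character in s."""
    w = 0
    for ch in s:
        w |= 1 << group_index(ch, m)
    return w


if __name__ == "__main__":
    for text in ("大漠孤烟直长河落日圆", "白云千载空悠悠", "长河落日"):
        print(text, format(place_value(text), "032b"))
```

The comma of the example string is left out here because the table marks only the ten characters.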
To judge whether the "place value" Wn of a string Sn contains or equals the "place value" Wt of T, apply the bitwise AND to Wn and Wt. If the result Wg equals Wt, then Wn contains or equals Wt and, going one step further, the string Sn may contain or equal T. That is:
Wg = Wn and Wt
If Wg = Wt,
then Wn contains or equals Wt,
and Sn may contain or equal T.
                    S1: 大漠孤烟直，长河落日圆           S2: 白云千载空悠悠
W1, W2              00000000011100010100010010001010   00100010000001001010000000010000
Wt (T: 长河落日)     00000000000100010100000000000010   00000000000100010100000000000010
Wg1, Wg2 ("and")    00000000000100010100000000000010   00000000000000000000000000000000
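The comparison in the table can be checked with a few lines of Python. The function name `may_contain` is an assumption introduced for this sketch; the three constants are the binary values printed in the table.

```python
def may_contain(wn: int, wt: int) -> bool:
    """True when every bit set in wt is also set in wn, i.e. (wn AND wt) == wt."""
    return (wn & wt) == wt


W1 = int("00000000011100010100010010001010", 2)  # place value of S1
W2 = int("00100010000001001010000000010000", 2)  # place value of S2
WT = int("00000000000100010100000000000010", 2)  # place value of T

if __name__ == "__main__":
    print(may_contain(W1, WT))  # S1 passes the screen, so it may contain T
    print(may_contain(W2, WT))  # S2 is rejected without any character comparison
```

A `True` result only means "may contain": the character-by-character check is still needed to confirm the match.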
As the above shows, two different characters can have the same "basic place value" (the seventh and tenth characters of the example string do), and different strings can likewise have the same "place value". The purpose of bit-marked string retrieval is therefore to use bit operations for a preliminary search of the strings in the database, giving R1, and then to run a secondary search over those results with ordinary character-by-character comparison, giving the final result R2. The bitwise AND is faster than character-by-character comparison. In practice, to improve retrieval speed, R1 should contain as few surplus records as possible so that it is close to R2, which reduces the time spent on the secondary search.
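A minimal in-memory sketch of the two-stage search follows. It is illustrative only: the record list, the third record "日落长河" (chosen here to produce a false positive in R1), and the function names are assumptions, and the grouping rule assumes the absolute value of a signed 16-bit internal code modulo 31.

```python
def place_value(s: str, m: int = 31) -> int:
    """Place value of s: one bit per character group (assumed grouping rule)."""
    w = 0
    for ch in s:
        code = ord(ch)
        if code >= 0x8000:  # assumption: signed 16-bit internal code
            code -= 0x10000
        w |= 1 << (abs(code) % m)
    return w


def build_index(records):
    """Mark every record once, ahead of any search."""
    return [(s, place_value(s)) for s in records]


def search(index, target):
    """Return (R1, R2): the bit-screened candidates and the confirmed matches."""
    wt = place_value(target)
    r1 = [s for s, wn in index if (wn & wt) == wt]  # preliminary screening
    r2 = [s for s in r1 if target in s]             # character comparison
    return r1, r2


index = build_index(["大漠孤烟直长河落日圆", "白云千载空悠悠", "日落长河"])
r1, r2 = search(index, "长河落日")

if __name__ == "__main__":
    print("R1:", r1)
    print("R2:", r2)
```

The record "日落长河" has the same characters as the query in a different order, so it survives the bit screen and is removed only by the secondary search.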
Notes:
1. Let the average string length be L, the number of database records be R, the length of the string to be retrieved be l, and the number of bits used for marking be m. The number of records in the preliminary result set R1 can then be estimated roughly as
$$R_1 = \frac{L \cdot R}{m! / (l! \, (m - l)!)}$$
This formula ignores the probability distribution of the marked place values and is therefore not accurate, but it gives a general picture of how the parameters influence R1.
Suppose a title database holds 3,000,000 records with an average string length of 16, 31 bits are used for marking, and the user's search keyword has length 4. Then
$$R_1 = \frac{16 \times 3{,}000{,}000}{31! / (4! \, (31 - 4)!)} \approx 1526$$
So for ordinary Chinese words and phrases, titles, place names and organization names, the 31 bits of a 32-bit long integer other than the sign bit are enough to mark strings effectively.
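The rough estimate can be reproduced numerically. In this sketch the function name `estimate_r1` and its parameter names are assumptions; the denominator m!/(l!(m-l)!) is the binomial coefficient, available as `math.comb`.

```python
from math import comb


def estimate_r1(avg_len: float, records: int, m: int, query_len: int) -> float:
    """Rough size of R1: (L * R) divided by the number of ways to choose l of m bits."""
    return avg_len * records / comb(m, query_len)


if __name__ == "__main__":
    # Worked example from the text: L = 16, R = 3,000,000, m = 31, l = 4.
    print(comb(31, 4))
    print(round(estimate_r1(16, 3_000_000, 31, 4)))
```

As the text notes, the estimate ignores how unevenly real place values are distributed, so it indicates trends rather than predicting counts.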
2. For databases with a longer average string length and more records, the 63 bits of the bigint data type can be used for marking in SQL Server 2000, with the base characters correspondingly divided into 63 groups. On a 32-bit processor, however, performing the bitwise AND of Wn and Wt on bigint values and comparing Wg with Wt will inevitably take more time, so whether to adopt it should be decided by comparative testing. In fact, any data type in any database that conveniently supports the bitwise OR and AND operations can be used to mark strings; a purpose-built data type without a sign bit, if one is programmed separately, is naturally better.
3. For Chinese word databases in which the great majority of entries are two-character words, one may consider marking with basic place values that have two bits set to 1, dividing the Chinese characters into 31!/(2!(31-2)!) groups, i.e. 465 groups. The "place value" of a two-character word then usually has 4 bits set to 1, which can be resolved into 4!/(2!(4-2)!), i.e. 6, of those groups. If the user searches a database of 100,000 words with a single Chinese character, then
$$R_1 = \frac{(2 \times 100{,}000) \times 6}{465} \approx 2580$$
whereas marking with single-bit basic place values gives
$$R_1 = \frac{2 \times 100{,}000}{31} \approx 6452$$
So marking with basic place values that have two bits set to 1 improves performance somewhat, but the approach has a limited range of application. In other words, when the string to be retrieved is a single character, bit-marked string retrieval performs worse than the prime number replacing string retrieval method.
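The two-bit variant can be sketched as follows. The patent does not say how characters are assigned to the 465 groups, so the modulo rule in `basic_place_value` is purely an assumption for illustration, as are all names in the block.

```python
from itertools import combinations
from math import comb

M = 31

# One basic place value per unordered pair of bit positions: C(31, 2) = 465 groups.
PAIR_VALUES = [(1 << a) | (1 << b) for a, b in combinations(range(M), 2)]


def basic_place_value(ch: str) -> int:
    """Hypothetical grouping rule: code point modulo the number of groups."""
    return PAIR_VALUES[ord(ch) % len(PAIR_VALUES)]


def place_value(s: str) -> int:
    """OR together the two-bit basic place values of every character in s."""
    w = 0
    for ch in s:
        w |= basic_place_value(ch)
    return w


# Estimates from the text: 100,000 two-character words, single-character query.
two_bit_estimate = 2 * 100_000 * comb(4, 2) / len(PAIR_VALUES)
one_bit_estimate = 2 * 100_000 / M

if __name__ == "__main__":
    print(len(PAIR_VALUES), two_bit_estimate, one_bit_estimate)
```

A two-character word sets at most four bits, which is why up to six of the 465 pairs can match it.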
4. For Chinese, the base character is normally a Chinese character; of course, for Chinese character retrieval it may instead be a basic component of a character. For the Chinese phonetic alphabet it may be a letter, an initial, a final, or a syllable. For other languages it may be a letter, a syllable, a word, and so on.
5. Besides software factors such as the programming language and the database, the choice of data type for marking should also take the word size of the CPU into account. For a 64-bit CPU, marking strings with 64 bits should be preferred, to make full use of the CPU's performance and to improve the dispersion of the "place values".
6. Performance is naturally best if the grouping of base characters balances character frequencies across the groups. Grouping by taking the Chinese character internal code modulo the number of groups is relatively easy to implement, but it is not the optimal grouping.
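One way to approach frequency-balanced grouping is a greedy heuristic: assign characters in order of decreasing frequency, each to the group with the smallest running total. The patent does not prescribe any particular algorithm, so this technique, the function name `balanced_groups`, and the frequency figures below are all assumptions made for illustration.

```python
import heapq


def balanced_groups(freq: dict, m: int):
    """Greedily assign each character to the currently lightest of m groups."""
    heap = [(0, g) for g in range(m)]  # (running frequency total, group index)
    heapq.heapify(heap)
    assignment = {}
    for ch, f in sorted(freq.items(), key=lambda kv: kv[1], reverse=True):
        total, g = heapq.heappop(heap)
        assignment[ch] = g
        heapq.heappush(heap, (total + f, g))
    totals = [0] * m
    for ch, g in assignment.items():
        totals[g] += freq[ch]
    return assignment, totals


# Hypothetical character frequencies, not taken from any real corpus.
sample_freq = {"的": 40, "一": 20, "是": 15, "不": 12, "了": 8, "人": 5}
assignment, totals = balanced_groups(sample_freq, 3)

if __name__ == "__main__":
    print(assignment)
    print(totals)
```

Unlike the modulo rule, a grouping like this has to be stored as a lookup table and rebuilt if the frequency profile of the database changes.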
Embodiment
The present invention has been implemented successfully for fuzzy string search in databases of Chinese vocabulary, phrases, titles and the like. The description below builds the database with SQL Server 2000 and uses VB 6.0 as the programming language; fuzzy string search with other programming languages and other databases can be implemented by analogy.
1. Set up the database
Let database shuku contain a table biao with a field shuming whose data type is nvarchar and whose length is 40. Add a further field wei whose data type is "long", i.e. 4 bytes or 32 bits, one of which is the sign bit; the remaining 31 bits are available for marking.
2. Mark the database strings using the bitwise OR operation
    Dim shuzu(30) As Long       ' Long array of 31 elements
    Dim x As Integer
    Dim index As Integer
    shuzu(0) = 1
    For x = 1 To 30
        shuzu(x) = 2 * shuzu(x - 1)
    Next
    ' The 31 elements now hold 1, 2, 4, 8, 16, ..., 1073741824. In binary each
    ' has one bit set to 1 and all other bits 0: these are the "basic place values".

    Dim biaostr As String       ' string currently being processed
    Dim weizhi As Long          ' "place value" of the string
    Dim weizhilin As Long       ' "basic place value" of one character

    biaors.MoveFirst            ' move to the first record of recordset biaors
    Do
        weizhilin = 0
        weizhi = 0
        With biaors
            biaostr = .Fields("shuming")    ' read the string of the current record into biaostr
        End With
        For x = 1 To Len(biaostr)
            ' Take one character from biaostr, reduce its internal code modulo 31
            ' and take the absolute value; this assigns the base character to a group.
            index = Abs(AscW(Mid(biaostr, x, 1)) Mod 31)
            weizhilin = shuzu(index)        ' one of 1, 2, 4, 8, 16, ..., 1073741824
            weizhi = weizhi Or weizhilin    ' bitwise OR of the character's "basic place value" into weizhi
        Next
        ' Loop finished: weizhi is the "place value" of the current string.
        With biaors
            .Fields("wei") = weizhi         ' store the "place value" in field wei of the current record
        End With
        biaors.Update
        biaors.MoveNext                     ' process the next record
    Loop While Not biaors.EOF
3. Perform fuzzy search of the database strings using the bitwise AND operation
    Dim shuzu(30) As Long       ' Long array of 31 elements, filled exactly as in the marking step
    Dim x As Integer
    Dim index As Integer
    shuzu(0) = 1
    For x = 1 To 30
        shuzu(x) = 2 * shuzu(x - 1)
    Next
    ' The array holds 1, 2, 4, 8, 16, ..., 1073741824, consistent with the array used for marking.

    Dim weizhi As Long          ' "place value" of a string
    Dim weizhilin As Long       ' "basic place value" of one character
    Dim textstr As String       ' string currently being searched for
    Dim strQuery As String
    weizhilin = 0
    weizhi = 0
    textstr = Text1.Text        ' Text1.Text is the string to be retrieved
    For x = 1 To Len(textstr)
        index = Abs(AscW(Mid(textstr, x, 1)) Mod 31)
        weizhilin = shuzu(index)
        weizhi = weizhi Or weizhilin
    Next
    ' weizhi is now the "place value" of the string to be retrieved, obtained
    ' by the same method used to mark the database strings.

    strQuery = "select * from (SELECT * FROM biao WHERE (wei & " & weizhi & ") = " & weizhi & ") DERIVEDTBL WHERE (shuming like '%" & textstr & "%')"
    ' The inner query ANDs the "place value" of the string to be retrieved with the
    ' "place value" of every record as a preliminary search; the outer query then
    ' performs the secondary search with an ordinary fuzzy string match, giving the
    ' final result. This is a SQL Server 2000 query; other databases may differ slightly.
    Adodc1.RecordSource = strQuery
    Adodc1.Refresh              ' run the search
    DataList1.ListField = "shuming"
    DataList1.ReFill            ' show the current search results in the list box
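For readers without SQL Server 2000 and VB 6.0, the same marking and retrieval flow can be tried with Python's standard `sqlite3` module. This is a sketch under assumptions, not the embodiment itself: SQLite stands in for SQL Server, the query uses bound parameters instead of string concatenation, the sample titles are invented for the demonstration, and the grouping rule assumes the absolute value of a signed 16-bit internal code modulo 31. The table and field names biao, shuming and wei follow the text.

```python
import sqlite3


def place_value(s: str, m: int = 31) -> int:
    """Place value of s under the assumed grouping rule."""
    w = 0
    for ch in s:
        code = ord(ch)
        if code >= 0x8000:  # assumption: signed 16-bit internal code
            code -= 0x10000
        w |= 1 << (abs(code) % m)
    return w


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE biao (shuming TEXT, wei INTEGER)")
titles = ["大漠孤烟直长河落日圆", "白云千载空悠悠", "日落长河"]
conn.executemany(
    "INSERT INTO biao (shuming, wei) VALUES (?, ?)",
    [(t, place_value(t)) for t in titles],  # marking step
)


def fuzzy_search(connection, text):
    """Bit screen on wei, then an ordinary LIKE match on shuming."""
    wt = place_value(text)
    sql = "SELECT shuming FROM biao WHERE (wei & ?) = ? AND shuming LIKE ?"
    return [row[0] for row in connection.execute(sql, (wt, wt, f"%{text}%"))]


results = fuzzy_search(conn, "长河落日")
wt = place_value("长河落日")
prelim = conn.execute(
    "SELECT COUNT(*) FROM biao WHERE (wei & ?) = ?", (wt, wt)
).fetchone()[0]

if __name__ == "__main__":
    print(prelim, results)
```

Here both conditions sit in one WHERE clause rather than in a derived table, so the order in which the database evaluates them is left to its query planner; `%` or `_` inside the search text would also act as LIKE wildcards in this sketch.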

Claims (10)

1. A fuzzy search technique for character strings, characterized in that: the base characters from which strings are composed are divided into m groups, and a datum W of m bits marks the base-character information of a string. If base character C1 of string S belongs to group n, the n-th bit of W counted from the right (or from the left) is set to 1; W is marked in the same way according to the groups of the other base characters C2, C3, C4, ... Once every base character has been marked, W records information about S and is called the "place value" of S. The "place value" Wn of a string Sn is compared with the "place value" Wt of the string T to be retrieved; if Wn equals or contains Wt, then Sn may equal or contain T, which realizes the fuzzy search of character strings.
2. The method according to claim 1, characterized in that: for marking, each group of base characters may first be assigned a "basic place value" in which the corresponding n-th bit is 1 and all other bits are 0; applying the bitwise OR to the "basic place values" of all base characters of a string gives the "place value" of that string.
3. The method according to claim 1, characterized in that: whether one "place value" contains another can be determined with the bitwise AND operation. The bitwise AND is applied to the "place value" Wn of string Sn and the "place value" Wt of string T, and the result is called Wg; if Wg equals Wt, then Wn equals or contains Wt.
4. The method according to claim 1, characterized in that: because different strings can have the same "place value", the "place value" operation is used to screen the database records into a preliminary result R1, and an ordinary character-by-character comparison then performs a secondary search to give the final retrieval result R2.
5. The method according to claim 1, characterized in that: for ordinary word, phrase and proper-noun databases, the 31 bits of a 32-bit long integer other than the sign bit can mark strings effectively. For databases with a larger average string length, the 63 bits of the bigint data type can be used for marking in SQL Server 2000, with the base characters correspondingly divided into 63 groups. Any data type in any database that supports the bitwise OR and AND operations can be used to mark strings, and a separately programmed, purpose-built data type without a sign bit is better still.
6. The method according to claims 1 and 5, characterized in that: the number of bits used for marking should take into account not only the average string length but also the number of records in the current database; a database with more records should be marked with more bits, with the base characters correspondingly divided into more groups.
7. The method according to claims 1, 5 and 6, characterized in that: besides software factors such as the programming language and the database, the choice of data type for marking should also take the word size of the CPU into account. For a 64-bit CPU, marking strings with 64 bits should be preferred, to make full use of the CPU's performance and to improve the dispersion of the "place values".
8. The method according to claim 1, characterized in that: for Chinese, the base character is normally a Chinese character and, for Chinese character retrieval, may instead be a basic component of a character; for the Chinese phonetic alphabet it may be a letter, an initial, a final, or a syllable; for other languages it may be a letter, a syllable, a word, and so on.
9. The method according to claim 1, characterized in that: when the base characters are divided into groups, the number of base characters in each group need not be equal; the sums of base-character frequencies of the groups, in the language concerned or in the current database, should be made as balanced as possible, and high-frequency base characters in particular should be distributed evenly across the groups, so that performance is optimal.
10. The method according to claim 1, characterized in that: for Chinese word databases in which the great majority of entries are two-character words, basic place values with two bits set to 1 can be used for marking, dividing the Chinese characters into 31!/(2!(31-2)!) groups, i.e. 465 groups, to improve performance.
CNA2005100233835A 2005-01-17 2005-01-17 Digit marking character string searching technology Pending CN1645374A (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CNA2005100233835A CN1645374A (en) 2005-01-17 2005-01-17 Digit marking character string searching technology
CN200510057491.4A CN101488127B (en) 2005-01-17 2005-09-13 Bit mark character string fuzzy retrieval method for grouping character and labellng with bit
PCT/CN2005/001642 WO2006074586A1 (en) 2005-01-17 2005-10-08 Retrieval technology of character string marked with bit

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNA2005100233835A CN1645374A (en) 2005-01-17 2005-01-17 Digit marking character string searching technology

Publications (1)

Publication Number Publication Date
CN1645374A true CN1645374A (en) 2005-07-27

Family

ID=34875846

Family Applications (2)

Application Number Title Priority Date Filing Date
CNA2005100233835A Pending CN1645374A (en) 2005-01-17 2005-01-17 Digit marking character string searching technology
CN200510057491.4A Active CN101488127B (en) 2005-01-17 2005-09-13 Bit mark character string fuzzy retrieval method for grouping character and labellng with bit

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN200510057491.4A Active CN101488127B (en) 2005-01-17 2005-09-13 Bit mark character string fuzzy retrieval method for grouping character and labellng with bit

Country Status (2)

Country Link
CN (2) CN1645374A (en)
WO (1) WO2006074586A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010088833A1 (en) * 2009-02-03 2010-08-12 华为技术有限公司 Character string processing method and system and matcher
CN101535993B (en) * 2006-10-30 2011-11-09 新叶股份有限公司 Bit sequence searching method and device

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102682033A (en) * 2011-03-17 2012-09-19 环达电脑(上海)有限公司 Method for querying words by matching binary characteristic values
CN103870537B (en) * 2013-12-03 2017-02-01 山东金质信息技术有限公司 Intelligent word segmentation method for standard retrieval
CN106933938A (en) * 2015-12-30 2017-07-07 唯溥思株式会社 The document retrieval method and literature index method encoded using multibyte

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2669601B2 (en) * 1994-11-22 1997-10-29 インターナショナル・ビジネス・マシーンズ・コーポレイション Information retrieval method and system
JP3636941B2 (en) * 1999-07-19 2005-04-06 松下電器産業株式会社 Information retrieval method and information retrieval apparatus
JP4298138B2 (en) * 2000-06-21 2009-07-15 株式会社日立製作所 Information retrieval method, apparatus for implementing the same, and recording medium recording the processing program
US6785677B1 (en) * 2001-05-02 2004-08-31 Unisys Corporation Method for execution of query to search strings of characters that match pattern with a target string utilizing bit vector
JP2003152548A (en) * 2001-11-14 2003-05-23 Canon Inc Retrieving method of character string in data compression

Also Published As

Publication number Publication date
CN101488127A (en) 2009-07-22
CN101488127B (en) 2015-01-07
WO2006074586A1 (en) 2006-07-20

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication