CN106598951A - Dependency structure treebank acquisition method and system - Google Patents
- Publication number
- CN106598951A (application CN201611208593.6A)
- Authority
- CN
- China
- Prior art keywords
- treebank
- phrase
- converted
- speech
- dependence
- Legal status
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
Abstract
The invention discloses a dependency structure treebank acquisition method and system. The method comprises: calling a first treebank and converting the phrase structures in the first treebank into dependency structures with the conversion tool for the first treebank; converting the phrase structures of flat structures in the first treebank into dependency structures with a parser; and performing dependency relation conversion on the dependency structures obtained from the first treebank with a trained dependency relation mapping model, to obtain a dependency structure treebank of a second treebank type. With this method and system, the converted treebank can be merged with an existing dependency structure treebank, which increases the treebank scale and improves parser performance.
Description
Technical field
The present invention relates to treebank conversion, and in particular to a dependency structure treebank acquisition method and system.
Background art
Syntactic analysis is a very important research direction in natural language processing. Statistical syntactic analysis methods can be divided into supervised and unsupervised methods according to the corpus they use. Supervised methods require sentences to be annotated manually in advance, according to a given grammar specification, as training data; the knowledge needed for syntactic analysis is then acquired from the training data by probabilistic or machine learning methods. Unsupervised methods are trained on unannotated data and learn grammar rules from it automatically according to some mechanism.
Supervised syntactic analysis is currently the mainstream approach and has reached high accuracy for languages such as English. In supervised syntactic analysis, the set of sentences annotated in advance for training is called a treebank. Most current statistical parsing models learn their parameters in a supervised manner from an annotated treebank. Treebank construction is therefore very important work: the quality and scale of a treebank directly determine how well a parser can be trained.
Syntactic analysis must first follow a particular grammar formalism, whose grammar determines the form of the syntax tree. The formalisms most widely used in syntactic analysis are phrase structure grammar and dependency grammar. For example, the phrase structure analysis of the sentence "This year, Siemens will strive to participate in China's Three Gorges Project construction." is shown in Fig. 1a; it resembles a structure split apart layer by layer.
The first level, "S", is the whole sentence "This year, Siemens will strive to participate in China's Three Gorges Project construction."
The second level is divided into four parts: the first part, "NP", is a noun phrase corresponding to "this year"; the second part, "NP", is a noun phrase corresponding to "Siemens"; the third part, "VP", is a verb phrase corresponding to "will strive to participate in China's Three Gorges Project construction"; and the fourth part, "PU", is a punctuation mark corresponding to "。".
The third level is divided into three parts: the first part, "ADVP", is an adverbial phrase corresponding to "will"; the second part, "ADVP", is an adverbial phrase corresponding to "strive"; and the third part, "VP", is a verb phrase corresponding to "participate in China's Three Gorges Project construction".
The fourth level is divided into two parts: the first part, "VV", is a verb corresponding to "participate"; and the second part, "NP", is a noun phrase corresponding to "China's Three Gorges Project construction".
The fifth level is divided into three parts: the first part, "DNP", is an attributive phrase corresponding to "China's"; the second part, "NP", is a noun phrase corresponding to "Three Gorges Project"; and the third part, "NP", is a noun phrase corresponding to "construction".
The sixth level is divided into four parts: the first part, "NP", is an attributive noun phrase corresponding to "China"; the second part, "DEG", is an auxiliary word corresponding to "的"; the third part, "NP", is an attributive noun phrase corresponding to "Three Gorges"; and the fourth part, "NP", is an attributive noun phrase corresponding to "Project".
The result of analysing the same sentence as a dependency structure is shown in Fig. 1b. A dependency structure marks the relation between words with directed arcs, and it is more intuitive than a phrase structure analysis. In "This year, Siemens will strive to participate in China's Three Gorges Project construction.", the core node "VG" corresponds to "participate"; "this year", "will" and "strive" are all in the "ADV" (adverbial) relation with "participate"; "Siemens" and "participate" are in the "SBV" (subject-verb) relation; "China" and "的" are in the "ATT" (attributive) relation; "Three Gorges" and "Project" are in the "ATT" relation; and "Project" and "construction" are in the "ATT" relation. The "EOS" after "。" is an empty node that marks the end of the sentence.
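To make the arc description above concrete, here is a minimal Python sketch that stores the arcs stated for Fig. 1b as a map from each dependent to its (head, relation) pair; because every word in a dependency tree has exactly one head, a dictionary keyed by dependent is enough. English glosses replace the Chinese words (`de` stands for 的). Treating "VG" as the label of a root node with head `None` is an assumption, and the heads of "construction" and of 的 are left out because the text does not state them.

```python
# Arcs stated in the text for Fig. 1b, stored as dependent -> (head, relation).
# English glosses stand in for the Chinese words; "de" is the particle 的.
# The heads of "construction" and "de" are not given in the text and are omitted.
ARCS = {
    "participate":  (None, "VG"),              # core node; None marks the root (assumption)
    "this_year":    ("participate", "ADV"),
    "will":         ("participate", "ADV"),
    "strive":       ("participate", "ADV"),
    "Siemens":      ("participate", "SBV"),
    "China":        ("de", "ATT"),
    "Three_Gorges": ("project", "ATT"),
    "project":      ("construction", "ATT"),
}

def dependents(head, relation=None):
    """Sorted list of words whose head is `head`, optionally filtered by relation label."""
    return sorted(d for d, (h, r) in ARCS.items()
                  if h == head and (relation is None or r == relation))

print(dependents("participate", "ADV"))   # ['strive', 'this_year', 'will']
```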
How to convert the phrase structure analysis result shown in Fig. 1a into the dependency structure shown in Fig. 1b is a technical problem that this field needs to solve.
The development of English syntactic analysis has benefited from the construction of the Penn Treebank, which is large and carefully annotated; it has become the de facto standard for English syntactic analysis, and almost all research work is based on it. The conversion of the Penn Treebank into dependency structures is also mature. Chinese treebank construction, by contrast, still lags behind: there is neither a unified dependency annotation scheme nor a large-scale dependency treebank. The best-known existing Chinese phrase structure treebanks include PCT (the Penn Chinese Treebank of the University of Pennsylvania) and TCT (the Tsinghua Chinese Treebank of Tsinghua University). Chinese dependency treebanks are fewer; well-known ones include HIT-IR-CDT (the Harbin Institute of Technology Chinese Dependency Treebank) and the SDN treebank annotated by the Department of Electronics of Tsinghua University. HIT-IR-CDT is a Chinese dependency treebank annotated by the Information Retrieval Laboratory of Harbin Institute of Technology.
The technology for converting the Penn Treebank into dependency structures is very mature. Compared with English, the work of converting Chinese phrase structure treebanks into dependency structures is still immature. The existing Penn2Malt conversion tool provides a rule file for converting the Penn Chinese Treebank into dependency structures, with which the Penn Chinese Treebank can be converted. However, the rules in the Chinese conversion rule file provided by Penn2Malt cannot accurately describe many linguistic phenomena, and they cannot handle the coordinate structures and flat structures in the Penn Chinese Treebank.
Existing conversion of TCT into dependency structures relies entirely on rules. This requires thorough familiarity with the grammar system of TCT: each rewriting form must be converted by hand-written rules, including specifying the core node and the relation type. Converting TCT into dependency structures in this way lacks generality and requires considerable manual effort. Moreover, its dependency scheme focuses mainly on describing the various relations associated with verbs.
All of the above work converts a phrase structure treebank into some dependency treebank, but the dependency scheme of the converted treebank is inconsistent with every existing dependency treebank. This hinders effective use of the converted treebank, which can only be used as a separate, independent treebank.
The scale and quality of a treebank directly affect parsing performance: the larger and better the treebank, the better the trained parser generally performs. Therefore, how to convert a Chinese phrase structure treebank into a dependency structure treebank, so as to take full advantage of the large scale and high quality of both Chinese phrase structure treebanks and dependency structure treebanks, is a technical problem that those skilled in the art urgently need to solve.
Summary of the invention
To solve the problem that the dependency schemes of existing converted treebanks are inconsistent, the present invention provides a dependency structure treebank acquisition method and system that convert a phrase structure treebank into a dependency structure treebank. The converted treebank can easily be merged with an original dependency structure treebank, which increases the treebank scale and thereby effectively improves parser performance.
To solve the above problem, the present invention provides a dependency structure treebank acquisition method, comprising the following steps:
calling a first treebank, the first treebank being a Chinese phrase structure treebank;
converting the phrase structures in the first treebank into dependency structures using, respectively, the conversion tool for the first treebank and a parser, a second treebank being a treebank of dependency structures;
wherein converting the phrase structures in the first treebank into dependency structures with the conversion tool for the first treebank comprises: converting the phrase structures into dependency structures using the rules, provided by the conversion tool, for converting phrase structures in the first treebank into dependency structures, or rules obtained by modifying those rules; and converting the phrase structures of coordinate structures in the first treebank into dependency structures by a rule-based induction method;
wherein converting the phrase structures in the first treebank into dependency structures with the parser comprises: converting, with the parser, the phrase structures of flat structures in the first treebank into dependency structures;
performing dependency relation conversion on the dependency structures in the first treebank using a dependency relation mapping model obtained by training, to obtain a dependency structure treebank of the second treebank type.
Optionally, converting the phrase structures into dependency structures using the rules provided by the conversion tool for converting phrase structures in the first treebank into dependency structures, or the rules obtained by modifying those rules, comprises:
determining, according to a pre-built Head (core node) mapping table, the core node of each grammar derivation in the phrase structure treebank of the first treebank;
scanning relative to the core node, using the mapping table and according to its rules, to obtain the dependency relations between the other child nodes and the core node;
wherein the Head mapping table is formed from the rules, provided by the conversion tool, for converting phrase structures in the first treebank into dependency structures, or from the rules obtained by modifying those rules.
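The head-table procedure above can be sketched as follows. This is a minimal Python illustration, not the patent's implementation: the `HEAD_TABLE` entries are hypothetical stand-ins for the Penn2Malt-derived Head mapping table, each entry giving a scan direction and a priority list of child labels, and every non-head child is made to depend on the lexical head of the head child. The tree is the Fig. 1a sentence with English glosses and without its final punctuation.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Node:
    label: str                        # phrase label (e.g. "VP") or POS tag of a leaf
    word: Optional[str] = None        # set only on leaves
    children: List["Node"] = field(default_factory=list)

def leaf(tag: str, word: str) -> Node:
    return Node(tag, word)

def phrase(label: str, *children: Node) -> Node:
    return Node(label, None, list(children))

# Hypothetical head table: label -> (scan direction, head labels in priority order).
HEAD_TABLE = {
    "S":    ("right", ["VP", "S"]),
    "VP":   ("left",  ["VV", "VP"]),
    "NP":   ("right", ["NN", "NR", "NT", "NP"]),
    "DNP":  ("right", ["DEG"]),
    "ADVP": ("right", ["AD"]),
}

def head_child(node: Node) -> int:
    """Index of the core (head) child of `node` according to HEAD_TABLE."""
    direction, priorities = HEAD_TABLE.get(node.label, ("left", []))
    order = list(range(len(node.children)))
    if direction == "right":
        order.reverse()
    for label in priorities:          # try head labels in priority order,
        for i in order:               # scanning children in the table's direction
            if node.children[i].label == label:
                return i
    return order[0]                   # fallback: first child in scan direction

def to_dependencies(node: Node, arcs: List[Tuple[str, str]]) -> Node:
    """Return the lexical head (a leaf) of `node`, appending (dependent, head) arcs."""
    if node.word is not None:
        return node
    lexical_heads = [to_dependencies(child, arcs) for child in node.children]
    h = head_child(node)
    for i, lex in enumerate(lexical_heads):
        if i != h:                    # every non-head child depends on the head child
            arcs.append((lex.word, lexical_heads[h].word))
    return lexical_heads[h]

tree = phrase("S",
    phrase("NP", leaf("NT", "this_year")),
    phrase("NP", leaf("NR", "Siemens")),
    phrase("VP",
        phrase("ADVP", leaf("AD", "will")),
        phrase("ADVP", leaf("AD", "strive")),
        phrase("VP",
            leaf("VV", "participate"),
            phrase("NP",
                phrase("DNP", phrase("NP", leaf("NR", "China")), leaf("DEG", "de")),
                phrase("NP", phrase("NP", leaf("NR", "Three_Gorges")),
                             phrase("NP", leaf("NN", "project"))),
                phrase("NP", leaf("NN", "construction"))))))

arcs: List[Tuple[str, str]] = []
root = to_dependencies(tree, arcs)
```

With these assumed rules the sketch yields nine unlabelled arcs rooted at `participate`, consistent with the arcs the text lists for Fig. 1b; assigning relation labels such as ATT or ADV is left to the later relation mapping step.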
Optionally, converting, with the parser, the phrase structures of flat structures in the first treebank into dependency structures specifically comprises:
using the parser to find a maximum spanning tree in a directed graph for the phrase structure of a flat structure in the first treebank, thereby determining the dependency probabilities between the different phrases in that phrase structure;
converting the phrase structure of the flat structure in the first treebank into a dependency structure according to the dependency probabilities between the different phrases.
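One way to read the maximum-spanning-tree step: treat the units of a flat phrase as nodes of a directed graph with an artificial root, weight each arc by the log of the parser's dependency probability, and keep the highest-scoring tree. The sketch below assumes those probabilities are already available (the values are made up) and finds the tree by exhaustive enumeration, which gives the same optimum as the Chu-Liu/Edmonds algorithm but is only practical for the handful of words in a flat phrase.

```python
import itertools
import math
from typing import Dict, Optional, Tuple

def _reaches_root(d: int, heads: Dict[int, int]) -> bool:
    """True if following head links from word d reaches the root (0) without a cycle."""
    seen = set()
    while d != 0:
        if d in seen:
            return False
        seen.add(d)
        d = heads[d]
    return True

def max_spanning_tree(n: int, prob: Dict[Tuple[int, int], float]):
    """Highest-scoring dependency tree over words 1..n plus an artificial root 0.

    prob[(h, d)] is the probability that word d depends on word h; a tree's score
    is the sum of the log-probabilities of its arcs. All head assignments are
    enumerated, so this is only practical for short flat phrases.
    """
    words = list(range(1, n + 1))
    best: Optional[Dict[int, int]] = None
    best_score = -math.inf
    for assignment in itertools.product(range(n + 1), repeat=n):
        heads = dict(zip(words, assignment))
        if sum(1 for d in words if heads[d] == 0) != 1:
            continue                  # exactly one word attaches to the root
        if not all(_reaches_root(d, heads) for d in words):
            continue                  # reject cycles, including self-loops
        score = sum(math.log(prob.get((heads[d], d), 1e-9)) for d in words)
        if score > best_score:
            best, best_score = heads, score
    return best, best_score

# Flat NP "Three_Gorges(1) project(2) construction(3)" with made-up arc probabilities.
PROB = {(0, 3): 0.9, (0, 2): 0.05, (0, 1): 0.05,
        (3, 2): 0.8, (2, 1): 0.7, (3, 1): 0.2,
        (1, 2): 0.1, (2, 3): 0.05, (1, 3): 0.05}
heads, score = max_spanning_tree(3, PROB)
print(heads)   # {1: 2, 2: 3, 3: 0}
```

Here `construction` heads the phrase, `project` depends on it and `Three_Gorges` depends on `project`, a right-headed chain typical of Chinese compound nouns.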
Optionally, the parser is trained on the phrases in the second treebank.
Optionally, the method further comprises: obtaining the accuracy with which the phrase structures of flat structures are converted into dependency structures, and adjusting the training of the parser according to that accuracy.
Optionally, Internet resources are used to search for and count the probability of occurrence of the converted dependency structures, and the conversion accuracy is determined from that probability.
Optionally, converting the phrase structures of coordinate (parallel) structures in the first treebank into dependency structures by the rule-based induction method specifically comprises:
splitting the phrase structure of a coordinate structure into multiple fragments;
determining the core node of each fragment, and making every other node in a fragment depend on the core node of that fragment;
making the core node of every fragment other than the first depend on the core node of the first fragment.
Optionally, splitting the phrase structure of the coordinate structure into multiple fragments specifically comprises:
performing the split using conjunction parts of speech or the pause mark (、) as the splitting criterion.
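A minimal sketch of the coordinate-structure rule, assuming a flat list of (word, POS) tokens with Penn Chinese Treebank tags: it splits on the conjunction tag `CC` or the pause mark "、", takes the last word of each fragment as its core (a head-final assumption; any `pick_core` function can be supplied), attaches the other words of each fragment to that core, and attaches the later cores to the first core. The text does not say where the separators themselves attach; attaching each one to the core of the following fragment is an assumption.

```python
from typing import Callable, Dict, List, Tuple

def convert_coordination(tokens: List[Tuple[str, str]],
                         pick_core: Callable[[List[int]], int] = lambda frag: frag[-1]
                         ) -> Tuple[Dict[int, int], int]:
    """tokens: (word, POS) pairs of one coordinate structure.

    Returns (heads, core): heads maps a token index to the index of its head;
    core is the first fragment's core node, which heads the whole structure
    and is attached outside it.
    """
    fragments: List[List[int]] = []
    separators: List[int] = []
    current: List[int] = []
    for i, (word, pos) in enumerate(tokens):
        if pos == "CC" or word == "、":      # split on a conjunction or the pause mark
            separators.append(i)
            if current:
                fragments.append(current)
            current = []
        else:
            current.append(i)
    if current:
        fragments.append(current)

    cores = [pick_core(frag) for frag in fragments]
    heads: Dict[int, int] = {}
    for frag, core in zip(fragments, cores):
        for i in frag:
            if i != core:
                heads[i] = core              # other nodes depend on their fragment's core
    for core in cores[1:]:
        heads[core] = cores[0]               # later cores depend on the first fragment's core
    for s in separators:                     # assumption: separator -> core of next fragment
        heads[s] = next((c for f, c in zip(fragments, cores) if f[0] > s), cores[-1])
    return heads, cores[0]

tokens = [("economic", "NN"), ("development", "NN"), ("、", "PU"),
          ("social", "NN"), ("progress", "NN")]
heads, core = convert_coordination(tokens)
```

For "economic development、social progress" the sketch returns core 1 (`development`), with `progress` depending on `development` and the pause mark attached to `progress`.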
Optionally, splitting the phrase structure of the coordinate structure into multiple fragments specifically comprises:
obtaining input-method input information, and performing the split using the input pauses recorded in that information as split points.
Optionally, splitting the phrase structure of the coordinate structure into multiple fragments specifically comprises:
when different phrases in the phrase structure of the coordinate structure have an association relation, performing the split using that association relation as the splitting criterion.
Optionally, determining the core node of each fragment comprises: taking the sentence containing the phrase structure as the object of analysis, determining how many times each node of the fragment occurs in the context of the sentence, and, by comparing the occurrence counts of the different nodes, selecting the node whose occurrence count meets the requirement as the core node.
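The occurrence-count criterion is stated loosely ("meets the requirement"). The sketch below reads it, as an assumption, as "the fragment word that occurs most often in the sentence", breaking ties in favour of the rightmost word of the fragment.

```python
from collections import Counter
from typing import List

def core_by_frequency(fragment: List[str], sentence: List[str]) -> str:
    """Pick the fragment word that occurs most often in the sentence.

    Assumption: "meets the requirement" is read as "has the highest count";
    ties go to the rightmost word of the fragment.
    """
    counts = Counter(sentence)
    best = max(range(len(fragment)), key=lambda i: (counts[fragment[i]], i))
    return fragment[best]

sentence = ["project", "construction", "and", "project", "management"]
print(core_by_frequency(["project", "construction"], sentence))   # project
```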
Optionally, establishing the dependency relation mapping model comprises:
training a dependency relation labeling model on the second treebank;
labeling the first treebank with dependency relations using the dependency relation labeling model;
correcting the labeling results using the original part-of-speech and syntactic information of the first treebank, thereby establishing the dependency relation mapping model.
Optionally, the dependency relation labeling model performs dependency relation labeling with a second log-linear model, wherein:
i = 0 corresponds to (word, word_f), the feature formed by the word and its parent word;
i = 1 corresponds to (word, pos_f), the feature formed by the word and the part of speech of its parent node;
i = 2 corresponds to (pos, word_f), the feature formed by the word's part of speech and its parent word;
i = 3 corresponds to (pos, pos_f, distance), the feature formed by the word's part of speech, the part of speech of its parent node and the distance between them;
λ0: weight of the (word, word_f) feature (i = 0);
λ1: weight of the (word, pos_f) feature (i = 1);
λ2: weight of the (pos, word_f) feature (i = 2);
λ3: weight of the (pos, pos_f, distance) feature (i = 3).
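The formula of the second log-linear model does not survive in this text. As a hedged sketch only, the code below assumes the common form p(r | x) ∝ exp(Σ_i λ_i · log P_i(r | c_i)), where c_i is the context picked out by feature template i (the four templates listed above) and P_i is a relative frequency, with add-alpha smoothing, counted from labelled tokens of the second treebank. The relation labels, training tokens and λ values are all hypothetical.

```python
import math
from collections import Counter, defaultdict

RELATIONS = ["SBV", "ADV", "ATT", "VOB"]    # hypothetical label subset

def contexts(tok):
    """Contexts for the four templates: i=0 (word, word_f), i=1 (word, pos_f),
    i=2 (pos, word_f), i=3 (pos, pos_f, distance)."""
    return [
        (0, tok["word"], tok["word_f"]),
        (1, tok["word"], tok["pos_f"]),
        (2, tok["pos"], tok["word_f"]),
        (3, tok["pos"], tok["pos_f"], tok["distance"]),
    ]

def train(examples):
    """Count relation labels per context over labelled tokens."""
    counts = defaultdict(Counter)
    for tok, rel in examples:
        for ctx in contexts(tok):
            counts[ctx][rel] += 1
    return counts

def predict(tok, counts, lambdas, alpha=0.1):
    """Return p(r | tok) for every relation r under the assumed log-linear form."""
    scores = {}
    for r in RELATIONS:
        s = 0.0
        for lam, ctx in zip(lambdas, contexts(tok)):
            c = counts.get(ctx, Counter())
            p_i = (c[r] + alpha) / (sum(c.values()) + alpha * len(RELATIONS))  # smoothed P_i(r | c_i)
            s += lam * math.log(p_i)
        scores[r] = s
    z = sum(math.exp(v) for v in scores.values())
    return {r: math.exp(v) / z for r, v in scores.items()}

# Hypothetical training tokens; distance = head position - dependent position.
EXAMPLES = [
    ({"word": "Siemens", "pos": "NR", "word_f": "participate", "pos_f": "VV", "distance": 3}, "SBV"),
    ({"word": "will", "pos": "AD", "word_f": "participate", "pos_f": "VV", "distance": 2}, "ADV"),
    ({"word": "project", "pos": "NN", "word_f": "construction", "pos_f": "NN", "distance": 1}, "ATT"),
]
LAMBDAS = [1.0, 0.5, 0.5, 0.8]              # hypothetical λ0..λ3

counts = train(EXAMPLES)
probs = predict({"word": "Bosch", "pos": "NR", "word_f": "participate",
                 "pos_f": "VV", "distance": 3}, counts, LAMBDAS)
```

Because the unseen word `Bosch` matches no i = 0 or i = 1 context, those templates contribute a uniform distribution, and the part-of-speech templates (i = 2, 3) still favour SBV.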
Optionally, the dependency relation mapping model performs dependency relation labeling with a third log-linear model, wherein:
i = 0 corresponds to phrase, the type of the phrase itself;
i = 1 corresponds to phrase_s, the type of the child phrase generated by the phrase;
i = 2 corresponds to phrase_f, the type of the parent phrase;
λ0: weight of the phrase feature (i = 0);
λ1: weight of the phrase_s feature (i = 1);
λ2: weight of the phrase_f feature (i = 2).
Optionally, the method further comprises:
converting the part-of-speech tag set of the first treebank into a tag set that meets the requirements of the Chinese standard part-of-speech tag set.
Optionally, the Chinese standard part-of-speech tag set is the 863 part-of-speech tag set.
Optionally, converting the part-of-speech tag set of the first treebank into a tag set that meets the requirements of the Chinese standard part-of-speech tag set comprises:
tagging the words of the first treebank with parts of speech using the second treebank, and dividing the parts of speech with a pre-built part-of-speech mapping model to correct the assigned tags.
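To illustrate what the tag conversion involves, here is a small Python sketch of a mapping from Penn Chinese Treebank tags to 863 tags. The table is illustrative and partial, not taken from the patent: many tags map one-to-one, while a tag such as `NR` (proper noun) splits into several 863 tags (`nh` person name, `ns` place name, `ni` organization name, `nz` other proper noun) and therefore needs a context-dependent decision, which is the role the text gives to the part-of-speech mapping model. The `choose` callback is a hypothetical hook for such a model.

```python
from typing import Callable, List, Optional

# Illustrative, partial mapping (not from the patent) for Penn Chinese Treebank
# tags that have a single obvious 863 counterpart.
DIRECT = {
    "NN": "n", "NT": "nt", "PN": "r", "LC": "nd",
    "VV": "v", "VC": "v", "VE": "v", "VA": "a", "AD": "d",
    "P": "p", "CC": "c", "CD": "m", "OD": "m", "M": "q",
    "DEG": "u", "DEC": "u", "AS": "u", "PU": "wp",
}

# Tags whose 863 counterpart depends on context.
AMBIGUOUS = {"NR": ["nh", "ns", "ni", "nz"]}

def map_tag(ctb_tag: str,
            choose: Optional[Callable[[List[str]], str]] = None) -> str:
    """Map one Penn Chinese Treebank tag to an 863 tag."""
    if ctb_tag in DIRECT:
        return DIRECT[ctb_tag]
    if ctb_tag in AMBIGUOUS and choose is not None:
        return choose(AMBIGUOUS[ctb_tag])    # e.g. decided by a trained mapping model
    raise KeyError(f"no mapping for {ctb_tag!r}")

print(map_tag("VV"))                          # v
print(map_tag("NR", choose=lambda c: "ns"))   # ns
```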
Optionally, the part-of-speech mapping model performs part-of-speech conversion with a first log-linear model, wherein:
i = 0 corresponds to pos, the node's own part of speech;
i = 1 corresponds to (pos_s, pos), the part of speech of a child node together with the node's own part of speech;
i = 2 corresponds to (pos, pos_f), the node's own part of speech together with the part of speech of its parent node;
λ0: weight of the pos feature (i = 0);
λ1: weight of the (pos_s, pos) feature (i = 1);
λ2: weight of the (pos, pos_f) feature (i = 2).
Optionally, the first treebank is the Penn Chinese Treebank of the University of Pennsylvania, and the second treebank is HIT-IR-CDT, the Chinese dependency treebank of Harbin Institute of Technology.
The present invention also provides a dependency structure treebank acquisition system, comprising a calling unit and a converting unit:
the calling unit is configured to call a first treebank, the first treebank being a Chinese phrase structure treebank;
the converting unit is configured to convert the phrase structures in the first treebank into dependency structures using, respectively, the conversion tool for the first treebank and a parser, a second treebank being a treebank of dependency structures;
wherein converting the phrase structures in the first treebank into dependency structures with the conversion tool for the first treebank comprises: converting the phrase structures into dependency structures using the rules, provided by the conversion tool, for converting phrase structures in the first treebank into dependency structures, or rules obtained by modifying those rules; and converting the phrase structures of coordinate structures in the first treebank into dependency structures by a rule-based induction method;
wherein converting the phrase structures in the first treebank into dependency structures with the parser comprises: converting, with the parser, the phrase structures of flat structures in the first treebank into dependency structures;
the converting unit is further configured to perform dependency relation conversion on the dependency structures in the first treebank using a dependency relation mapping model obtained by training, to obtain a dependency structure treebank of the second treebank type.
Optionally, the converting unit specifically comprises a determining subunit and a scanning subunit:
the determining subunit is configured to determine, according to a pre-built Head (core node) mapping table, the core node of each grammar derivation in the phrase structure treebank of the first treebank;
the scanning subunit is configured to scan relative to the core node, using the mapping table and according to its rules, to obtain the dependency relations between the other child nodes and the core node;
wherein the Head mapping table is formed from the rules, provided by the conversion tool, for converting phrase structures in the first treebank into dependency structures, or from the rules obtained by modifying those rules.
Optionally, the converting unit is specifically configured to use the parser to find a maximum spanning tree in a directed graph for the phrase structure of a flat structure in the first treebank, determine the dependency probabilities between the different phrases in that phrase structure, and convert the phrase structure of the flat structure into a dependency structure according to those probabilities.
Optionally, the system further comprises a parser training unit configured to train the parser on the phrases in the second treebank.
Optionally, the system further comprises an adjusting unit configured to obtain the accuracy with which the phrase structures of flat structures are converted into dependency structures, and to adjust the training of the parser according to that accuracy.
Optionally, the adjusting unit is specifically configured to use Internet resources to search for and count the probability of occurrence of the converted dependency structures, and to determine the conversion accuracy from that probability.
Optionally, the converting unit specifically comprises a splitting subunit and a dependency determining subunit:
the splitting subunit is configured to split the phrase structure of a coordinate structure into multiple fragments;
the dependency determining subunit is configured to determine the core node of each fragment, and to make the other nodes in each fragment depend on the core node of that fragment;
the dependency determining subunit is further configured to make the core node of every fragment other than the first depend on the core node of the first fragment.
Optionally, the splitting subunit is configured to split the phrase structure of the coordinate structure using conjunction parts of speech or the pause mark (、) as the splitting criterion.
Optionally, the splitting subunit is configured to obtain input-method input information and to perform the split using the input pauses recorded in that information as split points.
Optionally, the splitting subunit is configured to, when different phrases in the phrase structure of the coordinate structure have an association relation, perform the split using that association relation as the splitting criterion.
Optionally, the dependency determining subunit is configured to take the sentence containing the phrase structure as the object of analysis, determine how many times each node of the fragment occurs in the context of the sentence, and, by comparing the occurrence counts of the different nodes, select the node whose occurrence count meets the requirement as the core node.
Optionally, for establishing the dependency relation mapping model, the system further comprises a training unit, a labeling unit and a correcting unit:
the training unit is configured to train a dependency relation labeling model on the second treebank;
the labeling unit is configured to label the first treebank with dependency relations using the dependency relation labeling model;
the correcting unit is configured to correct the labeling results using the original part-of-speech and syntactic information of the first treebank, thereby establishing the dependency relation mapping model.
Optionally, the dependency relation labeling model performs dependency relation labeling with a second log-linear model, wherein:
i = 0 corresponds to (word, word_f), the feature formed by the word and its parent word;
i = 1 corresponds to (word, pos_f), the feature formed by the word and the part of speech of its parent node;
i = 2 corresponds to (pos, word_f), the feature formed by the word's part of speech and its parent word;
i = 3 corresponds to (pos, pos_f, distance), the feature formed by the word's part of speech, the part of speech of its parent node and the distance between them;
λ0: weight of the (word, word_f) feature (i = 0);
λ1: weight of the (word, pos_f) feature (i = 1);
λ2: weight of the (pos, word_f) feature (i = 2);
λ3: weight of the (pos, pos_f, distance) feature (i = 3).
Optionally, the dependency relation mapping model performs dependency relation labeling with a third log-linear model, wherein:
i = 0 corresponds to phrase, the type of the phrase itself;
i = 1 corresponds to phrase_s, the type of the child phrase generated by the phrase;
i = 2 corresponds to phrase_f, the type of the parent phrase;
λ0: weight of the phrase feature (i = 0);
λ1: weight of the phrase_s feature (i = 1);
λ2: weight of the phrase_f feature (i = 2).
Optionally, the system further comprises a tag conversion unit:
the tag conversion unit is configured to convert the part-of-speech tag set of the first treebank into a tag set that meets the requirements of the Chinese standard part-of-speech tag set.
Optionally, the Chinese standard part-of-speech tag set is the 863 part-of-speech tag set.
Optionally, the tag conversion unit is specifically configured to tag the words of the first treebank with parts of speech using the second treebank, and to divide the parts of speech with a pre-built part-of-speech mapping model to correct the assigned tags.
Optionally, the part-of-speech mapping model performs part-of-speech conversion with a first log-linear model, wherein:
i = 0 corresponds to pos, the node's own part of speech;
i = 1 corresponds to (pos_s, pos), the part of speech of a child node together with the node's own part of speech;
i = 2 corresponds to (pos, pos_f), the node's own part of speech together with the part of speech of its parent node;
λ0: weight of the pos feature (i = 0);
λ1: weight of the (pos_s, pos) feature (i = 1);
λ2: weight of the (pos, pos_f) feature (i = 2).
Optionally, the first treebank is the Penn Chinese Treebank of the University of Pennsylvania, and the second treebank is HIT-IR-CDT, the Chinese dependency treebank of Harbin Institute of Technology.
Compared with the prior art described above, the dependency structure treebank acquisition method of the embodiments of the present invention includes a step of converting a first treebank, such as a Chinese phrase structure treebank, into a dependency structure treebank of the second treebank type. Because the method converts a Chinese phrase structure treebank into a dependency structure treebank, the converted treebank can easily be merged with the original dependency structure treebank, which increases the treebank scale and thereby effectively improves parser performance.
In addition, the method of the embodiments of the present invention includes a step of using the parser to convert the phrase structures of flat structures in the first treebank into dependency structures, which solves the difficulty of converting the phrase structures of flat structures, such as compound noun phrases, into dependency structures.
Description of the drawings
Fig. 1a shows a prior-art phrase structure analysis result;
Fig. 1b shows a prior-art dependency structure analysis result;
Fig. 2 is a flowchart of the first embodiment of the dependency structure treebank acquisition method of the present invention;
Fig. 3 is a flowchart of establishing the dependency relation mapping model of the present invention;
Fig. 4a is a schematic diagram of a flat phrase structure of the present invention;
Fig. 4b is a schematic diagram of converting the flat phrase structure of Fig. 4a into a dependency structure;
Fig. 5 is a flowchart of the method of converting the phrase structure of a coordinate structure into a dependency structure according to the present invention;
Fig. 6 is a schematic diagram of converting the phrase structure of a coordinate structure into a dependency structure according to the present invention;
Fig. 7 is a flowchart of the second embodiment of the dependency structure treebank acquisition method of the present invention;
Fig. 8 is a schematic diagram of dependency relations of the present invention;
Fig. 9 is a structural diagram of the first embodiment of the dependency structure treebank acquisition system of the present invention;
Fig. 10 is a structural diagram of the second embodiment of the dependency structure treebank acquisition system of the present invention.
Specific embodiments
The present invention provides a dependency structure treebank acquisition method that converts a first treebank, such as a Chinese phrase structure treebank, into a dependency structure treebank of the second treebank type. The converted dependency structure treebank can easily be merged with the original dependency structure treebank, which increases the treebank scale and thereby effectively improves parser performance.
Refer to Fig. 2 and Fig. 3. Fig. 2 is a flowchart of the first embodiment of the dependency structure treebank acquisition method of the present invention; Fig. 3 is a flowchart of establishing the dependency relation mapping model of the present invention.
As shown in Fig. 2, the dependency structure treebank acquisition method of the first embodiment of the present invention comprises the following steps:
S201: Call the first treebank.
The first treebank may be a Chinese phrase structure treebank, for example the Penn Chinese Treebank or TCT.
S202: Convert the phrase structures in the first treebank into dependency structures using, respectively, the conversion tool for the first treebank and the parser.
The second treebank may be a dependency structure treebank, for example HIT-IR-CDT or SDN.
In embodiments of the present invention, the first treebank may be the Penn Chinese Treebank and the second treebank may be HIT-IR-CDT.
Converting the phrase structures in the first treebank into dependency structures with the conversion tool for the first treebank comprises: converting the phrase structures into dependency structures using the rules, provided by the conversion tool, for converting phrase structures in the first treebank into dependency structures, or rules obtained by modifying those rules; and converting the phrase structures of coordinate structures in the first treebank into dependency structures by a rule-based induction method.
The specific operation of converting the phrase structures in the first treebank into dependency structures with the conversion tool of the first treebank is described next. Specifically, converting the phrase structures into dependency structures using the rules provided by the conversion tool, or the rules obtained by modifying them, comprises:
determining, according to a pre-built Head (core node) mapping table, the core node of each grammar production in the phrase structure treebank of the first treebank;
scanning for the core node using the mapping table and according to its rules, to obtain the dependency relations between the other child nodes and the core node;
where the Head mapping table is formed from the phrase-to-dependency conversion rules provided by the conversion tool, or from rules obtained by modifying those rules.
For convenience, the following description takes the Penn Chinese Treebank as the first treebank and HIT-IR-CDT as the second treebank.
By examining all grammar productions in the Penn Chinese Treebank, the rule file provided by Penn2Malt is corrected to form the Head mapping table; coordinate and similar structures are then handled separately, and the Penn Chinese Treebank phrase structures are finally converted into dependency structures that conform to the HIT-IR-CDT scheme.
The phrase structures of the Penn Chinese Treebank are converted into dependency structures using the Head mapping table.
Table 1: Head mapping table
The Head mapping table is used to determine the core (head) node of a grammar production, i.e., which node in the child sequence is the head of the parent node. Each phrase type in the table corresponds to a rule set, and these rules are applied to convert the phrase structures of the Penn Chinese Treebank. Each rule has two parts: a direction and a list of candidate head phrase types. The direction is r or l: r means the child sequence is scanned from right to left, and l means it is scanned from left to right.
For example, the Penn Chinese Treebank contains the phrase-structure production NP ==> ADJP DNP NN NN, where "==>" indicates the direction of derivation: the NP on the left is the parent node and ADJP DNP NN NN is the child sequence.
To distinguish the two NNs, they are numbered, and the production is written NP ==> ADJP DNP NN(1) NN(2). The rule set corresponding to NP is then looked up in the Head mapping table of Table 1.
Rule 1 is examined first; its direction is r.
The candidate head sequence of rule 1 is tried in order. The first candidate, NP, does not occur in the child sequence "ADJP DNP NN(1) NN(2)". Scanning continues with the second candidate, NN, which does occur in the child sequence; because the child sequence is scanned from right to left, NN(2) is found first, so NN(2) is determined to be the core node and scanning stops. The other child nodes, "ADJP DNP NN(1)", all depend on the core node NN(2).
The last rule is the default rule, which is used when none of the preceding rules matches. If the direction of this last rule is r, the rightmost child node is taken as the core node; if it is l, the leftmost child node is taken as the core node.
In this way, the dependency relations of the Penn Chinese Treebank phrase structures can be determined from the Head mapping table of Table 1.
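Once a head child is known for every production, the dependencies of a whole tree follow by passing lexical heads upward: the lexical head of each non-head child depends on the lexical head of the head child. The sketch below assumes a simple hypothetical `Node` type and takes the head finder as a parameter; a trivial rightmost-child rule stands in for a Head-table lookup. For each arc it also records the child phrase type, the generating phrase type and the head child's phrase type, the three pieces of information the method records for the dependency relation mapping model (e.g. "NN-NP-NN").

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass
class Node:
    label: str                       # phrase type, or POS tag on a leaf
    word: Optional[str] = None       # only leaves carry a word
    children: List["Node"] = field(default_factory=list)


# (dependent word, head word, "child-parent-headchild" record)
Arc = Tuple[Optional[str], Optional[str], str]
HeadFinder = Callable[[str, Sequence[str]], int]


def to_dependencies(node: Node, find_head: HeadFinder, arcs: List[Arc]) -> Node:
    """Return the lexical head leaf of `node`; append one arc per non-head child."""
    if not node.children:
        return node
    lexical_heads = [to_dependencies(child, find_head, arcs) for child in node.children]
    h = find_head(node.label, [child.label for child in node.children])
    head_child = node.children[h]
    for i, child in enumerate(node.children):
        if i != h:
            record = f"{child.label}-{node.label}-{head_child.label}"
            arcs.append((lexical_heads[i].word, lexical_heads[h].word, record))
    return lexical_heads[h]


def rightmost(parent: str, children: Sequence[str]) -> int:
    """Stand-in head rule for this example; a Head-table lookup would go here."""
    return len(children) - 1


tree = Node("NP", children=[Node("NN", "medical"), Node("NN", "institution")])
arcs: List[Arc] = []
root = to_dependencies(tree, rightmost, arcs)
print(root.word, arcs)  # institution [('medical', 'institution', 'NN-NP-NN')]
```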
Conversion with the rules above works for ordinary phrase structures, but the phrase structures to be converted may include flat structures, and for these the rules above may be unable to produce a dependency structure.
In this embodiment, the phrase structures of flat structures in the first treebank can instead be converted into dependency structures with a syntactic parser.
The specific operation of converting phrase structures in the first treebank into dependency structures with the parser is described next. Specifically, for each flat phrase structure in the first treebank, the parser finds the maximum spanning tree in a directed graph and thereby determines the dependency probabilities of the different phrases in the flat phrase structure; the flat phrase structure is then converted into a dependency structure according to these dependency probabilities.
The dependency probability reflects the dependency relation between different phrases and can be a specific numerical value: the higher the dependency probability of two phrases, the stronger their dependency relation. With a preset threshold, only the phrase structures whose dependency probability is greater than or equal to the threshold are converted into dependency structures.
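The maximum spanning tree step can be illustrated with the Chu-Liu/Edmonds algorithm, a standard way to find a maximum spanning arborescence in a directed graph; the text does not name the algorithm the parser uses, so this choice is an assumption. In the sketch, node 0 is an artificial root, `scores[h][d]` is a hypothetical additive score for the arc from head h to dependent d (for example a log-probability from the parser), and the illustrative helper `confident_arcs` keeps only the tree arcs whose dependency probability reaches the preset threshold. All numbers are toy values.

```python
import math
from typing import List, Optional, Sequence, Tuple


def _find_cycle(heads: Sequence[int]) -> Optional[List[int]]:
    """Return the nodes of one cycle in a head assignment, or None if it is a tree."""
    state = [0] * len(heads)          # 0 = unvisited, 1 = on current path, 2 = finished
    state[0] = 2                      # node 0 is the artificial root
    for start in range(1, len(heads)):
        path, v = [], start
        while state[v] == 0:
            state[v] = 1
            path.append(v)
            v = heads[v]
        if state[v] == 1:             # walked back onto the current path: a cycle
            return path[path.index(v):]
        for u in path:
            state[u] = 2
    return None


def max_spanning_tree(scores: Sequence[Sequence[float]]) -> List[int]:
    """Chu-Liu/Edmonds. scores[h][d] scores the arc h -> d; node 0 is the root.
    Returns heads with heads[d] = head of d and heads[0] = -1."""
    n = len(scores)
    heads = [-1] + [max((h for h in range(n) if h != d), key=lambda h: scores[h][d])
                    for d in range(1, n)]
    cycle = _find_cycle(heads)
    if cycle is None:
        return heads
    # Contract the cycle into a single node c and solve the smaller problem.
    cyc = set(cycle)
    rest = [v for v in range(n) if v not in cyc]          # rest[0] is the root
    idx = {v: i for i, v in enumerate(rest)}
    c = len(rest)
    new = [[-math.inf] * (c + 1) for _ in range(c + 1)]
    enter, leave = {}, {}
    for u in rest:
        for v in rest:
            if u != v:
                new[idx[u]][idx[v]] = scores[u][v]
        # Best arc u -> cycle, scored against the cycle arc it would replace.
        w = max(cycle, key=lambda x: scores[u][x] - scores[heads[x]][x])
        new[idx[u]][c] = scores[u][w] - scores[heads[w]][w]
        enter[idx[u]] = w
    for v in rest[1:]:
        w = max(cycle, key=lambda x: scores[x][v])         # best arc cycle -> v
        new[c][idx[v]] = scores[w][v]
        leave[idx[v]] = w
    sub = max_spanning_tree(new)
    # Expand: keep the cycle arcs except the one replaced by the entering arc.
    result = list(heads)
    for v in rest[1:]:
        h = sub[idx[v]]
        result[v] = leave[idx[v]] if h == c else rest[h]
    result[enter[sub[c]]] = rest[sub[c]]
    return result


def confident_arcs(heads: Sequence[int], probs: Sequence[Sequence[float]],
                   threshold: float) -> List[Tuple[int, int]]:
    """Keep (head, dependent) arcs whose dependency probability reaches the threshold."""
    return [(h, d) for d, h in enumerate(heads) if d > 0 and probs[h][d] >= threshold]


scores = [            # toy scores; scores[h][d] for arc h -> d, node 0 = root
    [0, 9, 10, 9],
    [0, 0, 20, 3],
    [0, 30, 0, 30],
    [0, 11, 0, 0],
]
heads = max_spanning_tree(scores)
print(heads)                                          # [-1, 2, 0, 2]
probs = [[s / 40 for s in row] for row in scores]     # toy probabilities
print(confident_arcs(heads, probs, 0.5))              # [(2, 1), (2, 3)]
```

Greedy head selection first produces the cycle 1 -> 2 -> 1; contracting and re-expanding it yields the tree in which node 2 attaches to the root. Only the two arcs with probability 0.75 pass a threshold of 0.5, so the weaker root arc would not be converted.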
Take two different phrases in a flat phrase structure as an example. If their dependency probability is greater than or equal to the preset threshold, the two phrases have a strong dependency relation, i.e., the dependency structure between them has high reference value and, after dependency relation conversion, can serve as a dependency structure in the dependency structure treebank of the second treebank type; the corresponding phrase structure is therefore converted into a dependency structure. If their dependency probability is below the threshold, their dependency relation is weak and the dependency structure between them has little reference value, so no dependency structure needs to be established between them and the corresponding phrase structure is not converted. Because this conversion is carried out mainly by the parser, training the parser on phrases from the second treebank makes the dependency structures it produces for the flat phrase structures of the first treebank closer to the dependency structures of the second treebank type. Therefore, in this embodiment, the parser can be trained with phrases in the second treebank.
The accuracy with which the parser converts the flat phrase structures in the first treebank into dependency structures may not be very high, i.e., not all of the converted dependency structures are necessarily correct. To further improve the parser's conversion accuracy, the accuracy of converting the flat phrase structures into dependency structures can be obtained, and the parser can be retrained according to this accuracy.
The conversion accuracy indicates the probability that a converted dependency structure is correct. Specifically, it can be computed by using Internet resources to search for and count the occurrence probability of each converted dependency structure, and determining the conversion accuracy from this probability.
A higher occurrence probability of a dependency structure indicates a higher conversion accuracy. A preset value can be used to select the dependency structures whose occurrence probability is higher than that value; the accuracy of the selected dependency structures then meets the requirement, i.e., they have high reference value. The phrases corresponding to the selected dependency structures can therefore be used to retrain the parser. This retraining further improves the accuracy with which the parser converts flat phrase structures into dependency structures and thus improves the parser's performance.
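The selection step can be sketched as below. How the occurrence probabilities are gathered from Internet resources is not specified, so the sketch takes them as given. `estimate_accuracy` uses the mean occurrence probability as an accuracy proxy, which is an assumption, and `select_for_retraining` keeps the arcs whose occurrence probability is higher than the preset value, as the text describes. Both function names and the toy data are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Tuple

Arc = Tuple[str, str]  # (dependent, head) produced by the parser


def estimate_accuracy(occurrence: Dict[Arc, float]) -> float:
    """Assumed accuracy proxy: mean occurrence probability of the converted arcs."""
    return mean(list(occurrence.values()))


def select_for_retraining(occurrence: Dict[Arc, float], preset: float) -> List[Arc]:
    """Keep arcs whose occurrence probability is higher than the preset value."""
    return [arc for arc, p in occurrence.items() if p > preset]


# Toy occurrence probabilities; in the method they would come from searching
# Internet resources and counting how often each converted arc occurs.
occurrence = {("medical", "institution"): 0.9,
              ("institution", "drug"): 0.3,
              ("drug", "procurement"): 0.8}
print(select_for_retraining(occurrence, preset=0.5))
# [('medical', 'institution'), ('drug', 'procurement')]
```

The phrases behind the selected arcs would then form the retraining data for the parser.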
The following description again takes the Penn Chinese Treebank as the first treebank and HIT-IR-CDT as the second treebank. For convenience, a phrase structure with a flat structure is referred to below as a flat phrase structure.
Referring to Fig. 4a and Fig. 4b, Fig. 4a is a schematic diagram of a flat phrase structure of the present invention, and Fig. 4b is a schematic diagram of converting the flat phrase structure of Fig. 4a into a dependency structure.
The phrase structures of the Penn Chinese Treebank are relatively flat, mainly in noun compound phrases.
For example, the structure of the Penn Chinese Treebank phrase "medical institution drug procurement service center" is shown schematically in Fig. 4a. The father node is NP (noun phrase), and the child nodes are six NN (noun) nodes: "medical", "institution", "drug", "procurement", "service" and "center".
Dependency analysis is performed on the phrase structure of Fig. 4a with the parser in HIT-IR-LTP to obtain the dependency relations among its nodes. The result is shown in Fig. 4b.
First, the first-level dependency relations are determined: "medical" and "institution", "drug" and "procurement", and "service" and "center". Each is represented by a directed arc (an arc with an arrow): "medical" points to "institution", "drug" points to "procurement", and "service" points to "center".
Then the second-level dependency relations are determined: "institution" and "drug", and "procurement" and "service", again represented by directed arcs: "institution" points to "drug", and "procurement" points to "service".
This yields the dependency structure shown in Fig. 4b.
Structures that the rules cannot express, chiefly coordinate structures, receive special treatment. Phrase structures with coordinate structure are very numerous, and the scheme of the second treebank requires them to be handled specially. We therefore first induce rules with a rule-based method and then apply the special treatment.
In this embodiment, the phrase structures of coordinate structures can be converted into dependency structures using rules induced by a rule-based method. The specific operation of converting the coordinate phrase structures in the first treebank into dependency structures in this way is shown in Fig. 5 and proceeds as follows:
S501: Split the coordinate phrase structure into multiple fragments.
When a coordinate phrase structure is converted into a dependency structure, its core node must be determined first. The core node is the key to the conversion, so its accuracy must be ensured. For a passage of words, the longer the passage, the harder it is to determine its core node, and the core word chosen may not be the intended one. To improve the accuracy of the core node, the coordinate phrase structure can first be split into multiple fragments before the core node is determined; core nodes determined fragment by fragment are more accurate.
This embodiment does not limit how the coordinate phrase structure is split into fragments. A conjunction part of speech or an enumeration comma (、) can be used as the split criterion; or input-method input records can be obtained and the points where input was interrupted used as the split criterion; or, when different phrases in the coordinate phrase structure are associated with one another, that association can be used as the split criterion. Such an association can be, for example, that the phrases are synonyms or antonyms.
S502: Determine the core node of each fragment, and make the other nodes in each fragment depend on the core node of that fragment.
Taking one fragment as an example, its core node can be determined by using the sentence that contains the phrase structure as the analysis object, counting how often each node of the fragment occurs in the sentence context, and comparing these counts to select as core node a node whose count meets the requirement.
For example, the node with the highest count, a node with a relatively high count, or a node whose count exceeds a set value can be taken as the core node.
S503: Make the core node of each fragment other than the first fragment depend on the core node of the first fragment.
S502 determines the dependency structure between the core node and the other nodes within each fragment. For the dependency structure between fragments, the core node of the first fragment can be taken as the core node of the whole coordinate phrase structure, and the core nodes of the other fragments establish dependency structures with it.
For example, as shown in Fig. 6, in the phrase "developed countries and special zones such as Shenzhen", "developed countries" and "special zones such as Shenzhen" are coordinated, so the phrase is a coordinate phrase structure. By the method above it can be split into two fragments, "developed countries" and "and special zones such as Shenzhen". The core node of the first fragment is "country", and the other node in it, "developed", depends on that core node, i.e., a directed arc points from "developed" to "country". The core node of the second fragment is "Shenzhen", and its other nodes "and", "etc." and "special zone" each depend on "Shenzhen", i.e., each has a directed arc pointing to "Shenzhen". Between the two fragments, the core node of the second fragment, "Shenzhen", depends on the core node of the first fragment, "country", i.e., a directed arc points from "Shenzhen" to "country".
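Steps S501 to S503 can be sketched as follows, under stated assumptions: tokens are (word, tag) pairs; a new fragment starts at each conjunction (CC) or punctuation mark (PU), which covers only the conjunction and enumeration-comma options of S501; and the core node of a fragment is its non-conjunction word with the highest count in a supplied list of context words. The tag names, function names and toy context are hypothetical; on the Fig. 6 example the sketch reproduces the arcs described above.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple

Token = Tuple[str, str]          # (word, POS tag)
# Hypothetical split tags: conjunctions (CC) and punctuation such as the 、 mark (PU).
SPLIT_TAGS = {"CC", "PU"}


def split_fragments(tokens: Sequence[Token]) -> List[List[Token]]:
    """S501: start a new fragment at each conjunction or enumeration mark."""
    fragments: List[List[Token]] = [[]]
    for token in tokens:
        if token[1] in SPLIT_TAGS and fragments[-1]:
            fragments.append([])
        fragments[-1].append(token)
    return fragments


def coordinate_to_dependencies(tokens: Sequence[Token],
                               context: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (dependent, head) arcs for a coordinate phrase."""
    if not tokens:
        return []
    counts = Counter(context)            # occurrences in the sentence context
    arcs: List[Tuple[str, str]] = []
    fragment_heads: List[str] = []
    for fragment in split_fragments(tokens):
        # S502: the most frequent non-split word is the fragment's core node.
        eligible = [i for i, (_, tag) in enumerate(fragment) if tag not in SPLIT_TAGS]
        h = max(eligible or range(len(fragment)), key=lambda i: counts[fragment[i][0]])
        head = fragment[h][0]
        fragment_heads.append(head)
        arcs += [(word, head) for i, (word, _) in enumerate(fragment) if i != h]
    # S503: core nodes of later fragments depend on the first fragment's core node.
    arcs += [(head, fragment_heads[0]) for head in fragment_heads[1:]]
    return arcs


tokens = [("developed", "VA"), ("country", "NN"), ("and", "CC"),
          ("Shenzhen", "NR"), ("etc.", "ETC"), ("special zone", "NN")]
context = ["developed", "country", "country", "and",           # toy context words
           "Shenzhen", "Shenzhen", "etc.", "special zone"]
print(coordinate_to_dependencies(tokens, context))
```

With this toy context, "country" and "Shenzhen" are the most frequent eligible words of their fragments, so "developed" attaches to "country", "and", "etc." and "special zone" attach to "Shenzhen", and "Shenzhen" attaches to "country".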
Determining core nodes fragment by fragment improves the accuracy of the core nodes and therefore makes the converted dependency structures more accurate.
S203: Perform dependency relation conversion on the dependency structures in the first treebank using the trained dependency relation mapping model, to obtain a dependency structure treebank of the second treebank type.
Referring to Fig. 3, building the dependency relation mapping model comprises the following steps:
S301: Train a dependency relation labeling model on the second treebank.
The dependency relation labeler assigns a dependency relation label to each dependency arc. Each arc has two end nodes: the node itself (the dependent) and its father node. The dependent depends on the father node, and the father node, which is the core word, governs the dependent. For example, in the dependency structure of Fig. 4b, "medical -> institution" forms an arc, in which "medical" is the dependent and "institution" is the father node.
This is a labeling problem, solved with a log-linear model that uses the following four features:
Feature | Description |
word word_f | word, father word |
word pos_f | word, father part of speech |
pos word_f | part of speech, father word |
pos pos_f distance | part of speech, father part of speech, distance |
The probabilities are trained with maximum-likelihood estimation. The resulting model entries take the following form:
f0_this_understanding_ATT 1
f1_this_n_ATT 0.8
f2_r_understanding_ATT 0.142857
f3_r_n_1_ATT 0.997324
S302: Label the first treebank with dependency relations using the dependency relation labeling model.
Taking the Penn Chinese Treebank as the first treebank and HIT-IR-CDT as the second treebank, the dependency relation labeling model is used to label the Penn Chinese Treebank with dependency relations.
The weights of the four features word word_f, word pos_f, pos word_f and pos pos_f distance are 0.4, 0.2, 0.2 and 0.2, respectively.
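The relation labeler can be sketched as a weighted combination of four maximum-likelihood estimates, one per feature template. The model formula is not reproduced in this text, so the sketch assumes that the score of a relation r is the sum over i of λ_i · log p_i(r | key_i), using the weights 0.4, 0.2, 0.2 and 0.2 given above; a plain weighted sum of probabilities would be the linear-interpolation alternative. The probability floor for unseen events, the field names and the three training arcs are hypothetical toy data.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Hashable, Sequence, Tuple

# Feature templates of the relation labeler: name -> arc fields forming the key.
TEMPLATES: Dict[str, Tuple[str, ...]] = {
    "word word_f": ("word", "word_f"),
    "word pos_f": ("word", "pos_f"),
    "pos word_f": ("pos", "word_f"),
    "pos pos_f distance": ("pos", "pos_f", "distance"),
}
WEIGHTS = {"word word_f": 0.4, "word pos_f": 0.2, "pos word_f": 0.2, "pos pos_f distance": 0.2}
FLOOR = 1e-6  # assumed probability floor for unseen events, avoids log(0)

Arc = Dict[str, Hashable]
Tables = Dict[str, Dict[tuple, Counter]]


def key(arc: Arc, fields: Tuple[str, ...]) -> tuple:
    return tuple(arc[f] for f in fields)


def train(arcs: Sequence[Arc], labels: Sequence[str]) -> Tables:
    """Count (key, label) pairs per template; relative frequencies are the MLE."""
    tables: Tables = {name: defaultdict(Counter) for name in TEMPLATES}
    for arc, label in zip(arcs, labels):
        for name, fields in TEMPLATES.items():
            tables[name][key(arc, fields)][label] += 1
    return tables


def prob(tables: Tables, name: str, arc: Arc, label: str) -> float:
    """MLE estimate p_name(label | key), floored for unseen keys or labels."""
    counts = tables[name].get(key(arc, TEMPLATES[name]))
    if not counts:
        return FLOOR
    return max(counts[label] / sum(counts.values()), FLOOR)


def label_arc(tables: Tables, arc: Arc, labels: Sequence[str]) -> str:
    """Pick the label maximising sum_i lambda_i * log p_i(label | key_i)."""
    return max(labels, key=lambda r: sum(WEIGHTS[n] * math.log(prob(tables, n, arc, r))
                                         for n in TEMPLATES))


train_arcs = [
    {"word": "this", "pos": "r", "word_f": "understanding", "pos_f": "n", "distance": 1},
    {"word": "Shanghai", "pos": "ns", "word_f": "Pudong", "pos_f": "ns", "distance": 1},
    {"word": "development", "pos": "v", "word_f": "synchronous", "pos_f": "v", "distance": 4},
]
tables = train(train_arcs, ["ATT", "ATT", "SBV"])
print(label_arc(tables, train_arcs[2], ["ATT", "SBV"]))  # SBV
```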
Tested on the HIT-IR-CDT test data, the dependency relation labeling model achieves an accuracy of 89.7%.
To exploit the original correct part-of-speech and syntactic information in the Penn Chinese Treebank, a dependency relation mapping model is trained to correct the dependency relation labeling results.
During the conversion from phrase structure to dependency structure, three pieces of information are recorded for each arc: the phrase type of the child node, the type of the phrase that generates the arc, and the phrase type of the father node.
Fig. 8 is a schematic diagram of a dependency relation of the present invention. In Fig. 8, the dependency relation between "medical" and "institution" is recorded as "NN-NP-NN": a directed arc points from "medical" to "institution" and is labeled "NN-NP-NN".
S303: Correct the dependency relation labeling results using the original part-of-speech and syntactic information of the first treebank, and build the dependency relation mapping model.
The three features used to train the dependency relation mapping model are listed in Table 2.
Table 2: Features for training the dependency relation mapping model
Feature | Description |
phrase | own phrase type |
phrase_s | type of the phrase that generates the arc |
phrase_f | father's phrase type |
The probabilities are trained with maximum-likelihood estimation. The resulting model entries take the following form:
f0_NN_ATT 0.734
f1_NP_ATT 0.543
f2_NN_ATT 0.933
Dependency relation conversion is then performed with the dependency relation mapping model, using the following feature weights:
i = 0: the weight of the phrase feature is 0.35;
i = 1: the weight of the phrase_s feature is 0.3;
i = 2: the weight of the phrase_f feature is 0.35.
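Applying the mapping model to one arc can be sketched as below: the arc's "phrase-phrase_s-phrase_f" record (e.g. "NN-NP-NN" from Fig. 8) is split into the three features, and each candidate relation is scored with the weights 0.35, 0.3 and 0.35. The three ATT probabilities are the ones listed above; the COO entries are hypothetical values added only so that two labels can be compared, and the log-linear form is an assumption because the formula is not reproduced here. The split also assumes that phrase types contain no hyphen, i.e. that function tags have been stripped.

```python
import math
from typing import Dict, Sequence, Tuple

# Mapping-model probabilities keyed by (feature, value, relation).
# The ATT entries are the ones listed in the text; the COO entries are hypothetical.
PROBS: Dict[Tuple[str, str, str], float] = {
    ("f0", "NN", "ATT"): 0.734, ("f1", "NP", "ATT"): 0.543, ("f2", "NN", "ATT"): 0.933,
    ("f0", "NN", "COO"): 0.050, ("f1", "NP", "COO"): 0.300, ("f2", "NN", "COO"): 0.020,
}
WEIGHTS = {"f0": 0.35, "f1": 0.3, "f2": 0.35}   # phrase, phrase_s, phrase_f
FLOOR = 1e-6                                     # assumed floor for unseen entries


def map_relation(record: str, labels: Sequence[str]) -> str:
    """Choose a relation for an arc from its "phrase-phrase_s-phrase_f" record."""
    phrase, phrase_s, phrase_f = record.split("-")   # assumes no hyphens inside types
    values = {"f0": phrase, "f1": phrase_s, "f2": phrase_f}

    def score(label: str) -> float:
        # Assumed log-linear combination: sum_i lambda_i * log p_i(label | value_i).
        return sum(WEIGHTS[f] * math.log(PROBS.get((f, v, label), FLOOR))
                   for f, v in values.items())

    return max(labels, key=score)


print(map_relation("NN-NP-NN", ["ATT", "COO"]))  # ATT
```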
The results after dependency relation mapping are as follows:
Word | Shanghai | Pudong | development | and | legal system | construction | synchronous |
Index | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
Head (father node index) | 2 | 3 | 7 | 6 | 6 | 3 | 0 |
Relation labeler result | ATT | ATT | SBV | LAD | ATT | ATT | HED |
Relation mapping model result | ATT | ATT | SBV | LAD | ATT | COO | HED |
The dependency relation labeling model performs dependency relation labeling with the second log-linear model, where:
i = 0 corresponds to the word word_f feature (word, father word);
i = 1 corresponds to the word pos_f feature (word, father part of speech);
i = 2 corresponds to the pos word_f feature (part of speech, father word);
i = 3 corresponds to the pos pos_f distance feature (part of speech, father part of speech, distance);
λ0: the weight of the word word_f feature (i = 0);
λ1: the weight of the word pos_f feature (i = 1);
λ2: the weight of the pos word_f feature (i = 2);
λ3: the weight of the pos pos_f distance feature (i = 3).
The dependency relation mapping model performs dependency relation conversion with the third log-linear model, where:
i = 0 corresponds to the phrase feature (own phrase type);
i = 1 corresponds to the phrase_s feature (type of the phrase that generates the arc);
i = 2 corresponds to the phrase_f feature (father's phrase type);
λ0: the weight of the phrase feature (i = 0);
λ1: the weight of the phrase_s feature (i = 1);
λ2: the weight of the phrase_f feature (i = 2).
The dependency structure treebank acquisition method of this embodiment includes the step of converting a first treebank, such as a Chinese phrase structure treebank, into a dependency structure treebank of the second treebank type. Because the Chinese phrase structure treebank is converted into a dependency structure treebank, the converted treebank can easily be merged with an existing dependency structure treebank, which increases the treebank size and effectively improves the performance of the syntactic parser.
In addition, the method includes the step of converting the flat phrase structures in the first treebank into dependency structures with a syntactic parser, which solves the difficulty of converting flat phrase structures, such as noun compound phrases, into dependency structures.
Referring to Fig. 7, which is a flow chart of the second embodiment of the dependency structure treebank acquisition method of the present invention.
The second embodiment differs from the first embodiment in that it additionally includes a step of converting the part-of-speech tagset.
The dependency structure treebank acquisition method of the second embodiment comprises the following steps:
S701: Retrieve the first treebank.
S702: Convert the phrase structures in the first treebank into dependency structures using, respectively, the conversion tool of the first treebank and a syntactic parser.
S702 is processed in the same way as S202 and is not described again here.
S703: Convert the part-of-speech tagset of the first treebank into a tagset that meets the requirements of the Chinese standard part-of-speech tagset.
The Chinese standard part-of-speech tagset can be the 863 part-of-speech tagset.
A treebank contains not only syntactic structure information but also part-of-speech information, and different treebanks use different part-of-speech tagsets, so a step of converting the tagset can be added. The 863 part-of-speech tagset is one of the Chinese standard tagsets. The method of this embodiment converts the part-of-speech tagset of the first treebank, such as the Penn Chinese Treebank, into a tagset that meets the Chinese standard tagset requirements, such as the 863 tagset. This unifies the part-of-speech tags across treebanks and improves the accuracy of the conversion.
The specific operation of the tagset conversion is described next. Specifically, the words of the first treebank can be tagged with parts of speech using the second treebank, and the parts of speech can then be classified with a pre-built part-of-speech mapping model to correct the assigned tags.
Taking the Penn Chinese Treebank as the first treebank and HIT-IR-CDT as the second treebank, the words of the Penn Chinese Treebank are tagged with parts of speech using HIT-IR-CDT, and the pre-built part-of-speech mapping model is used to classify the parts of speech and correct the assigned tags.
The part-of-speech mapping model performs the part-of-speech conversion with the first log-linear model, where:
i = 0 corresponds to the pos feature (own part of speech);
i = 1 corresponds to the pos_s pos feature (child node part of speech, own part of speech);
i = 2 corresponds to the pos pos_f feature (own part of speech, father node part of speech);
λ0: the weight of the pos feature (i = 0);
λ1: the weight of the pos_s pos feature (i = 1);
λ2: the weight of the pos pos_f feature (i = 2).
HIT-IR-LTP is a language technology platform developed by the Information Retrieval Research Laboratory of Harbin Institute of Technology. It contains many natural language processing modules, such as word segmentation and syntactic parsing, as well as corpus resources such as the HIT-IR-CDT dependency treebank. HIT-IR-LTP is currently shared freely with the academic community.
The part-of-speech tagging module of HIT-IR-LTP reaches an accuracy of 90%. The HIT-IR-LTP part-of-speech tagger is used to tag the Penn Chinese Treebank.
Although the HIT-IR-LTP tagging module is fairly accurate, errors are unavoidable. To exploit the original correct part-of-speech and syntactic information in the Penn Chinese Treebank, we trained a part-of-speech mapping model to correct the tagging results.
The part-of-speech mapping model is a log-linear model that uses the three features pos, pos_s pos and pos pos_f defined above.
Parameters are estimated with maximum-likelihood estimation; examples of the trained model probabilities are as follows.
F0_NN_n = 0.746038: the probability that NN is mapped to n;
F0_NN_v = 0.1699158: the probability that NN is mapped to v;
F1_VC_NN_n = 0.801055: the probability that NN is mapped to n when the child node is VC;
F1_VC_NN_v = 0.121002: the probability that NN is mapped to v when the child node is VC;
F2_NN_NN_n = 0.776695: the probability that NN is mapped to n when the father node is NN;
F2_NN_NN_v = 0.180412: the probability that NN is mapped to v when the father node is NN.
Part-of-speech conversion is performed with the part-of-speech mapping model, using the following weights:
λ0 = 0.4, the weight of the pos feature (i = 0);
λ1 = 0.3, the weight of the pos_s pos feature (i = 1);
λ2 = 0.3, the weight of the pos pos_f feature (i = 2).
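With the probabilities and weights quoted above, the mapping for an NN word whose child node is VC and whose father node is NN can be checked numerically. The sketch assumes that the combination is the sum over i of λ_i · log p_i(tag | key_i), since the formula is not reproduced in this text; for these numbers a plain weighted sum of probabilities (about 0.77 for n versus 0.16 for v) gives the same decision. The key layout, the function name `map_pos` and the probability floor for unseen entries are assumptions.

```python
import math
from typing import Dict, Sequence, Tuple

# Trained probabilities quoted in the text: (feature, key, target 863 tag) -> p.
PROBS: Dict[Tuple[str, Tuple[str, ...], str], float] = {
    ("F0", ("NN",), "n"): 0.746038,
    ("F0", ("NN",), "v"): 0.1699158,
    ("F1", ("VC", "NN"), "n"): 0.801055,      # child VC, own tag NN
    ("F1", ("VC", "NN"), "v"): 0.121002,
    ("F2", ("NN", "NN"), "n"): 0.776695,      # own tag NN, father NN
    ("F2", ("NN", "NN"), "v"): 0.180412,
}
WEIGHTS = {"F0": 0.4, "F1": 0.3, "F2": 0.3}
FLOOR = 1e-6  # assumed floor for unseen entries


def map_pos(pos: str, child_pos: str, father_pos: str,
            targets: Sequence[str] = ("n", "v")) -> str:
    """Map a Penn Chinese Treebank tag to a target tag using three weighted features."""
    keys = {"F0": (pos,), "F1": (child_pos, pos), "F2": (pos, father_pos)}

    def score(tag: str) -> float:
        # Assumed form: sum_i lambda_i * log p_i(tag | key_i).
        return sum(WEIGHTS[f] * math.log(PROBS.get((f, k, tag), FLOOR))
                   for f, k in keys.items())

    return max(targets, key=score)


print(map_pos("NN", child_pos="VC", father_pos="NN"))  # n
```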
The following table shows examples of tagging errors corrected by the part-of-speech mapping model.
As the above shows, using the original information of the Penn Chinese Treebank effectively corrects some part-of-speech tagging errors.
Note that S702 and S703 can be performed in either order.
S704: Perform dependency relation conversion on the dependency structures in the first treebank using the trained dependency relation mapping model, to obtain a dependency structure treebank of the second treebank type.
The dependency relation mapping model is trained with the three features in the following table.
Feature | Description |
phrase | own phrase type |
phrase_s | type of the phrase that generates the arc |
phrase_f | father's phrase type |
The probabilities are trained with maximum-likelihood estimation to obtain the dependency relation mapping model, which is then used for dependency relation conversion.
In the dependency relation mapping model, the weights of the three features phrase, phrase_s and phrase_f are 0.35, 0.3 and 0.35, respectively.
The results after dependency relation mapping are as follows:
Word | Shanghai | Pudong | development | and | legal system | construction | synchronous |
Index | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
Head (father node index) | 2 | 3 | 7 | 6 | 6 | 3 | 0 |
Relation labeler result | ATT | ATT | SBV | LAD | ATT | ATT | HED |
Relation mapping model result | ATT | ATT | SBV | LAD | ATT | COO | HED |
The present invention provides a dependency structure treebank acquisition method comprising the steps of converting a first treebank, such as a Chinese phrase structure treebank, into a dependency structure treebank of the second treebank type, and converting the part-of-speech tagset of the first treebank into a tagset that meets the Chinese standard tagset requirements. Because the method covers both syntactic structure conversion and tagset conversion, the converted dependency structure treebank is more accurate and can easily be merged with an existing dependency structure treebank, which increases the treebank size and effectively improves the performance of the syntactic parser.
Referring to Fig. 9, which is a structural diagram of the first embodiment of the dependency structure treebank acquisition system of the present invention.
The dependency structure treebank acquisition system of the first embodiment comprises a calling unit 11 and a conversion unit 12.
The calling unit 11 is configured to retrieve the first treebank, the first treebank being a Chinese phrase structure treebank.
The conversion unit 12 is configured to convert the phrase structures in the first treebank into dependency structures using, respectively, the conversion tool of the first treebank and a syntactic parser; the second treebank is a dependency structure treebank.
Converting the phrase structures in the first treebank into dependency structures with the conversion tool of the first treebank comprises: converting the phrase structures into dependency structures using the phrase-to-dependency conversion rules provided by the conversion tool, or rules obtained by modifying those rules; and inducing rules with a rule-based method to convert the phrase structures of coordinate structures in the first treebank into dependency structures.
Converting the phrase structures in the first treebank into dependency structures with the parser comprises converting the flat phrase structures in the first treebank into dependency structures with the parser.
The conversion unit 12 is further configured to perform dependency relation conversion on the dependency structures in the first treebank using the trained dependency relation mapping model, to obtain a dependency structure treebank of the second treebank type.
The converting unit 12 is connected with the call unit 11.
Optionally, the converting unit specifically includes a determination subunit and a scanning subunit:
The determination subunit is configured to determine, according to a pre-established Head (core node) mapping table, the head node of each grammar production in the phrase structure treebank of the first treebank.
The scanning subunit is configured to scan around each head node using the mapping table and according to its rules, obtaining the dependency relations between the other child nodes and the head node.
The Head mapping table is formed from the rules, provided by the conversion tool, for converting phrase structures in the first treebank into dependency structures, or from the rules obtained by modifying those rules.
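The head-table procedure above (pick a head child for each grammar production, then attach the remaining children to it) can be sketched as follows. This is a minimal illustration, assuming a toy tuple-based tree format and a hypothetical, heavily simplified `HEAD_TABLE`; it is not the rule set of the first treebank's actual conversion tool.

```python
from typing import Dict, List, Tuple

# A tree is a tuple (label, child, child, ...); a preterminal is (pos_tag, word).
Tree = Tuple

# Hypothetical, heavily simplified head table:
# phrase label -> (search direction, child labels in priority order).
HEAD_TABLE: Dict[str, Tuple[str, List[str]]] = {
    "IP": ("right", ["VP", "IP"]),
    "VP": ("left", ["VV", "VA", "VP"]),
    "NP": ("right", ["NN", "NR", "NP"]),
}


def is_preterminal(node: Tree) -> bool:
    return len(node) == 2 and isinstance(node[1], str)


def find_head_child(label: str, children: List[Tree]) -> int:
    """Index of the head child according to HEAD_TABLE."""
    direction, priorities = HEAD_TABLE.get(label, ("right", []))
    order = list(range(len(children)))
    if direction == "right":
        order.reverse()
    for wanted in priorities:
        for i in order:
            if children[i][0] == wanted:
                return i
    return order[0]  # fallback: first child met in the search direction


def convert(tree: Tree) -> Tuple[List[str], List[Tuple[int, int]]]:
    """Return the words and sorted (dependent, head) arcs; head -1 marks the root."""
    words: List[str] = []
    arcs: List[Tuple[int, int]] = []

    def walk(node: Tree) -> int:
        # Returns the word index of the node's lexical head.
        if is_preterminal(node):
            words.append(node[1])
            return len(words) - 1
        label, *children = node
        child_heads = [walk(child) for child in children]
        h = find_head_child(label, children)
        # Scan the non-head children and make each one depend on the head.
        for i, dep in enumerate(child_heads):
            if i != h:
                arcs.append((dep, child_heads[h]))
        return child_heads[h]

    arcs.append((walk(tree), -1))
    return words, sorted(arcs)


# Example: 中国 发展 经济 ("China develops the economy").
EXAMPLE = ("IP", ("NP", ("NR", "中国")), ("VP", ("VV", "发展"), ("NP", ("NN", "经济"))))
```

On `EXAMPLE`, 发展 becomes the root and both 中国 and 经济 depend on it.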
Optionally, the converting unit is specifically configured to use the parser to find, for the phrase structure of each flat structure in the first treebank, the maximum spanning tree in a directed graph, thereby determining the dependency probabilities between the different phrases in that flat-structure phrase; the phrase structure of the flat structure is then converted into a dependency structure according to these dependency probabilities.
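The flat-structure step relies on finding a maximum spanning tree over candidate head-dependent arcs. The sketch below assumes the parser has already produced an arc score (higher is better, e.g. a log-probability) for every pair of words in one flat phrase; the scores here are hand-set for illustration. It finds the best tree by exhaustive search, which is only practical for short spans such as noun compounds; a real parser would normally use the Chu-Liu/Edmonds algorithm instead. `max_spanning_tree` and the score-matrix layout are hypothetical.

```python
from itertools import product
from typing import List, Optional, Sequence

NEG_INF = float("-inf")


def reaches_root(heads: Sequence[int], d: int) -> bool:
    """True if following head links from word d ends at the root (0) without a cycle."""
    seen = set()
    while d != 0:
        if d in seen:
            return False
        seen.add(d)
        d = heads[d - 1]
    return True


def max_spanning_tree(scores: List[List[float]]) -> Optional[List[int]]:
    """Exhaustive maximum spanning tree search for a short flat phrase.

    scores[h][d] is the score of word d (1..n) depending on h, where h = 0
    is the phrase-external root. Returns heads, with heads[d-1] the head of
    word d, or None if no valid tree exists.
    """
    n = len(scores) - 1
    best, best_heads = NEG_INF, None
    for heads in product(range(n + 1), repeat=n):
        if any(h == d for d, h in enumerate(heads, start=1)):
            continue  # no self-loops
        if sum(1 for h in heads if h == 0) != 1:
            continue  # exactly one word heads the flat phrase
        if not all(reaches_root(heads, d) for d in range(1, n + 1)):
            continue  # must be a tree, so no cycles
        total = sum(scores[h][d] for d, h in enumerate(heads, start=1))
        if total > best:
            best, best_heads = total, list(heads)
    return best_heads


# Hypothetical arc scores for the noun compound 中国(1) 经济(2) 发展(3).
EXAMPLE_SCORES = [
    [NEG_INF, 1.0, 1.0, 5.0],      # from the root
    [NEG_INF, NEG_INF, 0.5, 0.5],  # from 中国
    [NEG_INF, 3.0, NEG_INF, 0.5],  # from 经济
    [NEG_INF, 2.0, 4.0, NEG_INF],  # from 发展
]
```

With these scores the best tree is 中国 → 经济 → 发展 → root, i.e. heads `[2, 3, 0]`.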
Optionally, the system further includes a parser training unit configured to train the parser using the phrases in the second treebank.
Optionally, the system further includes an adjustment unit configured to obtain the accuracy with which the phrase structures of flat structures are converted into dependency structures, and to adjust the training of the parser according to that accuracy.
Optionally, the adjustment unit is specifically configured to use network resources to search for and count the probability of occurrence of the converted dependency structures, and to determine the conversion accuracy from that probability.
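The accuracy estimate from network statistics is described only at a high level. One possible reading, sketched here under loud assumptions: for each converted flat phrase, compare the frequency of the chosen modifier-head pair with that of the competing attachment (in the style of the adjacency model for noun-compound bracketing), and report the fraction of decisions the counts support. The `EXAMPLE_COUNTS` dictionary is hypothetical and stands in for search-engine hit counts.

```python
from typing import Dict, List, Tuple

# (modifier, head) -> corpus or web hit count; the numbers below are hypothetical.
Counts = Dict[Tuple[str, str], int]


def estimated_accuracy(decisions: List[Tuple[str, str, str]], counts: Counts) -> float:
    """Fraction of attachment decisions that the frequency counts support.

    Each decision is (modifier, chosen_head, alternative_head) for one
    converted flat phrase. A decision counts as supported when the chosen
    pair occurs at least as often as the alternative pair.
    """
    if not decisions:
        return 0.0
    supported = sum(
        1
        for mod, chosen, alt in decisions
        if counts.get((mod, chosen), 0) >= counts.get((mod, alt), 0)
    )
    return supported / len(decisions)


EXAMPLE_COUNTS: Counts = {
    ("中国", "经济"): 900,
    ("中国", "发展"): 300,
    ("钢铁", "工业"): 400,
    ("钢铁", "产量"): 120,
}
EXAMPLE_DECISIONS = [
    ("中国", "经济", "发展"),  # converter chose 中国 -> 经济
    ("钢铁", "产量", "工业"),  # converter chose 钢铁 -> 产量
]
```

Here the counts support only the first decision, giving an estimate of 0.5; a low estimate would be the signal to adjust the parser's training.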
Optionally, the converting unit specifically includes a cutting subunit and a dependency determination subunit:
The cutting subunit is configured to cut the phrase structure of a coordinate structure into multiple fragments.
The dependency determination subunit is configured to determine the head node of each fragment and to make every other node in a fragment depend on that fragment's head node.
The dependency determination subunit is further configured to make the head node of every fragment other than the first depend on the head node of the first fragment.
Optionally, the cutting subunit is configured to cut the phrase structure of the coordinate structure using conjunction part-of-speech tags or pause marks (the Chinese enumeration comma 、) as the cutting basis.
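The cutting and attachment rules for coordinate structures can be sketched as follows. Assumptions, none of which come from the text: tokens carry Penn Chinese TreeBank tags, `CC` marks a conjunction and 、 is the pause mark; each fragment's head is its last token (Chinese noun phrases are typically head-final); and each separator attaches to the head of the fragment that follows it. `convert_coordination` is a hypothetical name.

```python
from typing import List, Tuple

SEPARATOR_TAGS = {"CC"}   # conjunction tag (Penn Chinese TreeBank style, assumed)
SEPARATOR_WORDS = {"、"}  # Chinese pause mark (enumeration comma)


def is_separator(word: str, tag: str) -> bool:
    return tag in SEPARATOR_TAGS or word in SEPARATOR_WORDS


def convert_coordination(tokens: List[Tuple[str, str]]) -> List[int]:
    """Return heads (0-based token indices) for a coordinate phrase;
    -1 marks the phrase head, which attaches outside the phrase."""
    fragments: List[List[int]] = []
    separators: List[Tuple[int, int]] = []  # (token index, index of following fragment)
    current: List[int] = []
    for i, (word, tag) in enumerate(tokens):
        if is_separator(word, tag):
            if current:
                fragments.append(current)
                current = []
            separators.append((i, len(fragments)))
        else:
            current.append(i)
    if current:
        fragments.append(current)
    if not fragments:
        raise ValueError("coordinate phrase contains no conjuncts")

    heads: List[int] = [-1] * len(tokens)
    frag_heads = [frag[-1] for frag in fragments]  # assumption: head-final fragments
    for frag, h in zip(fragments, frag_heads):
        for i in frag:
            if i != h:
                heads[i] = h  # other nodes depend on their fragment's head
    for h in frag_heads[1:]:
        heads[h] = frag_heads[0]  # later fragment heads depend on the first one
    for i, k in separators:
        # Attach each separator to the head of the fragment that follows it.
        heads[i] = frag_heads[k] if k < len(frag_heads) else frag_heads[0]
    return heads


# 工业、农业和国防建设 ("industry, agriculture and national defence construction")
EXAMPLE = [("工业", "NN"), ("、", "PU"), ("农业", "NN"),
           ("和", "CC"), ("国防", "NN"), ("建设", "NN")]
```

For `EXAMPLE`, 工业 heads the phrase, 农业 and 建设 depend on it, 国防 depends on 建设, and each separator depends on the conjunct after it.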
Optionally, the cutting subunit is configured to obtain the input-method input record and to cut at the input breaks recorded in it.
Optionally, when the different phrases in the phrase structure of the coordinate structure are linked by an association relation, the cutting subunit is configured to cut using that association relation as the cutting basis.
Optionally, the dependency determination subunit is configured to take the sentence containing the phrase structure as the object of analysis, count how many times each node of a fragment occurs in the context of that sentence, compare these counts across nodes, and select as the head node the node whose count meets the requirement.
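The head-selection rule based on occurrence counts in the sentence context might look like the minimal sketch below. What makes a count "meet the requirement" is not specified, so the sketch assumes the most frequent word wins and breaks ties in favour of the rightmost candidate; `pick_fragment_head` is a hypothetical name.

```python
from collections import Counter
from typing import List


def pick_fragment_head(fragment: List[str], sentence: List[str]) -> int:
    """Index (within a non-empty fragment) of the word occurring most often in the sentence.

    Ties are broken in favour of the rightmost word (an assumption).
    """
    counts = Counter(sentence)
    return max(range(len(fragment)), key=lambda p: (counts[fragment[p]], p))


EXAMPLE_SENTENCE = ["加强", "经济", "建设", "和", "国防", "建设"]
```

In `EXAMPLE_SENTENCE`, 建设 occurs twice, so it is chosen as the head of the fragment 经济 建设.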
Optionally, for building the dependency-relation mapping model, the system further includes a training unit, a labeling unit and a correction unit:
The training unit is configured to train a dependency-relation labeling model using the second treebank.
The labeling unit is configured to label the dependency relations of the first treebank using the dependency-relation labeling model.
The correction unit is configured to correct the results of the dependency-relation labeling using the original part-of-speech and syntactic information of the first treebank, thereby building the dependency-relation mapping model.
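The correction step uses the first treebank's original syntactic information to override labels predicted by the labeling model. A minimal sketch, assuming a hypothetical two-entry correspondence between Penn Chinese TreeBank function tags (`-SBJ`, `-OBJ`) and HIT-IR-CDT style labels (`SBV`, `VOB`); the actual correction rules are not given in the text.

```python
from typing import Dict, List, Optional

# Hypothetical correspondence: Penn Chinese TreeBank function tag -> HIT-IR-CDT style label.
FUNCTION_TAG_TO_RELATION: Dict[str, str] = {"SBJ": "SBV", "OBJ": "VOB"}


def correct_labels(predicted: List[str], function_tags: List[Optional[str]]) -> List[str]:
    """Override predicted relation labels where the first treebank's function tag is decisive.

    function_tags[i] is the function tag (without the leading hyphen) of the
    phrase headed by word i, or None when that phrase has no function tag.
    """
    if len(predicted) != len(function_tags):
        raise ValueError("one function tag (or None) is needed per word")
    return [
        FUNCTION_TAG_TO_RELATION.get(tag, label) if tag else label
        for label, tag in zip(predicted, function_tags)
    ]
```

The corrected labels, paired with the features of each arc, would then serve as training data for the mapping model.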
Optionally, the dependency-relation labeling model uses a second log-linear model to label dependency relations, where:
i=0 corresponds to (word, word_f): the word and its father (head) word;
i=1 corresponds to (word, pos_f): the word and the part of speech of its father node;
i=2 corresponds to (pos, word_f): the part of speech of the word and its father word;
i=3 corresponds to (pos, pos_f, distance): the part of speech of the word, the part of speech of its father node, and the distance between them;
λ0: weight of the word/word_f feature (i=0);
λ1: weight of the word/pos_f feature (i=1);
λ2: weight of the pos/word_f feature (i=2);
λ3: weight of the pos/pos_f/distance feature (i=3).
For the dependency-relation labeling model, see its description in the dependency structure treebank acquisition method described above.
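The formula of the second log-linear model is not reproduced in the text. Assuming the usual log-linear form, in which each feature template i contributes a log conditional probability of the relation label weighted by λi, i.e. p(r | x) ∝ exp(Σ λi · log Pi(r | context_i)), the labeling model could be sketched as below. The count-based estimates, the additive smoothing, and the class name `LogLinearLabeler` are all assumptions.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple

Context = Tuple[str, str, str, str, int]  # (word, pos, word_f, pos_f, distance)


def templates(ctx: Context) -> List[tuple]:
    """The four feature templates, i = 0..3, in the order listed above."""
    word, pos, word_f, pos_f, distance = ctx
    return [
        ("word|word_f", word, word_f),
        ("word|pos_f", word, pos_f),
        ("pos|word_f", pos, word_f),
        ("pos|pos_f|dist", pos, pos_f, distance),
    ]


class LogLinearLabeler:
    """p(r | x) proportional to exp(sum_i lambda_i * log P_i(r | template_i(x)))."""

    def __init__(self, weights: Sequence[float], smoothing: float = 0.1):
        self.weights = list(weights)  # lambda_0 .. lambda_3
        self.smoothing = smoothing    # additive smoothing (assumption)
        self.counts: Dict[tuple, Counter] = defaultdict(Counter)
        self.labels: set = set()

    def train(self, arcs: Iterable[Tuple[Context, str]]) -> None:
        """Count relation labels per feature value, e.g. on the second treebank."""
        for ctx, label in arcs:
            self.labels.add(label)
            for feat in templates(ctx):
                self.counts[feat][label] += 1

    def log_prob(self, feat: tuple, label: str) -> float:
        c = self.counts.get(feat, Counter())  # .get does not insert missing keys
        total = sum(c.values())
        k = len(self.labels)
        return math.log((c[label] + self.smoothing) / (total + self.smoothing * k))

    def distribution(self, ctx: Context) -> Dict[str, float]:
        feats = templates(ctx)
        scores = {
            r: sum(w * self.log_prob(f, r) for w, f in zip(self.weights, feats))
            for r in self.labels
        }
        top = max(scores.values())
        z = sum(math.exp(s - top) for s in scores.values())
        return {r: math.exp(s - top) / z for r, s in scores.items()}

    def predict(self, ctx: Context) -> str:
        dist = self.distribution(ctx)
        return max(dist, key=dist.get)


# Toy training arcs; distance = head index minus dependent index (assumed convention).
EXAMPLE_ARCS = [
    (("中国", "NR", "发展", "VV", 1), "SBV"),
    (("经济", "NN", "发展", "VV", -1), "VOB"),
    (("经济", "NN", "增长", "VV", 1), "SBV"),
]
```

Working with log probabilities and subtracting the maximum score before exponentiating keeps the normalisation numerically stable.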
Optionally, the dependency-relation mapping model uses a third log-linear model to perform dependency-relation labeling, where:
i=0 corresponds to phrase: the phrase's own phrase type;
i=1 corresponds to phrase_s: the phrase types the phrase generates, together with its own phrase type;
i=2 corresponds to phrase_f: the phrase type of its father;
λ0: weight of the phrase feature (i=0);
λ1: weight of the phrase_s feature (i=1);
λ2: weight of the phrase_f feature (i=2).
For the dependency-relation mapping model, see its description in the dependency structure treebank acquisition method described above.
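The three inputs to the mapping model can be read off a phrase structure tree. This sketch assumes a tuple-based tree format and interprets phrase_s as the sequence of child labels the phrase expands into; both are illustrative assumptions, and `phrase_features` is a hypothetical helper.

```python
from typing import Dict, Iterator, Tuple

# A tree is (label, child, child, ...); a preterminal is (pos_tag, word).
Tree = Tuple


def phrase_features(tree: Tree, parent: str = "ROOT") -> Iterator[Dict[str, object]]:
    """Yield the phrase / phrase_s / phrase_f features of every phrasal node."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return  # preterminal: a part of speech, not a phrase
    yield {
        "phrase": label,                            # i = 0: own phrase type
        "phrase_s": tuple(c[0] for c in children),  # i = 1: types it expands into (assumed reading)
        "phrase_f": parent,                         # i = 2: father phrase type
    }
    for child in children:
        yield from phrase_features(child, label)


EXAMPLE = ("IP", ("NP", ("NN", "经济")), ("VP", ("VV", "发展")))
```

Each yielded dictionary would be one input instance for the mapping model's three feature templates.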
Optionally, the first treebank is the Penn Chinese TreeBank (University of Pennsylvania Chinese treebank), and the second treebank is HIT-IR-CDT (the Chinese dependency treebank of Harbin Institute of Technology).
The dependency structure treebank acquisition system of this embodiment includes the calling unit 11, which calls the first treebank, and the converting unit 12, which converts the phrase structures in the first treebank into dependency structures and performs dependency-relation conversion on them to obtain a dependency structure treebank of the second treebank type. The system can therefore convert a Chinese phrase structure treebank into a dependency structure treebank; the converted treebank can be merged easily with the original dependency structure treebank, which increases the size of the treebank and effectively improves the performance of the parser.
In addition, because the converting unit 12 can use the parser to convert the phrase structures of flat structures in the first treebank into dependency structures, the system solves the difficulty of converting flat-structure phrases, such as noun compound phrases, into dependency structures.
Referring to Fig. 10, the figure is a structural diagram of the second embodiment of the dependency structure treebank acquisition system of the present invention.
Compared with the first embodiment, the second embodiment additionally includes a tag-set conversion unit 13.
The dependency structure treebank acquisition system further includes the tag-set conversion unit 13, connected to the converting unit 12, which converts the part-of-speech tag set of the first treebank into a tag set that meets the requirements of the Chinese national-standard part-of-speech tag set.
Optionally, the Chinese national-standard part-of-speech tag set is the 863 part-of-speech tag set.
Optionally, the tag-set conversion unit is specifically configured to tag the words of the first treebank with parts of speech using the second treebank, and then to refine the tags using a pre-established part-of-speech mapping model, correcting the assigned parts of speech.
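The tag-then-correct procedure for reaching the 863 tag set can be sketched as constraining the tagger's output with the word's original Penn Chinese TreeBank tag. The mapping table below is a small hypothetical subset (for example, proper nouns tagged `NR` may map to the 863 tags `nh`, `ns`, `ni` or `nz`), not the patent's part-of-speech mapping model, and `correct_tag` is a hypothetical name.

```python
from typing import Dict, Set

# Hypothetical subset: Penn Chinese TreeBank tag -> admissible 863 tags.
CTB_TO_863: Dict[str, Set[str]] = {
    "NN": {"n", "nz"},
    "NR": {"nh", "ns", "ni", "nz"},
    "NT": {"nt"},
    "VV": {"v"},
    "VA": {"a", "v"},
    "AD": {"d"},
    "CD": {"m"},
    "M": {"q"},
    "PU": {"wp"},
}


def correct_tag(ctb_tag: str, tagger_scores: Dict[str, float]) -> str:
    """Pick the best-scoring 863 tag compatible with the word's original CTB tag.

    tagger_scores holds 863-tag scores from a tagger trained on the second
    treebank. Unknown CTB tags leave the tagger's own choice unchanged.
    """
    allowed = CTB_TO_863.get(ctb_tag)
    if not allowed:
        return max(tagger_scores, key=tagger_scores.get)
    compatible = {t: s for t, s in tagger_scores.items() if t in allowed}
    if not compatible:
        # Deterministic fallback when the tagger proposed nothing compatible.
        return sorted(allowed)[0]
    return max(compatible, key=compatible.get)
```

For a place name that the tagger scored as `n` 0.6 and `ns` 0.3, the original `NR` tag rules out `n`, so the corrected tag is `ns`.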
Optionally, the part-of-speech mapping model uses a first log-linear model to perform part-of-speech conversion, where:
i=0 corresponds to pos: the word's own part of speech;
i=1 corresponds to (pos_s, pos): the parts of speech of its child nodes and its own part of speech;
i=2 corresponds to (pos, pos_f): its own part of speech and that of its father node;
λ0: weight of the pos feature (i=0);
λ1: weight of the pos_s/pos feature (i=1);
λ2: weight of the pos/pos_f feature (i=2).
For the part-of-speech mapping model, see its description in the dependency structure treebank acquisition method described above.
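The first log-linear model's formula is likewise missing from the text. Assuming a standard log-linear form, with each feature h_i taken to be the log conditional probability of the target 863 tag t given the corresponding context (an assumption, not the patent's stated definition), it would read:

```latex
P(t \mid x) = \frac{\exp\left(\sum_{i=0}^{2} \lambda_i h_i(x, t)\right)}
                   {\sum_{t'} \exp\left(\sum_{i=0}^{2} \lambda_i h_i(x, t')\right)},
\qquad
h_0 = \log P(t \mid \mathrm{pos}),\quad
h_1 = \log P(t \mid \mathrm{pos\_s}, \mathrm{pos}),\quad
h_2 = \log P(t \mid \mathrm{pos}, \mathrm{pos\_f})
```

The second and third log-linear models would have the same shape, with four word/part-of-speech templates and three phrase-type templates respectively.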
The dependency structure treebank acquisition system of this embodiment includes the calling unit 11, which calls the first treebank; the converting unit 12, which converts the phrase structures in the first treebank into dependency structures and performs dependency-relation conversion on them to obtain a dependency structure treebank of the second treebank type; and the tag-set conversion unit 13, which converts the part-of-speech tag set of the first treebank into a tag set meeting the requirements of the Chinese national-standard part-of-speech tag set. Converting both the syntactic structure and the part-of-speech tag set makes the converted dependency structure treebank more accurate. The converted treebank can be merged easily with the original dependency structure treebank, which increases the size of the treebank and effectively improves the performance of the parser.
In addition, because the converting unit 12 can use the parser to convert the phrase structures of flat structures in the first treebank into dependency structures, the system solves the difficulty of converting flat-structure phrases, such as noun compound phrases, into dependency structures.
The foregoing describes only preferred embodiments of the present invention and does not limit its scope. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.
Claims (24)
1. A dependency structure treebank acquisition method, characterized in that the method includes:
calling a first treebank, the first treebank being a Chinese phrase structure treebank;
converting the phrase structures in the first treebank into dependency structures using, respectively, the conversion tool of the first treebank and a syntactic parser, a second treebank being a dependency structure treebank;
wherein converting the phrase structures in the first treebank into dependency structures using the conversion tool of the first treebank includes: converting the phrase structures into dependency structures using the rules, provided by the conversion tool, for converting phrase structures in the first treebank into dependency structures, or using rules obtained by modifying those rules; and converting the phrase structures of coordinate structures in the first treebank into dependency structures by rule-based induction;
wherein converting the phrase structures in the first treebank into dependency structures using the parser includes: using the parser to convert the phrase structures of flat structures in the first treebank into dependency structures;
and performing dependency-relation conversion on the dependency structures in the first treebank using a dependency-relation mapping model obtained by training, to obtain a dependency structure treebank of the second treebank type.
2. The method according to claim 1, characterized in that converting the phrase structures into dependency structures using the rules provided by the conversion tool for converting phrase structures in the first treebank into dependency structures, or using rules obtained by modifying those rules, includes: determining, according to a pre-established Head (core node) mapping table, the head node of each grammar production in the phrase structure treebank of the first treebank;
scanning around each head node using the mapping table and according to its rules, to obtain the dependency relations between the other child nodes and the head node;
wherein the Head mapping table is formed from the rules, provided by the conversion tool, for converting phrase structures in the first treebank into dependency structures, or from the rules obtained by modifying those rules.
3. The method according to claim 1, characterized in that using the parser to convert the phrase structures of flat structures in the first treebank into dependency structures specifically includes:
using the parser to find, for the phrase structure of each flat structure in the first treebank, the maximum spanning tree in a directed graph, and determining the dependency probabilities between the different phrases in that flat-structure phrase;
converting the phrase structure of the flat structure in the first treebank into a dependency structure according to the dependency probabilities of the different phrases.
4. The method according to claim 1, 2 or 3, characterized in that the parser is trained using the phrases in the second treebank.
5. The method according to claim 1, 2 or 3, characterized in that the method further includes: obtaining the accuracy with which the phrase structures of flat structures are converted into dependency structures, and adjusting the training of the parser according to that accuracy.
6. The method according to claim 5, characterized in that network resources are used to search for and count the probability of occurrence of the converted dependency structures, and the conversion accuracy is determined from that probability.
7. The method according to claim 1, characterized in that converting the phrase structures of coordinate structures in the first treebank into dependency structures by rule-based induction specifically includes:
cutting the phrase structure of the coordinate structure into multiple fragments;
determining the head node of each fragment, and making every other node in a fragment depend on that fragment's head node;
making the head node of every fragment other than the first depend on the head node of the first fragment.
8. The method according to claim 7, characterized in that cutting the phrase structure of the coordinate structure into multiple fragments specifically includes:
cutting using conjunction part-of-speech tags or pause marks as the cutting basis.
9. The method according to claim 7, characterized in that cutting the phrase structure of the coordinate structure into multiple fragments specifically includes:
obtaining the input-method input record and cutting at the input breaks recorded in it.
10. The method according to claim 7, characterized in that cutting the phrase structure of the coordinate structure into multiple fragments specifically includes:
when the different phrases in the phrase structure of the coordinate structure are linked by an association relation, cutting using that association relation as the cutting basis.
11. The method according to claim 7, characterized in that determining the head node of each fragment includes: taking the sentence containing the phrase structure as the object of analysis, counting how many times each node of the fragment occurs in the context of that sentence, comparing these counts across nodes, and selecting as the head node the node whose count meets the requirement.
12. The method according to claim 1, characterized in that building the dependency-relation mapping model includes:
training a dependency-relation labeling model using the second treebank;
labeling the dependency relations of the first treebank using the dependency-relation labeling model;
correcting the results of the dependency-relation labeling using the original part-of-speech and syntactic information of the first treebank, thereby building the dependency-relation mapping model.
13. The method according to claim 12, characterized in that the dependency-relation labeling model uses a second log-linear model,
where i=0 corresponds to (word, word_f): the word and its father word;
i=1 corresponds to (word, pos_f): the word and the part of speech of its father node;
i=2 corresponds to (pos, word_f): the part of speech of the word and its father word;
i=3 corresponds to (pos, pos_f, distance): the part of speech of the word, the part of speech of its father node, and the distance between them;
λ0: weight of the word/word_f feature (i=0);
λ1: weight of the word/pos_f feature (i=1);
λ2: weight of the pos/word_f feature (i=2);
λ3: weight of the pos/pos_f/distance feature (i=3).
14. The method according to claim 12 or 13, characterized in that the dependency-relation mapping model uses a third log-linear model,
where i=0 corresponds to phrase: the phrase's own phrase type;
i=1 corresponds to phrase_s: the phrase types the phrase generates, together with its own phrase type;
i=2 corresponds to phrase_f: the phrase type of its father;
λ0: weight of the phrase feature (i=0);
λ1: weight of the phrase_s feature (i=1);
λ2: weight of the phrase_f feature (i=2).
15. The method according to any one of claims 1 to 14, characterized in that the method further includes:
converting the part-of-speech tag set of the first treebank into a tag set that meets the requirements of the Chinese national-standard part-of-speech tag set.
16. The method according to claim 15, characterized in that the Chinese national-standard part-of-speech tag set is the 863 part-of-speech tag set.
17. The method according to claim 15 or 16, characterized in that converting the part-of-speech tag set of the first treebank into a tag set that meets the requirements of the Chinese national-standard part-of-speech tag set includes:
tagging the words of the first treebank with parts of speech using the second treebank, and refining the tags using a pre-established part-of-speech mapping model to correct the assigned parts of speech.
18. The method according to claim 17, characterized in that the part-of-speech mapping model uses a first log-linear model:
where i=0 corresponds to pos: the word's own part of speech;
i=1 corresponds to (pos_s, pos): the parts of speech of its child nodes and its own part of speech;
i=2 corresponds to (pos, pos_f): its own part of speech and that of its father node;
λ0: weight of the pos feature (i=0);
λ1: weight of the pos_s/pos feature (i=1);
λ2: weight of the pos/pos_f feature (i=2).
19. The method according to any one of claims 1 to 18, characterized in that the first treebank is the Penn Chinese TreeBank (University of Pennsylvania Chinese treebank) and the second treebank is HIT-IR-CDT (the Chinese dependency treebank of Harbin Institute of Technology).
20. A dependency structure treebank acquisition system, characterized in that the system includes a calling unit and a converting unit:
the calling unit is configured to call a first treebank, the first treebank being a Chinese phrase structure treebank;
the converting unit is configured to convert the phrase structures in the first treebank into dependency structures using, respectively, the conversion tool of the first treebank and a syntactic parser, a second treebank being a dependency structure treebank;
wherein converting the phrase structures in the first treebank into dependency structures using the conversion tool of the first treebank includes: converting the phrase structures into dependency structures using the rules, provided by the conversion tool, for converting phrase structures in the first treebank into dependency structures, or using rules obtained by modifying those rules; and converting the phrase structures of coordinate structures in the first treebank into dependency structures by rule-based induction;
wherein converting the phrase structures in the first treebank into dependency structures using the parser includes: using the parser to convert the phrase structures of flat structures in the first treebank into dependency structures;
the converting unit is further configured to perform dependency-relation conversion on the dependency structures in the first treebank using a dependency-relation mapping model obtained by training, to obtain a dependency structure treebank of the second treebank type.
21. The system according to claim 20, characterized in that the converting unit specifically includes a determination subunit and a scanning subunit:
the determination subunit is configured to determine, according to a pre-established Head (core node) mapping table, the head node of each grammar production in the phrase structure treebank of the first treebank;
the scanning subunit is configured to scan around each head node using the mapping table and according to its rules, to obtain the dependency relations between the other child nodes and the head node;
wherein the Head mapping table is formed from the rules, provided by the conversion tool, for converting phrase structures in the first treebank into dependency structures, or from the rules obtained by modifying those rules.
22. The system according to claim 20, characterized in that the converting unit specifically includes a cutting subunit and a dependency determination subunit:
the cutting subunit is configured to cut the phrase structure of a coordinate structure into multiple fragments;
the dependency determination subunit is configured to determine the head node of each fragment and to make every other node in a fragment depend on that fragment's head node;
the dependency determination subunit is further configured to make the head node of every fragment other than the first depend on the head node of the first fragment.
23. The system according to claim 20, characterized in that, for building the dependency-relation mapping model, the system further includes a training unit, a labeling unit and a correction unit:
the training unit is configured to train a dependency-relation labeling model using the second treebank;
the labeling unit is configured to label the dependency relations of the first treebank using the dependency-relation labeling model;
the correction unit is configured to correct the results of the dependency-relation labeling using the original part-of-speech and syntactic information of the first treebank, thereby building the dependency-relation mapping model.
24. The system according to any one of claims 20 to 23, characterized in that the system further includes a tag-set conversion unit:
the tag-set conversion unit is configured to convert the part-of-speech tag set of the first treebank into a tag set that meets the requirements of the Chinese national-standard part-of-speech tag set.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611208593.6A CN106598951B (en) | 2016-12-23 | 2016-12-23 | Dependency structure treebank acquisition method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611208593.6A CN106598951B (en) | 2016-12-23 | 2016-12-23 | Dependency structure treebank acquisition method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106598951A true CN106598951A (en) | 2017-04-26 |
CN106598951B CN106598951B (en) | 2019-08-16 |
Family ID: 58601481
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611208593.6A Active CN106598951B (en) | 2016-12-23 | 2016-12-23 | Dependency structure treebank acquisition method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106598951B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101201819A (en) * | 2007-11-28 | 2008-06-18 | 北京金山软件有限公司 | Method and system for transferring tree bank |
CN101382844A (en) * | 2008-10-24 | 2009-03-11 | 上海埃帕信息科技有限公司 | Method for inputting spacing participle |
Application Events
- 2016-12-23: Application CN201611208593.6A filed in CN (CN106598951B, status: Active)
Non-Patent Citations (2)
Title |
---|
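周惠巍 (Zhou Huiwei) et al.: "Research on Treebank Conversion from Phrase Structure to Dependency Structure", Journal of Dalian University of Technology (《大连理工大学学报》) * |
李正华 (Li Zhenghua): "Research on Statistical Models for Dependency Parsing and Treebank Conversion", China Master's Theses Full-text Database, Information Science and Technology (《中国优秀硕士学位论文全文数据库信息科技辑》) * |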
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107391488A (en) * | 2017-07-28 | 2017-11-24 | 昆明理工大学 | A kind of interdependent syntactic analysis method of Chinese of minimum spanning tree statistics fusion |
CN107656921A (en) * | 2017-10-10 | 2018-02-02 | 上海数眼科技发展有限公司 | A kind of short text dependency analysis method based on deep learning |
CN108628829A (en) * | 2018-04-23 | 2018-10-09 | 苏州大学 | Automatic treebank method for transformation based on tree-like Recognition with Recurrent Neural Network and system |
CN108628829B (en) * | 2018-04-23 | 2022-03-15 | 苏州大学 | Automatic tree bank transformation method and system based on tree-shaped cyclic neural network |
CN109460552A (en) * | 2018-10-29 | 2019-03-12 | 朱丽莉 | Rule-based and corpus Chinese faulty wording automatic testing method and equipment |
US11769007B2 (en) | 2021-05-27 | 2023-09-26 | International Business Machines Corporation | Treebank synthesis for training production parsers |
Also Published As
Publication number | Publication date |
---|---|
CN106598951B (en) | 2019-08-16 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN106598951A (en) | Dependency structure treebank acquisition method and system | |
CN102214166B (en) | Machine translation system and machine translation method based on syntactic analysis and hierarchical model | |
CN105045778B (en) | A kind of Chinese homonym mistake auto-collation | |
US7088949B2 (en) | Automated essay scoring | |
CN105975625A (en) | Chinglish inquiring correcting method and system oriented to English search engine | |
CN109918640B (en) | Chinese text proofreading method based on knowledge graph | |
CN109062892A (en) | A kind of Chinese sentence similarity calculating method based on Word2Vec | |
CN101866337A (en) | Part-or-speech tagging system, and device and method thereof for training part-or-speech tagging model | |
CN101201819B (en) | Method and system for transferring tree bank | |
CN104102630B (en) | A kind of method for normalizing for Chinese and English mixing text in Chinese social networks | |
CN104881402A (en) | Method and device for analyzing semantic orientation of Chinese network topic comment text | |
CN102929870A (en) | Method for establishing word segmentation model, word segmentation method and devices using methods | |
CN103699529A (en) | Method and device for fusing machine translation systems by aid of word sense disambiguation | |
CN106844348B (en) | Method for analyzing functional components of Chinese sentences | |
CN105022806B (en) | The method and system of the internet web page construction movement page based on translation template | |
CN107133223A (en) | A kind of machine translation optimization method for exploring more reference translation information automatically | |
CN111340661A (en) | Automatic application problem solving method based on graph neural network | |
CN105868187B (en) | The construction method of more translation Parallel Corpus | |
CN102646091A (en) | Dependence relationship labeling method, device and system | |
CN107391495A (en) | A kind of sentence alignment schemes of bilingual parallel corporas | |
CN104050255A (en) | Joint graph model-based error correction method and system | |
CN105243053B (en) | Extract the method and device of document critical sentence | |
CN102929864B (en) | A kind of tone-character conversion method and device | |
CN105677639A (en) | English word sense disambiguation method based on phrase structure syntax tree | |
CN104008301A (en) | Automatic construction method for hierarchical structure of domain concepts |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |