CN106844348A - Chinese sentence functional component analysis method - Google Patents


Info

Publication number
CN106844348A
Authority
CN
China
Prior art keywords
functional component
sentence
chinese
component analysis
training data
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710077125.8A
Other languages
Chinese (zh)
Other versions
CN106844348B (en)
Inventor
赵铁军
曹海龙
王亚楠
徐冰
朱聪慧
杨沐昀
郑德权
马春鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Heilongjiang Industrial Technology Research Institute Asset Management Co ltd
Original Assignee
Harbin Institute of Technology
Application filed by Harbin Institute of Technology
Priority to CN201710077125.8A
Publication of CN106844348A
Application granted
Publication of CN106844348B
Current legal status: Active


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/205: Parsing
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/10: Text processing
    • G06F40/12: Use of codes for handling textual entities
    • G06F40/151: Transformation
    • G06F40/16: Automatic learning of transformation rules, e.g. from examples

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

A Chinese sentence functional component analysis method. The invention relates to methods for analysing the functional components of Chinese sentences and aims to solve the problem that the prior art does not take the functional components of Chinese sentences into account. The process is as follows. 1. Process the training corpus: convert CTB5.0 into a form carrying functional component labels, correct it to obtain a revised corpus, and convert that corpus into a character-granularity form, giving A. 2. Feed A into a syntactic functional component analyzer and train it to obtain Chinese sentence functional component analysis model C. 3. Process plain Chinese text to obtain sentences with functional component labels, convert them into a character-granularity form, giving B, and combine A and B as the final training data. 4. Test the Chinese sentences to be tested with Chinese sentence functional component analysis model D to obtain the test results. The invention is applicable to the field of sentence functional component analysis.

Description

A Chinese sentence functional component analysis method
Technical field
The present invention relates to a method for analysing the functional components of Chinese sentences and belongs to the field of machine translation technology.
Background
Syntactic analysis (parsing) is a key problem in natural language processing. Its current performance is still unsatisfactory, and the field is at something of a bottleneck. Parsing nevertheless remains an active research topic because it occupies a pivotal position among natural language processing tasks: many other tasks consume its output, and a great deal of research, on both higher-level and lower-level tasks, builds on its results. Mainstream parsing methods fall into two classes. The first is shallow parsing, i.e. chunk parsing, which no longer uses the word as the processing unit but works on chunks. Some methods of this kind directly produce a new sequence of chunks; others analyse the chunks further, parsing with the chunk as the unit while ignoring chunk-internal structure, so that the result is still a partial parse tree. The second class is full parsing, whose processing unit is each word of the sentence and whose output is a complete syntax tree. Full parsing can be further divided into phrase-structure parsing and dependency parsing. In phrase-structure parsing, the basic words of a sentence are combined level by level, according to their forms and their relations within phrases, into a complete tree with a hierarchical phrase structure. Similarly, in dependency parsing the model builds, following the definitions of dependency grammar, a complete tree whose arcs are the dependency relations between words.
However, none of this research takes the functional information of a sentence into account. Phrase-structure parsing considers phrase-level information and dependency parsing considers dependencies between words; neither captures the role that a word or group of words plays in the sentence (such as subject, predicate or object). Zhou Qiang et al. of Tsinghua University were the first to propose a similar concept: they cast functional component extraction as a chunk parsing task that differs from earlier phrase chunking in that the labels are the functional components of the sentence, and a related shared task was released at CIPS-2009. In the following years, however, related research largely stagnated, and only one article on the task appeared, in the Journal of Chinese Information Processing in 2011.
Sentence functional components are valuable for many practical problems. In the word alignment task of machine translation, for example, functional component information can improve both the speed and the accuracy of alignment by aligning words that belong to the same component; such a method is simple and also consistent with linguistic rules. Likewise, in dependency parsing, functional component information can serve as a constraint that prunes illegal paths directly during beam search, which speeds up the search; this rule is also simple and easy to apply. The same holds for research in semantic analysis. More importantly, within natural language processing as a whole, functional component analysis can serve as a transitional task between syntactic analysis and semantic analysis: in granularity it lies above syntactic analysis and below semantic analysis, so good performance on this task can improve both. The discussion above shows that this research has important application prospects and that the direction deserves attention.
Existing related research, however, is at a very early stage and offers little to build on; the main work is the functional chunk parsing of Zhou Qiang et al. and related analysis methods, and these methods have several shortcomings. First, the Chinese functional treebank is not large; because it was built by converting data with rules, its accuracy has some problems, and the data have not been updated since. Second, both the research of Zhou Qiang et al. and that of Chen Yi only mark the functional chunks of a Chinese sentence, producing a single-layer linear structure rather than a hierarchical structure that could serve the construction of a parse tree. In addition, from the standpoint of specific tasks, there has so far been no research devoted specifically to the functional components of Chinese sentences. We therefore propose a baseline model for Chinese functional component analysis and an analysis method based on shift-reduce action transitions. Given the contributions and significance described above, this work has a sound background and clear significance.
Summary of the invention
The purpose of the invention is to solve the problem that the prior art does not take the functional components of Chinese sentences into account; to this end, a Chinese sentence functional component analysis method is proposed.
The Chinese sentence functional component analysis method proceeds as follows:
Step one: process the training corpus. The training corpus is CTB5.0, which is converted by regular-expression matching into a form carrying functional component labels; the sentences in this labelled form are then corrected to obtain a revised corpus. The revised corpus is converted into a character-granularity form, which serves as training data A;
CTB5.0 is the Penn Chinese Treebank 5.0;
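The conversion rules themselves are not published. As a minimal sketch, assuming that CTB's own function tags (for example NP-SBJ or NP-OBJ) are the raw material, a regular expression can rewrite bracketed CTB nodes into nodes carrying the component labels used in this document. The mapping `FUNCTION_TAG_MAP` and the output format `NP[SBJ]` are hypothetical choices for illustration only.

```python
import re

# Hypothetical mapping from CTB function tags to functional component labels.
# Unmapped tags (e.g. PN, APP) are simply dropped.
FUNCTION_TAG_MAP = {
    "SBJ": "SBJ",  # subject
    "OBJ": "OBJ",  # object
    "PRD": "PRE",  # predicate
    "ADV": "ADV",  # adverbial
    "TMP": "ADV",  # temporal modifier, treated as an adverbial (assumption)
    "LOC": "ADV",  # locative modifier, treated as an adverbial (assumption)
}

# "(" + category + one or more "-TAG" suffixes + optional "-<index>",
# followed by whitespace or "(" so that only node labels are matched.
NODE_RE = re.compile(r"\((?P<cat>[A-Z]+)(?P<tags>(?:-[A-Z]+)+)(?:-\d+)?(?=[\s(])")


def convert_labels(bracketed: str) -> str:
    """Rewrite e.g. "(NP-SBJ" as "(NP[SBJ]" in a bracketed CTB tree."""

    def repl(m: re.Match) -> str:
        tags = m.group("tags").lstrip("-").split("-")
        for tag in tags:
            if tag in FUNCTION_TAG_MAP:
                return f"({m.group('cat')}[{FUNCTION_TAG_MAP[tag]}]"
        return f"({m.group('cat')}"  # keep the category, drop unmapped tags

    return NODE_RE.sub(repl, bracketed)


tree = "(IP (NP-SBJ (PN 我)) (VP (VV 喜欢) (NP-OBJ (NN 科学))))"
print(convert_labels(tree))
# (IP (NP[SBJ] (PN 我)) (VP (VV 喜欢) (NP[OBJ] (NN 科学))))
```

An automatic pass like this only produces the first draft; the method still corrects wrongly labelled or missing components afterwards.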
Step two: treat the entire functional component analysis process as a sequence of state transitions, which yields a syntactic functional component analyzer; feed training data A into the analyzer and train it to obtain Chinese sentence functional component analysis model C;
Step three: process plain Chinese text with Chinese sentence functional component analysis model C to obtain sentences with functional component labels, and correct these sentences to obtain a revised corpus. Convert the revised corpus into a character-granularity form as training data B, and combine training data A with training data B as the final training data;
Step four: feed the final training data into the syntactic functional component analyzer and train it to obtain Chinese sentence functional component analysis model D; test the Chinese sentences to be tested with model D to obtain the test results.
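Steps one to four amount to one round of self-training: a model trained on treebank data labels raw text, the corrected output is added to the training set, and the analyzer is retrained. A minimal sketch, with the trainer, parser and correction step passed in as functions; `self_train`, `train`, `parse` and `correct` are hypothetical stand-ins (in this method the correction is done by hand), not an interface described in the source.

```python
from typing import Callable, List, TypeVar

Tree = TypeVar("Tree")
Model = TypeVar("Model")


def self_train(
    data_a: List[Tree],
    raw_sentences: List[str],
    train: Callable[[List[Tree]], Model],
    parse: Callable[[Model, str], Tree],
    correct: Callable[[Tree], Tree],
) -> Model:
    """One self-training round: returns model D trained on A + B."""
    model_c = train(data_a)               # step two: model C from data A
    data_b = [correct(parse(model_c, s))  # step three: label raw text,
              for s in raw_sentences]     #   then fix common errors
    final_data = data_a + data_b          # combine A and B
    return train(final_data)              # step four: model D


# Toy stand-ins: the "model" is just the list of trees it was trained on.
model_d = self_train(
    data_a=["(IP ...)"],
    raw_sentences=["新句子"],
    train=lambda data: list(data),
    parse=lambda model, s: f"parsed:{s}",
    correct=lambda tree: tree + ":checked",
)
print(model_d)  # ['(IP ...)', 'parsed:新句子:checked']
```

Passing the components as functions keeps the sketch independent of any particular parser implementation.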
Beneficial effects of the invention:
The invention provides a Chinese sentence functional component analysis method that treats the entire functional component analysis process as a sequence of state transitions to obtain a syntactic functional component analyzer. Part of the training corpus is CTB5.0 (the Penn Chinese Treebank); the other part is the result of processing plain Chinese text through a series of steps. The syntactic functional component analyzer is trained on this corpus to obtain a functional component analysis model, and testing the Chinese sentence functional component analysis model on the Chinese test sentences (500 sentences) gives high precision, recall and F-score.
As shown in Table 1, when the invention is tested on 500 Chinese sentences, the precision for the complete syntactic functional component trees is 97.38%, the recall is 97.79% and the F-score is 90.90%.
Brief description of the drawings
Fig. 1 is a framework diagram of the overall syntactic functional component analysis method;
Fig. 2 is a tree diagram showing the result of functional component analysis of a Chinese sentence, where [SBJ] is the subject, [PRE] the predicate, [OBJ] the object, [ADV] the adverbial, [ADJ] the attributive modifier and [HEAD] the head word; IP is a clause (sentence), NP a noun phrase, VP a verb phrase, ADVP an adverbial phrase, PP a prepositional phrase, CP a complementizer phrase, ADJP an adjectival phrase and QP a quantifier phrase; PN is a pronoun, AD an adverb, VV a verb, VA a predicative adjective, JJ a noun-modifying adjective, NN a noun, AS an aspect particle, P a preposition, CD a cardinal number, OD an ordinal number, DEC the particle 的 marking a relative clause, CC a coordinating conjunction and PU a punctuation mark.
Detailed description of the embodiments
Embodiment 1: the Chinese sentence functional component analysis method of this embodiment proceeds as follows:
Step one: process the training corpus. The training corpus is CTB5.0 (the Penn Chinese Treebank), whose corpus already consists of syntactic analysis results. CTB5.0 is converted by regular-expression matching into a form carrying functional component labels, and the sentences in this labelled form are corrected to obtain a revised corpus. The revised corpus is converted into a character-granularity form, which serves as training data A;
Step two: apply a shift-reduce parsing method to functional component analysis, treating the entire analysis process as a sequence of state transitions, which yields a syntactic functional component analyzer; feed training data A into the analyzer and train it to obtain Chinese sentence functional component analysis model C (see Fig. 1);
Step three: process plain Chinese text (containing no Latin letters or English words), namely 10,000 news and editorial sentences obtained from People's Daily Online, with Chinese sentence functional component analysis model C to obtain sentences with functional component labels. Correct the commonly occurring errors in these labelled sentences to obtain a revised corpus. Convert the revised corpus into a character-granularity form as training data B, and combine training data A with training data B as the final training data;
Step four: feed the final training data into the syntactic functional component analyzer and train it to obtain Chinese sentence functional component analysis model D; test the Chinese sentences to be tested (500 sentences) with model D to obtain the test results.
Embodiment 2: this embodiment differs from Embodiment 1 in how step one is carried out. Step one processes the training corpus CTB5.0 (the Penn Chinese Treebank), whose corpus already consists of syntactic analysis results: CTB5.0 is converted by regular-expression matching into a form carrying functional component labels, the labelled sentences are corrected to obtain a revised corpus, and the revised corpus is converted into a character-granularity form as training data A. The specific process is:
The training corpus CTB5.0 (the Penn Chinese Treebank) already consists of syntactic analysis results. It is converted by regular-expression matching into a form carrying functional component labels. The functional component labels cover the subject, predicate, object, adverbial, attributive, complement and head functional components of a sentence, as well as its subordination relations. Functional component labels that are wrong or missing in the labelled sentences are corrected, giving the revised corpus;
Directional information is added between the characters inside each word of the revised corpus to generate syntax trees at character granularity, so that every node in the syntax tree receives a direction label; the result is training data A.
There are three directions: left (l), right (r) and coordinate (c), indicating respectively that the semantic core of a node's two children is the left child, the right child, or that the two children have equal status. For example, in the word 科学 (science) the left child is 科 and the right child is 学, and the two are coordinate. This label is only a lightweight supplementary annotation, and the relation it describes is word-internal rather than sentence-level;
The structural information between the characters inside words is used to guide parsing and to generate character-granularity syntax trees: the relations between the characters inside each word are annotated, adding "direction" information to every node.
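The direction labels can be pictured as a small binary tree over the characters of each word. The sketch below is illustrative only: `WordNode`, `head_chars` and `to_bracketed` are hypothetical names, the 科学 example comes from the text, and the annotation of 科学家 (scientist) as right-headed is our own assumption.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class WordNode:
    """Binary node over the characters inside a word, with a direction label."""
    direction: str  # "l": left child is the core, "r": right child, "c": coordinate
    left: "Node"
    right: "Node"


Node = Union[str, WordNode]  # a leaf is a single Chinese character


def head_chars(node: Node) -> str:
    """Follow the direction labels down to the character(s) carrying the core meaning."""
    if isinstance(node, str):
        return node
    if node.direction == "l":
        return head_chars(node.left)
    if node.direction == "r":
        return head_chars(node.right)
    # "c": both children have equal status, so both contribute to the core
    return head_chars(node.left) + head_chars(node.right)


def to_bracketed(node: Node) -> str:
    """Render the word-internal tree as a bracketed string."""
    if isinstance(node, str):
        return node
    return f"({node.direction} {to_bracketed(node.left)} {to_bracketed(node.right)})"


kexue = WordNode("c", "科", "学")      # 科学: coordinate, as in the text
kexuejia = WordNode("r", kexue, "家")  # 科学家: right-headed (assumed annotation)
print(to_bracketed(kexuejia), head_chars(kexuejia))  # (r (c 科 学) 家) 家
```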
Other steps and parameter are identical with specific embodiment one.
Embodiment 3: this embodiment differs from Embodiment 1 or 2 in the analysis process of the syntactic functional component analyzer in step two, which is as follows:
Each sentence in training data A is entered into the queue in turn, and the entire functional component analysis process is treated as a sequence of state transitions. Each state consists of a stack and a queue: the stack holds the syntactic functional component tree fragments generated so far (each a part of a syntactic functional component tree), and the queue holds the Chinese characters not yet processed;
In the initial state the stack is empty and the queue contains as many elements as the sentence has Chinese characters;
At each state transition, the averaged perceptron selects an action from a predefined action set.
The action set is shift-separate, shift-append, reduce-unary, reduce-binary, reduce-word, reduce-subword, idle and finish. The averaged perceptron computes the score of each action in the current state, and actions are selected with a beam search strategy;
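A minimal sketch of the state and a subset of the transitions, assuming a character-level shift-reduce design in which shift-separate starts a new word with a POS tag, shift-append extends that word by one character, and reduce-word closes it. `State`, `Word`, `Tree` and the labels in the example are hypothetical; reduce-subword and idle are omitted for brevity, and in the real analyzer the perceptron, not a hand-written script, chooses each action.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Word:
    """An unfinished word on the stack that can still absorb characters."""
    tag: str
    chars: str


@dataclass
class Tree:
    """A finished tree fragment; children are Trees or word strings."""
    label: str
    children: list


@dataclass
class State:
    stack: List[Union[Word, Tree]] = field(default_factory=list)
    queue: List[str] = field(default_factory=list)  # unprocessed characters
    done: bool = False


def initial(sentence: str) -> State:
    # Empty stack; one queue element per Chinese character.
    return State([], list(sentence))


def shift_separate(s: State, tag: str) -> State:
    return State(s.stack + [Word(tag, s.queue[0])], s.queue[1:])


def shift_append(s: State) -> State:
    top = s.stack[-1]
    assert isinstance(top, Word), "shift-append needs an unfinished word on top"
    return State(s.stack[:-1] + [Word(top.tag, top.chars + s.queue[0])], s.queue[1:])


def reduce_word(s: State) -> State:
    top = s.stack[-1]
    assert isinstance(top, Word), "reduce-word needs an unfinished word on top"
    return State(s.stack[:-1] + [Tree(top.tag, [top.chars])], s.queue)


def reduce_unary(s: State, label: str) -> State:
    return State(s.stack[:-1] + [Tree(label, [s.stack[-1]])], s.queue)


def reduce_binary(s: State, label: str) -> State:
    left, right = s.stack[-2], s.stack[-1]
    return State(s.stack[:-2] + [Tree(label, [left, right])], s.queue)


def finish(s: State) -> State:
    # Final state: empty queue and a single IP root on the stack.
    assert not s.queue and len(s.stack) == 1
    assert isinstance(s.stack[0], Tree) and s.stack[0].label == "IP"
    return State(s.stack, s.queue, done=True)


def to_str(t) -> str:
    if isinstance(t, str):
        return t
    return f"({t.label} " + " ".join(to_str(c) for c in t.children) + ")"


s = initial("我喜欢科学")
s = reduce_unary(reduce_word(shift_separate(s, "PN")), "NP[SBJ]")
s = reduce_word(shift_append(shift_separate(s, "VV")))
s = reduce_unary(reduce_word(shift_append(shift_separate(s, "NN"))), "NP[OBJ]")
s = finish(reduce_binary(reduce_binary(s, "VP[PRE]"), "IP"))
print(to_str(s.stack[0]))
# (IP (NP[SBJ] (PN 我)) (VP[PRE] (VV 喜欢) (NP[OBJ] (NN 科学))))
```

Each transition returns a new `State` instead of mutating the old one, which is convenient for beam search because several hypotheses may share a common history.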
The averaged perceptron scores each action in the current state as the dot product of a feature vector with the perceptron's weight vector. The feature vector is extracted from the Chinese sentence being analysed according to the defined feature templates. The general structural feature templates are as follows:
The character-related structural feature templates are as follows:
The character-string features used when the syntactic functional component analyzer performs the shift-separate action are as follows:
The character-string features used when the syntactic functional component analyzer performs the shift-append action are as follows:
z-1.z0 z-1.z0.t-1 z0.y-1 start(ω-1).z0.t-1
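The templates above join atomic items with ".". The text does not define the atoms; the sketch below assumes, as in other character-level shift-reduce parsers, that z0 is the character being appended, z-1 the character before it, t-1 a POS tag and start(ω-1) the first character of a word, and it treats y-1 as an opaque atom. It only shows how templates turn into concrete feature strings for the perceptron; `instantiate` and the atom values are hypothetical.

```python
from typing import Dict, List

# The four shift-append templates listed above, atoms joined by ".".
SHIFT_APPEND_TEMPLATES = [
    "z-1.z0",
    "z-1.z0.t-1",
    "z0.y-1",
    "start(ω-1).z0.t-1",
]


def instantiate(templates: List[str], atoms: Dict[str, str]) -> List[str]:
    """Turn each template into a feature string by substituting atom values.

    Atoms missing from `atoms` (e.g. positions outside the sentence) become "<NONE>".
    """
    feats = []
    for tpl in templates:
        values = [atoms.get(atom, "<NONE>") for atom in tpl.split(".")]
        feats.append(tpl + "=" + "|".join(values))
    return feats


# Hypothetical atom values while appending 学 to a word that starts with 科.
example = {"z-1": "科", "z0": "学", "t-1": "NN", "start(ω-1)": "科"}
for f in instantiate(SHIFT_APPEND_TEMPLATES, example):
    print(f)
```

Each resulting string is one binary feature; its weight is looked up in the perceptron's weight vector when the action is scored.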
The character-string features used when the syntactic functional component analyzer performs the reduce-word action are as follows:
In the final state the queue is empty and the stack holds a single IP, which is the root node of the syntactic functional component tree. Chinese sentence functional component analysis model C is obtained when training finishes, and decoding yields a complete syntactic functional component tree (see Fig. 2).
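Action selection combines the perceptron score, a sparse dot product between binary features and weights, with beam search. A generic sketch, assuming the caller supplies `expand` (legal actions and successor states), `features` (feature strings for a state and action) and `is_final`; these names and the toy problem are hypothetical. Finished hypotheses are carried forward unchanged, which loosely plays the part of the idle (pause) action.

```python
import heapq
from typing import Callable, Dict, Iterable, List, Tuple, TypeVar

S = TypeVar("S")  # parser state type


def score(weights: Dict[str, float], feats: Iterable[str]) -> float:
    """Dot product of a sparse binary feature vector with the weight vector."""
    return sum(weights.get(f, 0.0) for f in feats)


def beam_search(
    init: S,
    expand: Callable[[S], List[Tuple[str, S]]],
    features: Callable[[S, str], List[str]],
    is_final: Callable[[S], bool],
    weights: Dict[str, float],
    beam_size: int = 16,
) -> S:
    beam: List[Tuple[float, S]] = [(0.0, init)]
    while not all(is_final(state) for _, state in beam):
        candidates = []
        for total, state in beam:
            if is_final(state):
                candidates.append((total, state))  # carry finished hypotheses forward
                continue
            for action, nxt in expand(state):
                candidates.append((total + score(weights, features(state, action)), nxt))
        if not candidates:
            raise ValueError("no legal action from a non-final state")
        beam = heapq.nlargest(beam_size, candidates, key=lambda c: c[0])
    return max(beam, key=lambda c: c[0])[1]


# Toy problem: build a two-letter string; a greedy search (beam 1) is misled.
w = {":a": 1.0, ":b": 0.9, "a:a": 0.1, "b:b": 5.0}
args = dict(
    expand=lambda s: [(c, s + c) for c in "ab"],
    features=lambda s, a: [f"{s}:{a}"],
    is_final=lambda s: len(s) == 2,
    weights=w,
)
print(beam_search("", beam_size=1, **args), beam_search("", beam_size=2, **args))  # aa bb
```

The toy run shows why a beam helps: the locally best first step ("a") leads to a worse complete hypothesis than the runner-up ("b").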
The overall Chinese sentence functional component analysis process mainly comprises processing the training corpus, writing the training program and selecting the parameters of the training model. Processing the training corpus means correcting the annotation errors present in the corpus itself and converting the corpus into a form based on character-granularity information. The key parts of the training program are feature extraction and the implementation of the averaged perceptron. Parameter selection for the training model mainly concerns the number of training iterations.
The averaged perceptron performs the decision classification of actions in a given state; its averaging strategy helps avoid overfitting to some extent. Let T be the total number of iterations, with iteration index t (1 ≤ t ≤ T), and let N be the number of sentences in the training corpus, with sentence index n (1 ≤ n ≤ N). Let $w_{t,n}$ denote the model weights after the n-th sentence has been processed in the t-th iteration. The weights produced by the traditional perceptron algorithm are then $w_{T,N}$.
These weights give high prediction accuracy on the training set but easily overfit, so that prediction accuracy on the test set is low. To prevent overfitting, the averaged perceptron does not use $w_{T,N}$ as the final weights but uses the average of all intermediate weights,
$$\bar{w} = \frac{1}{NT}\sum_{t=1}^{T}\sum_{n=1}^{N} w_{t,n},$$
as the model weights. The averaged perceptron algorithm is as follows:
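A minimal sketch of such an averaged perceptron (our illustration, not the patent's own listing): a multiclass perceptron over parser actions that returns the mean of the weights w_{t,n} over all N·T training steps. As a simplification, one training step here is one (state features, gold action) decision rather than one whole sentence; `train_averaged`, `predict` and the toy data are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Weights = Dict[Tuple[str, str], float]  # (feature, action) -> weight


def predict(w: Weights, feats: Sequence[str], actions: Sequence[str]) -> str:
    # Highest-scoring action; ties go to the earlier action in `actions`.
    return max(actions, key=lambda a: sum(w.get((f, a), 0.0) for f in feats))


def train_averaged(
    data: List[Tuple[List[str], str]], actions: Sequence[str], epochs: int
) -> Weights:
    """data holds (features of a state, gold action) pairs; returns averaged weights."""
    w: Weights = {}      # current weights w_{t,n}
    total: Weights = {}  # running sum of w_{t,n} over all t and n
    for _ in range(epochs):           # t = 1..T
        for feats, gold in data:      # n = 1..N
            pred = predict(w, feats, actions)
            if pred != gold:          # standard perceptron update
                for f in feats:
                    w[(f, gold)] = w.get((f, gold), 0.0) + 1.0
                    w[(f, pred)] = w.get((f, pred), 0.0) - 1.0
            for k, v in w.items():    # accumulate w_{t,n} (naive, O(|w|) per step)
                total[k] = total.get(k, 0.0) + v
    steps = epochs * len(data)        # N * T
    return {k: v / steps for k, v in total.items()}


actions = ["shift", "reduce"]
data = [(["queue=empty"], "reduce"), (["queue=char"], "shift")]
avg = train_averaged(data, actions, epochs=2)
print(predict(avg, ["queue=empty"], actions), predict(avg, ["queue=char"], actions))  # reduce shift
```

Practical implementations usually avoid the per-step accumulation over all weights with a lazy, timestamp-based update; the naive loop here simply mirrors the definition of the average.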
Other steps and parameter are identical with specific embodiment one or two.
Embodiment 4: this embodiment differs from any one of Embodiments 1 to 3 in step three, in which functional component analysis is performed with Chinese sentence functional component analysis model C on the data (plain Chinese text: 10,000 news and editorial sentences obtained from People's Daily Online) to obtain sentences with functional component labels; the commonly occurring errors in these labelled sentences are corrected to obtain a revised corpus; the revised corpus is converted into a character-granularity form as training data B; and training data A and training data B are combined as the final training data. The specific process is:
Functional component analysis is performed with Chinese sentence functional component analysis model C on the data (plain Chinese text: 10,000 news and editorial sentences obtained from People's Daily Online) to obtain sentences with functional component labels, and the commonly occurring errors are corrected. The functional component labels cover the subject, predicate, object, adverbial, attributive, complement and head functional components of a sentence, as well as its subordination relations. Functional components that are wrongly labelled or missing are corrected to obtain the revised corpus;
Directional information is added between the characters inside each word of the revised corpus to generate character-granularity syntax trees, so that every node in the syntax tree receives a direction label; the result is training data B;
There are three directions: left (l), right (r) and coordinate (c), indicating respectively that the semantic core of a node's two children is the left child, the right child, or that the two children have equal status.
The structural information between the characters inside words is used to guide parsing and to generate character-granularity syntax trees: the relations between the characters inside each word are annotated, adding "direction" information to every node.
Training data A and training data B are added together as the final training data.
All other steps and parameters are the same as in any one of Embodiments 1 to 3.
Embodiment 5: this embodiment differs from any one of Embodiments 1 to 4 in step four, in which the final training data are fed into the syntactic functional component analyzer and trained to obtain Chinese sentence functional component analysis model D, and the Chinese sentences to be tested (500 sentences) are tested with model D to obtain the test results. The specific process is:
The entire functional component analysis process is treated as a sequence of state transitions, giving the syntactic functional component analyzer; training on the final training data proceeds as follows:
Each state consists of a stack and a queue: the stack holds the syntactic functional component tree fragments generated so far (each a part of a syntactic functional component tree), and the queue holds the Chinese characters not yet processed;
In the initial state the stack is empty and the queue contains as many elements as the sentence has Chinese characters;
At each state transition, the averaged perceptron selects an action from a predefined action set: shift-separate, shift-append, reduce-unary, reduce-binary, reduce-word, reduce-subword, idle and finish. The averaged perceptron computes the score of each action in the current state, and actions are selected with a beam search strategy;
In the final state the queue is empty and the stack holds a single IP, which is the root node of the syntactic functional component tree. Chinese sentence functional component analysis model D is obtained when training finishes, and decoding yields a complete syntactic functional component tree.
All other steps and parameters are the same as in any one of Embodiments 1 to 4.
The beneficial effects of the invention are verified with the following example:
Example 1:
The Chinese sentence functional component analysis method of this example is carried out according to the following steps:
(1) Training corpus
More than 13,000 sentences from CTB (the Penn Chinese Treebank) and 10,000 news and editorial sentences obtained from People's Daily Online, all converted into character-granularity form.
(2) Training process
Initial model 1 is trained on the CTB corpus. Initial model 1 is then used to parse the 10,000 new sentences, and the resulting syntactic functional component analyses are also used as training corpus. Model 2 is then trained on the combination of the two parts of the training corpus.
(3) Test set
500 sentences distinct from the training corpus are selected at random, parsed with the trained model and then manually proofread to ensure the accuracy of the test set.
The experimental results on the 500 proofread test sentences are shown in the following table:
The F-score is computed as $F = \frac{2PQ}{P+Q}$, where P is the precision and Q the recall.
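As a worked check of the F-score formula, the snippet below plugs in the precision (97.38%) and recall (97.79%) reported for the 500 test sentences, assuming P is precision and Q is recall. It gives F ≈ 97.58%, not the 90.90% stated in the beneficial-effects section, so one of the three reported figures was probably mistranscribed; the original table is not available here to confirm which.

```python
def f_score(p: float, q: float) -> float:
    """F = 2PQ / (P + Q), with P = precision and Q = recall."""
    return 0.0 if p + q == 0 else 2 * p * q / (p + q)


# Precision and recall reported for the 500-sentence test set.
print(f"{f_score(0.9738, 0.9779):.4f}")  # 0.9758
```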
The invention may have various other embodiments. Without departing from the spirit and essence of the invention, those skilled in the art can make corresponding changes and modifications according to the invention, but all such changes and modifications shall fall within the protection scope of the appended claims.

Claims (5)

1. A Chinese sentence functional component analysis method, characterized in that the method proceeds as follows:
Step one: process the training corpus. The training corpus is CTB5.0, which is converted by regular-expression matching into a form carrying functional component labels; the sentences in this labelled form are corrected to obtain a revised corpus; the revised corpus is converted into a character-granularity form, which serves as training data A;
CTB5.0 is the Penn Chinese Treebank 5.0;
Step two: treat the entire functional component analysis process as a sequence of state transitions, which yields a syntactic functional component analyzer; feed training data A into the analyzer and train it to obtain Chinese sentence functional component analysis model C;
Step three: process plain Chinese text with Chinese sentence functional component analysis model C to obtain sentences with functional component labels, and correct these sentences to obtain a revised corpus; convert the revised corpus into a character-granularity form as training data B, and combine training data A with training data B as the final training data;
Step four: feed the final training data into the syntactic functional component analyzer and train it to obtain Chinese sentence functional component analysis model D; test the Chinese sentences to be tested with model D to obtain the test results.
2. The Chinese sentence functional component analysis method according to claim 1, characterized in that in step one, the training corpus CTB5.0 is converted by regular-expression matching into a form carrying functional component labels, the labelled sentences are corrected to obtain a revised corpus, and the revised corpus is converted into a character-granularity form as training data A; the specific process is:
The training corpus CTB5.0 is converted by regular-expression matching into a form carrying functional component labels, the functional component labels covering the subject, predicate, object, adverbial, attributive, complement and head functional components of a sentence; functional component labels that are wrong or missing in the labelled sentences are corrected to obtain the revised corpus;
Directional information is added between the characters inside each word of the revised corpus to generate character-granularity syntax trees, which serve as training data A.
3. The Chinese sentence functional component analysis method according to claim 2, characterized in that the analysis process of the syntactic functional component analyzer in step two is as follows:
Each state consists of a stack and a queue: the stack holds the syntactic functional component tree fragments generated so far, and the queue holds the Chinese characters not yet processed;
In the initial state the stack is empty and the queue contains as many elements as the sentence has Chinese characters;
At each state transition, the averaged perceptron selects an action from a predefined action set.
The action set is shift-separate, shift-append, reduce-unary, reduce-binary, reduce-word, reduce-subword, idle and finish; the averaged perceptron computes the score of each action in the current state, and actions are selected with a beam search strategy;
The averaged perceptron scores each action in the current state as the dot product of a feature vector with the perceptron's weight vector, the feature vector being extracted from the Chinese sentence being analysed according to the defined feature templates;
In the final state the queue is empty and the stack holds a single IP, which is the root node of the syntactic functional component tree; Chinese sentence functional component analysis model C is obtained when training finishes, and decoding yields a complete syntactic functional component tree.
4. a kind of Chinese sentence functional component analysis method according to claim 3, it is characterised in that:Root in the step 3 Functional component analysis is carried out to pure Chinese language text data according to Chinese sentence functional component analysis model C, is obtained with functional component The sentence of label, is modified to the sentence with functional component label, obtains revised language material;Revised language material is turned Form of the chemical conversion based on word granularity, as training data B, is combined training data A with training data B as final training Data;Detailed process is:
Functional component analysis is performed on the raw Chinese text data using the Chinese sentence functional component analysis model C to obtain sentences carrying functional component labels, where the labels cover the subject, predicate, object, adverbial, attributive, complement, and head functional components of the sentence; mislabeled or missing functional components are corrected to obtain the revised corpus; directional information is added between the Chinese characters inside the words of the revised corpus to generate character-granularity syntax trees, which serve as training data B; training data A and training data B are merged to form the final training data.
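The character-granularity conversion can be sketched as a recursive tree rewrite. This sketch assumes that each corrected sentence is a word-level `(label, children)` tree whose leaves are words, that each word's internal structure is binarized left-branching, and that the "directional information" is encoded as a hypothetical `W-L`/`W-R` node label marking which side holds the head character. The functional labels (`SBJ`, `PRED`, `OBJ`), the function names, and the empty `training_data_a` placeholder are all hypothetical; the claim does not give the actual encoding.

```python
def word_to_chars(word, direction="R"):
    """Binarise one non-empty word into a left-branching tree over its characters.
    Internal nodes are tagged "W-L" or "W-R": a hypothetical encoding of
    which side holds the head character."""
    tree = word[0]
    for ch in word[1:]:
        tree = ("W-" + direction, [tree, ch])
    return tree


def to_char_granularity(node, directions):
    """Rewrite a word-level (label, children) tree at character granularity.
    `directions` maps a word to "L" or "R"; unlisted words default to "R"."""
    if isinstance(node, str):  # a leaf is a whole word
        return word_to_chars(node, directions.get(node, "R"))
    label, children = node
    return (label, [to_char_granularity(c, directions) for c in children])


# Hypothetical revised corpus: one corrected sentence with functional labels.
revised_corpus = [
    ("IP", [("SBJ", ["我们"]), ("PRED", ["喜欢"]), ("OBJ", ["北京"])]),
]
training_data_b = [to_char_granularity(t, {"喜欢": "L"}) for t in revised_corpus]
training_data_a = []  # placeholder for the existing training data A
final_training_data = training_data_a + training_data_b
print(final_training_data[0])
```

Merging A and B is plain list concatenation here because both are assumed to share the same tree format.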
5. The Chinese sentence functional component analysis method according to claim 4, characterized in that: in step 4, the final training data are input into the syntactic functional component analyzer for training to obtain the Chinese sentence functional component analysis model D, and the Chinese sentences to be tested are analyzed with model D to obtain the test results. The detailed process is:
The whole functional component analysis process is treated as a sequence of state transitions, which yields the syntactic functional component analyzer; inputting the final training data into the syntactic functional component analyzer for training specifically comprises:
Each state consists of a stack and a queue: the stack stores the syntactic functional component tree fragments generated so far, and the queue stores the Chinese characters that have not yet been processed;
In the initial state, the stack is empty and the number of elements in the queue equals the number of Chinese characters in the sentence;
The action for each state transition is selected by an averaged perceptron from a predefined action set; the defined action set consists of shift-split, shift-attach, reduce-unary, reduce-binary, reduce-word, reduce-subword, pause, and terminate; the averaged perceptron selects an action by computing the score of each action in the current state, using a beam search strategy;
In the final state, the queue is empty and the stack holds only a single IP node, which is the root node of the syntactic functional component tree. The Chinese sentence functional component analysis model D is obtained when training terminates, and a complete syntactic functional component tree is obtained when decoding terminates.
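The transition system described in claims 3 and 5 can be outlined as follows. This is a hedged sketch: `State`, `initial_state`, `is_final`, and `beam_step` are hypothetical names, tree fragments are assumed to be `(label, children)` tuples, and the action names are English renderings of the claimed action set. The claims do not spell out how each action changes the stack and queue, so the usage example drives `beam_step` with toy states (tuples of the actions taken so far) and a fixed score table instead of a real parser; in the claimed method the scores would come from the averaged perceptron.

```python
import heapq
from collections import deque
from dataclasses import dataclass, field

# English renderings of the eight actions named in the claims.
ACTIONS = (
    "SHIFT-SPLIT", "SHIFT-ATTACH", "REDUCE-UNARY", "REDUCE-BINARY",
    "REDUCE-WORD", "REDUCE-SUBWORD", "PAUSE", "TERMINATE",
)


@dataclass
class State:
    # Tree fragments generated so far, as (label, children) tuples (assumed).
    stack: list = field(default_factory=list)
    # Chinese characters that have not been processed yet.
    queue: deque = field(default_factory=deque)


def initial_state(sentence):
    """Empty stack; one queue element per character of the sentence."""
    return State(stack=[], queue=deque(sentence))


def is_final(state):
    """Empty queue and a single IP fragment (the tree root) on the stack."""
    return (len(state.queue) == 0 and len(state.stack) == 1
            and state.stack[0][0] == "IP")


def beam_step(beam, legal_actions, apply_action, score_action, beam_size):
    """Expand each (score, state) pair by every legal action and keep the
    beam_size highest-scoring successors."""
    candidates = []
    for total, state in beam:
        for action in legal_actions(state):
            successor = apply_action(state, action)
            candidates.append((total + score_action(state, action), successor))
    # key= compares scores only, so states never need to be comparable
    return heapq.nlargest(beam_size, candidates, key=lambda c: c[0])


s = initial_state("我爱北京")
print(len(s.stack), len(s.queue), is_final(s))  # 0 4 False

# Toy run: states are tuples of the actions taken so far.
toy_scores = {"SHIFT-SPLIT": 1.0, "SHIFT-ATTACH": 0.5, "PAUSE": -1.0}
beam = beam_step(
    [(0.0, ())],
    legal_actions=lambda st: list(toy_scores),
    apply_action=lambda st, a: st + (a,),
    score_action=lambda st, a: toy_scores[a],
    beam_size=2,
)
print(beam)  # [(1.0, ('SHIFT-SPLIT',)), (0.5, ('SHIFT-ATTACH',))]
```

Repeating `beam_step` until the best item in the beam satisfies `is_final` gives a beam-search decoder; with a beam size of 1 it reduces to greedy action selection.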
CN201710077125.8A 2017-02-13 2017-02-13 Method for analyzing functional components of Chinese sentences Active CN106844348B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710077125.8A CN106844348B (en) 2017-02-13 2017-02-13 Method for analyzing functional components of Chinese sentences

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710077125.8A CN106844348B (en) 2017-02-13 2017-02-13 Method for analyzing functional components of Chinese sentences

Publications (2)

Publication Number Publication Date
CN106844348A 2017-06-13
CN106844348B 2020-01-17

Family

ID=59127414

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710077125.8A Active CN106844348B (en) 2017-02-13 2017-02-13 Method for analyzing functional components of Chinese sentences

Country Status (1)

Country Link
CN (1) CN106844348B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109344244A (en) * 2018-10-29 2019-02-15 山东大学 A neural network relation classification method fusing discriminative information, and its implementation system
CN109460552A (en) * 2018-10-29 2019-03-12 朱丽莉 Rule- and corpus-based method and device for automatically detecting ungrammatical Chinese sentences
WO2019095899A1 (en) * 2017-11-17 2019-05-23 中兴通讯股份有限公司 Material annotation method and apparatus, terminal, and computer readable storage medium
CN110428817A (en) * 2019-08-06 2019-11-08 上海上班族电子商务有限公司 An artificial intelligence-based speech recognition system for garbage classification
CN110472040A (en) * 2019-06-26 2019-11-19 平安科技(深圳)有限公司 Method and device for extracting evaluation information, storage medium, and computer equipment
CN111523302A (en) * 2020-07-06 2020-08-11 成都晓多科技有限公司 Syntax analysis method and device, storage medium and electronic equipment
CN112528641A (en) * 2020-12-10 2021-03-19 北京百度网讯科技有限公司 Method and device for establishing information extraction model, electronic equipment and readable storage medium

Citations (5)

Publication number Priority date Publication date Assignee Title
CN101013421A (en) * 2007-02-02 2007-08-08 清华大学 Rule-based automatic analysis method of Chinese basic block
CN101021842A (en) * 2007-03-09 2007-08-22 清华大学 Automatic learning and extending evolution handling method for Chinese basic block descriptive rule
US20140229159A1 (en) * 2013-02-11 2014-08-14 Appsense Limited Document summarization using noun and sentence ranking
JP2015018146A (en) * 2013-07-12 2015-01-29 株式会社Nttドコモ Function management system and function management method
JP2016110452A (en) * 2014-12-08 2016-06-20 Kddi株式会社 Program, device, and method for updating dictionary of words for which psychological states should be extracted

Cited By (8)

Publication number Priority date Publication date Assignee Title
WO2019095899A1 (en) * 2017-11-17 2019-05-23 中兴通讯股份有限公司 Material annotation method and apparatus, terminal, and computer readable storage medium
CN109344244A (en) * 2018-10-29 2019-02-15 山东大学 A neural network relation classification method fusing discriminative information, and its implementation system
CN109460552A (en) * 2018-10-29 2019-03-12 朱丽莉 Rule- and corpus-based method and device for automatically detecting ungrammatical Chinese sentences
CN109344244B (en) * 2018-10-29 2019-11-08 山东大学 A neural network relation classification method fusing discriminative information, and its implementation system
CN110472040A (en) * 2019-06-26 2019-11-19 平安科技(深圳)有限公司 Method and device for extracting evaluation information, storage medium, and computer equipment
CN110428817A (en) * 2019-08-06 2019-11-08 上海上班族电子商务有限公司 An artificial intelligence-based speech recognition system for garbage classification
CN111523302A (en) * 2020-07-06 2020-08-11 成都晓多科技有限公司 Syntax analysis method and device, storage medium and electronic equipment
CN112528641A (en) * 2020-12-10 2021-03-19 北京百度网讯科技有限公司 Method and device for establishing information extraction model, electronic equipment and readable storage medium

Also Published As

Publication number Publication date
CN106844348B (en) 2020-01-17

Similar Documents

Publication Publication Date Title
CN106844348A (en) A Chinese sentence functional component analysis method
Gardent et al. Creating training corpora for nlg micro-planning
Wilson et al. Recognizing contextual polarity in phrase-level sentiment analysis
CN106096664B (en) A sentiment analysis method based on social network data
CN107291795A (en) A text classification method combining dynamic word embeddings and part-of-speech tagging
CN106294322A (en) An LSTM-based Chinese zero anaphora resolution method
Suleiman et al. The use of hidden Markov model in natural ARABIC language processing: a survey
Hoang et al. Incorporating side information into recurrent neural network language models
CN106126620A (en) Machine learning-based automatic summarization method for Chinese text
CN107122349A (en) A text feature word extraction method based on word2vec and LDA models
CN103365838A (en) Method for automatically correcting syntax errors in English composition based on multivariate features
Singhal et al. Borrow a little from your rich cousin: Using embeddings and polarities of english words for multilingual sentiment classification
CN106446147A (en) Sentiment analysis method based on structured features
Jiang et al. Hierarchical macro discourse parsing based on topic segmentation
Chen et al. Chinese Weibo sentiment analysis based on character embedding with dual-channel convolutional neural network
Zhu et al. Machine Learning‐Based Grammar Error Detection Method in English Composition
CN114781376A (en) News text abstract generation method based on deep learning
Zhao Research and design of automatic scoring algorithm for English composition based on machine learning
Dang Investigations into the role of lexical semantics in word sense disambiguation
Antony et al. A survey of advanced methods for efficient text summarization
Li et al. Community question answering entity linking via leveraging auxiliary data
CN107894977A (en) Vietnamese part-of-speech tagging method combining a part-of-speech conversion and disambiguation model with a dictionary
Takala Word Embeddings for Morphologically Rich Languages.
CN113128199B (en) Word vector generation method based on pre-training language model and multiple word information embedding
Chakkarwar et al. A Review on BERT and Its Implementation in Various NLP Tasks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20210114

Address after: Building 9, accelerator, 14955 Zhongyuan Avenue, Songbei District, Harbin City, Heilongjiang Province

Patentee after: Industrial Technology Research Institute of Heilongjiang Province

Address before: 150001 No. 92 West Dazhi Street, Nangang District, Harbin, Heilongjiang

Patentee before: HARBIN INSTITUTE OF TECHNOLOGY

TR01 Transfer of patent right

Effective date of registration: 20230412

Address after: 150027 Room 412, Unit 1, No. 14955, Zhongyuan Avenue, Building 9, Innovation and Entrepreneurship Plaza, Science and Technology Innovation City, Harbin Hi-tech Industrial Development Zone, Heilongjiang Province

Patentee after: Heilongjiang Industrial Technology Research Institute Asset Management Co.,Ltd.

Address before: Building 9, accelerator, 14955 Zhongyuan Avenue, Songbei District, Harbin City, Heilongjiang Province

Patentee before: Industrial Technology Research Institute of Heilongjiang Province