CN106844333A - Sentence analysis method and system based on semantic and syntactic structure - Google Patents
- Publication number
- CN106844333A (application number CN201611183668.XA)
- Authority
- CN
- China
- Prior art keywords
- corpus
- semantic
- model
- training
- intermediate training
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Abstract
The invention provides a sentence analysis method and system based on semantic and syntactic structure. The method comprises the following steps: Step 1: input an original sentence; Step 2: generate an initial training corpus from the original sentence; Step 3: obtain the manually corrected training corpus and define it as the intermediate training corpus; Step 4: verify the correctness of the annotations in the intermediate training corpus; if the annotations are correct, define the intermediate training corpus as the final training corpus and proceed to Step 5; otherwise return to Step 3 and repeat; Step 5: feed the final training corpus into the model to be trained. The training corpus a user needs is produced semi-automatically, which improves the efficiency of corpus production; a custom training corpus can be used to train the models, and the correctness of the training corpus can be verified; the results of semantic role labeling are visualized; and every model can be trained continuously within the same system, which improves overall system performance.
Description
Technical field
The invention belongs to the field of computer applications, and specifically relates to a sentence analysis method and system based on semantic and syntactic structure.
Background
Natural language contains a large number of descriptions of all kinds of events in human life (as small as a single action, as large as a historical event), together with descriptions of the relationships and features among states and events, such as the time and place at which an event occurs, the roles that take part, and its content. With the rise of Internet-related technologies, people rely more and more on the network to obtain information, while the information on the Internet is massive, rapidly growing, and redundant. To monitor and use this information better, machines must be able to analyze the events in text, so event-oriented sentence analysis research is becoming increasingly important. Sentence analysis means analyzing the function and meaning of each constituent of a sentence, transforming the linear word order of the input sentence into a non-linear data structure.
The main current theories of sentence analysis in natural language processing include dependency grammar and the formal grammar theory developed by Chomsky, namely phrase structure grammar and its extensions, such as lexical-functional grammar, functional unification grammar, generalized phrase structure grammar, and head-driven phrase structure grammar. These approaches are built on knowledge of English grammar; they do not divide the constituents of a sentence into events and event roles, nor do they analyze the relationships between them from the perspective of understanding events. Current research on events focuses mainly on identifying and extracting events and event roles from text, event-based automatic summarization, and automatic text generation, and all of this research urgently needs the support of the event-structure-based sentence analysis method of the invention.
Semantic role labeling is a core technology in natural language processing. Traditionally, semantic role labeling relies on trained part-of-speech tagging models, dependency parsing models, and similar models to resolve the semantic roles in a sentence. However, these models are scattered rather than residing in a single system. In addition, existing semantic role labeling tools only provide systems whose training is already complete; they cannot supply different types of training corpora to meet the differing needs of users, nor can they let users keep improving performance on their own.
Summary of the invention
In view of the defects in the prior art, the system of the invention integrates multiple models, so that users can generate training corpora on their own and can independently refine each model to improve the performance of semantic role labeling.
A sentence analysis method based on semantic and syntactic structure, characterized in that it comprises the following steps:
Step 1: Input an original sentence;
Step 2: Generate an initial training corpus from the original sentence;
Step 3: Obtain the manually corrected training corpus and define it as the intermediate training corpus;
Step 4: Verify the correctness of the annotations in the intermediate training corpus; if the annotations are correct, define the intermediate training corpus as the final training corpus and proceed to Step 5; otherwise return to Step 3 and repeat;
Step 5: Feed the final training corpus into the model to be trained.
Principle of the method: the invention lets users produce training corpora on their own and independently refine each model to improve the performance of semantic role labeling. When a user wants to use any sentence as training data, the following process is carried out. The original sentence is first fed into the current sentence analysis system, which produces a preliminary training corpus. Experts with a linguistics background then annotate and revise it manually, and the correctness of the annotations is verified; if an error is found, the process returns to the manual annotation step. The confirmed final training corpus is fed back into the system, where the model to be trained can be selected, for example the part-of-speech tagging model, the dependency parsing model, or the semantic role labeling model, thereby improving overall system performance.
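The correct-and-verify loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the patent defines no programming interface, so `build_final_corpus`, `train_selected`, the callables passed in, and the `max_rounds` safety cap are all hypothetical names introduced here.

```python
from typing import Callable, Dict, List

# Hypothetical representation: one annotated sentence as a plain dict.
Corpus = dict


def build_final_corpus(
    sentence: str,
    analyze: Callable[[str], Corpus],      # Step 2: analysis by the current system
    correct: Callable[[Corpus], Corpus],   # Step 3: manual correction by an expert
    verify: Callable[[Corpus], bool],      # Step 4: annotation check
    max_rounds: int = 10,                  # safety cap; not part of the patent
) -> Corpus:
    """Run Steps 1 to 4 and return the final training corpus."""
    corpus = analyze(sentence)             # initial training corpus
    for _ in range(max_rounds):
        corpus = correct(corpus)           # intermediate training corpus
        if verify(corpus):
            return corpus                  # accepted as the final training corpus
    raise RuntimeError("annotations still invalid after max_rounds corrections")


def train_selected(
    corpus: Corpus,
    trainers: Dict[str, Callable[[Corpus], None]],
    selected: List[str],
) -> None:
    """Step 5: feed the final corpus to each model the user selected."""
    for name in selected:
        trainers[name](corpus)
```

Passing the analysis, correction, and verification stages in as callables keeps the loop independent of any particular tagger or parser, which matches the text's point that each model can be retrained separately.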
To better implement the invention, the specific steps for generating the initial training corpus from the original sentence may further be:
Step 2.1: Word segmentation;
Step 2.2: Part-of-speech tagging;
Step 2.3: Dependency parsing;
Step 2.4: Semantic role analysis.
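As a rough sketch of how the four sub-steps could feed one another, the following Python chains them into a single record. The `Token` and `AnnotatedSentence` types and the four stage callables are assumptions made for illustration; the patent does not specify data structures or model interfaces.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Token:
    form: str            # the segmented word
    pos: str = ""        # part-of-speech tag
    head: int = -1       # 1-based index of the head word, 0 for the root
    deprel: str = ""     # dependency relation label


@dataclass
class AnnotatedSentence:
    text: str
    tokens: List[Token] = field(default_factory=list)
    # Each role is stored as (label, predicate, argument); hypothetical layout.
    roles: List[Tuple[str, str, str]] = field(default_factory=list)


def make_initial_corpus(
    text: str,
    segment: Callable,      # Step 2.1: text -> list of words
    tag: Callable,          # Step 2.2: words -> list of tags
    parse: Callable,        # Step 2.3: (words, tags) -> list of (head, relation)
    label_roles: Callable,  # Step 2.4: AnnotatedSentence -> list of roles
) -> AnnotatedSentence:
    words = segment(text)
    tags = tag(words)
    arcs = parse(words, tags)
    tokens = [Token(w, t, h, r) for w, t, (h, r) in zip(words, tags, arcs)]
    sentence = AnnotatedSentence(text, tokens)
    sentence.roles = label_roles(sentence)
    return sentence
```

Each stage receives the output of the previous ones, so a corrected part-of-speech tagger or parser can be swapped in without touching the other stages.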
Optionally, in Step 3, an expert with a linguistics background manually revises and corrects the annotations of the initial training corpus.
Optionally, the specific steps for verifying the correctness of the intermediate training corpus annotations in Step 4 are:
Step 11: Determine whether the number of information fields in the intermediate training corpus is correct; if yes, proceed to Step 12; if no, return to Step 3 and repeat;
Step 12: Determine whether the intermediate training corpus contains a verb; if yes, proceed to Step 13; if no, return to Step 3 and repeat;
Step 13: Determine whether the verb in the intermediate training corpus has a corresponding semantic role annotation; if yes, proceed to Step 14; if no, return to Step 3 and repeat;
Step 14: Determine whether the dependency relation of each segmented word in the intermediate training corpus is correctly linked; if yes, proceed to Step 5; if no, return to Step 3 and repeat.
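A minimal sketch of Steps 11 to 14 follows. The patent does not say how many fields a row has, how verbs are tagged, or what counts as a correct link, so this sketch assumes four fields per word (form, tag, head index, relation), assumes the verb tag `v` seen in the worked example, requires every verb to be the predicate of at least one role, and treats a link as valid when the head index is in range, not the word itself, and exactly one word is the root. All names are hypothetical.

```python
from typing import List, Tuple

Row = List[str]              # assumed layout: [form, pos, head, relation]
Role = Tuple[str, str, str]  # assumed layout: (label, predicate, argument)
EXPECTED_FIELDS = 4          # assumption; the patent gives no number
VERB_TAG = "v"               # assumption, following the tag in the worked example


def first_failed_check(rows: List[Row], roles: List[Role]) -> int:
    """Return 0 if every check passes, else the number (11-14) of the first failing step."""
    # Step 11: every row carries the expected number of fields.
    if not rows or any(len(r) != EXPECTED_FIELDS for r in rows):
        return 11
    # Step 12: the sentence contains at least one verb.
    verbs = [r[0] for r in rows if r[1] == VERB_TAG]
    if not verbs:
        return 12
    # Step 13: every verb is the predicate of at least one semantic role.
    predicates = {pred for _, pred, _ in roles}
    if any(v not in predicates for v in verbs):
        return 13
    # Step 14: heads are integers in range, no word heads itself, exactly one root.
    n = len(rows)
    heads = []
    for i, r in enumerate(rows, start=1):
        if not r[2].isdigit():
            return 14
        h = int(r[2])
        if h > n or h == i:
            return 14
        heads.append(h)
    if heads.count(0) != 1:
        return 14
    return 0
```

A non-zero result corresponds to the "return to Step 3" branch. Returning the step number rather than a bare boolean would let an annotation tool tell the expert which check failed. The sketch does not detect cycles among heads, which a stricter link check might add.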
Optionally, the model to be trained is a part-of-speech tagging model, a dependency parsing model, or a semantic role labeling model.
A sentence analysis system based on the method of the invention comprises a sentence analysis module, for generating the initial training corpus from the original sentence;
and a corpus verification module, for verifying the correctness of the intermediate training corpus annotations.
Optionally, the sentence analysis module contains a word segmentation model, a part-of-speech tagging model, a dependency parsing model, and a semantic role labeling model.
Optionally, the corpus verification module contains an information field count query model, a verb query model, a semantic role annotation query model, and a dependency verification model.
The beneficial effects of the invention are as follows: the training corpus a user needs is produced semi-automatically, which improves the efficiency of corpus production; a custom training corpus can be used to train the models, and the correctness of the training corpus can be verified; the results of semantic role labeling are visualized; and every model, such as the part-of-speech tagging model and the dependency parsing model, can be trained continuously within the same system, which improves overall system performance.
Brief description of the drawings
Fig. 1 shows a flow chart of the method of the invention;
Fig. 2 shows a flow chart of the implementation process of the invention.
Detailed description of the embodiments
Embodiments of the technical solution of the invention are described in detail below with reference to the accompanying drawings. The following embodiments serve only to illustrate the technical solution more clearly; they are examples only and cannot be used to limit the scope of protection of the invention.
As shown in Fig. 1 and Fig. 2, a sentence analysis method based on semantic and syntactic structure comprises the following steps:
Step S101: Input an original sentence;
Step S102: Generate an initial training corpus from the original sentence;
Step S103: Obtain the manually corrected training corpus and define it as the intermediate training corpus;
Step S104: Verify the correctness of the annotations in the intermediate training corpus; if the annotations are correct, define the intermediate training corpus as the final training corpus and proceed to Step S105; otherwise return to Step S103 and repeat;
Step S105: Feed the final training corpus into the model to be trained.
The specific steps for generating the initial training corpus from the original sentence are:
Step 2.1: Word segmentation;
Step 2.2: Part-of-speech tagging;
Step 2.3: Dependency parsing;
Step 2.4: Semantic role analysis.
In addition, the specific steps for verifying the correctness of the intermediate training corpus annotations are:
Step 11: Determine whether the number of information fields in the intermediate training corpus is correct; if yes, proceed to Step 12; if no, return to Step S103 and repeat;
Step 12: Determine whether the intermediate training corpus contains a verb; if yes, proceed to Step 13; if no, return to Step S103 and repeat;
Step 13: Determine whether the verb in the intermediate training corpus has a corresponding semantic role annotation; if yes, proceed to Step 14; if no, return to Step S103 and repeat;
Step 14: Determine whether the dependency relation of each segmented word in the intermediate training corpus is correctly linked; if yes, proceed to Step S105; if no, return to Step S103 and repeat.
The model to be trained is a part-of-speech tagging model, a dependency parsing model, or a semantic role labeling model.
In addition, the sentence analysis system based on the method of the invention comprises a sentence analysis module, for generating the initial training corpus from the original sentence;
and a corpus verification module, for verifying the correctness of the intermediate training corpus annotations. The sentence analysis module contains a word segmentation model, a part-of-speech tagging model, a dependency parsing model, and a semantic role labeling model;
the corpus verification module contains an information field count query model, a verb query model, a semantic role annotation query model, and a dependency verification model.
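One way to picture the two modules is as small containers for their sub-models. The sketch below is an assumption about structure only: the class names, the use of plain callables for models, and the `first_rejection` method are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional, Tuple

Check = Callable[[Any], bool]


@dataclass
class SentenceAnalysisModule:
    """Holds the four models named in the text; each is a plain callable here."""
    segmenter: Callable
    pos_tagger: Callable
    dependency_parser: Callable
    role_labeler: Callable

    def trainable(self) -> Dict[str, Callable]:
        # The three models the text lists as possible training targets.
        return {
            "pos": self.pos_tagger,
            "dependency": self.dependency_parser,
            "srl": self.role_labeler,
        }


@dataclass
class CorpusVerificationModule:
    """Runs its checks in order and reports the first one that rejects the corpus."""
    checks: List[Tuple[str, Check]]

    def first_rejection(self, corpus: Any) -> Optional[str]:
        for name, check in self.checks:
            if not check(corpus):
                return name
        return None
```

Keeping the checks as an ordered list of named callables mirrors the fixed order of the four verification sub-models while leaving each one replaceable.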
Implementation of the method, taking the sentence "I like playing basketball" as an example:
The original sentence is first fed into the sentence analysis system to obtain the current system's analysis of the sentence:
1. Word segmentation: I / like / play / basketball;
2. Part-of-speech tagging: I r / like v / play v / basketball n;
3. Dependency parsing: I 2 SBV / like 0 HED / play 2 VOB / basketball 3 VOB;
4. Semantic role analysis: agent(like, I), agent(play, I), patient(play, basketball), ATP(like, basketball), AFT(play, basketball).
This analysis is the initial training corpus. Its content is then corrected manually in order to optimize the overall system: an expert with a linguistics background manually revises and corrects the annotations of the initial training corpus.
The initial training corpus is handed over for manual correction: ATP(like, basketball) is changed to ATP(like, play basketball), and AFT(play, basketball) is changed to AFT(play, like).
The corrected training corpus then undergoes corpus verification, which checks the annotation format for errors; if there are none, the corpus is used to train the sentence analysis system, thereby optimizing the overall system.
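The example can be written down as data to make the two manual corrections explicit. The sketch below uses the English glosses I, like, play, and basketball for the four segmented words, which is an assumption about the translation; the tuple layouts and the `apply_corrections` helper are hypothetical, while the tags, head indices, and role labels are copied from the example.

```python
from typing import Dict, List, Tuple

# (label, first argument, second argument), in the order printed in the example.
Role = Tuple[str, str, str]

# Initial analysis: (index, form, part of speech, head index, relation).
tokens = [
    (1, "I", "r", 2, "SBV"),
    (2, "like", "v", 0, "HED"),
    (3, "play", "v", 2, "VOB"),
    (4, "basketball", "n", 3, "VOB"),
]
initial_roles: List[Role] = [
    ("agent", "like", "I"),
    ("agent", "play", "I"),
    ("patient", "play", "basketball"),
    ("ATP", "like", "basketball"),
    ("AFT", "play", "basketball"),
]

# The two manual corrections from the example, keyed by the role they replace.
corrections: Dict[Role, Role] = {
    ("ATP", "like", "basketball"): ("ATP", "like", "play basketball"),
    ("AFT", "play", "basketball"): ("AFT", "play", "like"),
}


def apply_corrections(roles: List[Role], fixes: Dict[Role, Role]) -> List[Role]:
    """Return a new role list with each listed role replaced; others are kept."""
    return [fixes.get(role, role) for role in roles]


corrected_roles = apply_corrections(initial_roles, corrections)
# Pairs of (before, after) for the roles the expert changed.
changed = [(old, new) for old, new in zip(initial_roles, corrected_roles) if old != new]
```

Only the two ATP and AFT entries differ between the initial and corrected lists; segmentation, tags, and dependency arcs are left as the system produced them.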
Finally, it should be noted that the above embodiments merely illustrate the technical solution of the invention and do not limit it. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in those embodiments may still be modified, or some or all of their technical features may be replaced by equivalents; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the invention, and they shall all be covered by the scope of the claims and the specification of the invention.
Claims (8)
1. A sentence analysis method based on semantic and syntactic structure, characterized in that it comprises the following steps:
Step 1: Input an original sentence;
Step 2: Generate an initial training corpus from the original sentence;
Step 3: Obtain the manually corrected training corpus and define it as the intermediate training corpus;
Step 4: Verify the correctness of the annotations in the intermediate training corpus; if the annotations are correct, define the intermediate training corpus as the final training corpus and proceed to Step 5; otherwise return to Step 3 and repeat;
Step 5: Feed the final training corpus into the model to be trained.
2. The sentence analysis method based on semantic and syntactic structure according to claim 1, characterized in that the specific steps for generating the initial training corpus from the original sentence are:
Step 2.1: Word segmentation;
Step 2.2: Part-of-speech tagging;
Step 2.3: Dependency parsing;
Step 2.4: Semantic role analysis.
3. The sentence analysis method based on semantic and syntactic structure according to claim 1, characterized in that, in Step 3, an expert with a linguistics background manually revises and corrects the annotations of the initial training corpus.
4. The sentence analysis method based on semantic and syntactic structure according to claim 1, characterized in that the specific steps for verifying the correctness of the intermediate training corpus annotations in Step 4 are:
Step 11: Determine whether the number of information fields in the intermediate training corpus is correct; if yes, proceed to Step 12; if no, return to Step 3 and repeat;
Step 12: Determine whether the intermediate training corpus contains a verb; if yes, proceed to Step 13; if no, return to Step 3 and repeat;
Step 13: Determine whether the verb in the intermediate training corpus has a corresponding semantic role annotation; if yes, proceed to Step 14; if no, return to Step 3 and repeat;
Step 14: Determine whether the dependency relation of each segmented word in the intermediate training corpus is correctly linked; if yes, proceed to Step 5; if no, return to Step 3 and repeat.
5. The sentence analysis method based on semantic and syntactic structure according to claim 1, characterized in that the model to be trained is a part-of-speech tagging model, a dependency parsing model, or a semantic role labeling model.
6. A sentence analysis system based on the method of claim 1, characterized in that it comprises a sentence analysis module, for generating the initial training corpus from the original sentence;
and a corpus verification module, for verifying the correctness of the intermediate training corpus annotations.
7. The sentence analysis system according to claim 6, characterized in that the sentence analysis module contains a word segmentation model, a part-of-speech tagging model, a dependency parsing model, and a semantic role labeling model.
8. The sentence analysis system according to claim 7, characterized in that the corpus verification module contains an information field count query model, a verb query model, a semantic role annotation query model, and a dependency verification model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611183668.XA CN106844333A (en) | 2016-12-20 | 2016-12-20 | A kind of statement analytical method and system based on semantic and syntactic structure |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106844333A true CN106844333A (en) | 2017-06-13 |
Family
ID=59140632
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611183668.XA Pending CN106844333A (en) | 2016-12-20 | 2016-12-20 | A kind of statement analytical method and system based on semantic and syntactic structure |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106844333A (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103150381A (en) * | 2013-03-14 | 2013-06-12 | 北京理工大学 | High-precision Chinese predicate identification method |
CN105740235A (en) * | 2016-01-29 | 2016-07-06 | 昆明理工大学 | Phrase tree to dependency tree transformation method capable of combining Vietnamese grammatical features |
Non-Patent Citations (1)
Title |
---|
魏莉等: "汉语句法树库一致性验证方法研究", 《广西师范大学学报:自然科学版》 * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111566637A (en) * | 2018-02-01 | 2020-08-21 | 国际商业机器公司 | Dynamically building and configuring a session proxy learning model |
US11886823B2 (en) | 2018-02-01 | 2024-01-30 | International Business Machines Corporation | Dynamically constructing and configuring a conversational agent learning model |
CN114462387A (en) * | 2022-02-10 | 2022-05-10 | 北京易聊科技有限公司 | Sentence pattern automatic discrimination method under no-label corpus |
CN114462387B (en) * | 2022-02-10 | 2022-09-02 | 北京易聊科技有限公司 | Sentence pattern automatic discrimination method under no-label corpus |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20170613 |