CN106844333A - Sentence analysis method and system based on semantic and syntactic structure - Google Patents

Info

Publication number
CN106844333A
Authority
CN
China
Prior art keywords
corpus
semantic
model
training
intermediate training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201611183668.XA
Other languages
Chinese (zh)
Inventor
简仁贤
梅森傑
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Intelligent Technology (shanghai) Co Ltd
Original Assignee
Intelligent Technology (shanghai) Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Intelligent Technology (shanghai) Co Ltd
Priority to CN201611183668.XA
Publication of CN106844333A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Abstract

The invention provides a sentence analysis method and system based on semantic and syntactic structure. The method comprises the following steps: Step 1: input an original sentence; Step 2: generate an initial training corpus from the original sentence; Step 3: obtain the manually corrected training corpus and define it as the intermediate training corpus; Step 4: verify the correctness of the annotation of the intermediate training corpus; if the annotation is correct, define the intermediate training corpus as the final training corpus and proceed to step 5; otherwise return to step 3 and repeat; Step 5: feed the final training corpus into the training model. The training corpus a user needs is produced semi-automatically, which improves the efficiency of producing training corpora; models can be trained with a customized training corpus, and the correctness of that corpus can be verified; a visualized semantic role labeling result is provided; and each model can be trained continuously within the same system, improving overall system performance.

Description

Sentence analysis method and system based on semantic and syntactic structure
Technical field
The invention belongs to the field of computer applications, and in particular relates to a sentence analysis method and system based on semantic and syntactic structure.
Background
Natural language contains a large number of descriptions of all kinds of events in human life (as small as a single action, as large as a historical event), together with descriptions of the relations and features among states and events, such as the time and place at which an event occurs, the roles that participate, and the content. With the rise of Internet technologies, people increasingly rely on the network to obtain information, and information on the Internet is massive, rapidly growing, and redundant. To better monitor and use this information, machines must be able to analyze the events in text, so event-oriented sentence analysis research is becoming increasingly important. Sentence analysis means analyzing the function and meaning of each constituent of a sentence, turning the linear word order of the input sentence into a nonlinear data structure.
The main current theories of sentence analysis in natural language processing include dependency grammar and the formal grammar theory developed by Chomsky, namely phrase structure grammar and its extensions, such as lexical-functional grammar, functional unification grammar, generalized phrase structure grammar, and head-driven phrase structure grammar. These methods are built on knowledge of English grammar; they do not approach the sentence from the angle of understanding events, dividing its constituents into events and event roles and analyzing the relations between them. Current research on events concentrates on identifying and extracting events and event roles from text, event-based automatic summarization, and automatic text generation, and all of this work urgently needs the support of the event-structure-based sentence analysis method of the invention.
Semantic role labeling is a core technology in natural language processing. Traditionally, semantic role labeling relies on trained part-of-speech tagging models, dependency parsing models, and the like to resolve the semantic roles in a sentence. However, these models are scattered rather than housed in a single system. In addition, existing semantic role labeling offers only systems whose training is already complete; it cannot provide different kinds of training corpora to meet different user needs, nor does it let users keep improving performance on their own.
Summary of the invention
To address the defects of the prior art, the system of the invention combines multiple models so that training corpora can be produced independently and each model can be refined independently to improve the performance of semantic role labeling.
A sentence analysis method based on semantic and syntactic structure, characterized in that it comprises the following steps:
Step 1: input an original sentence;
Step 2: generate an initial training corpus from the original sentence;
Step 3: obtain the manually corrected training corpus and define it as the intermediate training corpus;
Step 4: verify the correctness of the annotation of the intermediate training corpus; if the annotation is correct, define the intermediate training corpus as the final training corpus and proceed to step 5; otherwise return to step 3 and repeat;
Step 5: feed the final training corpus into the training model.
Principle of the method: the invention lets users produce training corpora themselves and refine each model independently to improve the performance of semantic role labeling. When a user intends to use any sentence as training corpus, the following procedure applies. The original sentence is first input into the current sentence analysis system, which produces a preliminary training corpus. An expert with a linguistics background then manually annotates and revises it, and the correctness of the corpus annotation is verified; if errors are found, the process returns to the manual annotation step. Once confirmed, the final training corpus is fed back into the system, and the model to be trained can be selected, for example the part-of-speech tagging model, the dependency parsing model, or the semantic role labeling model, thereby improving overall system performance.
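The loop described above can be sketched in Python as follows. This is only an illustrative outline under stated assumptions, not the patented implementation: the names `run_pipeline`, `analyze`, `correct`, `validate`, and `train` are hypothetical, each stage is passed in as a plain callable, and the `max_rounds` bound is added here so the sketch cannot loop forever (the text itself sets no limit on correction rounds).

```python
from typing import Any, Callable


def run_pipeline(
    sentence: str,
    analyze: Callable[[str], Any],
    correct: Callable[[Any], Any],
    validate: Callable[[Any], bool],
    train: Callable[[Any], Any],
    max_rounds: int = 10,  # assumption: the source does not bound the loop
) -> Any:
    """Steps 1-5: analyze, correct until the annotation validates, then train."""
    # Step 2: the current system produces the initial training corpus.
    corpus = analyze(sentence)
    for _ in range(max_rounds):
        # Step 3: manual correction yields the intermediate training corpus.
        corpus = correct(corpus)
        # Step 4: only a corpus that passes verification becomes final.
        if validate(corpus):
            # Step 5: feed the final training corpus into the chosen model.
            train(corpus)
            return corpus
    raise RuntimeError("annotation still invalid after max_rounds corrections")
```

In practice `correct` would stand for the expert's editing session and `train` for whichever model the user selects; here they are simply placeholders.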
To better implement the invention, further: the specific steps by which the original sentence produces the initial training corpus are:
Step 2.1: word segmentation;
Step 2.2: part-of-speech tagging;
Step 2.3: dependency parsing;
Step 2.4: semantic role analysis.
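One way to picture steps 2.1 to 2.4 is as a chain in which each stage consumes the output of the previous one. The following is a minimal sketch, assuming hypothetical `Token` and `Annotation` containers and stage functions supplied by the caller; the text does not specify data structures or model interfaces, so all names here are illustrative.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Role = Tuple[str, str, str]  # (role label, predicate, argument)


@dataclass
class Token:
    form: str
    pos: str
    head: int    # 1-based index of the head token; 0 marks the root
    deprel: str


@dataclass
class Annotation:
    tokens: List[Token]
    roles: List[Role] = field(default_factory=list)


def analyze(
    sentence: str,
    segment: Callable[[str], List[str]],
    tag: Callable[[List[str]], List[str]],
    parse: Callable[[List[str], List[str]], List[Tuple[int, str]]],
    label_roles: Callable[[List[Token]], List[Role]],
) -> Annotation:
    """Chain the four stages into one initial training corpus entry."""
    forms = segment(sentence)   # step 2.1: word segmentation
    tags = tag(forms)           # step 2.2: part-of-speech tagging
    arcs = parse(forms, tags)   # step 2.3: dependency parsing -> (head, label)
    if not (len(forms) == len(tags) == len(arcs)):
        raise ValueError("every stage must return one item per token")
    tokens = [Token(f, p, h, r) for f, p, (h, r) in zip(forms, tags, arcs)]
    # Step 2.4: semantic role analysis works on the fully annotated tokens.
    return Annotation(tokens, label_roles(tokens))
```

Passing the stages in as functions mirrors the idea that each model can be retrained or swapped without touching the others.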
Optionally: in step 3, an expert with a linguistics background manually revises and corrects the annotation of the initial training corpus.
Optionally: the specific steps for verifying the correctness of the annotation of the intermediate training corpus in step 4 are:
Step 11: determine whether the number of information fields in the intermediate training corpus is correct; if yes, proceed to step 12; if no, return to step 3 and repeat;
Step 12: determine whether the intermediate training corpus contains a verb; if yes, proceed to step 13; if no, return to step 3 and repeat;
Step 13: determine whether each verb in the intermediate training corpus has a corresponding semantic role annotation; if yes, proceed to step 14; if no, return to step 3 and repeat;
Step 14: determine whether the dependency relation of each segmented word in the intermediate training corpus is linked correctly; if yes, proceed to step 5; if no, return to step 3 and repeat.
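The four checks can be sketched as a single function that stops at the first failure. This is a hedged illustration rather than the patent's own verification logic, and it rests on several assumptions not stated in the text: each token row is taken to have four fields (word, part of speech, head index, dependency label), the tag `v` is taken to mark verbs, and a "correct link" is interpreted as a head index that is in range, not self-referential, with exactly one root (head 0). The name `validate` and the returned reason strings are hypothetical.

```python
from typing import Sequence, Tuple

EXPECTED_FIELDS = 4  # assumed layout: word, part of speech, head index, label


def validate(
    rows: Sequence[Sequence[str]],
    roles: Sequence[Tuple[str, str, str]],  # (role label, predicate, argument)
) -> Tuple[bool, str]:
    """Run checks 11-14 in order; return (passed, reason)."""
    # Step 11: every token row must carry the expected number of fields.
    if not rows or any(len(row) != EXPECTED_FIELDS for row in rows):
        return False, "field count"
    # Step 12: the sentence must contain at least one verb.
    verbs = [row[0] for row in rows if row[1] == "v"]
    if not verbs:
        return False, "no verb"
    # Step 13: every verb must appear as the predicate of some role.
    predicates = {predicate for _, predicate, _ in roles}
    if any(verb not in predicates for verb in verbs):
        return False, "verb without semantic role"
    # Step 14: head indices must be well formed (assumed interpretation).
    heads = []
    for index, row in enumerate(rows, start=1):
        if not row[2].isdecimal():
            return False, "dependency link"
        head = int(row[2])
        if head == index or head > len(rows):
            return False, "dependency link"
        heads.append(head)
    if heads.count(0) != 1:
        return False, "dependency link"
    return True, "ok"
```

A failing result corresponds to returning to step 3; the reason string is only there to tell the annotator which check to revisit.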
Optionally: the training model is a part-of-speech tagging model, a dependency parsing model, or a semantic role labeling model.
A sentence analysis system based on the method of the invention comprises a sentence analysis module for generating the initial training corpus from the original sentence;
and a corpus verification module for verifying the correctness of the annotation of the intermediate training corpus.
Optionally: the sentence analysis module contains a word segmentation model, a part-of-speech tagging model, a dependency parsing model, and a semantic role labeling model.
Optionally: the corpus verification module contains an information field count query model, a verb query model, a semantic role label query model, and a dependency relation checking model.
Beneficial effects of the invention: the training corpus a user needs is produced semi-automatically, which improves the efficiency of producing training corpora; models can be trained with a customized training corpus, and the correctness of that corpus can be verified; a visualized semantic role labeling result is provided; and each model, such as the part-of-speech tagging and dependency parsing models, can be trained continuously within the same system, improving overall system performance.
Brief description of the drawings
Fig. 1 shows a flow chart of the method of the invention;
Fig. 2 shows a flow chart of the implementation process of the invention.
Detailed description of the embodiments
Embodiments of the technical solution of the invention are described in detail below with reference to the accompanying drawings. The following embodiments serve only to illustrate the technical solution more clearly; they are examples and do not limit the scope of protection of the invention.
As shown in Fig. 1 and Fig. 2, a sentence analysis method based on semantic and syntactic structure comprises the following steps:
Step S101: input an original sentence;
Step S102: generate an initial training corpus from the original sentence;
Step S103: obtain the manually corrected training corpus and define it as the intermediate training corpus;
Step S104: verify the correctness of the annotation of the intermediate training corpus; if the annotation is correct, define the intermediate training corpus as the final training corpus and proceed to step S105; otherwise return to step S103 and repeat;
Step S105: feed the final training corpus into the training model.
The specific steps by which the original sentence produces the initial training corpus are:
Step 2.1: word segmentation;
Step 2.2: part-of-speech tagging;
Step 2.3: dependency parsing;
Step 2.4: semantic role analysis.
In addition, the specific steps for verifying the correctness of the annotation of the intermediate training corpus are:
Step 11: determine whether the number of information fields in the intermediate training corpus is correct; if yes, proceed to step 12; if no, return to step S103 and repeat;
Step 12: determine whether the intermediate training corpus contains a verb; if yes, proceed to step 13; if no, return to step S103 and repeat;
Step 13: determine whether each verb in the intermediate training corpus has a corresponding semantic role annotation; if yes, proceed to step 14; if no, return to step S103 and repeat;
Step 14: determine whether the dependency relation of each segmented word in the intermediate training corpus is linked correctly; if yes, proceed to step S105; if no, return to step S103 and repeat.
The training model is a part-of-speech tagging model, a dependency parsing model, or a semantic role labeling model.
In addition, the sentence analysis system based on the method of the invention comprises a sentence analysis module for generating the initial training corpus from the original sentence;
and a corpus verification module for verifying the correctness of the annotation of the intermediate training corpus. The sentence analysis module contains a word segmentation model, a part-of-speech tagging model, a dependency parsing model, and a semantic role labeling model;
the corpus verification module contains an information field count query model, a verb query model, a semantic role label query model, and a dependency relation checking model.
Implementation example of the method, taking the sentence "I like playing basketball" as an example:
The original sentence is input into the sentence analysis system, and the current system's analysis of the sentence is obtained: 1. word segmentation: I / like / play / basketball; 2. part-of-speech tagging: I r / like v / play v / basketball n; 3. dependency parsing: I 2 SBV / like 0 HED / play 2 VOB / basketball 3 VOB; 4. semantic role analysis: agent(like, I), agent(play, I), patient(play, basketball), ATP(like, basketball), AFT(play, basketball). This analysis is the initial training corpus. Its content is then corrected manually in order to optimize the overall system: an expert with a linguistics background manually revises and corrects the annotation of the initial training corpus.
The initial training corpus is handed over for manual correction: ATP(like, basketball) is changed to ATP(like, play basketball), and AFT(play, basketball) is changed to AFT(play, like).
Corpus verification is then performed on the corrected training corpus to check whether the annotation format contains errors; if it does not, the corpus is used to train the sentence analysis system, thereby optimizing the overall system.
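The worked example can be written down as data to make the manual correction step concrete. The sketch below is illustrative only: it uses English glosses for the words of the example sentence (the verb is glossed here as "play"), the tuple layout `(role, predicate, argument)` and the helper `apply_corrections` are assumptions introduced for this illustration, and the role labels ATP and AFT are copied from the text without interpretation.

```python
from typing import Dict, List, Tuple

Role = Tuple[str, str, str]  # (role label, predicate, argument)

# Semantic roles produced by the current system (initial training corpus).
initial_roles: List[Role] = [
    ("agent", "like", "I"),
    ("agent", "play", "I"),
    ("patient", "play", "basketball"),
    ("ATP", "like", "basketball"),
    ("AFT", "play", "basketball"),
]

# The two manual corrections described in the example.
corrections: Dict[Role, Role] = {
    ("ATP", "like", "basketball"): ("ATP", "like", "play basketball"),
    ("AFT", "play", "basketball"): ("AFT", "play", "like"),
}


def apply_corrections(roles: List[Role], fixes: Dict[Role, Role]) -> List[Role]:
    """Replace each role that has a recorded fix; leave the others untouched."""
    return [fixes.get(role, role) for role in roles]


# Intermediate training corpus: same length, two entries revised.
corrected_roles = apply_corrections(initial_roles, corrections)
```

Recording the expert's edits as an explicit mapping keeps the original system output intact, which makes it easy to see exactly what changed before verification.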
Finally, it should be noted that the above embodiments merely illustrate the technical solution of the invention and do not limit it. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in those embodiments may still be modified, or some or all of their technical features may be replaced by equivalents; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the invention, and all of them shall be covered by the claims and the description of the invention.

Claims (8)

1. A sentence analysis method based on semantic and syntactic structure, characterized in that it comprises the following steps:
Step 1: input an original sentence;
Step 2: generate an initial training corpus from the original sentence;
Step 3: obtain the manually corrected training corpus and define it as the intermediate training corpus;
Step 4: verify the correctness of the annotation of the intermediate training corpus; if the annotation is correct, define the intermediate training corpus as the final training corpus and proceed to step 5; otherwise return to step 3 and repeat;
Step 5: feed the final training corpus into the training model.
2. The sentence analysis method based on semantic and syntactic structure according to claim 1, characterized in that the specific steps by which the original sentence produces the initial training corpus are:
Step 2.1: word segmentation;
Step 2.2: part-of-speech tagging;
Step 2.3: dependency parsing;
Step 2.4: semantic role analysis.
3. The sentence analysis method based on semantic and syntactic structure according to claim 1, characterized in that: in step 3, an expert with a linguistics background manually revises and corrects the annotation of the initial training corpus.
4. The sentence analysis method based on semantic and syntactic structure according to claim 1, characterized in that the specific steps for verifying the correctness of the annotation of the intermediate training corpus in step 4 are:
Step 11: determine whether the number of information fields in the intermediate training corpus is correct; if yes, proceed to step 12; if no, return to step 3 and repeat;
Step 12: determine whether the intermediate training corpus contains a verb; if yes, proceed to step 13; if no, return to step 3 and repeat;
Step 13: determine whether each verb in the intermediate training corpus has a corresponding semantic role annotation; if yes, proceed to step 14; if no, return to step 3 and repeat;
Step 14: determine whether the dependency relation of each segmented word in the intermediate training corpus is linked correctly; if yes, proceed to step 5; if no, return to step 3 and repeat.
5. The sentence analysis method based on semantic and syntactic structure according to claim 1, characterized in that: the training model is a part-of-speech tagging model, a dependency parsing model, or a semantic role labeling model.
6. A sentence analysis system based on the method of claim 1, characterized in that it comprises a sentence analysis module for generating the initial training corpus from the original sentence;
and a corpus verification module for verifying the correctness of the annotation of the intermediate training corpus.
7. The sentence analysis system according to claim 6, characterized in that: the sentence analysis module contains a word segmentation model, a part-of-speech tagging model, a dependency parsing model, and a semantic role labeling model.
8. The sentence analysis system according to claim 7, characterized in that: the corpus verification module contains an information field count query model, a verb query model, a semantic role label query model, and a dependency relation checking model.
CN201611183668.XA 2016-12-20 2016-12-20 Sentence analysis method and system based on semantic and syntactic structure Pending CN106844333A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611183668.XA CN106844333A (en) 2016-12-20 2016-12-20 Sentence analysis method and system based on semantic and syntactic structure

Publications (1)

Publication Number Publication Date
CN106844333A (en) 2017-06-13

Family

ID=59140632


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170613
