CN109992651A - Method for automatic identification and extraction of question target features
- Publication number
- CN109992651A (CN201910192494.0A)
- Authority
- CN
- China
- Prior art keywords
- chain
- sample
- analyzed
- candidate
- dependence
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Abstract
The invention discloses a method for automatic identification and extraction of question target features. The method comprises: generating at least one dependency relation chain sample according to a preset generation rule; matching, according to a preset matching rule, candidate dependency relation chains for the question text to be analyzed from the at least one dependency relation chain sample; and screening out, according to a preset screening rule, the question target word of the question text to be analyzed from the candidate dependency relation chains. The effect of the invention is that the question target word identification method it provides identifies, from the text to be analyzed, the words related to the user's questioning intent, so that the user's intent can be identified more accurately.
Description
Technical field
Embodiments of the present invention relate to the field of computer technology, and in particular to a method for automatic identification and extraction of question target features.
Background art
With the development of artificial intelligence and big data technologies, question answering systems, as an advanced application form of information retrieval systems, are now widely used in fields such as professional services, education, and daily life. Precisely identifying and classifying the target of the question a user asks directly affects how accurately a question answering system identifies the user's information need, how well it filters candidate answers, and how satisfied the user is with the answers.
However, the candidate answers provided by existing question answering systems are not accurate enough. The main reason is that existing systems identify and classify the question topic by screening the topic words of the user's question, and overlook the importance of identifying and classifying the question target. Identifying and classifying the question topic is not the same as identifying and classifying the question target: the question topic concerns the main object described by the question content, whereas the question target concerns the type of answer the user expects. A question answering system that only screens the topic words of the user's question to identify and classify the question topic therefore cannot give an accurate answer that addresses the intent behind the question.
Summary of the invention
The present invention aims to solve at least one of the above technical problems, at least to some extent.
To this end, a first object of the present invention is to propose a method for automatic identification and extraction of question target features. In any background context, the method can identify, from the text of a user's question, the question target feature words that reflect the user's intent, and then use those question target feature words to search massive data for information related to the question target features, so as to provide the user with a more accurate and satisfactory answer.
A first aspect of the present invention provides a method for identification and extraction of question target features, wherein the method comprises:
generating at least one dependency relation chain sample according to a preset generation rule;
matching, according to a preset matching rule, candidate dependency relation chains that match the question text to be analyzed from the at least one dependency relation chain sample;
screening out the question target word of the question text to be analyzed from the candidate dependency relation chains according to a preset screening rule.
Optionally, generating at least one dependency relation chain sample according to the preset generation rule comprises:
annotating, in the question sample texts, the question target feature words and the words related to the question target feature words, to form an annotated data set;
annotating the part of speech of the question target feature words in the question sample texts, to form a part-of-speech sample set;
performing syntactic analysis on the annotated data set using a syntactic parsing algorithm, to generate at least one syntactic dependency relation chain sample;
performing semantic analysis on the annotated data set using a semantic dependency algorithm, to generate at least one semantic dependency relation chain sample.
Optionally, matching, according to the preset matching rule, the candidate dependency relation chains that match the question text to be analyzed from the at least one dependency relation chain sample comprises:
screening out, according to a preset frequency value, the syntactic dependency relation chain samples whose frequency is greater than the preset frequency value from the at least one syntactic dependency relation chain sample;
generating, based on the screened-out syntactic dependency relation chain samples, a main syntactic dependency relation chain sample set for the question text to be analyzed, the main syntactic dependency relation chain sample set including at least one main syntactic dependency relation chain sample;
performing syntactic analysis on the question text to be analyzed, to generate a syntactic dependency relation chain set of the question text to be analyzed, the syntactic dependency relation chain set including at least one syntactic dependency relation chain;
comparing the syntactic dependency relation chain set with the main syntactic dependency relation chain sample set, and screening out at least one shared syntactic dependency relation chain;
generating, based on the at least one shared syntactic dependency relation chain, at least one candidate syntactic dependency relation chain of the question text to be analyzed.
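The matching steps above can be illustrated with a minimal sketch. It assumes that each chain is stored as a (first word, relation, last word) tuple and that sample frequencies are already available as fractions; the function names, the frequency numbers, and the `POB` relation in the second question chain are hypothetical and are not taken from the patent.

```python
def main_samples(sample_frequencies, preset_frequency):
    """Keep the chain samples whose frequency exceeds the preset frequency value."""
    return {chain for chain, freq in sample_frequencies.items() if freq > preset_frequency}


def candidate_chains(question_chains, main_sample_set):
    """Return the chains of the question that are shared with the main sample set."""
    return [chain for chain in question_chains if chain in main_sample_set]


# Hypothetical chain samples as (first word, relation, last word) with made-up frequencies.
sample_frequencies = {
    ("where", "ADV", "rent a house"): 0.75,
    ("where", "ADV", "can"): 0.40,
}
main_set = main_samples(sample_frequencies, preset_frequency=0.7)

# Hypothetical chains produced by parsing the question text to be analyzed.
question_chains = [("where", "ADV", "rent a house"), ("Fanyu District", "POB", "in")]
candidates = candidate_chains(question_chains, main_set)
print(candidates)
```

Exact tuple equality is used here only to keep the comparison simple; a real system might instead compare chains by relation pattern.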
Optionally, the method further comprises:
screening out, according to a preset frequency value, the semantic dependency relation chain samples whose frequency is greater than the preset frequency value from the at least one semantic dependency relation chain sample;
generating, based on the screened-out semantic dependency relation chain samples, a main semantic dependency relation chain sample set for the question text to be analyzed, the main semantic dependency relation chain sample set including at least one main semantic dependency relation chain sample.
Optionally, the method further comprises:
performing semantic analysis on the question text to be analyzed, to generate a semantic dependency relation chain set of the question text to be analyzed, the semantic dependency relation chain set including at least one semantic dependency relation chain;
comparing the semantic dependency relation chain set with the main semantic dependency relation chain sample set, and screening out at least one shared semantic dependency relation chain;
generating, based on the at least one shared semantic dependency relation chain, at least one candidate semantic dependency relation chain of the question text to be analyzed.
Optionally, screening out the question target word of the question text to be analyzed from the candidate dependency relation chains according to the preset screening rule comprises:
extracting the first node word of each candidate syntactic dependency relation chain in the at least one candidate syntactic dependency relation chain, to generate the candidate question target words of the question text to be analyzed;
comparing the candidate question target words with a question target word sample set, to generate the question target word of the question text to be analyzed.
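The screening step above can be sketched as follows. The sketch assumes chains are (first word, relation, last word) tuples and that the question target word sample set is a plain set of strings; the function names and all sample data, including the `ATT` chain, are hypothetical.

```python
def first_node_words(candidate_chains):
    """Collect the first node word of each candidate chain, in order, without duplicates."""
    words = []
    for first_word, _relation, _last_word in candidate_chains:
        if first_word not in words:
            words.append(first_word)
    return words


def question_target_words(candidate_chains, target_word_samples):
    """Keep the candidate words that also appear in the target word sample set."""
    return [w for w in first_node_words(candidate_chains) if w in target_word_samples]


# Hypothetical candidate chains and annotated target word samples.
candidates = [
    ("where", "ADV", "rent a house"),
    ("where", "ADV", "can"),
    ("Fanyu District", "ATT", "house"),
]
target_word_samples = {"where", "who", "how much"}
print(question_target_words(candidates, target_word_samples))
```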
Optionally, the method further comprises:
extracting the first node word of each candidate semantic dependency relation chain in the at least one candidate semantic dependency relation chain, to generate at least one candidate question target word of the question text to be analyzed;
comparing the part of speech of the at least one candidate question target word with the part-of-speech sample set, to generate the question target word of the question text to be analyzed.
Optionally, the method further comprises:
extracting the words shared among the at least one candidate semantic dependency relation chain, and labeling the shared words as the question target word of the question text to be analyzed.
It should be noted that the "question target word" referred to in the present invention means the key information and key data that reflect the questioner's intent.
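The optional shared-word step can be read as a set intersection over the words of the candidate chains. The following is a minimal sketch under that reading, assuming each chain is given as a list of its words; the function name and the two sample chains are hypothetical.

```python
def shared_words(chains):
    """Return the words that occur in every candidate chain."""
    if not chains:
        return set()
    common = set(chains[0])
    for chain in chains[1:]:
        common &= set(chain)
    return common


# Hypothetical candidate chains, each reduced to the words it contains.
chains = [
    ["where", "can", "Fanyu District"],
    ["where", "rent a house"],
]
print(shared_words(chains))
```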
A second aspect of the present invention provides a question target word identification device, the device comprising:
at least one storage unit;
a processing unit coupled to the at least one storage unit;
wherein the at least one storage unit is configured to store computer instructions;
the processing unit is configured to call the computer instructions to execute the method of the first aspect of the present invention.
A third aspect of the present invention provides a computer storage medium storing computer instructions which, when called, are used to execute the method of the first aspect of the present invention.
Compared with the prior art, the present invention has the following beneficial effects:
Existing question answering systems extract from the question text the keyword information related to the user's intent, but the keyword information they extract reflects the topic information of the whole question text. Although topic information can, under certain conditions, contain the user's intent information, this happens unreliably and with low probability. Furthermore, the training samples of the prior art are formed from the topic word information of questions rather than from the user intent information in the questions, so analyzing a question with samples formed from topic word information cannot address the user's intent.
By contrast, the present invention is based on manually annotated question target feature words, which reflect the user's intent in the question text, and their parts of speech. It performs dependency analysis on the question text to generate the dependency relation chain set of the question text, then finds the candidate dependency relation chains of the question text in that chain set, and finally extracts the question target word of the question text from the candidate dependency relation chains. This ensures that the extracted question target word is formed with respect to the user's intent information, that is, the question target word reflects the user's questioning intent to the greatest extent.
It should be noted that the method for automatic identification and extraction of question target features provided by the present invention can be used not only in artificial intelligence customer service question answering systems, but also in question systems under search application scenarios or in user requirement analysis systems. For example, applying the technical solution of the present invention in a user requirement analysis system can improve the accuracy of user requirement analysis. Although the technical solution of the present invention is proposed in the application scenario of a search question system, this does not limit the application scenarios of the technical solution in any way. Those skilled in the art will understand that the technical solution of the present invention can be used in a variety of scenarios.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a method for automatic identification and extraction of question target features provided by Embodiment 1 of the present invention;
Fig. 2 is a schematic flowchart of a method for automatic identification and extraction of question target features provided by Embodiment 2 of the present invention;
Fig. 3 is a schematic flowchart of a method for automatic identification and extraction of question target features provided by Embodiment 3 of the present invention;
Fig. 4a is a schematic diagram of the syntactic dependency relation tree provided by Embodiment 2 of the present invention;
Fig. 4b is a schematic diagram of the semantic dependency relation tree provided by Embodiment 3 of the present invention;
Fig. 5a is a part-of-speech tagging schematic diagram of an embodiment of the present invention;
Fig. 5b is a syntactic dependency relation schematic diagram of an embodiment of the present invention;
Fig. 5c is a semantic dependency relation schematic diagram of an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of a question target word identification device provided by Embodiment 6 of the present invention.
Detailed description of the embodiments
To enable those skilled in the art to better understand the solution of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
The terms "first", "second", and the like in the specification, the claims, and the above drawings are used to distinguish different objects, not to describe a particular order. In addition, the terms "include" and "have" and any variants thereof are intended to cover non-exclusive inclusion. For example, a process, method, apparatus, product, or device that contains a series of steps or units is not limited to the listed steps or units, but optionally also includes steps or units that are not listed, or optionally also includes other steps or units inherent to the process, method, product, or device.
Reference herein to an "embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present invention. The appearance of the phrase in various places in the specification does not necessarily refer to the same embodiment, nor to a separate or alternative embodiment that is mutually exclusive of other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.
To facilitate understanding of the technical solutions provided by the embodiments of the present invention, the application scenarios involved in the embodiments are introduced below.
With the development of artificial intelligence and big data technologies, question answering systems, as an advanced application form of information retrieval systems, are now widely used in fields such as professional services, education, and daily life. Precisely identifying and classifying the target of the question a user asks directly affects how accurately a question answering system identifies the user's information need, how well it filters candidate answers, and how satisfied the user is with the answers.
The present invention proposes a method for automatic identification and extraction of question target features, which can extract from the question text a more accurate question target word that expresses the user's questioning intent.
The embodiments of the present invention are described below with reference to the drawings.
Referring to Fig. 1, Embodiment 1 of the present invention provides a method for automatic identification and extraction of question target features. The method may be executed by a device for automatic identification and extraction of question target features. As shown in Fig. 1, the method includes at least the following steps.
Step S101: generate at least one dependency relation chain sample according to a preset generation rule.
Step S102: match, according to a preset matching rule, the candidate dependency relation chains that match the question text to be analyzed from the at least one dependency relation chain sample.
Step S103: screen out the question target word of the question text to be analyzed from the candidate dependency relation chains according to a preset screening rule.
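Steps S101 to S103 can be viewed as three pluggable rules applied in sequence. The skeleton below is only an illustration of that structure: the function `identify_target_words` and the three toy rules passed to it are hypothetical placeholders, not the preset rules of the patent.

```python
def identify_target_words(question_text, generate_rule, match_rule, screen_rule):
    """Run the three steps in order with caller-supplied rules."""
    samples = generate_rule()                         # S101: chain samples
    candidates = match_rule(question_text, samples)   # S102: candidate chains
    return screen_rule(candidates)                    # S103: question target words


# Toy rules, for illustration only.
def toy_generate():
    return [("where", "ADV", "rent a house"), ("who", "SBV", "pay")]


def toy_match(text, samples):
    tokens = text.lower().split()
    return [chain for chain in samples if chain[0] in tokens]


def toy_screen(candidates):
    return [chain[0] for chain in candidates]


result = identify_target_words(
    "Where can I rent a house in Fanyu District", toy_generate, toy_match, toy_screen
)
print(result)
```

Keeping the rules as parameters mirrors the wording "preset rule" in each step, so each rule can be replaced without touching the overall flow.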
Referring to Fig. 2, Embodiment 2 of the present invention provides another method for automatic identification and extraction of question target features, which includes the following steps.
Step S201: annotate, in the question sample texts, the question target feature words and the words related to the question target feature words, to form an annotated data set.
Step S202: annotate the part of speech of the question target feature words in the question sample texts, to form a part-of-speech sample set.
Step S203: perform syntactic analysis on the annotated data set, to generate at least one syntactic dependency relation chain sample.
Step S204: screen out, according to a preset frequency value, the syntactic dependency relation chain samples whose frequency is greater than the preset frequency value from the at least one syntactic dependency relation chain sample.
Step S205: generate, based on the screened-out syntactic dependency relation chain samples, a main syntactic dependency relation chain sample set for the question text to be analyzed, the main syntactic dependency relation chain sample set including at least one main syntactic dependency relation chain sample.
Step S206: perform syntactic analysis on the question text to be analyzed, to generate a syntactic dependency relation chain set of the question text to be analyzed, the syntactic dependency relation chain set including at least one syntactic dependency relation chain.
Step S207: compare the syntactic dependency relation chain set with the main syntactic dependency relation chain sample set, and screen out at least one shared syntactic dependency relation chain.
Step S208: generate, based on the at least one shared syntactic dependency relation chain, at least one candidate syntactic dependency relation chain of the question text to be analyzed.
Step S209: extract the first node word of each candidate syntactic dependency relation chain in the at least one candidate syntactic dependency relation chain, to generate the candidate question target words of the question text to be analyzed.
Step S210: compare the candidate question target words with the question target word sample set, to generate the question target word of the question text to be analyzed.
Syntactic dependency parsing (Dependency Parsing, DP) reveals the syntactic structure of a linguistic unit by analyzing the dependency relations between its components. Intuitively, dependency parsing identifies grammatical components in a sentence such as subject, predicate, and object, or attributive, adverbial, and complement, and analyzes the relations between these components. Fig. 5b is the syntactic dependency relation schematic diagram provided by an embodiment of the present invention.
Optionally, after the training question text is obtained, it is segmented into words. For the training question text "Where can I rent a house in Fanyu District", segmentation produces a question text word set, which is {"in", "Fanyu District", "where", "can", "rent a house"}.
For example, the training question text "Where can I rent a house in Fanyu District" is obtained, and a syntactic dependency algorithm is used to perform syntactic analysis on it, generating the syntactic dependency relation tree of the at least one piece of training question text data. The syntactic dependency relation tree is shown in Fig. 4a. Based on the syntactic dependency relation tree, the syntactic dependency relations related to the question target feature word of the training question text are selected, namely the syntactic relation between the word "where" and the word "rent a house" and between the word "where" and the word "can". Then, based on these syntactic dependency relations, syntactic dependency relation chains are defined. A syntactic dependency relation chain is expressed as LinkSD{"wi ← Dk ← wj"}, where "wi" denotes the first node word of the syntactic dependency relation chain, "wj" denotes the end node word of the syntactic dependency relation chain, and "Dk" denotes the syntactic relation between the word "wi" and the word "wj". Accordingly, the syntactic dependency relations between the word "where" and the word "rent a house" and between the word "where" and the word "can" can be defined as the syntactic dependency relation chains LinkD1{"where ← ADV ← rent a house"} and LinkD2{"where ← ADV ← can"}. That is, the syntactic dependency relation chain samples generated from the training question text are LinkD1{"where ← ADV ← rent a house"} and LinkD2{"where ← ADV ← can"}.
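As an illustration of the chain notation "wi ← Dk ← wj", the sketch below stores a chain as a small record and renders it in that notation. The class name `SyntaxChain`, the helper `chains_for_target`, and the input triples are hypothetical; in particular, the `POB` triple is made-up filler showing a relation that does not involve the target feature word.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyntaxChain:
    first: str      # wi, the first node word
    relation: str   # Dk, the syntactic relation
    last: str       # wj, the end node word

    def __str__(self):
        return f"{self.first} ← {self.relation} ← {self.last}"


def chains_for_target(relations, target_word):
    """Build chains from (first word, relation, last word) triples that start at the target word."""
    return [SyntaxChain(f, r, l) for f, r, l in relations if f == target_word]


# Hypothetical relations selected from the dependency analysis of the example question.
relations = [
    ("where", "ADV", "rent a house"),
    ("where", "ADV", "can"),
    ("Fanyu District", "POB", "in"),
]
chains = chains_for_target(relations, "where")
for chain in chains:
    print(chain)
```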
Optionally, right according to scheduled rule from multiple syntax dependence chain samples of a training problem text
Multiple syntax dependence chain samples are screened, and scheduled rule is based on program to multiple syntax dependence chain samples
The result that counted, analyze, is fitted and be arranged.
Illustratively, the syntax dependence chain sample of training problem text " where can rent a house in Fanyu District " is counted,
According to Machine self-learning program to the statistics of sample, analysis, fitting, as a result syntax dependence chain LinkD1 " where
← ADV ← rents a house " } frequency is greater than 70%, therefore syntax dependence chain LinkD1 is chosen as training problem text " in Fanyu
Where area can rent a house " syntax dependence chain sample.
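The text does not state how the frequency is computed. One possible reading, shown here purely as an assumption, is the share of training questions in which a given chain occurs. The function name and the four hypothetical training entries below are illustrative only.

```python
from collections import Counter


def chain_frequencies(chains_per_question):
    """Share of training questions in which each chain occurs (assumed definition)."""
    total = len(chains_per_question)
    if total == 0:
        return {}
    counts = Counter()
    for chains in chains_per_question:
        counts.update(set(chains))  # count each chain at most once per question
    return {chain: count / total for chain, count in counts.items()}


# Hypothetical chains extracted from four training questions.
training = [
    ["where ← ADV ← rent a house", "where ← ADV ← can"],
    ["where ← ADV ← rent a house"],
    ["where ← ADV ← rent a house", "where ← ADV ← can"],
    ["where ← ADV ← buy a house"],
]
frequencies = chain_frequencies(training)
selected = sorted(chain for chain, freq in frequencies.items() if freq > 0.7)
print(selected)
```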
For example, according to statistics, the subject-predicate relation (SBV), the adverbial-head relation (ADV), and the attribute-head relation (ATT) are the dependency syntactic relations preferred for analysis. In this embodiment, the adverbial-head relation (ADV) is selected as the preferred dependency syntactic relation, that is, the ADV syntactic relation between the word "where" and the word "rent a house" and the ADV syntactic relation between the word "can" and the word "rent a house" are analyzed.
The embodiment of the present invention is based on manually annotated question target words in question text samples. It performs syntactic analysis on the question text samples and, based on the syntactic relation tree produced by the syntactic analysis and on the syntactic characteristics of Chinese, generates new syntactic dependency relation chain samples. These chains associate the words related to the user's intent in a question text sample with the other words in the sample more accurately. The embodiment then uses the syntactic dependency relation chain samples to screen the syntactic dependency relation chains of the question text to be analyzed, so that the more relevant syntactic dependency relation chains are screened out and, based on them, the question target word that is more relevant to the user's questioning intent can be extracted.
As shown in Fig. 3, Embodiment 3 of the present invention provides another method for automatic identification and extraction of question target features, which includes the following steps.
Step S301: annotate, in the question sample texts, the question target feature words and the words related to the question target feature words, to form an annotated data set.
Step S302: annotate the part of speech of the question target feature words in the question sample texts, to form a part-of-speech sample set.
Step S303: screen out, according to a preset frequency value, the semantic dependency relation chain samples whose frequency is greater than the preset frequency value from the at least one semantic dependency relation chain sample.
Step S304: generate, based on the screened-out semantic dependency relation chain samples, a main semantic dependency relation chain sample set for the question text to be analyzed, the main semantic dependency relation chain sample set including at least one main semantic dependency relation chain sample.
Step S305: define at least one syntactic dependency relation sample based on the at least one syntactic dependency relation sample.
Step S306: perform semantic analysis on the question text to be analyzed, to generate a semantic dependency relation chain set of the question text to be analyzed, the semantic dependency relation chain set including at least one semantic dependency relation chain.
Step S307: compare the semantic dependency relation chain set with the main semantic dependency relation chain sample set, and screen out at least one shared semantic dependency relation chain.
Step S308: generate, based on the at least one shared semantic dependency relation chain, at least one candidate semantic dependency relation chain of the question text to be analyzed.
Step S309: extract the first node word of each candidate semantic dependency relation chain in the at least one candidate semantic dependency relation chain, to generate at least one candidate question target word of the question text to be analyzed.
Step S310: compare the part of speech of the at least one candidate question target word with the part-of-speech sample set, to generate the question target word of the question text to be analyzed.
Semantic dependency analysis aims to go beyond the constraints of a sentence's surface syntactic structure and obtain deep semantic information directly. Its analysis is not affected by syntactic structure: linguistic units that have a direct semantic association are connected directly by a dependency arc, which is labeled with the corresponding semantic relation. Fig. 5c is the semantic dependency relation schematic diagram of an embodiment of the present invention.
Part of speech is a generalization of words. In this embodiment, part-of-speech tagging of each word of the training question text benefits dependency syntactic analysis and dependency semantic analysis and improves the accuracy of both. Fig. 5a is the part-of-speech tagging schematic diagram provided by the present invention. In this embodiment, manually annotating the question target feature words that reflect the user's intent in the training question text, together with their parts of speech, makes it possible to build a more accurate training sample library associated with user intent.
For example, in an embodiment of the present invention, the question target feature word of the training question text "Where can I rent a house in Fanyu District" is annotated as "where", and the part of speech of the word "where" is annotated as pronoun (Pronoun, pron), adverb (Adverb, adv), or conjunction (Conjunction, conj).
For example, Fig. 4b shows the semantic dependency relation tree generated by semantic analysis of the question text to be analyzed "Where can I rent a house in Fanyu District". The semantic dependency relation tree includes multiple semantic dependency relations.
For example, the training question text "where can one rent a house in Fanyu District" is obtained, and a semantic dependency algorithm is applied to the at least one training question text to generate its semantic dependency tree. From that tree, the semantic dependencies related to the question target feature word of the training question text are selected, namely the semantic relation between the word "can" and the word "Fanyu District" and the semantic relation between the word "can" and the word "rent a house". A semantic dependency chain is then defined on the basis of the semantic dependency tree and written as LinkSD {"wi-POSwi:SDk:POSwj-wj"}, where "wi" is the head node word of the chain, "wj" is the end node word of the chain, "SDk" is the semantic dependency relation, "POSwi" is the part of speech of the word "wi", and "POSwj" is the part of speech of the word "wj". Accordingly, these semantic relations are expressed as the semantic dependency chains LinkSD1 {"can-VV:prep:NN-Fanyu District"} and LinkSD2 {"can-VV:nsubj:PN-where"}.
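The chain expression "wi-POSwi:SDk:POSwj-wj" can be illustrated with a minimal Python sketch. The `Link` type and the `format_link` and `parse_link` helpers are hypothetical names introduced here for illustration, and the parser assumes that words contain no colon and that part-of-speech tags contain no hyphen; none of this is prescribed by the patent.

```python
from typing import NamedTuple


class Link(NamedTuple):
    """One dependency link: head word, its POS, the relation, tail POS, tail word."""
    head: str
    head_pos: str
    relation: str
    tail_pos: str
    tail: str


def format_link(link: Link) -> str:
    # Render the link in the "wi-POSwi:SDk:POSwj-wj" layout used in the text.
    return f"{link.head}-{link.head_pos}:{link.relation}:{link.tail_pos}-{link.tail}"


def parse_link(text: str) -> Link:
    # Assumption: exactly two colons separate the three fields of the expression.
    left, relation, right = text.split(":")
    head, head_pos = left.rsplit("-", 1)   # POS tag follows the last hyphen
    tail_pos, tail = right.split("-", 1)   # POS tag precedes the first hyphen
    return Link(head, head_pos, relation, tail_pos, tail)


link_sd1 = parse_link("can-VV:prep:NN-Fanyu District")
link_sd2 = Link("can", "VV", "nsubj", "PN", "where")
```

Here `link_sd1` and `link_sd2` correspond to the two example chains LinkSD1 and LinkSD2 given above.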
Specifically, if a training question text has two or more semantic dependency chains, and the end node word wj of chain a is equal to the head node word wi of chain b, chain a and chain b are merged.
For example, merging the semantic dependency chain "can-VV:prep:NN-Fanyu District" with the semantic dependency chain "can-VV:nsubj:PN-where" gives the semantic dependency chain of the training question text "where can one rent a house in Fanyu District": LinkSD {"where-PN:nsubj:VV-can-VV:prep:NN-Fanyu District"}.
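The merge rule and the example above can be sketched in Python as follows. This is only one reading of the text: the `Hop` type and the `reverse`, `merge`, and `render` helpers are hypothetical, and the sketch assumes that the second example chain is read in the opposite direction before merging, since that is what the merged result in the example appears to imply.

```python
from typing import List, NamedTuple, Optional


class Hop(NamedTuple):
    """One link of a chain: head word, head POS, relation, tail POS, tail word."""
    head: str
    head_pos: str
    relation: str
    tail_pos: str
    tail: str


def reverse(hop: Hop) -> Hop:
    # Read the same link from the other end (assumed step, see lead-in).
    return Hop(hop.tail, hop.tail_pos, hop.relation, hop.head_pos, hop.head)


def merge(chain_a: List[Hop], chain_b: List[Hop]) -> Optional[List[Hop]]:
    # Merge only when the end node word of chain a equals the head node word of chain b.
    if chain_a and chain_b and chain_a[-1].tail == chain_b[0].head:
        return chain_a + chain_b
    return None


def render(chain: List[Hop]) -> str:
    # Write the chain in the "wi-POSwi:SDk:POSwj-wj" style, sharing the joint word.
    if not chain:
        return ""
    first = chain[0]
    parts = [f"{first.head}-{first.head_pos}:{first.relation}:{first.tail_pos}-{first.tail}"]
    for hop in chain[1:]:
        parts.append(f"-{hop.head_pos}:{hop.relation}:{hop.tail_pos}-{hop.tail}")
    return "".join(parts)


link_sd1 = [Hop("can", "VV", "prep", "NN", "Fanyu District")]
link_sd2 = [Hop("can", "VV", "nsubj", "PN", "where")]

# Reverse LinkSD2 so that it ends in "can", then apply the merge rule.
merged = merge([reverse(h) for h in reversed(link_sd2)], link_sd1)
print(render(merged))  # where-PN:nsubj:VV-can-VV:prep:NN-Fanyu District
```

Without the reversal, `merge(link_sd1, link_sd2)` returns `None`, because the end node word "Fanyu District" differs from the head node word "can".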
The embodiment of the invention manually annotates the question target words in the question text samples, performs semantic analysis on the samples, and, based on the semantic dependency tree generated by the analysis and on the syntactic features of Chinese, generates new semantic dependency chain samples. Such a chain associates the words related to the user's intent in a question text sample with the other words in that sample more accurately. The semantic dependency chains of the question text to be analyzed are then screened against the semantic dependency chain samples, so that the more relevant semantic dependency chains are retained and the question target words most relevant to the intent behind the user's question can be extracted from them.
Embodiment four of the invention provides another method for automatically identifying and extracting question target features. In addition to all the steps of embodiment two and embodiment three, it further includes the step of extracting the words that the at least one candidate semantic dependency chain shares with the at least one candidate semantic dependency chain, and labeling the shared words as the question target words of the question text to be analyzed.
For example, for another training question text, "define biomedicine", the candidate semantic dependency chain generated for this question is Q_LinkSD {biology ← ATT ← medicine} and the candidate syntactic dependency chain is Q_LinkD {define-VV:advmod:AD-once}. Q_LinkSD and Q_LinkD have no word in common, so the two dependency chain sets are each traversed and the head node word of every chain is extracted. This gives the candidate question target feature word set {biology, define}, whose parts of speech are {NN, VV}. According to the target word part-of-speech set generated from prior statistics, {PN, VV, DT, WP, WDT, WRB, NR, NT}, the candidate question target feature word "define", whose part of speech is VV, is selected.
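The selection in this example can be sketched in Python under a few stated assumptions. Each chain is simplified to an ordered list of (word, part of speech) pairs with the head node word first, the function name `pick_target_words` is hypothetical, and the part of speech NN for "medicine" is an assumption, since the text gives parts of speech only for the head node words.

```python
from typing import List, Set, Tuple

Chain = List[Tuple[str, str]]  # ordered (word, POS) pairs, head node word first

# Target word part-of-speech set quoted in the example above.
TARGET_POS: Set[str] = {"PN", "VV", "DT", "WP", "WDT", "WRB", "NR", "NT"}


def pick_target_words(semantic: List[Chain], syntactic: List[Chain],
                      target_pos: Set[str] = TARGET_POS) -> List[str]:
    sem_words = {word for chain in semantic for word, _ in chain}
    syn_words = {word for chain in syntactic for word, _ in chain}
    shared = sem_words & syn_words
    if shared:
        # Words common to both kinds of chain are taken as target words.
        return sorted(shared)
    # Otherwise fall back to the head node word of every chain,
    # keeping only those whose POS is in the target POS set.
    heads = [chain[0] for chain in semantic + syntactic if chain]
    return [word for word, pos in heads if pos in target_pos]


q_link_sd = [[("biology", "NN"), ("medicine", "NN")]]  # POS of "medicine" assumed
q_link_d = [[("define", "VV"), ("once", "AD")]]

print(pick_target_words(q_link_sd, q_link_d))  # ['define']
```

The fallback branch mirrors the example: the head node words are "biology" (NN) and "define" (VV), and only VV belongs to the target part-of-speech set.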
For example, the question text to be analyzed can be obtained from a mobile client, a PC, or a data storage server. Note that the data type of the question text to be analyzed is not limited to a text format and may be another format; the embodiment of the invention does not limit the format of the question text to be analyzed.
Specifically, the at least one dependency includes at least one syntactic dependency and at least one semantic dependency. Syntactic dependency analysis of the question text to be analyzed generates a syntactic dependency tree containing at least one syntactic dependency, and semantic dependency analysis of the question text to be analyzed generates a semantic dependency tree containing at least one semantic dependency.
Note that the dependency syntactic analysis used in the embodiment of the invention may be a PCFG algorithm, a lexicalized PCFG, or a transition-based parsing algorithm. The specific syntactic dependency analysis used in this embodiment is explained further below.
Specifically, the at least one question training sample includes at least one syntactic dependency chain sample, at least one semantic dependency chain sample, and at least one question target feature word part-of-speech sample.
Specifically, before the relations between the words of the question text to be analyzed are analyzed with the parsing algorithm, the method further includes: annotating the question target feature words of at least one training question text and the part-of-speech data of those question target feature words.
Note that part-of-speech tagging (POS tagging) is the task of assigning a part-of-speech category to each word in a sentence. The category may be noun, verb, adjective, or another part of speech. Here v denotes a verb, n a noun, c a conjunction, d an adverb, and wp a punctuation mark. An example of part-of-speech tagging is shown in Fig. 5a.
Note that the question target feature words of a training question text can be annotated manually in several ways. For example, the interrogative word in the question text can be annotated as the question target feature word of the training question text; or, when the interrogative word in the question text cannot reflect the user's intent, an accurate question target feature word can be obtained from the syntactic and/or semantic relations involving the interrogative word.
Note that one word can correspond to multiple parts of speech. For example, the word "rent a house" can be a verb or a noun. In this embodiment, when a word has multiple parts of speech, the target part-of-speech set of the training question samples is screened according to a preset preference rule to form a preferred target part-of-speech set. In addition, different parts of speech of the same word can produce different dependency chains.
Embodiment five of the invention provides a question target word identification device, which is described below with reference to Fig. 6.
Referring to Fig. 6, Fig. 6 is a structural schematic diagram of an item value evaluation device provided by an embodiment of the invention. The device 60 may include at least one storage unit 601, at least one processing unit 603, and at least one communication interface 602. Of course, the device may also include general-purpose units such as an input/output unit, which is not limited here.
The at least one storage unit 601 may be used to store computer instructions, programs, functional modules, events, databases, and the like, which is not limited here. The at least one storage unit 601 may be integrated into one storage device or configured separately within the device 60, which is not limited here.
The at least one processing unit 603 may be implemented by a computer, a server, a central processing unit, a micro processing unit, a data processing unit, a dedicated big-data processing unit, or the like.
In this embodiment, the question target word identification device executes the following method:
Generating at least one dependency chain sample according to a preset generation rule;
Matching, according to a preset matching rule, the candidate dependency chains matched to the question text to be analyzed from the at least one dependency chain sample;
Screening out, according to a preset screening rule, the question target word of the question text to be analyzed from the candidate dependency chains.
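The three steps can be sketched end to end in Python. This is a simplified illustration, not the patented rules: it borrows the frequency threshold and the head-node extraction from the claims, assumes that chains are compared by their "POS:relation:POS" pattern (the text does not say how chains from different questions are matched), and uses made-up toy samples. All names (`Link`, `main_patterns`, `candidate_links`, `target_words`) are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Set


class Link(NamedTuple):
    head: str
    head_pos: str
    relation: str
    tail_pos: str
    tail: str

    def pattern(self) -> str:
        # Assumed matching key: the link with its words removed.
        return f"{self.head_pos}:{self.relation}:{self.tail_pos}"


def main_patterns(samples: Iterable[Link], min_freq: int) -> Set[str]:
    # Step 1 (simplified): keep sample patterns seen more often than the preset frequency.
    counts = Counter(link.pattern() for link in samples)
    return {pattern for pattern, n in counts.items() if n > min_freq}


def candidate_links(question: Iterable[Link], patterns: Set[str]) -> List[Link]:
    # Step 2: candidate chains are the question's links that also occur in the samples.
    return [link for link in question if link.pattern() in patterns]


def target_words(candidates: Iterable[Link], target_pos: Set[str]) -> List[str]:
    # Step 3: take head node words whose POS is in the target POS set.
    found: List[str] = []
    for link in candidates:
        if link.head_pos in target_pos and link.head not in found:
            found.append(link.head)
    return found


TARGET_POS = {"PN", "VV", "DT", "WP", "WDT", "WRB", "NR", "NT"}

# Toy training links (invented for illustration only).
samples = [
    Link("where", "PN", "nsubj", "VV", "can"),
    Link("who", "PN", "nsubj", "VV", "is"),
    Link("what", "PN", "nsubj", "VV", "is"),
    Link("can", "VV", "prep", "NN", "Fanyu District"),
]
question = [
    Link("where", "PN", "nsubj", "VV", "can"),
    Link("can", "VV", "prep", "NN", "Fanyu District"),
]

patterns = main_patterns(samples, min_freq=2)
result = target_words(candidate_links(question, patterns), TARGET_POS)
print(result)  # ['where']
```

With these toy inputs only the pattern "PN:nsubj:VV" passes the frequency threshold, so the single candidate link is the one headed by "where".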
In addition to executing the method for automatically identifying and extracting question target features provided by embodiment one of the invention, the question target word identification device provided by embodiment five is also used to execute the methods for automatically identifying and extracting question target features provided by embodiments two, three, and four of the invention.
Embodiment six of the invention provides a question target word identification terminal that can interact with a user, receives the user's instructions, and returns results according to those instructions. It executes the method for automatically identifying and extracting question target features provided by embodiment one of the invention:
Generating at least one dependency chain sample according to a preset generation rule;
Matching, according to a preset matching rule, the candidate dependency chains matched to the question text to be analyzed from the at least one dependency chain sample;
Screening out, according to a preset screening rule, the question target word of the question text to be analyzed from the candidate dependency chains.
In addition to executing the method for automatically identifying and extracting question target features provided by embodiment one of the invention, the question target word identification terminal provided by embodiment six is also used to execute the methods for automatically identifying and extracting question target features provided by embodiments two, three, and four of the invention.
To implement the above embodiments, the invention also provides a question target word identification terminal and a non-transitory computer-readable storage medium, which execute the following steps through functional modules:
Generating at least one dependency chain sample according to a preset generation rule;
Matching, according to a preset matching rule, the candidate dependency chains matched to the question text to be analyzed from the at least one dependency chain sample;
Screening out, according to a preset screening rule, the question target word of the question text to be analyzed from the candidate dependency chains.
To implement the above embodiments, embodiment seven of the invention also provides a computer program product. When the instructions in the computer program product are executed by a processor, the method for automatically identifying and extracting question target features proposed in the first-aspect embodiment of the invention is executed. In addition to executing the method provided by embodiment one of the invention, the computer program product provided by embodiment seven is also used to execute the methods for automatically identifying and extracting question target features provided by embodiments two, three, and four of the invention.
It should be noted that, for simplicity of description, each of the foregoing method embodiments is expressed as a series of combined actions, but those skilled in the art should understand that the invention is not limited by the described order of actions, because according to the invention some steps may be performed in another order or simultaneously. Those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the invention.
In the above embodiments, the description of each embodiment has its own emphasis; for parts not detailed in one embodiment, refer to the related descriptions of the other embodiments.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it may be stored in a computer-readable memory. Based on this understanding, the technical solution of the invention in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied as a software product. The computer software product is stored in a memory and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods of the embodiments of the invention. The memory includes various media that can store program code, such as a USB flash drive, read-only memory (ROM), random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
Those of ordinary skill in the art will appreciate that all or some of the steps of the methods of the above embodiments can be completed by a program instructing the relevant hardware. The program can be stored in a computer-readable memory, which may include a flash disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disc, or the like.
The embodiments of the invention have been described in detail above. Specific examples are used herein to explain the principle and implementations of the invention, and the description of the above embodiments is only intended to help in understanding the method of the invention and its core idea. At the same time, those of ordinary skill in the art may, following the idea of the invention, make changes to the specific implementations and the scope of application. In summary, the contents of this specification should not be construed as limiting the invention.
and the word segmentation results are in turn checked and adjusted during annotation, which greatly improves segmentation accuracy.
The part-of-speech set of this embodiment also benefits semantic dependency analysis and syntactic dependency analysis.
Claims (8)
1. A method for automatically identifying and extracting question target features, characterized by comprising:
generating at least one dependency chain sample according to a preset generation rule;
matching, according to a preset matching rule, the candidate dependency chains matched to a question text to be analyzed from the at least one dependency chain sample;
screening out, according to a preset screening rule, the question target word of the question text to be analyzed from the candidate dependency chains.
2. The method of claim 1, characterized in that generating at least one dependency chain sample according to the preset generation rule comprises:
annotating the question target feature words in the question sample text and the words related to the question target feature words, to form an annotated data set;
annotating the parts of speech of the question target feature words in the question sample text, to form a part-of-speech sample set;
performing syntactic analysis on the annotated data set using a parsing algorithm, to generate at least one syntactic dependency chain sample;
performing semantic analysis on the annotated data set using a semantic dependency algorithm, to generate at least one semantic dependency chain sample.
3. The method of claim 2, characterized in that matching, according to the preset matching rule, the candidate dependency chains matched to the question text to be analyzed from the at least one dependency chain sample comprises:
screening out, according to a preset frequency value, the syntactic dependency chain samples whose frequency is greater than the preset frequency value from the at least one syntactic dependency chain sample;
generating, based on the screened syntactic dependency chain samples, a main syntactic dependency chain sample set for the question text to be analyzed, the main syntactic dependency chain sample set including at least one main syntactic dependency chain sample;
performing syntactic analysis on the question text to be analyzed, to generate a syntactic dependency chain set of the question text to be analyzed, the syntactic dependency chain set including at least one syntactic dependency chain;
comparing the syntactic dependency chain set with the main syntactic dependency chain sample set, to screen out at least one shared syntactic dependency chain;
generating, based on the at least one shared syntactic dependency chain, at least one candidate syntactic dependency chain of the question text to be analyzed.
4. The method of claim 3, characterized in that the method further comprises:
screening out, according to a preset frequency value, the semantic dependency chain samples whose frequency is greater than the preset frequency value from the at least one semantic dependency chain sample;
generating, based on the screened semantic dependency chain samples, a main semantic dependency chain sample set for the question text to be analyzed, the main semantic dependency chain sample set including at least one main semantic dependency chain sample.
5. The method of claim 4, characterized in that the method further comprises:
performing semantic analysis on the question text to be analyzed, to generate a semantic dependency chain set of the question text to be analyzed, the semantic dependency chain set including at least one semantic dependency chain;
comparing the semantic dependency chain set with the main semantic dependency chain sample set, to screen out at least one shared semantic dependency chain;
generating, based on the at least one shared semantic dependency chain, at least one candidate semantic dependency chain of the question text to be analyzed.
6. The method of claim 1, characterized in that screening out, according to the preset screening rule, the question target word of the question text to be analyzed from the candidate dependency chains comprises:
extracting the head node word of each candidate syntactic dependency chain in the at least one candidate syntactic dependency chain, to generate the candidate question target words of the question text to be analyzed;
comparing the candidate question target words with a question target word sample set, to generate the question target word of the question text to be analyzed.
7. The method of claim 6, characterized in that the method further comprises:
extracting the head node word of each candidate semantic dependency chain in the at least one candidate semantic dependency chain, to generate at least one candidate question target word of the question text to be analyzed;
comparing the part of speech of the at least one candidate question target word with the part-of-speech sample set, to generate the question target word of the question text to be analyzed.
8. The method of claim 7, characterized in that the method further comprises:
extracting the words that the at least one candidate semantic dependency chain shares with the at least one candidate semantic dependency chain, and labeling the shared words as the question target words of the question text to be analyzed.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910192494.0A CN109992651B (en) | 2019-03-14 | 2019-03-14 | Automatic identification and extraction method for problem target features |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109992651A true CN109992651A (en) | 2019-07-09 |
CN109992651B CN109992651B (en) | 2024-01-02 |
Family
ID=67130423
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910192494.0A Active CN109992651B (en) | 2019-03-14 | 2019-03-14 | Automatic identification and extraction method for problem target features |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109992651B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102866989A (en) * | 2012-08-30 | 2013-01-09 | 北京航空航天大学 | Viewpoint extracting method based on word dependence relationship |
CN103646112A (en) * | 2013-12-26 | 2014-03-19 | 中国科学院自动化研究所 | Dependency parsing field self-adaption method based on web search |
US20140114649A1 (en) * | 2006-10-10 | 2014-04-24 | Abbyy Infopoisk Llc | Method and system for semantic searching |
CN105005557A (en) * | 2015-08-06 | 2015-10-28 | 电子科技大学 | Chinese ambiguity word processing method based on dependency parsing |
CN108304466A (en) * | 2017-12-27 | 2018-07-20 | 中国银联股份有限公司 | A kind of user view recognition methods and user view identifying system |
CN109241538A (en) * | 2018-09-26 | 2019-01-18 | 上海德拓信息技术股份有限公司 | Based on the interdependent Chinese entity relation extraction method of keyword and verb |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110765759A (en) * | 2019-10-21 | 2020-02-07 | 普信恒业科技发展(北京)有限公司 | Intention identification method and device |
CN116050412A (en) * | 2023-03-07 | 2023-05-02 | 江西风向标智能科技有限公司 | Method and system for dividing high-school mathematics questions based on mathematical semantic logic relationship |
CN116050412B (en) * | 2023-03-07 | 2024-01-26 | 江西风向标智能科技有限公司 | Method and system for dividing high-school mathematics questions based on mathematical semantic logic relationship |
Also Published As
Publication number | Publication date |
---|---|
CN109992651B (en) | 2024-01-02 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||