CN109992651A - A method for automatic identification and extraction of problem target features - Google Patents

A method for automatic identification and extraction of problem target features

Info

Publication number
CN109992651A
Authority
CN
China
Prior art keywords
chain
sample
analyzed
candidate
dependence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910192494.0A
Other languages
Chinese (zh)
Other versions
CN109992651B (en)
Inventor
郝天永
谢文秀
瞿瑛瑛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Zhiyu Information Technology Co Ltd
Original Assignee
Guangzhou Zhiyu Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Zhiyu Information Technology Co Ltd filed Critical Guangzhou Zhiyu Information Technology Co Ltd
Priority to CN201910192494.0A priority Critical patent/CN109992651B/en
Publication of CN109992651A publication Critical patent/CN109992651A/en
Application granted granted Critical
Publication of CN109992651B publication Critical patent/CN109992651B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Abstract

The invention discloses a method for automatic identification and extraction of problem target features. The method comprises: generating at least one dependency chain sample according to a preset generation rule; matching, according to a preset matching rule, the candidate dependency chains of the question text to be analyzed against the at least one dependency chain sample; and screening out, according to a preset screening rule, the problem target word of the question text to be analyzed from the candidate dependency chains. The effect of the invention is that the problem target word identification method identifies, from the text to be analyzed, the words related to the intent behind the user's question, so that the user's intent can be identified more accurately.

Description

A method for automatic identification and extraction of problem target features
Technical field
Embodiments of the present invention relate to the field of computer technology, and in particular to a method for automatic identification and extraction of problem target features.
Background art
At present, with the development of artificial intelligence and big data technology, question answering systems, as an advanced application form of information retrieval systems, are widely used in fields such as professional services, education and daily life. Accurately identifying and classifying the target of the question a user asks directly affects how well a question answering system identifies the user's information need, filters candidate answers, and improves the user's satisfaction with the answers.
However, the candidate answers provided by existing question answering systems are not accurate enough. The main reason is that existing systems identify and classify the topic of a question by screening the topic words of the user's question, and ignore the importance of identifying and classifying the problem target. Identifying and classifying the problem topic is not the same as identifying and classifying the problem target: the problem topic emphasizes the main object of the question's content, whereas the problem target describes the type of answer the user expects. A question answering system that only screens the topic words of user questions to identify and classify problem topics therefore cannot provide accurate answers that match the intent behind the user's question.
Summary of the invention
The present invention aims to solve at least one of the above technical problems, at least to some extent.
To this end, the first object of the present invention is to propose a method for automatic identification and extraction of problem target features. Regardless of the background domain, the method can identify, from the text of a user's question, the problem target feature words that reflect the user's intent, and then use those problem target feature words to search massive data for information related to the problem target features, so as to provide the user with more accurate and satisfactory answers.
A first aspect of the present invention provides a method for identification and extraction of problem target features, wherein the method comprises:
generating at least one dependency chain sample according to a preset generation rule;
matching, according to a preset matching rule, the candidate dependency chains of the question text to be analyzed against the at least one dependency chain sample;
screening out, according to a preset screening rule, the problem target word of the question text to be analyzed from the candidate dependency chains.
Optionally, generating at least one dependency chain sample according to a preset generation rule comprises:
annotating the problem target feature words in the question sample text, together with the words related to those problem target feature words, to form an annotated data set;
annotating the part of speech of the problem target feature words in the question sample text to form a part-of-speech sample set;
performing syntactic analysis on the annotated data set using a syntactic parsing algorithm to generate at least one syntactic dependency chain sample;
performing semantic analysis on the annotated data set using a semantic dependency algorithm to generate at least one semantic dependency chain sample.
Optionally, matching, according to a preset matching rule, the candidate dependency chains of the question text to be analyzed against the at least one dependency chain sample comprises:
screening out, according to a preset frequency value, the syntactic dependency chain samples whose frequency is greater than the preset frequency value from the at least one syntactic dependency chain sample;
generating, based on the screened syntactic dependency chain samples, the main syntactic dependency chain sample set for the question text to be analyzed, the main syntactic dependency chain sample set including at least one main syntactic dependency chain sample;
performing syntactic analysis on the question text to be analyzed to generate its syntactic dependency chain set, the syntactic dependency chain set including at least one syntactic dependency chain;
comparing the syntactic dependency chain set with the main syntactic dependency chain sample set and screening out at least one shared syntactic dependency chain;
generating, based on the at least one shared syntactic dependency chain, at least one candidate syntactic dependency chain of the question text to be analyzed.
Optionally, the method further comprises:
screening out, according to a preset frequency value, the semantic dependency chain samples whose frequency is greater than the preset frequency value from the at least one semantic dependency chain sample;
generating, based on the screened semantic dependency chain samples, the main semantic dependency chain sample set for the question text to be analyzed, the main semantic dependency chain sample set including at least one main semantic dependency chain sample.
Optionally, the method further comprises:
performing semantic analysis on the question text to be analyzed to generate its semantic dependency chain set, the semantic dependency chain set including at least one semantic dependency chain;
comparing the semantic dependency chain set with the main semantic dependency chain sample set and screening out at least one shared semantic dependency chain;
generating, based on the at least one shared semantic dependency chain, at least one candidate semantic dependency chain of the question text to be analyzed.
Optionally, screening out, according to a preset screening rule, the problem target word of the question text to be analyzed from the candidate dependency chains comprises:
extracting the first node word of each candidate syntactic dependency chain in the at least one candidate syntactic dependency chain to generate the candidate problem target words of the question text to be analyzed;
comparing the candidate problem target words with the problem target word sample set to generate the problem target word of the question text to be analyzed.
Optionally, the method further comprises:
extracting the first node word of each candidate semantic dependency chain in the at least one candidate semantic dependency chain to generate at least one candidate problem target word of the question text to be analyzed;
comparing the part of speech of the at least one candidate problem target word with the part-of-speech sample set to generate the problem target word of the question text to be analyzed.
Optionally, the method further comprises:
extracting the words shared by the at least one candidate semantic dependency chain and the at least one candidate semantic dependency chain, and marking the shared words as the problem target word of the question text to be analyzed.
It should be noted that "problem target word" in the present invention refers to the key information and key data that reflect the questioner's intent.
A second aspect of the present invention provides a problem target word identification device, the device comprising:
at least one storage unit;
a processing unit coupled to the at least one storage unit;
wherein the at least one storage unit is configured to store computer instructions;
and the processing unit is configured to call the computer instructions to execute the method of the first aspect of the present invention.
A third aspect of the present invention provides a computer storage medium storing computer instructions which, when called, are used to execute the method of the first aspect of the present invention.
Compared with the prior art, the present invention has the following beneficial effects:
Existing question answering systems extract keyword information related to the user's intent from the question text, but the keyword information they extract reflects the topic of the whole question text. Although topic information can, under certain conditions, include the user's intent, this happens unreliably and with low probability. Furthermore, the training samples of the prior art are built from the topic words of questions rather than from the user intent information in questions, so analyzing a question with samples built from topic words cannot analyze the user's intent.
By contrast, the present invention manually annotates the problem target feature words that reflect the user's intent in the question text, together with their parts of speech, performs dependency analysis on the question text to generate its dependency chain set, finds the candidate dependency chains of the question text within that set, and finally extracts the problem target word of the question text from the candidate dependency chains. This ensures that the extracted problem target word is formed from the user's intent information, that is, the problem target word reflects the intent behind the user's question as fully as possible.
It should be noted that the method for automatic identification and extraction of problem target features provided by the present invention can be used not only in artificial intelligence customer service question answering systems, but also in question systems under search application scenarios or in user requirements analysis systems. For example, applying the technical solution of the present invention in a user requirements analysis system can improve the accuracy of user requirements analysis. Although the technical solution was proposed for the application scenario of a search question system, this places no restriction on its application scenarios. Those skilled in the art will understand that the technical solution can be used in multiple scenarios.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of the method for automatic identification and extraction of problem target features provided by Embodiment One of the present invention;
Fig. 2 is a schematic flowchart of the method for automatic identification and extraction of problem target features provided by Embodiment Two of the present invention;
Fig. 3 is a schematic flowchart of the method for automatic identification and extraction of problem target features provided by Embodiment Three of the present invention;
Fig. 4a is a schematic diagram of the syntactic dependency tree provided by Embodiment Two of the present invention;
Fig. 4b is a schematic diagram of the semantic dependency tree provided by Embodiment Three of the present invention;
Fig. 5a is a schematic diagram of part-of-speech tagging in an embodiment of the present invention;
Fig. 5b is a schematic diagram of syntactic dependency relations in an embodiment of the present invention;
Fig. 5c is a schematic diagram of semantic dependency relations in an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of the problem target word identification device provided by Embodiment Six of the present invention.
Detailed description of the embodiments
To enable those skilled in the art to better understand the solution of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.
The terms "first", "second" and so on in the description, the claims and the drawings are used to distinguish different objects, not to describe a particular order. In addition, the terms "include" and "have" and any variations of them are intended to cover non-exclusive inclusion. For example, a process, method, device, product or equipment that contains a series of steps or units is not limited to the listed steps or units, but optionally also includes steps or units that are not listed, or other steps or units inherent to the process, method, product or equipment.
Reference herein to an "embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment can be included in at least one embodiment of the present invention. The appearances of the phrase in various places in the specification do not necessarily all refer to the same embodiment, nor to separate or alternative embodiments that are mutually exclusive of other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein can be combined with other embodiments.
To make the technical solutions provided by the embodiments of the present invention easier to understand, the application scenarios involved are introduced below.
With the development of artificial intelligence and big data technology, question answering systems, as an advanced application form of information retrieval systems, are widely used in fields such as professional services, education and daily life. Accurately identifying and classifying the target of the question a user asks directly affects how well a question answering system identifies the user's information need, filters candidate answers, and improves the user's satisfaction with the answers.
The present invention proposes a method for automatic identification and extraction of problem target features, which can extract from the question text a more accurate problem target word that reflects the intent behind the user's question.
The embodiments of the present invention are described below with reference to the drawings.
Referring to Fig. 1, Embodiment One of the present invention provides a method for automatic identification and extraction of problem target features. The method can be executed by a device for automatic identification and extraction of problem target features. As shown in Fig. 1, the method comprises at least the following steps:
Step S101: generate at least one dependency chain sample according to a preset generation rule.
Step S102: according to a preset matching rule, match the candidate dependency chains of the question text to be analyzed against the at least one dependency chain sample.
Step S103: according to a preset screening rule, screen out the problem target word of the question text to be analyzed from the candidate dependency chains.
Referring to Fig. 2, Embodiment Two of the present invention provides another method for automatic identification and extraction of problem target features, comprising:
Step S201: annotate the problem target feature words in the question sample text, together with the words related to those problem target feature words, to form an annotated data set.
Step S202: annotate the part of speech of the problem target feature words in the question sample text to form a part-of-speech sample set.
Step S203: perform syntactic analysis on the annotated data set to generate at least one syntactic dependency chain sample.
Step S204: according to a preset frequency value, screen out from the at least one syntactic dependency chain sample the syntactic dependency chain samples whose frequency is greater than the preset frequency value.
Step S205: based on the screened syntactic dependency chain samples, generate the main syntactic dependency chain sample set for the question text to be analyzed; the main syntactic dependency chain sample set includes at least one main syntactic dependency chain sample.
Step S206: perform syntactic analysis on the question text to be analyzed to generate its syntactic dependency chain set; the syntactic dependency chain set includes at least one syntactic dependency chain.
Step S207: compare the syntactic dependency chain set with the main syntactic dependency chain sample set and screen out at least one shared syntactic dependency chain.
Step S208: based on the at least one shared syntactic dependency chain, generate at least one candidate syntactic dependency chain of the question text to be analyzed.
Step S209: extract the first node word of each candidate syntactic dependency chain in the at least one candidate syntactic dependency chain to generate the candidate problem target words of the question text to be analyzed.
Step S210: compare the candidate problem target words with the problem target word sample set to generate the problem target word of the question text to be analyzed.
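The matching and screening steps S207 to S210 can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that a chain is a (first node word, relation, end node word) tuple and that "shared" means exact equality with a sample chain; the function name `extract_target_words` and the sample data are hypothetical and not taken from the patent.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical representation: (first node word wi, relation Dk, end node word wj).
Chain = Tuple[str, str, str]


def extract_target_words(
    question_chains: Iterable[Chain],
    main_samples: Set[Chain],
    target_word_samples: Set[str],
) -> List[str]:
    # S207/S208: chains shared with the main sample set become candidates
    # (exact equality is an assumption of this sketch).
    candidate_chains = [c for c in question_chains if c in main_samples]
    # S209: take the first node word of every candidate chain.
    candidate_words = [wi for (wi, _relation, _wj) in candidate_chains]
    # S210: keep candidates that also appear in the target word sample set.
    result: List[str] = []
    for word in candidate_words:
        if word in target_word_samples and word not in result:
            result.append(word)
    return result


# Invented toy data based on the example sentence used in this description.
question_chains = [
    ("where", "ADV", "rent a house"),
    ("Fanyu District", "POB", "in"),
]
main_samples = {("where", "ADV", "rent a house")}
target_words = extract_target_words(question_chains, main_samples, {"where"})
print(target_words)  # ['where']
```

A real system would obtain `question_chains` from a dependency parser; the parser is deliberately left out because the source does not name one.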
Syntactic dependency parsing (Dependency Parsing, DP) reveals the syntactic structure of a linguistic unit by analyzing the dependency relations between its components. Intuitively, dependency parsing identifies grammatical components such as subject, predicate and object, or attributive, adverbial and complement, in a sentence, and analyzes the relations between those components. Fig. 5b is a schematic diagram of the syntactic dependency relations provided by an embodiment of the present invention.
Optionally, after the training question text is obtained, it is segmented into words. For the training question text "where can rent a house in Fanyu District", segmentation produces the question text vocabulary { "in", "Fanyu District", "where", "can", "rent a house" }.
For example, the training question text "where can rent a house in Fanyu District" is obtained and parsed with a syntactic dependency algorithm, generating the syntactic dependency tree of the training question text data shown in Fig. 4a. Based on the syntactic dependency tree, the syntactic dependency relations related to the problem target feature word of the training question text are selected, namely the syntactic relations between the words "where" and "rent a house" and between the words "where" and "can". A syntactic dependency chain is then defined from these relations. A syntactic dependency chain is written LinkD { "wi ← Dk ← wj" }, where "wi" is the first node word of the chain, "wj" is the end node word of the chain, and "Dk" is the syntactic relation between the words "wi" and "wj". The syntactic dependencies between "where" and "rent a house" and between "where" and "can" can therefore be defined as LinkD1 { "where ← ADV ← rent a house" } and LinkD2 { "where ← ADV ← can" }; that is, the syntactic dependency chain samples generated from the training question text are LinkD1 { "where ← ADV ← rent a house" } and LinkD2 { "where ← ADV ← can" }.
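The chain notation "wi ← Dk ← wj" can be modeled with a small data structure. The sketch below is an illustration under stated assumptions: the class `SyntacticChain`, the helper `chains_for_target` and the hand-written relation triples are hypothetical, and no particular dependency parser is implied.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class SyntacticChain:
    wi: str        # first node word of the chain
    relation: str  # Dk, the syntactic relation label
    wj: str        # end node word of the chain

    def __str__(self) -> str:
        return f"{self.wi} ← {self.relation} ← {self.wj}"


def chains_for_target(
    relations: Iterable[Tuple[str, str, str]], target: str
) -> List[SyntacticChain]:
    """Keep only the relations whose first word is the annotated target feature word."""
    return [SyntacticChain(w, r, o) for (w, r, o) in relations if w == target]


# Hand-written triples standing in for parser output (an assumption of this sketch).
relations = [
    ("where", "ADV", "rent a house"),
    ("where", "ADV", "can"),
    ("Fanyu District", "POB", "in"),
]
chain_texts = [str(c) for c in chains_for_target(relations, "where")]
print(chain_texts)  # ['where ← ADV ← rent a house', 'where ← ADV ← can']
```

Making the class frozen keeps chains hashable, so they can be stored in sets when sample sets are compared later.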
Optionally, the multiple syntactic dependency chain samples of a training question text are screened according to a predetermined rule. The predetermined rule is set from the results of a program counting, analyzing and fitting the syntactic dependency chain samples.
For example, the syntactic dependency chain samples of the training question text "where can rent a house in Fanyu District" are counted. According to the statistics, analysis and fitting performed on the samples by a machine self-learning program, the frequency of the syntactic dependency chain LinkD1 { "where ← ADV ← rent a house" } is greater than 70%, so LinkD1 is selected as the syntactic dependency chain sample of the training question text "where can rent a house in Fanyu District".
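The frequency screening can be sketched as follows. The source does not define how "frequency" is computed, so this sketch assumes it is the share of training questions in which a chain occurs; the function name `main_chain_samples`, the 0.7 default (taken from the 70% figure above) and the toy data are illustrative only.

```python
from collections import Counter
from typing import Iterable, List, Set


def main_chain_samples(
    chains_per_question: List[Iterable[str]], threshold: float = 0.7
) -> Set[str]:
    """Return the chains whose frequency is strictly greater than the threshold."""
    total = len(chains_per_question)
    if total == 0:
        return set()
    # Count each chain at most once per question (assumed definition of frequency).
    counts = Counter(c for chains in chains_per_question for c in set(chains))
    return {chain for chain, n in counts.items() if n / total > threshold}


# Invented toy data: four training questions and the chains found in each.
training = [
    ["where ← ADV ← rent a house", "where ← ADV ← can"],
    ["where ← ADV ← rent a house"],
    ["where ← ADV ← rent a house", "where ← ADV ← can"],
    ["where ← ATT ← house"],
]
selected = main_chain_samples(training)
print(selected)  # {'where ← ADV ← rent a house'}
```

Here the first chain occurs in 3 of 4 questions (75%) and passes, while the chain seen in 2 of 4 questions (50%) does not.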
For example, according to statistics, the subject-predicate relation (SBV), the adverbial-head relation (ADV) and the attribute-head relation (ATT) are the preferred dependency relations for analysis. In this embodiment the adverbial-head relation (ADV) is selected as the preferred relation, that is, the ADV relation between the words "where" and "rent a house" and the ADV relation between the words "can" and "rent a house" are analyzed.
Based on the manually annotated problem target words in the question text samples, the embodiment of the present invention performs syntactic analysis on the question text samples and, from the syntactic relation trees produced by that analysis and with regard to the syntactic features of Chinese, generates new syntactic dependency chain samples. These chains associate the words related to the user's intent in a question text sample with the other words in the sample more accurately. The embodiment then uses the syntactic dependency chain samples to screen the syntactic dependency chains of the question text to be analyzed, selecting the chains with higher relevance, so that a problem target word highly relevant to the intent behind the user's question can be extracted from them.
As shown in Fig. 3, Embodiment Three of the present invention provides another method for automatic identification and extraction of problem target features, comprising the following steps:
Step S301: annotate the problem target feature words in the question sample text, together with the words related to those problem target feature words, to form an annotated data set.
Step S302: annotate the part of speech of the problem target feature words in the question sample text to form a part-of-speech sample set.
Step S303: according to a preset frequency value, screen out from the at least one semantic dependency chain sample the semantic dependency chain samples whose frequency is greater than the preset frequency value.
Step S304: based on the screened semantic dependency chain samples, generate the main semantic dependency chain sample set for the question text to be analyzed; the main semantic dependency chain sample set includes at least one main semantic dependency chain sample.
Step S305: based on the at least one syntactic dependency sample, define at least one syntactic dependency sample.
Step S306: perform semantic analysis on the question text to be analyzed to generate its semantic dependency chain set; the semantic dependency chain set includes at least one semantic dependency chain.
Step S307: compare the semantic dependency chain set with the main semantic dependency chain sample set and screen out at least one shared semantic dependency chain.
Step S308: based on the at least one shared semantic dependency chain, generate at least one candidate semantic dependency chain of the question text to be analyzed.
Step S309: extract the first node word of each candidate semantic dependency chain in the at least one candidate semantic dependency chain to generate at least one candidate problem target word of the question text to be analyzed.
Step S310: compare the part of speech of the at least one candidate problem target word with the part-of-speech sample set to generate the problem target word of the question text to be analyzed.
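Steps S309 and S310 can be sketched in Python as a part-of-speech filter over the first nodes of the candidate chains. The tuple layout (wi, POS of wi, relation, POS of wj, wj), the function name `screen_by_pos` and the sample tag set {"PN"} are assumptions made for this illustration.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical layout: (wi, POS of wi, semantic relation, POS of wj, wj).
SemanticChain = Tuple[str, str, str, str, str]


def screen_by_pos(
    candidate_chains: Iterable[SemanticChain], pos_samples: Set[str]
) -> List[str]:
    targets: List[str] = []
    for wi, pos_wi, _relation, _pos_wj, _wj in candidate_chains:
        # S309: wi is the first node word, i.e. a candidate problem target word.
        # S310: keep it only if its part of speech is in the part-of-speech sample set.
        if pos_wi in pos_samples and wi not in targets:
            targets.append(wi)
    return targets


candidates = [
    ("where", "PN", "nsubj", "VV", "can"),
    ("can", "VV", "prep", "NN", "Fanyu District"),
]
pos_filtered = screen_by_pos(candidates, {"PN"})
print(pos_filtered)  # ['where']
```

The part-of-speech sample set would in practice come from the annotation of step S302; here it is hard-coded.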
Semantic dependency analysis aims to go beyond the constraints of a sentence's surface syntactic structure and obtain deep semantic information directly. Its analysis is not affected by syntactic structure: linguistic units with a direct semantic association are connected directly by dependency arcs labeled with the corresponding semantic relation. Fig. 5c is a schematic diagram of the semantic dependency relations of an embodiment of the present invention.
Part of speech is a generalization over words. In this embodiment, tagging each word of the training question text with its part of speech helps dependency syntactic analysis and dependency semantic analysis and improves the accuracy of both. Fig. 5a is a part-of-speech tagging schematic diagram provided by the present invention. In this embodiment, manually annotating the problem target feature words that reflect the user's intent in the training question text, together with their parts of speech, makes it possible to build a more accurate training sample library associated with the user's intent.
For example, in an embodiment of the present invention, the problem target feature word annotated for the training question text "where can rent a house in Fanyu District" is "where", and the part of speech of "where" is annotated as pronoun (Pronoun, pron), adverb (Adverb, adv), conjunction (Conjunction, conj).
For example, Fig. 4b shows the semantic dependency tree generated by semantic analysis of the question text to be analyzed "where can rent a house in Fanyu District". The semantic dependency tree contains multiple semantic dependencies.
For example, the training question text "where can rent a house in Fanyu District" is obtained and analyzed with a semantic dependency algorithm, generating the semantic dependency tree of the training question text data. Based on the semantic dependency tree, the semantic dependency relations related to the problem target feature word of the training question text are selected, namely the semantic relations between the words "can" and "Fanyu District" and between the words "can" and "where". A semantic dependency chain is then defined from the semantic dependency tree. A semantic dependency chain is written LinkSD { "wi-POSwi:SDk:POSwj-wj" }, where "wi" is the first node word of the chain, "wj" is the end node word of the chain, "SDk" is the semantic dependency relation, "POSwi" is the part of speech of the word "wi", and "POSwj" is the part of speech of the word "wj". The semantic relations between "can" and "Fanyu District" and between "can" and "where" are therefore expressed as the semantic dependency chains LinkSD1 { "can-VV:prep:NN-Fanyu District" } and LinkSD2 { "can-VV:nsubj:PN-where" }.
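The expression "wi-POSwi:SDk:POSwj-wj" is a simple string format, so it can be rendered and parsed mechanically. The following sketch assumes that part-of-speech tags and relation labels contain neither ":" nor "-"; the names `SemanticLink`, `render` and `parse_link` are hypothetical.

```python
import re
from typing import NamedTuple


class SemanticLink(NamedTuple):
    wi: str        # first node word
    pos_wi: str    # part of speech of wi
    relation: str  # SDk, the semantic dependency relation
    pos_wj: str    # part of speech of wj
    wj: str        # end node word

    def render(self) -> str:
        return f"{self.wi}-{self.pos_wi}:{self.relation}:{self.pos_wj}-{self.wj}"


# Assumes tags and relation labels contain no ':' or '-' characters.
_LINK = re.compile(
    r"^(?P<wi>.+)-(?P<pos_wi>[^:-]+):(?P<relation>[^:]+):(?P<pos_wj>[^:-]+)-(?P<wj>.+)$"
)


def parse_link(text: str) -> SemanticLink:
    match = _LINK.match(text)
    if match is None:
        raise ValueError(f"not a semantic dependency chain: {text!r}")
    return SemanticLink(**match.groupdict())


link = parse_link("can-VV:prep:NN-Fanyu District")
print(link.wi, link.relation, link.wj)
```

Parsing and rendering are inverse operations for well-formed input, which makes the string form usable as a storage format for chain samples.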
Specifically, if a training question text has two or more semantic dependency chains, and the end node word wj of chain a is equal to the first node word wi of chain b, then chain a and chain b are merged.
Illustratively, the semantic relation chain "can-VV:prep:NN-Fanyu District" and the semantic relation chain "can-VV:nsubj:PN-where" are merged. As a result, the semantic relation chain of the training question text "Where can one rent a house in Fanyu District" is LinkSD{"where-PN:nsubj:VV-can-VV:prep:NN-Fanyu District"}.
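The merge rule can be sketched in Python as follows. This is an illustration under stated assumptions, not the implementation of the embodiment: the helpers reverse_link, merge and render are hypothetical names, and reading the nsubj chain from "where" to "can" before merging is inferred from the merged result given in the example, since the text does not state that step explicitly.

```python
from typing import List, Tuple

# (wi, POSwi, SDk, POSwj, wj)
Link = Tuple[str, str, str, str, str]
Chain = List[Link]


def reverse_link(link: Link) -> Link:
    """Read a relation from its end node word back to its first node word."""
    wi, pos_wi, rel, pos_wj, wj = link
    return (wj, pos_wj, rel, pos_wi, wi)


def merge(chain_a: Chain, chain_b: Chain) -> Chain:
    """Merge two chains when the end node word of a equals the first node word of b."""
    if chain_a[-1][4] != chain_b[0][0]:
        raise ValueError("end node word of chain a must equal first node word of chain b")
    return chain_a + chain_b


def render(chain: Chain) -> str:
    """Write a chain as wi-POSwi:SDk:POSwj-wj, repeated for every link."""
    parts = [chain[0][0]]
    for _, pos_wi, rel, pos_wj, wj in chain:
        parts.append(f"-{pos_wi}:{rel}:{pos_wj}-{wj}")
    return "".join(parts)


link_prep = ("can", "VV", "prep", "NN", "Fanyu District")
link_nsubj = ("can", "VV", "nsubj", "PN", "where")

merged = merge([reverse_link(link_nsubj)], [link_prep])
print(render(merged))  # where-PN:nsubj:VV-can-VV:prep:NN-Fanyu District
```

Raising an error on a mismatch keeps the rule explicit; a real system might instead skip pairs of chains that cannot be joined.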
In the embodiment of the present invention, the problem target words in the question text samples are annotated manually, semantic analysis is performed on the question text samples, and new semantic dependency chain samples suited to Chinese syntactic features are generated from the semantic relation tree produced by the analysis. A semantic dependency chain can more accurately associate the words in a question text sample that relate to the user's intent with the other words in that sample. The embodiment of the present invention then screens the semantic dependency chains of the question text to be analyzed against the semantic dependency chain samples, so that the semantic dependency chains of higher relevance are retained and the problem target words most relevant to the user's questioning intent can be extracted from them.
Embodiment four of the present invention provides another method for automatically identifying and extracting problem target features. In addition to all the steps of embodiment two and embodiment three, it further includes the step of: extracting the words that the at least one candidate semantic dependency chain shares with the at least one candidate semantic dependency chain, and marking the shared words as the problem target word of the question text to be analyzed.
Illustratively, for another training question text, "Define biomedicine", the candidate semantic dependency chain generated from this question is Q_LinkSD {biology ← ATT ← medicine}, and the candidate syntactic dependency chain is Q_LinkD {define-VV:advmod:AD-once}. Q_LinkSD and Q_LinkD share no word, so the two dependency chain sets are each traversed and the first node word of every chain is extracted, giving the candidate problem target feature word set {biology, define} with the corresponding parts of speech {NN, VV}. According to the target word part-of-speech set {PN, VV, DT, WP, WDT, WRB, NR, NT} generated from prior statistics, the candidate problem target feature word "define", whose part of speech is VV, is selected.
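The selection logic of this example can be sketched in Python as below. It is a hedged illustration rather than the implementation of the embodiment: select_target_words and chain_words are hypothetical names, the part of speech NN given to "medicine" is an assumption (the example lists parts of speech only for the first node words), and returning shared words in sorted order is an arbitrary choice.

```python
from typing import List, Set, Tuple

# (wi, POSwi, relation, POSwj, wj)
Link = Tuple[str, str, str, str, str]

# Target word part-of-speech set quoted in the example.
TARGET_POS = {"PN", "VV", "DT", "WP", "WDT", "WRB", "NR", "NT"}


def chain_words(chains: List[Link]) -> Set[str]:
    """All words that occur as a first node word or an end node word."""
    return {w for wi, _, _, _, wj in chains for w in (wi, wj)}


def select_target_words(semantic: List[Link], syntactic: List[Link],
                        target_pos: Set[str] = TARGET_POS) -> List[str]:
    """Prefer words shared by both chain sets; otherwise filter first node words by POS."""
    shared = chain_words(semantic) & chain_words(syntactic)
    if shared:
        return sorted(shared)
    selected: List[str] = []
    for wi, pos_wi, *_ in semantic + syntactic:
        if pos_wi in target_pos and wi not in selected:
            selected.append(wi)
    return selected


# "medicine" is tagged NN here as an assumption.
q_link_sd = [("biology", "NN", "ATT", "NN", "medicine")]
q_link_d = [("define", "VV", "advmod", "AD", "once")]

print(select_target_words(q_link_sd, q_link_d))  # ['define']
```

Here the two chain sets share no word, so the fallback keeps only "define", whose part of speech VV is in the target set.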
Illustratively, the question text to be analyzed can be obtained from a mobile client, a PC, or a data storage server. It should be noted that the data type of the question text to be analyzed is not limited to text format and may be another format; the embodiment of the present invention places no limit on the format of the question text to be analyzed.
Specifically, the at least one dependency relation includes at least one syntactic dependency relation and at least one semantic dependency relation. Syntactic dependency analysis of the question text to be analyzed generates a syntactic dependency tree containing at least one syntactic dependency relation, and semantic dependency analysis of the question text to be analyzed generates a semantic dependency tree containing at least one semantic dependency relation.
It should be noted that the dependency syntactic analysis used in the embodiment of the present invention may be the PCFG algorithm, Lexical PCFG, or a transition-based parsing algorithm. The specific syntactic dependency analysis used in this embodiment is explained further below.
Specifically, the at least one question training sample includes at least one syntactic dependency chain sample, at least one semantic dependency chain sample, and at least one problem target feature word part-of-speech sample.
Specifically, before the relationships between the words of the question text to be analyzed are analyzed using the parsing algorithm, the method further includes: annotating the problem target feature word of at least one training question text, and the part-of-speech data of the problem target feature word of the at least one training question text.
It should be noted that part-of-speech tagging (POS tagging) is the task of assigning a part-of-speech category to each word in a sentence. The category may be noun, verb, adjective, or another part of speech. Here, v denotes a verb, n a noun, c a conjunction, d an adverb, and wp a punctuation mark. A specific part-of-speech tagging example is shown in Fig. 5a.
It should be noted that the problem target feature word of a training question text can be annotated manually in several ways. For example, the interrogative word in the question text can be marked as the problem target feature word of the training question text; or, when the interrogative word in the question text cannot reflect the user's intent, an accurate problem target feature word can be obtained from the syntactic and/or semantic relations associated with the interrogative word.
It should be noted that one word can correspond to multiple parts of speech. For example, the word "rent a house" may be a verb or a noun. In this embodiment, when a word has multiple parts of speech, the target part-of-speech set of the training question samples is screened according to a preset preference rule to form a preferred target part-of-speech set. Meanwhile, different parts of speech of the same word can generate different dependency chains.
Embodiment five of the present invention provides a problem target word identification device, which is described below with reference to Fig. 6.
Referring to Fig. 6, Fig. 6 is a structural schematic diagram of an item value assessment device provided in an embodiment of the present invention. The device 60 may include at least one storage unit 601, at least one processing unit 603, and at least one communication interface 602. The device may of course also include general-purpose units such as input/output units, which are not limited here.
The at least one storage unit 601 may be used to store computer instructions, programs, functional modules, events, databases, and the like, which are not limited here. The at least one storage unit 601 may be integrated in one storage device or configured separately in the device 60, which is not limited here.
The at least one processing unit 603 can be implemented by a computer, a server, a central processing unit, a microprocessing unit, a data processing unit, a dedicated big data processing unit, or the like.
In this embodiment, the problem target word identification device executes the following method:
generating at least one dependency chain sample according to a preset generation rule;
matching, from the at least one dependency relation sample according to a preset matching rule, the candidate dependency chains that match the question text to be analyzed;
screening out, from the candidate dependency chains according to a preset screening rule, the problem target word of the question text to be analyzed.
In addition to executing the method for automatically identifying and extracting problem target features provided by embodiment one of the present invention, the problem target word identification device provided by embodiment five is also used to execute the methods for automatically identifying and extracting problem target features provided by embodiments two, three and four.
Embodiment six of the present invention provides a problem target word identification terminal that can interact with a user, receive the user's instructions, and return results according to those instructions. It executes the method for automatically identifying and extracting problem target features provided by embodiment one of the present invention:
generating at least one dependency chain sample according to a preset generation rule;
matching, from the at least one dependency relation sample according to a preset matching rule, the candidate dependency chains that match the question text to be analyzed;
screening out, from the candidate dependency chains according to a preset screening rule, the problem target word of the question text to be analyzed.
In addition to executing the method for automatically identifying and extracting problem target features provided by embodiment one of the present invention, the problem target word identification terminal provided by embodiment six is also used to execute the methods for automatically identifying and extracting problem target features provided by embodiments two, three and four.
In order to realize the above embodiments, the present invention also proposes a problem target word identification terminal and a non-transitory computer-readable storage medium, which execute the following steps through functional modules:
generating at least one dependency chain sample according to a preset generation rule;
matching, from the at least one dependency relation sample according to a preset matching rule, the candidate dependency chains that match the question text to be analyzed;
screening out, from the candidate dependency chains according to a preset screening rule, the problem target word of the question text to be analyzed.
In order to realize the above embodiments, embodiment seven of the present invention also proposes a computer program product. When the instructions in the computer program product are executed by a processor, the method for automatically identifying and extracting problem target features proposed by the embodiment of the first aspect of the present invention is executed. In addition to executing the method provided by embodiment one of the present invention, the computer program product provided by embodiment seven is also used to execute the methods for automatically identifying and extracting problem target features provided by embodiments two, three and four.
It should be noted that, for simplicity of description, each of the foregoing method embodiments is expressed as a series of combined actions. Those skilled in the art should understand, however, that the present invention is not limited by the described order of actions, because according to the present invention some steps may be performed in another order or simultaneously. Those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and that the actions and modules involved are not necessarily required by the present invention.
In the above embodiments, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, refer to the related descriptions of the other embodiments.
The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable memory. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a memory and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods of the embodiments of the present invention. The aforementioned memory includes various media that can store program code, such as a USB flash drive, read-only memory (ROM), random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
Those of ordinary skill in the art will understand that all or part of the steps of the methods of the above embodiments can be completed by a program instructing the relevant hardware. The program can be stored in a computer-readable memory, which may include a flash disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disc, or the like.
The embodiments of the present invention have been described in detail above. Specific examples are used herein to explain the principle and implementation of the present invention, and the description of the above embodiments is only intended to help understand the method of the present invention and its core idea. At the same time, those of ordinary skill in the art may, according to the idea of the present invention, make changes to the specific implementation and the scope of application. In summary, the contents of this specification should not be construed as limiting the present invention.
Moreover, the word segmentation results are in turn checked and adjusted during the annotation process, which greatly improves segmentation accuracy.
The part-of-speech set of this embodiment also benefits semantic dependency analysis and syntactic dependency analysis.

Claims (8)

1. A method for automatically identifying and extracting problem target features, characterized by comprising:
generating at least one dependency chain sample according to a preset generation rule;
matching, from the at least one dependency relation sample according to a preset matching rule, the candidate dependency chains that match the question text to be analyzed;
screening out, from the candidate dependency chains according to a preset screening rule, the problem target word of the question text to be analyzed.
2. The method according to claim 1, characterized in that generating at least one dependency chain sample according to the preset generation rule comprises:
annotating the problem target feature words in the question sample text and the words related to the problem target feature words, to form an annotated data set;
annotating the parts of speech of the problem target feature words in the question sample text, to form a part-of-speech sample set;
performing syntactic analysis on the annotated data set using a parsing algorithm, to generate at least one syntactic dependency chain sample;
performing semantic analysis on the annotated data set using a semantic dependency algorithm, to generate at least one semantic dependency relation sample.
3. The method according to claim 2, characterized in that matching, from the at least one dependency relation sample according to the preset matching rule, the candidate dependency chains that match the question text to be analyzed comprises:
screening out, from the at least one syntactic dependency chain sample according to a preset frequency value, the syntactic dependency chain samples whose frequency is greater than the preset frequency value;
generating, based on the screened syntactic dependency chain samples, a main syntactic dependency chain sample set of the question text to be analyzed, the main syntactic dependency chain sample set including at least one main syntactic dependency chain sample;
performing syntactic analysis on the question text to be analyzed to generate a syntactic dependency chain set of the question text to be analyzed, the syntactic dependency chain set including at least one syntactic dependency chain;
comparing the syntactic dependency chain set with the main syntactic dependency chain sample set, and screening out at least one shared syntactic dependency chain;
generating, based on the at least one shared syntactic dependency chain, at least one candidate syntactic dependency chain of the question text to be analyzed.
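For illustration only, the frequency screening and comparison steps recited above can be sketched in Python. The function names are hypothetical, and two points are assumptions rather than statements of the claim: frequency is taken to be a raw occurrence count, and each chain is represented by an opaque string key (here a part-of-speech and relation pattern).

```python
from collections import Counter
from typing import Iterable, List, Set


def main_chain_samples(sample_chains: Iterable[str], preset_frequency: int) -> Set[str]:
    """Keep the chain samples whose frequency is greater than the preset value."""
    counts = Counter(sample_chains)
    return {chain for chain, n in counts.items() if n > preset_frequency}


def candidate_chains(question_chains: List[str], main_samples: Set[str]) -> List[str]:
    """Keep the chains of the question that also occur in the main sample set."""
    return [chain for chain in question_chains if chain in main_samples]


# Hypothetical training data: chain keys observed across annotated questions.
samples = ["VV:nsubj:PN"] * 3 + ["VV:prep:NN"] * 2 + ["VV:advmod:AD"]
main_set = main_chain_samples(samples, preset_frequency=1)

# Hypothetical chains produced by parsing one question to be analyzed.
candidates = candidate_chains(["VV:nsubj:PN", "NN:ATT:NN"], main_set)
print(candidates)  # ['VV:nsubj:PN']
```

The same two functions would apply unchanged to semantic dependency chains, since they only compare keys.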
4. The method according to claim 3, characterized in that the method further comprises:
screening out, from the at least one semantic dependency chain sample according to a preset frequency value, the semantic dependency chain samples whose frequency is greater than the preset frequency value;
generating, based on the screened semantic dependency chain samples, a main semantic dependency chain sample set of the question text to be analyzed, the main semantic dependency chain sample set including at least one main semantic dependency chain sample.
5. The method according to claim 4, characterized in that the method further comprises:
performing semantic analysis on the question text to be analyzed to generate a semantic dependency chain set of the question text to be analyzed, the semantic dependency chain set including at least one semantic dependency chain;
comparing the semantic dependency chain set with the main semantic dependency chain sample set, and screening out at least one shared semantic dependency chain;
generating, based on the at least one shared semantic dependency chain, at least one candidate semantic dependency chain of the question text to be analyzed.
6. The method according to claim 1, characterized in that screening out, from the candidate dependency chains according to the preset screening rule, the problem target word of the question text to be analyzed comprises:
extracting the first node word of each candidate syntactic dependency chain in the at least one candidate syntactic dependency chain, to generate the candidate problem target words of the question text to be analyzed;
comparing the candidate problem target words with a problem target word sample set, to generate the problem target word of the question text to be analyzed.
7. The method according to claim 6, characterized in that the method further comprises:
extracting the first node word of each candidate semantic dependency chain in the at least one candidate semantic dependency chain, to generate at least one candidate problem target word of the question text to be analyzed;
comparing the part of speech of the at least one candidate problem target word with the part-of-speech sample set, to generate the problem target word of the question text to be analyzed.
8. The method according to claim 7, characterized in that the method further comprises:
extracting the words that the at least one candidate semantic dependency chain shares with the at least one candidate semantic dependency chain, and marking the shared words as the problem target word of the question text to be analyzed.
CN201910192494.0A 2019-03-14 2019-03-14 Automatic identification and extraction method for problem target features Active CN109992651B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910192494.0A CN109992651B (en) 2019-03-14 2019-03-14 Automatic identification and extraction method for problem target features

Publications (2)

Publication Number Publication Date
CN109992651A true CN109992651A (en) 2019-07-09
CN109992651B CN109992651B (en) 2024-01-02

Family

ID=67130423

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant