CN110502621A - Answering method, question and answer system, computer equipment and storage medium - Google Patents
- Publication number: CN110502621A
- Application number: CN201910593110.6A
- Authority
- CN
- China
- Prior art keywords
- entity
- candidate
- triple
- input information
- name
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation or dialogue systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3347—Query execution using vector based model
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Human Computer Interaction (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Animal Behavior & Ethology (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a question answering method, device, computer equipment and storage medium, comprising: obtaining a user's input information; recognizing the named entity in the input information, and linking the named entity to the corresponding candidate entity in a Chinese knowledge graph to form an entity pair, wherein the entity pair comprises the named entity and the candidate entity; matching candidate relations of the candidate entity in the Chinese knowledge graph through a relation model; forming candidate triples from the entity pair and the candidate relations, wherein each candidate triple comprises the named entity, the candidate entity and a candidate relation; obtaining the ranking result corresponding to each candidate triple based on a learning-to-rank model; and querying the Chinese knowledge graph according to the ranking results to obtain the answer to the input information. The method makes efficient use of external resources, can provide a large amount of contextual information through text mining, and, based on the learning-to-rank model, can still obtain good answers when question-answer corpus data are scarce.
Description
Technical field
The present invention relates to the field of computer technology, and more particularly to a question answering method, question answering device, computer equipment and storage medium.
Background technique
A question answering system is an advanced form of information retrieval system: it can answer, in accurate and concise natural language, the questions users pose in natural language. A traditional question answering system is divided into two major parts, question processing and answer retrieval. Question processing is based on word segmentation. Answer retrieval mostly uses scoring: a series of candidate answers is selected from massive text data, and a selection function is then constructed to choose the closest answer from the candidates. Such traditional systems make errors of varying degrees because of long text nouns and differences in the constructed selection functions.
Against this background, question answering systems based on knowledge graphs have emerged. Current research on knowledge-graph question answering falls into three classes. The first is rule matching, which uses fixed rules to decide whether a user's question queries some fact in the knowledge base. The second is template learning, which collects a large number of templates, annotates the corresponding knowledge-base facts, and learns from large amounts of data the probability that a natural language question corresponds to some template. The third is semantic matching based on deep learning, which uses a neural network model to learn the semantic similarity between a question and some relation in the knowledge graph, where entity recognition has first been performed on the question and the entity in the question has been replaced with a special symbol.
Rule-matching knowledge-base question answering is highly accurate but inflexible: a rule must be written for every class of question. Template learning and deep learning methods, in turn, generally need to be trained on large-scale question-answer corpora, making them hard to apply in the early stage of development to a vertical domain where question-answer data are scarce.
Summary of the invention
In view of this, the present invention proposes a question answering method, question answering device, computer equipment and storage medium that can obtain an accurate answer even when question-answer corpus data are scarce.
Firstly, to achieve the above object, the present invention proposes a question answering method comprising the steps of:
obtaining a user's input information;
recognizing the named entity in the input information, and linking the named entity to the candidate entity corresponding to it in a Chinese knowledge graph to form an entity pair, wherein the entity pair comprises the named entity and the candidate entity;
matching candidate relations of the candidate entity in the Chinese knowledge graph through a relation model;
forming candidate triples from the entity pair and the candidate relations, wherein each candidate triple comprises the named entity, the candidate entity and a candidate relation;
obtaining the ranking result corresponding to each candidate triple based on a learning-to-rank model; and
querying the Chinese knowledge graph according to the ranking results to obtain the answer to the input information.
Further, recognizing the named entity in the input information specifically comprises:
annotating the input information to obtain an annotation result; and
recognizing, according to the annotation result, the named entity in the input information through a recurrent neural network model.
Further, the step of recognizing the named entity in the input information and linking it to the corresponding candidate entity in the Chinese knowledge graph to form an entity pair comprises:
calculating the similarity between the named entity and the candidate entity of each entity pair, wherein the similarity is obtained from the Chinese-character similarity, the pinyin-character similarity, the word-vector similarity and the degree of attention the entity receives;
ranking the entity pairs according to the similarities to obtain the ranking of each entity pair; and
choosing the corresponding entity pairs according to the ranking.
Further, the relation template comprises a first entity, a second entity and the relation between the first entity and the second entity.
Further, obtaining the ranking result corresponding to each candidate triple based on the learning-to-rank model specifically comprises:
calculating the feature vector corresponding to each triple; and
inputting each feature vector into the learning-to-rank model to obtain the ranking result corresponding to each candidate triple.
Further, the step of calculating the feature vector of each triple comprises:
calculating, from the triple, a first similarity feature between the named entity and the candidate entity;
removing the named entity from the input information to obtain the remaining words, and calculating a second similarity feature between the remaining words and their synonyms and context vocabulary;
generating a high-dimensional vector from the input information, wherein the high-dimensional vector is generated according to whether preset vocabulary occurs in the input information; and
generating the feature vector from the first similarity feature, the second similarity feature and the high-dimensional vector.
Further, the learning-to-rank model is obtained by training on first samples and second samples formed from the candidate triples, wherein a first sample is the triple constituted by the standard answer to the input information.
A question answering device based on a Chinese knowledge graph with learning to rank, the device comprising:
a first obtaining module, for obtaining a user's input information;
a recognition and linking module, for recognizing the named entity in the input information and linking it to the candidate entity corresponding to it in the Chinese knowledge graph to form an entity pair, wherein the entity pair comprises the named entity and the candidate entity;
a matching module, for matching candidate relations of the candidate entity in the Chinese knowledge graph through a relation model;
a forming module, for forming candidate triples from the entity pair and the candidate relations, wherein each candidate triple comprises the named entity, the candidate entity and a candidate relation;
a second obtaining module, for obtaining the ranking result of each candidate triple based on a learning-to-rank model; and
a third obtaining module, for querying the Chinese knowledge graph according to the ranking results to obtain the answer to the input information.
To achieve the above object, the present invention also provides a computer equipment comprising a memory, a processor and a computer program stored on the memory and runnable on the processor, the processor realizing the steps of the above method when executing the computer program.
To achieve the above object, the present invention also provides a computer-readable storage medium on which a computer program is stored, the computer program realizing the steps of the above method when executed by a processor.
Compared with traditional techniques, the knowledge-graph question answering method, computer equipment and storage medium proposed by the invention can make effective use of external resources: through a wide learning model, external resources such as synonyms of relation facts and online text words are exploited efficiently, and this portion of external resources can be obtained quickly through text mining or directly by means of Chinese word ontologies and the like. Moreover, by combining the wide learning model with a deep learning model, the amount of data the model requires can be reduced, so that good results are still output when training data are scarce, which is of great significance when developing knowledge-graph question answering for a new vertical domain.
Brief description of the drawings
Fig. 1 is a flowchart of the question answering method of the first embodiment of the invention;
Fig. 2 is a flowchart of the question answering method of the second embodiment of the invention;
Fig. 3 is a flowchart of the question answering method of the third embodiment of the invention;
Fig. 4 is a flowchart of the question answering method of the fourth embodiment of the invention;
Fig. 5 is a flowchart of the question answering method of the fifth embodiment of the invention;
Fig. 6 is a block diagram of the question answering device of the sixth embodiment of the invention;
Fig. 7 is a block diagram of the question answering device of the seventh embodiment of the invention; and
Fig. 8 is a block diagram of the synonym collection unit in the question answering device of the eighth embodiment of the invention.
The realization of the object, the functions and the advantages of the present invention will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Detailed description of the embodiments
In order to make the objectives, technical solutions and advantages of the present invention clearer, the present invention is further elaborated below with reference to the accompanying drawings and embodiments. It should be appreciated that the specific embodiments described herein are only used to explain the present invention, not to limit it. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative work shall fall within the protection scope of the present invention.
It should be noted that descriptions involving "first", "second", etc. in the present invention are used for description purposes only and cannot be interpreted as indicating or implying relative importance or implicitly indicating the quantity of the indicated technical features. A feature defined with "first" or "second" may thus explicitly or implicitly include at least one such feature. In addition, the technical solutions of the embodiments can be combined with each other, but only insofar as the combination can be realized by those of ordinary skill in the art; when a combination of technical solutions is contradictory or cannot be achieved, it should be understood that such a combination does not exist and is not within the protection scope claimed by the present invention.
Referring to Fig. 1, a question answering method is provided in the first embodiment. The question answering method comprises:
Step S110: obtaining the user's input information.
The input information may be a natural language retrieval sentence (e.g. a question), such as a question entered by a user on a search website: "What medicine should I take for a cough?" The present embodiment places no limitation on the way the input information is obtained.
Step S120: recognizing the named entity in the input information and linking it to the candidate entity in the Chinese knowledge graph to form an entity pair, wherein the entity pair comprises the named entity and the candidate entity.
Specifically, sequence labelling is performed on the input information using a tag-set method and a recurrent neural network model, and named entity recognition is then completed according to the sequence labelling result (the specific steps are detailed in the second embodiment). For example, for "What medicine should I take for a cough?", the question is first annotated with the BIO tag-set method; the vector information of the question is obtained from the annotation result and then used as the input of the recurrent neural network model, which recognizes the named entity "cough". The named entity is then mapped to a globally unique identifier (GUID) in the Chinese knowledge graph, thereby linking it to the corresponding candidate entity, such as cough. In addition, each candidate entity in the knowledge graph uniquely corresponds to one GUID, by which different candidate entities in the Chinese knowledge graph can be distinguished.
The Chinese knowledge graph is a new technique for storing complex structural information. It stores a large quantity of factual knowledge, internally storing relation information between entities. Chinese knowledge graphs mostly store data in RDF (Resource Description Framework) format: a fact is represented as an (S, P, O) triple of the form (subject, predicate, object), where S and O denote entities, O sometimes also denotes an attribute value, and P denotes the relation between S and O. Entity linking is an important method for resolving named entity ambiguity: it eliminates the ambiguity of an entity by linking an ambiguous entity mention to a given knowledge graph.
In addition, because named entities may have aliases or other variant information, the alias information of each candidate entity in the Chinese knowledge graph and its corresponding names is collected, and a reverse dictionary from alias to candidate entity is constructed for entity linking. When building the dictionary, the alias strings need to be normalized, for example converted to lowercase characters with special characters removed, and the entities in the alias dictionary are ranked by popularity, taken as the frequency with which the entity occurs in the knowledge graph. After named entity recognition, the named entity is looked up in the alias dictionary to obtain candidate entities, and the top-ranked ones by entity popularity are chosen as the candidate entities.
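The alias-dictionary construction and popularity-ranked lookup described above can be sketched as follows. The data layout (tuples of GUID, canonical name, aliases and knowledge-graph frequency) and the function names are illustrative assumptions, not a format given by the patent:

```python
from collections import defaultdict

def build_alias_dict(entities):
    """Build a reverse alias dictionary: normalized alias -> candidate entities.

    `entities` is assumed to be a list of (guid, name, aliases, frequency)
    tuples; the frequency with which an entity occurs in the knowledge
    graph serves as its popularity score for ranking candidates.
    """
    alias_dict = defaultdict(list)
    for guid, name, aliases, freq in entities:
        for alias in {name, *aliases}:
            key = alias.lower().strip()  # unify case, drop surrounding blanks
            alias_dict[key].append((guid, freq))
    for key in alias_dict:  # order candidates by popularity, most frequent first
        alias_dict[key].sort(key=lambda x: x[1], reverse=True)
    return alias_dict

def link_entity(alias_dict, mention, top_k=3):
    """Look up a recognized mention and return the top-k candidate GUIDs."""
    candidates = alias_dict.get(mention.lower().strip(), [])
    return [guid for guid, _ in candidates[:top_k]]
```

In this sketch an ambiguous mention simply resolves to the most popular matching entities; the patent's later embodiments refine this with a composite similarity score.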
Step S130: matching candidate relations of the candidate entity in the Chinese knowledge graph through relation templates.
Specifically, the relation templates, via natural language understanding techniques, capture the semantics expressed by the user's input information (e.g. a question) and match it against the relation P of the (S, P, O) triples in the Chinese knowledge graph, thereby determining the candidate relations in the Chinese knowledge graph that correspond to the semantics the question expresses. A relation template comprises a first entity, a second entity and the relation between the first entity and the second entity. The relation templates are obtained by extracting some triples from the Chinese knowledge graph, extracting relation information from these triples, and training on this relation information to obtain the templates corresponding to it.
Step S140: forming candidate triples from the entity pair and the candidate relations, wherein each candidate triple comprises the named entity, the candidate entity and a candidate relation.
Specifically, each candidate triple is formed from the named entity recognized in the above steps, the candidate entity corresponding to it in the Chinese knowledge graph, and a candidate relation.
Step S150: obtaining the ranking result corresponding to each candidate triple based on the learning-to-rank model.
Specifically, each candidate triple is converted into corresponding vector information and used as the input of the learning-to-rank model; after a series of computations, the model outputs the ranking result corresponding to each candidate triple. The ranking results may follow the rule that a higher rank means more accurate and a lower rank less accurate, or other schemes; the present embodiment does not limit this.
The learning-to-rank model is computed with a learning-to-rank algorithm. Learning to rank (LTR) is a supervised learning (SL) ranking method. LTR methods generally fall into three classes: pointwise (single document), pairwise (document pair) and listwise (document list). In the present embodiment, the learning-to-rank algorithm uses the pairwise method.
In one embodiment, the learning-to-rank model is obtained by training on first samples and second samples formed from the candidate triples, wherein a first sample is the triple constituted by the standard answer to the input information. For example, given the standard-answer triple (named entity, candidate entity, candidate relation) of a question, candidate entities are randomly drawn 10 times from the Chinese knowledge graph, candidate relations are acquired for these candidate entities, and finally 50 triples (named entity, candidate entity, candidate relation) are obtained to form the negative samples (N), the standard-answer triple being the positive sample (P). Combining the positive sample (P) with a negative sample (N) generates two samples, a (P, N) sample and an (N, P) sample. The label of the (P, N) sample is 1 and the label of the (N, P) sample is 0. The learning-to-rank model can be obtained by training on these samples.
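The pairwise sample construction described above (each positive/negative combination yielding a labeled (P, N) pair and its mirrored (N, P) pair) can be sketched as follows; the function name and tuple representation of triples are invented for illustration:

```python
def make_pairwise_samples(positive, negatives):
    """Build pairwise learning-to-rank training samples.

    `positive` is the gold triple (named entity, candidate entity, relation)
    built from the standard answer; `negatives` are triples sampled from the
    knowledge graph. Each (P, N) pair is labeled 1 and the mirrored (N, P)
    pair is labeled 0, following the scheme described in the text.
    """
    samples = []
    for neg in negatives:
        samples.append(((positive, neg), 1))  # gold triple ranked first
        samples.append(((neg, positive), 0))  # gold triple ranked second
    return samples
```

With the 50 negatives of the example above, this yields 100 labeled pairs per question, which is the input a pairwise ranker is trained on.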
Step S160: querying the Chinese knowledge graph according to the ranking results to obtain the answer to the input information.
Specifically, according to the ranking results of the candidate triples, the candidate triples ranked before a preset value are chosen; the selected candidate triples are then converted into the query language of the Chinese knowledge graph, the query statements are executed in the Chinese knowledge graph, and the answer corresponding to the input information is returned after the query.
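The patent does not name the query language. Assuming the graph is stored as RDF, a selected candidate triple might be rendered as a SPARQL query along these lines, with `kg:` a placeholder namespace prefix and the function name invented for illustration:

```python
def triple_to_sparql(subject_guid, relation):
    """Render a candidate triple as a SPARQL query over (S, P, O) data.

    Since the graph stores facts as (subject, predicate, object) triples,
    answering reduces to retrieving the object O for the chosen subject
    GUID and predicate.
    """
    return (
        "SELECT ?answer WHERE { "
        f"kg:{subject_guid} kg:{relation} ?answer . "
        "}"
    )
```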
In short, the question answering method can use the learning-to-rank model to exploit external resources effectively, and can obtain accurate answers to users' questions even when question-answer corpus data are scarce.
In the second embodiment, referring to Fig. 2, the step of recognizing the named entity in the input information in step S120 of the first embodiment comprises:
Step S210: annotating the input information to obtain the annotation result.
Suppose the question input by the user is q: X = (x_1, x_2, ..., x_n), where x_i denotes each character in the question. The BIEO annotation method marks each character of the question: "B" is the beginning of a named entity, "I" denotes the inside of a named entity, "E" denotes the end of a named entity, and "O" denotes not a named entity. y = (y_1, y_2, ..., y_n) denotes the annotation result, and the score of the annotation result obtained by this method is:

score(X, y) = Σ_{i=1}^{n} P_{y_i, i} + Σ_{i=0}^{n} A_{y_i, y_{i+1}}

where the matrix P ∈ R^{K×n} is the state feature matrix of the conditional random field, P_{i,j} denotes the score of annotating the j-th character of the sentence with the i-th label, A ∈ R^{(K+2)×(K+2)} is the state transition matrix, and element A_{i,j} denotes the score of transferring from the i-th label to the j-th label. The annotation method may also be another method, such as BIO or BIOES; the present embodiment does not limit this. For example, the BIEO tag set annotates a question as follows: 钓(O) 鱼(O) 比(O) 赛(O) 在(O) 高(B-LOC) 楼(I-LOC) 门(E-LOC) 举(O) 行(O) ("The fishing contest is held at Gaoloumen"). The tag set is used in order to reduce noise as far as possible, so that the accuracy of the recognized and extracted entities is higher.
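The character-level BIEO tagging above can be sketched for a single entity span as follows; the function is a minimal illustration assuming at most one entity per sentence, and it uses plain "B"/"I"/"E"/"O" tags without the "-LOC" type suffix shown in the example:

```python
def bieo_tag(sentence, entity):
    """Tag each character of `sentence` with BIEO labels for one entity span.

    B marks the first character of the named entity, I the interior
    characters, E the last character, and O everything else. A
    single-character entity here gets only B (a simplifying assumption;
    tagging schemes differ on how to mark singletons).
    """
    tags = ["O"] * len(sentence)
    start = sentence.find(entity)
    if start >= 0:
        end = start + len(entity) - 1
        tags[start] = "B"
        for i in range(start + 1, end):
            tags[i] = "I"
        if end > start:
            tags[end] = "E"
    return tags
```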
Step S220: recognizing the named entity in the input information through the recurrent neural network model according to the annotation result.
Specifically, the annotation result of each character is obtained from the above step, and the vector information of each character is obtained from it. For example, the annotation result of each character is converted into a one-hot vector, the one-hot vector of each character is mapped to a low-dimensional dense word vector, and the word vectors of the characters in the sentence are then arranged in order to obtain the vector information of the entire sentence. The vector information of the entire sentence is input into the recurrent neural network model, which recognizes the named entity in the question. The recurrent neural network model can calculate the probability of the label corresponding to each character of the input information and obtain the optimal label sequence, which is the recognized named entity. The recurrent neural network model may be a bidirectional long short-term memory (BiLSTM) recurrent neural network model, a conditional random field model, or the like; the present embodiment places no limitation on it.
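The label-sequence score defined in step S210 (emission scores from the state feature matrix P plus transition scores from the matrix A) can be computed directly. This minimal sketch assumes labels are integer indices and omits the start/stop transitions implied by the (K+2)-sized transition matrix:

```python
def sequence_score(P, A, labels):
    """Score a label sequence under a linear-chain CRF.

    P[i][j] is the score of giving label i to the j-th character, and
    A[i][j] the score of transferring from label i to label j, following
    the state feature and transition matrices described in the text.
    Start/stop transitions are omitted for brevity.
    """
    emission = sum(P[y][j] for j, y in enumerate(labels))
    transition = sum(A[labels[j]][labels[j + 1]] for j in range(len(labels) - 1))
    return emission + transition
```

The optimal label sequence mentioned above is then the `labels` assignment maximizing this score, typically found with Viterbi decoding.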
In the third embodiment, referring to Fig. 3, in one embodiment the question answering method further comprises, after step S130:
Step S310: calculating the similarity between the named entity and the candidate entity of each entity pair, wherein the similarity is obtained from the Chinese-character similarity, the pinyin-character similarity, the word-vector similarity and the degree of attention the entity receives.
Specifically, the Chinese-character similarity, pinyin-character similarity, word-vector similarity and entity attention between the named entity and the candidate entity of an entity pair are calculated, and the individual similarities are combined to obtain the similarity corresponding to each entity pair. A higher similarity indicates that the named entity and the candidate entity are more alike. Methods for calculating similarity include the bag-of-words model, which vectorizes the named entity and candidate entity and converts the comparison into a distance computation in space, a smaller distance meaning higher similarity; and the cosine of the angle between the two vectors, whose size directly reflects the similarity, a smaller angle (larger cosine) meaning higher similarity. The present embodiment places no limitation on the similarity calculation method. By calculating similarities separately at different levels (Chinese characters, pinyin characters, word vectors and attention) and then combining them, the degree of similarity between the named entity and the candidate entity can be judged more accurately, which also helps to find the best candidate entity.
Step S320: ranking the entity pairs according to the similarities to obtain the ranking of each entity pair.
Specifically, the entity pairs are sorted by the size of the similarity calculated in the above step, giving each entity pair its ranking among all entity pairs. A higher similarity indicates a higher degree of match between the candidate entity and the named entity, and a lower similarity a lower degree of match.
Step S330: choosing the corresponding entity pairs according to the ranking.
Specifically, the entity pairs ranked before a preset rank are chosen. The preset rank can be set according to the actual situation. In the present embodiment the preset rank is 10, so the top-ten entity pairs are selected, whose candidate entities are also closer to the named entity in the input information.
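The combination and ranking steps above can be sketched as a weighted sum of the four signals followed by a top-k selection. The weights are purely illustrative assumptions (the patent does not specify how the signals are combined), and the individual similarities are taken as pre-computed values in [0, 1]:

```python
def composite_similarity(scores, weights=(0.4, 0.3, 0.2, 0.1)):
    """Combine the four signals used to compare a mention with a candidate.

    `scores` carries the pre-computed Chinese-character similarity, pinyin
    similarity, word-vector similarity and entity-attention score; the
    weights are illustrative, not values given by the patent.
    """
    return sum(w * s for w, s in zip(weights, scores))

def top_entity_pairs(scored_pairs, top_k=10):
    """Rank entity pairs by composite similarity and keep the top-k.

    `scored_pairs` is a list of (pair_id, scores) entries.
    """
    ranked = sorted(scored_pairs,
                    key=lambda p: composite_similarity(p[1]),
                    reverse=True)
    return [pair_id for pair_id, _ in ranked[:top_k]]
```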
In the fourth embodiment, referring to Fig. 4, in one embodiment step S150 comprises:
Step S410: calculating the feature vector corresponding to each triple.
Specifically, the named entity, candidate entity and candidate relation in each triple are converted into one-hot vectors, then mapped to low-dimensional dense word vectors, and finally the word vectors are arranged to obtain the feature vector of each triple.
Step S420: inputting each feature vector into the learning-to-rank model to obtain the ranking result corresponding to each candidate triple.
Specifically, each feature vector serves as the input of the learning-to-rank model, and after the model's computation the ranking result corresponding to each triple is output.
In the fifth embodiment, referring to Fig. 5, step S410 of the fourth embodiment comprises:
Step S510: calculating, from the triple, the first similarity feature between the named entity and the candidate entity.
Specifically, for a triple (named entity, candidate entity, candidate relation), the first similarity feature between the named entity and the candidate entity is calculated. The first similarity feature may be a similarity value.
Step S520: removing the named entity from the input information to obtain the remaining words, and calculating the second similarity feature between the remaining words and their synonyms and context vocabulary.
Specifically, the named entity is removed from the input information entered by the user, leaving some remaining words or characters; the similarity feature between these words and the adjacent phrases is calculated, the similarity feature between these words and their synonyms is also calculated, and the two parts are combined to obtain the second similarity feature.
Step S530: generating the high-dimensional vector from the input information, wherein the high-dimensional vector is generated according to whether preset vocabulary occurs in the input information.
Specifically, for the natural language question input by the user, the high-dimensional vector corresponding to the question is generated according to whether the words in the question appear in the preset vocabulary. Each position of the high-dimensional vector represents one word: if the word occurs in the natural language question, the value at that position is 1, otherwise 0. For example, if the user inputs "Which patients take aspirin?" and only the word "aspirin" is present in the preset vocabulary, the high-dimensional vector corresponding to the question is 1 at the position of "aspirin" and 0 elsewhere; the dimension of the high-dimensional vector can be set according to the actual situation.
Step S540: generating the feature vector from the first similarity feature, the second similarity feature and the high-dimensional vector.
Specifically, the first similarity feature, the second similarity feature and the high-dimensional vector are spliced together to obtain the final feature vector.
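Steps S510 to S540 can be sketched as follows, with the two similarity features taken as already-computed scalars and a toy vocabulary standing in for the preset word list; both function names are invented for illustration:

```python
def presence_vector(question, vocab):
    """One binary dimension per preset-vocabulary word, in fixed order:
    1 if the word occurs in the question, 0 otherwise."""
    return [1 if word in question else 0 for word in vocab]

def build_feature_vector(sim1, sim2, question, vocab):
    """Splice the first and second similarity features together with the
    high-dimensional presence vector to form the final feature vector."""
    return [sim1, sim2] + presence_vector(question, vocab)
```

The resulting vector is what step S420 feeds into the learning-to-rank model.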
In the 6th embodiment, referring to FIG. 6, providing a kind of question and answer of Chinese knowledge mapping based on study sequence
Device 600.The question and answer system 600 includes:
First obtains module 610, for obtaining the input information of user.
Wherein, input information can be natural language retrieval sentence (such as question sentence), as user inputs question sentence on search website:
" what medicine cough needs to eat " the present embodiment inputs the mode of information without limitation to acquisition.
Identification and link module 620, the name entity in the input information for identification, and by the name chain of entities
It is connected to candidate entity corresponding with the name entity in the Chinese knowledge mapping, entity pair is formed, wherein the entity
To including the name entity and the candidate entity.
Specifically, sequence labeling is performed on the input information using a tagging scheme and a recurrent neural network model, and named entity recognition is completed according to the sequence labeling result (the specific steps are described in detail in the second embodiment). For example, for "what medicine should I take for a cough", the question is first labeled using the BIO tagging scheme, vector information of the question is obtained from the labeling result, and the vector information is then used as the input of the recurrent neural network model, thereby identifying the named entity "cough". Next, the named entity is mapped to a globally unique identifier (GUID) in the Chinese knowledge graph, thereby linking the named entity to the corresponding candidate entity in the knowledge graph. Each candidate entity in the knowledge graph uniquely corresponds to one GUID, and different candidate entities in the Chinese knowledge graph are distinguished by their GUIDs.
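The BIO step can be illustrated by a small decoder that turns a tag sequence back into entity spans; the tags below are hand-written stand-ins for what the recurrent network would output.

```python
def decode_bio(tokens, tags):
    """Recover named entities from a BIO tag sequence (B=begin, I=inside, O=outside)."""
    entities, current = [], []
    for token, tag in zip(tokens, tags):
        if tag == "B":
            if current:
                entities.append("".join(current))
            current = [token]
        elif tag == "I" and current:
            current.append(token)
        else:
            if current:
                entities.append("".join(current))
            current = []
    if current:
        entities.append("".join(current))
    return entities

# "咳嗽需要吃什么药" character by character; a trained model would emit these tags
tokens = list("咳嗽需要吃什么药")
tags = ["B", "I", "O", "O", "O", "O", "O", "O"]
# decode_bio(tokens, tags) → ["咳嗽"]
```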
The Chinese knowledge graph is a technique for storing complex structured information. A Chinese knowledge graph stores a large amount of factual knowledge, internally storing entities and the relation information between them. Chinese knowledge graphs mostly store data in the RDF (Resource Description Framework) format: a fact is represented as an (S, P, O) triple of the form (subject, predicate, object), where S and O denote entities (O sometimes also denotes an attribute value) and P denotes the relation between S and O. Entity linking is an important method for resolving named-entity ambiguity: an ambiguous entity mention is linked into a given knowledge graph, thereby eliminating the entity's ambiguity.
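A minimal sketch of (S, P, O) triples and GUID-based entity linking, using made-up GUIDs, mentions and relation names:

```python
# Each fact is an (S, P, O) triple; candidate entities are identified by GUIDs.
triples = [
    ("guid-001", "indication", "cough"),
    ("guid-002", "indication", "fever"),
]

# A surface mention may map to several candidate GUIDs in the graph.
mention_index = {
    "阿司匹林": ["guid-002"],   # aspirin
    "止咳糖浆": ["guid-001"],   # cough syrup
}

def link_entity(mention):
    # Resolve an identified named entity to its candidate GUIDs ([] if unknown)
    return mention_index.get(mention, [])
```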
A matching module 630, configured to match, in the Chinese knowledge graph, candidate relations of the candidate entity through relation templates.
Specifically, the relation templates use natural language understanding technology to interpret the semantics expressed by the user's input information (such as a question) and match them against the relation P of the (S, P, O) triples in the Chinese knowledge graph, thereby determining the candidate relations in the Chinese knowledge graph that correspond to the semantics expressed by the question. The relation templates are obtained by extracting some triples from the Chinese knowledge graph, extracting relation information from these triples, and training relation templates corresponding to this relation information.
A forming module 640, configured to form candidate triples according to the entity pair and the candidate relations, where each candidate triple includes the named entity, the candidate entity and the candidate relation.
Specifically, each candidate triple is formed from the named entity identified in the above steps, the candidate entity corresponding to the named entity in the Chinese knowledge graph, and a candidate relation.
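Forming candidate triples amounts to combining the named entity with every candidate entity and candidate relation; a sketch with invented identifiers:

```python
def form_candidate_triples(name_entity, candidate_entities, candidate_relations):
    # One candidate triple per (candidate entity, candidate relation) combination
    return [(name_entity, entity, relation)
            for entity in candidate_entities
            for relation in candidate_relations]

candidates = form_candidate_triples(
    "咳嗽", ["guid-001", "guid-003"], ["indication", "contraindication"])
```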
A second acquisition module 650, configured to obtain a ranking result corresponding to each candidate triple based on a learning-to-rank model.
Specifically, each candidate triple is used as the input of the learning-to-rank model, which after a series of calculations outputs a ranking result corresponding to each candidate triple. The ranking results may follow the rule that a higher rank indicates a more accurate triple and a lower rank a less accurate one, or may take another form; this embodiment does not limit it.
The learning-to-rank model is computed using a learning-to-rank algorithm. Learning to rank (LTR) is a supervised learning (SL) ranking method. LTR methods generally fall into three classes: pointwise (single document), pairwise (document pair) and listwise (document list). In this embodiment, the pairwise approach is used.
In one embodiment, the learning-to-rank model is obtained by training on first samples and on second samples formed from the candidate triples, where a first sample is a triple constituted by the model answer to the input information. For example, given the model-answer triple (named entity, candidate entity, candidate relation) of a question, candidate entities are randomly sampled from the Chinese knowledge graph 10 times, candidate relations are acquired for these candidate entities, and finally 50 triples (named entity, candidate entity, candidate relation) are obtained as negative samples (N). The model-answer triple (named entity, candidate entity, candidate relation) is the positive sample (P). Combining the positive sample (P) with the negative samples (N) yields two kinds of samples: (P, N) samples and (N, P) samples. A (P, N) sample is labeled 1 and an (N, P) sample is labeled 0. The learning-to-rank model is obtained by training on these samples.
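The (P, N)/(N, P) sample construction can be sketched as follows; the gold triple and the negatives are invented for illustration.

```python
def make_pairwise_samples(positive, negatives):
    # Each negative yields a (P, N) pair labeled 1 and an (N, P) pair labeled 0
    samples = []
    for negative in negatives:
        samples.append(((positive, negative), 1))
        samples.append(((negative, positive), 0))
    return samples

gold = ("咳嗽", "guid-001", "indication")          # model-answer triple (P)
negatives = [("咳嗽", "guid-002", "indication"),   # randomly sampled triples (N)
             ("咳嗽", "guid-001", "contraindication")]
pairs = make_pairwise_samples(gold, negatives)
```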
A third acquisition module 660, configured to query the Chinese knowledge graph according to the ranking results to obtain the answer to the input information.
Specifically, according to the ranking result of each candidate triple, the candidate triples ranked before a preset value are selected and converted into query statements in the Chinese knowledge graph's query language; the query statements are executed on the Chinese knowledge graph, and the answer corresponding to the input information is returned after the query.
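Executing the query amounts to a pattern match over the graph's triples; here a plain in-memory match stands in for the graph's actual query language, with invented triples.

```python
triples = [
    ("止咳糖浆", "indication", "咳嗽"),   # cough syrup → indication → cough
    ("阿司匹林", "indication", "发热"),   # aspirin → indication → fever
]

def query_answer(subject=None, predicate=None, obj=None):
    """Return the missing element of each (S, P, O) triple matching the pattern."""
    results = []
    for s, p, o in triples:
        if (subject is None or s == subject) and \
           (predicate is None or p == predicate) and \
           (obj is None or o == obj):
            results.append(s if subject is None else o)
    return results

# "What medicine should I take for a cough": subjects whose indication is 咳嗽
answers = query_answer(predicate="indication", obj="咳嗽")
```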
In addition, referring to FIG. 7, the question answering device 600 for a Chinese knowledge graph based on learning to rank further includes an offline module 700, which prepares for the operation of the above question answering device.
The offline module 700 includes an entity referral-rate unit 710, a synonym collection unit 720, a context mining unit 730, a question template unit 740 and a learning-to-rank unit 750.
The entity referral-rate unit 710 is configured to score the candidate entities in the Chinese knowledge graph by the number of times they are mentioned. Specifically, referral-rate scoring is performed on the candidate entities in the Chinese knowledge graph, where the referral rate indicates the degree of user attention a candidate entity receives. This part can rely on a prepared referral-rate ranking (for example, a ranked list of the drugs patients care about most), or can crawl user questions on the network and compute the frequency with which each entity is mentioned by users.
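The crawl-and-count variant of referral-rate scoring can be sketched as a mention-frequency count over user questions; the questions and entity names below are made up.

```python
from collections import Counter

def referral_scores(crawled_questions, entity_names):
    # Score each entity by how often users mention it in crawled questions
    counts = Counter()
    for question in crawled_questions:
        for name in entity_names:
            if name in question:
                counts[name] += 1
    return counts

questions = ["阿司匹林怎么吃", "阿司匹林副作用", "止咳糖浆多少钱"]
scores = referral_scores(questions, ["阿司匹林", "止咳糖浆"])
```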
The synonym collection unit 720 is configured to collect the relation name of each candidate relation in the Chinese knowledge graph, where a relation name includes a title and the synonyms of that title.
Specifically, each candidate relation in the Chinese knowledge graph has a title. For example, the relation "drug xx treats disease xx" is titled "... indication ...", but owing to the diversity of natural Chinese, users may say "what does drug xx mainly treat", "what is the function of drug xx", and so on. The synonyms of the relation name (or relation predicate) therefore need to be collected. The synonym collection unit 720 collects the relation name of each candidate relation in the Chinese knowledge graph, the relation name including the title and its synonyms, thereby ensuring the accuracy of later question answering.
The context mining unit 730 is configured to find, based on a text mining method, the connection relation between two candidate entities in the Chinese knowledge graph. Specifically, the context mining unit is based entirely on distantly supervised text mining. There may be multiple connection relations between two candidate entities (triple facts of at most 2 hops are considered). In a text corpus of the professional domain, sentences in which the two candidate entities co-occur are found and a dependency parse tree analysis is performed on them. If the shortest path between the two entities on the dependency tree has length at most 4, the words on this shortest path (there may be several) serve as context words of the relation between the two candidate entities (provided a word is not already a synonym of the relation). General text of a professional domain (such as professional literature) is abundant, but question-answer corpora (especially ones suited to the current knowledge graph) may be relatively scarce. Text mining can supply the question answering device with a large amount of context information, making effective use of external resources.
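The shortest-path check at the heart of context mining can be sketched with a breadth-first search over dependency edges; the sentence, the edges and the entity names below are invented for illustration.

```python
from collections import deque

def shortest_path(edges, start, goal):
    """BFS shortest path over an undirected dependency tree; node list or None."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, []).append(a)
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

# Hypothetical dependency edges linking two entities through the verb 治疗 (treats)
edges = [("药物A", "治疗"), ("治疗", "疾病B")]
path = shortest_path(edges, "药物A", "疾病B")
# Keep the words on the path as context words only if path length <= 4
context_words = path[1:-1] if path and len(path) - 1 <= 4 else []
```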
The question template unit 740 is configured to divide questions into predefined question forms. Specifically, dividing questions according to predefined question forms makes searching the Chinese knowledge graph more convenient and efficient. This step may restrict the compared relation space to within two or three hops of the subject entity.
The learning-to-rank unit 750 is configured to obtain training data from questions. Specifically, the learning-to-rank unit obtains training data from questions based on the pairwise learning-to-rank algorithm. Although the available question-answer corpus may be small, the training data can be expanded by generating negative samples, yielding a question answering model with good performance.
Referring to FIG. 8, the synonym collection unit 720 includes a labeling subunit 721, a frequency recording subunit 722 and a manual review subunit 723.
The labeling subunit 721 is configured to label the entities in questions and the relations of candidate entities in the knowledge graph. The frequency recording subunit 722 is configured to remove the entity names, stop words and punctuation marks from a question to obtain the remaining words, score the remaining words using the term frequency-inverse document frequency method to obtain score values, and record the remaining words whose score values exceed a preset value.
Specifically, after removing the entity names, stop words and punctuation marks from the question, the frequency recording subunit 722 obtains the remaining words in the question, scores them using the term frequency-inverse document frequency method, and collects the higher-scoring words, for example the top 15 words by score value.
TF-IDF is the abbreviation of Term Frequency-Inverse Document Frequency. It consists of two parts, TF and IDF. TF denotes term frequency, that is, the count of each word's occurrences in the text made during the earlier vectorization and used as a text feature. IDF, the inverse document frequency, addresses words whose frequency is high but whose importance is low: it reflects the importance of a word and thereby corrects the feature value that would otherwise be expressed by term frequency alone.
In summary, IDF reflects how often a word occurs across all texts. If a word occurs in many texts, its IDF value should be low, for example the word "I". Conversely, if a word occurs in few texts, its IDF value should be high, for example professional nouns such as "machine learning". In the extreme case, if a word occurs in every text, its IDF value should be 0.
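The behavior just described follows from the standard formula idf(w) = log(N / df(w)): a word appearing in every document gives log(1) = 0 (other smoothed IDF variants exist; this plain form matches the text). A minimal sketch with made-up documents:

```python
import math

def idf(word, documents):
    # Inverse document frequency: log(N / df); 0 when the word is in every document
    df = sum(1 for doc in documents if word in doc)
    if df == 0:
        return 0.0
    return math.log(len(documents) / df)

# Three tokenized toy documents: "I" appears everywhere, "机器学习" only once
docs = [["我", "咳嗽"], ["我", "发热"], ["我", "机器学习"]]
```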
The manual review subunit 723 is configured to manually filter out unreasonable synonyms under each relation. Specifically, if a more accurate synonym set is desired, a degree of manual review can be done, that is, manually filtering out the unreasonable synonyms under each class of relation.
The present invention also provides a computer device capable of executing programs, such as a smartphone, tablet computer, notebook computer, desktop computer, rack server, blade server, tower server or cabinet server (including an independent server, or a server cluster composed of multiple servers). The computer device of this embodiment at least includes, but is not limited to, a memory and a processor that can be communicatively connected to each other via a system bus.
This embodiment also provides a computer-readable storage medium, such as a flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disc, server or app store, on which a computer program is stored that, when executed by a processor, implements the corresponding functions. The computer-readable storage medium of this embodiment is used to store the electronic apparatus 20 and, when executed by a processor, implements the question answering method of the present invention.
The serial numbers of the above embodiments of the invention are for description only and do not represent the relative merits of the embodiments.
Through the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus the necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, can be embodied in the form of a software product stored in a storage medium (such as ROM/RAM, magnetic disk or optical disc) and including several instructions for causing a terminal device (which may be a mobile phone, computer, server, air conditioner, network device or the like) to execute the methods described in the embodiments of the present invention.
The above are only preferred embodiments of the present invention and are not intended to limit the scope of the invention. Any equivalent structural or flow transformation made using the contents of the specification and drawings of the present invention, applied directly or indirectly in other related technical fields, is likewise included within the scope of patent protection of the present invention.
Claims (10)
1. A question answering method, characterized in that the question answering method comprises:
acquiring input information from a user;
identifying a named entity in the input information, and linking the named entity to a candidate entity corresponding to the named entity in a Chinese knowledge graph to form an entity pair, wherein the entity pair comprises the named entity and the candidate entity;
matching candidate relations of the candidate entity in the Chinese knowledge graph through relation templates;
forming candidate triples according to the entity pair and the candidate relations, wherein each candidate triple comprises the named entity, the candidate entity and a candidate relation;
obtaining a ranking result corresponding to each candidate triple based on a learning-to-rank model; and
querying the Chinese knowledge graph according to the ranking results to obtain the answer to the input information.
2. The question answering method according to claim 1, characterized in that identifying the named entity in the input information specifically comprises:
labeling the input information to obtain a labeling result; and identifying, according to the labeling result, the named entity in the input information through a recurrent neural network model.
3. The question answering method according to claim 1, characterized in that after the step of identifying the named entity in the input information and linking the named entity to the candidate entity corresponding to the named entity in the Chinese knowledge graph to form the entity pair, the question answering method further comprises:
calculating the similarity between the named entity and the candidate entity of each entity pair, wherein the similarity is obtained from the Chinese-character similarity, the pinyin-character similarity, the word-vector similarity and the degree of attention the entity receives;
ranking the entity pairs according to the similarities to obtain a ranking corresponding to the entity pairs; and
selecting the corresponding entity pair according to the ranking.
4. The question answering method according to claim 1, characterized in that a relation template comprises a first entity, a second entity and the relation between the first entity and the second entity.
5. The question answering method according to claim 1, characterized in that obtaining the ranking result corresponding to each candidate triple based on the learning-to-rank model specifically comprises:
calculating a feature vector corresponding to each candidate triple; and
inputting each feature vector into the learning-to-rank model to obtain the ranking result corresponding to each candidate triple.
6. The question answering method according to claim 5, characterized in that the step of calculating the feature vector of each candidate triple comprises:
calculating, according to the candidate triple, a first similarity feature between the named entity and the candidate entity;
removing the named entity from the input information to obtain remaining words, and calculating a second similarity feature between the remaining words and the synonyms and context vocabulary;
generating a high-dimensional vector according to the input information, wherein the high-dimensional vector is generated according to whether words of a default vocabulary exist in the input information; and
generating the feature vector according to the first similarity feature, the second similarity feature and the high-dimensional vector.
7. The question answering method according to claim 1, characterized in that the learning-to-rank model is obtained by training on first samples and on second samples formed from the candidate triples, wherein a first sample is a triple constituted by the model answer to the input information.
8. A question answering device, characterized in that the question answering device comprises:
a first acquisition module, configured to acquire input information from a user;
an identification and linking module, configured to identify a named entity in the input information and link the named entity to a candidate entity corresponding to the named entity in a Chinese knowledge graph to form an entity pair, wherein the entity pair comprises the named entity and the candidate entity;
a matching module, configured to match candidate relations of the candidate entity in the Chinese knowledge graph through relation templates;
a forming module, configured to form candidate triples according to the entity pair and the candidate relations, wherein each candidate triple comprises the named entity, the candidate entity and a candidate relation;
a second acquisition module, configured to obtain a ranking result corresponding to each candidate triple based on a learning-to-rank model; and
a third acquisition module, configured to query the Chinese knowledge graph according to the ranking results to obtain the answer to the input information.
9. A computer device, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the question answering method according to any one of claims 1 to 7.
10. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the question answering method according to any one of claims 1 to 7.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910593110.6A CN110502621B (en) | 2019-07-03 | 2019-07-03 | Question answering method, question answering device, computer equipment and storage medium |
PCT/CN2020/093141 WO2021000676A1 (en) | 2019-07-03 | 2020-05-29 | Q&a method, q&a device, computer equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910593110.6A CN110502621B (en) | 2019-07-03 | 2019-07-03 | Question answering method, question answering device, computer equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110502621A true CN110502621A (en) | 2019-11-26 |
CN110502621B CN110502621B (en) | 2023-06-13 |
Family
ID=68585335
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910593110.6A Active CN110502621B (en) | 2019-07-03 | 2019-07-03 | Question answering method, question answering device, computer equipment and storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110502621B (en) |
WO (1) | WO2021000676A1 (en) |
Cited By (24)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111259653A (en) * | 2020-01-15 | 2020-06-09 | 重庆邮电大学 | Knowledge graph question-answering method, system and terminal based on entity relationship disambiguation |
CN111339269A (en) * | 2020-02-20 | 2020-06-26 | 来康科技有限责任公司 | Knowledge graph question-answer training and application service system with automatically generated template |
CN111353298A (en) * | 2020-02-17 | 2020-06-30 | 杭州网易再顾科技有限公司 | Character sequence generation method, device, equipment and computer readable storage medium |
CN111368048A (en) * | 2020-02-26 | 2020-07-03 | 京东方科技集团股份有限公司 | Information acquisition method and device, electronic equipment and computer readable storage medium |
CN111753055A (en) * | 2020-06-28 | 2020-10-09 | 中国银行股份有限公司 | Automatic client question and answer prompting method and device |
CN111883230A (en) * | 2019-12-18 | 2020-11-03 | 深圳数字生命研究院 | Method and device for generating diet data, storage medium and electronic device |
CN111950303A (en) * | 2020-10-19 | 2020-11-17 | 平安科技(深圳)有限公司 | Medical text translation method, device and storage medium |
CN112100356A (en) * | 2020-09-17 | 2020-12-18 | 武汉纺织大学 | Knowledge base question-answer entity linking method and system based on similarity |
CN112182178A (en) * | 2020-09-25 | 2021-01-05 | 北京字节跳动网络技术有限公司 | Intelligent question answering method, device, equipment and readable storage medium |
WO2021000676A1 (en) * | 2019-07-03 | 2021-01-07 | 平安科技(深圳)有限公司 | Q&a method, q&a device, computer equipment and storage medium |
CN112579752A (en) * | 2020-12-10 | 2021-03-30 | 上海明略人工智能(集团)有限公司 | Entity relationship extraction method and device, storage medium and electronic equipment |
CN112733508A (en) * | 2021-03-30 | 2021-04-30 | 中国电子技术标准化研究院 | Standard text labeling and standard map building method and device |
CN112925887A (en) * | 2019-12-05 | 2021-06-08 | 北京四维图新科技股份有限公司 | Interaction method and device, electronic equipment, storage medium and text recognition method |
CN112948569A (en) * | 2019-12-10 | 2021-06-11 | 中国石油天然气股份有限公司 | Method and device for pushing scientific workflow diagram version based on active knowledge graph |
WO2021159632A1 (en) * | 2020-02-13 | 2021-08-19 | 平安科技(深圳)有限公司 | Intelligent questioning and answering method and apparatus, computer device, and computer storage medium |
CN113312854A (en) * | 2021-07-19 | 2021-08-27 | 成都数之联科技有限公司 | Type selection recommendation method and device, electronic equipment and readable storage medium |
CN113361269A (en) * | 2021-06-11 | 2021-09-07 | 南京信息工程大学 | Method for text emotion classification |
CN113420160A (en) * | 2021-06-24 | 2021-09-21 | 竹间智能科技(上海)有限公司 | Data processing method and device |
CN113495964A (en) * | 2021-04-28 | 2021-10-12 | 中国科学技术大学 | Method, device and equipment for screening triples and readable storage medium |
WO2022088671A1 (en) * | 2020-10-29 | 2022-05-05 | 平安科技(深圳)有限公司 | Automated question answering method and apparatus, device, and storage medium |
CN114444505A (en) * | 2020-10-30 | 2022-05-06 | 北京金山数字娱乐科技有限公司 | Text processing method and device |
CN114510558A (en) * | 2022-01-26 | 2022-05-17 | 北京博瑞彤芸科技股份有限公司 | Question-answering method and system based on traditional Chinese medicine knowledge graph |
CN114781387A (en) * | 2022-06-20 | 2022-07-22 | 北京惠每云科技有限公司 | Medical named entity recognition method and device, electronic equipment and storage medium |
CN116127053A (en) * | 2023-02-14 | 2023-05-16 | 北京百度网讯科技有限公司 | Entity word disambiguation, knowledge graph generation and knowledge recommendation methods and devices |
Families Citing this family (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11526688B2 (en) * | 2020-04-16 | 2022-12-13 | International Business Machines Corporation | Discovering ranked domain relevant terms using knowledge |
CN112818031B (en) * | 2021-01-26 | 2023-10-27 | 国网江苏省电力有限公司营销服务中心 | Potential high-energy-consumption enterprise mining method, system and storage medium based on NLP Chinese word segmentation technology |
CN112749268A (en) * | 2021-01-30 | 2021-05-04 | 云知声智能科技股份有限公司 | FAQ system sequencing method, device and system based on hybrid strategy |
CN113157935B (en) * | 2021-03-16 | 2024-02-27 | 中国科学技术大学 | Entity alignment based on relation context and graph neural network system and method |
CN115129828A (en) * | 2021-03-25 | 2022-09-30 | 科沃斯商用机器人有限公司 | Human-computer interaction method and device, intelligent robot and storage medium |
CN113127626B (en) * | 2021-04-22 | 2024-04-30 | 广联达科技股份有限公司 | Recommendation method, device, equipment and readable storage medium based on knowledge graph |
CN113128231A (en) * | 2021-04-25 | 2021-07-16 | 深圳市慧择时代科技有限公司 | Data quality inspection method and device, storage medium and electronic equipment |
CN115376504A (en) * | 2021-05-19 | 2022-11-22 | 北京小米移动软件有限公司 | Voice interaction method and device for intelligent product and readable storage medium |
CN113505586A (en) * | 2021-06-07 | 2021-10-15 | 中电鸿信信息科技有限公司 | Seat-assisted question-answering method and system integrating semantic classification and knowledge graph |
CN113515630B (en) * | 2021-06-10 | 2024-04-09 | 深圳数联天下智能科技有限公司 | Triplet generation and verification method and device, electronic equipment and storage medium |
CN113377923B (en) * | 2021-06-25 | 2024-01-09 | 北京百度网讯科技有限公司 | Semantic retrieval method, apparatus, device, storage medium and computer program product |
CN113449119A (en) * | 2021-06-30 | 2021-09-28 | 珠海金山办公软件有限公司 | Method and device for constructing knowledge graph, electronic equipment and storage medium |
CN113590783B (en) * | 2021-07-28 | 2023-10-03 | 复旦大学 | NLP natural language processing-based traditional Chinese medicine health preserving intelligent question-answering system |
CN113704494B (en) * | 2021-08-27 | 2024-04-05 | 北京百度网讯科技有限公司 | Entity retrieval method, device, equipment and storage medium based on knowledge graph |
CN113761167B (en) * | 2021-09-09 | 2023-10-20 | 上海明略人工智能(集团)有限公司 | Session information extraction method, system, electronic equipment and storage medium |
CN113946651B (en) * | 2021-09-27 | 2024-05-10 | 盛景智能科技(嘉兴)有限公司 | Maintenance knowledge recommendation method and device, electronic equipment, medium and product |
CN116089587B (en) * | 2023-02-20 | 2024-03-01 | 星环信息科技(上海)股份有限公司 | Answer generation method, device, equipment and storage medium |
CN116955592B (en) * | 2023-07-21 | 2024-02-09 | 广州拓尔思大数据有限公司 | Data processing method and system based on visual reasoning result |
CN117194633A (en) * | 2023-09-12 | 2023-12-08 | 河海大学 | Dam emergency response knowledge question-answering system based on multi-level multipath and implementation method |
CN118132729A (en) * | 2024-04-28 | 2024-06-04 | 支付宝(杭州)信息技术有限公司 | Answer generation method and device based on medical knowledge graph |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107748757A (en) * | 2017-09-21 | 2018-03-02 | 北京航空航天大学 | A kind of answering method of knowledge based collection of illustrative plates |
CN107832400A (en) * | 2017-11-01 | 2018-03-23 | 山东大学 | A kind of method that location-based LSTM and CNN conjunctive models carry out relation classification |
US9985982B1 (en) * | 2015-12-21 | 2018-05-29 | Cisco Technology, Inc. | Method and apparatus for aggregating indicators of compromise for use in network security |
CN108345702A (en) * | 2018-04-10 | 2018-07-31 | 北京百度网讯科技有限公司 | Entity recommends method and apparatus |
US20180341866A1 (en) * | 2017-05-26 | 2018-11-29 | Baidu Online Network Technology (Beijing) Co., Ltd. | Method of building a sorting model, and application method and apparatus based on the model |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108427707B (en) * | 2018-01-23 | 2021-05-04 | 深圳市阿西莫夫科技有限公司 | Man-machine question and answer method, device, computer equipment and storage medium |
CN109241294A (en) * | 2018-08-29 | 2019-01-18 | 国信优易数据有限公司 | A kind of entity link method and device |
CN110502621B (en) * | 2019-07-03 | 2023-06-13 | 平安科技(深圳)有限公司 | Question answering method, question answering device, computer equipment and storage medium |
Non-Patent Citations (1)
Title |
---|
官赛萍 (Guan Saiping): "面向知识图谱的知识推理研究进展" (Research progress on knowledge reasoning for knowledge graphs), 《软件学报》 (Journal of Software), vol. 29, no. 10, pages 2966 - 2994 *
Cited By (35)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2021000676A1 (en) * | 2019-07-03 | 2021-01-07 | 平安科技(深圳)有限公司 | Q&a method, q&a device, computer equipment and storage medium |
CN112925887A (en) * | 2019-12-05 | 2021-06-08 | 北京四维图新科技股份有限公司 | Interaction method and device, electronic equipment, storage medium and text recognition method |
CN112948569A (en) * | 2019-12-10 | 2021-06-11 | 中国石油天然气股份有限公司 | Method and device for pushing scientific workflow diagram version based on active knowledge graph |
CN111883230A (en) * | 2019-12-18 | 2020-11-03 | 深圳数字生命研究院 | Method and device for generating diet data, storage medium and electronic device |
CN111883230B (en) * | 2019-12-18 | 2024-05-07 | 深圳数字生命研究院 | Diet data generation method and device, storage medium and electronic device |
CN111259653A (en) * | 2020-01-15 | 2020-06-09 | 重庆邮电大学 | Knowledge graph question-answering method, system and terminal based on entity relationship disambiguation |
CN111259653B (en) * | 2020-01-15 | 2022-06-24 | 重庆邮电大学 | Knowledge graph question-answering method, system and terminal based on entity relationship disambiguation |
WO2021159632A1 (en) * | 2020-02-13 | 2021-08-19 | 平安科技(深圳)有限公司 | Intelligent questioning and answering method and apparatus, computer device, and computer storage medium |
CN111353298A (en) * | 2020-02-17 | 2020-06-30 | 杭州网易再顾科技有限公司 | Character sequence generation method, device, equipment and computer readable storage medium |
CN111339269A (en) * | 2020-02-20 | 2020-06-26 | 来康科技有限责任公司 | Knowledge graph question-answer training and application service system with automatically generated template |
CN111339269B (en) * | 2020-02-20 | 2023-09-26 | 来康科技有限责任公司 | Knowledge graph question-answering training and application service system capable of automatically generating templates |
CN111368048B (en) * | 2020-02-26 | 2024-06-25 | 京东方科技集团股份有限公司 | Information acquisition method, information acquisition device, electronic equipment and computer readable storage medium |
CN111368048A (en) * | 2020-02-26 | 2020-07-03 | 京东方科技集团股份有限公司 | Information acquisition method and device, electronic equipment and computer readable storage medium |
WO2021169718A1 (en) * | 2020-02-26 | 2021-09-02 | 京东方科技集团股份有限公司 | Information acquisition method and apparatus, electronic device, and computer-readable storage medium |
CN111753055A (en) * | 2020-06-28 | 2020-10-09 | 中国银行股份有限公司 | Automatic client question and answer prompting method and device |
CN111753055B (en) * | 2020-06-28 | 2024-01-26 | 中国银行股份有限公司 | Automatic prompt method and device for customer questions and answers |
CN112100356A (en) * | 2020-09-17 | 2020-12-18 | 武汉纺织大学 | Knowledge base question-answer entity linking method and system based on similarity |
CN112182178A (en) * | 2020-09-25 | 2021-01-05 | 北京字节跳动网络技术有限公司 | Intelligent question answering method, device, equipment and readable storage medium |
WO2021179693A1 (en) * | 2020-10-19 | 2021-09-16 | 平安科技(深圳)有限公司 | Medical text translation method and device, and storage medium |
CN111950303A (en) * | 2020-10-19 | 2020-11-17 | 平安科技(深圳)有限公司 | Medical text translation method, device and storage medium |
WO2022088671A1 (en) * | 2020-10-29 | 2022-05-05 | 平安科技(深圳)有限公司 | Automated question answering method and apparatus, device, and storage medium |
CN114444505A (en) * | 2020-10-30 | 2022-05-06 | 北京金山数字娱乐科技有限公司 | Text processing method and device |
CN112579752A (en) * | 2020-12-10 | 2021-03-30 | 上海明略人工智能(集团)有限公司 | Entity relationship extraction method and device, storage medium and electronic equipment |
CN112733508A (en) * | 2021-03-30 | 2021-04-30 | 中国电子技术标准化研究院 | Standard text labeling and standard map building method and device |
CN113495964A (en) * | 2021-04-28 | 2021-10-12 | 中国科学技术大学 | Method, device and equipment for screening triples and readable storage medium |
CN113495964B (en) * | 2021-04-28 | 2024-02-23 | 中国科学技术大学 | Triple screening method, device, equipment and readable storage medium |
CN113361269A (en) * | 2021-06-11 | 2021-09-07 | 南京信息工程大学 | Method for text emotion classification |
CN113361269B (en) * | 2021-06-11 | 2023-07-18 | 南京信息工程大学 | Text emotion classification method |
CN113420160A (en) * | 2021-06-24 | 2021-09-21 | 竹间智能科技(上海)有限公司 | Data processing method and device |
CN113312854B (en) * | 2021-07-19 | 2021-11-02 | 成都数之联科技有限公司 | Type selection recommendation method and device, electronic equipment and readable storage medium |
CN113312854A (en) * | 2021-07-19 | 2021-08-27 | 成都数之联科技有限公司 | Type selection recommendation method and device, electronic equipment and readable storage medium |
CN114510558A (en) * | 2022-01-26 | 2022-05-17 | 北京博瑞彤芸科技股份有限公司 | Question-answering method and system based on traditional Chinese medicine knowledge graph |
CN114781387A (en) * | 2022-06-20 | 2022-07-22 | 北京惠每云科技有限公司 | Medical named entity recognition method and device, electronic equipment and storage medium |
CN116127053B (en) * | 2023-02-14 | 2024-01-02 | 北京百度网讯科技有限公司 | Entity word disambiguation, knowledge graph generation and knowledge recommendation methods and devices |
CN116127053A (en) * | 2023-02-14 | 2023-05-16 | 北京百度网讯科技有限公司 | Entity word disambiguation, knowledge graph generation and knowledge recommendation methods and devices |
Also Published As
Publication number | Publication date |
---|---|
CN110502621B (en) | 2023-06-13 |
WO2021000676A1 (en) | 2021-01-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110502621A (en) | Answering method, question and answer system, computer equipment and storage medium | |
CN110399457B (en) | Intelligent question answering method and system | |
CN110633409B (en) | Automobile news event extraction method integrating rules and deep learning | |
Lerman et al. | Using the structure of web sites for automatic segmentation of tables | |
CN110990590A (en) | Dynamic financial knowledge map construction method based on reinforcement learning and transfer learning | |
CN107180045B (en) | Method for extracting geographic entity relation contained in internet text | |
CN110263180B (en) | Intention knowledge graph generation method, intention identification method and device | |
Kowalski | Information retrieval architecture and algorithms | |
CN110298033A (en) | Keyword corpus annotation, training and extraction tool | |
CN112819023B (en) | Sample set acquisition method, device, computer equipment and storage medium | |
CN105205699A (en) | User label and hotel label matching method and device based on hotel comments | |
CN110188197B (en) | Active learning method and device for labeling platform | |
CN109471949B (en) | Semi-automatic construction method of pet knowledge graph | |
CN104102721A (en) | Method and device for recommending information | |
CN111858940B (en) | Multi-head attention-based legal case similarity calculation method and system | |
CN105393265A (en) | Active featuring in computer-human interactive learning | |
CN106708929B (en) | Video program searching method and device | |
CN102663129A (en) | Medical field deep question and answer method and medical retrieval system | |
CN113535917A (en) | Intelligent question-answering method and system based on travel knowledge map | |
CN103886020B (en) | A fast search method for real estate information | |
CN102955848A (en) | Semantic-based three-dimensional model retrieval system and method | |
CN112559684A (en) | Keyword extraction and information retrieval method | |
CN112328800A (en) | System and method for automatically generating programming specification question answers | |
Noel et al. | Applicability of Latent Dirichlet Allocation to multi-disk search | |
CN116881436A (en) | Knowledge graph-based document retrieval method, system, terminal and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||