CN109033318A - Intelligent question answering method and device - Google Patents

Intelligent question answering method and device

Info

Publication number
CN109033318A
CN109033318A (application CN201810790249.5A)
Authority
CN
China
Prior art keywords
text
context
similarity
FAQ
word segmentation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810790249.5A
Other languages
Chinese (zh)
Other versions
CN109033318B (en)
Inventor
余军
罗长寿
郑亚明
魏清凤
王富荣
曹承忠
陆阳
郭强
于维水
王静宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Academy of Agriculture and Forestry Sciences
Original Assignee
Beijing Academy of Agriculture and Forestry Sciences
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Academy of Agriculture and Forestry Sciences
Priority to CN201810790249.5A
Publication of CN109033318A
Application granted
Publication of CN109033318B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/279 Recognition of textual entities
    • G06F 40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/30 Semantic analysis

Abstract

The present invention provides an intelligent question answering method and device. The text of a question to be answered is segmented into words, and the context used for semantic similarity judgement is determined from the segmentation result of the question to be answered. A certain number of frequently asked questions (FAQs) are collected according to that context. The texts of all the FAQs are likewise segmented, and a context graph is established from their segmentation results. For any one of the FAQs, the semantic similarity between the FAQ and the question to be answered is calculated from the segmentation result of the FAQ, the segmentation result of the question to be answered, and the context graph. The candidate answer corresponding to the FAQ with the highest similarity is taken as the answer corresponding to the question to be answered. Embodiments of the present invention can analyse the question to be answered more accurately and provide an answer.

Description

Intelligent question answering method and device
Technical field
The present invention relates to the field of natural language processing, and in particular to an intelligent question answering method and device.
Background art
In question answering systems, the answers pushed by general chat systems are highly random, whereas in professional application fields the reply content must be precise. Research that uses a computer to compare a "user question" semantically with the sentences already stored in a sentence library is known as sentence similarity research. As a critical problem in natural language processing, it has become a research hot spot and a difficult point. Besides mining the relations between the words of a sentence and computing sentence similarity from word overlap (for example methods relying on the WordNet framework, or on the HowNet framework together with a corpus), neural-network-based feature extraction has also begun to develop.
Experts and scholars have carried out extensive research on semantic similarity calculation methods. The first kind is statistical methods based on word co-occurrence, which mainly count word frequencies in sentences, such as the TF-IDF algorithm, the Jaccard similarity coefficient, and Metzler's improved overlap-based method. These methods are simple and efficient to implement, but completely ignore the lexical and semantic information of the sentence. The second kind is methods based on lexical and semantic information; they take semantic factors into account but are relatively complex to build, for example ontology-based semantic similarity calculation. The third kind, neural-network feature extraction trained on a corpus, has also developed rapidly in recent years, for example research on sentence semantic similarity calculation based on Word2vec; such approaches depend on the quality and quantity of the corpus, focus on feature extraction, neglect the understanding of sentence meaning, and cannot truly mine semantics. The fourth kind uses comprehensive fusion, such as sentence semantic similarity calculation based on multi-feature fusion. As research deepens, practical experience shows that once the various methods are detached from their application scenarios, the algorithms are either complex to implement or inefficient, are disturbed by many uncertain factors, and have certain operational limitations. The prior art therefore provides "a context-based word similarity measurement method": on the basis of existing similarity calculation methods, it introduces the context of a word and uses concepts from fuzzy mathematics to assess word-sense similarity. Borrowing the idea of the membership degree, it constructs the fuzzy importance of a word in its linguistic context, which improves word-level semantic similarity but remains insufficient for the overall sentence-level semantic similarity.
Summary of the invention
The present invention provides an intelligent question answering method and device that overcome, or at least partially solve, the above problems.
According to the first aspect of the invention, an intelligent question answering method is provided, comprising:
segmenting the text of a question to be answered into words, and determining, from the segmentation result of the question to be answered, the context used for semantic similarity judgement;
collecting a certain number of frequently asked questions (FAQs) according to the context;
segmenting the texts of all the FAQs into words, and establishing a context graph from the segmentation results of all the FAQs;
for any one of the FAQs, calculating the similarity between the FAQ and the question to be answered from the segmentation result of the FAQ, the segmentation result of the question to be answered, and the context graph;
taking the candidate answer corresponding to the FAQ with the highest similarity as the answer corresponding to the question to be answered;
wherein the context graph is an undirected graph representing the combination relations between the segmented words of all the FAQs.
According to the second aspect of the invention, an intelligent question answering device is provided, comprising:
a context obtaining module, configured to determine, from the question to be answered, the context used for semantic similarity judgement;
an FAQ obtaining module, configured to collect a certain number of FAQs according to the context;
a context graph obtaining module, configured to segment the texts of all the FAQs into words and establish a context graph from the segmentation results of all the FAQs, wherein the context graph is an undirected graph representing the combination relations between the segmented words of all the FAQs;
a similarity calculation module, configured to calculate, for any one of the FAQs, the similarity between the FAQ and the question to be answered from the segmentation result of the FAQ, the segmentation result of the question to be answered, and the context graph;
an answer matching module, configured to take the candidate answer corresponding to the FAQ with the highest similarity as the answer corresponding to the question to be answered.
According to the third aspect of the present invention, an electronic device is also provided, comprising:
at least one processor; and
at least one memory communicatively connected to the processor, wherein:
the memory stores program instructions executable by the processor, and the processor, by calling the program instructions, is able to perform the intelligent question answering method provided by any possible implementation among the various possible implementations of the first aspect.
According to the fourth aspect of the present invention, a non-transitory computer-readable storage medium is also provided, the non-transitory computer-readable storage medium storing computer instructions that cause the computer to perform the intelligent question answering method provided by any possible implementation among the various possible implementations of the first aspect.
In the intelligent question answering method and device proposed by the present invention, the question to be answered is first analysed to obtain the context in which the subsequent semantic similarity judgement is carried out; this context is clearly relevant to the question to be answered. A certain number of FAQs that are the same as or similar to this context are then obtained, so that the question to be answered and the FAQs are mapped into the same context for analysis, which improves the precision of the difference analysis between questions and therefore the accuracy of the semantic similarity calculation. A context graph is then constructed from the segmentation results of the obtained FAQs. In embodiments of the present invention the context graph is built from the above-mentioned certain number of FAQs and thus embodies the characteristics of big data; it is entirely different from existing contexts built only from the question to be answered and the FAQ being compared for semantic similarity. The context in the embodiments of the present invention is a macroscopic context. Embodiments of the present invention can analyse the question to be answered more accurately and provide an answer.
Brief description of the drawings
Fig. 1 is a schematic flow diagram of the intelligent question answering method according to an embodiment of the present invention;
Fig. 2 is a context graph according to an embodiment of the present invention;
Fig. 3 is a schematic flow diagram of calculating the similarity between an FAQ and the question to be answered from the segmentation result of the FAQ, the segmentation result of the question to be answered, and the context graph, according to an embodiment of the present invention;
Fig. 4 is a schematic flow diagram of obtaining, from the context graph, the similarity between any segmented word of the first text and any segmented word of the second text, so as to calculate the offset similarity of the first text and the second text, according to an embodiment of the present invention;
Fig. 5 is a functional block diagram of the intelligent question answering device according to an embodiment of the present invention;
Fig. 6 is a block diagram of the electronic device according to an embodiment of the present invention.
Specific embodiment
Specific embodiments of the present invention are described in further detail below with reference to the accompanying drawings and embodiments. The following embodiments are intended to illustrate the present invention, not to limit its scope.
The prior art includes the following kinds of semantic similarity calculation methods. The first kind is statistical methods based on word co-occurrence, which mainly count word frequencies in sentences, such as the TF-IDF algorithm, the Jaccard similarity coefficient, and Metzler's improved overlap-based method; these methods are simple and efficient to implement but completely ignore the lexical and semantic information of the sentence. The second kind is methods based on lexical and semantic information; they take semantic factors into account but are relatively complex to build, for example ontology-based semantic similarity calculation. The third kind is neural-network feature extraction trained on a corpus, which has developed rapidly in recent years, for example research on sentence semantic similarity calculation based on Word2vec; such approaches depend on the quality and quantity of the corpus, focus on feature extraction, neglect the understanding of sentence meaning, and cannot truly mine semantics. The fourth kind uses comprehensive fusion, such as sentence semantic similarity calculation based on multi-feature fusion. As research deepens, practical experience shows that once the various methods are detached from their application scenarios, the algorithms are either complex to implement or inefficient, are disturbed by many uncertain factors, and have certain operational limitations. The prior art therefore provides "a context-based word similarity measurement method", which, on the basis of existing similarity calculation methods, introduces the context of a word and uses concepts from fuzzy mathematics to assess word-sense similarity. It borrows the idea of the membership degree to construct the fuzzy importance of a word in its linguistic context, which improves word-level semantic similarity but remains insufficient for the overall sentence-level semantic similarity.
In order to overcome the above problems of the prior art, the embodiments of the present invention provide a semantic similarity calculation method whose inventive concept is as follows: the question to be answered is analysed to obtain the context in which the subsequent semantic similarity judgement is carried out, and this context is clearly relevant to the question to be answered. A certain number of FAQs that are the same as or similar to this context are then obtained, so that the question to be answered and the FAQs are mapped into the same context for analysis, which improves the precision of the difference analysis between questions and therefore the accuracy of the semantic similarity calculation. A context graph is then constructed from the segmentation results of the obtained FAQs. In the embodiments of the present invention the context graph is built from the above-mentioned certain number of FAQs and thus embodies the characteristics of big data; it is entirely different from existing contexts built only from the question to be answered and the FAQ being compared for semantic similarity. The context in the embodiments of the present invention is a macroscopic context. The embodiments of the present invention can analyse the question to be answered more accurately and provide an answer.
Fig. 1 shows a schematic flow diagram of the semantic similarity calculation method of an embodiment of the present invention. As shown in the figure, the method comprises:
S101: the text of the question to be answered is segmented into words, and the context used for semantic similarity judgement is determined from the segmentation result of the question to be answered.
Specifically, in an embodiment of the present invention the text of the question to be answered may be obtained as follows:
receiving text data of the question to be answered and taking it as the text of the question to be answered; or
receiving voice data of the question to be answered, performing speech recognition on the voice data to obtain text data, and taking the text data obtained by speech recognition as the text of the question to be answered.
It should be understood that the above ways of obtaining the text of the question to be answered are only several possible implementations and should not constitute any limitation on the embodiments of the present invention.
For convenience of describing the basic principle of the embodiments of the present invention, the text of the question to be answered is referred to as the first text p1. Using an existing word segmentation technique, p1 is segmented into S1, S2, ..., Sm, where m is the number of segmented words obtained from p1; the segmented words of the text and their number are thereby obtained.
In the embodiments of the present invention, the context used for semantic similarity judgement is determined from the segmentation result of the first text. The segmentation result can reveal information such as the technical field, setting, topic and tone of the text to be answered. For example, suppose the first text is: a method for raising tomato seedlings in a greenhouse. After segmentation, the segmentation result of the first text is: tomato, greenhouse, seedling raising, method. By analysing this segmentation result it can be determined that the context of the first text is agricultural cultivation, and more specifically the field of tomato cultivation.
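By way of illustration only (this sketch is not part of the original patent disclosure), the segmentation and a simple context lookup of step S101 could look as follows in Python; the jieba tokenizer, the stop-word list and the keyword-to-context table are assumptions made for the example.

```python
# A minimal sketch of S101, assuming the jieba tokenizer and a hand-made
# keyword-to-context table; the patent does not prescribe either.
import jieba

STOPWORDS = {"的", "在", "进行"}          # assumed stop-word list

CONTEXT_KEYWORDS = {                      # assumed feature-word to context mapping
    "番茄": "tomato cultivation",
    "育苗": "agricultural seedling raising",
}

def segment(text: str) -> list[str]:
    """Segment a question into words and drop stop words."""
    return [w for w in jieba.lcut(text) if w.strip() and w not in STOPWORDS]

def determine_context(words: list[str]) -> str:
    """Pick the context label voted for by the most feature words."""
    votes = [CONTEXT_KEYWORDS[w] for w in words if w in CONTEXT_KEYWORDS]
    return max(set(votes), key=votes.count) if votes else "general"

question = "番茄在温室进行育苗的方法"     # "a method for raising tomato seedlings in a greenhouse"
words = segment(question)                 # e.g. ['番茄', '温室', '育苗', '方法']
print(words, determine_context(words))
```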
S102: a certain number of FAQs are collected according to the context.
It should be noted that, after determining the context, the embodiment of the present invention collects a certain number of FAQs from a preset database. It can be understood that the database stores a massive number of FAQs together with an answer for each FAQ; these FAQs and answers can be collected from the Internet by a web crawler. In the above example, the question to be answered is determined to belong to the field of tomato cultivation, so a certain number of FAQs in the field of tomato cultivation can be retrieved from the database. It should be understood that the above process of collecting a certain number of FAQs according to the context is only a possible implementation and should not constitute any limitation on the present application.
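As an illustration of S102 only, the sketch below assumes that the preset database is a simple in-memory mapping from a context label to (question, answer) pairs; the patent merely states that a massive FAQ store is built, for example by web crawling, so the structure and the sample entries are placeholders.

```python
# A sketch of S102 under the assumption of an in-memory FAQ store keyed by
# context label; the question/answer strings are placeholder sample data.
FAQ_DB = {
    "tomato cultivation": [
        ("番茄在温室如何育苗", "保持20-25摄氏度并控制湿度。"),
        ("温室番茄如何防病",   "注意通风并合理轮作。"),
    ],
}

def collect_faqs(context: str, limit: int = 100) -> list[tuple[str, str]]:
    """Return up to `limit` (question, answer) pairs stored under `context`."""
    return FAQ_DB.get(context, [])[:limit]
```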
S103: the texts of all the FAQs are segmented into words, and a context graph is established from the segmentation results of all the FAQs; the context graph is an undirected graph representing the combination relations between the segmented words of all the FAQs.
Specifically, for the word segmentation of the FAQ texts, reference may be made to the description of the above embodiment, which is not repeated here.
It should be noted that the context graph of the embodiment of the present invention is a network graph: the vertices of the graph are segmented words, and an edge (or arc) connecting two words indicates that a combination relation exists between them (it may also carry a weight, which is not limited by the embodiments of the present invention). In the embodiments of the present invention the context graph is an undirected graph. If the context undirected graph G has n vertices (i.e. n different words), its adjacency matrix is an n*n square matrix defined as:
g[i][j] = 1 if the word pair formed by word i and word j belongs to E, and g[i][j] = 0 otherwise,
where g[i][j] is the value in the adjacency matrix for the word pair formed by word i and word j, and E denotes the set of word pairs between which a combination relation exists.
For example, suppose there are two FAQ texts, hereinafter referred to as sample text 1 and sample text 2. Sample text 1: a method for raising tomato seedlings in a greenhouse; sample text 2: a method for raising tomato seedlings. After word segmentation, stop-word removal and feature-word extraction, four words are obtained: tomato, greenhouse, seedling raising, method. For convenience of expression they are denoted here as V1 (tomato), V2 (greenhouse), V3 (seedling raising) and V4 (method). The edge relations (V1,V2), (V1,V3), (V2,V3) and (V3,V4) then generate the context graph shown in Fig. 2 (the embodiments of the present invention do not consider direction, so the graph is undirected), and the corresponding adjacency matrix is as follows:
V1: [0 1 1 0], V2: [1 0 1 0], V3: [1 1 0 1], V4: [0 0 1 0].
After the context graph has been converted into the adjacency matrix, the degree of any vertex (word), i.e. the number of words associated with it, can be obtained: it is simply the sum of the elements of the i-th row of the adjacency matrix for vertex Vi. In the example, the degree of V1 is 2, the degree of V2 is 2, the degree of V3 is 3, and the degree of V4 is 1. To find all adjacent points of vertex Vi, the i-th row of the adjacency matrix is scanned once; every element equal to 1 marks an adjacent point, and the word set formed by all adjacent points is the context word set of that word: the context word set of V1 contains V2 and V3; the context word set of V2 contains V1 and V3; the context word set of V3 contains V1, V2 and V4; and the context word set of V4 contains V3.
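The construction of the context graph can be sketched as follows (not part of the patent text). The rule that consecutive segmented words of one FAQ are joined by an edge is inferred from the Fig. 2 example, which it reproduces exactly, and is therefore an assumption; the variable names are likewise illustrative.

```python
# A sketch of S103, assuming that consecutive segmented words of one FAQ stand
# in a combination relation; this rule is inferred from the Fig. 2 example.
def build_context_graph(segmented_faqs):
    vocab = sorted({w for faq in segmented_faqs for w in faq})
    index = {w: i for i, w in enumerate(vocab)}
    n = len(vocab)
    g = [[0] * n for _ in range(n)]                 # n*n adjacency matrix g[i][j]
    for faq in segmented_faqs:
        for a, b in zip(faq, faq[1:]):              # consecutive words of the same FAQ
            if a != b:
                i, j = index[a], index[b]
                g[i][j] = g[j][i] = 1               # undirected edge
    return vocab, index, g

def degree(g, i):
    """Degree of vertex i: number of words adjacent to it."""
    return sum(g[i])

def context_word_set(vocab, g, i):
    """Context word set of vertex i: all adjacent points read off row i."""
    return {vocab[j] for j, e in enumerate(g[i]) if e == 1}

# Fig. 2 example: V1 tomato, V2 greenhouse, V3 seedling raising, V4 method.
faq_words = [["番茄", "温室", "育苗", "方法"],      # sample text 1
             ["番茄", "育苗", "方法"]]               # sample text 2
vocab, index, g = build_context_graph(faq_words)
print(degree(g, index["育苗"]))                      # 3, as in the example
print(context_word_set(vocab, g, index["番茄"]))     # {'温室', '育苗'}
```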
S104: for any one of the FAQs, the semantic similarity between the FAQ and the question to be answered is calculated from the segmentation result of the FAQ, the segmentation result of the question to be answered, and the context graph.
It should be noted that, when computing the semantic similarity, the embodiment of the present invention maps the segmentation results of the question to be answered and of the FAQ into the corresponding context for calculation, which improves the precision of the difference analysis between questions, so that the accuracy of the semantic similarity calculation is higher.
S105: the candidate answer corresponding to the FAQ with the highest similarity is taken as the answer corresponding to the question to be answered.
Specifically, after the semantic similarity judgement has been performed between the question to be answered and each of the above FAQs, the FAQ with the highest similarity is obtained, and the answer of that FAQ is taken as the answer to the question to be answered, thereby achieving the effect of intelligent question answering.
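As a small illustration (not from the patent), the answer matching of S105 reduces to an argmax over the similarity scores; here `similarity` stands for the semantic similarity calculation described in the following paragraphs and is an assumed callable.

```python
# A minimal sketch of S105: return the answer of the FAQ most similar to the
# question; `similarity(question, faq_question)` is an assumed scoring callable.
def best_answer(question: str, faqs: list[tuple[str, str]], similarity) -> str:
    best_question, best_ans = max(faqs, key=lambda qa: similarity(question, qa[0]))
    return best_ans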
Based on the content of the above embodiment, as an optional embodiment, the process of calculating the similarity between the FAQ and the question to be answered from the segmentation result of the FAQ, the segmentation result of the question to be answered and the context graph involves calculation at two levels: expression-layer similarity and semantic-layer similarity. The expression-layer similarity refers to the similarity of the surface form of two sentences, measured by the number of identical words or synonyms the two sentences contain and by their relative positions in the sentences. The semantic layer refers to what cannot be read directly from the literal wording, i.e. the semantics implied behind the sentence surface. There are many methods for computing surface-layer similarity, such as cosine similarity and generalized Jaccard similarity, while semantic-layer similarity can make use of semantic dictionaries and word-sense contexts.
Fig. 3 shows a schematic flow diagram of calculating the semantic similarity between the FAQ and the question to be answered from the segmentation result of the FAQ, the segmentation result of the question to be answered and the context graph, according to an embodiment of the present invention. As shown in Fig. 3, the process is specifically as follows.
S301: the cosine similarity of the first text and the second text is calculated according to the context graph, where the first text is the text of the question to be answered and the second text is the text of the FAQ, denoted p2. The segmented words of p2 are W1, W2, ..., Wn, where n is the number of segmented words obtained from p2.
It should be noted that the cosine similarity is the cosine of the angle between two vectors and is used to express the degree of difference between two sentences; it emphasizes the difference of the vectors in direction, i.e. in trend, rather than their absolute distance. Its formula is as follows:
Cosin(p1, p2) = Σi(xi × yi) / (sqrt(Σi xi^2) × sqrt(Σi yi^2)),
where xi denotes the TF-IDF weight of the i-th segmented word of the first text p1 and yi denotes the TF-IDF weight of the i-th segmented word of the second text p2. TF-IDF (term frequency - inverse document frequency) is a common weighting technique for information retrieval and data mining; TF stands for term frequency and IDF for inverse document frequency. Since the context graph is a word-set relation graph, after sentence segmentation the TF-IDF weights of the words in the sentence can be calculated well for word selection, and after word selection the similarity measure based on the space-vector cosine angle is not affected by the index scale; the cosine value falls within the interval [0,1], and the larger the value, the smaller the difference.
S302: the similarity between any segmented word of the first text and any segmented word of the second text is obtained according to the context graph, so as to calculate the offset similarity of the first text and the second text.
It should be noted that, when calculating the offset similarity, the embodiment of the present invention relies on the similarity of the segmented words of the two texts in the context graph. Since the context graph records the adjacent points (i.e. the context word set) of each segmented word, comparing how close the adjacent points of each pair of words are makes it possible to judge the degree of similarity of the two texts in terms of word position relations.
S303: the context word sets of all segmented words of the first text that are not present in the second text, and the context word sets of all segmented words of the second text that are not present in the first text, are obtained according to the context graph, so as to calculate the semantic-layer similarity of the first text and the second text.
It should be noted that the semantic-layer similarity embodies the implicit semantic relation between the two texts, i.e. information that cannot be read directly from the literal wording. The embodiment of the present invention obtains, through the context graph, the context word sets of the segmented words of each text that are not present in the other text, and calculates the semantic-layer similarity from these two context word sets.
S304: the semantic similarity of the first text and the second text is calculated from the cosine similarity, the offset similarity and the semantic-layer similarity of the first text and the second text.
In the method provided by the embodiment of the present invention, the cosine similarity, the offset similarity and the semantic-layer similarity of the first text and the second text are each obtained through the context graph; the similarity of the segmented words of the two texts in terms of the space-vector cosine angle and of their position relations, together with the semantic-layer similarity of the words the texts do not share, yields the final semantic similarity, which can improve the reliability and accuracy of the similarity judgement.
Based on the content of the above embodiment, as an optional embodiment, the TF-IDF weight of a segmented word of the first/second text is obtained as follows:
the adjacent points, on the context graph, of all segmented words of the first text form word set A, and the adjacent points, on the context graph, of all segmented words of the second text form word set B;
all segmented words of word set A and word set B form word set T, T = A ∪ B;
the adjacent points, on the context graph, of the segmented words of the first text that are not present in the second text form word set C;
the adjacent points, on the context graph, of the segmented words of the second text that are not present in the first text form word set D.
For a segmented word xi of the first/second text, the adjacent points of xi on the context graph form word set E; the degree of overlap between the words of word set E and the words of word set T is taken as the TF value of xi, lg(nT / nE∩T) is taken as the IDF value of xi, and the product of the TF value and the IDF value is taken as the TF-IDF weight of xi, where nT denotes the total number of words in word set T and nE∩T denotes the total number of words shared by word set E and word set T.
In the method of the embodiment of the present invention for obtaining the TF-IDF weights of the segmented words of the first/second text, the weights are obtained in combination with the combination relations of the words in the context graph, i.e. in combination with the context in which the texts are located, which can further improve the precision of the cosine similarity of the texts.
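The sketch below illustrates these weights and the cosine similarity Cosin(p1, p2); it is not part of the patent text. `neighbours(w)` stands for the adjacent-point lookup of the context-graph sketch above. The degree of overlap used for TF is taken here as |E ∩ T| / |T|, and the vectors are laid out over the union of the two texts' words; both choices are assumptions, since the description only names the quantities A, B, T = A ∪ B, E and lg(nT / nE∩T).

```python
# A sketch of the context-graph-based TF-IDF weights and of Cosin(p1, p2).
# The overlap ratio for TF and the vector layout are assumptions (see above).
import math

def tfidf_weight(word, neighbours, T):
    E = neighbours(word)                      # adjacent points of the word on the graph
    overlap = len(E & T)                      # nE∩T
    if overlap == 0 or not T:
        return 0.0
    tf = overlap / len(T)                     # assumed overlap ratio for TF
    idf = math.log10(len(T) / overlap)        # lg(nT / nE∩T)
    return tf * idf

def cosine_similarity(p1_words, p2_words, neighbours):
    A = set().union(*(neighbours(w) for w in p1_words))
    B = set().union(*(neighbours(w) for w in p2_words))
    T = A | B                                 # joint context word set
    vocab = sorted(set(p1_words) | set(p2_words))
    x = [tfidf_weight(w, neighbours, T) if w in p1_words else 0.0 for w in vocab]
    y = [tfidf_weight(w, neighbours, T) if w in p2_words else 0.0 for w in vocab]
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return num / den if den else 0.0
```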
Based on the content of the above embodiment, as an optional embodiment, the similarity between any segmented word of the first text and any segmented word of the second text is obtained according to the context graph so as to calculate the offset similarity of the first text and the second text, as shown in Fig. 4, specifically:
S401: from the segmentation result of the first text p1, the total number m of segmented words in the first text, the length len(P1) of the first text, and the relative position pos(Si) of the segmented word Si in the first text are obtained.
It should be noted that the relative position pos(Si) of the segmented word Si in the first text is calculated by the corresponding formula, where i denotes the position of the segmented word in the first text.
S402: from the segmentation result of the second text p2, the total number n of segmented words in the second text, the length len(P2) of the second text, and the relative position pos(Wj) of the segmented word Wj in the second text are obtained.
It should be noted that the relative position pos(Wj) of the segmented word Wj in the second text is calculated by the corresponding formula, where j denotes the position of the segmented word in the second text. It should also be noted that the embodiment of the present invention does not limit the order of steps S401 and S402.
S403: the similarity sim(Si, Wj) between the segmented word Si and the segmented word Wj is calculated according to the context graph.
It should be noted that, unlike the prior art, which calculates the similarity between words only from the words themselves, the embodiment of the present invention obtains the adjacent points of Si and Wj through the context graph and derives the similarity sim(Si, Wj) by comparing the adjacent-point data, i.e. it realizes word similarity judgement in the macroscopic context.
S404: the offset similarity Simp(p1, p2) of the first text p1 and the second text p2 is calculated according to the corresponding formula.
It should be noted that, as can be seen from the formula of the offset similarity, when the similarity of two segmented words is fixed, the more consistent the relative positions of the two words are, the larger the total offset similarity; and when the relative positions of two segmented words are fixed, the larger the similarity of the two words, the larger the total offset similarity.
In the method for calculating the offset similarity provided by the embodiment of the present invention, the offset similarity of the two texts is obtained from the context graph. Compared with the prior art, in which the offset similarity is obtained only from the contextual relations of the words themselves, the difference precision between texts is further improved, so that the accuracy of the semantic similarity calculation is higher.
Based on the content of the above embodiment, as an optional embodiment, the similarity sim(Si, Wj) between the segmented word Si and the segmented word Wj is calculated according to the context graph as follows:
the adjacent points π(Si) of Si and the degree len(π(Si)) are obtained on the context graph;
the adjacent points π(Wj) of Wj and the degree len(π(Wj)) are obtained on the context graph;
the similarity sim(Si, Wj) is calculated according to the corresponding formula,
where T(π(Si) ∩ π(Wj)) denotes the adjacent points shared by Si and Wj.
In the method for calculating the word similarity provided by the embodiment of the present invention, the similarity between the segmented words of the two texts is obtained from the context graph. Compared with the prior art, which considers only the contextual relations of the words themselves, the difference precision between texts is further improved, so that the accuracy of the semantic similarity calculation is higher.
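The exact formulas for sim(Si, Wj) and Simp(p1, p2) are not reproduced in this text, so the sketch below only offers plausible readings: a Dice-style overlap of the adjacent points for the word similarity, and a positional damping term for the offset similarity. Both use exactly the quantities the description names (π, len(π), pos, m, n) and follow the monotonic behaviour stated for S404, but they are assumptions, not the patented formulas.

```python
# Assumed forms of sim(Si, Wj) and Simp(p1, p2); the patent's own formulas are
# not reproduced here, so treat these only as illustrative stand-ins.
def word_similarity(si, wj, neighbours):
    pi_s, pi_w = neighbours(si), neighbours(wj)
    if not pi_s or not pi_w:
        return 1.0 if si == wj else 0.0
    shared = len(pi_s & pi_w)                      # T(π(Si) ∩ π(Wj))
    return 2.0 * shared / (len(pi_s) + len(pi_w))  # assumed Dice-style overlap

def offset_similarity(p1_words, p2_words, neighbours):
    m, n = len(p1_words), len(p2_words)
    total = 0.0
    for i, si in enumerate(p1_words, start=1):
        pos_s = i / m                              # assumed relative position
        for j, wj in enumerate(p2_words, start=1):
            pos_w = j / n
            total += word_similarity(si, wj, neighbours) * (1.0 - abs(pos_s - pos_w))
    return total / (m * n) if m and n else 0.0
```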
Based on the content of the above embodiment, as an optional embodiment, the context word sets of all segmented words of the first text that are not present in the second text, and the context word sets of all segmented words of the second text that are not present in the first text, are obtained according to the context graph so as to calculate the semantic-layer similarity of the first text and the second text, specifically:
the segmented words of the first text p1 that are not present in the second text p2 are obtained and form a first word set, and the context words of all words of the first word set are obtained on the context graph and form a first context word set π(P1); the segmented words of the second text p2 that are not present in the first text p1 are obtained and form a second word set, and the context words of all words of the second word set are obtained on the context graph and form a second context word set π(P2).
This is illustrated with the first text: a method for raising tomato seedlings in a greenhouse, and the second text: a method for raising seedlings of American tomatoes. The segmentation result of the first text is: tomato, greenhouse, seedling raising, method; the segmentation result of the second text is: America, tomato, seedling raising, method. The segmented word of the first text that is not present in the second text is "greenhouse", so the context word set of the word "greenhouse" is obtained from the context graph. Similarly, the segmented word of the second text that is not present in the first text is "America", and the context word set of the word "America" is obtained from the context graph.
The semantic-layer similarity SimL(p1, p2) of the first text and the second text is then calculated from π(P1) and π(P2) according to the corresponding formula,
where α = 1 when no antonyms are present in p1 and p2, and α = -1 when antonyms are present in p1 and p2; T(π(P1) ∩ π(P2)) denotes the context words shared by π(P1) and π(P2); and T(π(P1) ∪ π(P2)) denotes all context words of π(P1) and π(P2).
It should be noted that, when calculating the semantic-layer similarity with the above formula, the first text and the second text also need to be checked in advance for antonyms: when antonyms are present, the semantics of the two texts are, with high probability, opposite. The embodiment of the present invention calculates the semantic-layer similarity from the proportion of the context words shared by π(P1) and π(P2) among all context words of π(P1) and π(P2), together with whether antonyms are present. With the context graph, the method provided by the embodiment of the present invention analyses the semantic-layer similarity of the words that the two sentences do not share with higher precision.
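A sketch of this step is given below (not part of the patent text). The description states that the semantic-layer similarity is derived from the proportion of shared context words in π(P1) and π(P2) and from the antonym sign α, so the ratio form used here is an assumption that matches that description; the antonym lexicon is likewise a placeholder.

```python
# Assumed ratio form of SimL(p1, p2); the patent's own formula is not
# reproduced here. The ANTONYMS set is placeholder data.
ANTONYMS = {("增产", "减产"), ("早熟", "晚熟")}

def has_antonym_pair(p1_words, p2_words):
    return any((a, b) in ANTONYMS or (b, a) in ANTONYMS
               for a in p1_words for b in p2_words)

def semantic_layer_similarity(p1_words, p2_words, neighbours):
    only_p1 = [w for w in p1_words if w not in p2_words]     # first word set
    only_p2 = [w for w in p2_words if w not in p1_words]     # second word set
    ctx1 = set().union(*(neighbours(w) for w in only_p1))    # π(P1)
    ctx2 = set().union(*(neighbours(w) for w in only_p2))    # π(P2)
    union = ctx1 | ctx2
    alpha = -1.0 if has_antonym_pair(p1_words, p2_words) else 1.0
    return alpha * len(ctx1 & ctx2) / len(union) if union else 0.0
```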
Based on the content of the above embodiment, as an optional embodiment, the semantic similarity of the first text and the second text is calculated from the cosine similarity, the offset similarity and the semantic-layer similarity of the first text and the second text, specifically:
the expression-layer similarity Simb(p1, p2) of the first text p1 and the second text p2 is obtained according to the formula Simb(p1, p2) = Cosin(p1, p2) + α1 × Simp(p1, p2);
the semantic similarity m(p1, p2) of the first text p1 and the second text p2 is obtained according to the formula m(p1, p2) = Simb(p1, p2) + β1 × SimL(p1, p2);
where Cosin(p1, p2), Simp(p1, p2) and SimL(p1, p2) respectively denote the cosine similarity, the offset similarity and the semantic-layer similarity of the first text p1 and the second text p2, α1 denotes the impact factor of the offset similarity on the expression-layer similarity, and β1 denotes the impact factor of the semantic-layer similarity on the semantic similarity.
It should be noted that the embodiment of the present invention combines the cosine similarity and the offset similarity into the expression-layer similarity, and then combines the expression-layer similarity and the semantic-layer similarity to obtain the semantic similarity. The embodiment of the present invention fully considers the macroscopic context and mines the semantics to a deeper degree.
Based on the content of the above embodiment, as an optional embodiment, practical analysis shows that the value of α1 should ensure that its product with the offset similarity is smaller than the cosine similarity value, and that this product grows from 0 as the cosine similarity value grows and, after the cosine similarity reaches a certain value, decreases as the cosine similarity value continues to grow. Therefore, the impact factor α1 is obtained according to the formula α1 = (1 - Cosin(p1, p2)) × Cosin(p1, p2).
Likewise, practical analysis shows that the value of β1 should ensure that its product with the semantic-layer similarity is smaller than the expression-layer similarity value, and that this product grows from 0 as the expression-layer similarity value grows and, after the expression-layer similarity reaches a certain threshold, decreases as the expression-layer similarity value continues to grow. Therefore, the impact factor β1 is obtained according to the formula β1 = (1 - Simb(p1, p2)) × Simb(p1, p2).
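Since the combination formulas above are given explicitly, the final combination step can be transcribed directly; the sketch below (not part of the patent text) simply restates them, with illustrative names.

```python
# Combination step following the formulas given above:
# alpha1 = (1 - Cosin) * Cosin,  Simb = Cosin + alpha1 * Simp,
# beta1  = (1 - Simb) * Simb,    m    = Simb + beta1 * SimL.
def overall_semantic_similarity(cosine, offset, semantic_layer):
    alpha1 = (1.0 - cosine) * cosine                    # impact factor of the offset similarity
    expression_layer = cosine + alpha1 * offset         # Simb(p1, p2)
    beta1 = (1.0 - expression_layer) * expression_layer # impact factor of the semantic layer
    return expression_layer + beta1 * semantic_layer    # m(p1, p2)
```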
According to another aspect of the present invention, an embodiment of the present invention further provides an intelligent question answering device. Referring to Fig. 5, Fig. 5 shows a functional block diagram of the intelligent question answering device of an embodiment of the present invention. The device is used, as in the foregoing method embodiments, to match an answer according to the semantic similarity between the question to be answered and the FAQs; therefore, the descriptions and definitions given for the intelligent question answering method in the foregoing embodiments may be used to understand the execution modules of this embodiment of the present invention.
As shown in the figure, the intelligent question answering device includes:
a context obtaining module 501, configured to segment the text of the question to be answered into words and determine, from the segmentation result of the question to be answered, the context used for semantic similarity judgement;
an FAQ obtaining module 502, configured to collect a certain number of FAQs according to the context;
a context graph obtaining module 503, configured to segment the texts of all the FAQs into words and establish a context graph from the segmentation results of all the FAQs, wherein the context graph is an undirected graph representing the combination relations between the segmented words of all the FAQs;
a similarity calculation module 504, configured to calculate, for any one of the FAQs, the similarity between the FAQ and the question to be answered from the segmentation result of the FAQ, the segmentation result of the question to be answered and the context graph;
an answer matching module 505, configured to take the candidate answer corresponding to the FAQ with the highest similarity as the answer corresponding to the question to be answered.
In the intelligent question answering device of the embodiment of the present invention, the context obtaining module determines, from the question to be answered, the context used for semantic similarity judgement; the FAQ obtaining module collects a certain number of FAQs according to the context, so that the question to be answered and the FAQs are mapped into the same context for analysis, which improves the precision of the difference analysis between questions and therefore the accuracy of the semantic similarity calculation; the context graph obtaining module establishes a context graph from the segmentation results of the collected FAQs; and the similarity calculation module calculates the similarity between the question to be answered and each FAQ according to the context graph. In the embodiment of the present invention the context graph is constructed from the above-mentioned certain number of FAQs, embodying the characteristics of big data, and it is entirely different from existing contexts built only from the question to be answered and the FAQ being compared for semantic similarity; the context in the embodiment of the present invention is a macroscopic context. Finally, the answer matching module takes the candidate answer corresponding to the FAQ with the highest similarity as the answer corresponding to the question to be answered. The embodiment of the present invention can analyse the question to be answered more accurately and provide an answer.
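For illustration only, the sketch below ties the assumed helper sketches from the method description above into one device-like object that mirrors modules 501-505; it is not an implementation published with the patent, and every helper it calls is one of the hedged sketches given earlier.

```python
# An end-to-end sketch mirroring modules 501-505, built from the assumed
# helpers defined in the earlier sketches (segment, determine_context,
# collect_faqs, build_context_graph, context_word_set, cosine_similarity,
# offset_similarity, semantic_layer_similarity, overall_semantic_similarity).
class IntelligentQADevice:
    """Pipeline object combining the five modules of Fig. 5."""

    def answer(self, question_text: str) -> str:
        words = segment(question_text)                          # module 501
        context = determine_context(words)
        faqs = collect_faqs(context)                            # module 502
        if not faqs:
            return ""
        segmented_faqs = [segment(q) for q, _ in faqs]
        vocab, index, g = build_context_graph(segmented_faqs)   # module 503

        def neighbours(w):
            return context_word_set(vocab, g, index[w]) if w in index else set()

        def score(faq_words):                                   # module 504
            cos = cosine_similarity(words, faq_words, neighbours)
            off = offset_similarity(words, faq_words, neighbours)
            sem = semantic_layer_similarity(words, faq_words, neighbours)
            return overall_semantic_similarity(cos, off, sem)

        best_words, (_, best_ans) = max(zip(segmented_faqs, faqs),
                                        key=lambda item: score(item[0]))
        return best_ans                                         # module 505
```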
An embodiment of the present invention provides an electronic device. Referring to Fig. 6, the electronic device includes: a processor 601, a memory 602 and a bus 603;
wherein the processor 601 and the memory 602 communicate with each other through the bus 603, and the processor 601 is configured to call program instructions in the memory 602 to execute the semantic similarity calculation method provided by the above embodiments, for example comprising: segmenting the text of the question to be answered into words, and determining, from the segmentation result of the question to be answered, the context used for semantic similarity judgement; collecting a certain number of FAQs according to the context; segmenting the texts of all the FAQs into words and establishing a context graph from the segmentation results of all the FAQs; for any one of the FAQs, calculating the similarity between the FAQ and the question to be answered from the segmentation result of the FAQ, the segmentation result of the question to be answered and the context graph; and taking the candidate answer corresponding to the FAQ with the highest similarity as the answer corresponding to the question to be answered; wherein the context graph is an undirected graph representing the combination relations between the segmented words of all the FAQs.
An embodiment of the present invention provides a non-transitory computer-readable storage medium storing computer instructions that cause a computer to execute the semantic similarity calculation method provided by the above embodiments, for example: segmenting the text of the question to be answered into words, and determining, from the segmentation result of the question to be answered, the context used for semantic similarity judgement; collecting a certain number of FAQs according to the context; segmenting the texts of all the FAQs into words and establishing a context graph from the segmentation results of all the FAQs; for any one of the FAQs, calculating the similarity between the FAQ and the question to be answered from the segmentation result of the FAQ, the segmentation result of the question to be answered and the context graph; and taking the candidate answer corresponding to the FAQ with the highest similarity as the answer corresponding to the question to be answered; wherein the context graph is an undirected graph representing the combination relations between the segmented words of all the FAQs.
The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement it without creative labour.
Through the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by means of software plus a necessary general hardware platform, and certainly also by hardware. Based on this understanding, the above technical solutions, or the part thereof that contributes to the prior art, can essentially be embodied in the form of a software product; the computer software product may be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to execute the methods described in the embodiments or in certain parts of the embodiments.
Finally, it should be noted that the above embodiments are merely intended to illustrate the technical solutions of the present invention rather than to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. An intelligent question answering method, characterized by comprising:
segmenting the text of a question to be answered into words, and determining, from the segmentation result of the question to be answered, the context used for semantic similarity judgement;
collecting a certain number of FAQs according to the context;
segmenting the texts of all the FAQs into words, and establishing a context graph from the segmentation results of all the FAQs;
for any one of the FAQs, calculating the semantic similarity between the FAQ and the question to be answered from the segmentation result of the FAQ, the segmentation result of the question to be answered, and the context graph;
taking the candidate answer corresponding to the FAQ with the highest similarity as the answer corresponding to the question to be answered;
wherein the context graph is an undirected graph representing the combination relations between the segmented words of all the FAQs.
2. The intelligent question answering method according to claim 1, characterized in that the calculating of the semantic similarity between the FAQ and the question to be answered from the segmentation result of the FAQ, the segmentation result of the question to be answered and the context graph specifically comprises:
calculating the cosine similarity of a first text and a second text according to the context graph, wherein the first text is the text of the question to be answered and the second text is the text of the FAQ;
obtaining, according to the context graph, the similarity between any segmented word of the first text and any segmented word of the second text, so as to calculate the offset similarity of the first text and the second text;
obtaining, according to the context graph, the context word sets of all segmented words of the first text that are not present in the second text and the context word sets of all segmented words of the second text that are not present in the first text, so as to calculate the semantic-layer similarity of the first text and the second text;
calculating the semantic similarity of the first text and the second text from the cosine similarity, the offset similarity and the semantic-layer similarity of the first text and the second text.
3. The intelligent question answering method according to claim 2, characterized in that the obtaining, according to the context graph, of the similarity between any segmented word of the first text and any segmented word of the second text so as to calculate the offset similarity of the first text and the second text specifically comprises:
obtaining, from the segmentation result of the first text p1, the total number m of segmented words in the first text, the length len(P1) of the first text, and the relative position pos(Si) of the segmented word Si in the first text;
obtaining, from the segmentation result of the second text p2, the total number n of segmented words in the second text, the length len(P2) of the second text, and the relative position pos(Wj) of the segmented word Wj in the second text;
calculating the similarity sim(Si, Wj) between the segmented word Si and the segmented word Wj according to the context graph;
calculating the offset similarity Simp(p1, p2) of the first text p1 and the second text p2 according to the corresponding formula.
4. The intelligent question answering method according to claim 3, characterized in that the calculating of the similarity sim(Si, Wj) between the segmented word Si and the segmented word Wj according to the context graph specifically comprises:
obtaining, on the context graph, the adjacent points π(Si) of Si and the degree len(π(Si));
obtaining, on the context graph, the adjacent points π(Wj) of Wj and the degree len(π(Wj));
calculating the similarity sim(Si, Wj) according to the corresponding formula;
wherein T(π(Si) ∩ π(Wj)) denotes the adjacent points shared by Si and Wj.
5. The intelligent question answering method according to claim 2, characterized in that the obtaining, according to the context graph, of the context word sets of all segmented words of the first text that are not present in the second text and of the context word sets of all segmented words of the second text that are not present in the first text, so as to calculate the semantic-layer similarity of the first text and the second text, specifically comprises:
obtaining, in the first text p1, the segmented words that are not present in the second text p2 to form a first word set, and obtaining on the context graph the context words of all words of the first word set to form a first context word set π(P1);
obtaining, in the second text p2, the segmented words that are not present in the first text p1 to form a second word set, and obtaining on the context graph the context words of all words of the second word set to form a second context word set π(P2);
calculating the semantic-layer similarity SimL(p1, p2) of the first text and the second text according to the corresponding formula;
wherein α = 1 when no antonyms are present in p1 and p2, and α = -1 when antonyms are present in p1 and p2; T(π(P1) ∩ π(P2)) denotes the context words shared by π(P1) and π(P2); and T(π(P1) ∪ π(P2)) denotes all context words of π(P1) and π(P2).
6. The intelligent question answering method according to claim 2, characterized in that the calculating of the semantic similarity of the first text and the second text from the cosine similarity, the offset similarity and the semantic-layer similarity of the first text and the second text specifically comprises:
obtaining the expression-layer similarity Simb(p1, p2) of the first text p1 and the second text p2 according to the formula Simb(p1, p2) = Cosin(p1, p2) + α1 × Simp(p1, p2);
obtaining the semantic similarity m(p1, p2) of the first text p1 and the second text p2 according to the formula m(p1, p2) = Simb(p1, p2) + β1 × SimL(p1, p2);
wherein Cosin(p1, p2), Simp(p1, p2) and SimL(p1, p2) respectively denote the cosine similarity, the offset similarity and the semantic-layer similarity of the first text p1 and the second text p2, α1 denotes the impact factor of the offset similarity on the expression-layer similarity, and β1 denotes the impact factor of the semantic-layer similarity on the semantic similarity.
7. The intelligent question answering method according to claim 6, characterized in that:
the impact factor α1 is obtained according to the formula α1 = (1 - Cosin(p1, p2)) × Cosin(p1, p2);
the impact factor β1 is obtained according to the formula β1 = (1 - Simb(p1, p2)) × Simb(p1, p2).
8. An intelligent question answering device, characterized by comprising:
a context obtaining module, configured to segment the text of a question to be answered into words and determine, from the segmentation result of the question to be answered, the context used for semantic similarity judgement;
an FAQ obtaining module, configured to collect a certain number of FAQs according to the context;
a context graph obtaining module, configured to segment the texts of all the FAQs into words and establish a context graph from the segmentation results of all the FAQs, wherein the context graph is an undirected graph representing the combination relations between the segmented words of all the FAQs;
a similarity calculation module, configured to calculate, for any one of the FAQs, the similarity between the FAQ and the question to be answered from the segmentation result of the FAQ, the segmentation result of the question to be answered, and the context graph;
an answer matching module, configured to take the candidate answer corresponding to the FAQ with the highest similarity as the answer corresponding to the question to be answered.
9. An electronic device, characterized by comprising:
at least one processor; and
at least one memory communicatively connected to the processor, wherein:
the memory stores program instructions executable by the processor, and the processor, by calling the program instructions, is able to execute the method according to any one of claims 1 to 7.
10. A non-transitory computer-readable storage medium, characterized in that the non-transitory computer-readable storage medium stores computer instructions that cause a computer to execute the method according to any one of claims 1 to 7.
CN201810790249.5A 2018-07-18 2018-07-18 Intelligent question and answer method and device Active CN109033318B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810790249.5A CN109033318B (en) 2018-07-18 2018-07-18 Intelligent question and answer method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810790249.5A CN109033318B (en) 2018-07-18 2018-07-18 Intelligent question and answer method and device

Publications (2)

Publication Number Publication Date
CN109033318A (en) 2018-12-18
CN109033318B (en) 2020-11-27

Family

ID=64643328

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810790249.5A Active CN109033318B (en) 2018-07-18 2018-07-18 Intelligent question and answer method and device

Country Status (1)

Country Link
CN (1) CN109033318B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109840277A (en) * 2019-02-20 2019-06-04 西南科技大学 A kind of government affairs Intelligent Service answering method and system
CN109918494A (en) * 2019-03-22 2019-06-21 深圳狗尾草智能科技有限公司 Context relation based on figure replys generation method, computer and medium
CN110069613A (en) * 2019-04-28 2019-07-30 河北省讯飞人工智能研究院 A kind of reply acquisition methods and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101566998A (en) * 2009-05-26 2009-10-28 华中师范大学 Chinese question-answering system based on neural network
CN102346766A (en) * 2011-09-20 2012-02-08 北京邮电大学 Method and device for detecting network hot topics found based on maximal clique
CN103425635A (en) * 2012-05-15 2013-12-04 北京百度网讯科技有限公司 Method and device for recommending answers
US20140229161A1 (en) * 2013-02-12 2014-08-14 International Business Machines Corporation Latent semantic analysis for application in a question answer system
CN107766511A (en) * 2017-10-23 2018-03-06 深圳市前海众兴电子商务有限公司 Intelligent answer method, terminal and storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101566998A (en) * 2009-05-26 2009-10-28 华中师范大学 Chinese question-answering system based on neural network
CN102346766A (en) * 2011-09-20 2012-02-08 北京邮电大学 Method and device for detecting network hot topics found based on maximal clique
CN103425635A (en) * 2012-05-15 2013-12-04 北京百度网讯科技有限公司 Method and device for recommending answers
US20140229161A1 (en) * 2013-02-12 2014-08-14 International Business Machines Corporation Latent semantic analysis for application in a question answer system
CN107766511A (en) * 2017-10-23 2018-03-06 深圳市前海众兴电子商务有限公司 Intelligent answer method, terminal and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Wang Baoxun, "Research on semantic mining of question-answer pairs in online communities" (面向网络社区问答对的语义挖掘研究), China Doctoral Dissertations Full-text Database, Information Science and Technology. *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109840277A (en) * 2019-02-20 2019-06-04 西南科技大学 A kind of government affairs Intelligent Service answering method and system
CN109918494A (en) * 2019-03-22 2019-06-21 深圳狗尾草智能科技有限公司 Context relation based on figure replys generation method, computer and medium
WO2020191828A1 (en) * 2019-03-22 2020-10-01 深圳狗尾草智能科技有限公司 Graph-based context association reply generation method, computer and medium
CN109918494B (en) * 2019-03-22 2022-11-04 元来信息科技(湖州)有限公司 Context association reply generation method based on graph, computer and medium
CN110069613A (en) * 2019-04-28 2019-07-30 河北省讯飞人工智能研究院 A kind of reply acquisition methods and device

Also Published As

Publication number Publication date
CN109033318B (en) 2020-11-27

Similar Documents

Publication Publication Date Title
US10831769B2 (en) Search method and device for asking type query based on deep question and answer
Revathy et al. Sentiment analysis using machine learning: Progress in the machine intelligence for data science
US11113323B2 (en) Answer selection using a compare-aggregate model with language model and condensed similarity information from latent clustering
US10642975B2 (en) System and methods for automatically detecting deceptive content
CN109145085A (en) The calculation method and system of semantic similarity
CN108984530A (en) A kind of detection method and detection system of network sensitive content
CN112667794A (en) Intelligent question-answer matching method and system based on twin network BERT model
US10339214B2 (en) Structured term recognition
CN109145292B (en) Paraphrase text depth matching model construction method and paraphrase text depth matching method
CN112084307B (en) Data processing method, device, server and computer readable storage medium
CN109033318A (en) Intelligent answer method and device
CN107273348A (en) The topic and emotion associated detecting method and device of a kind of text
Van Atteveldt et al. Studying political decision making with automatic text analysis
CN110348539B (en) Short text relevance judging method
CN115775349A (en) False news detection method and device based on multi-mode fusion
DeLong et al. Offline dominance and zeugmatic similarity normings of variably ambiguous words assessed against a neural language model (BERT)
Nithya et al. Meta-heuristic searched-ensemble learning for fake news detection with optimal weighted feature selection approach
Qi et al. What is the limitation of multimodal llms? a deeper look into multimodal llms through prompt probing
Mansoorizadeh et al. Persian Plagiarism Detection Using Sentence Correlations.
CN113658690A (en) Intelligent medical guide method and device, storage medium and electronic equipment
CN117454217A (en) Deep ensemble learning-based depression emotion recognition method, device and system
Le et al. CiteOpinion: evidence-based evaluation tool for academic contributions of research papers based on citing sentences
Okpala et al. Perception Analysis: Pro-and Anti-Vaccine Classification with NLP and Machine Learning.
Otani et al. Large-scale acquisition of commonsense knowledge via a quiz game on a dialogue system
Ling Coronavirus public sentiment analysis with BERT deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant