CN109033318A - Intelligent question answering method and device - Google Patents

Intelligent question answering method and device

Info

Publication number
CN109033318A
CN109033318A (application CN201810790249.5A)
Authority
CN
China
Prior art keywords
text
context
similarity
FAQ
word segmentation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810790249.5A
Other languages
Chinese (zh)
Other versions
CN109033318B (en)
Inventor
余军
罗长寿
郑亚明
魏清凤
王富荣
曹承忠
陆阳
郭强
于维水
王静宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Academy of Agriculture and Forestry Sciences
Original Assignee
Beijing Academy of Agriculture and Forestry Sciences
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Academy of Agriculture and Forestry Sciences
Priority to CN201810790249.5A
Publication of CN109033318A
Application granted
Publication of CN109033318B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/279 Recognition of textual entities
    • G06F 40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/30 Semantic analysis

Abstract

The present invention provides an intelligent question answering method and device. The text of a question to be answered is segmented into words, and the context used for semantic similarity judgement is determined from the segmentation result of the question to be answered. A certain number of frequently asked questions (FAQs) are collected according to that context. The texts of all the FAQs are likewise segmented, and a context graph is established from their segmentation results. For any one of the FAQs, the semantic similarity between the FAQ and the question to be answered is calculated from the segmentation result of the FAQ, the segmentation result of the question to be answered, and the context graph. The candidate answer corresponding to the FAQ with the highest similarity is taken as the answer corresponding to the question to be answered. Embodiments of the present invention can analyse the question to be answered more accurately and provide an answer.

Description

Intelligent question answering method and device
Technical field
The present invention relates to the field of natural language processing, and in particular to an intelligent question answering method and device.
Background art
In question answering systems, the answers pushed by general chat systems are highly random, whereas in professional application fields the reply content must be precise. Research that uses a computer to compare a "user question" semantically with the sentences already stored in a sentence library is known as sentence similarity research. As a critical problem in natural language processing, it has become a research hot spot and a difficult point. Besides mining the relations between the words of a sentence and computing sentence similarity from word overlap (for example methods relying on the WordNet framework, or on the HowNet framework together with a corpus), neural-network-based feature extraction has also begun to develop.
Experts and scholars have carried out extensive research on semantic similarity calculation methods. The first kind is statistical methods based on word co-occurrence, which mainly count word frequencies in sentences, such as the TF-IDF algorithm, the Jaccard similarity coefficient, and Metzler's improved overlap-based method. These methods are simple and efficient to implement, but completely ignore the lexical and semantic information of the sentence. The second kind is methods based on lexical and semantic information; they take semantic factors into account but are relatively complex to build, for example ontology-based semantic similarity calculation. The third kind, neural-network feature extraction trained on a corpus, has also developed rapidly in recent years, for example research on sentence semantic similarity calculation based on Word2vec; such approaches depend on the quality and quantity of the corpus, focus on feature extraction, neglect the understanding of sentence meaning, and cannot truly mine semantics. The fourth kind uses comprehensive fusion, such as sentence semantic similarity calculation based on multi-feature fusion. As research deepens, practical experience shows that once the various methods are detached from their application scenarios, the algorithms are either complex to implement or inefficient, are disturbed by many uncertain factors, and have certain operational limitations. The prior art therefore provides "a context-based word similarity measurement method": on the basis of existing similarity calculation methods, it introduces the context of a word and uses concepts from fuzzy mathematics to assess word-sense similarity. Borrowing the idea of the membership degree, it constructs the fuzzy importance of a word in its linguistic context, which improves word-level semantic similarity but remains insufficient for the overall sentence-level semantic similarity.
Summary of the invention
The present invention provides an intelligent question answering method and device that overcome, or at least partially solve, the above problems.
According to the first aspect of the invention, an intelligent question answering method is provided, comprising:
segmenting the text of a question to be answered into words, and determining, from the segmentation result of the question to be answered, the context used for semantic similarity judgement;
collecting a certain number of frequently asked questions (FAQs) according to the context;
segmenting the texts of all the FAQs into words, and establishing a context graph from the segmentation results of all the FAQs;
for any one of the FAQs, calculating the similarity between the FAQ and the question to be answered from the segmentation result of the FAQ, the segmentation result of the question to be answered, and the context graph;
taking the candidate answer corresponding to the FAQ with the highest similarity as the answer corresponding to the question to be answered;
wherein the context graph is an undirected graph representing the combination relations between the segmented words of all the FAQs.
According to the second aspect of the invention, an intelligent question answering device is provided, comprising:
a context obtaining module, configured to determine, from the question to be answered, the context used for semantic similarity judgement;
an FAQ obtaining module, configured to collect a certain number of FAQs according to the context;
a context graph obtaining module, configured to segment the texts of all the FAQs into words and establish a context graph from the segmentation results of all the FAQs, wherein the context graph is an undirected graph representing the combination relations between the segmented words of all the FAQs;
a similarity calculation module, configured to calculate, for any one of the FAQs, the similarity between the FAQ and the question to be answered from the segmentation result of the FAQ, the segmentation result of the question to be answered, and the context graph;
an answer matching module, configured to take the candidate answer corresponding to the FAQ with the highest similarity as the answer corresponding to the question to be answered.
According to the third aspect of the present invention, an electronic device is also provided, comprising:
at least one processor; and
at least one memory communicatively connected to the processor, wherein:
the memory stores program instructions executable by the processor, and the processor, by calling the program instructions, is able to perform the intelligent question answering method provided by any possible implementation among the various possible implementations of the first aspect.
According to the fourth aspect of the present invention, a non-transitory computer-readable storage medium is also provided, the non-transitory computer-readable storage medium storing computer instructions that cause the computer to perform the intelligent question answering method provided by any possible implementation among the various possible implementations of the first aspect.
In the intelligent question answering method and device proposed by the present invention, the question to be answered is first analysed to obtain the context in which the subsequent semantic similarity judgement is carried out; this context is clearly relevant to the question to be answered. A certain number of FAQs that are the same as or similar to this context are then obtained, so that the question to be answered and the FAQs are mapped into the same context for analysis, which improves the precision of the difference analysis between questions and therefore the accuracy of the semantic similarity calculation. A context graph is then constructed from the segmentation results of the obtained FAQs. In embodiments of the present invention the context graph is built from the above-mentioned certain number of FAQs and thus embodies the characteristics of big data; it is entirely different from existing contexts built only from the question to be answered and the FAQ being compared for semantic similarity. The context in the embodiments of the present invention is a macroscopic context. Embodiments of the present invention can analyse the question to be answered more accurately and provide an answer.
Brief description of the drawings
Fig. 1 is a schematic flow diagram of the intelligent question answering method according to an embodiment of the present invention;
Fig. 2 is a context graph according to an embodiment of the present invention;
Fig. 3 is a schematic flow diagram of calculating the similarity between an FAQ and the question to be answered from the segmentation result of the FAQ, the segmentation result of the question to be answered, and the context graph, according to an embodiment of the present invention;
Fig. 4 is a schematic flow diagram of obtaining, from the context graph, the similarity between any segmented word of the first text and any segmented word of the second text, so as to calculate the offset similarity of the first text and the second text, according to an embodiment of the present invention;
Fig. 5 is a functional block diagram of the intelligent question answering device according to an embodiment of the present invention;
Fig. 6 is a block diagram of the electronic device according to an embodiment of the present invention.
Specific embodiment
Specific embodiments of the present invention are described in further detail below with reference to the accompanying drawings and embodiments. The following embodiments are intended to illustrate the present invention, not to limit its scope.
The prior art includes the following kinds of semantic similarity calculation methods. The first kind is statistical methods based on word co-occurrence, which mainly count word frequencies in sentences, such as the TF-IDF algorithm, the Jaccard similarity coefficient, and Metzler's improved overlap-based method; these methods are simple and efficient to implement but completely ignore the lexical and semantic information of the sentence. The second kind is methods based on lexical and semantic information; they take semantic factors into account but are relatively complex to build, for example ontology-based semantic similarity calculation. The third kind is neural-network feature extraction trained on a corpus, which has developed rapidly in recent years, for example research on sentence semantic similarity calculation based on Word2vec; such approaches depend on the quality and quantity of the corpus, focus on feature extraction, neglect the understanding of sentence meaning, and cannot truly mine semantics. The fourth kind uses comprehensive fusion, such as sentence semantic similarity calculation based on multi-feature fusion. As research deepens, practical experience shows that once the various methods are detached from their application scenarios, the algorithms are either complex to implement or inefficient, are disturbed by many uncertain factors, and have certain operational limitations. The prior art therefore provides "a context-based word similarity measurement method", which, on the basis of existing similarity calculation methods, introduces the context of a word and uses concepts from fuzzy mathematics to assess word-sense similarity. It borrows the idea of the membership degree to construct the fuzzy importance of a word in its linguistic context, which improves word-level semantic similarity but remains insufficient for the overall sentence-level semantic similarity.
In order to overcome the above problems of the prior art, the embodiments of the present invention provide a semantic similarity calculation method whose inventive concept is as follows: the question to be answered is analysed to obtain the context in which the subsequent semantic similarity judgement is carried out, and this context is clearly relevant to the question to be answered. A certain number of FAQs that are the same as or similar to this context are then obtained, so that the question to be answered and the FAQs are mapped into the same context for analysis, which improves the precision of the difference analysis between questions and therefore the accuracy of the semantic similarity calculation. A context graph is then constructed from the segmentation results of the obtained FAQs. In the embodiments of the present invention the context graph is built from the above-mentioned certain number of FAQs and thus embodies the characteristics of big data; it is entirely different from existing contexts built only from the question to be answered and the FAQ being compared for semantic similarity. The context in the embodiments of the present invention is a macroscopic context. The embodiments of the present invention can analyse the question to be answered more accurately and provide an answer.
Fig. 1 shows a schematic flow diagram of the semantic similarity calculation method of an embodiment of the present invention. As shown in the figure, the method comprises:
S101: the text of the question to be answered is segmented into words, and the context used for semantic similarity judgement is determined from the segmentation result of the question to be answered.
Specifically, in an embodiment of the present invention the text of the question to be answered may be obtained as follows:
receiving text data of the question to be answered and taking it as the text of the question to be answered; or
receiving voice data of the question to be answered, performing speech recognition on the voice data to obtain text data, and taking the text data obtained by speech recognition as the text of the question to be answered.
It should be understood that the above ways of obtaining the text of the question to be answered are only several possible implementations and should not constitute any limitation on the embodiments of the present invention.
For convenience of describing the basic principle of the embodiments of the present invention, the text of the question to be answered is referred to as the first text p1. Using an existing word segmentation technique, p1 is segmented into S1, S2, ..., Sm, where m is the number of segmented words obtained from p1; the segmented words of the text and their number are thereby obtained.
In the embodiments of the present invention, the context used for semantic similarity judgement is determined from the segmentation result of the first text. The segmentation result can reveal information such as the technical field, setting, topic and tone of the text to be answered. For example, suppose the first text is: a method for raising tomato seedlings in a greenhouse. After segmentation, the segmentation result of the first text is: tomato, greenhouse, seedling raising, method. By analysing this segmentation result it can be determined that the context of the first text is agricultural cultivation, and more specifically the field of tomato cultivation.
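By way of illustration only (this sketch is not part of the original patent disclosure), the segmentation and a simple context lookup of step S101 could look as follows in Python; the jieba tokenizer, the stop-word list and the keyword-to-context table are assumptions made for the example.

```python
# A minimal sketch of S101, assuming the jieba tokenizer and a hand-made
# keyword-to-context table; the patent does not prescribe either.
import jieba

STOPWORDS = {"的", "在", "进行"}          # assumed stop-word list

CONTEXT_KEYWORDS = {                      # assumed feature-word to context mapping
    "番茄": "tomato cultivation",
    "育苗": "agricultural seedling raising",
}

def segment(text: str) -> list[str]:
    """Segment a question into words and drop stop words."""
    return [w for w in jieba.lcut(text) if w.strip() and w not in STOPWORDS]

def determine_context(words: list[str]) -> str:
    """Pick the context label voted for by the most feature words."""
    votes = [CONTEXT_KEYWORDS[w] for w in words if w in CONTEXT_KEYWORDS]
    return max(set(votes), key=votes.count) if votes else "general"

question = "番茄在温室进行育苗的方法"     # "a method for raising tomato seedlings in a greenhouse"
words = segment(question)                 # e.g. ['番茄', '温室', '育苗', '方法']
print(words, determine_context(words))
```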
S102: a certain number of FAQs are collected according to the context.
It should be noted that, after determining the context, the embodiment of the present invention collects a certain number of FAQs from a preset database. It can be understood that the database stores a massive number of FAQs together with an answer for each FAQ; these FAQs and answers can be collected from the Internet by a web crawler. In the above example, the question to be answered is determined to belong to the field of tomato cultivation, so a certain number of FAQs in the field of tomato cultivation can be retrieved from the database. It should be understood that the above process of collecting a certain number of FAQs according to the context is only a possible implementation and should not constitute any limitation on the present application.
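As an illustration of S102 only, the sketch below assumes that the preset database is a simple in-memory mapping from a context label to (question, answer) pairs; the patent merely states that a massive FAQ store is built, for example by web crawling, so the structure and the sample entries are placeholders.

```python
# A sketch of S102 under the assumption of an in-memory FAQ store keyed by
# context label; the question/answer strings are placeholder sample data.
FAQ_DB = {
    "tomato cultivation": [
        ("番茄在温室如何育苗", "保持20-25摄氏度并控制湿度。"),
        ("温室番茄如何防病",   "注意通风并合理轮作。"),
    ],
}

def collect_faqs(context: str, limit: int = 100) -> list[tuple[str, str]]:
    """Return up to `limit` (question, answer) pairs stored under `context`."""
    return FAQ_DB.get(context, [])[:limit]
```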
S103: the texts of all the FAQs are segmented into words, and a context graph is established from the segmentation results of all the FAQs; the context graph is an undirected graph representing the combination relations between the segmented words of all the FAQs.
Specifically, for the word segmentation of the FAQ texts, reference may be made to the description of the above embodiment, which is not repeated here.
It should be noted that the context graph of the embodiment of the present invention is a network graph: the vertices of the graph are segmented words, and an edge (or arc) connecting two words indicates that a combination relation exists between them (it may also carry a weight, which is not limited by the embodiments of the present invention). In the embodiments of the present invention the context graph is an undirected graph. If the context undirected graph G has n vertices (i.e. n different words), its adjacency matrix is an n*n square matrix defined as:
g[i][j] = 1 if the word pair formed by word i and word j belongs to E, and g[i][j] = 0 otherwise,
where g[i][j] is the value in the adjacency matrix for the word pair formed by word i and word j, and E denotes the set of word pairs between which a combination relation exists.
For example, suppose there are two FAQ texts, hereinafter referred to as sample text 1 and sample text 2. Sample text 1: a method for raising tomato seedlings in a greenhouse; sample text 2: a method for raising tomato seedlings. After word segmentation, stop-word removal and feature-word extraction, four words are obtained: tomato, greenhouse, seedling raising, method. For convenience of expression they are denoted here as V1 (tomato), V2 (greenhouse), V3 (seedling raising) and V4 (method). The edge relations (V1,V2), (V1,V3), (V2,V3) and (V3,V4) then generate the context graph shown in Fig. 2 (the embodiments of the present invention do not consider direction, so the graph is undirected), and the corresponding adjacency matrix is as follows:
V1: [0 1 1 0], V2: [1 0 1 0], V3: [1 1 0 1], V4: [0 0 1 0].
After the context graph has been converted into the adjacency matrix, the degree of any vertex (word), i.e. the number of words associated with it, can be obtained: it is simply the sum of the elements of the i-th row of the adjacency matrix for vertex Vi. In the example, the degree of V1 is 2, the degree of V2 is 2, the degree of V3 is 3, and the degree of V4 is 1. To find all adjacent points of vertex Vi, the i-th row of the adjacency matrix is scanned once; every element equal to 1 marks an adjacent point, and the word set formed by all adjacent points is the context word set of that word: the context word set of V1 contains V2 and V3; the context word set of V2 contains V1 and V3; the context word set of V3 contains V1, V2 and V4; and the context word set of V4 contains V3.
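The construction of the context graph can be sketched as follows (not part of the patent text). The rule that consecutive segmented words of one FAQ are joined by an edge is inferred from the Fig. 2 example, which it reproduces exactly, and is therefore an assumption; the variable names are likewise illustrative.

```python
# A sketch of S103, assuming that consecutive segmented words of one FAQ stand
# in a combination relation; this rule is inferred from the Fig. 2 example.
def build_context_graph(segmented_faqs):
    vocab = sorted({w for faq in segmented_faqs for w in faq})
    index = {w: i for i, w in enumerate(vocab)}
    n = len(vocab)
    g = [[0] * n for _ in range(n)]                 # n*n adjacency matrix g[i][j]
    for faq in segmented_faqs:
        for a, b in zip(faq, faq[1:]):              # consecutive words of the same FAQ
            if a != b:
                i, j = index[a], index[b]
                g[i][j] = g[j][i] = 1               # undirected edge
    return vocab, index, g

def degree(g, i):
    """Degree of vertex i: number of words adjacent to it."""
    return sum(g[i])

def context_word_set(vocab, g, i):
    """Context word set of vertex i: all adjacent points read off row i."""
    return {vocab[j] for j, e in enumerate(g[i]) if e == 1}

# Fig. 2 example: V1 tomato, V2 greenhouse, V3 seedling raising, V4 method.
faq_words = [["番茄", "温室", "育苗", "方法"],      # sample text 1
             ["番茄", "育苗", "方法"]]               # sample text 2
vocab, index, g = build_context_graph(faq_words)
print(degree(g, index["育苗"]))                      # 3, as in the example
print(context_word_set(vocab, g, index["番茄"]))     # {'温室', '育苗'}
```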
S104: for any one of the FAQs, the semantic similarity between the FAQ and the question to be answered is calculated from the segmentation result of the FAQ, the segmentation result of the question to be answered, and the context graph.
It should be noted that, when computing the semantic similarity, the embodiment of the present invention maps the segmentation results of the question to be answered and of the FAQ into the corresponding context for calculation, which improves the precision of the difference analysis between questions, so that the accuracy of the semantic similarity calculation is higher.
S105: the candidate answer corresponding to the FAQ with the highest similarity is taken as the answer corresponding to the question to be answered.
Specifically, after the semantic similarity judgement has been performed between the question to be answered and each of the above FAQs, the FAQ with the highest similarity is obtained, and the answer of that FAQ is taken as the answer to the question to be answered, thereby achieving the effect of intelligent question answering.
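As a small illustration (not from the patent), the answer matching of S105 reduces to an argmax over the similarity scores; here `similarity` stands for the semantic similarity calculation described in the following paragraphs and is an assumed callable.

```python
# A minimal sketch of S105: return the answer of the FAQ most similar to the
# question; `similarity(question, faq_question)` is an assumed scoring callable.
def best_answer(question: str, faqs: list[tuple[str, str]], similarity) -> str:
    best_question, best_ans = max(faqs, key=lambda qa: similarity(question, qa[0]))
    return best_ans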
Based on the content of the above embodiment, as an optional embodiment, the process of calculating the similarity between the FAQ and the question to be answered from the segmentation result of the FAQ, the segmentation result of the question to be answered and the context graph involves calculation at two levels: expression-layer similarity and semantic-layer similarity. The expression-layer similarity refers to the similarity of the surface form of two sentences, measured by the number of identical words or synonyms the two sentences contain and by their relative positions in the sentences. The semantic layer refers to what cannot be read directly from the literal wording, i.e. the semantics implied behind the sentence surface. There are many methods for computing surface-layer similarity, such as cosine similarity and generalized Jaccard similarity, while semantic-layer similarity can make use of semantic dictionaries and word-sense contexts.
Fig. 3 shows a schematic flow diagram of calculating the semantic similarity between the FAQ and the question to be answered from the segmentation result of the FAQ, the segmentation result of the question to be answered and the context graph, according to an embodiment of the present invention. As shown in Fig. 3, the process is specifically as follows.
S301: the cosine similarity of the first text and the second text is calculated according to the context graph, where the first text is the text of the question to be answered and the second text is the text of the FAQ, denoted p2. The segmented words of p2 are W1, W2, ..., Wn, where n is the number of segmented words obtained from p2.
It should be noted that the cosine similarity is the cosine of the angle between two vectors and is used to express the degree of difference between two sentences; it emphasizes the difference of the vectors in direction, i.e. in trend, rather than their absolute distance. Its formula is as follows:
Cosin(p1, p2) = Σi(xi × yi) / (sqrt(Σi xi^2) × sqrt(Σi yi^2)),
where xi denotes the TF-IDF weight of the i-th segmented word of the first text p1 and yi denotes the TF-IDF weight of the i-th segmented word of the second text p2. TF-IDF (term frequency - inverse document frequency) is a common weighting technique for information retrieval and data mining; TF stands for term frequency and IDF for inverse document frequency. Since the context graph is a word-set relation graph, after sentence segmentation the TF-IDF weights of the words in the sentence can be calculated well for word selection, and after word selection the similarity measure based on the space-vector cosine angle is not affected by the index scale; the cosine value falls within the interval [0,1], and the larger the value, the smaller the difference.
S302: the similarity between any segmented word of the first text and any segmented word of the second text is obtained according to the context graph, so as to calculate the offset similarity of the first text and the second text.
It should be noted that, when calculating the offset similarity, the embodiment of the present invention relies on the similarity of the segmented words of the two texts in the context graph. Since the context graph records the adjacent points (i.e. the context word set) of each segmented word, comparing how close the adjacent points of each pair of words are makes it possible to judge the degree of similarity of the two texts in terms of word position relations.
S303: the context word sets of all segmented words of the first text that are not present in the second text, and the context word sets of all segmented words of the second text that are not present in the first text, are obtained according to the context graph, so as to calculate the semantic-layer similarity of the first text and the second text.
It should be noted that the semantic-layer similarity embodies the implicit semantic relation between the two texts, i.e. information that cannot be read directly from the literal wording. The embodiment of the present invention obtains, through the context graph, the context word sets of the segmented words of each text that are not present in the other text, and calculates the semantic-layer similarity from these two context word sets.
S304: the semantic similarity of the first text and the second text is calculated from the cosine similarity, the offset similarity and the semantic-layer similarity of the first text and the second text.
In the method provided by the embodiment of the present invention, the cosine similarity, the offset similarity and the semantic-layer similarity of the first text and the second text are each obtained through the context graph; the similarity of the segmented words of the two texts in terms of the space-vector cosine angle and of their position relations, together with the semantic-layer similarity of the words the texts do not share, yields the final semantic similarity, which can improve the reliability and accuracy of the similarity judgement.
Based on the content of the above embodiment, as an optional embodiment, the TF-IDF weight of a segmented word of the first/second text is obtained as follows:
the adjacent points, on the context graph, of all segmented words of the first text form word set A, and the adjacent points, on the context graph, of all segmented words of the second text form word set B;
all segmented words of word set A and word set B form word set T, T = A ∪ B;
the adjacent points, on the context graph, of the segmented words of the first text that are not present in the second text form word set C;
the adjacent points, on the context graph, of the segmented words of the second text that are not present in the first text form word set D.
For a segmented word xi of the first/second text, the adjacent points of xi on the context graph form word set E; the degree of overlap between the words of word set E and the words of word set T is taken as the TF value of xi, lg(nT / nE∩T) is taken as the IDF value of xi, and the product of the TF value and the IDF value is taken as the TF-IDF weight of xi, where nT denotes the total number of words in word set T and nE∩T denotes the total number of words shared by word set E and word set T.
In the method of the embodiment of the present invention for obtaining the TF-IDF weights of the segmented words of the first/second text, the weights are obtained in combination with the combination relations of the words in the context graph, i.e. in combination with the context in which the texts are located, which can further improve the precision of the cosine similarity of the texts.
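The sketch below illustrates these weights and the cosine similarity Cosin(p1, p2); it is not part of the patent text. `neighbours(w)` stands for the adjacent-point lookup of the context-graph sketch above. The degree of overlap used for TF is taken here as |E ∩ T| / |T|, and the vectors are laid out over the union of the two texts' words; both choices are assumptions, since the description only names the quantities A, B, T = A ∪ B, E and lg(nT / nE∩T).

```python
# A sketch of the context-graph-based TF-IDF weights and of Cosin(p1, p2).
# The overlap ratio for TF and the vector layout are assumptions (see above).
import math

def tfidf_weight(word, neighbours, T):
    E = neighbours(word)                      # adjacent points of the word on the graph
    overlap = len(E & T)                      # nE∩T
    if overlap == 0 or not T:
        return 0.0
    tf = overlap / len(T)                     # assumed overlap ratio for TF
    idf = math.log10(len(T) / overlap)        # lg(nT / nE∩T)
    return tf * idf

def cosine_similarity(p1_words, p2_words, neighbours):
    A = set().union(*(neighbours(w) for w in p1_words))
    B = set().union(*(neighbours(w) for w in p2_words))
    T = A | B                                 # joint context word set
    vocab = sorted(set(p1_words) | set(p2_words))
    x = [tfidf_weight(w, neighbours, T) if w in p1_words else 0.0 for w in vocab]
    y = [tfidf_weight(w, neighbours, T) if w in p2_words else 0.0 for w in vocab]
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return num / den if den else 0.0
```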
Based on the content of the above embodiment, as an optional embodiment, the similarity between any segmented word of the first text and any segmented word of the second text is obtained according to the context graph so as to calculate the offset similarity of the first text and the second text, as shown in Fig. 4, specifically:
S401: from the segmentation result of the first text p1, the total number m of segmented words in the first text, the length len(P1) of the first text, and the relative position pos(Si) of the segmented word Si in the first text are obtained.
It should be noted that the relative position pos(Si) of the segmented word Si in the first text is calculated by the corresponding formula, where i denotes the position of the segmented word in the first text.
S402: from the segmentation result of the second text p2, the total number n of segmented words in the second text, the length len(P2) of the second text, and the relative position pos(Wj) of the segmented word Wj in the second text are obtained.
It should be noted that the relative position pos(Wj) of the segmented word Wj in the second text is calculated by the corresponding formula, where j denotes the position of the segmented word in the second text. It should also be noted that the embodiment of the present invention does not limit the order of steps S401 and S402.
S403: the similarity sim(Si, Wj) between the segmented word Si and the segmented word Wj is calculated according to the context graph.
It should be noted that, unlike the prior art, which calculates the similarity between words only from the words themselves, the embodiment of the present invention obtains the adjacent points of Si and Wj through the context graph and derives the similarity sim(Si, Wj) by comparing the adjacent-point data, i.e. it realizes word similarity judgement in the macroscopic context.
S404: the offset similarity Simp(p1, p2) of the first text p1 and the second text p2 is calculated according to the corresponding formula.
It should be noted that, as can be seen from the formula of the offset similarity, when the similarity of two segmented words is fixed, the more consistent the relative positions of the two words are, the larger the total offset similarity; and when the relative positions of two segmented words are fixed, the larger the similarity of the two words, the larger the total offset similarity.
In the method for calculating the offset similarity provided by the embodiment of the present invention, the offset similarity of the two texts is obtained from the context graph. Compared with the prior art, in which the offset similarity is obtained only from the contextual relations of the words themselves, the difference precision between texts is further improved, so that the accuracy of the semantic similarity calculation is higher.
Based on the content of the above embodiment, as an optional embodiment, the similarity sim(Si, Wj) between the segmented word Si and the segmented word Wj is calculated according to the context graph as follows:
the adjacent points π(Si) of Si and the degree len(π(Si)) are obtained on the context graph;
the adjacent points π(Wj) of Wj and the degree len(π(Wj)) are obtained on the context graph;
the similarity sim(Si, Wj) is calculated according to the corresponding formula,
where T(π(Si) ∩ π(Wj)) denotes the adjacent points shared by Si and Wj.
In the method for calculating the word similarity provided by the embodiment of the present invention, the similarity between the segmented words of the two texts is obtained from the context graph. Compared with the prior art, which considers only the contextual relations of the words themselves, the difference precision between texts is further improved, so that the accuracy of the semantic similarity calculation is higher.
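The exact formulas for sim(Si, Wj) and Simp(p1, p2) are not reproduced in this text, so the sketch below only offers plausible readings: a Dice-style overlap of the adjacent points for the word similarity, and a positional damping term for the offset similarity. Both use exactly the quantities the description names (π, len(π), pos, m, n) and follow the monotonic behaviour stated for S404, but they are assumptions, not the patented formulas.

```python
# Assumed forms of sim(Si, Wj) and Simp(p1, p2); the patent's own formulas are
# not reproduced here, so treat these only as illustrative stand-ins.
def word_similarity(si, wj, neighbours):
    pi_s, pi_w = neighbours(si), neighbours(wj)
    if not pi_s or not pi_w:
        return 1.0 if si == wj else 0.0
    shared = len(pi_s & pi_w)                      # T(π(Si) ∩ π(Wj))
    return 2.0 * shared / (len(pi_s) + len(pi_w))  # assumed Dice-style overlap

def offset_similarity(p1_words, p2_words, neighbours):
    m, n = len(p1_words), len(p2_words)
    total = 0.0
    for i, si in enumerate(p1_words, start=1):
        pos_s = i / m                              # assumed relative position
        for j, wj in enumerate(p2_words, start=1):
            pos_w = j / n
            total += word_similarity(si, wj, neighbours) * (1.0 - abs(pos_s - pos_w))
    return total / (m * n) if m and n else 0.0
```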
Based on the content of the above embodiment, as an optional embodiment, the context word sets of all segmented words of the first text that are not present in the second text, and the context word sets of all segmented words of the second text that are not present in the first text, are obtained according to the context graph so as to calculate the semantic-layer similarity of the first text and the second text, specifically:
the segmented words of the first text p1 that are not present in the second text p2 are obtained and form a first word set, and the context words of all words of the first word set are obtained on the context graph and form a first context word set π(P1); the segmented words of the second text p2 that are not present in the first text p1 are obtained and form a second word set, and the context words of all words of the second word set are obtained on the context graph and form a second context word set π(P2).
This is illustrated with the first text: a method for raising tomato seedlings in a greenhouse, and the second text: a method for raising seedlings of American tomatoes. The segmentation result of the first text is: tomato, greenhouse, seedling raising, method; the segmentation result of the second text is: America, tomato, seedling raising, method. The segmented word of the first text that is not present in the second text is "greenhouse", so the context word set of the word "greenhouse" is obtained from the context graph. Similarly, the segmented word of the second text that is not present in the first text is "America", and the context word set of the word "America" is obtained from the context graph.
The semantic-layer similarity SimL(p1, p2) of the first text and the second text is then calculated from π(P1) and π(P2) according to the corresponding formula,
where α = 1 when no antonyms are present in p1 and p2, and α = -1 when antonyms are present in p1 and p2; T(π(P1) ∩ π(P2)) denotes the context words shared by π(P1) and π(P2); and T(π(P1) ∪ π(P2)) denotes all context words of π(P1) and π(P2).
It should be noted that, when calculating the semantic-layer similarity with the above formula, the first text and the second text also need to be checked in advance for antonyms: when antonyms are present, the semantics of the two texts are, with high probability, opposite. The embodiment of the present invention calculates the semantic-layer similarity from the proportion of the context words shared by π(P1) and π(P2) among all context words of π(P1) and π(P2), together with whether antonyms are present. With the context graph, the method provided by the embodiment of the present invention analyses the semantic-layer similarity of the words that the two sentences do not share with higher precision.
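A sketch of this step is given below (not part of the patent text). The description states that the semantic-layer similarity is derived from the proportion of shared context words in π(P1) and π(P2) and from the antonym sign α, so the ratio form used here is an assumption that matches that description; the antonym lexicon is likewise a placeholder.

```python
# Assumed ratio form of SimL(p1, p2); the patent's own formula is not
# reproduced here. The ANTONYMS set is placeholder data.
ANTONYMS = {("增产", "减产"), ("早熟", "晚熟")}

def has_antonym_pair(p1_words, p2_words):
    return any((a, b) in ANTONYMS or (b, a) in ANTONYMS
               for a in p1_words for b in p2_words)

def semantic_layer_similarity(p1_words, p2_words, neighbours):
    only_p1 = [w for w in p1_words if w not in p2_words]     # first word set
    only_p2 = [w for w in p2_words if w not in p1_words]     # second word set
    ctx1 = set().union(*(neighbours(w) for w in only_p1))    # π(P1)
    ctx2 = set().union(*(neighbours(w) for w in only_p2))    # π(P2)
    union = ctx1 | ctx2
    alpha = -1.0 if has_antonym_pair(p1_words, p2_words) else 1.0
    return alpha * len(ctx1 & ctx2) / len(union) if union else 0.0
```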
Based on the content of the above embodiment, as an optional embodiment, the semantic similarity of the first text and the second text is calculated from the cosine similarity, the offset similarity and the semantic-layer similarity of the first text and the second text, specifically:
the expression-layer similarity Simb(p1, p2) of the first text p1 and the second text p2 is obtained according to the formula Simb(p1, p2) = Cosin(p1, p2) + α1 × Simp(p1, p2);
the semantic similarity m(p1, p2) of the first text p1 and the second text p2 is obtained according to the formula m(p1, p2) = Simb(p1, p2) + β1 × SimL(p1, p2);
where Cosin(p1, p2), Simp(p1, p2) and SimL(p1, p2) respectively denote the cosine similarity, the offset similarity and the semantic-layer similarity of the first text p1 and the second text p2, α1 denotes the impact factor of the offset similarity on the expression-layer similarity, and β1 denotes the impact factor of the semantic-layer similarity on the semantic similarity.
It should be noted that the embodiment of the present invention combines the cosine similarity and the offset similarity into the expression-layer similarity, and then combines the expression-layer similarity and the semantic-layer similarity to obtain the semantic similarity. The embodiment of the present invention fully considers the macroscopic context and mines the semantics to a deeper degree.
Based on the content of the above embodiment, as an optional embodiment, practical analysis shows that the value of α1 should ensure that its product with the offset similarity is smaller than the cosine similarity value, and that this product grows from 0 as the cosine similarity value grows and, after the cosine similarity reaches a certain value, decreases as the cosine similarity value continues to grow. Therefore, the impact factor α1 is obtained according to the formula α1 = (1 - Cosin(p1, p2)) × Cosin(p1, p2).
Likewise, practical analysis shows that the value of β1 should ensure that its product with the semantic-layer similarity is smaller than the expression-layer similarity value, and that this product grows from 0 as the expression-layer similarity value grows and, after the expression-layer similarity reaches a certain threshold, decreases as the expression-layer similarity value continues to grow. Therefore, the impact factor β1 is obtained according to the formula β1 = (1 - Simb(p1, p2)) × Simb(p1, p2).
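Since the combination formulas above are given explicitly, the final combination step can be transcribed directly; the sketch below (not part of the patent text) simply restates them, with illustrative names.

```python
# Combination step following the formulas given above:
# alpha1 = (1 - Cosin) * Cosin,  Simb = Cosin + alpha1 * Simp,
# beta1  = (1 - Simb) * Simb,    m    = Simb + beta1 * SimL.
def overall_semantic_similarity(cosine, offset, semantic_layer):
    alpha1 = (1.0 - cosine) * cosine                    # impact factor of the offset similarity
    expression_layer = cosine + alpha1 * offset         # Simb(p1, p2)
    beta1 = (1.0 - expression_layer) * expression_layer # impact factor of the semantic layer
    return expression_layer + beta1 * semantic_layer    # m(p1, p2)
```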
According to another aspect of the present invention, an embodiment of the present invention further provides an intelligent question answering device. Referring to Fig. 5, Fig. 5 shows a functional block diagram of the intelligent question answering device of an embodiment of the present invention. The device is used, as in the foregoing method embodiments, to match an answer according to the semantic similarity between the question to be answered and the FAQs; therefore, the descriptions and definitions given for the intelligent question answering method in the foregoing embodiments may be used to understand the execution modules of this embodiment of the present invention.
As shown in the figure, the intelligent question answering device includes:
a context obtaining module 501, configured to segment the text of the question to be answered into words and determine, from the segmentation result of the question to be answered, the context used for semantic similarity judgement;
an FAQ obtaining module 502, configured to collect a certain number of FAQs according to the context;
a context graph obtaining module 503, configured to segment the texts of all the FAQs into words and establish a context graph from the segmentation results of all the FAQs, wherein the context graph is an undirected graph representing the combination relations between the segmented words of all the FAQs;
a similarity calculation module 504, configured to calculate, for any one of the FAQs, the similarity between the FAQ and the question to be answered from the segmentation result of the FAQ, the segmentation result of the question to be answered and the context graph;
an answer matching module 505, configured to take the candidate answer corresponding to the FAQ with the highest similarity as the answer corresponding to the question to be answered.
In the intelligent question answering device of the embodiment of the present invention, the context obtaining module determines, from the question to be answered, the context used for semantic similarity judgement; the FAQ obtaining module collects a certain number of FAQs according to the context, so that the question to be answered and the FAQs are mapped into the same context for analysis, which improves the precision of the difference analysis between questions and therefore the accuracy of the semantic similarity calculation; the context graph obtaining module establishes a context graph from the segmentation results of the collected FAQs; and the similarity calculation module calculates the similarity between the question to be answered and each FAQ according to the context graph. In the embodiment of the present invention the context graph is constructed from the above-mentioned certain number of FAQs, embodying the characteristics of big data, and it is entirely different from existing contexts built only from the question to be answered and the FAQ being compared for semantic similarity; the context in the embodiment of the present invention is a macroscopic context. Finally, the answer matching module takes the candidate answer corresponding to the FAQ with the highest similarity as the answer corresponding to the question to be answered. The embodiment of the present invention can analyse the question to be answered more accurately and provide an answer.
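For illustration only, the sketch below ties the assumed helper sketches from the method description above into one device-like object that mirrors modules 501-505; it is not an implementation published with the patent, and every helper it calls is one of the hedged sketches given earlier.

```python
# An end-to-end sketch mirroring modules 501-505, built from the assumed
# helpers defined in the earlier sketches (segment, determine_context,
# collect_faqs, build_context_graph, context_word_set, cosine_similarity,
# offset_similarity, semantic_layer_similarity, overall_semantic_similarity).
class IntelligentQADevice:
    """Pipeline object combining the five modules of Fig. 5."""

    def answer(self, question_text: str) -> str:
        words = segment(question_text)                          # module 501
        context = determine_context(words)
        faqs = collect_faqs(context)                            # module 502
        if not faqs:
            return ""
        segmented_faqs = [segment(q) for q, _ in faqs]
        vocab, index, g = build_context_graph(segmented_faqs)   # module 503

        def neighbours(w):
            return context_word_set(vocab, g, index[w]) if w in index else set()

        def score(faq_words):                                   # module 504
            cos = cosine_similarity(words, faq_words, neighbours)
            off = offset_similarity(words, faq_words, neighbours)
            sem = semantic_layer_similarity(words, faq_words, neighbours)
            return overall_semantic_similarity(cos, off, sem)

        best_words, (_, best_ans) = max(zip(segmented_faqs, faqs),
                                        key=lambda item: score(item[0]))
        return best_ans                                         # module 505
```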
An embodiment of the present invention provides an electronic device. Referring to Fig. 6, the electronic device includes: a processor 601, a memory 602 and a bus 603;
wherein the processor 601 and the memory 602 communicate with each other through the bus 603, and the processor 601 is configured to call program instructions in the memory 602 to execute the semantic similarity calculation method provided by the above embodiments, for example comprising: segmenting the text of the question to be answered into words, and determining, from the segmentation result of the question to be answered, the context used for semantic similarity judgement; collecting a certain number of FAQs according to the context; segmenting the texts of all the FAQs into words and establishing a context graph from the segmentation results of all the FAQs; for any one of the FAQs, calculating the similarity between the FAQ and the question to be answered from the segmentation result of the FAQ, the segmentation result of the question to be answered and the context graph; and taking the candidate answer corresponding to the FAQ with the highest similarity as the answer corresponding to the question to be answered; wherein the context graph is an undirected graph representing the combination relations between the segmented words of all the FAQs.
An embodiment of the present invention provides a non-transitory computer-readable storage medium storing computer instructions that cause a computer to execute the semantic similarity calculation method provided by the above embodiments, for example: segmenting the text of the question to be answered into words, and determining, from the segmentation result of the question to be answered, the context used for semantic similarity judgement; collecting a certain number of FAQs according to the context; segmenting the texts of all the FAQs into words and establishing a context graph from the segmentation results of all the FAQs; for any one of the FAQs, calculating the similarity between the FAQ and the question to be answered from the segmentation result of the FAQ, the segmentation result of the question to be answered and the context graph; and taking the candidate answer corresponding to the FAQ with the highest similarity as the answer corresponding to the question to be answered; wherein the context graph is an undirected graph representing the combination relations between the segmented words of all the FAQs.
The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement it without creative labour.
Through the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by means of software plus a necessary general hardware platform, and certainly also by hardware. Based on this understanding, the above technical solutions, or the part thereof that contributes to the prior art, can essentially be embodied in the form of a software product; the computer software product may be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to execute the methods described in the embodiments or in certain parts of the embodiments.
Finally, it should be noted that the above embodiments are merely intended to illustrate the technical solutions of the present invention rather than to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. An intelligent question answering method, characterized by comprising:
segmenting the text of a question to be answered into words, and determining, from the segmentation result of the question to be answered, the context used for semantic similarity judgement;
collecting a certain number of FAQs according to the context;
segmenting the texts of all the FAQs into words, and establishing a context graph from the segmentation results of all the FAQs;
for any one of the FAQs, calculating the semantic similarity between the FAQ and the question to be answered from the segmentation result of the FAQ, the segmentation result of the question to be answered, and the context graph;
taking the candidate answer corresponding to the FAQ with the highest similarity as the answer corresponding to the question to be answered;
wherein the context graph is an undirected graph representing the combination relations between the segmented words of all the FAQs.
2. The intelligent question answering method according to claim 1, characterized in that the calculating of the semantic similarity between the FAQ and the question to be answered from the segmentation result of the FAQ, the segmentation result of the question to be answered and the context graph specifically comprises:
calculating the cosine similarity of a first text and a second text according to the context graph, wherein the first text is the text of the question to be answered and the second text is the text of the FAQ;
obtaining, according to the context graph, the similarity between any segmented word of the first text and any segmented word of the second text, so as to calculate the offset similarity of the first text and the second text;
obtaining, according to the context graph, the context word sets of all segmented words of the first text that are not present in the second text and the context word sets of all segmented words of the second text that are not present in the first text, so as to calculate the semantic-layer similarity of the first text and the second text;
calculating the semantic similarity of the first text and the second text from the cosine similarity, the offset similarity and the semantic-layer similarity of the first text and the second text.
3. The intelligent question answering method according to claim 2, characterized in that the obtaining, according to the context graph, of the similarity between any segmented word of the first text and any segmented word of the second text so as to calculate the offset similarity of the first text and the second text specifically comprises:
obtaining, from the segmentation result of the first text p1, the total number m of segmented words in the first text, the length len(P1) of the first text, and the relative position pos(Si) of the segmented word Si in the first text;
obtaining, from the segmentation result of the second text p2, the total number n of segmented words in the second text, the length len(P2) of the second text, and the relative position pos(Wj) of the segmented word Wj in the second text;
calculating the similarity sim(Si, Wj) between the segmented word Si and the segmented word Wj according to the context graph;
calculating the offset similarity Simp(p1, p2) of the first text p1 and the second text p2 according to the corresponding formula.
4. The intelligent question answering method according to claim 3, characterized in that the calculating of the similarity sim(Si, Wj) between the segmented word Si and the segmented word Wj according to the context graph specifically comprises:
obtaining, on the context graph, the adjacent points π(Si) of Si and the degree len(π(Si));
obtaining, on the context graph, the adjacent points π(Wj) of Wj and the degree len(π(Wj));
calculating the similarity sim(Si, Wj) according to the corresponding formula;
wherein T(π(Si) ∩ π(Wj)) denotes the adjacent points shared by Si and Wj.
5. The intelligent question answering method according to claim 2, characterized in that the obtaining, according to the context graph, of the context word sets of all segmented words of the first text that are not present in the second text and of the context word sets of all segmented words of the second text that are not present in the first text, so as to calculate the semantic-layer similarity of the first text and the second text, specifically comprises:
obtaining, in the first text p1, the segmented words that are not present in the second text p2 to form a first word set, and obtaining on the context graph the context words of all words of the first word set to form a first context word set π(P1);
obtaining, in the second text p2, the segmented words that are not present in the first text p1 to form a second word set, and obtaining on the context graph the context words of all words of the second word set to form a second context word set π(P2);
calculating the semantic-layer similarity SimL(p1, p2) of the first text and the second text according to the corresponding formula;
wherein α = 1 when no antonyms are present in p1 and p2, and α = -1 when antonyms are present in p1 and p2; T(π(P1) ∩ π(P2)) denotes the context words shared by π(P1) and π(P2); and T(π(P1) ∪ π(P2)) denotes all context words of π(P1) and π(P2).
6. The intelligent question answering method according to claim 2, characterized in that the calculating of the semantic similarity of the first text and the second text from the cosine similarity, the offset similarity and the semantic-layer similarity of the first text and the second text specifically comprises:
obtaining the expression-layer similarity Simb(p1, p2) of the first text p1 and the second text p2 according to the formula Simb(p1, p2) = Cosin(p1, p2) + α1 × Simp(p1, p2);
obtaining the semantic similarity m(p1, p2) of the first text p1 and the second text p2 according to the formula m(p1, p2) = Simb(p1, p2) + β1 × SimL(p1, p2);
wherein Cosin(p1, p2), Simp(p1, p2) and SimL(p1, p2) respectively denote the cosine similarity, the offset similarity and the semantic-layer similarity of the first text p1 and the second text p2, α1 denotes the impact factor of the offset similarity on the expression-layer similarity, and β1 denotes the impact factor of the semantic-layer similarity on the semantic similarity.
7. The intelligent question answering method according to claim 6, characterized in that:
the impact factor α1 is obtained according to the formula α1 = (1 - Cosin(p1, p2)) × Cosin(p1, p2);
the impact factor β1 is obtained according to the formula β1 = (1 - Simb(p1, p2)) × Simb(p1, p2).
8. An intelligent question answering device, characterized by comprising:
a context obtaining module, configured to segment the text of a question to be answered into words and determine, from the segmentation result of the question to be answered, the context used for semantic similarity judgement;
an FAQ obtaining module, configured to collect a certain number of FAQs according to the context;
a context graph obtaining module, configured to segment the texts of all the FAQs into words and establish a context graph from the segmentation results of all the FAQs, wherein the context graph is an undirected graph representing the combination relations between the segmented words of all the FAQs;
a similarity calculation module, configured to calculate, for any one of the FAQs, the similarity between the FAQ and the question to be answered from the segmentation result of the FAQ, the segmentation result of the question to be answered, and the context graph;
an answer matching module, configured to take the candidate answer corresponding to the FAQ with the highest similarity as the answer corresponding to the question to be answered.
9. An electronic device, characterized by comprising:
at least one processor; and
at least one memory communicatively connected to the processor, wherein:
the memory stores program instructions executable by the processor, and the processor, by calling the program instructions, is able to execute the method according to any one of claims 1 to 7.
10. A non-transitory computer-readable storage medium, characterized in that the non-transitory computer-readable storage medium stores computer instructions that cause a computer to execute the method according to any one of claims 1 to 7.
CN201810790249.5A 2018-07-18 2018-07-18 Intelligent question and answer method and device Active CN109033318B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810790249.5A CN109033318B (en) 2018-07-18 2018-07-18 Intelligent question and answer method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810790249.5A CN109033318B (en) 2018-07-18 2018-07-18 Intelligent question and answer method and device

Publications (2)

Publication Number Publication Date
CN109033318A (en) 2018-12-18
CN109033318B (en) 2020-11-27

Family

ID=64643328

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810790249.5A Active CN109033318B (en) 2018-07-18 2018-07-18 Intelligent question and answer method and device

Country Status (1)

Country Link
CN (1) CN109033318B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109840277A (en) * 2019-02-20 2019-06-04 西南科技大学 A kind of government affairs Intelligent Service answering method and system
CN109918494A (en) * 2019-03-22 2019-06-21 深圳狗尾草智能科技有限公司 Context relation based on figure replys generation method, computer and medium
CN110069613A (en) * 2019-04-28 2019-07-30 河北省讯飞人工智能研究院 A kind of reply acquisition methods and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101566998A (en) * 2009-05-26 2009-10-28 华中师范大学 Chinese question-answering system based on neural network
CN102346766A (en) * 2011-09-20 2012-02-08 北京邮电大学 Method and device for detecting network hot topics found based on maximal clique
CN103425635A (en) * 2012-05-15 2013-12-04 北京百度网讯科技有限公司 Method and device for recommending answers
US20140229161A1 (en) * 2013-02-12 2014-08-14 International Business Machines Corporation Latent semantic analysis for application in a question answer system
CN107766511A (en) * 2017-10-23 2018-03-06 深圳市前海众兴电子商务有限公司 Intelligent answer method, terminal and storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101566998A (en) * 2009-05-26 2009-10-28 华中师范大学 Chinese question-answering system based on neural network
CN102346766A (en) * 2011-09-20 2012-02-08 北京邮电大学 Method and device for detecting network hot topics found based on maximal clique
CN103425635A (en) * 2012-05-15 2013-12-04 北京百度网讯科技有限公司 Method and device for recommending answers
US20140229161A1 (en) * 2013-02-12 2014-08-14 International Business Machines Corporation Latent semantic analysis for application in a question answer system
CN107766511A (en) * 2017-10-23 2018-03-06 深圳市前海众兴电子商务有限公司 Intelligent answer method, terminal and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Wang Baoxun, "Research on semantic mining of question-answer pairs in online communities" (面向网络社区问答对的语义挖掘研究), China Doctoral Dissertations Full-text Database, Information Science and Technology. *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109840277A (en) * 2019-02-20 2019-06-04 西南科技大学 A kind of government affairs Intelligent Service answering method and system
CN109918494A (en) * 2019-03-22 2019-06-21 深圳狗尾草智能科技有限公司 Context relation based on figure replys generation method, computer and medium
WO2020191828A1 (en) * 2019-03-22 2020-10-01 深圳狗尾草智能科技有限公司 Graph-based context association reply generation method, computer and medium
CN109918494B (en) * 2019-03-22 2022-11-04 元来信息科技(湖州)有限公司 Context association reply generation method based on graph, computer and medium
CN110069613A (en) * 2019-04-28 2019-07-30 河北省讯飞人工智能研究院 A kind of reply acquisition methods and device

Also Published As

Publication number Publication date
CN109033318B (en) 2020-11-27

Similar Documents

Publication Publication Date Title
US10831769B2 (en) Search method and device for asking type query based on deep question and answer
Revathy et al. Sentiment analysis using machine learning: Progress in the machine intelligence for data science
US11113323B2 (en) Answer selection using a compare-aggregate model with language model and condensed similarity information from latent clustering
US10642975B2 (en) System and methods for automatically detecting deceptive content
CN109145085A (en) The calculation method and system of semantic similarity
CN108984530A (en) A kind of detection method and detection system of network sensitive content
CN112667794A (en) Intelligent question-answer matching method and system based on twin network BERT model
US10339214B2 (en) Structured term recognition
CN109145292B (en) Paraphrase text depth matching model construction method and paraphrase text depth matching method
CN112084307B (en) Data processing method, device, server and computer readable storage medium
CN109033318A (en) Intelligent answer method and device
CN107273348A (en) The topic and emotion associated detecting method and device of a kind of text
Van Atteveldt et al. Studying political decision making with automatic text analysis
CN110348539B (en) Short text relevance judging method
CN115775349A (en) False news detection method and device based on multi-mode fusion
DeLong et al. Offline dominance and zeugmatic similarity normings of variably ambiguous words assessed against a neural language model (BERT)
Nithya et al. Meta-heuristic searched-ensemble learning for fake news detection with optimal weighted feature selection approach
Qi et al. What is the limitation of multimodal llms? a deeper look into multimodal llms through prompt probing
Mansoorizadeh et al. Persian Plagiarism Detection Using Sentence Correlations.
CN113658690A (en) Intelligent medical guide method and device, storage medium and electronic equipment
CN117454217A (en) Deep ensemble learning-based depression emotion recognition method, device and system
Le et al. CiteOpinion: evidence-based evaluation tool for academic contributions of research papers based on citing sentences
Okpala et al. Perception Analysis: Pro-and Anti-Vaccine Classification with NLP and Machine Learning.
Otani et al. Large-scale acquisition of commonsense knowledge via a quiz game on a dialogue system
Ling Coronavirus public sentiment analysis with BERT deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant