CN109033318A - Intelligent answer method and device - Google Patents
Intelligent answer method and device
- Publication number
- CN109033318A (application CN201810790249.5A)
- Authority
- CN
- China
- Prior art keywords
- text
- context
- similarity
- faqs
- participle
- Prior art date
- Legal status: Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention provides an intelligent question answering method and device. The text of a question to be answered is segmented into words, and the context used for semantic similarity judgement is determined from the segmentation result of the question to be answered; a certain number of frequently asked questions (FAQs) are collected according to that context; the texts of all the FAQs are segmented, and a context graph is built from their segmentation results; for any one of the FAQs, the semantic similarity between that FAQ and the question to be answered is calculated from the segmentation result of the FAQ, the segmentation result of the question to be answered and the context graph; and the candidate answer corresponding to the FAQ with the highest similarity is taken as the answer to the question to be answered. Embodiments of the present invention can analyse the question to be answered more accurately and provide an answer.
Description
Technical field
The present invention relates to the field of natural language processing, and in particular to an intelligent question answering method and device.
Background technique
In question answering systems, answers pushed by general chat bots are highly random, whereas in professional application fields the reply content must be precise. Using a computer to compare a user's question with the sentences already stored in a sentence library for semantic matching is known as sentence similarity research. As a key problem in natural language processing, it has long been a research hot spot and a difficult point. Sentence similarity research ranges from mining the relationships between the words of a sentence and computing similarity from word overlap (for example, methods relying on the WordNet framework, or on the HowNet framework together with a corpus) to feature extraction based on neural networks, which has also begun to develop.
Experts and scholars have carried out extensive research on semantic similarity calculation methods. The first family is statistical methods based on word co-occurrence, which mainly count word frequencies in sentences, such as the TF-IDF algorithm, the Jaccard similarity coefficient method, and Metzler's improved overlap-based method. These methods are simple and efficient to implement, but they completely ignore the morphology and semantic information of the sentence. The second family is methods based on morphology and semantic information; they take semantic factors into account, but their construction is relatively complex, for example ontology-based semantic similarity calculation. The third family, feature extraction trained on corpora with neural networks, has also developed rapidly in recent years, for example sentence semantic similarity calculation based on Word2vec; it depends on the quality and quantity of the corpus, focuses on feature extraction, neglects the understanding of sentence meaning, and cannot truly mine semantics. The fourth family uses comprehensive fusion, such as sentence semantic similarity calculation based on multi-feature fusion. As research deepens, practical experience shows that, once detached from the application scenario, these methods are either complex to implement or inefficient, are disturbed by many uncertain factors, and have certain operational limitations. The prior art therefore provides "a word similarity calculation method based on context". On the basis of existing similarity calculation methods, this method introduces the context of a word and uses concepts from fuzzy mathematics to assess word sense similarity. Borrowing the definition of membership degree, it constructs the fuzzy importance of a word in its context and improves the meaning similarity at the word level, but it is still deficient in sentence-level meaning similarity.
Summary of the invention
The present invention provides an intelligent question answering method and device that overcome, or at least partially solve, the above problems.
According to a first aspect of the invention, an intelligent question answering method is provided, comprising:
performing word segmentation on the text of a question to be answered, and determining, from the segmentation result of the question to be answered, the context used for semantic similarity judgement;
collecting a certain number of FAQs according to the context;
performing word segmentation on the texts of all the FAQs, and building a context graph from the segmentation results of all the FAQs;
for any one of the FAQs, calculating the semantic similarity between that FAQ and the question to be answered from the segmentation result of the FAQ, the segmentation result of the question to be answered and the context graph;
taking the candidate answer corresponding to the FAQ with the highest similarity as the answer to the question to be answered;
wherein the context graph is an undirected graph that represents the combination relationships between the word segments of all the FAQs.
According to a second aspect of the invention, an intelligent question answering device is provided, comprising:
a context acquisition module, configured to determine, from the question to be answered, the context used for semantic similarity judgement;
an FAQ acquisition module, configured to collect a certain number of FAQs according to the context;
a context graph acquisition module, configured to perform word segmentation on the texts of all the FAQs and to build a context graph from the segmentation results of all the FAQs, wherein the context graph is an undirected graph that represents the combination relationships between the word segments of all the FAQs;
a similarity calculation module, configured to calculate, for any one of the FAQs, the similarity between that FAQ and the question to be answered from the segmentation result of the FAQ, the segmentation result of the question to be answered and the context graph;
an answer matching module, configured to take the candidate answer corresponding to the FAQ with the highest similarity as the answer to the question to be answered.
According to a third aspect of the invention, an electronic device is also provided, comprising:
at least one processor; and
at least one memory communicatively connected to the processor, wherein:
the memory stores program instructions executable by the processor, and the processor, by calling the program instructions, is able to execute the intelligent question answering method provided by any possible implementation of the first aspect.
According to a fourth aspect of the invention, a non-transitory computer-readable storage medium is also provided. The non-transitory computer-readable storage medium stores computer instructions that cause a computer to execute the intelligent question answering method provided by any possible implementation of the first aspect.
In the intelligent question answering method and device proposed by the present invention, the question to be answered is first analysed to obtain the context used for the subsequent semantic similarity judgement; this context is clearly related to the question to be answered. A certain number of FAQs identical or similar to this context are then collected, so that the question to be answered and the FAQs are mapped into the same context for analysis, which raises the precision of the difference analysis between questions and makes the semantic similarity calculation more accurate. A context graph is then built from the segmentation results of the collected FAQs. Because the context graph in the embodiments of the present invention is built from a certain number of FAQs, it embodies the characteristics of big data and is entirely different from existing contexts built only from the question to be compared and a single FAQ; the context in the embodiments of the present invention is a macroscopic context. Embodiments of the present invention can therefore analyse the question to be answered more accurately and provide an answer.
Detailed description of the invention
Fig. 1 is a schematic flowchart of the intelligent question answering method according to an embodiment of the present invention;
Fig. 2 is a context graph according to an embodiment of the present invention;
Fig. 3 is a schematic flowchart of calculating the similarity between an FAQ and the question to be answered from the segmentation result of the FAQ, the segmentation result of the question to be answered and the context graph, according to an embodiment of the present invention;
Fig. 4 is a schematic flowchart of obtaining, from the context graph, the similarity between any word segment of the first text and any word segment of the second text so as to calculate the offset similarity between the first text and the second text, according to an embodiment of the present invention;
Fig. 5 is a functional block diagram of the intelligent question answering device according to an embodiment of the present invention;
Fig. 6 is a block diagram of the electronic device according to an embodiment of the present invention.
Specific embodiment
Specific embodiments of the present invention are described in further detail below with reference to the accompanying drawings and examples. The following examples are intended to illustrate the present invention and do not limit its scope.
Several semantic similarity calculation methods exist in the prior art. The first is statistical methods based on word co-occurrence, which mainly count word frequencies in sentences, such as the TF-IDF algorithm, the Jaccard similarity coefficient method, and Metzler's improved overlap-based method; these methods are simple and efficient to implement but completely ignore the morphology and semantic information of the sentence. The second is methods based on morphology and semantic information, which take semantic factors into account but are relatively complex to construct, for example ontology-based semantic similarity calculation. The third is feature extraction trained on corpora with neural networks, which has also developed rapidly in recent years, for example sentence semantic similarity calculation based on Word2vec; it depends on the quality and quantity of the corpus, focuses on feature extraction, neglects the understanding of sentence meaning, and cannot truly mine semantics. The fourth uses comprehensive fusion, such as sentence semantic similarity calculation based on multi-feature fusion. As research deepens, practical experience shows that, detached from the application scenario, these methods are either complex to implement or inefficient, are disturbed by many uncertain factors, and have certain operational limitations. The prior art therefore provides "a word similarity calculation method based on context", which, on the basis of existing similarity calculation methods, introduces the context of a word and uses concepts from fuzzy mathematics to assess word sense similarity. Borrowing the definition of membership degree, it constructs the fuzzy importance of a word in its context and improves meaning similarity at the word level, but it is still deficient in sentence-level meaning similarity.
To overcome the above problems of the prior art, an embodiment of the present invention provides a semantic similarity calculation method. Its inventive concept is to analyse the question to be answered to obtain the context used for the subsequent semantic similarity judgement (a context that is clearly related to the question to be answered), and then to collect a certain number of FAQs identical or similar to that context, so that the question to be answered and the FAQs are mapped into the same context for analysis, which raises the precision of the difference analysis between questions and makes the semantic similarity calculation more accurate. A context graph is then built from the segmentation results of the collected FAQs. Because the context graph in the embodiments of the present invention is built from a certain number of FAQs, it embodies the characteristics of big data and is entirely different from existing contexts built only from the question to be compared and a single FAQ; the context in the embodiments of the present invention is a macroscopic context. The embodiments of the present invention can therefore analyse the question to be answered more accurately and provide an answer.
Fig. 1 shows a schematic flowchart of the semantic similarity calculation method of an embodiment of the present invention. As shown, the method comprises:
S101: performing word segmentation on the text of the question to be answered, and determining, from the segmentation result of the question to be answered, the context used for semantic similarity judgement.
Specifically, the text of the question to be answered may be obtained as follows: receiving text data of the question to be answered and using it directly as the text of the question to be answered; or receiving voice data of the question to be answered, performing speech recognition on the voice data to obtain text data, and using the recognised text data as the text of the question to be answered. It should be understood that these are only several possible implementations of obtaining the text of the question to be answered and should not constitute any limitation on the embodiments of the present invention.
To describe the basic principle of the embodiments of the present invention more easily, the text of the question to be answered is referred to as the first text p1. Using an existing word segmentation technique, p1 is segmented into S1, S2, ..., Sm, where m is the number of word segments obtained by segmenting p1; this yields both the word segments of the text and their number. In the embodiments of the present invention, the context used for semantic similarity judgement is determined from the segmentation result of the first text. The segmentation result reveals information such as the technical field, environment, topic and tone of the text to be answered. For example, if the first text is "a method for raising tomato seedlings in a greenhouse", its segmentation result is: tomato, greenhouse, seedling raising, method. Analysing this segmentation result shows that the context of the first text is agricultural cultivation, and in particular the field of tomato cultivation.
S102: collecting a certain number of FAQs according to the context.
It should be noted that, after determining the context, the embodiment of the present invention collects a certain number of FAQs from a preset database. It can be understood that the database stores a large number of FAQs and an answer for each FAQ; these FAQs and answers can be collected from the Internet by a web crawler. In the above example, the question to be answered is determined to belong to the field of tomato cultivation, so a certain number of FAQs in the field of tomato cultivation can be found in the database. It should be understood that the above process of collecting a certain number of FAQs according to the context is only one possible implementation and should not constitute any limitation on the present application.
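One hypothetical way to sketch S102 is a domain-filtered lookup in a preset database; the table name and columns (faq(question, answer, domain)) are assumptions, not part of the patent.

```python
# A hypothetical sketch of S102: look up FAQs whose domain matches the
# inferred context in a preset database.  The schema faq(question,
# answer, domain) is an assumption.
import sqlite3

def collect_faqs(db_path: str, context: str, limit: int = 100):
    """Return up to `limit` (question, answer) pairs for the given context."""
    con = sqlite3.connect(db_path)
    try:
        rows = con.execute(
            "SELECT question, answer FROM faq WHERE domain = ? LIMIT ?",
            (context, limit),
        ).fetchall()
    finally:
        con.close()
    return rows
```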
S103: performing word segmentation on the texts of all the FAQs, and building a context graph from the segmentation results of all the FAQs; the context graph is an undirected graph that represents the combination relationships between the word segments of all the FAQs.
Specifically, the word segmentation of the FAQ texts may refer to the description of the above embodiment and is not repeated here.
It should be noted that the context graph of the embodiment of the present invention is a network graph: the vertices of the graph are word segments, and an edge or arc connecting two words indicates that a combination relationship exists between the two words (it may also be a weight relationship, which the embodiment of the present invention does not limit). In the embodiment of the present invention the context graph is an undirected graph. If the context undirected graph G has n vertices (i.e. n distinct words), its adjacency matrix is an n×n square matrix defined as:
g[i][j] = 1 if the word pair formed by segment i and segment j belongs to E, and g[i][j] = 0 otherwise;
where g[i][j] denotes the value in the adjacency matrix for the word pair formed by segment i and segment j, and E denotes that the two words have a combination relationship.
For example, suppose there are two FAQ texts, hereinafter referred to as sample text 1 and sample text 2. Sample text 1: a method for raising tomato seedlings in a greenhouse; sample text 2: a method for raising tomato seedlings. After word segmentation, stop-word removal and feature word extraction, four words are obtained: tomato, greenhouse, seedling raising and method, denoted for convenience as V1 (tomato), V2 (greenhouse), V3 (seedling raising) and V4 (method). The edge relationships are (V1 V2), (V1 V3), (V2 V3) and (V3 V4), so the context graph shown in Fig. 2 is generated (the embodiment of the present invention does not consider directionality, so the graph is undirected). The corresponding adjacency matrix is:
      V1 V2 V3 V4
  V1 [ 0  1  1  0 ]
  V2 [ 1  0  1  0 ]
  V3 [ 1  1  0  1 ]
  V4 [ 0  0  1  0 ]
After the context graph has been converted into the adjacency matrix, the degree of any vertex (word), i.e. the number of words associated with it, can be obtained: it is simply the sum of the elements of the i-th row of the adjacency matrix for vertex Vi. In this example the degree of V1 is 2, the degree of V2 is 2, the degree of V3 is 3 and the degree of V4 is 1. To find all adjacent points of vertex Vi, scan the elements of the i-th row of the adjacency matrix; every element equal to 1 marks an adjacent point, and the word set formed by all adjacent points is the context word set of that word: the context word set of V1 contains V2 and V3; the context word set of V2 contains V1 and V3; the context word set of V3 contains V1, V2 and V4; and the context word set of V4 contains V3.
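A minimal sketch of the context graph of S103 follows. The rule used to decide which word pairs are combined (here, consecutive segments within the same FAQ text) is an assumption, since the patent only states that edges mark a combination relationship, although this rule does reproduce the edges of the Fig. 2 example.

```python
# A sketch of step S103: build the undirected context graph of the FAQ
# word segments as an adjacency matrix and read off context word sets.
# Treating consecutive segments as combined word pairs is an assumption.
def build_context_graph(segmented_faqs: list[list[str]]):
    """Return (vocabulary, adjacency matrix) for the segmented FAQ texts."""
    vocab = sorted({w for faq in segmented_faqs for w in faq})
    index = {w: i for i, w in enumerate(vocab)}
    adj = [[0] * len(vocab) for _ in vocab]
    for faq in segmented_faqs:
        for a, b in zip(faq, faq[1:]):          # consecutive segments form a word pair
            if a != b:
                adj[index[a]][index[b]] = adj[index[b]][index[a]] = 1
    return vocab, adj

def context_word_set(word: str, vocab: list[str], adj) -> set[str]:
    """Adjacent points (context word set) of a word, read from its matrix row."""
    i = vocab.index(word)
    return {vocab[j] for j, v in enumerate(adj[i]) if v == 1}

if __name__ == "__main__":
    faqs = [["tomato", "greenhouse", "seedling", "method"],   # sample text 1
            ["tomato", "seedling", "method"]]                 # sample text 2
    vocab, adj = build_context_graph(faqs)
    print(context_word_set("seedling", vocab, adj))  # {'tomato', 'greenhouse', 'method'}
```

On the two sample texts this yields a degree of 3 for "seedling" and the context word set {tomato, greenhouse, method}, matching the Fig. 2 example above.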
S104: for any one FAQ, calculating the semantic similarity between the FAQ and the question to be answered from the segmentation result of the FAQ, the segmentation result of the question to be answered and the context graph.
It should be noted that, when computing the semantic similarity, the embodiment of the present invention maps the segmentation results of the question to be answered and of the FAQs into the corresponding context before calculating, which raises the precision of the difference analysis between questions and makes the semantic similarity calculation more accurate.
S105: taking the candidate answer corresponding to the FAQ with the highest similarity as the answer to the question to be answered.
Specifically, after the semantic similarity judgement has been made between the question to be answered and each of the above FAQs, the FAQ with the highest similarity is obtained, and the answer of that FAQ is used as the answer of the question to be answered, which realises the effect of intelligent question answering.
Based on the above embodiment, as an alternative embodiment, the process of calculating the similarity between an FAQ and the question to be answered from the segmentation result of the FAQ, the segmentation result of the question to be answered and the context graph involves calculation at two levels: expression layer similarity and semantic layer similarity. Expression layer similarity refers to the surface similarity of two sentences, measured by the number of identical words or synonyms contained in the two sentences and their relative positions in the sentences. The semantic layer refers to what cannot be reflected directly from the surface text, i.e. the implicit semantics behind the sentences. There are many ways to compute surface layer similarity, such as cosine similarity and generalised Jaccard similarity, while semantic layer similarity can use semantic dictionaries and word sense context.
Fig. 3 shows a schematic flowchart of calculating the semantic similarity between the FAQ and the question to be answered from the segmentation result of the FAQ, the segmentation result of the question to be answered and the context graph, according to an embodiment of the present invention. As shown in Fig. 3, the steps are as follows:
S301: calculating the cosine similarity of the first text and the second text according to the context graph. The first text is the text of the question to be answered, and the second text is the text of the FAQ, denoted p2; the word segments of p2 are W1, W2, ..., Wn, where n is the number of word segments obtained by segmenting p2.
It should be noted that the cosine similarity is the cosine of the angle between two vectors and is used to express the degree of difference between two sentences. It focuses on the difference of the vectors in direction, i.e. the difference in trend, rather than on absolute distance. Its formula is:
Cosin(p1, p2) = Σi xi·yi / (√(Σi xi²) × √(Σi yi²)),
where xi denotes the TF-IDF weight of the i-th word segment of the first text p1 and yi denotes the TF-IDF weight of the i-th word segment of the second text p2. TF-IDF (term frequency–inverse document frequency) is a common weighting technique for information retrieval and data mining; TF means term frequency and IDF means inverse document frequency. Since the context graph is a word-set relationship graph, after the sentences have been segmented, the TF-IDF weights of the words in a sentence can be used to select the words of the sentence; after word selection, the similarity measure based on the cosine of the angle between the space vectors is not affected by the scale of the indices. The cosine value falls within the interval [0, 1], and the larger the value, the smaller the difference.
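The cosine of S301 over sparse TF-IDF weight vectors can be sketched as follows; the weight dictionaries (segment to weight) are assumed to be produced by the graph-based TF-IDF scheme described later in this embodiment.

```python
# A sketch of the S301 cosine similarity over sparse TF-IDF weight
# vectors represented as {segment: weight} dictionaries.
import math

def cosine_similarity(weights1: dict, weights2: dict) -> float:
    """Cosine of the angle between two sparse weight vectors."""
    dot = sum(w * weights2[k] for k, w in weights1.items() if k in weights2)
    norm1 = math.sqrt(sum(v * v for v in weights1.values()))
    norm2 = math.sqrt(sum(v * v for v in weights2.values()))
    return dot / (norm1 * norm2) if norm1 and norm2 else 0.0
```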
S302: obtaining, according to the context graph, the similarity between any word segment of the first text and any word segment of the second text, so as to calculate the offset similarity of the first text and the second text.
It should be noted that, when calculating the offset similarity, the embodiment of the present invention uses the similarity of the word segments of the two texts in the context graph. Since the context graph records the adjacent points (i.e. the context word set) of every word segment, comparing the adjacent points of each pair of segments allows the degree of similarity of the two texts in terms of word position relationships to be judged.
S303: obtaining, according to the context graph, the context word sets of all the word segments of the first text that are not present in the second text and the context word sets of all the word segments of the second text that are not present in the first text, so as to calculate the semantic layer similarity of the first text and the second text.
It should be noted that the semantic layer similarity reflects the implicit semantic relationship of the two texts, which is information that cannot be read directly from the surface text. The embodiment of the present invention obtains from the context graph, for each text, the context word sets of all the word segments that are not present in the other text, and calculates the semantic layer similarity from these two context word sets.
S304: calculating the semantic similarity of the first text and the second text from the cosine similarity, the offset similarity and the semantic layer similarity of the first text and the second text.
The method provided by the embodiment of the present invention obtains, through the context graph, the cosine similarity, the offset similarity and the semantic layer similarity of the first text and the second text: the similarity of the word segments of the two texts in terms of the cosine of the space-vector angle, their position relationships, and the words the texts do not share at the semantic layer. The final semantic similarity is obtained from these, which improves the reliability and accuracy of the similarity judgement.
Based on the above embodiment, as an alternative embodiment, the method for obtaining the TF-IDF weight of a word segment in the first/second text is specifically as follows:
the adjacent points on the context graph of all the word segments of the first text form word set A, and the adjacent points on the context graph of all the word segments of the second text form word set B;
all the word segments in word set A and word set B form word set T, T = A ∪ B;
the adjacent points on the context graph of the word segments of the first text that are not present in the second text form word set C;
the adjacent points on the context graph of the word segments of the second text that are not present in the first text form word set D.
For a word segment xi in the first/second text, the adjacent points of xi on the context graph form word set E. The degree of overlap between the word segments in word set E and word set T is used as the TF value of xi, and lg(nT / nE∩T) is used as the IDF value of xi, where nT denotes the total number of word segments in word set T and nE∩T denotes the number of word segments shared by word set E and word set T. The product of the TF value and the IDF value is used as the TF-IDF weight of xi.
The method of the embodiment of the present invention for obtaining the TF-IDF weights of the word segments in the first/second text combines the combination relationships of the segments in the context graph with the context in which the texts are located, which can further improve the precision of the cosine similarity of the texts.
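A sketch of the graph-based TF-IDF weights described above follows. The normalisation of the TF value and the handling of empty overlaps are assumptions, since the text only states that TF is the degree of overlap between the segment's adjacent-point set E and the union set T and that IDF is lg(nT / nE∩T).

```python
# A sketch of the graph-based TF-IDF weights: E is the neighbour set of a
# segment, T = A ∪ B is the union of both texts' neighbour sets.  The TF
# normalisation and the empty-overlap handling are assumptions.
import math

def neighbour_union(segments, vocab, adj) -> set:
    """Word set formed by the adjacent points of all segments of a text."""
    out = set()
    for s in segments:
        if s in vocab:
            out |= context_word_set(s, vocab, adj)
    return out

def tfidf_weight(segment: str, T: set, vocab, adj) -> float:
    """TF-IDF weight of one segment relative to the union word set T."""
    E = context_word_set(segment, vocab, adj) if segment in vocab else set()
    overlap = len(E & T)                    # n_{E∩T}
    if overlap == 0 or not T:
        return 0.0                          # assumed handling of empty overlap
    tf = overlap / len(T)                   # degree of overlap of E with T (assumed ratio)
    idf = math.log10(len(T) / overlap)      # lg(n_T / n_{E∩T})
    return tf * idf
```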
Based on the above embodiment, as an alternative embodiment, obtaining, according to the context graph, the similarity between any word segment of the first text and any word segment of the second text so as to calculate the offset similarity of the first text and the second text is, as shown in Fig. 4, specifically as follows:
S401: from the segmentation result of the first text p1, obtaining the total number m of word segments in the first text, the length len(P1) of the first text, and the relative position pos(Si) of word segment Si in the first text.
It should be noted that the relative position pos(Si) in the first text is calculated by a formula in which i denotes the position of the word segment in the first text.
S402: from the segmentation result of the second text p2, obtaining the total number n of word segments in the second text, the length len(P2) of the second text, and the relative position pos(Wj) of word segment Wj in the second text.
It should be noted that the relative position pos(Wj) in the second text is calculated by the corresponding formula, in which j denotes the position of the word segment in the second text. It should also be noted that the embodiment of the present invention does not limit the order of steps S401 and S402.
S403: calculating the similarity sim(Si, Wj) of word segments Si and Wj according to the context graph.
It should be noted that, unlike the prior art, which calculates the similarity between word segments only with respect to the segments themselves, the embodiment of the present invention obtains through the context graph the adjacent points of Si and Wj and obtains the similarity sim(Si, Wj) by comparing the adjacent-point data, i.e. it realises a similarity judgement of the segments within the macroscopic context.
S404: calculating the offset similarity Simp(p1, p2) of the first text p1 and the second text p2 from the segment similarities sim(Si, Wj) and the relative positions pos(Si) and pos(Wj) according to the offset similarity formula.
It should be noted that, as can be seen from the offset similarity formula, when the similarity of two word segments is held fixed, the more consistent the relative positions of the two segments are, the larger the total offset similarity is; and when the relative positions of the two segments are consistent, the larger the similarity of the segments is, the larger the total offset similarity is.
The method of calculating the offset similarity provided by the embodiment of the present invention obtains the offset similarity of the two texts from the context graph. Compared with the prior art, it further improves the precision of the differences between texts, so that the semantic similarity calculation is more accurate.
Based on the above embodiment, as an alternative embodiment, calculating the similarity sim(Si, Wj) of word segments Si and Wj according to the context graph is specifically as follows:
obtaining on the context graph the adjacent points π(Si) of word segment Si and their degree len(π(Si));
obtaining on the context graph the adjacent points π(Wj) of word segment Wj and their degree len(π(Wj));
calculating the similarity sim(Si, Wj) from π(Si), π(Wj), len(π(Si)), len(π(Wj)) and T(π(Si) ∩ π(Wj)) according to the similarity formula, where T(π(Si) ∩ π(Wj)) denotes the adjacent points shared by Si and Wj.
The method of calculating the segment similarity provided by the embodiment of the present invention obtains the similarity between the word segments of the two texts from the context graph. Compared with the prior art, which considers only the segments themselves rather than their contextual relations, it further improves the difference precision between texts, so that the semantic similarity calculation is more accurate.
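The exact formulas for pos(Si), sim(Si, Wj) and Simp(p1, p2) are not reproduced in the text above, so the sketch below makes explicit assumptions: a Dice-style overlap of the adjacent-point sets for sim, relative positions taken as i/m and j/n, and an average of pairwise similarities damped by the gap between relative positions for Simp. These choices only mirror the qualitative behaviour described above, not the patented formulas themselves.

```python
# A sketch of S401-S404 under explicit assumptions: Dice-style overlap of
# neighbour sets for sim(Si, Wj), relative positions i/m and j/n, and a
# position-damped average of pairwise similarities for Sim_p(p1, p2).
def segment_similarity(s: str, w: str, vocab, adj) -> float:
    """Similarity of two segments from their shared adjacent points."""
    ps = context_word_set(s, vocab, adj) if s in vocab else set()
    pw = context_word_set(w, vocab, adj) if w in vocab else set()
    if not ps or not pw:
        return 1.0 if s == w else 0.0       # assumed fallback for isolated words
    return 2 * len(ps & pw) / (len(ps) + len(pw))

def offset_similarity(seg1: list, seg2: list, vocab, adj) -> float:
    """Offset similarity of two segmented texts (assumed combination rule)."""
    if not seg1 or not seg2:
        return 0.0
    total = 0.0
    for i, s in enumerate(seg1, start=1):
        pos_s = i / len(seg1)                               # relative position in text 1
        for j, w in enumerate(seg2, start=1):
            pos_w = j / len(seg2)                           # relative position in text 2
            total += segment_similarity(s, w, vocab, adj) * (1 - abs(pos_s - pos_w))
    return total / (len(seg1) * len(seg2))
```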
Based on the above embodiment, as an alternative embodiment, obtaining, according to the context graph, the context word sets of all the word segments of the first text that are not present in the second text and the context word sets of all the word segments of the second text that are not present in the first text, so as to calculate the semantic layer similarity of the first text and the second text, is specifically as follows:
obtaining in the first text p1 the word segments that are not present in the second text p2 to form a first segment set, and obtaining on the context graph the context words of all the segments in the first segment set to form a first context word set π(P1); obtaining in the second text p2 the word segments that are not present in the first text p1 to form a second segment set, and obtaining on the context graph the context words of all the segments in the second segment set to form a second context word set π(P2).
As an illustration, let the first text be "a method for raising tomato seedlings in a greenhouse" and the second text be "a method for raising tomato seedlings in the United States". The segmentation result of the first text is: tomato, greenhouse, seedling raising, method; the segmentation result of the second text is: United States, tomato, seedling raising, method. The word segment of the first text that is not present in the second text is "greenhouse", so the context word set of "greenhouse" is obtained from the context graph. Similarly, the word segment of the second text that is not present in the first text is "United States", and the context word set of "United States" is obtained from the context graph.
The semantic layer similarity SimL(p1, p2) of the first text and the second text is then calculated according to the formula SimL(p1, p2) = α × T(π(P1) ∩ π(P2)) / T(π(P1) ∪ π(P2));
where α = 1 when no antonyms are present in p1 and p2, and α = -1 when antonyms are present in p1 and p2; T(π(P1) ∩ π(P2)) denotes the context words shared by π(P1) and π(P2), and T(π(P1) ∪ π(P2)) denotes all the context words in π(P1) and π(P2).
It should be noted that, when the semantic layer similarity is calculated with the above formula, the first text and the second text also need to be checked in advance for antonyms; when antonyms are contained, the semantics of the two texts are more likely to be opposite. Based on the proportion of the context words shared by π(P1) and π(P2) among all the context words of π(P1) and π(P2), and on whether antonyms are contained, the embodiment of the present invention realises the calculation of the semantic layer similarity. With the context graph taken into account, the method provided by the embodiment of the present invention analyses the semantic layer similarity of the words that the two sentences do not share with higher precision.
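A sketch of the semantic layer similarity follows: the signed ratio of shared to total context words of the non-overlapping segments, as described above. The antonym lexicon and the handling of empty context word sets are assumptions.

```python
# A sketch of the semantic-layer similarity Sim_L: alpha times the ratio
# of shared to total context words of the segments the two texts do not
# share.  ANTONYMS is a hypothetical lexicon; empty-set handling is an
# assumption.
ANTONYMS = {("open", "closed"), ("rise", "fall")}   # hypothetical antonym pairs

def has_antonym_pair(seg1, seg2) -> bool:
    return any((a, b) in ANTONYMS or (b, a) in ANTONYMS for a in seg1 for b in seg2)

def semantic_layer_similarity(seg1, seg2, vocab, adj) -> float:
    only1 = [s for s in seg1 if s not in seg2]        # first segment set
    only2 = [w for w in seg2 if w not in seg1]        # second segment set
    pi1 = neighbour_union(only1, vocab, adj)          # first context word set  π(P1)
    pi2 = neighbour_union(only2, vocab, adj)          # second context word set π(P2)
    union = pi1 | pi2
    if not union:
        return 0.0                                    # assumed handling of empty sets
    alpha = -1.0 if has_antonym_pair(seg1, seg2) else 1.0
    return alpha * len(pi1 & pi2) / len(union)
```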
Based on the above embodiment, as an alternative embodiment, calculating the semantic similarity of the first text and the second text from the cosine similarity, the offset similarity and the semantic layer similarity of the first text and the second text is specifically as follows:
obtaining the expression layer similarity Simb(p1, p2) of the first text p1 and the second text p2 according to the formula Simb(p1, p2) = Cosin(p1, p2) + α1 × Simp(p1, p2);
obtaining the semantic similarity m(p1, p2) of the first text p1 and the second text p2 according to the formula m(p1, p2) = Simb(p1, p2) + β1 × SimL(p1, p2);
where Cosin(p1, p2), Simp(p1, p2) and SimL(p1, p2) respectively denote the cosine similarity, the offset similarity and the semantic layer similarity of the first text p1 and the second text p2, α1 denotes the impact factor of the offset similarity on the expression layer similarity, and β1 denotes the impact factor of the semantic layer similarity on the semantic similarity.
It should be noted that the embodiment of the present invention combines the cosine similarity and the offset similarity into the expression layer similarity, and then combines the expression layer similarity and the semantic layer similarity to obtain the semantic similarity. The embodiment of the present invention fully considers the macroscopic context and mines the semantics at a deeper level.
Based on the above embodiment, as an alternative embodiment, practical analysis shows that the value of α1 should ensure that its product with the offset similarity is smaller than the cosine similarity value, and that this product increases from 0 as the cosine similarity value grows and then, once the cosine similarity reaches a certain value, decreases as the cosine similarity value continues to grow. Therefore, the impact factor α1 is obtained according to the formula α1 = (1 - Cosin(p1, p2)) × Cosin(p1, p2).
Practical analysis likewise shows that the value of β1 should ensure that its product with the semantic layer similarity is smaller than the expression layer similarity value, and that this product increases from 0 as the expression layer similarity value grows and then, once a certain critical value is reached, decreases as the expression layer similarity value continues to grow. Therefore, the impact factor β1 is obtained according to the formula β1 = (1 - Simb(p1, p2)) × Simb(p1, p2).
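The fusion step follows the formulas given above directly; the sketch below reuses the helpers from the earlier sketches and computes the graph-based TF-IDF weights over T = A ∪ B as described.

```python
# A sketch of the fusion step (S304): expression layer similarity, impact
# factors and the final semantic similarity, following the formulas above
# and reusing the helpers from the earlier sketches.
def semantic_similarity(seg1: list, seg2: list, vocab, adj) -> float:
    T = neighbour_union(seg1, vocab, adj) | neighbour_union(seg2, vocab, adj)  # T = A ∪ B
    w1 = {s: tfidf_weight(s, T, vocab, adj) for s in set(seg1)}
    w2 = {w: tfidf_weight(w, T, vocab, adj) for w in set(seg2)}
    cos = cosine_similarity(w1, w2)                            # Cosin(p1, p2)
    sim_p = offset_similarity(seg1, seg2, vocab, adj)          # Sim_p(p1, p2)
    sim_l = semantic_layer_similarity(seg1, seg2, vocab, adj)  # Sim_L(p1, p2)
    alpha1 = (1 - cos) * cos                                   # impact factor α1
    sim_b = cos + alpha1 * sim_p                               # expression layer Sim_b
    beta1 = (1 - sim_b) * sim_b                                # impact factor β1
    return sim_b + beta1 * sim_l                               # semantic similarity m(p1, p2)
```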
According to another aspect of the present invention, an embodiment of the present invention also provides an intelligent question answering device. Referring to Fig. 5, Fig. 5 shows a functional block diagram of the intelligent question answering device of an embodiment of the present invention. The device is used in the foregoing embodiments to match an answer according to the semantic similarity between the question to be answered and the FAQs; therefore, the descriptions and definitions in the intelligent question answering method of the foregoing embodiments can be used for understanding each execution module in the embodiment of the present invention.
As shown, the intelligent question answering device comprises:
a context acquisition module 501, configured to perform word segmentation on the text of the question to be answered and to determine, from the segmentation result of the question to be answered, the context used for semantic similarity judgement;
an FAQ acquisition module 502, configured to collect a certain number of FAQs according to the context;
a context graph acquisition module 503, configured to perform word segmentation on the texts of all the FAQs and to build a context graph from the segmentation results of all the FAQs;
a similarity calculation module 504, configured to calculate, for any one of the FAQs, the similarity between that FAQ and the question to be answered from the segmentation result of the FAQ, the segmentation result of the question to be answered and the context graph;
an answer matching module 505, configured to take the candidate answer corresponding to the FAQ with the highest similarity as the answer to the question to be answered.
In the intelligent question answering device of the embodiment of the present invention, the context acquisition module determines, from the question to be answered, the context used for semantic similarity judgement; the FAQ acquisition module then collects a certain number of FAQs according to that context, so that the question to be answered and the FAQs are mapped into the same context for analysis, which raises the precision of the difference analysis between questions and makes the semantic similarity calculation more accurate; the context graph acquisition module builds a context graph from the segmentation results of the collected FAQs; and the similarity calculation module calculates the similarity between the question to be answered and each FAQ according to the context graph. Because the context graph in the embodiments of the present invention is built from a certain number of FAQs, it embodies the characteristics of big data and is entirely different from existing contexts built only from the question to be compared and a single FAQ; the context in the embodiments of the present invention is a macroscopic context. Finally, the answer matching module takes the candidate answer corresponding to the FAQ with the highest similarity as the answer to the question to be answered. The embodiment of the present invention can analyse the question to be answered more accurately and provide an answer.
An embodiment of the invention provides an electronic device. Referring to Fig. 6, the electronic device comprises a processor 601, a memory 602 and a bus 603, wherein the processor 601 and the memory 602 communicate with each other through the bus 603. The processor 601 is configured to call program instructions in the memory 602 to execute the semantic similarity calculation method provided by the above embodiments, for example: performing word segmentation on the text of the question to be answered, and determining, from the segmentation result of the question to be answered, the context used for semantic similarity judgement; collecting a certain number of FAQs according to the context; performing word segmentation on the texts of all the FAQs, and building a context graph from the segmentation results of all the FAQs; for any one of the FAQs, calculating the similarity between that FAQ and the question to be answered from the segmentation result of the FAQ, the segmentation result of the question to be answered and the context graph; and taking the candidate answer corresponding to the FAQ with the highest similarity as the answer to the question to be answered; wherein the context graph is an undirected graph that represents the combination relationships between the word segments of all the FAQs.
An embodiment of the present invention provides a non-transitory computer-readable storage medium storing computer instructions that cause a computer to execute the semantic similarity calculation method provided by the above embodiments, for example: performing word segmentation on the text of the question to be answered, and determining, from the segmentation result of the question to be answered, the context used for semantic similarity judgement; collecting a certain number of FAQs according to the context; performing word segmentation on the texts of all the FAQs, and building a context graph from the segmentation results of all the FAQs; for any one of the FAQs, calculating the similarity between that FAQ and the question to be answered from the segmentation result of the FAQ, the segmentation result of the question to be answered and the context graph; and taking the candidate answer corresponding to the FAQ with the highest similarity as the answer to the question to be answered; wherein the context graph is an undirected graph that represents the combination relationships between the word segments of all the FAQs.
The apparatus embodiments described above are merely exemplary. The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement it without creative labour.
Through the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be realised by means of software plus a necessary general hardware platform, and naturally also by hardware. Based on this understanding, the above technical solution, or in other words the part of it that contributes over the prior art, can be embodied in the form of a software product. The software product may be stored in a computer-readable storage medium such as ROM/RAM, a magnetic disk or an optical disc, and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods of the embodiments or of certain parts of the embodiments.
Finally, it should be noted that the above embodiments are merely illustrative of the technical solutions of the present invention and do not limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (10)
1. An intelligent question answering method, characterized by comprising:
performing word segmentation on the text of a question to be answered, and determining, from the segmentation result of the question to be answered, the context used for semantic similarity judgement;
collecting a certain number of FAQs according to the context;
performing word segmentation on the texts of all the FAQs, and building a context graph from the segmentation results of all the FAQs;
for any one of the FAQs, calculating the semantic similarity between that FAQ and the question to be answered from the segmentation result of the FAQ, the segmentation result of the question to be answered and the context graph;
taking the candidate answer corresponding to the FAQ with the highest similarity as the answer to the question to be answered;
wherein the context graph is an undirected graph that represents the combination relationships between the word segments of all the FAQs.
2. The intelligent question answering method according to claim 1, characterized in that calculating the semantic similarity between the FAQ and the question to be answered from the segmentation result of the FAQ, the segmentation result of the question to be answered and the context graph comprises:
calculating the cosine similarity of a first text and a second text according to the context graph, wherein the first text is the text of the question to be answered and the second text is the text of the FAQ;
obtaining, according to the context graph, the similarity between any word segment of the first text and any word segment of the second text, so as to calculate the offset similarity of the first text and the second text;
obtaining, according to the context graph, the context word sets of all the word segments of the first text that are not present in the second text and the context word sets of all the word segments of the second text that are not present in the first text, so as to calculate the semantic layer similarity of the first text and the second text;
calculating the semantic similarity of the first text and the second text from the cosine similarity, the offset similarity and the semantic layer similarity of the first text and the second text.
3. The intelligent question answering method according to claim 2, characterized in that obtaining, according to the context graph, the similarity between any word segment of the first text and any word segment of the second text so as to calculate the offset similarity of the first text and the second text comprises:
from the segmentation result of the first text p1, obtaining the total number m of word segments in the first text, the length len(P1) of the first text, and the relative position pos(Si) of word segment Si in the first text;
from the segmentation result of the second text p2, obtaining the total number n of word segments in the second text, the length len(P2) of the second text, and the relative position pos(Wj) of word segment Wj in the second text;
calculating the similarity sim(Si, Wj) of word segments Si and Wj according to the context graph;
calculating the offset similarity Simp(p1, p2) of the first text p1 and the second text p2 from sim(Si, Wj), pos(Si) and pos(Wj) according to the offset similarity formula.
4. The intelligent question answering method according to claim 3, characterized in that calculating the similarity sim(Si, Wj) of word segments Si and Wj according to the context graph comprises:
obtaining on the context graph the adjacent points π(Si) of Si and their degree len(π(Si));
obtaining on the context graph the adjacent points π(Wj) of Wj and their degree len(π(Wj));
calculating the similarity sim(Si, Wj) from π(Si), π(Wj), len(π(Si)), len(π(Wj)) and T(π(Si) ∩ π(Wj)) according to the similarity formula;
wherein T(π(Si) ∩ π(Wj)) denotes the adjacent points shared by word segments Si and Wj.
5. The intelligent question answering method according to claim 2, characterized in that obtaining, according to the context graph, the context word sets of all the word segments of the first text that are not present in the second text and the context word sets of all the word segments of the second text that are not present in the first text so as to calculate the semantic layer similarity of the first text and the second text comprises:
obtaining in the first text p1 the word segments that are not present in the second text p2 to form a first segment set, and obtaining on the context graph the context words of all the segments in the first segment set to form a first context word set π(P1);
obtaining in the second text p2 the word segments that are not present in the first text p1 to form a second segment set, and obtaining on the context graph the context words of all the segments in the second segment set to form a second context word set π(P2);
calculating the semantic layer similarity SimL(p1, p2) of the first text and the second text according to the formula SimL(p1, p2) = α × T(π(P1) ∩ π(P2)) / T(π(P1) ∪ π(P2));
wherein α = 1 when no antonyms are present in p1 and p2, and α = -1 when antonyms are present in p1 and p2; T(π(P1) ∩ π(P2)) denotes the context words shared by π(P1) and π(P2), and T(π(P1) ∪ π(P2)) denotes all the context words in π(P1) and π(P2).
6. The intelligent question answering method according to claim 2, characterized in that calculating the semantic similarity of the first text and the second text from the cosine similarity, the offset similarity and the semantic layer similarity of the first text and the second text comprises:
obtaining the expression layer similarity Simb(p1, p2) of the first text p1 and the second text p2 according to the formula Simb(p1, p2) = Cosin(p1, p2) + α1 × Simp(p1, p2);
obtaining the semantic similarity m(p1, p2) of the first text p1 and the second text p2 according to the formula m(p1, p2) = Simb(p1, p2) + β1 × SimL(p1, p2);
wherein Cosin(p1, p2), Simp(p1, p2) and SimL(p1, p2) respectively denote the cosine similarity, the offset similarity and the semantic layer similarity of the first text p1 and the second text p2, α1 denotes the impact factor of the offset similarity on the expression layer similarity, and β1 denotes the impact factor of the semantic layer similarity on the semantic similarity.
7. The intelligent question answering method according to claim 6, characterized in that:
the impact factor α1 is obtained according to the formula α1 = (1 - Cosin(p1, p2)) × Cosin(p1, p2);
the impact factor β1 is obtained according to the formula β1 = (1 - Simb(p1, p2)) × Simb(p1, p2).
8. An intelligent question answering device, characterized by comprising:
a context acquisition module, configured to perform word segmentation on the text of a question to be answered and to determine, from the segmentation result of the question to be answered, the context used for semantic similarity judgement;
an FAQ acquisition module, configured to collect a certain number of FAQs according to the context;
a context graph acquisition module, configured to perform word segmentation on the texts of all the FAQs and to build a context graph from the segmentation results of all the FAQs, wherein the context graph is an undirected graph that represents the combination relationships between the word segments of all the FAQs;
a similarity calculation module, configured to calculate, for any one of the FAQs, the similarity between that FAQ and the question to be answered from the segmentation result of the FAQ, the segmentation result of the question to be answered and the context graph;
an answer matching module, configured to take the candidate answer corresponding to the FAQ with the highest similarity as the answer to the question to be answered.
9. An electronic device, characterized by comprising:
at least one processor; and
at least one memory communicatively connected to the processor, wherein:
the memory stores program instructions executable by the processor, and the processor, by calling the program instructions, is able to execute the method according to any one of claims 1 to 7.
10. A non-transitory computer-readable storage medium, characterized in that the non-transitory computer-readable storage medium stores computer instructions that cause the computer to execute the method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810790249.5A CN109033318B (en) | 2018-07-18 | 2018-07-18 | Intelligent question and answer method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810790249.5A CN109033318B (en) | 2018-07-18 | 2018-07-18 | Intelligent question and answer method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109033318A true CN109033318A (en) | 2018-12-18 |
CN109033318B CN109033318B (en) | 2020-11-27 |
Family
ID=64643328
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810790249.5A Active CN109033318B (en) | 2018-07-18 | 2018-07-18 | Intelligent question and answer method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109033318B (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109840277A (en) * | 2019-02-20 | 2019-06-04 | 西南科技大学 | A kind of government affairs Intelligent Service answering method and system |
CN109918494A (en) * | 2019-03-22 | 2019-06-21 | 深圳狗尾草智能科技有限公司 | Context relation based on figure replys generation method, computer and medium |
CN110069613A (en) * | 2019-04-28 | 2019-07-30 | 河北省讯飞人工智能研究院 | A kind of reply acquisition methods and device |
CN115098653A (en) * | 2022-06-06 | 2022-09-23 | 北京惠及智医科技有限公司 | Question answering method, question answering device, relevant equipment and storage medium |
CN117874202A (en) * | 2024-01-12 | 2024-04-12 | 深圳爱护者科技有限公司 | Intelligent question-answering method and system based on large model |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101566998A (en) * | 2009-05-26 | 2009-10-28 | 华中师范大学 | Chinese question-answering system based on neural network |
CN102346766A (en) * | 2011-09-20 | 2012-02-08 | 北京邮电大学 | Method and device for detecting network hot topics found based on maximal clique |
CN103425635A (en) * | 2012-05-15 | 2013-12-04 | 北京百度网讯科技有限公司 | Method and device for recommending answers |
US20140229161A1 (en) * | 2013-02-12 | 2014-08-14 | International Business Machines Corporation | Latent semantic analysis for application in a question answer system |
CN107766511A (en) * | 2017-10-23 | 2018-03-06 | 深圳市前海众兴电子商务有限公司 | Intelligent answer method, terminal and storage medium |
Non-Patent Citations (1)
Title |
---|
王宝勋: "面向网络社区问答对的语义挖掘研究", 《中国博士学位论文全文数据库信息科技辑》 * |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109840277A (en) * | 2019-02-20 | 2019-06-04 | 西南科技大学 | A kind of government affairs Intelligent Service answering method and system |
CN109918494A (en) * | 2019-03-22 | 2019-06-21 | 深圳狗尾草智能科技有限公司 | Context relation based on figure replys generation method, computer and medium |
WO2020191828A1 (en) * | 2019-03-22 | 2020-10-01 | 深圳狗尾草智能科技有限公司 | Graph-based context association reply generation method, computer and medium |
CN109918494B (en) * | 2019-03-22 | 2022-11-04 | 元来信息科技(湖州)有限公司 | Context association reply generation method based on graph, computer and medium |
CN110069613A (en) * | 2019-04-28 | 2019-07-30 | 河北省讯飞人工智能研究院 | A kind of reply acquisition methods and device |
CN115098653A (en) * | 2022-06-06 | 2022-09-23 | 北京惠及智医科技有限公司 | Question answering method, question answering device, relevant equipment and storage medium |
CN117874202A (en) * | 2024-01-12 | 2024-04-12 | 深圳爱护者科技有限公司 | Intelligent question-answering method and system based on large model |
CN117874202B (en) * | 2024-01-12 | 2024-08-30 | 深圳爱护者科技有限公司 | Intelligent question-answering method and system based on large model |
Also Published As
Publication number | Publication date |
---|---|
CN109033318B (en) | 2020-11-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10831769B2 (en) | Search method and device for asking type query based on deep question and answer | |
CN109033318A (en) | Intelligent answer method and device | |
Nie et al. | Data-driven answer selection in community QA systems | |
US11113323B2 (en) | Answer selection using a compare-aggregate model with language model and condensed similarity information from latent clustering | |
CN109145085A (en) | The calculation method and system of semantic similarity | |
CN108984530A (en) | A kind of detection method and detection system of network sensitive content | |
CN110532571A (en) | Text handling method and relevant apparatus | |
WO2015058604A1 (en) | Apparatus and method for obtaining degree of association of question and answer pair and for search ranking optimization | |
US10339214B2 (en) | Structured term recognition | |
CN109145292B (en) | Paraphrase text depth matching model construction method and paraphrase text depth matching method | |
CN112084307B (en) | Data processing method, device, server and computer readable storage medium | |
CN107273348A (en) | The topic and emotion associated detecting method and device of a kind of text | |
CN113868387A (en) | Word2vec medical similar problem retrieval method based on improved tf-idf weighting | |
Van Atteveldt et al. | Studying political decision making with automatic text analysis | |
Qi et al. | What is the limitation of multimodal llms? a deeper look into multimodal llms through prompt probing | |
CN110348539B (en) | Short text relevance judging method | |
CN115775349A (en) | False news detection method and device based on multi-mode fusion | |
Le et al. | CiteOpinion: evidence-based evaluation tool for academic contributions of research papers based on citing sentences | |
Nithya et al. | Meta-heuristic searched-ensemble learning for fake news detection with optimal weighted feature selection approach | |
Mansoorizadeh et al. | Persian Plagiarism Detection Using Sentence Correlations. | |
CN113658690A (en) | Intelligent medical guide method and device, storage medium and electronic equipment | |
CN117454217A (en) | Deep ensemble learning-based depression emotion recognition method, device and system | |
Zhang et al. | Business chatbots with deep learning technologies: State-of-the-art, taxonomies, and future research directions | |
Otani et al. | Large-scale acquisition of commonsense knowledge via a quiz game on a dialogue system | |
CN114580430B (en) | Method for extracting fish disease description emotion words based on neural network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |