CN109062977A - Automatic question-answering text matching method, automatic question-answering method and system based on semantic similarity - Google Patents
- Publication number
- CN109062977A (application number CN201810700950.3A)
- Authority
- CN
- China
- Prior art keywords
- word
- text
- automatic question
- weight
- answering
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
Abstract
The present invention provides an automatic question-answering text matching method, an automatic question-answering method and a system based on semantic similarity. The method comprises: segmenting the sentences in a text into words; removing the stop words from the text and retaining the non-stop words; assigning a weight to each word according to its part of speech, the words being graded by importance level so that a higher level receives a higher weight and a lower level a lower weight; representing each word in the text as a weighted word vector; and performing text similarity matching on the weighted word vectors. Compared with the prior art, the invention matches questions and answers more accurately, which makes it easier to identify the user's intent and to match the corresponding answer template.
Description
Technical field
The present invention relates to an automatic question-answering text matching method, an automatic question-answering method and a system based on semantic similarity, and belongs to the field of intelligent customer service.
Background art
In the prior art, dialogue systems generally fall into three categories: chat-type dialogue systems (Chitchat-bot), retrieval-type dialogue systems (IR-bot) and task-type dialogue systems (Task-bot). With the development of artificial intelligence, research on dialogue systems has made progress to varying degrees, and some systems have been applied successfully in various industries. However, automatic question-answering systems for consultation in specific industries are rare in practice, and most perform poorly: they often give irrelevant answers, fail to understand the user's intent and struggle to match questions and answers well, which lowers the system's precision and recall and harms the user experience.
In view of this, the inventors devised an automatic question-answering text matching method, an automatic question-answering method and a system based on semantic similarity, from which the present application arises.
Summary of the invention
The present invention provides an automatic question-answering text matching method based on semantic similarity, which matches questions and answers more accurately, making it easier to identify the user's intent and to match the corresponding answer template.
The present invention also provides an automatic question-answering method and system based on semantic similarity, which identify the user's intent more accurately and match the corresponding answer template.
According to the present invention, an automatic question-answering text matching method based on semantic similarity comprises:
Word segmentation: segmenting the sentences in the text into words;
Stop-word removal: removing the stop words from the text and retaining the non-stop words;
Weighting: assigning a weight to each word according to its part of speech, the words being graded by importance level; the higher the level, the higher the weight, and the lower the level, the lower the weight;
Weighted word vectors: representing each word in the text by the weighted word vector t = v*w;
Text similarity matching: if the weighted word vectors of the words in text a are a_1, a_2, ..., a_n and those of the words in text b are b_1, b_2, ..., b_m, then the similarity of texts a and b is:
$$\mathrm{Sim}(a,b) = 0.5\sum_{i=1}^{n}\max_{1\le j\le m}\left(a_i \cdot b_j\right) + 0.5\sum_{j=1}^{m}\max_{1\le i\le n}\left(b_j \cdot a_i\right)$$
where v is the word vector, w is the word-vector weight, n and m are the numbers of words in texts a and b respectively, and i and j are the indices of a word in texts a and b respectively.
The importance levels are, from high to low: core, secondary core, general and unimportant. Characters and/or words of the core level include the nouns in the backbone of the sentence; those of the secondary-core level include the verbs in the backbone of the sentence; those of the general level include pronouns, adjectives and adverbs; those of the unimportant level include auxiliary words, punctuation, unknown symbols and modal particles.
The method further comprises assigning words of other, unclassified parts of speech to the general level.
The weight of core-level characters and/or words is 3; the weight of secondary-core-level characters and/or words is 2; the weight of general-level characters and/or words is 1; the weight of unimportant-level characters and/or words is 0.
The method further comprises setting the weight of words of other, unclassified parts of speech to 1.
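The level and weight assignment described above amounts to a two-step lookup. The sketch below assumes hypothetical part-of-speech names and, for brevity, does not check whether a noun or verb belongs to the backbone of the sentence, which the text requires for the core and secondary-core levels.

```python
# Weights per importance level, as stated in the text.
LEVEL_WEIGHT = {"core": 3, "secondary_core": 2, "general": 1, "unimportant": 0}

# Hypothetical part-of-speech names mapped to importance levels.
POS_LEVEL = {
    "noun": "core",
    "verb": "secondary_core",
    "pronoun": "general",
    "adjective": "general",
    "adverb": "general",
    "auxiliary": "unimportant",
    "punctuation": "unimportant",
    "unknown_symbol": "unimportant",
    "modal_particle": "unimportant",
}


def word_weight(pos: str) -> int:
    """Return the weight for a part of speech; unclassified ones are general (1)."""
    level = POS_LEVEL.get(pos, "general")
    return LEVEL_WEIGHT[level]
```

A part of speech that is not listed, for example a numeral, falls back to the general level and therefore receives weight 1.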
According to the present invention, there is also provided an automatic question-answering text matching method based on semantic similarity in which the above text matching method is applied to automatic question-answer matching for cosmetic-surgery consultation.
According to the present invention, an automatic question-answering method based on semantic similarity comprises: training a matching model on the basis of the above automatic question-answering text matching method; and, on the basis of the trained model, compiling question-answer pairs, recognizing the user's intent and matching a template to give the corresponding answer.
According to the present invention, an automatic question-answering system based on semantic similarity comprises a client and a server, wherein the server trains a matching model on the basis of the above automatic question-answering text matching method and, on the basis of the trained model, compiles question-answer pairs, recognizes the user's intent and matches a template to give the corresponding answer.
Compared with the prior art, the present invention matches questions and answers more accurately and identifies the user's intent more accurately, so that the corresponding answer template is matched and irrelevant answers are avoided. This makes the dialogue system smoother and more intelligent, meets the needs of artificial intelligence and improves the user experience.
Brief description of the drawings
Fig. 1 is a schematic diagram of the automatic question-answering method according to one embodiment of the present invention.
Detailed description of embodiments
To make the objectives, technical solutions and advantages of the present invention clearer, the invention is described in further detail below with reference to embodiments and the accompanying drawing. It should be understood that the specific embodiments described here serve only to explain the invention and are not intended to limit it.
Unless specifically stated otherwise, any feature disclosed in this specification (including the abstract and the drawing) may be replaced by another equivalent feature or an alternative feature serving a similar purpose. That is, unless specifically stated otherwise, each feature is only one example of a series of equivalent or similar features.
An automatic question-answering text matching method based on semantic similarity comprises:
Word segmentation: segmenting the sentences in the text into words;
Stop-word removal: removing the stop words from the text and retaining the non-stop words;
Weighting: assigning a weight to each word according to its part of speech, the words being graded by importance level; the higher the level, the higher the weight, and the lower the level, the lower the weight;
Weighted word vectors: representing each word in the text by the weighted word vector t = v*w;
Text similarity matching: if the weighted word vectors of the words in text a are a_1, a_2, ..., a_n and those of the words in text b are b_1, b_2, ..., b_m, then the similarity of texts a and b is:
$$\mathrm{Sim}(a,b) = 0.5\sum_{i=1}^{n}\max_{1\le j\le m}\left(a_i \cdot b_j\right) + 0.5\sum_{j=1}^{m}\max_{1\le i\le n}\left(b_j \cdot a_i\right)$$
where v is the word vector, w is the word-vector weight, n and m are the numbers of words in texts a and b respectively, and i and j are the indices of a word in texts a and b respectively.
Stop words are characters or words that, in information retrieval, are automatically filtered out before or after natural-language data (or text) is processed, in order to save storage space and improve search efficiency. These characters or words are called stop words.
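Stop-word removal as defined above is a simple filter over the segmented tokens. The sketch below is illustrative only: the stop-word set and the English tokens are made-up stand-ins, and a real system would load a stop-word list suited to its language and domain.

```python
from typing import Iterable, List, Set

# Illustrative stop-word set; an assumption, not a list taken from the patent.
STOP_WORDS: Set[str] = {"at", "where", "is", "the"}


def remove_stop_words(tokens: Iterable[str], stop_words: Set[str] = STOP_WORDS) -> List[str]:
    """Keep only the tokens that are not stop words, preserving their order."""
    return [token for token in tokens if token not in stop_words]
```

For example, `remove_stop_words(["you", "hospital", "address", "at", "where"])` keeps only `"you"`, `"hospital"` and `"address"`.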
In one specific embodiment of the invention, the importance levels are, from high to low: core, secondary core, general and unimportant. Characters and/or words of the core level include the nouns in the backbone of the sentence; those of the secondary-core level include the verbs in the backbone of the sentence; those of the general level include pronouns, adjectives and adverbs; those of the unimportant level include auxiliary words, punctuation, unknown symbols and modal particles.
In one specific embodiment of the invention, the method further comprises assigning words of other, unclassified parts of speech to the general level.
The weight of core-level characters and/or words is 3; the weight of secondary-core-level characters and/or words is 2; the weight of general-level characters and/or words is 1; the weight of unimportant-level characters and/or words is 0.
The method further comprises setting the weight of words of other, unclassified parts of speech to 1.
An automatic question-answering text matching method based on semantic similarity, in which the above text matching method is applied to automatic question-answer matching for cosmetic-surgery consultation.
As a specific embodiment of the present invention, suppose text a is "Where is your hospital's address?" and text b is "May I ask, what is the hospital's address?"
Result after word segmentation:
a: you / hospital / address / are at which
b: may I ask / hospital / address / is
Result after stop-word removal:
a: you / hospital / address
b: may I ask / hospital / address
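A weight is assigned to each word according to its part of speech: "you" (pronoun) receives weight 1, "may I ask" (verb) receives weight 2, and "hospital" and "address" (nouns) each receive weight 3.
Assuming that the word vector of each word has already been trained, the weighted word vectors of the words in text a are v_you, 3v_hospital and 3v_address, and the weighted word vectors of the words in text b are 2v_ask, 3v_hospital and 3v_address, where v_ask denotes the word vector of "may I ask".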
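The similarity of texts a and b is:
$$
\begin{aligned}
\mathrm{Sim}(a,b) ={}& 0.5\,\bigl(\max\{v_{\text{you}}\cdot 2v_{\text{ask}},\; v_{\text{you}}\cdot 3v_{\text{hospital}},\; v_{\text{you}}\cdot 3v_{\text{address}}\} \\
&\quad + \max\{3v_{\text{hospital}}\cdot 2v_{\text{ask}},\; 3v_{\text{hospital}}\cdot 3v_{\text{hospital}},\; 3v_{\text{hospital}}\cdot 3v_{\text{address}}\} \\
&\quad + \max\{3v_{\text{address}}\cdot 2v_{\text{ask}},\; 3v_{\text{address}}\cdot 3v_{\text{hospital}},\; 3v_{\text{address}}\cdot 3v_{\text{address}}\}\bigr) \\
&+ 0.5\,\bigl(\max\{2v_{\text{ask}}\cdot v_{\text{you}},\; 2v_{\text{ask}}\cdot 3v_{\text{hospital}},\; 2v_{\text{ask}}\cdot 3v_{\text{address}}\} \\
&\quad + \max\{3v_{\text{hospital}}\cdot v_{\text{you}},\; 3v_{\text{hospital}}\cdot 3v_{\text{hospital}},\; 3v_{\text{hospital}}\cdot 3v_{\text{address}}\} \\
&\quad + \max\{3v_{\text{address}}\cdot v_{\text{you}},\; 3v_{\text{address}}\cdot 3v_{\text{hospital}},\; 3v_{\text{address}}\cdot 3v_{\text{address}}\}\bigr)
\end{aligned}
$$
Here the subscripts you, ask, hospital and address denote the words "you", "may I ask", "hospital" and "address".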
In this embodiment, the part-of-speech tag n denotes a noun, r denotes a pronoun and v denotes a verb.
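To make the expansion above concrete, the sketch below evaluates it numerically. The weights 1, 2, 3 and 3 come from the embodiment; the three-dimensional word vectors are invented toy values (trained vectors are not given in the text), and the product of two weighted vectors is assumed to be the inner product.

```python
# Toy word vectors: invented for illustration, not trained values.
vec = {
    "you": (1, 0, 0),
    "ask": (1, 1, 0),       # stands for "may I ask"
    "hospital": (0, 1, 0),
    "address": (0, 0, 1),
}
# Weights from the embodiment: pronoun 1, verb 2, nouns 3.
weight = {"you": 1, "ask": 2, "hospital": 3, "address": 3}

text_a = ["you", "hospital", "address"]
text_b = ["ask", "hospital", "address"]


def weighted(word):
    """Weighted word vector t = v * w."""
    return tuple(weight[word] * x for x in vec[word])


def dot(u, v):
    """Inner product (assumed meaning of the product between vectors)."""
    return sum(x * y for x, y in zip(u, v))


# pairwise[i][j] is the product of word i of text a with word j of text b.
pairwise = [[dot(weighted(wa), weighted(wb)) for wb in text_b] for wa in text_a]
row_max = [max(row) for row in pairwise]        # best match in b for each word of a
col_max = [max(col) for col in zip(*pairwise)]  # best match in a for each word of b
sim = 0.5 * sum(row_max) + 0.5 * sum(col_max)
print(row_max, col_max, sim)
```

The three row maxima correspond to the first bracket of the expansion and the three column maxima to the second.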
The present invention also provides an automatic question-answering method based on semantic similarity, comprising: training a matching model on the basis of the above automatic question-answering text matching method; and, on the basis of the trained model, compiling question-answer pairs, recognizing the user's intent and matching a template to give the corresponding answer.
Questions can be divided into two classes: common questions and professional questions. Common questions, such as asking for the hospital's address or working hours, can be handled directly by FAQ matching, with a uniform answer given. Professional questions, such as "How much does double-eyelid surgery cost?", "How long does the operation take?" or "Is the result permanent?", can only be answered after information such as the cosmetic-surgery project and the surgical method has been clarified. These items of information to be specified are called slots; that is, the slots must be filled before the relevant question is answered, which is called slot filling.
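The slot mechanism described above can be pictured as a small state check before answering. In this sketch the slot names and the returned action strings are hypothetical; the text only states that information such as the project and the method must be known before a professional question is answered.

```python
from typing import Dict, List, Optional

# Hypothetical slot names for a professional question.
REQUIRED_SLOTS = ["project", "method"]


def missing_slots(slots: Dict[str, Optional[str]]) -> List[str]:
    """Return the required slots that are still empty, in a fixed order."""
    return [name for name in REQUIRED_SLOTS if not slots.get(name)]


def next_action(slots: Dict[str, Optional[str]]) -> str:
    """Ask for the first missing slot, or answer once every slot is filled."""
    missing = missing_slots(slots)
    if missing:
        return "ask:" + missing[0]
    return "answer"
```

With only the project known, `next_action({"project": "double eyelid"})` asks for the method; once both slots are filled it returns `"answer"`.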
As shown in Fig. 1, whether a common question or a professional question is being answered, relevant real corpus data is needed as support; answers cannot be made up without a basis. A high-quality corpus of question-answer pairs is therefore particularly important. First, the questions are sorted by category, and then the question-answer pairs for each category are compiled. The answer to a common question may be unique, but to make the answers richer, the same answer can be expressed in several forms, one of which is then given at random. A professional question requires an answer that depends on how the current slots are filled, so question-answer pair matching and template matching must be combined.
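The two answering strategies above can be sketched with the standard library alone. The function names and the `$slot` template syntax are assumptions made for illustration; the text does not specify how templates are stored.

```python
import random
from string import Template
from typing import Mapping, Sequence


def pick_variant(variants: Sequence[str], rng=random) -> str:
    """Common question: return one of several phrasings of the same answer at random."""
    return rng.choice(variants)


def fill_template(template: str, slots: Mapping[str, str]) -> str:
    """Professional question: fill an answer template from the current slots.

    Raises KeyError if a slot used by the template has not been filled.
    """
    return Template(template).substitute(slots)
```

Because `substitute` raises `KeyError` for an unfilled slot, a template cannot silently produce an incomplete answer; the caller is expected to fill the slot first.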
Question categorization is in fact a process of intent recognition: intent recognition is performed on the user's question, and the category of the question is determined from the recognition result.
Slot filling is in fact a process of entity recognition: entity recognition is performed on the user's question, and the corresponding slots are filled according to the recognition result. (The entities recognized are the body part, the project and the method of the cosmetic surgery, using a CRF + bidirectional LSTM algorithm.)
Question-answer pair matching is involved in answering both common questions and professional questions, and is the key step in giving an accurate answer. The present invention uses the automatic question-answer pair matching method based on semantic similarity, which matches questions and answers more accurately, makes it easier to identify the user's intent and matches the corresponding answer template.
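Question-answer pair matching can be pictured as picking the stored question that scores highest against the user's question. The sketch below is an assumption-laden illustration: the threshold is not mentioned in the text, and `word_overlap` is a toy stand-in scorer used only so that the example runs; it is not the weighted word-vector similarity of the invention, which would be passed in as `score` instead.

```python
from typing import Callable, List, Optional, Tuple


def word_overlap(x: str, y: str) -> float:
    """Toy stand-in scorer: number of shared whitespace-separated words."""
    return float(len(set(x.split()) & set(y.split())))


def best_answer(
    query: str,
    qa_pairs: List[Tuple[str, str]],
    score: Callable[[str, str], float],
    threshold: float,
) -> Optional[str]:
    """Return the answer of the best-scoring stored question, or None if too weak."""
    best_score: Optional[float] = None
    best: Optional[str] = None
    for question, answer in qa_pairs:
        s = score(query, question)
        if best_score is None or s > best_score:
            best_score, best = s, answer
    if best_score is None or best_score < threshold:
        return None  # hypothetical fallback: no stored question is close enough
    return best
```

Returning `None` below the threshold lets the caller fall back to a clarifying question rather than give an irrelevant answer.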
The present invention also provides an automatic question-answering system based on semantic similarity, comprising a client and a server, wherein the server trains a matching model on the basis of the above automatic question-answering text matching method and, on the basis of the trained model, compiles question-answer pairs, recognizes the user's intent and matches a template to give the corresponding answer.
Claims (8)
1. An automatic question-answering text matching method based on semantic similarity, characterized in that the method comprises:
Word segmentation: segmenting the sentences in the text into words;
Stop-word removal: removing the stop words from the text and retaining the non-stop words;
Weighting: assigning a weight to each word according to its part of speech, the words being graded by importance level; the higher the level, the higher the weight, and the lower the level, the lower the weight;
Weighted word vectors: representing each word in the text by the weighted word vector t = v*w;
Text similarity matching: if the weighted word vectors of the words in text a are a_1, a_2, ..., a_n and those of the words in text b are b_1, b_2, ..., b_m, then the similarity of texts a and b is:
$$\mathrm{Sim}(a,b) = 0.5\sum_{i=1}^{n}\max_{1\le j\le m}\left(a_i \cdot b_j\right) + 0.5\sum_{j=1}^{m}\max_{1\le i\le n}\left(b_j \cdot a_i\right)$$
where v is the word vector, w is the word-vector weight, n and m are the numbers of words in texts a and b respectively, and i and j are the indices of a word in texts a and b respectively.
2. The automatic question-answering text matching method according to claim 1, characterized in that the importance levels are, from high to low: core, secondary core, general and unimportant; wherein characters and/or words of the core level include the nouns in the backbone of the sentence; those of the secondary-core level include the verbs in the backbone of the sentence; those of the general level include pronouns, adjectives and adverbs; and those of the unimportant level include auxiliary words, punctuation, unknown symbols and modal particles.
3. The automatic question-answering text matching method according to claim 1, characterized in that the method further comprises assigning words of other, unclassified parts of speech to the general level.
4. The automatic question-answering text matching method according to claim 2, characterized in that the weight of core-level characters and/or words is 3; the weight of secondary-core-level characters and/or words is 2; the weight of general-level characters and/or words is 1; and the weight of unimportant-level characters and/or words is 0.
5. The automatic question-answering text matching method according to claim 4, characterized in that the method further comprises setting the weight of words of other, unclassified parts of speech to 1.
6. An automatic question-answering text matching method based on semantic similarity, characterized in that the automatic question-answering text matching method according to any one of claims 1 to 5 is applied to automatic question-answer matching for cosmetic-surgery consultation.
7. An automatic question-answering method based on semantic similarity, characterized in that the method comprises: training a matching model on the basis of the automatic question-answering text matching method according to any one of claims 1 to 6; and, on the basis of the trained model, compiling question-answer pairs, recognizing the user's intent and matching a template to give the corresponding answer.
8. An automatic question-answering system based on semantic similarity, characterized by comprising a client and a server, wherein the server trains a matching model on the basis of the automatic question-answering text matching method according to any one of claims 1 to 6 and, on the basis of the trained model, compiles question-answer pairs, recognizes the user's intent and matches a template to give the corresponding answer.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810700950.3A CN109062977A (en) | 2018-06-29 | 2018-06-29 | A kind of automatic question answering text matching technique, automatic question-answering method and system based on semantic similarity |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810700950.3A CN109062977A (en) | 2018-06-29 | 2018-06-29 | A kind of automatic question answering text matching technique, automatic question-answering method and system based on semantic similarity |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109062977A true CN109062977A (en) | 2018-12-21 |
Family
ID=64818589
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810700950.3A Pending CN109062977A (en) | 2018-06-29 | 2018-06-29 | A kind of automatic question answering text matching technique, automatic question-answering method and system based on semantic similarity |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109062977A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110033022A (en) * | 2019-03-08 | 2019-07-19 | 腾讯科技(深圳)有限公司 | Processing method, device and the storage medium of text |
CN111429903A (en) * | 2020-03-19 | 2020-07-17 | 百度在线网络技术(北京)有限公司 | Audio signal identification method, device, system, equipment and readable medium |
WO2021008015A1 (en) * | 2019-07-18 | 2021-01-21 | 平安科技(深圳)有限公司 | Intention recognition method, device and computer readable storage medium |
CN112836032A (en) * | 2021-02-07 | 2021-05-25 | 浙江理工大学 | Automatic response method integrating double word segmentation and iterative feedback |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106484664A (en) * | 2016-10-21 | 2017-03-08 | 竹间智能科技(上海)有限公司 | Similarity calculating method between a kind of short text |
CN107273350A (en) * | 2017-05-16 | 2017-10-20 | 广东电网有限责任公司江门供电局 | A kind of information processing method and its device for realizing intelligent answer |
CN107729468A (en) * | 2017-10-12 | 2018-02-23 | 华中科技大学 | Answer extracting method and system based on deep learning |
CN108090077A (en) * | 2016-11-23 | 2018-05-29 | 中国科学院沈阳计算技术研究所有限公司 | A kind of comprehensive similarity computational methods based on natural language searching |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106484664A (en) * | 2016-10-21 | 2017-03-08 | 竹间智能科技(上海)有限公司 | Similarity calculating method between a kind of short text |
CN108090077A (en) * | 2016-11-23 | 2018-05-29 | 中国科学院沈阳计算技术研究所有限公司 | A kind of comprehensive similarity computational methods based on natural language searching |
CN107273350A (en) * | 2017-05-16 | 2017-10-20 | 广东电网有限责任公司江门供电局 | A kind of information processing method and its device for realizing intelligent answer |
CN107729468A (en) * | 2017-10-12 | 2018-02-23 | 华中科技大学 | Answer extracting method and system based on deep learning |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110033022A (en) * | 2019-03-08 | 2019-07-19 | 腾讯科技(深圳)有限公司 | Processing method, device and the storage medium of text |
WO2021008015A1 (en) * | 2019-07-18 | 2021-01-21 | 平安科技(深圳)有限公司 | Intention recognition method, device and computer readable storage medium |
CN111429903A (en) * | 2020-03-19 | 2020-07-17 | 百度在线网络技术(北京)有限公司 | Audio signal identification method, device, system, equipment and readable medium |
CN111429903B (en) * | 2020-03-19 | 2021-02-05 | 百度在线网络技术(北京)有限公司 | Audio signal identification method, device, system, equipment and readable medium |
CN112836032A (en) * | 2021-02-07 | 2021-05-25 | 浙江理工大学 | Automatic response method integrating double word segmentation and iterative feedback |
CN112836032B (en) * | 2021-02-07 | 2022-05-06 | 浙江理工大学 | Automatic response method integrating double word segmentation and iterative feedback |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20181221 |