CN109062977A - Automatic question-answering text matching method, automatic question-answering method and system based on semantic similarity - Google Patents

Info

Publication number
CN109062977A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810700950.3A
Other languages
Chinese (zh)
Inventor
康祖荫
肖龙源
蔡振华
李稀敏
刘晓葳
谭玉坤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen Kuaishangtong Technology Corp ltd
Original Assignee
Xiamen Kuaishangtong Technology Corp ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2018-06-29
Filing date
2018-06-29
Publication date
2018-12-21
Application filed by Xiamen Kuaishangtong Technology Corp ltd
Priority to CN201810700950.3A
Publication of CN109062977A

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/30 - Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The present invention provides an automatic question-answering text matching method, an automatic question-answering method and a system based on semantic similarity. The method comprises: segmenting the sentences in a text into words; removing the stop words from the text and retaining the non-stop words; assigning a weight to each word according to its part of speech, the words being graded by importance level so that a higher level receives a higher weight and a lower level a lower weight; representing each word in the text as a weighted word vector; and performing text similarity matching on the weighted word vectors. Compared with the prior art, the invention can match questions and answers more accurately, which makes it easier to identify the user's intent accurately and to match the corresponding answer template.

Description

Automatic question-answering text matching method, automatic question-answering method and system based on semantic similarity
Technical field
The present invention relates to an automatic question-answering text matching method, an automatic question-answering method and a system based on semantic similarity, and belongs to the field of intelligent customer service.
Background art
In the prior art, dialogue systems can generally be divided into three categories: chit-chat dialogue systems (Chitchat-bot), retrieval-based dialogue systems (IR-bot) and task-oriented dialogue systems (Task-bot). With the development of artificial intelligence, research on dialogue systems has achieved results of varying degrees, and some systems have been applied successfully in various industries. However, automatic question-answering systems for consultation in specific industries are still rare, and most perform poorly: they often give irrelevant answers, cannot determine the user's intent, and struggle to match questions with answers, which lowers the precision and recall of the system and harms the user experience.
In view of this, the inventors devised an automatic question-answering text matching method, an automatic question-answering method and a system based on semantic similarity, which gave rise to the present application.
Summary of the invention
The present invention provides an automatic question-answering text matching method based on semantic similarity, which can match questions and answers more accurately, making it easier to identify the user's intent accurately and to match the corresponding answer template.
The present invention also provides an automatic question-answering method and system based on semantic similarity, which can identify the user's intent more accurately and match the corresponding answer template.
The automatic question-answering text matching method based on semantic similarity provided according to the present invention comprises:
Performing word segmentation on the text, so that the sentences in the text are segmented into words;
Performing stop word removal on the text, so that the stop words in the text are removed and the non-stop words are retained;
Assigning a weight to each word according to its part of speech, the words being graded by importance level: the higher the level, the higher the weight, and the lower the level, the lower the weight;
Representing each word in the text as a weighted word vector: t = v*w;
Performing text similarity matching: if the weighted word vectors of the words in text a are a_1, a_2, ..., a_n and the weighted word vectors of the words in text b are b_1, b_2, ..., b_m, then the similarity of texts a and b is:
$$\mathrm{Sim}(a,b) = 0.5 \sum_{i=1}^{n} \max_{1 \le j \le m} (a_i * b_j) + 0.5 \sum_{j=1}^{m} \max_{1 \le i \le n} (b_j * a_i)$$
Here v is the word vector, w is the weight of the word vector, n and m are the numbers of words in texts a and b respectively, and i and j are the sequence indices of a word in texts a and b respectively.
The importance levels are, from high to low: core, sub-core, general and unimportant. Characters and/or words of the core level include the nouns in the sentence backbone; characters and/or words of the sub-core level include the verbs in the sentence backbone; characters and/or words of the general level include pronouns, adjectives and adverbs; characters and/or words of the unimportant level include auxiliary words, punctuation, unknown symbols and modal particles.
The method further comprises assigning words of any other, unclassified part of speech to the general level.
The weight of characters and/or words of the core level is 3; the weight of the sub-core level is 2; the weight of the general level is 1; and the weight of the unimportant level is 0.
The method further comprises setting the weight of words of any other, unclassified part of speech to 1.
According to the present invention, the above automatic question-answering text matching method is applied to automatic question-answer matching for cosmetic surgery consultation.
The automatic question-answering method based on semantic similarity provided according to the present invention comprises: training a matching model on the basis of the above automatic question-answering text matching method; and, based on the trained model, organizing question-answer pairs, recognizing the user's intent, and matching a template to give the corresponding answer.
The automatic question-answering system based on semantic similarity provided according to the present invention comprises a client and a server, wherein the server trains a matching model on the basis of the above automatic question-answering text matching method and, based on the trained model, organizes question-answer pairs, recognizes the user's intent, and matches a template to give the corresponding answer.
Compared with the prior art, the present invention can match questions and answers more accurately and identify the user's intent more accurately, so as to match the corresponding answer template and avoid irrelevant answers. This makes the dialogue system smoother and more intelligent, meets the needs of artificial intelligence applications, and improves the user experience.
Brief description of the drawings
Fig. 1 is a schematic diagram of the implementation of the automatic question-answering method according to one embodiment of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to the embodiments and the accompanying drawing. It should be understood that the specific embodiments described here only explain the present invention and do not limit it.
Any feature disclosed in this specification (including the abstract and the drawing) may, unless specifically stated otherwise, be replaced by another equivalent feature or an alternative feature serving a similar purpose. That is, unless specifically stated otherwise, each feature is only one example of a series of equivalent or similar features.
An automatic question-answering text matching method based on semantic similarity comprises:
Performing word segmentation on the text, so that the sentences in the text are segmented into words;
Performing stop word removal on the text, so that the stop words in the text are removed and the non-stop words are retained;
Assigning a weight to each word according to its part of speech, the words being graded by importance level: the higher the level, the higher the weight, and the lower the level, the lower the weight;
Representing each word in the text as a weighted word vector: t = v*w;
Performing text similarity matching: if the weighted word vectors of the words in text a are a_1, a_2, ..., a_n and the weighted word vectors of the words in text b are b_1, b_2, ..., b_m, then the similarity of texts a and b is:
$$\mathrm{Sim}(a,b) = 0.5 \sum_{i=1}^{n} \max_{1 \le j \le m} (a_i * b_j) + 0.5 \sum_{j=1}^{m} \max_{1 \le i \le n} (b_j * a_i)$$
Here v is the word vector, w is the weight of the word vector, n and m are the numbers of words in texts a and b respectively, and i and j are the sequence indices of a word in texts a and b respectively.
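The similarity computation can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the source writes the product of two weighted word vectors only as "*", and the sketch assumes a plain dot product for it; the function names `weighted`, `dot` and `similarity` are hypothetical, and the coefficient 0.5 follows the worked embodiment.

```python
from typing import List, Sequence

Vector = Sequence[float]


def weighted(v: Vector, w: float) -> List[float]:
    """t = v * w: scale a word vector by its part-of-speech weight."""
    return [w * x for x in v]


def dot(u: Vector, v: Vector) -> float:
    """Assumed meaning of '*' between two weighted word vectors: a dot product."""
    return sum(x * y for x, y in zip(u, v))


def similarity(a: Sequence[Vector], b: Sequence[Vector]) -> float:
    """0.5 * sum_i max_j (a_i * b_j) + 0.5 * sum_j max_i (b_j * a_i)."""
    if not a or not b:
        return 0.0  # assumption: an empty text matches nothing
    forward = sum(max(dot(ai, bj) for bj in b) for ai in a)
    backward = sum(max(dot(bj, ai) for ai in a) for bj in b)
    return 0.5 * forward + 0.5 * backward
```

Each word in one text is paired with its best-matching word in the other text, in both directions, so the score is symmetric in a and b.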
Stop words are characters or words that, in information retrieval, are automatically filtered out before or after natural language data (or text) is processed, in order to save storage space and improve search efficiency. These characters or words are called stop words.
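Stop word removal on an already segmented text can be sketched as a simple filter. The stop word set below is a hypothetical placeholder chosen to fit the example in this description; the source does not specify which stop word list is used.

```python
from typing import Iterable, List, Set

# Hypothetical stop word list; a real system would load a curated list.
STOP_WORDS: Set[str] = {"at", "where", "is"}


def remove_stop_words(tokens: Iterable[str], stop_words: Set[str] = STOP_WORDS) -> List[str]:
    """Keep only the tokens that are not stop words, preserving their order."""
    return [token for token in tokens if token not in stop_words]
```

For instance, `remove_stop_words(["you", "hospital", "address", "at", "where"])` keeps only `you`, `hospital` and `address`.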
In one embodiment of the present invention, the importance levels are, from high to low: core, sub-core, general and unimportant. Characters and/or words of the core level include the nouns in the sentence backbone; characters and/or words of the sub-core level include the verbs in the sentence backbone; characters and/or words of the general level include pronouns, adjectives and adverbs; characters and/or words of the unimportant level include auxiliary words, punctuation, unknown symbols and modal particles.
In one embodiment of the present invention, the method further comprises assigning words of any other, unclassified part of speech to the general level.
The weight of characters and/or words of the core level is 3; the weight of the sub-core level is 2; the weight of the general level is 1; and the weight of the unimportant level is 0.
The method further comprises setting the weight of words of any other, unclassified part of speech to 1.
In an automatic question-answering text matching method based on semantic similarity, the above automatic question-answering text matching method is applied to automatic question-answer matching for cosmetic surgery consultation.
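The level-to-weight rule above can be sketched as two lookup tables. The tags `n`, `v` and `r` are the ones used in this description; the remaining tags (`a`, `d`, `u`, `w`, `x`, `y`) are assumptions modeled on common Chinese part-of-speech tag sets, and all names are hypothetical. The sketch also ignores the source's restriction that core nouns and sub-core verbs are those in the sentence backbone, which would require syntactic analysis.

```python
from typing import Dict

LEVEL_WEIGHT: Dict[str, int] = {"core": 3, "sub-core": 2, "general": 1, "unimportant": 0}

POS_LEVEL: Dict[str, str] = {
    "n": "core",         # noun
    "v": "sub-core",     # verb
    "r": "general",      # pronoun
    "a": "general",      # adjective (assumed tag)
    "d": "general",      # adverb (assumed tag)
    "u": "unimportant",  # auxiliary word (assumed tag)
    "w": "unimportant",  # punctuation (assumed tag)
    "x": "unimportant",  # unknown symbol (assumed tag)
    "y": "unimportant",  # modal particle (assumed tag)
}


def weight_for(pos: str) -> int:
    """Return the weight for a part-of-speech tag; unclassified tags are treated as general."""
    return LEVEL_WEIGHT[POS_LEVEL.get(pos, "general")]
```

Falling back to the general level for unknown tags reproduces the rule that words of unclassified parts of speech receive weight 1.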
As a specific embodiment of the present invention, suppose text a is "Where is your hospital's address?" and text b is "May I ask, what is the hospital's address?"
Result after word segmentation:
a: you / hospital / address / at / where
b: may I ask / hospital / address / is
Result after stop word removal:
a: you / hospital / address
b: may I ask / hospital / address
Weight assigned to each word according to its part of speech:
Word        Part of speech   Weight
you         r                1
may I ask   v                2
hospital    n                3
address     n                3
In this embodiment, the part-of-speech tag n denotes a noun, r denotes a pronoun and v denotes a verb.
Assuming the word vector of each word has already been trained, the weighted word vectors of the words in text a are v_you, 3v_hospital, 3v_address, and the weighted word vectors of the words in text b are 2v_ask, 3v_hospital, 3v_address.
The similarity of texts a and b is then:
Sim(a, b) = 0.5 * (max{v_you * 2v_ask, v_you * 3v_hospital, v_you * 3v_address}
+ max{3v_hospital * 2v_ask, 3v_hospital * 3v_hospital, 3v_hospital * 3v_address}
+ max{3v_address * 2v_ask, 3v_address * 3v_hospital, 3v_address * 3v_address})
+ 0.5 * (max{2v_ask * v_you, 2v_ask * 3v_hospital, 2v_ask * 3v_address}
+ max{3v_hospital * v_you, 3v_hospital * 3v_hospital, 3v_hospital * 3v_address}
+ max{3v_address * v_you, 3v_address * 3v_hospital, 3v_address * 3v_address})
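To make the embodiment concrete, the sketch below evaluates the same expression numerically. The one-hot word vectors are purely hypothetical stand-ins for trained word vectors, and the product "*" between vectors is assumed to be a dot product; neither choice comes from the source.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical one-hot vectors; real vectors would come from a trained embedding model.
VECTORS: Dict[str, List[float]] = {
    "you": [1.0, 0.0, 0.0, 0.0],
    "may I ask": [0.0, 1.0, 0.0, 0.0],
    "hospital": [0.0, 0.0, 1.0, 0.0],
    "address": [0.0, 0.0, 0.0, 1.0],
}
POS_WEIGHT: Dict[str, int] = {"n": 3, "v": 2, "r": 1}  # noun, verb, pronoun


def weighted_vectors(tagged: Sequence[Tuple[str, str]]) -> List[List[float]]:
    """Apply t = v * w to every (word, part-of-speech) pair."""
    return [[POS_WEIGHT[pos] * x for x in VECTORS[word]] for word, pos in tagged]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(u, v))


def similarity(a: Sequence[Sequence[float]], b: Sequence[Sequence[float]]) -> float:
    forward = sum(max(dot(ai, bj) for bj in b) for ai in a)
    backward = sum(max(dot(bj, ai) for ai in a) for bj in b)
    return 0.5 * forward + 0.5 * backward


text_a = [("you", "r"), ("hospital", "n"), ("address", "n")]
text_b = [("may I ask", "v"), ("hospital", "n"), ("address", "n")]
score = similarity(weighted_vectors(text_a), weighted_vectors(text_b))
print(score)  # 18.0
```

With one-hot vectors only identical words contribute, so the two shared nouns each add 3 * 3 = 9 in both directions, while "you" and "may I ask" add nothing. Trained vectors would also give partial credit to related but different words.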
The present invention also provides an automatic question-answering method based on semantic similarity, comprising: training a matching model on the basis of the above automatic question-answering text matching method; and, based on the trained model, organizing question-answer pairs, recognizing the user's intent, and matching a template to give the corresponding answer.
Questions can be divided into two categories: common questions and professional questions. Common questions, such as asking for the hospital's address or opening hours, can be handled directly by FAQ matching with a unified answer. Professional questions, such as "How much does double-eyelid surgery cost?", "How long does the operation take?" or "Is it permanent?", can only be answered after information such as the cosmetic procedure and the surgical method has been clarified. Each piece of information to be specified is called a slot; that is, the slots must be filled before a related question can be answered, which is known as slot filling.
As shown in Fig. 1, answering either common or professional questions requires relevant real corpus data as support; answers cannot be fabricated without basis. A high-quality question-answer corpus is therefore especially important. Questions are first sorted by category, and question-answer pairs are then organized for each category. The answer to a common question may be unique, but to make the answers richer, the unified answer can be expressed in several forms, one of which is given at random. A professional question requires an answer that depends on how the current slots are filled, which combines question-answer pair matching with template matching.
Question classification is in fact intent recognition: intent recognition is performed on the user's question, and the category of the question is determined from the result.
Slot filling is in fact entity recognition: entities are recognized in the user's question, and the corresponding slots are filled from the result. (The entities recognized are the body part, the procedure and the method of the cosmetic surgery, using a bidirectional LSTM + CRF algorithm.)
Question-answer pair matching is involved in answering both common and professional questions, and it is the key step in giving an accurate answer. The present invention uses the question-answer pair matching method based on semantic similarity, which matches questions and answers more accurately, making it easier to identify the user's intent accurately and to match the corresponding answer template.
The present invention also provides an automatic question-answering system based on semantic similarity, comprising a client and a server, wherein the server trains a matching model on the basis of the above automatic question-answering text matching method and, based on the trained model, organizes question-answer pairs, recognizes the user's intent, and matches a template to give the corresponding answer.
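Giving one of several phrasings of a unified answer at random can be sketched as below. The intent name, the placeholder answer strings and the function name are all hypothetical; the source does not describe how the phrasings are stored.

```python
import random
from typing import Dict, List, Optional

# Hypothetical store: one unified answer per common question, in several phrasings.
FAQ_ANSWERS: Dict[str, List[str]] = {
    "hospital_address": [
        "<hospital address, phrasing 1>",
        "<hospital address, phrasing 2>",
    ],
}


def answer_common_question(intent: str, rng: Optional[random.Random] = None) -> str:
    """Pick one phrasing of the unified answer at random."""
    rng = rng or random.Random()
    return rng.choice(FAQ_ANSWERS[intent])
```

Passing a seeded `random.Random` instance makes the choice reproducible, which can help when testing dialogue flows.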
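The slot-filling control flow can be sketched as a check for the first slot that is still empty. The three slot names mirror the entities listed above (body part, procedure, method), but their identifiers and the `next_action` function are hypothetical, and the entity recognizer itself (the bidirectional LSTM + CRF model) is outside the scope of this sketch.

```python
from typing import Dict, Optional, Tuple

# Assumed slot identifiers for the three entity types named in the description.
REQUIRED_SLOTS: Tuple[str, ...] = ("body_part", "procedure", "method")


def next_action(slots: Dict[str, Optional[str]]) -> str:
    """Ask for the first unfilled slot, or answer once every slot is filled."""
    for name in REQUIRED_SLOTS:
        if not slots.get(name):
            return "ask:" + name
    return "answer"
```

For example, `next_action({"procedure": "double eyelid"})` returns `"ask:body_part"`, and the function returns `"answer"` only when all three slots hold a value, at which point a template can be matched.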

Claims (8)

1. An automatic question-answering text matching method based on semantic similarity, characterized in that the method comprises:
Performing word segmentation on the text, so that the sentences in the text are segmented into words;
Performing stop word removal on the text, so that the stop words in the text are removed and the non-stop words are retained;
Assigning a weight to each word according to its part of speech, the words being graded by importance level: the higher the level, the higher the weight, and the lower the level, the lower the weight;
Representing each word in the text as a weighted word vector: t = v*w;
Performing text similarity matching: if the weighted word vectors of the words in text a are a_1, a_2, ..., a_n and the weighted word vectors of the words in text b are b_1, b_2, ..., b_m, then the similarity of texts a and b is:
$$\mathrm{Sim}(a,b) = 0.5 \sum_{i=1}^{n} \max_{1 \le j \le m} (a_i * b_j) + 0.5 \sum_{j=1}^{m} \max_{1 \le i \le n} (b_j * a_i)$$
wherein v is the word vector, w is the weight of the word vector, n and m are the numbers of words in texts a and b respectively, and i and j are the sequence indices of a word in texts a and b respectively.
2. The automatic question-answering text matching method according to claim 1, characterized in that the importance levels are, from high to low, core, sub-core, general and unimportant; wherein characters and/or words of the core level include the nouns in the sentence backbone; characters and/or words of the sub-core level include the verbs in the sentence backbone; characters and/or words of the general level include pronouns, adjectives and adverbs; and characters and/or words of the unimportant level include auxiliary words, punctuation, unknown symbols and modal particles.
3. The automatic question-answering text matching method according to claim 1, characterized in that the method further comprises assigning words of any other, unclassified part of speech to the general level.
4. The automatic question-answering text matching method according to claim 2, characterized in that the weight of characters and/or words of the core level is 3; the weight of the sub-core level is 2; the weight of the general level is 1; and the weight of the unimportant level is 0.
5. The automatic question-answering text matching method according to claim 4, characterized in that the method further comprises setting the weight of words of any other, unclassified part of speech to 1.
6. An automatic question-answering text matching method based on semantic similarity, characterized in that the automatic question-answering text matching method according to any one of claims 1 to 5 is applied to automatic question-answer matching for cosmetic surgery consultation.
7. An automatic question-answering method based on semantic similarity, characterized in that the method comprises: training a matching model on the basis of the automatic question-answering text matching method according to any one of claims 1 to 6; and, based on the trained model, organizing question-answer pairs, recognizing the user's intent, and matching a template to give the corresponding answer.
8. An automatic question-answering system based on semantic similarity, characterized by comprising a client and a server; wherein the server trains a matching model on the basis of the automatic question-answering text matching method according to any one of claims 1 to 6 and, based on the trained model, organizes question-answer pairs, recognizes the user's intent, and matches a template to give the corresponding answer.
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810700950.3A CN109062977A (en) 2018-06-29 2018-06-29 A kind of automatic question answering text matching technique, automatic question-answering method and system based on semantic similarity

Publications (1)

Publication Number Publication Date
CN109062977A true CN109062977A (en) 2018-12-21

Family

ID=64818589

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810700950.3A Pending CN109062977A (en) 2018-06-29 2018-06-29 A kind of automatic question answering text matching technique, automatic question-answering method and system based on semantic similarity

Country Status (1)

Country Link
CN (1) CN109062977A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106484664A (en) * 2016-10-21 2017-03-08 竹间智能科技(上海)有限公司 Similarity calculating method between a kind of short text
CN108090077A (en) * 2016-11-23 2018-05-29 中国科学院沈阳计算技术研究所有限公司 A kind of comprehensive similarity computational methods based on natural language searching
CN107273350A (en) * 2017-05-16 2017-10-20 广东电网有限责任公司江门供电局 A kind of information processing method and its device for realizing intelligent answer
CN107729468A (en) * 2017-10-12 2018-02-23 华中科技大学 Answer extracting method and system based on deep learning

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110033022A (en) * 2019-03-08 2019-07-19 腾讯科技(深圳)有限公司 Processing method, device and the storage medium of text
WO2021008015A1 (en) * 2019-07-18 2021-01-21 平安科技(深圳)有限公司 Intention recognition method, device and computer readable storage medium
CN111429903A (en) * 2020-03-19 2020-07-17 百度在线网络技术(北京)有限公司 Audio signal identification method, device, system, equipment and readable medium
CN111429903B (en) * 2020-03-19 2021-02-05 百度在线网络技术(北京)有限公司 Audio signal identification method, device, system, equipment and readable medium
CN112836032A (en) * 2021-02-07 2021-05-25 浙江理工大学 Automatic response method integrating double word segmentation and iterative feedback
CN112836032B (en) * 2021-02-07 2022-05-06 浙江理工大学 Automatic response method integrating double word segmentation and iterative feedback

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20181221