CN105955976A - Automatic answering system and method - Google Patents

Automatic answering system and method

Info

Publication number
CN105955976A
Authority
CN
China
Prior art keywords
synonym
vocabulary
word
key
historical record
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610237009.3A
Other languages
Chinese (zh)
Other versions
CN105955976B (en)
Inventor
张佶
盛丽晔
范融
于志安
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
ICBC Technology Co Ltd
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC
Priority to CN201610237009.3A
Publication of CN105955976A
Application granted
Publication of CN105955976B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G06F16/3334 Selection or weighting of terms from queries, including natural language queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G06F16/3338 Query expansion

Abstract

The invention discloses an automatic answering system and method. The system comprises: a question receiving unit for receiving a question input by a user; a keyword extraction unit for analyzing the question and extracting key question words; a synonym expansion unit for performing synonym expansion on the key question words to obtain the synonym-expanded key question words; a search unit for searching a data storage unit for the historical record that best matches the synonym-expanded key question words; a display unit for displaying the historical record to the user; a result receiving unit for receiving the best matching result selected by the user from the historical record and storing it in the data storage unit; and the data storage unit, for storing the search index data, historical records, and best matching results.

Description

An automatic answering system and method
Technical field
The present invention relates to the technical field of data processing in computer information systems, and in particular to an automatic answering system and method.
Background
In the era of big data, new customer service channels such as SMS, WeChat, and microblogs keep emerging, and the volume of text records produced by enterprise services keeps growing. These records typically contain key information such as customer questions, complaints, and suggestions, together with the answers recorded by service staff. If this large body of historical text could be matched and analyzed so that an optimal response is given to the end user automatically and within a short time, service quality would improve considerably, which would also help build a good corporate image.
With this in mind, the common practice today is for terminal service staff to search the historical service text records with a search tool and to select the most relevant answer as a reference when replying to the end user. This approach has the following limitations. First, the words end users choose to describe products and services are somewhat arbitrary, so the same concept may be expressed with different vocabulary. In addition, service staff may record different wordings when logging a question, for example because of aliases or mistyped characters. Both effects lower the accuracy of customer service text matching and make data processing inefficient. Second, end-user questions cannot be answered automatically, so replies are slow.
Summary of the invention
To address the shortcomings of existing answering approaches, the present invention proposes an automatic answering system and method. By analyzing past service text, it extracts synonym pairs that can be used interchangeably. When the system receives a similar end-user question, it first performs synonym expansion on the words in the question, then runs a matching search and answers automatically. This shortens the system response time and improves matching accuracy.
To achieve the above purpose, the present invention proposes an automatic answering system comprising: a question receiving unit for receiving a question input by a user; a keyword extraction unit for analyzing the question and extracting key question words; a synonym expansion unit for performing synonym expansion on the key question words to obtain the synonym-expanded key question words; a search unit for searching the data storage unit for the historical record that best matches the synonym-expanded key question words; a display unit for displaying the historical record to the user; a result receiving unit for receiving the best matching result the user selects from the historical record and storing it in the data storage unit; and the data storage unit, for storing search index data, historical records, and best matching results.
To achieve the above purpose, the present invention also proposes a method of answering automatically with the automatic answering system. The method comprises: step 1, receiving a question input by a user; step 2, analyzing the question and extracting key question words; step 3, performing synonym expansion on the key question words to obtain the synonym-expanded key question words; step 4, searching the data storage unit for the historical record that best matches the synonym-expanded key question words; step 5, displaying the historical record to the user; step 6, receiving the best matching result the user selects from the historical record and storing it in the data storage unit.
The automatic answering system and method proposed by the present invention automatically discover, through analysis, the near-synonym pairs and synonym pairs relevant to the terminal service text. When an end user enters a question, synonym expansion is performed automatically, which improves matching accuracy, and the answer is given automatically, which improves the efficiency of answering questions.
Brief description of the drawings
The accompanying drawings described here provide a further understanding of the present invention and form part of this application; they do not limit the invention. In the drawings:
Fig. 1 is a schematic structural diagram of the automatic answering system according to an embodiment of the present invention.
Fig. 2 is a schematic structural diagram of the data storage unit according to an embodiment of the present invention.
Fig. 3 is a schematic diagram of the data structure of a stored best matching result according to an embodiment of the present invention.
Fig. 4 is a schematic diagram of the data structure of a stored near-synonym pair according to an embodiment of the present invention.
Fig. 5 is a schematic diagram of the data structure of a stored synonym pair according to an embodiment of the present invention.
Fig. 6 is a schematic structural diagram of the data analysis unit according to an embodiment of the present invention.
Fig. 7 is a schematic diagram of the approximation-degree analysis process according to an embodiment of the present invention.
Fig. 8 is a flow chart of the automatic answering method according to an embodiment of the present invention.
Fig. 9 is a flow chart of the near-synonym pair analysis method according to an embodiment of the present invention.
Fig. 10 is a flow chart of the synonym pair analysis method according to an embodiment of the present invention.
Detailed description
The technical means the present invention adopts to achieve its intended purpose are further described below with reference to the drawings and preferred embodiments. The term "unit" or "module" as used here may be a combination of software and/or hardware that implements a predetermined function. Although the system described in the following embodiments is preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
Fig. 1 is a schematic structural diagram of the automatic answering system according to an embodiment of the present invention. As shown in Fig. 1, the system includes:
Question receiving unit 100, for receiving a question input by a user;
Keyword extraction unit 200, for analyzing the question and extracting key question words;
Synonym expansion unit 300, for performing synonym expansion on the key question words to obtain the synonym-expanded key question words;
Search unit 400, for searching data storage unit 700 for the historical record that best matches the synonym-expanded key question words;
Display unit 500, for displaying the historical record to the user; when displaying, display unit 500 may return the three historical records with the highest matching degree for the user to choose from;
Result receiving unit 600, for receiving the best matching result the user selects from the historical records and storing it in data storage unit 700;
Data storage unit 700, for storing search index data, historical records, and best matching results.
In this embodiment, keyword extraction unit 200 extracts the key question words as follows:
Chinese word segmentation is performed on the question text, and the weight TF_IDF of each word in the text is computed. TF reflects that the more often a word occurs in the current text, the larger its weight; IDF reflects that the less often a word occurs across the whole corpus, the larger its weight. A certain number of words with the highest TF_IDF values are extracted as the key question words.
The TF_IDF value is computed as follows:
$$\mathrm{TF\_IDF}_{i,j} = \mathrm{TF}_{i,j} \times \mathrm{IDF}_i$$
where $\mathrm{TF\_IDF}_{i,j}$ is the weight of word $i$ in question $j$;
$$\mathrm{TF}_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}}$$
where $\mathrm{TF}_{i,j}$ is the term frequency of word $i$ in question $j$, $n_{i,j}$ is the number of occurrences of word $i$ in question $j$, and $\sum_k n_{k,j}$ is the total number of occurrences of all words $k$ in question $j$;
$$\mathrm{IDF}_i = \log \frac{|D|}{|\{j : t_i \in d_j\}|}$$
where $\mathrm{IDF}_i$ is the inverse document frequency of word $i$, $|D|$ is the total number of questions, and $|\{j : t_i \in d_j\}|$ is the number of questions $d_j$ that contain the word $t_i$.
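The keyword extraction above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: it assumes the questions are already segmented into word lists, that the current question is itself part of the corpus (so every word has a document frequency of at least 1), and that the logarithm is natural, since the text does not state a base. The function names and the sample corpus are hypothetical.

```python
import math
from collections import Counter

def tf_idf_scores(corpus, j):
    """TF_IDF weight of every word in question j of a segmented corpus."""
    counts = Counter(corpus[j])
    total = sum(counts.values())          # sum over k of n_{k,j}
    scores = {}
    for word, n in counts.items():
        tf = n / total                    # TF_{i,j}
        # Number of questions containing the word; >= 1 because corpus[j] does.
        df = sum(1 for doc in corpus if word in doc)
        idf = math.log(len(corpus) / df)  # IDF_i, natural log assumed
        scores[word] = tf * idf
    return scores

def key_question_words(corpus, j, top_n=2):
    """The top_n words of question j by TF_IDF (ties broken alphabetically)."""
    scores = tf_idf_scores(corpus, j)
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_n]

# Hypothetical, already segmented questions.
corpus = [
    ["atm", "card", "swallowed", "atm"],
    ["card", "lost", "report"],
    ["transfer", "limit", "card"],
]
print(key_question_words(corpus, 0))
```

In this sample, "card" occurs in every question, so its IDF is zero and it is not selected as a key question word.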
Fig. 2 is a schematic structural diagram of the data storage unit in this embodiment. As shown in Fig. 2, data storage unit 700 includes: match record storage module 710, search index storage module 720, terminal service record storage module 730, near-synonym pair storage module 740, and synonym pair storage module 750, where:
Match record storage module 710 stores the best matching results. Fig. 3 shows the data structure of a stored best matching result, which records the session ID, the end-user number, the historical question, the question words, and the matching time.
Search index storage module 720 stores the search index data. An inverted index is built over the historical records for the searcher to query, and it is updated incrementally as the historical records grow.
Terminal service record storage module 730 stores the historical records. A historical record includes a historical question and the answer text generated by means of near-synonym pairs or synonym pairs.
Near-synonym pair storage module 740 stores near-synonym pairs, each composed of a synonym-expanded key question word and a near-synonym word. Fig. 4 shows the data structure of a stored near-synonym pair, which contains the key question word, the near-synonym word, and the degree of association.
Synonym pair storage module 750 stores synonym pairs, each composed of a synonym-expanded key question word and a synonym word. Fig. 5 shows the data structure of a stored synonym pair, which contains the key question word and the synonym word.
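The inverted index kept by search index storage module 720 can be illustrated with a small Python sketch. The class and method names are hypothetical and the text does not describe the index layout; the sketch only shows the general idea of mapping each word to the historical records that contain it, with an incremental update when a record is added.

```python
from collections import defaultdict

class InvertedIndex:
    """Maps each word to the ids of the historical records containing it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add_record(self, record_id, words):
        # Incremental update: only the postings of the new record's words change.
        for word in words:
            self.postings[word].add(record_id)

    def lookup(self, word):
        # Returns an empty set for unknown words without creating an entry.
        return self.postings.get(word, set())

index = InvertedIndex()
index.add_record(1, ["atm", "card"])
index.add_record(2, ["card", "lost"])
print(sorted(index.lookup("card")))
```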
Further, as shown in Fig. 1, the automatic answering system also includes data analysis unit 800, which analyzes the key question words and the best matching results, builds near-synonym pairs or synonym pairs from the analysis result, and stores them in data storage unit 700.
Fig. 6 is a schematic structural diagram of the data analysis unit. As shown in Fig. 6, data analysis unit 800 includes: near-synonym analysis module 810, retrieval sequence analysis module 820, pinyin analysis module 830, co-occurrence analysis module 840, and click feature analysis module 850, where:
Near-synonym analysis module 810 performs association analysis on the key question words and the best matching results, obtains near-synonym pairs with a certain degree of association, and stores them in near-synonym pair storage module 740. These near-synonyms may be words with similar concepts, or parent and child concepts in a hypernym-hyponym relationship.
The near-synonym relation is computed as follows:
$$p_{uj} = \sum_{i \in N(u) \cap S(j,K)} w_{ji}\, r_{ui}$$
where $p_{uj}$ is the degree of approximation between key question word $u$ and best matching result $j$; $N(u)$ is the set of best matching results matched by the key question word; $S(j,K)$ is the set of the $K$ other best matching results with the highest matching counts with best matching result $j$; $w_{ji}$ is the matching count between best matching result $j$ and a certain number of words $i$; and $r_{ui}$ is the matching count of key question word $u$ with a certain number of words $i$.
The computed degree of approximation of each word pair is normalized with the formula
$$y = \frac{x - \mathrm{MinValue}}{\mathrm{MaxValue} - \mathrm{MinValue}}$$
and the near-synonym pairs whose degree of approximation exceeds a set threshold are stored in near-synonym pair storage module 740.
To explain the function of near-synonym analysis module 810 more clearly, an example follows.
Fig. 7 is a schematic diagram of the approximation-degree analysis process according to an embodiment of the present invention. As shown in Fig. 7, end users matched the question word "ATM" to historical question one 30 times and to historical question two 10 times. Historical question one was also matched by the word "self-service facility" 6 times and by "self-service ATM" 8 times. Historical question two was also matched by "self-service ATM" 12 times and by "card swallowed" 10 times.
Computing with the above formula and normalizing gives the following degrees of approximation for the word pairs:
ATM / self-service facility: 0.3;
ATM / self-service ATM: 1;
ATM / card swallowed: 0;
If the approximation threshold is 0.2, then "ATM / self-service ATM" and "ATM / self-service facility" are judged to be near-synonym pairs.
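The figures in this example can be reproduced with a short Python sketch. It reads the raw degree of approximation between "ATM" and another word as the sum, over the historical questions they share, of the product of the two matching counts, and then applies min-max normalization. The variable and function names are hypothetical; the counts are the ones given for Fig. 7.

```python
from collections import defaultdict

# Matching counts from the Fig. 7 example.
word_to_questions = {"ATM": {"Q1": 30, "Q2": 10}}
question_to_words = {
    "Q1": {"self-service facility": 6, "self-service ATM": 8},
    "Q2": {"self-service ATM": 12, "card swallowed": 10},
}

def raw_scores(word):
    """Accumulate w * r over every historical question shared with `word`."""
    p = defaultdict(int)
    for question, w in word_to_questions[word].items():
        for other, r in question_to_words[question].items():
            p[other] += w * r
    return dict(p)

def min_max(scores):
    """y = (x - MinValue) / (MaxValue - MinValue) for every score."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 0.0 for k in scores}  # degenerate case: all scores equal
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}

raw = raw_scores("ATM")       # 180, 360 and 100 respectively
normalized = min_max(raw)     # about 0.31, 1.0 and 0.0
near_synonyms = sorted(w for w, y in normalized.items() if y > 0.2)
print(near_synonyms)
```

The raw scores are 30×6 = 180 for "self-service facility", 30×8 + 10×12 = 360 for "self-service ATM", and 10×10 = 100 for "card swallowed"; after normalization the first rounds to 0.3, which matches the values stated in the example.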
Retrieval sequence analysis module 820 reads near-synonym pairs from near-synonym pair storage module 740 and analyzes the probability that the two words of a pair replace each other within one time-ordered sequence in match record storage module 710. The rationale is that when an end user enters a question word and does not get a satisfactory result, the user often rewrites the question within a short time using an interchangeable word with the same meaning. Near-synonym pairs whose probability exceeds a set threshold can therefore be judged to be synonym pairs and stored in synonym pair storage module 750.
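The text gives no formula for this probability. One plausible reading, sketched below in Python, is a conditional probability estimated per session: among the sessions in which the first word was searched, the share in which the second word was searched afterwards. The session representation (a time-ordered list of query words) and the function name are assumptions made for illustration.

```python
def rewrite_probability(sessions, first, second):
    """Estimate P(second is searched later | first was searched) per session."""
    searched_first = 0
    then_second = 0
    for queries in sessions:
        if first in queries:
            searched_first += 1
            position = queries.index(first)
            # Only queries issued after the first word count as a rewrite.
            if second in queries[position + 1:]:
                then_second += 1
    return then_second / searched_first if searched_first else 0.0

# Hypothetical sessions, each a time-ordered list of query words.
sessions = [
    ["ATM", "self-service ATM"],
    ["ATM", "card swallowed"],
    ["self-service ATM", "ATM"],
    ["transfer"],
]
print(rewrite_probability(sessions, "ATM", "self-service ATM"))
```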
Pinyin analysis module 830 reads near-synonym pairs from near-synonym pair storage module 740 and analyzes the pronunciation similarity of each pair. The rationale is that end users who type quickly are relatively likely to enter a homophone by mistake, for example 微博 (microblog) and 微薄 (meagre), which sound the same and are meant to be the same word. Homophonic near-synonym pairs whose pronunciation similarity exceeds a set threshold can therefore be judged to be synonym pairs and stored in synonym pair storage module 750. The pronunciation similarity is computed as follows:
$$\mathrm{Sim}_{dis} = 1 - \frac{\mathrm{minDis}(S_{w_1}, S_{w_2})}{\max(|S_{w_1}|, |S_{w_2}|)}$$
where $\mathrm{Sim}_{dis}$ is the pronunciation similarity, $S_{w_i}$ is the pronunciation string of word $w_i$, $|S_{w_i}|$ is the length of that pronunciation string, $i = 1, 2$, and $\mathrm{minDis}(S_{w_1}, S_{w_2})$ is the minimum edit distance.
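The pronunciation similarity can be sketched in Python with a standard Levenshtein edit distance. The sketch assumes the pronunciation strings are toneless pinyin produced elsewhere (they are written by hand here), and the function names are hypothetical.

```python
def edit_distance(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]

def pronunciation_similarity(pinyin1, pinyin2):
    """1 - minDis / max length, as in the formula for Sim_dis."""
    longest = max(len(pinyin1), len(pinyin2))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1 - edit_distance(pinyin1, pinyin2) / longest

# 微博 and 微薄 share the toneless pinyin "weibo".
print(pronunciation_similarity("weibo", "weibo"))
print(pronunciation_similarity("weibo", "weixin"))
```

Identical pinyin strings give a similarity of 1, so a homophone pair such as 微博 and 微薄 passes any threshold below 1.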
Co-occurrence analysis module 840 reads near-synonym pairs from near-synonym pair storage module 740 and analyzes how strongly the two words of a pair co-occur in the terminal service record text stored in terminal service record storage module 730. When describing a question, an end user may refer to the same thing in different ways, for example using the full name first and an abbreviation when mentioning it again. Therefore, if the co-occurrence degree reaches a set threshold, the near-synonym pair can be judged to be a synonym pair and stored in synonym pair storage module 750. The co-occurrence degree of two words is computed as follows:
$$w_{ij} = \frac{|N(i) \cap N(j)|}{|N(i)|\,|N(j)|}$$
where $w_{ij}$ is the co-occurrence degree of word $i$ and word $j$; $N(i)$ is the set of historical records in which question word $i$ occurs; $N(i) \cap N(j)$ is the set of historical records in which word $i$ and word $j$ both occur; and $|N(i)|$ is the number of historical records in which question word $i$ occurs.
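A minimal Python sketch of the co-occurrence degree follows. It reads the denominator as the product of the two set sizes, which is how the formula appears in the text. Co-occurrence measures of this shape are often normalized by the square root of that product instead, so the sketch exposes that variant as an optional flag; the flag, the function name, and the sample records are assumptions, not part of the source.

```python
import math

def co_occurrence(records, word_i, word_j, sqrt_norm=False):
    """Co-occurrence degree of two words over a list of historical records."""
    n_i = {idx for idx, words in enumerate(records) if word_i in words}
    n_j = {idx for idx, words in enumerate(records) if word_j in words}
    if not n_i or not n_j:
        return 0.0  # a word that never occurs cannot co-occur
    product = len(n_i) * len(n_j)
    # sqrt_norm=True is an assumed variant, not stated in the text.
    denominator = math.sqrt(product) if sqrt_norm else product
    return len(n_i & n_j) / denominator

# Hypothetical historical records, each a list of words.
records = [
    ["automated teller machine", "ATM", "card"],
    ["ATM", "card"],
    ["automated teller machine"],
    ["loan"],
]
print(co_occurrence(records, "ATM", "automated teller machine"))
```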
Click feature analysis module 850 reads near-synonym pairs from near-synonym pair storage module 740 and analyzes them against the terminal service record text stored in terminal service record storage module 730. It considers the cases where word i occurs in the query but not in the title of the historical record, while word j occurs in the title of the historical record. The mutual exchange ratio of word i and word j is computed as follows:
$$C_{ij} = \frac{|wt_i\, wq_j|}{|wt_i\, wq_j| + |wt_j\, wq_i|}$$
where $C_{ij}$ is the mutual exchange ratio, $wt_i$ denotes that word $i$ occurs in the title, and $|wt_i\, wq_j|$ is the number of cases in which word $i$ occurs in the title and word $j$ occurs in the query.
Word pairs whose mutual exchange ratio exceeds a set threshold are stored in synonym pair storage module 750.
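The exchange ratio can be sketched in Python over a log of (query words, title words) pairs. The log format and the function names are hypothetical. Following the description, a case is counted only when the query word is absent from the title; that filter is an interpretation of the text rather than something the formula itself spells out.

```python
def count_title_query(click_log, title_word, query_word):
    """|wt wq|: title_word is in the title, query_word is in the query only."""
    return sum(
        1 for query, title in click_log
        if title_word in title and query_word in query and query_word not in title
    )

def exchange_ratio(click_log, word_i, word_j):
    """C_ij = |wt_i wq_j| / (|wt_i wq_j| + |wt_j wq_i|)."""
    i_title = count_title_query(click_log, word_i, word_j)
    j_title = count_title_query(click_log, word_j, word_i)
    total = i_title + j_title
    return i_title / total if total else 0.0

# Hypothetical log entries: (words in the query, words in the record title).
click_log = [
    ({"ATM"}, {"self-service ATM", "fault"}),
    ({"ATM"}, {"self-service ATM"}),
    ({"self-service ATM"}, {"ATM", "location"}),
    ({"ATM"}, {"ATM"}),
]
print(exchange_ratio(click_log, "self-service ATM", "ATM"))
```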
The automatic answering system proposed by the present invention answers automatically on the basis of synonym expansion. Its data analysis unit analyzes a large body of historical end-user questions and the corresponding best responses to obtain correlated near-synonym pairs, processes those near-synonym pairs with further calculations to filter out usable synonym pairs, and stores the result in the synonym pair storage module. At service time, the system first extracts the question words from the question, expands them through the synonym expansion unit to improve vocabulary coverage, and then searches by word association for the best historical response record and returns it to the end user.
Based on the same inventive concept, an embodiment of the present invention also provides an automatic answering method, as described in the following embodiment. Because the method solves the problem on the same principle as the system above, its implementation may refer to the implementation of the system, and the overlapping details are not repeated.
Fig. 8 is a flow chart of the automatic answering method according to an embodiment of the present invention. The method can be carried out by the automatic answering system described above and includes:
Step S1, receiving a question input by a user;
Step S2, analyzing the question and extracting key question words;
Step S3, performing synonym expansion on the key question words to obtain the synonym-expanded key question words;
Step S4, searching the data storage unit for the historical record that best matches the synonym-expanded key question words, where a historical record includes a historical question and the answer text generated by means of near-synonym pairs or synonym pairs;
Step S5, displaying the historical record to the user; the three historical records with the highest matching degree may be displayed;
Step S6, receiving the best matching result the user selects from the historical records and storing it in the data storage unit.
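Steps S3 to S5 can be illustrated with a small Python sketch. The text does not define how the matching degree is measured, so the sketch assumes it is the number of words a historical record shares with the expanded keyword set; the function names and sample data are hypothetical.

```python
def expand(keywords, synonym_pairs):
    """Step S3: add the partner of every keyword that belongs to a synonym pair."""
    expanded = set(keywords)
    for a, b in synonym_pairs:
        if a in keywords:
            expanded.add(b)
        if b in keywords:
            expanded.add(a)
    return expanded

def top_matches(expanded, records, limit=3):
    """Steps S4 and S5: rank records by shared-word count and keep the best few."""
    scored = [(len(expanded & words), record_id) for record_id, words in records.items()]
    scored = [(score, record_id) for score, record_id in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [record_id for _, record_id in scored[:limit]]

# Hypothetical stored synonym pairs and historical records (id -> words).
synonym_pairs = [("ATM", "self-service ATM")]
records = {
    1: {"self-service ATM", "card", "swallowed"},
    2: {"card", "lost"},
    3: {"loan"},
    4: {"ATM", "card", "self-service ATM"},
}
expanded = expand({"ATM", "card"}, synonym_pairs)
print(top_matches(expanded, records))
```

Without the expansion, record 1 would share only "card" with the question and would rank no higher than record 2; with it, the synonym "self-service ATM" lifts record 1 above record 2.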
Fig. 9 is a flow chart of the near-synonym pair analysis method. As shown in Fig. 9, the method includes:
Step 101, obtaining the synonym-expanded key question words and the corresponding best matching results;
Step 102, reading the synonym-expanded key question words one by one;
Step 103, counting the number of matches w between the synonym-expanded key question word and the best matching result;
Step 104, reading the best matching results one by one;
Step 105, looking up the historical records that matched the best matching result, and reading the historical records one by one;
Step 106, counting the number of times r that the historical record matched the best matching result;
Step 107, computing the near-synonym degree between the words as p = w × r; if a repeated best matching result is encountered, p is accumulated;
Step 108, reading the near-synonym judgment threshold s; if p > s, the pair is stored as a near-synonym pair;
Step 109, judging whether this is the last historical record; if so, step 110 is performed, otherwise step 105 is repeated;
Step 110, judging whether this is the last best matching result; if so, step 111 is performed, otherwise step 104 is repeated;
Step 111, judging whether this is the last synonym-expanded key question word; if so, the analysis ends, otherwise step 102 is repeated.
Fig. 10 is a flow chart of the synonym pair analysis method. As shown in Fig. 10, the method includes:
Step 201, reading the near-synonym pairs one by one;
Step 202, computing the pronunciation similarity between the pinyin strings of the near-synonym pair;
Step 203, judging whether the pronunciation similarity is greater than a set threshold; if so, step 210 is performed, otherwise step 204 is performed;
Step 204, according to the retrieval sequence, computing the conditional probability that, within the same session, the second word is searched after the first word has been searched;
Step 205, judging whether the conditional probability is greater than a set threshold; if so, step 210 is performed, otherwise step 206 is performed;
Step 206, analyzing the co-occurrence degree of the first word and the second word;
Step 207, judging whether the co-occurrence degree is greater than a set threshold; if so, step 210 is performed, otherwise step 208 is performed;
Step 208, analyzing the click feature of the two words;
Step 209, judging whether the click feature is greater than a set threshold; if so, step 210 is performed, otherwise the synonym analysis ends and the pair is judged not to be a synonym pair;
Step 210, storing the synonym pair.
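The decision cascade of steps 201 to 210 can be sketched in Python as an ordered list of checks, where each later score is computed only if the earlier checks failed. The scoring functions and threshold values below are placeholders invented for illustration; only the order of the checks and the "greater than threshold" test come from the steps above.

```python
def classify_pair(pair, checks):
    """Return the name of the first check the pair passes, or None.

    checks: ordered list of (name, score_function, threshold).
    """
    for name, score_function, threshold in checks:
        if score_function(pair) > threshold:
            return name   # step 210: the pair is stored as a synonym pair
    return None           # all four checks failed: not a synonym pair

# Placeholder scoring functions and thresholds, for illustration only.
checks = [
    ("pronunciation", lambda p: 1.0 if p == ("微博", "微薄") else 0.2, 0.8),
    ("retrieval sequence", lambda p: 0.1, 0.5),
    ("co-occurrence", lambda p: 0.7 if p == ("ATM", "self-service ATM") else 0.0, 0.6),
    ("click feature", lambda p: 0.0, 0.5),
]
print(classify_pair(("ATM", "self-service ATM"), checks))
```

Ordering the checks this way means a later analysis runs only for pairs that the earlier ones could not confirm.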
The automatic answering system and method proposed by the present invention automatically discover, through analysis, the near-synonym pairs and synonym pairs relevant to the terminal service text. When an end user enters a question, synonym expansion is performed automatically, which improves matching accuracy, and the answer is given automatically, which improves the efficiency of answering questions.
The specific embodiments described above explain the purpose, technical solution, and beneficial effects of the present invention in further detail. It should be understood that the foregoing are only specific embodiments of the present invention and are not intended to limit its scope of protection. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (13)

1. An automatic answering system, characterized in that the system comprises:
a question receiving unit, for receiving a question input by a user;
a keyword extraction unit, for analyzing the question and extracting key question words;
a synonym expansion unit, for performing synonym expansion on the key question words to obtain the synonym-expanded key question words;
a search unit, for searching the data storage unit for the historical record that best matches the synonym-expanded key question words;
a display unit, for displaying the historical record to the user;
a result receiving unit, for receiving the best matching result the user selects from the historical record and storing the best matching result in the data storage unit;
the data storage unit, for storing search index data, historical records, and best matching results.
2. The system according to claim 1, characterized in that the keyword extraction unit, in analyzing the question and extracting the key question words:
performs Chinese word segmentation on the question text and computes the weight TF_IDF of each word in the text, where TF reflects that the more often a word occurs in the current text, the larger its weight, and IDF reflects that the less often a word occurs across the whole corpus, the larger its weight, and extracts a certain number of words with the highest TF_IDF values as the key question words;
the TF_IDF value being computed as follows:
$$\mathrm{TF\_IDF}_{i,j} = \mathrm{TF}_{i,j} \times \mathrm{IDF}_i$$
where $\mathrm{TF\_IDF}_{i,j}$ is the weight of word $i$ in question $j$;
$$\mathrm{TF}_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}}$$
where $\mathrm{TF}_{i,j}$ is the term frequency of word $i$ in question $j$, $n_{i,j}$ is the number of occurrences of word $i$ in question $j$, and $\sum_k n_{k,j}$ is the total number of occurrences of all words $k$ in question $j$;
$$\mathrm{IDF}_i = \log \frac{|D|}{|\{j : t_i \in d_j\}|}$$
where $\mathrm{IDF}_i$ is the inverse document frequency of word $i$, $|D|$ is the total number of questions, and $|\{j : t_i \in d_j\}|$ is the number of questions $d_j$ that contain the word $t_i$.
3. The system according to claim 1, characterized in that the data storage unit comprises:
a match record storage module, for storing the best matching results;
a search index storage module, for storing the search index data, an inverted index being built over the historical records for the searcher to query and updated incrementally as the historical records grow;
a terminal service record storage module, for storing the historical records, a historical record including a historical question and the answer text generated by means of near-synonym pairs or synonym pairs;
a near-synonym pair storage module, for storing near-synonym pairs each composed of a synonym-expanded key question word and a near-synonym word;
a synonym pair storage module, for storing synonym pairs each composed of a synonym-expanded key question word and a synonym word.
4. The system according to claim 3, characterized in that the system further comprises: a data analysis unit, for analyzing the key question words and the best matching results, building near-synonym pairs or synonym pairs from the analysis result, and storing them in the data storage unit.
5. The system according to claim 4, characterized in that the data analysis unit comprises:
a near-synonym analysis module, for performing association analysis on the key question words and the best matching results, the near-synonym relation being computed as follows:
$$p_{uj} = \sum_{i \in N(u) \cap S(j,K)} w_{ji}\, r_{ui}$$
where $p_{uj}$ is the degree of approximation between key question word $u$ and best matching result $j$; $N(u)$ is the set of best matching results matched by the key question word; $S(j,K)$ is the set of the $K$ other best matching results with the highest matching counts with best matching result $j$; $w_{ji}$ is the matching count between best matching result $j$ and a certain number of words $i$; and $r_{ui}$ is the matching count of key question word $u$ with a certain number of words $i$;
the computed degree of approximation of each word pair being normalized, and the near-synonym pairs whose degree of approximation exceeds a set threshold being stored in the near-synonym pair storage module.
6. The system according to claim 5, characterized in that the data analysis unit further comprises: a retrieval sequence analysis module, for reading near-synonym pairs from the near-synonym pair storage module, analyzing the probability that the words of a near-synonym pair replace each other within one time-ordered sequence in the match record storage module, judging the near-synonym pairs whose probability exceeds a set threshold to be synonym pairs, and storing them in the synonym pair storage module.
System the most according to claim 5, it is characterised in that described data analysis unit also includes: phonetic Analyze module, be used near synonym memory module reads near synonym pair, and analyze the pronunciation similarity of near synonym pair, By pronunciation similarity more than one setting threshold value unisonance near synonym to being judged to synonym pair, be stored in synonym to storage Module;Wherein, pronunciation calculating formula of similarity is as follows:
$$Sim_{dis} = 1 - \frac{\mathrm{minDis}(S_{w_1}, S_{w_2})}{\max(|S_{w_1}|, |S_{w_2}|)};$$
where $Sim_{dis}$ denotes the pronunciation similarity, $S_{w_i}$ denotes the pronunciation string of word $w_i$, $|S_{w_i}|$ denotes the length of that pronunciation string, $i = 1, 2$, and $\mathrm{minDis}(S_{w_1}, S_{w_2})$ denotes the minimum edit distance between the two strings.
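The pronunciation similarity of claim 7 can be illustrated with a standard Levenshtein edit distance. The sketch below assumes the two words have already been converted to pinyin strings (the conversion step is outside its scope), and the function names are hypothetical.

```python
def edit_distance(a, b):
    """Minimum number of insertions, deletions and substitutions (Levenshtein)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        current = [i]
        for j, char_b in enumerate(b, 1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution or match
            ))
        previous = current
    return previous[-1]


def pronunciation_similarity(s1, s2):
    """Sim_dis = 1 - minDis(S_w1, S_w2) / max(|S_w1|, |S_w2|)."""
    longest = max(len(s1), len(s2))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1 - edit_distance(s1, s2) / longest


similarity = pronunciation_similarity("zhanghu", "zhanghao")  # 0.75
```

Here the pinyin strings differ by two edits over a maximum length of eight, so the similarity is 0.75.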
8. The system according to claim 5, characterized in that the data analysis unit further comprises: a co-occurrence analysis module, configured to read near-synonym pairs from the near-synonym pair storage module and analyze the degree of co-occurrence of each near-synonym pair in the terminal service record text stored in the terminal service record storage module; if the degree of co-occurrence reaches a set threshold, the near-synonym pair is judged to be a synonym pair and stored in the synonym pair storage module; wherein the degree of co-occurrence of two vocabulary items is calculated as follows:
$$w_{ij} = \frac{|N(i) \cap N(j)|}{|N(i)| \, |N(j)|};$$
where $w_{ij}$ denotes the degree of co-occurrence of vocabulary items $i$ and $j$; $N(i)$ denotes the set of historical records in which vocabulary item $i$ appears; $N(i) \cap N(j)$ denotes the set of historical records in which vocabulary items $i$ and $j$ appear together; and $|N(i)|$ denotes the number of historical records in which vocabulary item $i$ appears.
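A minimal sketch of the co-occurrence degree in claim 8 follows. It assumes each historical record is available as a set of tokens and that the denominator is the plain product of the two set sizes, as the formula reads here; related similarity measures take the square root of that product instead, so the denominator is an assumption. The function name is hypothetical.

```python
def cooccurrence_degree(records, word_i, word_j):
    """w_ij = |N(i) ∩ N(j)| / (|N(i)| * |N(j)|) over tokenized records."""
    n_i = {idx for idx, tokens in enumerate(records) if word_i in tokens}
    n_j = {idx for idx, tokens in enumerate(records) if word_j in tokens}
    if not n_i or not n_j:
        return 0.0  # a word that never appears cannot co-occur
    return len(n_i & n_j) / (len(n_i) * len(n_j))


records = [
    {"card", "loss"},
    {"card"},
    {"loss", "card"},
    {"balance"},
]
degree = cooccurrence_degree(records, "card", "loss")
```

In this toy data the two words share two records, "card" appears in three and "loss" in two, giving 2 / (3 × 2).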
9. The system according to claim 5, characterized in that the data analysis unit further comprises: a click feature analysis module, configured to read near-synonym pairs from the near-synonym pair storage module and analyze, in the terminal service record text stored in the terminal service record storage module, the cases in which word i appears in the query but not in the title of the historical record while word j appears in the title of the historical record; the mutual exchange ratio of word i and word j is calculated as follows:
$$C_{ij} = \frac{|wt_i \, wq_j|}{|wt_i \, wq_j| + |wt_j \, wq_i|};$$
where $C_{ij}$ denotes the mutual exchange ratio, $wt_i$ denotes that word i appears in the title, and $|wt_i \, wq_j|$ denotes the number of cases in which word i appears in the title and word j appears in the query;
word pairs whose mutual exchange ratio exceeds a set threshold are stored in the synonym pair storage module.
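The mutual exchange ratio of claim 9 can be sketched as below. The sketch assumes the click log is a list of (query tokens, clicked title tokens) pairs and, following the wording of the claim, only counts a case when the query word is absent from the title. Both the data layout and the function name are hypothetical.

```python
def exchange_ratio(clicks, word_i, word_j):
    """C_ij = |wt_i wq_j| / (|wt_i wq_j| + |wt_j wq_i|).

    clicks: iterable of (query_tokens, title_tokens) pairs, each a set.
    """
    title_i_query_j = 0  # |wt_i wq_j|: i in title, j only in query
    title_j_query_i = 0  # |wt_j wq_i|: j in title, i only in query
    for query, title in clicks:
        if word_i in title and word_j in query and word_j not in title:
            title_i_query_j += 1
        if word_j in title and word_i in query and word_i not in title:
            title_j_query_i += 1
    total = title_i_query_j + title_j_query_i
    return title_i_query_j / total if total else 0.0


clicks = [
    ({"atm"}, {"cash", "machine"}),
    ({"atm"}, {"cash", "machine"}),
    ({"cash"}, {"atm", "locations"}),
    ({"atm"}, {"atm"}),
]
ratio = exchange_ratio(clicks, "cash", "atm")
```

Swapping the two arguments gives the complementary ratio, so the two values sum to one whenever at least one exchange was observed.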
10. A method for automatic answering using the automatic answering system of claim 1, characterized in that the method comprises:
Step 1, receiving a question input by a user;
Step 2, analyzing the question and extracting the question keywords;
Step 3, performing synonym expansion on the question keywords to obtain the question keywords after synonym expansion;
Step 4, searching the data storage unit for the historical records with the highest degree of matching to the question keywords after synonym expansion;
Step 5, displaying the historical records to the user;
Step 6, receiving the best matching result selected by the user from the historical records, and storing the best matching result in the data storage unit.
11. The method according to claim 10, characterized in that the historical records comprise: historical questions, and answer texts generated from near-synonym pairs or synonym pairs.
12. The method according to claim 11, characterized in that the analysis method for the near-synonym pairs comprises:
Step 101, obtaining the question keywords after synonym expansion and the corresponding best matching results;
Step 102, reading the question keywords after synonym expansion one by one;
Step 103, counting the number of matches w between the question keyword after synonym expansion and the best matching result;
Step 104, reading the best matching results one by one;
Step 105, looking up the historical records that matched the best matching result, and reading the historical records one by one;
Step 106, counting the number of times r that the historical record matched the best matching result;
Step 107, calculating the near-synonym degree between the vocabulary items as p = w × r, and accumulating p if a repeated best matching result is encountered;
Step 108, reading the near-synonym judgment threshold s, and storing the pair as a near-synonym pair if p > s;
Step 109, determining whether this is the last historical record; if so, performing step 110, otherwise repeating step 105;
Step 110, determining whether this is the last best matching result; if so, performing step 111, otherwise repeating step 104;
Step 111, determining whether this is the last question keyword after synonym expansion; if so, ending the analysis, otherwise repeating step 102.
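Steps 101 to 111 amount to three nested loops that accumulate p = w × r. The Python sketch below is one possible reading, not the patented implementation: it assumes the counts are held in nested dictionaries, accumulates p per (keyword, historical record) pair across best matching results, and applies the threshold s once accumulation is complete rather than inside the innermost loop. All names are hypothetical.

```python
from collections import defaultdict


def find_near_synonym_pairs(keyword_result_counts, result_record_counts, s):
    """Return {(keyword, record): p} for every accumulated p greater than s.

    keyword_result_counts[keyword][result] = w  (step 103)
    result_record_counts[result][record]  = r  (step 106)
    """
    pairs = {}
    for keyword, results in keyword_result_counts.items():      # step 102
        accumulated = defaultdict(int)
        for result, w in results.items():                        # step 104
            records = result_record_counts.get(result, {})
            for record, r in records.items():                    # step 105
                accumulated[record] += w * r                     # step 107
        for record, p in accumulated.items():                    # step 108
            if p > s:
                pairs[(keyword, record)] = p
    return pairs


keyword_result_counts = {"card": {"R1": 2, "R2": 1}}
result_record_counts = {"R1": {"H1": 3, "H2": 1}, "R2": {"H1": 2}}
pairs = find_near_synonym_pairs(keyword_result_counts, result_record_counts, s=5)
```

Deferring the threshold test until all contributions are summed avoids storing a pair based on a partial total; the claim itself places the check inside the loop.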
13. The method according to claim 12, characterized in that the analysis method for the synonym pairs comprises:
Step 201, reading the near-synonym pairs one by one;
Step 202, calculating the pronunciation similarity between the pinyin strings of the near-synonym pair;
Step 203, determining whether the pronunciation similarity is greater than a set threshold; if so, performing step 210, otherwise continuing with step 204;
Step 204, according to the retrieval sequence, calculating the conditional probability that, within the same session, the second word is searched after the first word has been searched;
Step 205, determining whether the conditional probability is greater than a set threshold; if so, performing step 210, otherwise continuing with step 206;
Step 206, analyzing the degree of co-occurrence of the first word and the second word;
Step 207, determining whether the degree of co-occurrence is greater than a set threshold; if so, performing step 210, otherwise continuing with step 208;
Step 208, analyzing the click feature of the two vocabulary items;
Step 209, determining whether the click feature is greater than a set threshold; if so, performing step 210, otherwise ending the synonym analysis and judging the pair not to be synonyms;
Step 210, storing the synonym pair.
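The decision flow of steps 201 to 210 is a short-circuit cascade: each test runs only if the previous one failed to exceed its threshold. A minimal sketch, assuming each analysis is supplied as a scoring function paired with its own threshold (the function name and the stand-in scorers are hypothetical):

```python
def is_synonym_pair(pair, checks):
    """Apply (score_fn, threshold) checks in order; stop at the first pass.

    The intended order is pronunciation, session conditional probability,
    co-occurrence, then click feature, as in steps 202 to 209.
    """
    first, second = pair
    for score_fn, threshold in checks:
        if score_fn(first, second) > threshold:
            return True   # step 210: store as a synonym pair
    return False          # step 209, otherwise branch: not synonyms


calls = []


def low_score(a, b):
    calls.append("low")
    return 0.1


def high_score(a, b):
    calls.append("high")
    return 0.9


def unused_score(a, b):
    calls.append("unused")
    return 1.0


result = is_synonym_pair(
    ("word1", "word2"),
    [(low_score, 0.5), (high_score, 0.5), (unused_score, 0.5)],
)
```

Because the cascade stops at the first score above its threshold, later and possibly more expensive analyses are skipped for pairs that are already accepted.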
CN201610237009.3A 2016-04-15 2016-04-15 A kind of automatic answering system and method Active CN105955976B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610237009.3A CN105955976B (en) 2016-04-15 2016-04-15 A kind of automatic answering system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610237009.3A CN105955976B (en) 2016-04-15 2016-04-15 A kind of automatic answering system and method

Publications (2)

Publication Number Publication Date
CN105955976A true CN105955976A (en) 2016-09-21
CN105955976B CN105955976B (en) 2019-05-14

Family

ID=56917383

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610237009.3A Active CN105955976B (en) 2016-04-15 2016-04-15 A kind of automatic answering system and method

Country Status (1)

Country Link
CN (1) CN105955976B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101174259A (en) * 2007-09-17 2008-05-07 张琰亮 Intelligent interactive request-answering system
CN101398835A (en) * 2007-09-30 2009-04-01 日电(中国)有限公司 Service selecting system and method, and service enquiring system and method based on natural language
US20140337329A1 (en) * 2010-09-28 2014-11-13 International Business Machines Corporation Providing answers to questions using multiple models to score candidate answers
CN103902652A (en) * 2014-02-27 2014-07-02 深圳市智搜信息技术有限公司 Automatic question-answering system
CN104809197A (en) * 2015-04-24 2015-07-29 同程网络科技股份有限公司 On-line question and answer method based on intelligent robot

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106503265A (en) * 2016-11-30 2017-03-15 北京赛迈特锐医疗科技有限公司 Structured search system and its searching method based on weights
CN106599297A (en) * 2016-12-28 2017-04-26 北京百度网讯科技有限公司 Method and device for searching question-type search terms on basis of deep questions and answers
CN106649868A (en) * 2016-12-30 2017-05-10 首都师范大学 Method and device for matching between questions and answers
US11481419B2 (en) 2017-05-17 2022-10-25 Beijing Baidu Netcom Science And Technology Co., Ltd. Method and apparatus for evaluating matching degree based on artificial intelligence, device and storage medium
CN107220317A (en) * 2017-05-17 2017-09-29 北京百度网讯科技有限公司 Matching degree appraisal procedure, device, equipment and storage medium based on artificial intelligence
CN107220317B (en) * 2017-05-17 2020-12-18 北京百度网讯科技有限公司 Matching degree evaluation method, device, equipment and storage medium based on artificial intelligence
CN107453980A (en) * 2017-07-26 2017-12-08 北京小米移动软件有限公司 Problem response method and device in instant messaging
WO2019041517A1 (en) * 2017-08-29 2019-03-07 平安科技(深圳)有限公司 Electronic device, question recognition and confirmation method, and computer-readable storage medium
CN108509474B (en) * 2017-09-15 2022-01-07 腾讯科技(深圳)有限公司 Synonym expansion method and device for search information
CN108509474A (en) * 2017-09-15 2018-09-07 腾讯科技(深圳)有限公司 Search for the synonym extended method and device of information
CN110019701A (en) * 2017-09-18 2019-07-16 京东方科技集团股份有限公司 Method, question and answer service system and storage medium for question and answer service
CN108009253A (en) * 2017-12-05 2018-05-08 昆明理工大学 A kind of improved character string Similar contrasts method
CN109063060A (en) * 2018-07-20 2018-12-21 吴怡 A kind of semantic net legal advice service robot
CN109189897B (en) * 2018-07-27 2020-07-31 什伯(上海)智能技术有限公司 Chatting method and chatting device based on data content matching
CN109299320B (en) * 2018-10-30 2020-09-25 上海智臻智能网络科技股份有限公司 Information interaction method and device, computer equipment and storage medium
CN109299320A (en) * 2018-10-30 2019-02-01 上海智臻智能网络科技股份有限公司 A kind of information interacting method, device, computer equipment and storage medium
CN109710732B (en) * 2018-11-19 2021-03-05 东软集团股份有限公司 Information query method, device, storage medium and electronic equipment
CN109710732A (en) * 2018-11-19 2019-05-03 东软集团股份有限公司 Information query method, device, storage medium and electronic equipment
CN110222192A (en) * 2019-05-20 2019-09-10 国网电子商务有限公司 Corpus method for building up and device
CN110442760A (en) * 2019-07-24 2019-11-12 银江股份有限公司 A kind of the synonym method for digging and device of question and answer searching system
CN110442760B (en) * 2019-07-24 2022-02-15 银江技术股份有限公司 Synonym mining method and device for question-answer retrieval system
CN113609273A (en) * 2021-08-12 2021-11-05 云知声(上海)智能科技有限公司 Method and device for configuring mechanical speech technology, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN105955976B (en) 2019-05-14

Similar Documents

Publication Publication Date Title
CN105955976A (en) Automatic answering system and method
CN108536852B (en) Question-answer interaction method and device, computer equipment and computer readable storage medium
CN106649818B (en) Application search intention identification method and device, application search method and server
CA2556202C (en) Method and apparatus for fundamental operations on token sequences: computing similarity, extracting term values, and searching efficiently
CN106874292B (en) Topic processing method and device
US20100010803A1 (en) Text paraphrasing method and program, conversion rule computing method and program, and text paraphrasing system
CN106970991B (en) Similar application identification method and device, application search recommendation method and server
CN106126619A (en) A kind of video retrieval method based on video content and system
CN105912629A (en) Intelligent question and answer method and device
CN110597978B (en) Article abstract generation method, system, electronic equipment and readable storage medium
CN111078893A (en) Method for efficiently acquiring and identifying linguistic data for dialog meaning graph in large scale
CN113468891A (en) Text processing method and device
CN114168841A (en) Content recommendation method and device
CN111782793A (en) Intelligent customer service processing method, system and equipment
CN114281972A (en) Dialog control method, system storage medium and server based on subject object tracking and cognitive inference
CN111209367A (en) Information searching method, information searching device, electronic equipment and storage medium
CN106649279A (en) Specific information automatic generation system and method
CN111309882B (en) Method and device for realizing intelligent customer service question and answer
CN109635289B (en) Entry classification method and audit information extraction method
CN110866393B (en) Resume information extraction method and system based on domain knowledge base
CN112115237B (en) Construction method and device of tobacco science and technology literature data recommendation model
CN111625722B (en) Talent recommendation method, system and storage medium based on deep learning
CN113836377A (en) Information association method and device, electronic equipment and storage medium
KR101147508B1 (en) Apparatus and Method for recommending of search formula
CN116775813B (en) Service searching method, device, electronic equipment and readable storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20210106

Address after: 100140, 55, Fuxing Avenue, Xicheng District, Beijing

Patentee after: INDUSTRIAL AND COMMERCIAL BANK OF CHINA

Patentee after: ICBC Technology Co.,Ltd.

Address before: 100140, 55, Fuxing Avenue, Xicheng District, Beijing

Patentee before: INDUSTRIAL AND COMMERCIAL BANK OF CHINA

TR01 Transfer of patent right