WO2020233386A1 - AIML-based intelligent question answering method, device, computer equipment and storage medium - Google Patents

Info

Publication number
WO2020233386A1
WO2020233386A1 PCT/CN2020/088052 CN2020088052W WO2020233386A1 WO 2020233386 A1 WO2020233386 A1 WO 2020233386A1 CN 2020088052 W CN2020088052 W CN 2020088052W WO 2020233386 A1 WO2020233386 A1 WO 2020233386A1
Authority
WO
WIPO (PCT)
Prior art keywords
answer
preset
question
text
information
Prior art date
Application number
PCT/CN2020/088052
Other languages
English (en)
French (fr)
Inventor
艾明
Original Assignee
深圳壹账通智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳壹账通智能科技有限公司
Publication of WO2020233386A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Definitions

  • This application relates to the technical field of intelligent question answering robots, in particular to AIML-based intelligent question answering methods, devices, computer equipment and storage media.
  • AIML: Artificial Intelligence Markup Language
  • XML: eXtensible Markup Language
  • AIML enables interaction between users and question answering robots, but the inventor found that current AIML has the following problems when used for Chinese conversations. First, there is essentially no public Chinese rule base, even though the rule base is equivalent to the "brain" of the dialogue robot.
  • Second, the AIML interpreter does not support Chinese well. For example, English input uses spaces to separate words, while Chinese input is not separated by spaces, which leads to a low question recognition rate and matching rate.
  • The main purpose of this application is to provide an AIML-based intelligent question answering method, device, computer equipment and storage medium that expand the AIML Chinese rule base, address the existing AIML's poor support for Chinese, and improve the question recognition rate and matching rate.
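  • This application proposes an AIML-based intelligent question answering method, including: obtaining question information input by a user, and obtaining text information according to the question information; converting the Chinese in the text information into the same Chinese font to obtain the first text corresponding to the text information, where the Chinese font is simplified Chinese or traditional Chinese; deleting the specified symbols in the first text according to preset filtering rules to obtain the second text; performing Chinese word segmentation on the second text according to preset Chinese word segmentation rules to obtain multiple first fields corresponding to the second text; performing synonym matching on each first field to obtain a second field corresponding to each first field; replacing, according to the second field, the first field corresponding to the second field in the second text to obtain multiple target texts; matching each target text with the preset question in a preset question and answer file, where the preset question and answer file contains the mapping relationship information between the preset question and the first answer; and, if the target text is successfully matched with a preset question in the preset question and answer file, obtaining the first answer corresponding to the preset question and using the first answer as the answer to the question information.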
  • This application also proposes an AIML-based intelligent question answering device, which includes: a first acquisition module for acquiring question information input by a user and obtaining text information according to the question information; a conversion module for converting the Chinese in the text information into the same Chinese font to obtain the first text corresponding to the text information, where the Chinese font is simplified Chinese or traditional Chinese; a filtering module for deleting the specified symbols in the first text according to preset filtering rules to obtain the second text; a word segmentation module for performing Chinese word segmentation on the second text according to preset Chinese word segmentation rules to obtain multiple first fields corresponding to the second text; a first matching module for performing synonym matching on each first field to obtain a second field corresponding to each first field; a replacement module for replacing, according to the second field, the first field corresponding to the second field in the second text to obtain multiple target texts; a second matching module for matching each target text with the preset question in the preset question and answer file, where the preset question and answer file contains the mapping relationship information between the preset question and the first answer; and a second acquisition module for obtaining, if the target text is successfully matched with a preset question in the preset question and answer file, the first answer corresponding to the preset question and using the first answer as the answer to the question information.
  • This application also proposes a computer device, including a memory and an executor, the memory stores a computer program, and the executor implements the steps of the AIML-based intelligent question answering method when the computer program is executed.
  • This application also proposes a storage medium on which a computer program is stored. When the computer program is executed by an executor, the steps of the AIML-based intelligent question answering method are realized.
  • This application expands the Chinese rule base by configuring, in the intelligent question and answer database, the question and answer data table, the synonym table, the professional vocabulary for Chinese word segmentation, and the correspondence table between traditional and simplified Chinese. Normalization processing is performed on the text information corresponding to the question information, including font conversion, special symbol filtering, Chinese word segmentation, synonym matching and text replacement, which enhances AIML's support for Chinese so that AIML can better recognize text information, thereby improving the question recognition rate and matching rate. Preset question and answer files are configured for the various business types to ensure data security and avoid mutual interference, so that AIML supports multiple business question and answer scenarios at the same time. The answer to the question information is obtained by matching the target text with the preset question.
  • Figure 1 is a schematic diagram of the steps of an AIML-based intelligent question answering method in an embodiment of this application
  • Figure 2 is a schematic structural diagram of an AIML-based intelligent question answering device in an embodiment of the application
  • Fig. 3 is a schematic structural diagram of a computer device in an embodiment of the application.
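  • The AIML-based intelligent question answering method in an embodiment of the present application includes:
  • S1 Obtain question information input by a user, and obtain text information according to the question information;
  • S2 Convert the Chinese in the text information into the same Chinese font to obtain the first text corresponding to the text information, where the Chinese font is simplified Chinese or traditional Chinese;
  • S3 Delete the specified symbols in the first text according to preset filtering rules to obtain the second text;
  • S4 Perform Chinese word segmentation on the second text according to preset Chinese word segmentation rules to obtain multiple first fields corresponding to the second text;
  • S5 Perform synonym matching on each first field to obtain a second field corresponding to each first field;
  • S6 Replace, according to the second field, the first field corresponding to the second field in the second text to obtain multiple target texts;
  • S7 Match each target text with a preset question in a preset question and answer file, where the preset question and answer file contains the mapping relationship information between the preset question and the first answer;
  • S8 If the target text is successfully matched with a preset question in the preset question and answer file, obtain the first answer corresponding to the preset question and use the first answer as the answer to the question information.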
  • The access terminal for the question information may be a question and answer dialogue scenario such as WeChat chat or online question answering and customer service on a website.
  • The question information can be input by the user's voice or manually. If it is input by voice, the voice is converted into text information through a voice-to-text conversion tool.
  • The question information can include questions of multiple business types.
  • The intelligent question and answer file corresponding to each business type can be configured and maintained through the background management system; that is, each business type has its own table of questions and answers. The intelligent question answering robot can therefore support question information of multiple business types at the same time, with the business types separated from each other and not interfering with each other, which overcomes the earlier limitation that an intelligent question answering robot could support only one business type.
  • Converting the Chinese to the same Chinese font means converting the simplified Chinese in the text information into traditional Chinese, or converting the traditional Chinese in the text information into simplified Chinese.
  • The conversion follows the Chinese font of the preset question and answer file. If the preset questions in the preset question and answer file are in simplified Chinese and the text information corresponding to the question information input by a Hong Kong user is in traditional Chinese, the traditional Chinese in the text information is converted to simplified Chinese.
  • The conversion can be based on a configuration file that records the correspondence between traditional and simplified Chinese, or on a database table that records the same correspondence. Because a database table makes it easy to add or modify data flexibly through the background management system interface, a database table is preferred. The conversion can also be implemented with the help of an open source toolkit, such as zhconverter in Java.
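  • The table-driven conversion described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the mapping `TRAD_TO_SIMP` is a hypothetical five-character excerpt of a correspondence table, and `convert_font` is an assumed helper name. A real table would be loaded from the database, and simplified-to-traditional conversion is one-to-many in general, which this sketch ignores.

```python
# Hypothetical excerpt of a traditional-to-simplified correspondence table.
# A real deployment would load the full table from a database or config file.
TRAD_TO_SIMP = {"養": "养", "險": "险", "辦": "办", "處": "处", "麼": "么"}
SIMP_TO_TRAD = {simp: trad for trad, simp in TRAD_TO_SIMP.items()}


def convert_font(text: str, target: str = "simplified") -> str:
    """Convert every character to the target font; unknown characters pass through."""
    table = TRAD_TO_SIMP if target == "simplified" else SIMP_TO_TRAD
    return "".join(table.get(ch, ch) for ch in text)


first_text = convert_font("辦理養老保險有什麼好處")
```

  • Converting to the font of the preset question and answer file before matching means the question table only has to be stored in one font.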
  • The preset filtering rules are processing rules for specified punctuation marks and spaces in the text, set according to Chinese language habits. Spaces in the text are deleted, dashes in the text are also deleted, and if a foreigner's first name and last name are connected with a middle dot (·), the filtering rule deletes that symbol as well.
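  • A filtering rule of this kind might be expressed as a single regular expression. The sketch below is an assumption about what the "specified symbols" are: whitespace, hyphens and dashes, and the dot characters commonly used between the parts of a transliterated name. The names `SPECIFIED_SYMBOLS` and `filter_symbols` are illustrative only.

```python
import re

# Assumed symbol set: whitespace, hyphen-minus, en dash, em dash, and three
# dot-like name separators (middle dot, bullet, katakana middle dot).
SPECIFIED_SYMBOLS = re.compile(r"[\s\-\u2013\u2014\u00b7\u2022\u30fb]")


def filter_symbols(first_text: str) -> str:
    """Delete the specified symbols from the first text to obtain the second text."""
    return SPECIFIED_SYMBOLS.sub("", first_text)


second_text = filter_symbols("迈克尔\u00b7乔丹 的球衣\u2014\u2014号码")
```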
  • The preset Chinese word segmentation rules include performing full segmentation and atomic segmentation of the second text at the same time, selecting the segmentation plan with the optimal path according to a hidden Markov model and the Viterbi algorithm, and then performing name recognition, system dictionary supplementation, and user self-segmentation.
  • Chinese word segmentation divides the text information into multiple phrases or fields. It can be realized with open source Chinese word segmentation tools such as Ansj. Ansj supports custom dictionaries, so users can edit the proprietary vocabulary corresponding to a business type, such as the name of a company's insurance product (for example, eLife Insurance). This lets AIML support question information of different business types and improves the question recognition rate.
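  • To show why a custom dictionary changes the segmentation result, here is a small sketch that uses forward maximum matching. This is a deliberately simpler technique than the HMM and Viterbi approach the text attributes to the preset rules, and it does not reproduce Ansj's behavior; the function `segment` and both dictionaries are hypothetical.

```python
def segment(text: str, dictionary: set, max_len: int = 4) -> list:
    """Forward maximum matching: at each position take the longest dictionary word.

    Characters not covered by any dictionary word become single-character fields.
    """
    fields = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                fields.append(candidate)
                i += size
                break
    return fields


base_dictionary = {"办理", "养老", "保险", "什么", "好处"}
# Hypothetical business vocabulary added by a user for the insurance business type.
custom_dictionary = base_dictionary | {"养老保险"}

base_fields = segment("办理养老保险有什么好处", base_dictionary)
custom_fields = segment("办理养老保险有什么好处", custom_dictionary)
```

  • With the extra entry, the product name stays together as one field instead of being split in two, which is the effect a business-specific vocabulary is meant to achieve.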
  • Synonym matching takes the phrases or fields produced by Chinese word segmentation and finds the synonyms corresponding to each of them. It relies on a configuration file or database table that records the correspondence between Chinese words and their synonyms.
  • Replacing the first field corresponding to the second field according to the second field means replacing a phrase or field in the text information with its corresponding synonym to obtain a new text. The first fields in the second text are replaced multiple times, each time replacing one or more of the first fields with the corresponding second fields, thereby obtaining multiple target texts.
  • The step S6 of replacing the first field corresponding to the second field according to the second field to obtain multiple target texts includes:
  • In step S61, the replacement includes replacing one field among all the first fields with its corresponding second field each time, and replacing multiple fields among all the first fields with their corresponding second fields each time. When one first field corresponds to multiple second fields, the first field is replaced by each of those second fields in turn. For example, if the text information is "What are the benefits of handling endowment insurance?", Chinese word segmentation of the text yields first fields such as "handling", "endowment", "insurance", "benefits" and "what", and synonym matching is then performed on them.
  • The original text information "What are the benefits of handling endowment insurance?" is also used as a target text.
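  • One way to enumerate the target texts is to take the Cartesian product of each first field with its synonyms. The sketch below assumes English tokens joined by spaces for readability; `build_target_texts` and the synonym table are hypothetical, and the original text is always the first element of the result.

```python
from itertools import product


def build_target_texts(first_fields, synonyms, sep=" "):
    """Return the original text plus every text obtained by replacing one or
    more first fields with one of their second fields (synonyms)."""
    options = [[field] + list(synonyms.get(field, [])) for field in first_fields]
    texts = [sep.join(combo) for combo in product(*options)]
    # dict.fromkeys removes duplicates while keeping the original order.
    return list(dict.fromkeys(texts))


first_fields = ["handling", "endowment", "insurance", "benefits"]
synonym_table = {"handling": ["having", "buying"], "benefits": ["advantages"]}
target_texts = build_target_texts(first_fields, synonym_table)
```

  • The number of target texts grows multiplicatively with the number of synonyms per field, so a production system would probably cap the list.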
  • If the question information entered by the user is in traditional Chinese, the returned data is also converted to traditional Chinese and used as the answer to the question information.
  • Because multiple target texts are obtained, AIML can better recognize the text information, which improves the question recognition rate and matching rate.
  • The preset question and answer file is a question and answer data table corresponding to multiple business types in the intelligent question and answer database, and the question and answer data table includes a correspondence table between the preset questions and the first answers.
  • The preset question and answer files can be configured through the background management system. Specifically, each business person logs in to the background management system with his or her account. After entering the system, the business person can configure the question and answer data table, the synonym table, the professional vocabulary list for Chinese word segmentation, the correspondence table between traditional and simplified Chinese, and so on, and can modify or delete the data in those tables.
  • The configuration is passed to the intelligent question answering system interface, which triggers the intelligent question answering system to generate the question and answer data table, the synonym table, the professional vocabulary list for Chinese word segmentation, the correspondence table between traditional and simplified Chinese, and so on, or to update the data in those tables.
  • Each business person can only see and configure the content related to his or her own business type, which ensures data security and avoids mutual interference.
  • The user may be prompted to select the business type to consult. For example, if the user selects the wealth management business option, the target text is matched against the preset questions in the question and answer data table corresponding to the wealth management business.
  • In step S8, a successful match means that the question and answer data table contains a preset question equivalent to the question information, so the first answer corresponding to that preset question is used as the answer to the target text.
  • Because the multiple target texts are derived from the same text information, their meanings are basically the same, so a single first answer is usually obtained as the answer to the text information. However, some target texts may differ in meaning and match a first answer that is inconsistent with the answers of the other target texts; in that case, the different first answers matched by the target texts are all returned to the front-end interface for the user's reference.
  • step S1 of obtaining question information input by a user and obtaining text information according to the question information includes:
  • S12 Perform voice preprocessing on the voice signal to obtain an observation sequence of the voice signal
  • The voice signal is composed of the user's voice and environmental noise, and the environmental noise interferes with voice recognition, so voice preprocessing is performed on the voice signal.
  • The voice preprocessing divides the voice signal into frames using VAD (Voice Activity Detection) technology and establishes an HMM (Hidden Markov Model) corresponding to the voice signal.
  • The user's voice signal is divided into overlapping voice frames according to its period, so that the LPC (Linear Predictive Coding) spectrum estimates are correlated from frame to frame. An endpoint detection algorithm finds the start and end of the voice, and the intensity and number of zero crossings of each speech frame are then used to calculate an energy and zero-crossing threshold, which removes most of the environmental noise. The speech signal is passed through a low-order low-pass filter to flatten the signal in the frequency domain and weaken the influence of finite word length effects during signal processing. Each speech frame is windowed to reduce the signal discontinuity at the start and end of the frame. Autocorrelation analysis is performed on each speech frame to obtain the autocorrelation coefficients, and the Levinson-Durbin algorithm is used to find the LPC coefficients. The LPC coefficients are weighted using a tapered window to obtain the cepstral coefficients, and the cepstral coefficients are used as the feature vector of the speech frame. Furthermore, the time-domain Ce
  • In steps S13 and S14, HMM training is performed for each phoneme in the preset Chinese vocabulary (the fields contained in the preset text): digitized voice sample values are obtained and then go through preprocessing, feature vector extraction, vector quantization, Baum-Welch modeling, and so on, to obtain the observation sequence of the speech model corresponding to the preset text. When the observation sequence of the speech signal is matched against the observation sequences of the speech models corresponding to the preset text, a probability is calculated for each observation sequence of the speech signal.
  • A maximum likelihood estimation algorithm is used to calculate the maximum probability, that is, the similarity between the observation sequence of the speech signal and the observation sequence corresponding to the preset text. If the maximum probability is greater than the preset similarity, the preset text whose observation sequence has the greatest probability for the observation sequence of the speech signal is used as the text information corresponding to the question information.
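  • The framing and endpoint detection part of the preprocessing described above can be illustrated with short-time energy and zero-crossing counts. This is a toy sketch under stated assumptions: the frame length, hop size and energy threshold are arbitrary illustrative values, the function names are hypothetical, and the filtering, windowing, LPC and HMM stages are not covered.

```python
def split_frames(samples, frame_len, hop):
    """Split a signal into overlapping frames (hop < frame_len gives overlap)."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def short_time_energy(frame):
    """Sum of squared sample values, used as the frame intensity."""
    return sum(x * x for x in frame)


def zero_crossings(frame):
    """Number of sign changes between neighbouring samples."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))


def find_endpoints(samples, frame_len, hop, energy_threshold):
    """Return (first, last) indices of frames whose energy reaches the threshold,
    or None when no frame does."""
    active = [i for i, frame in enumerate(split_frames(samples, frame_len, hop))
              if short_time_energy(frame) >= energy_threshold]
    return (active[0], active[-1]) if active else None


# Silence, a burst of alternating samples, then silence again.
signal = [0] * 8 + [1, -1] * 4 + [0] * 8
endpoints = find_endpoints(signal, frame_len=4, hop=2, energy_threshold=2)
```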
  • the step S7 of matching each target text with the preset question in the preset question and answer file includes:
  • The semantic analysis includes performing Chinese word segmentation on the text information, using a statistical language model to determine the optimal segmentation result, calculating the weight of each term after segmentation with a term-weighting method, and extracting the core words of the text according to the weight of each term.
  • The language model can be an HMM-based N-gram model or a language model based on a recurrent neural network, such as a current state-of-the-art language model. Term-weighting methods include TF-IDF, Okapi, MI, ATC, LTU, and so on. The more times a term appears in the text, the greater its weight and the more important it is.
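  • As an illustration of term weighting, the sketch below computes TF-IDF weights and picks the highest-weighted terms as core words. The smoothed IDF formula, the tiny corpus, and the names `tf_idf` and `core_words` are assumptions chosen for the example; the source does not specify which variant is used.

```python
import math
from collections import Counter


def tf_idf(terms, corpus):
    """Weight each term by term frequency times a smoothed inverse document frequency."""
    counts = Counter(terms)
    n_docs = len(corpus)
    weights = {}
    for term, count in counts.items():
        doc_freq = sum(1 for doc in corpus if term in doc)
        idf = math.log((1 + n_docs) / (1 + doc_freq)) + 1
        weights[term] = (count / len(terms)) * idf
    return weights


def core_words(terms, corpus, top_k=2):
    """Return the top_k terms by weight, breaking ties alphabetically."""
    weights = tf_idf(terms, corpus)
    return sorted(weights, key=lambda t: (-weights[t], t))[:top_k]


# Hypothetical corpus of already segmented preset questions.
corpus = [["benefits", "insurance"], ["loan", "insurance"], ["loan", "rate"]]
question_terms = ["benefits", "endowment", "insurance"]
top_terms = core_words(question_terms, corpus)
```

  • Terms that are rare across the corpus, such as "endowment" here, receive a higher weight than terms that appear in many preset questions.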
  • Each service type corresponds to one or more first preset question and answer files, and the first preset question and answer files are included in the preset question and answer files.
  • Matching the target text with the preset question is in essence a text matching process. Text matching models can be divided into single semantic models, multiple semantic models, matching matrix models and deep inter-sentence models.
  • A single semantic model encodes the two sentences with a fully connected, CNN or RNN network and then calculates the similarity between the sentence representations, without considering the local structure of phrases in the sentences; an example is DSSM (Deep Structured Semantic Models). A multiple semantic model interprets sentences at multiple granularities and takes the local structure of the sentence into account; an example is MV-LSTM (Multi-View Long Short-Term Memory). A matching matrix model calculates the similarity between the different words of the two sentences and then uses a deep network to extract features, which captures the interaction between different words and handles relationships within the sentences more precisely; an example is Text Matching as Image Recognition. A deep inter-sentence model uses interaction mechanisms such as attention and a more refined structure to mine the relationships between different words within and between sentences, as in current state-of-the-art models.
  • With any one of these text matching models, the similarity between the target text and the question text can be calculated using the model's similarity algorithm, and it is then judged whether the similarity reaches the preset threshold. If it does, the match succeeds; otherwise, the match fails.
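  • The threshold decision itself is independent of which matching model supplies the similarity. The sketch below substitutes a plain bag-of-words cosine similarity for the neural models named above, purely to make the thresholding step concrete; the threshold value 0.7, the question and answer file contents, and the function names are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(a_terms, b_terms):
    """Cosine similarity between two bags of words."""
    a, b = Counter(a_terms), Counter(b_terms)
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def match_answers(target_texts, qa_file, threshold=0.7):
    """Collect the first answers of all preset questions whose similarity to any
    target text reaches the preset threshold (duplicates removed)."""
    answers = []
    for target in target_texts:
        for question, answer in qa_file.items():
            similar = cosine_similarity(target.split(), question.split())
            if similar >= threshold and answer not in answers:
                answers.append(answer)
    return answers


qa_file = {"what are the advantages of endowment insurance": "hypothetical first answer"}
targets = ["what are the benefits of handling endowment insurance",
           "what are the advantages of buying endowment insurance"]
matched = match_answers(targets, qa_file)
```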
  • For example, the question information is the text "What are the benefits of handling endowment insurance?", and the target texts include "What are the benefits of having endowment insurance?" and "What are the advantages of buying endowment insurance?".
  • Each target text is matched with the preset questions in the first preset question and answer file: the similarity with each preset question is calculated for "What are the benefits of having endowment insurance?", "What are the advantages of buying endowment insurance?" and the original "What are the benefits of handling endowment insurance?".
  • If the similarity between one or more target texts and the preset question "What are the advantages of endowment insurance?" in the first preset question and answer file reaches the preset threshold, the answer corresponding to "What are the advantages of endowment insurance?" is used as the answer to "What are the benefits of handling endowment insurance?".
  • Multiple target texts may also match multiple question texts in the first preset question and answer file. Although this is a low-probability event, if it happens, the answers corresponding to those question texts are all used as answers to the question information for the user's reference.
  • Before the step S7 of matching each target text with the preset question in the preset question and answer file, the method includes:
  • S071 Receive preset questions and first answers corresponding to various service types respectively;
  • S072 Write the preset question and first answer corresponding to the first service type in the first preset question and answer file, where the first service type is included in all service types, and the first preset question and answer file is included in all preset questions and answers File.
  • The first preset question and answer file can be configured through the background management system. Specifically, each business person logs in to the background management system (part of the intelligent question answering system) with his or her account, enters an instruction to create a first preset question and answer file, and enters the preset questions and first answers for the business type that he or she is responsible for. For example, a business person in charge of the basketball business logs in to the background management system with his account. The background management system identifies the account and, according to the preset permissions corresponding to the account (only the editing and browsing permissions related to the basketball business), displays the editable question and answer data table corresponding to those permissions.
  • The business person enters the preset questions and first answers related to the basketball business in the question and answer data table, such as the preset question "What is the jersey number Yao Ming retired in the NBA Rockets" and the first answer "No. 11".
  • The background management system receives the preset question and first answer and, according to the confirmation instruction entered by the user, generates the first preset question and answer file corresponding to the basketball business from the question and answer data table.
  • Likewise, the business person in charge of the football business can only edit the first preset question and answer file corresponding to the football business. Because of account permissions, each business person can only see and configure the content related to his or her own business type, which ensures data security, avoids mutual interference, and enables the intelligent question answering system to support multiple business scenarios at the same time.
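  • The per-business isolation described above can be pictured as one question and answer mapping per business type, guarded by an account permission check. The class `QAStore`, the account names and the permission table below are hypothetical; the source does not describe how the background management system stores permissions.

```python
class QAStore:
    """Keeps one preset question-and-answer file per business type and only
    lets an account edit the business types it is permitted to manage."""

    def __init__(self, permissions):
        self._permissions = permissions  # account -> set of business types
        self._files = {}                 # business type -> {question: first answer}

    def write(self, account, business_type, question, answer):
        if business_type not in self._permissions.get(account, set()):
            raise PermissionError(f"{account} may not edit {business_type}")
        self._files.setdefault(business_type, {})[question] = answer

    def file_for(self, business_type):
        """Return a copy of the preset Q&A file for one business type."""
        return dict(self._files.get(business_type, {}))


store = QAStore({"basketball_staff": {"basketball"}, "football_staff": {"football"}})
store.write("basketball_staff", "basketball",
            "What is the jersey number Yao Ming retired in the NBA Rockets", "No. 11")
```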
  • the method includes:
  • S703 Analyze the returned data, obtain a number of second answers with higher relevance in the returned data, and use the plurality of second answers as answers corresponding to the text information.
  • Data crawling technology is used to obtain the answer to the user's question.
  • The data crawling technology is a web crawler, a program or script that automatically crawls information from specified URLs according to certain rules. Web crawlers are widely used by Internet search engines and similar websites to automatically collect all the pages they can access, in order to obtain or update the content and retrieval methods of those websites.
  • According to the business type in the text information, the web crawler determines in advance the calling address (a preset URL address) of a search engine such as Baidu search. The program sends to the calling address a search query request that carries the text information and asks for a number of second answers corresponding to it, obtains the return data (HTML code) returned by the calling address, and then parses that HTML code with jsoup (a Java HTML parser) to obtain the second answers corresponding to the text information.
  • Finding the answer to a question through crawling is itself a process of filtering out relevant answers. For example, the answer list displayed when searching for the question "What are the benefits of endowment insurance" through Baidu search has already been screened by the Baidu search engine, but to obtain a more accurate answer, the most relevant answer in the list is taken as the second answer.
  • Such a second answer is highly relevant and reliable.
  • The first several answers may also be provided for the user's reference. The relevance of an answer to the question can be analyzed according to its position in the search result list, the number of times the result has been viewed, the number of likes, and the number of times it was marked useful.
  • The answer value = (preset score value corresponding to the position / position in the result list) * weight 1 + number of views of the result * weight 2 + number of likes * weight 3 + useful number * weight 4. The value of each answer in the result list is calculated in this way, and the answer with the highest value is used as the second answer.
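  • The answer value formula can be written out directly. In the sketch below, the preset position score of 100 and the four weights are illustrative assumptions (the source gives no numeric values), and the two search results are made-up data used only to exercise the formula.

```python
def answer_value(position, views, likes, useful,
                 position_score=100.0, weights=(0.4, 0.2, 0.2, 0.2)):
    """(preset position score / position) * w1 + views * w2 + likes * w3 + useful * w4."""
    w1, w2, w3, w4 = weights
    return (position_score / position) * w1 + views * w2 + likes * w3 + useful * w4


def pick_second_answer(results):
    """Return the text of the result with the highest answer value."""
    best = max(results, key=lambda r: answer_value(
        r["position"], r["views"], r["likes"], r["useful"]))
    return best["text"]


# Made-up search results; positions are 1-based ranks in the result list.
results = [
    {"text": "A", "position": 1, "views": 10, "likes": 1, "useful": 0},
    {"text": "B", "position": 2, "views": 200, "likes": 30, "useful": 12},
]
second_answer = pick_second_answer(results)
```

  • With these weights a lower-ranked result can still win if it has many more views, likes and useful marks, which is the point of combining the four signals.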
  • the second answer corresponding to the text information is obtained through data crawling technology, so that the machine has a learning function.
  • the method includes:
  • S706 Based on the user selecting the second answer as a useful option, accumulate the first useful number corresponding to each second answer;
  • The second answers are displayed in the form of a list.
  • The second answers can also be converted into voice output.
  • Each second answer in the answer list has a corresponding option for marking that second answer as useful.
  • For a second answer the user agrees with, the user can select its useful option, and the first useful number of that second answer is accumulated.
  • This allows the intelligent question answering system to recommend answers to new questions. For example, after viewing a second answer the user judges it: if the user finds it good, the useful option can be selected; if not, the useless option can be selected.
  • The answer with the highest count is used as the answer to subsequent questions. That is, when the first useful number of one of the second answers reaches the preset value, that second answer is added to the preset question and answer file as the answer to the corresponding question, which enhances the learning ability of the question answering robot and makes it more intelligent.
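  • A minimal sketch of this feedback loop is shown below, assuming that useful counts are kept per (question, answer) pair and that the preset value is a simple integer threshold. `FeedbackTracker` and its method names are hypothetical.

```python
class FeedbackTracker:
    """Accumulates the first useful number of each second answer and promotes an
    answer into the preset Q&A file once the count reaches the preset value."""

    def __init__(self, qa_file, preset_value):
        self.qa_file = qa_file          # question -> first answer
        self.preset_value = preset_value
        self.useful = {}                # (question, answer) -> useful count

    def mark_useful(self, question, answer):
        key = (question, answer)
        self.useful[key] = self.useful.get(key, 0) + 1
        if self.useful[key] >= self.preset_value:
            self.qa_file[question] = answer
        return self.useful[key]


preset_qa_file = {}
tracker = FeedbackTracker(preset_qa_file, preset_value=3)
for _ in range(2):
    tracker.mark_useful("new question", "crawled answer")
promoted_early = "new question" in preset_qa_file
tracker.mark_useful("new question", "crawled answer")
```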
  • the method further includes:
  • When the question answering robot again receives text information as in step S703 and that text information matches the first text information, the first answer corresponding to the first text information in the preset question and answer file is called and returned to the front end.
  • The UI displays both a useful option and a useless option for the first answer. Because answers are time-sensitive, it is judged whether the second useful number is less than the useless number. When the second useful number is less than the useless number, the first answer is no longer accepted by users, so the first answer and its corresponding first text information are deleted. For example, people in ancient times believed it was correct that the sun moves around the earth, but today people consider this wrong, so the useless number for such an answer keeps increasing until it exceeds the second useful number.
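  • The deletion rule reduces to a comparison of two counters per stored question. The sketch below assumes the counters are kept in plain dictionaries keyed by the first text information; `prune_stale_answers` is a hypothetical helper name.

```python
def prune_stale_answers(qa_file, useful_counts, useless_counts):
    """Delete every entry whose second useful number is less than its useless
    number, and return the list of deleted questions."""
    removed = [question for question in qa_file
               if useful_counts.get(question, 0) < useless_counts.get(question, 0)]
    for question in removed:
        del qa_file[question]
    return removed


stored = {"does the sun move around the earth": "yes", "other question": "other answer"}
useful_counts = {"does the sun move around the earth": 2, "other question": 5}
useless_counts = {"does the sun move around the earth": 9, "other question": 1}
deleted = prune_stale_answers(stored, useful_counts, useless_counts)
```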
  • the AIML-based intelligent question answering device includes:
  • the first obtaining module 1 is used to obtain question information input by a user, and obtain text information according to the question information;
  • the conversion module 2 is used to convert the Chinese in the text information into the same Chinese font to obtain the first text corresponding to the text information, and the Chinese font is simplified or traditional Chinese;
  • the filtering module 3 is used to delete the specified symbols in the first text according to the preset filtering rules to obtain the second text;
  • the word segmentation module 4 is used to perform Chinese word segmentation on the second text according to preset Chinese word segmentation rules to obtain multiple first fields corresponding to the second text;
  • the first matching module 5 is configured to perform synonym matching on each first field to obtain a second field corresponding to each first field;
  • the replacement module 6 is configured to replace the first field corresponding to the second field in the second text according to the second field to obtain multiple target texts;
  • the second matching module 7 is configured to match each target text with the preset question in the preset question and answer file, and the preset question and answer file contains the mapping relationship information between the preset question and the first answer;
  • the second obtaining module 8 is configured to obtain the first answer corresponding to the preset question if the target text is successfully matched with the preset question in the preset question and answer file, so as to use the first answer as the answer to the question information.
  • the access terminal for the above question information may be a question-and-answer dialogue scenario such as WeChat chat, web site online question-and-answer customer service, etc.
  • the question information can be input by the user's voice or manually. If it is a voice input, the voice is converted into text information through a voice-to-text conversion tool.
  • the above question information can include questions of multiple business types.
  • the intelligent question and answer file corresponding to each business type can be configured and maintained through the background management system, that is, each business type has its own table of questions and answers. The intelligent question answering robot can therefore support question information of multiple business types at the same time, with the types kept separate and not interfering with each other, which overcomes the earlier limitation that one intelligent question answering robot could support only one business type.
  • converting the Chinese into a single Chinese script means converting the Simplified Chinese in the text information into Traditional Chinese, or converting the Traditional Chinese in the text information into Simplified Chinese.
  • the script is converted according to the Chinese script of the preset question and answer file. For example, if the preset questions in the preset question and answer file are in Simplified Chinese and the text information corresponding to the question information entered by a Hong Kong user is in Traditional Chinese, the Traditional Chinese in the text information is converted to Simplified Chinese.
  • the conversion can rely on a configuration file that holds the correspondence between Traditional and Simplified Chinese, or on a database table that holds the same correspondence. Because the data in a database table is easy to add or modify through the background management system interface, a database table is preferred. The conversion can also be implemented with the help of an open source toolkit, such as zhconverter in Java.
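The correspondence-table approach can be sketched as follows. This is a minimal illustration under stated assumptions, not the application's implementation: the table `TRAD_TO_SIMP` holds only a few hypothetical entries, and a plain character-by-character lookup ignores the one-to-many cases that a real conversion table or toolkit has to handle.

```python
# Hypothetical excerpt of a Traditional -> Simplified correspondence table.
# In the text above this data would live in a configuration file or database table.
TRAD_TO_SIMP = {"辦": "办", "養": "养", "險": "险", "處": "处", "麼": "么"}


def to_simplified(text, table=TRAD_TO_SIMP):
    """Map each character through the table; unknown characters pass through unchanged."""
    return "".join(table.get(ch, ch) for ch in text)


def to_traditional(text, table=TRAD_TO_SIMP):
    """Apply the reversed table to convert Simplified characters back to Traditional."""
    reverse = {simp: trad for trad, simp in table.items()}
    return "".join(reverse.get(ch, ch) for ch in text)
```

The reverse direction is what would be used to return an answer in Traditional Chinese when the user asked in Traditional Chinese.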
  • in the filtering module 3, the word segmentation module 4 and the first matching module 5 above, the preset filtering rules are processing rules for specified punctuation marks and spaces in the text, defined according to Chinese language habits. If spaces appear in the text, they are deleted; if a dash appears in the text, it is deleted as well. Likewise, if a foreigner's given name and surname are joined with a "·" sign, the filtering rule also deletes that symbol.
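A filtering rule of this kind can be sketched with a regular expression. The symbol set below (whitespace, hyphen, en and em dashes, and the middle dot) is an assumption chosen to mirror the examples in the text; the actual set of "specified symbols" is whatever the preset filtering rules define.

```python
import re

# Assumed "specified symbols": whitespace, hyphen, en/em dashes, and the middle dot.
SPECIFIED_SYMBOLS = re.compile(r"[\s\-\u2013\u2014\u00b7]")


def filter_symbols(first_text):
    """Delete the specified symbols from the first text to obtain the second text."""
    return SPECIFIED_SYMBOLS.sub("", first_text)
```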
  • the above-mentioned preset Chinese word segmentation rules include: performing full segmentation and atomic segmentation on the second text at the same time, planning the segmentation along the optimal path according to the hidden Markov model and the Viterbi algorithm, and then performing name recognition, system dictionary supplementation, user-defined dictionary supplementation, and part-of-speech tagging. Full segmentation separates out all the words in the text, and atomic segmentation separates out all the Chinese characters in the text.
  • the above Chinese word segmentation divides the text information into multiple phrases or fields. It can be realized with an open source Chinese word segmentation tool such as Ansj. Ansj supports custom dictionaries, so users can add the proprietary vocabulary of a business type, for example the name of a company's insurance product, eLife Insurance. This lets AIML support question information of different business types and improves the question recognition rate.
  • the above-mentioned synonym matching finds the synonyms of each phrase or field obtained by the Chinese word segmentation, based on a configuration file or database table that holds the correspondence between Chinese words and their synonyms.
  • replacing the first field corresponding to the second field according to the second field means replacing a phrase or field in the text information with one of its synonyms to obtain a new text. The first fields in the second text are replaced multiple times, each time replacing one or more first fields with their corresponding second fields, to obtain multiple target texts.
  • the aforementioned replacement module 6 includes:
  • the replacement unit is used to replace, in the second text, one or more of the first fields with the corresponding second fields to obtain multiple target texts.
  • replacing one or more of the first fields with the corresponding second fields includes replacing, each time, one of the first fields with one second field corresponding to it, and replacing, each time, several of the first fields each with one corresponding second field; when a first field corresponds to multiple second fields, the first field is replaced by each of those second fields in turn.
  • for example, the text information is "What are the benefits of handling pension insurance?"
  • the text information is segmented into first fields such as "handling", "pension", "insurance", "benefits" and "what are", and synonyms are then matched for each first field.
  • the original text information "What are the benefits of handling pension insurance?" is also used as a target text.
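The replacement step amounts to enumerating every combination of "keep the first field" or "swap in one of its second fields". A minimal sketch follows, assuming the segmentation result and the synonym table are already available; the function name `generate_target_texts` and the sample data are illustrative only.

```python
from itertools import product


def generate_target_texts(first_fields, synonyms):
    """Return every text obtained by keeping or replacing each first field.

    For each first field the choices are the field itself followed by its
    second fields (synonyms). The first combination is the original text.
    """
    choices = [[field] + synonyms.get(field, []) for field in first_fields]
    targets = []
    for combo in product(*choices):
        text = "".join(combo)
        if text not in targets:  # guard against duplicate texts
            targets.append(text)
    return targets


# Sample data following the example in the text (assumed segmentation).
fields = ["办理", "养老", "保险", "的", "好处", "是什么"]
synonym_table = {"办理": ["置备", "购买"], "养老": ["退休"], "好处": ["优点", "益处"]}
target_texts = generate_target_texts(fields, synonym_table)
```

With three choices for "办理", two for "养老" and three for "好处", the sample yields 3 × 2 × 3 = 18 target texts, the original text included.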
  • to improve the user experience, if the question information entered by the user is in Traditional Chinese, the returned data is also converted into Traditional Chinese and used as the answer to the question information.
  • multiple target texts are obtained from one piece of text information, so that AIML can better recognize the text information, which improves the question recognition rate and matching rate.
  • the preset question and answer file is a question and answer data table corresponding to multiple service types in the intelligent question and answer database, and the question and answer data table includes a corresponding table of the preset question and the first answer.
  • the above-mentioned preset question and answer files can be configured through the background management system. Specifically, each business person can log in to the background management system with his or her account. After entering the system, the business person can configure the question and answer data table, the synonym table, the professional vocabulary list for Chinese word segmentation, the correspondence table between Traditional and Simplified Chinese, and so on, and can modify or delete the data in these tables.
  • after configuration is complete, a confirmation instruction is entered to call the intelligent question answering system interface, which triggers the intelligent question answering system to generate the question and answer data table, the synonym table, the professional vocabulary list for Chinese word segmentation, the correspondence table between Traditional and Simplified Chinese, and so on, or to update the data in these tables.
  • each business person can only see and configure the relevant content of his business type to ensure data security and avoid mutual interference.
  • semantic analysis and grammatical analysis are performed on the text information to determine the type of business consulted in the user's question information according to the keywords in the text information.
  • preferably, to determine the business type more accurately, the user may be prompted, when entering the question information, to select the business type to consult.
  • for example, if the user selects the wealth management option, the preset question corresponding to the target text is matched in the question and answer data table of the wealth management business.
  • if the match is successful in the second obtaining module 8, a preset question identical to the question information exists in the question and answer data table, and the first answer corresponding to that preset question is used as the answer to the target text.
  • generally, the multiple target texts are all derived from the same text information, so their meanings are basically the same and a single first answer is obtained as the answer to the text information. However, some target texts may differ in meaning from the others and match a first answer that is inconsistent with the answers of the other target texts. In that case, the different first answers matched by the target texts are all returned to the front-end interface for the user's reference.
  • the above-mentioned first obtaining module 1 includes:
  • the first acquiring unit is used to acquire the user's voice signal, and the voice signal carries problem information;
  • the processing unit is used to perform voice preprocessing on the voice signal to obtain an observation sequence of the voice signal;
  • the detection unit is used to detect whether the similarity between the observation sequence and the observation sequence corresponding to the preset text is greater than the preset similarity;
  • the voice signal is composed of the user's voice and environmental noise, and the environmental noise will cause interference to the voice recognition, so the voice signal is voice preprocessed.
  • the above-mentioned voice preprocessing is to divide the voice signal into frames through the VAD (Voice Activity Detection) technology and establish an HMM (Hidden Markov Model) model corresponding to the voice signal.
  • the user's voice signal is divided into overlapping voice frames according to its period, so that the LPC (Linear Predictive Coding) spectrum estimates are correlated from frame to frame; an endpoint detection algorithm finds the start and end of the speech, and the intensity and the number of zero crossings of each speech frame are then used to calculate an energy and zero-crossing threshold, which removes most of the environmental noise; the speech signal is passed through a low-order low-pass filter to flatten its spectrum and reduce the influence of finite word length effects during signal processing; each speech frame is windowed to reduce the signal discontinuity at the start and end of the frame; autocorrelation analysis is performed on each speech frame to obtain the autocorrelation coefficients, and the Levinson-Durbin algorithm is used to find the LPC coefficients; the LPC coefficients are weighted with a tapered window to obtain the cepstral coefficients, which are used as the feature vector of the speech frame. Furthermore, the time-domain cepstral coefficients can be differentiated to improve the feature vector of the speech frame, and the feature vectors are vector-quantized to obtain the observation sequence of the speech signal.
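The first stages of this preprocessing (overlapping frames, windowing, and the per-frame energy and zero-crossing measures used for endpoint detection) can be sketched in plain Python. This is only an illustration under assumptions: the frame length, hop size, and the choice of a Hamming window are not specified in the text, and the LPC, cepstral, and vector quantization stages are not shown.

```python
import math


def split_into_frames(samples, frame_len, hop):
    """Overlapping frames: consecutive frames share frame_len - hop samples."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def hamming(frame):
    """Apply a Hamming window (an assumed window type) to taper the frame edges."""
    n = len(frame)
    return [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
            for i, s in enumerate(frame)]


def short_time_energy(frame):
    """Sum of squared samples, a simple measure of frame intensity."""
    return sum(s * s for s in frame)


def zero_crossings(frame):
    """Number of sign changes between adjacent samples in the frame."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
```

Frames whose energy and zero-crossing counts both fall below chosen thresholds would be treated as non-speech and dropped before feature extraction.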
  • for the above detection and matching, HMM training is performed for each phoneme system in the preset Chinese vocabulary (which contains the fields of the preset text): digitized voice sample values are obtained, and preprocessing, feature vector extraction, vector quantization, and Baum-Welch modeling are carried out to obtain the observation sequence of the speech model corresponding to the preset text. When the observation sequence of the speech signal is matched against the observation sequences of the speech models corresponding to the preset texts, a probability is calculated for each observation sequence of the speech signal.
  • preferably, the maximum likelihood estimation algorithm is used to calculate the maximum probability (that is, the similarity between the above observation sequence and the observation sequence corresponding to the preset text). If the maximum probability is greater than the preset similarity, the preset text whose observation sequence has the maximum probability with respect to the observation sequence of the speech signal is used as the text information corresponding to the question information.
  • the above-mentioned second matching module includes:
  • the analysis unit is used to perform semantic analysis on the text information to analyze the business type corresponding to the text information;
  • the searching unit is configured to find the first preset question and answer file corresponding to the service type in the preset question and answer file based on the service type corresponding to the text information;
  • the second obtaining unit is used to obtain the similarity between the target text and the question text in the first preset question and answer file;
  • the judging unit is used to judge whether the similarity reaches a preset threshold;
  • the determining unit is configured to determine that the matching is successful if the similarity reaches the preset threshold, and determine that the matching fails if it does not.
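The threshold decision made by these units can be sketched as below. The similarity measure is an assumption: `difflib.SequenceMatcher` from the Python standard library is used here purely as a stand-in for whichever text matching model is chosen, and the threshold of 0.8 is an arbitrary illustrative value.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Stand-in similarity in [0, 1]; a real system would use a text matching model."""
    return SequenceMatcher(None, a, b).ratio()


def match_preset_question(target_texts, qa_table, threshold=0.8):
    """Return the first answer of the best-matching preset question, or None on failure.

    qa_table maps preset questions to first answers for one business type.
    """
    best_question, best_score = None, 0.0
    for text in target_texts:
        for question in qa_table:
            score = similarity(text, question)
            if score > best_score:
                best_question, best_score = question, score
    if best_score >= threshold:  # matching succeeds
        return qa_table[best_question]
    return None  # matching fails
```

A `None` result corresponds to the failed-match branch, where the query module falls back to sending a request to the preset URL address.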
  • the above semantic analysis includes performing Chinese word segmentation on the text information, using a statistical language model to determine the optimal segmentation result, calculating a weight for each term after segmentation with a term-weighting method, and extracting the core words of the text according to the term weights;
  • the language model can be, for example, an HMM-based N-Gram model or a language model based on a recurrent neural network, such as a state-of-the-art language model; term-weighting methods include TF-IDF, Okapi, MI, ATC, LTU, and so on; the more often a term appears in the text, the greater its weight and the more important it is.
  • the extracted core words are matched against the keywords of the business types to determine the business type corresponding to the text information.
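As an illustration of term weighting, the following sketch computes TF-IDF weights for a segmented text and picks the highest-weighted terms as core words. The smoothed IDF formula and the tiny sample corpus are assumptions made for the example; the text does not prescribe a particular variant.

```python
import math
from collections import Counter


def tf_idf(doc_terms, corpus):
    """Weight each term by term frequency times a smoothed inverse document frequency."""
    tf = Counter(doc_terms)
    n_docs = len(corpus)
    weights = {}
    for term, count in tf.items():
        df = sum(1 for doc in corpus if term in doc)
        idf = math.log((1 + n_docs) / (1 + df)) + 1  # smoothed IDF (assumed variant)
        weights[term] = (count / len(doc_terms)) * idf
    return weights


def core_words(doc_terms, corpus, top_k=2):
    """Return the top_k terms with the largest TF-IDF weight."""
    weights = tf_idf(doc_terms, corpus)
    return sorted(weights, key=weights.get, reverse=True)[:top_k]


# Hypothetical segmented texts standing in for a real corpus.
corpus = [["养老", "保险", "好处"], ["理财", "产品", "好处"], ["保险", "理赔"]]
```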
  • Each service type corresponds to one or more first preset question and answer files, and the first preset question and answer files are included in the preset question and answer files.
  • matching the target text with the preset question is in essence a text matching process.
  • text matching models can be divided into single semantic models, multiple semantic models, matching matrix models, and deep inter-sentence models.
  • a single semantic model encodes the two sentences with a fully connected, CNN, or RNN neural network and then calculates the similarity between them, without considering the local structure of phrases in the sentences, such as DSSM (Deep Structured Semantic Models); a multiple semantic model interprets sentences at multiple granularities and takes the local structure of the sentence into account, such as MV-LSTM (MultiView Long Short Term Memory); a matching matrix model calculates the similarity between the different words of the two sentences and then uses a deep network to extract features, which captures the interaction between different words and handles the relationships within the sentences more precisely, such as Text Matching as Image Recognition; a deep inter-sentence model uses interaction mechanisms such as attention and a more refined structure to mine the relationships between different words within and between sentences, such as state-of-the-art models.
  • the similarity between the target text and the question text can be calculated with the similarity algorithm of any one of the above text matching models, and it is then judged whether the similarity reaches the preset threshold; if so, the match succeeds, otherwise the match fails.
  • for example, the question information is the text information "What are the benefits of handling endowment insurance?"
  • the target texts include "What are the benefits of having endowment insurance?" and "What are the advantages of buying endowment insurance?"
  • each of these target texts, together with the original text "What are the benefits of handling endowment insurance?", is matched against the preset questions in the first preset question and answer file, and the similarity of each is calculated;
  • if the similarity between one or more target texts and the question "What are the advantages of endowment insurance?" in the first preset question and answer file reaches the preset threshold, the answer corresponding to "What are the advantages of endowment insurance?" is used as the answer to "What are the benefits of handling endowment insurance?";
  • multiple target texts may also match multiple question texts in the first preset question and answer file. Although this is a low-probability event, if it happens, the answers corresponding to the multiple question texts are all used as answers to the question information for the user's reference.
  • the above device further includes:
  • the second receiving module is configured to receive the preset questions and first answers corresponding to the various service types;
  • the writing module is used to write the preset questions and first answers corresponding to the first service type into the first preset question and answer file, where the first service type is one of the service types and the first preset question and answer file is one of the preset question and answer files.
  • the above-mentioned first preset question and answer file can be configured through the background management system. Specifically, each business person logs in to the background management system (part of the intelligent question answering system) with his or her account, enters an instruction to create a first preset question and answer file, and enters the preset questions and first answers of the business type that he or she is responsible for. For example, a business person in charge of the basketball business logs in to the background management system; the system identifies the account and, according to the preset permissions of the account (only the editing and browsing permissions related to the basketball business), displays the editable question and answer data table corresponding to those permissions.
  • the business person enters the basketball-related preset questions and first answers in the question and answer data table, such as the preset question "What jersey number of Yao Ming was retired by the NBA Rockets?" and the first answer "No. 11".
  • the background management system receives the above preset question and first answer and, according to the confirmation instruction entered by the user, generates the first preset question and answer file corresponding to the basketball business from the question and answer data table.
  • similarly, the business staff in charge of the football business can only edit the first preset question and answer file corresponding to the football business. Because of account permissions, each business person can only see and configure the content of his or her own business type, which ensures data security, avoids mutual interference, and enables the intelligent question answering system to support multiple business scenarios without interference.
  • the above device further includes:
  • the query module is used to send a query request to the preset URL address if the target text fails to match the preset question in the preset question and answer file, and the query request carries text information;
  • the first receiving module is used to receive the return data corresponding to the query request;
  • the parsing module is used to parse the returned data, obtain a number of second answers with the highest relevance in the returned data, and use the number of second answers as answers corresponding to the text information.
  • the data crawling technology is used to obtain the answer to the user's question.
  • the above-mentioned data crawling technology is a web crawler, which is a program or script that automatically crawls the information at a specified URL according to certain rules. Web crawlers are widely used by Internet search engines and similar websites, and can automatically collect all the page content they can access in order to obtain or update the content and retrieval methods of these websites.
  • the web crawler determines in advance, according to the business type in the text information, the calling address of a search engine (such as Baidu search), that is, the preset URL address; the program sends to the calling address a search query request that carries the text information and asks for several second answers corresponding to it, obtains the return data (HTML code) returned by the calling address, and then parses the HTML code with jsoup (a Java HTML parser) to obtain the second answers corresponding to the text information.
  • finding the answer to a question through data crawling is itself a process of filtering for relevant answers; for example, the answer list displayed when searching for the question "What are the benefits of pension insurance" through Baidu search has already been screened by the Baidu search engine, but to obtain a more accurate answer, the most relevant answer in the answer list is taken as the second answer.
  • to keep the second answers highly relevant and reliable, the first several answers are taken for the user's reference; the relevance of an answer to the question can be analyzed from its position in the search result list, the number of times the result has been viewed, the number of likes, and the useful number.
  • for example, answer value = (preset score value corresponding to the position / position in the result list) × weight 1 + number of views of the result × weight 2 + number of likes × weight 3 + useful number × weight 4; the value of each answer in the result list is calculated in this way, and the answer with the highest value is used as the second answer.
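The weighted answer value can be written out directly. In the sketch below the preset position score of 100 and the four weights are hypothetical values chosen for illustration; the text leaves them as configurable presets.

```python
def answer_value(position, views, likes, useful,
                 position_score=100.0, weights=(0.4, 0.2, 0.2, 0.2)):
    """Answer value = (position score / position) * w1 + views * w2 + likes * w3 + useful * w4.

    position is the 1-based rank in the search result list; position_score
    and weights are assumed presets, not values given in the text.
    """
    w1, w2, w3, w4 = weights
    return (position_score / position) * w1 + views * w2 + likes * w3 + useful * w4


def pick_second_answer(results):
    """Return the result with the highest answer value."""
    return max(results, key=lambda r: answer_value(
        r["position"], r["views"], r["likes"], r["useful"]))


# Hypothetical crawled results.
results = [
    {"text": "A", "position": 1, "views": 10, "likes": 2, "useful": 1},
    {"text": "B", "position": 2, "views": 200, "likes": 30, "useful": 20},
]
```

In this sample the second-ranked result wins because its view, like, and useful counts outweigh the position advantage of the first result.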
  • the second answer corresponding to the text information is obtained through data crawling technology, so that the machine has a learning function.
  • the above device further includes:
  • the generation module is used to add several second answers to the preset blank list to generate the answer list and save the answer list;
  • the first display module is used to display the answer list, together with an option for the user to mark each second answer as useful;
  • the first accumulation module is used to accumulate the first useful number corresponding to each second answer based on the user's selection of the useful option for that second answer;
  • the first judgment module is used to judge whether the first useful number of the second answer reaches a preset value;
  • the adding module is used to add, if so, the second answer whose first useful number reaches the preset value together with the first text information to the preset question and answer file, where the first text information is the text information corresponding to the second answer and serves as a preset question in the preset question and answer file, and the second answer whose first useful number reaches the preset value serves as a first answer in the preset question and answer file.
  • the second answer is displayed in the form of a list.
  • the second answer can also be converted into voice output.
  • Each second answer in the above answer list corresponds to an option that selects the second answer as a useful option.
  • the user can select the useful option for a second answer that he or she agrees with, and the first useful number of that second answer is accumulated.
  • the second answer with the highest first useful number is used as the answer to subsequent occurrences of the question; that is, when the first useful number of one of the second answers reaches the preset value, that second answer is added to the preset question and answer file as the answer to the corresponding question, which enhances the learning ability of the question answering robot and makes it more intelligent.
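The promotion of a crawled second answer into the preset question and answer file can be sketched as a simple counter with a threshold. The in-memory dictionaries and the preset value of 3 are assumptions for illustration; the text stores this data in the question and answer data tables.

```python
def record_useful(useful_counts, qa_file, question_text, answer, preset_value=3):
    """Accumulate the first useful number; promote the answer once it reaches preset_value.

    useful_counts maps (question_text, answer) to the first useful number.
    qa_file maps preset questions to first answers.
    Returns True if the answer was written to qa_file by this call.
    """
    key = (question_text, answer)
    useful_counts[key] = useful_counts.get(key, 0) + 1
    if useful_counts[key] >= preset_value:
        qa_file[question_text] = answer
        return True
    return False
```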
  • the above device further includes:
  • the second display module is configured to display the first answer corresponding to the first text information when the text information matches the first text information, and display options for the user to select whether the first answer is useful or useless;
  • the second accumulation module is used to accumulate the second useful number and useless number corresponding to the first answer based on the user's choice of the first answer as useful or useless option, and the second useful number is accumulated on the basis of the first useful number;
  • the second judgment module is used to judge whether the second useful number is less than the useless number;
  • the deleting module is used for deleting the first answer and the first text information corresponding to the first answer in the preset question and answer file if the second useful number is less than the useless number.
  • when the Q&A robot again receives text information parsed by the parsing module and that text information matches the first text information, the first answer corresponding to the first text information in the preset question and answer file is retrieved and returned to the front-end UI, and both the "useful" and "useless" options for the first answer are displayed. Because an answer can become outdated over time, it is judged whether the second useful number is less than the useless number. If it is, the first answer is no longer accepted by users, so the first answer and its corresponding first text information are deleted. For example, people in ancient times believed that the sun moved around the earth, but today this is regarded as wrong, so the useless number for such an answer keeps growing until it exceeds the second useful number.
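The deletion rule can be sketched in the same style. The `entry` dictionary holding the two counters is a hypothetical structure; the only behavior taken from the text is that the first answer and its first text information are deleted once the second useful number falls below the useless number.

```python
def record_feedback(entry, qa_file, question_text, useful):
    """Accumulate useful/useless votes and delete an outdated first answer.

    entry holds the counters {"useful": int, "useless": int}, where "useful"
    continues from the first useful number. Returns True if the answer was deleted.
    """
    if useful:
        entry["useful"] += 1
    else:
        entry["useless"] += 1
    if entry["useful"] < entry["useless"]:
        qa_file.pop(question_text, None)  # remove the question and its first answer
        return True
    return False
```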
  • the computer device in an embodiment of the present application includes a memory and an executor.
  • the memory stores a computer program.
  • the executor implements the steps of the AIML-based intelligent question answering method when executing the computer program.
  • the above-mentioned computer equipment may be a server, and its internal structure may be as shown in FIG. 3.
  • the computer equipment includes a processor, a memory, a network interface and a database connected through a system bus.
  • the processor of the computer device is used to provide calculation and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system, computer readable instructions, and a database.
  • the memory provides an environment for the operation of the operating system and computer readable instructions in the non-volatile storage medium.
  • the database of the computer equipment is used to store data such as tasks, database tables, and tables to be processed.
  • the network interface of the computer device is used to communicate with an external terminal through a network connection.
  • FIG. 3 is only a block diagram of a part of the structure related to the solution of the present application, and does not constitute a limitation on the computer device to which the solution of the present application is applied.
  • the storage medium in an embodiment of the present application is a computer-readable storage medium.
  • the computer-readable storage medium may be non-volatile or volatile.
  • a computer program is stored thereon, and when the computer program is executed by an executor, the steps of the AIML-based intelligent question answering method are implemented.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Human Computer Interaction (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

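This application relates to artificial intelligence and discloses an AIML-based intelligent question answering method. The method includes: obtaining question information input by a user and deriving text information from it; performing script conversion, symbol filtering, Chinese word segmentation, synonym matching, and text replacement on the text information to obtain multiple target texts; matching each target text against the preset questions in a preset question and answer file; and, if a target text successfully matches a preset question in the preset question and answer file, obtaining the first answer corresponding to the preset question and using it as the answer to the question information. This solves the problem of poor Chinese support in existing AIML and improves the question recognition rate and matching rate.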

Description

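AIML-based intelligent question answering method, device, computer equipment and storage medium
This application claims priority to the Chinese patent application No. 201910435063.2, filed with the Chinese Patent Office on May 23, 2019 and entitled "AIML-based intelligent question answering method, device, computer equipment and storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the technical field of intelligent question answering robots, in particular to AIML-based intelligent question answering methods, devices, computer equipment and storage media.
Background
AIML (Artificial Intelligence Markup Language) is an XML (eXtensible Markup Language) language for creating natural language software agents. AIML enables interaction between a user and a question answering robot, but the inventor found that current AIML has the following problems when used for Chinese dialogue. First, there is essentially no public Chinese rule base, and the rule base is the "brain" of a dialogue robot: generally, the richer the rule base, the more human-like the robot's information processing and the better the user's interaction experience. Second, AIML interpreters support Chinese poorly: English input uses spaces to separate words, whereas Chinese input is not customarily separated by spaces, which leads to a low question recognition rate and matching rate.
Technical Problem
The main purpose of this application is to provide an AIML-based intelligent question answering method, device, computer equipment and storage medium that extend the Chinese rule base of AIML, solve the problem of poor Chinese support in existing AIML, and improve the question recognition rate and matching rate.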
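Technical Solution
This application proposes an AIML-based intelligent question answering method, including: obtaining question information input by a user, and obtaining text information according to the question information; converting the Chinese in the text information into a single Chinese script to obtain the first text corresponding to the text information, the Chinese script being Simplified or Traditional Chinese; deleting the specified symbols in the first text according to preset filtering rules to obtain the second text; performing Chinese word segmentation on the second text according to preset Chinese word segmentation rules to obtain multiple first fields corresponding to the second text; performing synonym matching on each first field to obtain the second field corresponding to each first field; in the second text, replacing the first field corresponding to the second field according to the second field to obtain multiple target texts; matching each target text against the preset questions in a preset question and answer file, the preset question and answer file containing mapping relationship information between the preset questions and first answers; and, if a target text successfully matches a preset question in the preset question and answer file, obtaining the first answer corresponding to the preset question, so as to use the first answer as the answer to the question information.
This application also proposes an AIML-based intelligent question answering device, including: a first obtaining module, used to obtain question information input by a user and obtain text information according to the question information; a conversion module, used to convert the Chinese in the text information into a single Chinese script to obtain the first text corresponding to the text information, the Chinese script being Simplified or Traditional Chinese; a filtering module, used to delete the specified symbols in the first text according to preset filtering rules to obtain the second text; a word segmentation module, used to perform Chinese word segmentation on the second text according to preset Chinese word segmentation rules to obtain multiple first fields corresponding to the second text; a first matching module, used to perform synonym matching on each first field to obtain the second field corresponding to each first field; a replacement module, used to replace, in the second text, the first field corresponding to the second field according to the second field to obtain multiple target texts; a second matching module, used to match each target text against the preset questions in a preset question and answer file, the preset question and answer file containing mapping relationship information between the preset questions and first answers; and a second obtaining module, used to obtain the first answer corresponding to the preset question if a target text successfully matches a preset question in the preset question and answer file, so as to use the first answer as the answer to the question information.
This application also proposes a computer device including a memory and an executor, where the memory stores a computer program and the executor implements the steps of the above AIML-based intelligent question answering method when executing the computer program.
This application also proposes a storage medium on which a computer program is stored, where the steps of the above AIML-based intelligent question answering method are implemented when the computer program is executed by an executor.
Beneficial Effects
This application extends the Chinese rule base by configuring the question and answer data table, the synonym table, the professional vocabulary list for Chinese word segmentation, the correspondence table between Traditional and Simplified Chinese, and so on, in the intelligent question answering database. It normalizes the text information corresponding to the question information through script conversion, special symbol filtering, Chinese word segmentation, synonym matching, and text replacement, which strengthens AIML's support for Chinese, lets AIML recognize the text information better, and thereby improves the question recognition rate and matching rate. It configures a preset question and answer file for each business type, which ensures data security, avoids mutual interference, and allows AIML to support question answering scenarios for multiple businesses at the same time. The answer to the question information is obtained by matching the target texts against the preset questions.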
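Brief Description of the Drawings
FIG. 1 is a schematic diagram of the steps of the AIML-based intelligent question answering method in an embodiment of this application;
FIG. 2 is a schematic structural diagram of the AIML-based intelligent question answering device in an embodiment of this application;
FIG. 3 is a schematic structural diagram of the computer device in an embodiment of this application.
The realization of the purpose, the functional characteristics, and the advantages of this application will be further described with reference to the embodiments and the accompanying drawings.
Best Mode for Carrying Out the Invention
It should be understood that the specific embodiments described here are only used to explain this application and are not intended to limit it. Referring to FIG. 1, the AIML-based intelligent question answering method in an embodiment of this application includes: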
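S1, obtaining question information input by a user, and obtaining text information according to the question information;
S2, converting the Chinese in the text information into a single Chinese script to obtain the first text corresponding to the text information, the Chinese script being Simplified or Traditional Chinese;
S3, deleting the specified symbols in the first text according to preset filtering rules to obtain the second text;
S4, performing Chinese word segmentation on the second text according to preset Chinese word segmentation rules to obtain multiple first fields corresponding to the second text;
S5, performing synonym matching on each first field to obtain the second field corresponding to each first field;
S6, in the second text, replacing the first field corresponding to the second field according to the second field to obtain multiple target texts;
S7, matching each target text against the preset questions in a preset question and answer file, the preset question and answer file containing mapping relationship information between the preset questions and first answers;
S8, if a target text successfully matches a preset question in the preset question and answer file, obtaining the first answer corresponding to the preset question, so as to use the first answer as the answer to the question information.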
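As in step S1 above, the access point for the question information may be a question-and-answer dialogue scenario such as WeChat chat or the online customer service of a web site. The question information may be entered by voice or manually; if it is entered by voice, the speech is converted into text information with a speech-to-text tool. The question information may include questions of multiple business types. The intelligent question and answer file corresponding to each business type can be configured and maintained through the background management system, that is, each business type has its own table of questions and answers. The intelligent question answering robot can therefore support question information of multiple business types at the same time, with the types kept separate and not interfering with each other, which overcomes the earlier limitation that one intelligent question answering robot could support only one business type.
As in step S2 above, converting the Chinese into a single Chinese script means converting the Simplified Chinese in the text information into Traditional Chinese, or converting the Traditional Chinese in the text information into Simplified Chinese. Specifically, the conversion follows the Chinese script of the preset question and answer file. For example, if the preset questions in the preset question and answer file are in Simplified Chinese and the text information corresponding to the question information entered by a Hong Kong user is in Traditional Chinese, the Traditional Chinese in the text information is converted to Simplified Chinese. The conversion can rely on a configuration file that holds the correspondence between Traditional and Simplified Chinese, or on a database table that holds the same correspondence. Because the data in a database table is easy to add or modify through the background management system interface, a database table is preferred. The conversion can also be implemented with the help of an open source toolkit, such as zhconverter in Java.
As in steps S3 to S5 above, the preset filtering rules are processing rules for specified punctuation marks and spaces in the text, defined according to Chinese language habits. If spaces appear in the text, they are deleted; if a dash appears in the text, it is deleted as well; likewise, if a foreigner's given name and surname are joined with a "·" sign, the filtering rule also deletes that symbol. The preset Chinese word segmentation rules include: performing full segmentation and atomic segmentation on the second text at the same time, planning the segmentation along the optimal path according to the hidden Markov model and the Viterbi algorithm, and then performing name recognition, system dictionary supplementation, user-defined dictionary supplementation, and part-of-speech tagging. Full segmentation separates out all the words in the text, and atomic segmentation separates out all the Chinese characters in the text. The Chinese word segmentation divides the text information into multiple phrases or fields. It can be realized with an open source Chinese word segmentation tool such as Ansj. Ansj supports custom dictionaries, so users can add the proprietary vocabulary of a business type, for example the name of a company's insurance product, e生保. This lets AIML support question information of different business types and improves the question recognition rate. The synonym matching finds the synonyms of each phrase or field obtained by the Chinese word segmentation, based on a configuration file or database table that holds the correspondence between Chinese words and their synonyms.
As in step S6 above, replacing the first field corresponding to the second field according to the second field means replacing a phrase or field in the text information with one of its synonyms to obtain a new text. The first fields in the second text are replaced multiple times, each time replacing one or more first fields with their corresponding second fields, to obtain multiple target texts.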
在一实施例中,上述在第二文本中,根据第二字段替换与第二字段对应的第一字段,得到多个目标文本的步骤S6,包括:
S61,在第二文本中,将第一字段中的一个或者多个字段替换为对应的第二字段,得到多个目标文本。
如上述步骤S61,包括每次将所有第一字段中的一个字段替换成与该字段对应的一个第二字段,以及每次将所有第一字段中的多个字段分别替换成与其对应的一个第二字段,其中当一个第一字段对应有多个第二字段时,将该第一字段逐一替换成对应多个第二字段中的一个。例如,文本信息为“办理养老保险的好处是什么?”,将文本信息进行中文分词,得到“办理”“养老”“保险”“好处”“是什么”等第一字段,再进行同义词匹配,“办理”可匹配为“置备”“购买”等,“养老”可匹配为“退休”等,“好处”可匹配为“优点”“益处”等,因此依次将第一字段替换为第二字段,可得到多个文本。上述将第二文本中的一个第一字段替换为与第一字段对应的一个第二字段,如将“办理”替换为“置备”,得到“置备养老保险的好处是什么”,将“养老”替换为“退休”,可得到“办理退休保险的好处是什么”,进一步地,“办理”对应有两个第二字段,因此还可以将“办理”替换为“购买”,得到“购买养老保险的好处是什么”;上述将第二文本中的多个第一字段分别替换为与第一字段对应的一个第二字段,如将“办理”替换为“购买”、“好处”替换为“优点”,可得到“购买养老保险的优点是什么”,将“办理”替换为“购买”、“好处”替换为“益处”,可得到“购买养老保险的益处是什么”,“办理”还可以替换“置备”等,在此不再一一举例,当然还会将原文本信息“办理养老保险的好处是什么”也作为目标文本。为了提高用户体验,如果用户输入的问题信息是繁体的,那么将返回数据也转为繁体,并作为问题信息的答案。根据一个文本信息得到多个目标文本,以使得AIML更好的识别文本信息,提高问题识别率和匹配率。
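Regarding step S7, the preset Q&A files are the Q&A data tables in the intelligent Q&A database, one per business type; each Q&A data table maps preset questions to first answers. The preset Q&A files can be configured through the back-end management system. Specifically, each business staff member logs in to the back-end management system with their account and can then configure the Q&A data table, the synonym table, the specialized vocabulary table for Chinese word segmentation, the traditional-simplified mapping table, and so on, and can modify or delete the data in those tables. When configuration is complete, the staff member enters a confirmation instruction, which calls the intelligent Q&A system interface and triggers the system to generate these tables or update their data. Because of account permissions, each staff member can see and configure only the content of their own business type, which keeps the data secure and avoids mutual interference. The target texts are then matched against the preset Q&A files. Specifically, semantic and syntactic analysis is performed on the text information so that the business type the user is asking about can be determined from the keywords in the text information. Preferably, to determine the business type more accurately, the user may be prompted to select a business type when entering the question information; for example, if the user selects the wealth management option, the preset question corresponding to the target text is looked up in the Q&A data table for wealth management.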
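Regarding step S8, a successful match means that the Q&A data table contains a preset question matching the question information, and the first answer corresponding to that preset question is used as the answer to the target text. In general, the target texts are all derived from the same text information and have essentially the same meaning, so a single first answer is obtained as the answer to the text information. However, a target text may differ in meaning from the original text information and match a first answer that is inconsistent with the answers of the other target texts; in that case, all the different first answers matched by the target texts are returned to the front-end interface for the user's reference.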
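The behavior of step S8 when several target texts match can be sketched in Python as below. The sketch assumes exact string lookup in a dictionary as a simplification (the similarity-based matching is described later in the text), and the table contents and function name are hypothetical.

```python
def collect_first_answers(target_texts: list[str], qa_table: dict[str, str]) -> list[str]:
    """Return the distinct first answers matched by the target texts.

    One element means the usual case of a single answer; several elements mean
    that target texts matched inconsistent answers, all of which are returned.
    """
    answers = []
    for text in target_texts:
        answer = qa_table.get(text)
        if answer is not None and answer not in answers:
            answers.append(answer)
    return answers


# Hypothetical Q&A data table for one business type.
qa_table = {
    "购买养老保险的优点是什么": "answer A",
    "购买养老保险的益处是什么": "answer A",
    "办理退休保险的好处是什么": "answer B",
}
print(collect_first_answers(["购买养老保险的优点是什么", "购买养老保险的益处是什么"], qa_table))
```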
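In an embodiment, step S1 of obtaining the question information entered by the user and obtaining text information from the question information includes:
S11: obtaining a voice signal of the user, the voice signal carrying the question information;
S12: performing speech preprocessing on the voice signal to obtain an observation sequence of the voice signal;
S13: detecting whether the similarity between the observation sequence and the observation sequence corresponding to a preset text is greater than a preset similarity;
S14: if it is greater than the preset similarity, using the preset text as the text information corresponding to the question information.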
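Regarding steps S11 and S12, a voice signal generally consists of the user's speech and environmental noise, and the noise interferes with speech recognition, so the voice signal is preprocessed. The speech preprocessing uses VAD (Voice Activity Detection) to divide the voice signal into frames and builds an HMM (Hidden Markov Model) for the signal. Specifically: the user's voice signal is divided, according to its period, into overlapping speech frames so that the LPC (Linear Predictive Coding) spectral estimates are correlated from frame to frame; an endpoint detection algorithm finds the start and end of the speech, and the intensity and zero-crossing count of each speech frame are then used to compute energy and zero-crossing thresholds, which removes most of the environmental noise; the voice signal is passed through a low-order pre-emphasis filter to flatten its spectrum and reduce the influence of finite word-length effects during signal processing; each speech frame is windowed to reduce signal discontinuities at the beginning and end of the frame; autocorrelation analysis is performed on each speech frame to obtain autocorrelation coefficients, and the Levinson-Durbin algorithm is then used to find the LPC coefficients; the LPC coefficients are converted to cepstral coefficients, which are weighted with a tapered window and used as the feature vector of the speech frame. Further, the time derivative of the cepstral coefficients may be taken to improve the feature vector. Vector quantization of the feature vectors yields the observation sequence of the voice signal.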
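The core of the LPC front end described above (overlapping frames, pre-emphasis, windowing, autocorrelation, Levinson-Durbin) can be sketched in plain Python. The frame size, step, pre-emphasis factor 0.95, and Hamming window are assumptions chosen for illustration; endpoint detection, cepstral conversion, and vector quantization are omitted, and the recursion assumes a frame with nonzero energy.

```python
import math


def frames(signal, size, step):
    """Split the signal into overlapping frames (step < size gives overlap)."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, step)]


def pre_emphasize(signal, alpha=0.95):
    """First-order filter y[n] = x[n] - alpha * x[n-1] that flattens the spectrum."""
    return [signal[0]] + [signal[n] - alpha * signal[n - 1] for n in range(1, len(signal))]


def hamming(frame):
    """Taper the frame edges to reduce discontinuities at its start and end."""
    n = len(frame)
    return [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1))) for i, x in enumerate(frame)]


def autocorrelation(frame, order):
    """Autocorrelation values r[0..order] of one frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag)) for lag in range(order + 1)]


def levinson_durbin(r, order):
    """Solve for predictor coefficients a[1..order] with x[n] ~ sum(a[k] * x[n-k]).

    Returns (coefficients, final prediction error).
    """
    a = [0.0] * (order + 1)
    error = r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / error
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        error *= 1.0 - k * k
    return a[1:], error


# Autocorrelation of an ideal first-order process with coefficient 0.5.
coeffs, err = levinson_durbin([1.0, 0.5, 0.25], order=2)
print(coeffs, err)
```

For the first-order example the recursion recovers the single coefficient 0.5 and a second coefficient of zero, which is a quick sanity check on the update rule.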
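Regarding steps S13 and S14, an HMM is trained for each phoneme of a preset Chinese vocabulary (which contains the fields of the preset texts): digitized speech samples are obtained and then preprocessed, feature vectors are extracted and vector-quantized, and Baum-Welch modeling is applied, yielding the observation sequence of the speech model corresponding to each preset text. When the observation sequence of the voice signal is matched against the observation sequences of the speech models corresponding to the preset texts, a probability is computed for each observation sequence of the voice signal. Preferably, a maximum likelihood estimation algorithm is used to compute the maximum probability (that is, the similarity between the observation sequence and the observation sequence corresponding to the preset text). If the maximum probability is greater than the preset similarity, the preset text whose observation sequence yields that maximum probability is used as the text information corresponding to the question information.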
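In an embodiment, step S7 of matching each target text against the preset questions in the preset Q&A file includes:
S71: performing semantic analysis on the text information to determine the business type to which the text information corresponds;
S72: based on that business type, finding, among the preset Q&A files, the first preset Q&A file corresponding to the business type;
S73: obtaining the similarity between the target text and the question texts in the first preset Q&A file;
S74: determining whether the similarity reaches a preset threshold;
S75: if the similarity reaches the preset threshold, determining that the match succeeds; otherwise, determining that the match fails.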
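Regarding steps S71 and S72, the semantic analysis performs Chinese word segmentation on the text information, uses a statistical language model to choose the best segmentation result, computes a weight for each term after segmentation with a term-weighting method, and extracts the core words of the text according to the term weights. The language model may be, for example, an HMM-based N-gram model or a recurrent-neural-network language model such as a state-of-the-art language model. Term-weighting methods include TF-IDF, Okapi, MI, ATC, LTU, and others. The more often a term appears in the text, the greater its weight and importance. The extracted core words are matched against the keywords of the business types to determine the business type of the text information. Each business type corresponds to one or more first preset Q&A files, and the first preset Q&A files are contained in the preset Q&A files.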
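As one concrete instance of the term-weighting step, the following Python sketch weights segmented terms with a smoothed TF-IDF, keeps the top-weighted terms as core words, and picks the business type whose keyword set overlaps them most. The reference corpus, the keyword sets, the smoothing formula, and the `top_k` cutoff are all assumptions for illustration.

```python
import math
from collections import Counter

# Hypothetical keyword sets per business type.
BUSINESS_KEYWORDS = {
    "insurance": {"保险", "养老"},
    "wealth": {"理财", "基金"},
}


def tf_idf(tokens, corpus):
    """Weight each term by term frequency times a smoothed inverse document frequency."""
    counts = Counter(tokens)
    n_docs = len(corpus)
    weights = {}
    for term, count in counts.items():
        doc_freq = sum(1 for doc in corpus if term in doc)
        idf = math.log((1 + n_docs) / (1 + doc_freq)) + 1.0
        weights[term] = (count / len(tokens)) * idf
    return weights


def business_type(tokens, corpus, top_k=3):
    """Return the business type whose keywords overlap the core words, or None."""
    weights = tf_idf(tokens, corpus)
    core_words = set(sorted(weights, key=weights.get, reverse=True)[:top_k])
    scores = {name: len(keywords & core_words) for name, keywords in BUSINESS_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


# Hypothetical reference corpus of already-segmented questions.
corpus = [["办理", "理财", "的", "好处"], ["基金", "的", "收益", "是什么"], ["保险", "的", "理赔"]]
print(business_type(["办理", "养老", "保险", "的", "好处", "是什么"], corpus))
```

The returned type would then select the first preset Q&A file to search, as in step S72.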
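Regarding steps S73 to S75, matching a target text against preset questions is in effect a text-matching process. Text-matching models can be divided into single-semantic models, multi-semantic models, matching-matrix models, and deep sentence-interaction models. A single-semantic model encodes the two sentences with a fully connected, CNN-type, or RNN-type neural network and then computes the similarity between them, without considering the local structure of phrases in the sentences; an example is DSSM (Deep Structured Semantic Models). A multi-semantic model interprets the sentences at multiple granularities and takes local sentence structure into account; an example is MV-LSTM (Multi-View Long Short-Term Memory). A matching-matrix model computes the similarity between the different words of the two sentences and then extracts features with a deep network, accounting for interactions between words across sentences and handling intra-sentence relationships more finely; an example is Text Matching as Image Recognition. A deep sentence-interaction model uses interaction mechanisms such as attention and finer structures to mine the relationships between words within and across sentences, as in state-of-the-art models. Specifically, any of these text-matching models may be used: its similarity algorithm computes the similarity between the target text and a question text, and the similarity is compared with the preset threshold; if the threshold is reached, the match succeeds, and otherwise it fails.
For example, the question information is the text information "办理养老保险的好处是什么", and the target texts include "置备养老保险的好处是什么" and "购买养老保险的优点是什么". Each target text is matched against the preset questions in the first preset Q&A file by computing the similarity between the target text and each preset question. If one or more target texts reach the preset threshold of similarity with the question "养老保险的优点是什么" in the first preset Q&A file, the answer corresponding to "养老保险的优点是什么" is used as the answer to the question information "办理养老保险的好处是什么". Further, the target texts may match several question texts in the first preset Q&A file; although this is unlikely, when it happens the answers to all the matched question texts are used as answers to the question information, for the user's reference.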
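The threshold test of steps S73 to S75 can be sketched in Python. The neural matching models named above are out of scope for a short example, so this sketch substitutes the character-level ratio from the standard library's `difflib.SequenceMatcher` as the similarity function; the threshold 0.8 and the preset questions are assumptions.

```python
from difflib import SequenceMatcher


def similarity(target_text: str, question_text: str) -> float:
    """Character-level similarity in [0, 1]; a stand-in for a trained matching model."""
    return SequenceMatcher(None, target_text, question_text).ratio()


def match_questions(target_texts, preset_questions, threshold=0.8):
    """Return every preset question that at least one target text matches."""
    matched = []
    for question in preset_questions:
        if any(similarity(text, question) >= threshold for text in target_texts):
            matched.append(question)
    return matched


target_texts = ["置备养老保险的好处是什么", "购买养老保险的优点是什么"]
preset_questions = ["养老保险的优点是什么", "如何办理理财业务"]
print(match_questions(target_texts, preset_questions))
```

In this example only the synonym-substituted text "购买养老保险的优点是什么" clears the threshold, which shows why generating several target texts raises the matching rate. An empty result corresponds to a failed match.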
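In an embodiment, before step S7 of matching each target text against the preset questions in the preset Q&A file, the method includes:
S071: receiving the preset questions and first answers corresponding to each business type;
S072: writing the preset questions and first answers corresponding to a first business type into a first preset Q&A file, where the first business type is one of the business types and the first preset Q&A file is one of the preset Q&A files.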
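Regarding steps S071 and S072, the first preset Q&A file can be configured through the back-end management system. Specifically, each business staff member logs in with their account to the back-end management system (part of the intelligent Q&A system), enters an instruction to create a first preset Q&A file, and enters the preset questions and first answers for the business type they are responsible for. For example, a staff member responsible for the basketball business logs in to the back-end management system; the system identifies the account and, according to the preset permissions of that account (editing and browsing permissions for the basketball business only), displays the editable Q&A data table corresponding to those permissions. The staff member enters basketball-related preset questions and first answers in the Q&A data table, for example the preset question "What is the jersey number that the NBA's Rockets retired for Yao Ming?" with the first answer "No. 11". The back-end management system receives the preset questions and first answers and, on the confirmation instruction entered by the user, generates the first preset Q&A file for the basketball business from the Q&A data table. Likewise, a staff member responsible for the football business can edit only the first preset Q&A file for the football business. Because of account permissions, each staff member can see and configure only the content of their own business type, which keeps the data secure, avoids mutual interference, and lets the intelligent Q&A system support multiple business scenarios without interference among them.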
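In an embodiment, after step S7 of matching each target text against the preset questions in the preset Q&A file, the method includes:
S701: if the target texts fail to match the preset questions in the preset Q&A file, sending a query request to a preset URL address, the query request carrying the text information;
S702: receiving the returned data corresponding to the query request;
S703: parsing the returned data, obtaining the several second answers with the highest relevance in the returned data, and using those second answers as the answers corresponding to the text information.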
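Regarding steps S701 to S703, when the target texts fail to match the preset questions in the preset Q&A file, the Q&A robot cannot answer the user's question from the preset Q&A file. To let the robot answer anyway, the answer is obtained by data crawling. The data crawling technique is a web crawler: a program or script that automatically fetches information from specified URLs according to certain rules. Web crawlers are widely used by Internet search engines and similar sites; they automatically collect all page content they can access in order to obtain or update the content and retrieval methods of those sites. Specifically, when the target texts cannot be matched to a preset question and first answer in the preset Q&A file, the web crawler proceeds according to the business type of the text information: the call address used by a search engine (for example Baidu Search) when searching is analyzed in advance (the preset URL address); the program sends to the call address a simulated search query request that carries the text information and asks for several second answers; the returned data (HTML code) is received from the call address; and the HTML code is parsed with jsoup (a Java HTML parser) to obtain the second answers corresponding to the text information.
In an embodiment, finding answers by crawling is itself a process of filtering for relevant answers. For example, when Baidu Search is used to look up "养老保险的好处是什么", the answer list displayed has already been filtered by the Baidu search engine. To obtain a more accurate answer, the answer with the highest relevance in the answer list is used as the second answer; preferably, the second answers are the several highest-ranked answers with high relevance, for the user's reference. Relevance can be assessed from the answer's position in the search result list, its view count, its like count, and its useful count: the nearer the top of the result list, the more views, the more likes, or the more useful votes an answer has, the more relevant it is considered. If these factors point to different most-relevant answers, the useful count is the primary criterion. Further, the factors can be weighted: answer score = (preset score for position / position in the result list) × weight 1 + view count × weight 2 + like count × weight 3 + useful count × weight 4. The score of each answer in the result list is computed, and the answer with the highest score is used as the second answer. Thus, when the current Q&A data table contains no preset question and first answer for the user's query, data crawling obtains second answers corresponding to the text information, giving the machine a learning capability.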
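The weighted relevance score described above can be written out as a small Python sketch. The candidate records, the preset position score of 100, and the four weights are made-up values for illustration; the HTTP request and HTML parsing are left out.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    position: int  # 1-based position in the search result list
    views: int
    likes: int
    useful: int


def answer_score(c: Candidate, position_score=100.0, weights=(1.0, 0.001, 0.01, 0.1)) -> float:
    """score = (position_score / position) * w1 + views * w2 + likes * w3 + useful * w4."""
    w1, w2, w3, w4 = weights
    return (position_score / c.position) * w1 + c.views * w2 + c.likes * w3 + c.useful * w4


def top_second_answers(candidates, n=3):
    """Return the n highest-scoring candidates as the second answers."""
    return sorted(candidates, key=answer_score, reverse=True)[:n]


candidates = [
    Candidate("A", position=1, views=1000, likes=10, useful=5),
    Candidate("B", position=2, views=5000, likes=200, useful=800),
    Candidate("C", position=3, views=100, likes=1, useful=0),
]
print([c.text for c in top_second_answers(candidates, n=2)])
```

Giving the useful count the largest per-unit weight reflects the statement that it is the primary criterion when the factors disagree; here candidate B overtakes the top-listed candidate A for that reason.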
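In an embodiment, after step S703 of using the several second answers as the answers corresponding to the text information, the method includes:
S704: adding the second answers to a preset blank list to generate an answer list, and saving the answer list;
S705: displaying the answer list together with an option for the user to mark each second answer as useful;
S706: based on the user's selection of the useful option for a second answer, accumulating a first useful count for each second answer;
S707: determining whether the first useful count of a second answer reaches a preset value;
S708: if so, adding the second answer whose first useful count has reached the preset value, together with first text information, to the preset Q&A file, where the first text information is the text information corresponding to that second answer and serves as a preset question in the preset Q&A file, and the second answer serves as a first answer in the preset Q&A file.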
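Regarding steps S704 to S708, the second answers are displayed as a list; they may also be converted to speech output. Each second answer in the answer list has an option for marking it as useful. A user who agrees with a second answer selects its useful option, and the first useful count of that second answer is accumulated, gathering data on which the intelligent Q&A system can base its answer recommendations for new questions. For example, after viewing a second answer the user judges it: a user who finds it good can select the useful option, and one who finds it poor can select the useless option. The Q&A robot increments the useful count each time the useful option is selected and uses the answer with the highest useful count as the answer to later occurrences of the question. That is, when the first useful count of a second answer reaches the preset value, that second answer is added to the preset Q&A file as the answer to the corresponding question, which strengthens the robot's learning ability and makes it more intelligent.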
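The counting and promotion logic of steps S706 to S708 can be sketched as a small Python class. The class name, the in-memory dictionaries, and the preset value of 3 are assumptions; a real system would persist the counts alongside the saved answer list.

```python
class AnswerFeedback:
    """Tracks 'useful' votes for crawled second answers and promotes popular ones."""

    def __init__(self, qa_table: dict, preset_value: int = 3):
        self.qa_table = qa_table          # preset question -> first answer
        self.preset_value = preset_value  # assumed promotion threshold
        self.useful_counts = {}           # (question, answer) -> first useful count

    def mark_useful(self, question: str, second_answer: str) -> bool:
        """Record one 'useful' vote; return True if the answer was just promoted."""
        key = (question, second_answer)
        self.useful_counts[key] = self.useful_counts.get(key, 0) + 1
        reached = self.useful_counts[key] >= self.preset_value
        if reached and question not in self.qa_table:
            # The text information becomes a preset question and the second
            # answer becomes its first answer.
            self.qa_table[question] = second_answer
            return True
        return False


table = {}
feedback = AnswerFeedback(table, preset_value=2)
feedback.mark_useful("养老保险的好处是什么", "answer X")
print(feedback.mark_useful("养老保险的好处是什么", "answer X"), table)
```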
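In an embodiment, after step S708 of adding the second answer whose useful count has reached the preset value, together with the text information corresponding to that second answer, to the preset Q&A file, the method further includes:
S709: when the text information matches the first text information, displaying the first answer corresponding to the first text information together with options for the user to mark the first answer as useful or useless;
S710: based on the user's selection of the useful or useless option for the first answer, accumulating a second useful count and a useless count for the first answer, the second useful count being accumulated on top of the first useful count;
S711: determining whether the second useful count is less than the useless count;
S712: if the second useful count is less than the useless count, deleting the first answer and its corresponding first text information from the preset Q&A file.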
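Regarding steps S709 to S712, when the Q&A robot again receives text information like that of step S703, the text information matches the first text information, so the first answer corresponding to the first text information is retrieved from the preset Q&A file and returned to the front-end UI, together with useful and useless options for that first answer. Because answers can become outdated, the second useful count is compared with the useless count; when the second useful count is less than the useless count, users no longer agree with the first answer, so the first answer and its corresponding first text information are deleted from the preset Q&A file. For example, people in ancient times believed that the sun revolves around the earth, but today this is considered wrong, so the useless count keeps growing until it exceeds the second useful count. Further, to determine quickly whether a statement such as "the sun revolves around the earth" is still accepted, the useful and useless counts of an answer can be tallied periodically (for example monthly); when the useless count exceeds the useful count, or exceeds it by a certain margin, the recommendation of that answer is withdrawn, that is, the first answer and its corresponding first text information are deleted from the preset Q&A file.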
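The retirement rule of steps S710 to S712 can be sketched in Python as follows. The `Entry` record and the `vote` function are hypothetical names, and the check is run on every vote here rather than on the periodic schedule mentioned above.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    answer: str
    useful: int       # second useful count, seeded with the first useful count
    useless: int = 0


def vote(qa_table: dict, question: str, is_useful: bool) -> bool:
    """Record a vote on a promoted answer; return True if the entry was deleted."""
    entry = qa_table[question]
    if is_useful:
        entry.useful += 1
    else:
        entry.useless += 1
    if entry.useful < entry.useless:
        # Users no longer agree with the answer, so the question and its
        # answer are removed from the preset Q&A table.
        del qa_table[question]
        return True
    return False


qa_table = {"q": Entry("a", useful=2)}
print(vote(qa_table, "q", False), vote(qa_table, "q", False), vote(qa_table, "q", False))
```

Once the entry is deleted, the same question falls back to the crawling path, so a better answer can be promoted later.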
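Referring to Figure 2, an AIML-based intelligent question answering apparatus in an embodiment of this application includes:
a first acquisition module 1, configured to obtain question information entered by a user and obtain text information from the question information;
a conversion module 2, configured to convert the Chinese in the text information to the same Chinese script to obtain a first text corresponding to the text information, the Chinese script being simplified Chinese or traditional Chinese;
a filtering module 3, configured to delete specified symbols from the first text according to a preset filtering rule to obtain a second text;
a word segmentation module 4, configured to perform Chinese word segmentation on the second text according to a preset Chinese word segmentation rule to obtain a plurality of first fields corresponding to the second text;
a first matching module 5, configured to perform synonym matching on each first field to obtain a second field corresponding to each first field;
a replacement module 6, configured to replace, in the second text, each first field with its corresponding second field to obtain a plurality of target texts;
a second matching module 7, configured to match each target text against the preset questions in a preset Q&A file, the preset Q&A file containing mapping information between the preset questions and first answers;
a second acquisition module 8, configured to, if a target text is successfully matched to a preset question in the preset Q&A file, obtain the first answer corresponding to that preset question and use the first answer as the answer to the question information.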
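Regarding the first acquisition module 1, the question information may be received through Q&A dialogue scenarios such as WeChat chat or the online customer-service Q&A of a web site. The question information may be entered by voice or typed manually; voice input is converted to text with a speech-to-text tool. The question information may cover several business types. A back-end management system is used to configure and maintain the intelligent Q&A file for each business type, that is, each business type has its own table mapping questions to answers. The intelligent Q&A robot can therefore support question information of several business types at the same time, with the types kept separate and free of mutual interference, overcoming the earlier limitation that one intelligent Q&A robot could support only one business type.
Regarding the conversion module 2, converting the Chinese to the same Chinese script means converting simplified Chinese in the text information to traditional Chinese, or traditional Chinese to simplified Chinese. Specifically, the conversion follows the script of the preset Q&A file: if the preset questions in the preset Q&A file are in simplified Chinese and the text information corresponding to question information entered by a Hong Kong user is in traditional Chinese, the traditional Chinese in the text information is converted to simplified Chinese. The conversion may rely on a configuration file that maps traditional characters to simplified characters, or on a database table holding the same mapping. A database table is preferred because its data can be added or modified flexibly through the back-end management system interface. The conversion may be implemented with an open-source toolkit, such as zhconverter for Java.
Regarding the filtering module 3, the word segmentation module 4, and the first matching module 5, the preset filtering rule is a processing rule, based on Chinese usage habits, for specified punctuation marks, spaces, and the like in the text. For example, spaces in the text are deleted, dashes in the text are deleted, and the "·" that joins the given name and surname of a foreign name is also deleted. The preset Chinese word segmentation rule performs full segmentation and atomic segmentation on the second text at the same time, plans the optimal segmentation path with a hidden Markov model and the Viterbi algorithm, and then applies person-name recognition, system dictionary supplementation, user-defined dictionary supplementation, part-of-speech tagging, and similar operations. Full segmentation separates out all words in the text; atomic segmentation separates out all individual Chinese characters. Chinese word segmentation divides the text into a plurality of phrases or fields and may be implemented with an open-source Chinese word segmentation tool such as Ansj. Ansj supports user-defined dictionaries, so users can edit the proprietary vocabulary of each business type, for example a company's insurance product name "e生保"; this lets AIML support question information of different business types and improves the question recognition rate. Synonym matching finds the synonyms of the phrases or fields obtained by word segmentation, based on a configuration file or database table that maps Chinese words to their synonyms.
Regarding the replacement module 6, replacing a first field with its corresponding second field means replacing a phrase or field in the text with one of its synonyms to obtain a new text. The first fields in the second text are replaced multiple times, each time replacing one or more of the first fields with the corresponding second fields, so that a plurality of target texts are obtained.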
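In an embodiment, the replacement module 6 includes:
a replacement unit, configured to replace, in the second text, one or more of the first fields with the corresponding second fields to obtain a plurality of target texts.
Regarding the replacement unit, the replacement includes replacing, each time, one of the first fields with one of its corresponding second fields, and replacing, each time, several of the first fields each with one of its corresponding second fields. When a first field has several corresponding second fields, it is replaced by each of them in turn.
For example, the text information is "办理养老保险的好处是什么?" ("What are the benefits of taking out pension insurance?"). Word segmentation yields first fields such as "办理", "养老", "保险", "好处", and "是什么". Synonym matching then maps "办理" to "置备" and "购买", "养老" to "退休", and "好处" to "优点" and "益处", so replacing first fields with second fields in turn yields multiple texts. Replacing a single first field: replacing "办理" with "置备" gives "置备养老保险的好处是什么"; replacing "养老" with "退休" gives "办理退休保险的好处是什么"; and because "办理" has two second fields, it can also be replaced with "购买" to give "购买养老保险的好处是什么". Replacing several first fields: replacing "办理" with "购买" and "好处" with "优点" gives "购买养老保险的优点是什么"; replacing "办理" with "购买" and "好处" with "益处" gives "购买养老保险的益处是什么"; "办理" can likewise be replaced with "置备", and so on, without listing every case here. The original text information "办理养老保险的好处是什么" is also kept as a target text.
To improve the user experience, if the question information entered by the user is in traditional Chinese, the returned data is also converted to traditional Chinese before being used as the answer to the question information. Deriving multiple target texts from one piece of text information lets AIML recognize the text information better and improves the question recognition rate and matching rate.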
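Regarding the second matching module 7, the preset Q&A files are the Q&A data tables in the intelligent Q&A database, one per business type; each Q&A data table maps preset questions to first answers. The preset Q&A files can be configured through the back-end management system. Specifically, each business staff member logs in to the back-end management system with their account and can then configure the Q&A data table, the synonym table, the specialized vocabulary table for Chinese word segmentation, the traditional-simplified mapping table, and so on, and can modify or delete the data in those tables. When configuration is complete, the staff member enters a confirmation instruction, which calls the intelligent Q&A system interface and triggers the system to generate these tables or update their data. Because of account permissions, each staff member can see and configure only the content of their own business type, which keeps the data secure and avoids mutual interference. The target texts are then matched against the preset Q&A files. Specifically, semantic and syntactic analysis is performed on the text information so that the business type the user is asking about can be determined from the keywords in the text information. Preferably, to determine the business type more accurately, the user may be prompted to select a business type when entering the question information; for example, if the user selects the wealth management option, the preset question corresponding to the target text is looked up in the Q&A data table for wealth management.
Regarding the second acquisition module 8, a successful match means that the Q&A data table contains a preset question matching the question information, and the first answer corresponding to that preset question is used as the answer to the target text. In general, the target texts are all derived from the same text information and have essentially the same meaning, so a single first answer is obtained as the answer to the text information. However, a target text may differ in meaning from the original text information and match a first answer that is inconsistent with the answers of the other target texts; in that case, all the different first answers matched by the target texts are returned to the front-end interface for the user's reference.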
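In an embodiment, the first acquisition module 1 includes:
a first acquisition unit, configured to obtain a voice signal of the user, the voice signal carrying the question information;
a processing unit, configured to perform speech preprocessing on the voice signal to obtain an observation sequence of the voice signal;
a detection unit, configured to detect whether the similarity between the observation sequence and the observation sequence corresponding to a preset text is greater than a preset similarity;
a designation unit, configured to, if the similarity is greater than the preset similarity, use the preset text as the text information corresponding to the question information.
Regarding the first acquisition unit and the processing unit, a voice signal generally consists of the user's speech and environmental noise, and the noise interferes with speech recognition, so the voice signal is preprocessed. The speech preprocessing uses VAD (Voice Activity Detection) to divide the voice signal into frames and builds an HMM (Hidden Markov Model) for the signal. Specifically: the user's voice signal is divided, according to its period, into overlapping speech frames so that the LPC (Linear Predictive Coding) spectral estimates are correlated from frame to frame; an endpoint detection algorithm finds the start and end of the speech, and the intensity and zero-crossing count of each speech frame are then used to compute energy and zero-crossing thresholds, which removes most of the environmental noise; the voice signal is passed through a low-order pre-emphasis filter to flatten its spectrum and reduce the influence of finite word-length effects during signal processing; each speech frame is windowed to reduce signal discontinuities at the beginning and end of the frame; autocorrelation analysis is performed on each speech frame to obtain autocorrelation coefficients, and the Levinson-Durbin algorithm is then used to find the LPC coefficients; the LPC coefficients are converted to cepstral coefficients, which are weighted with a tapered window and used as the feature vector of the speech frame. Further, the time derivative of the cepstral coefficients may be taken to improve the feature vector. Vector quantization of the feature vectors yields the observation sequence of the voice signal.
Regarding the detection unit and the designation unit, an HMM is trained for each phoneme of a preset Chinese vocabulary (which contains the fields of the preset texts): digitized speech samples are obtained and then preprocessed, feature vectors are extracted and vector-quantized, and Baum-Welch modeling is applied, yielding the observation sequence of the speech model corresponding to each preset text. When the observation sequence of the voice signal is matched against the observation sequences of the speech models corresponding to the preset texts, a probability is computed for each observation sequence of the voice signal. Preferably, a maximum likelihood estimation algorithm is used to compute the maximum probability (that is, the similarity between the observation sequence and the observation sequence corresponding to the preset text). If the maximum probability is greater than the preset similarity, the preset text whose observation sequence yields that maximum probability is used as the text information corresponding to the question information.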
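In an embodiment, the second matching module includes:
an analysis unit, configured to perform semantic analysis on the text information to determine the business type to which the text information corresponds;
a lookup unit, configured to find, based on that business type and among the preset Q&A files, the first preset Q&A file corresponding to the business type;
a second acquisition unit, configured to obtain the similarity between the target text and the question texts in the first preset Q&A file;
a judging unit, configured to determine whether the similarity reaches a preset threshold;
a determination unit, configured to determine that the match succeeds if the similarity reaches the preset threshold and that the match fails otherwise.
Regarding the analysis unit and the lookup unit, the semantic analysis performs Chinese word segmentation on the text information, uses a statistical language model to choose the best segmentation result, computes a weight for each term after segmentation with a term-weighting method, and extracts the core words of the text according to the term weights. The language model may be, for example, an HMM-based N-gram model or a recurrent-neural-network language model such as a state-of-the-art language model. Term-weighting methods include TF-IDF, Okapi, MI, ATC, LTU, and others. The more often a term appears in the text, the greater its weight and importance. The extracted core words are matched against the keywords of the business types to determine the business type of the text information. Each business type corresponds to one or more first preset Q&A files, and the first preset Q&A files are contained in the preset Q&A files.
Regarding the second acquisition unit, the judging unit, and the determination unit, matching a target text against preset questions is in effect a text-matching process. Text-matching models can be divided into single-semantic models, multi-semantic models, matching-matrix models, and deep sentence-interaction models. A single-semantic model encodes the two sentences with a fully connected, CNN-type, or RNN-type neural network and then computes the similarity between them, without considering the local structure of phrases in the sentences; an example is DSSM (Deep Structured Semantic Models). A multi-semantic model interprets the sentences at multiple granularities and takes local sentence structure into account; an example is MV-LSTM (Multi-View Long Short-Term Memory). A matching-matrix model computes the similarity between the different words of the two sentences and then extracts features with a deep network, accounting for interactions between words across sentences and handling intra-sentence relationships more finely; an example is Text Matching as Image Recognition. A deep sentence-interaction model uses interaction mechanisms such as attention and finer structures to mine the relationships between words within and across sentences, as in state-of-the-art models. Specifically, any of these text-matching models may be used: its similarity algorithm computes the similarity between the target text and a question text, and the similarity is compared with the preset threshold; if the threshold is reached, the match succeeds, and otherwise it fails.
For example, the question information is the text information "办理养老保险的好处是什么", and the target texts include "置备养老保险的好处是什么" and "购买养老保险的优点是什么". Each target text is matched against the preset questions in the first preset Q&A file by computing the similarity between the target text and each preset question. If one or more target texts reach the preset threshold of similarity with the question "养老保险的优点是什么" in the first preset Q&A file, the answer corresponding to "养老保险的优点是什么" is used as the answer to the question information "办理养老保险的好处是什么". Further, the target texts may match several question texts in the first preset Q&A file; although this is unlikely, when it happens the answers to all the matched question texts are used as answers to the question information, for the user's reference.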
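In an embodiment, the apparatus further includes:
a second receiving module, configured to receive the preset questions and first answers corresponding to each business type;
a writing module, configured to write the preset questions and first answers corresponding to a first business type into a first preset Q&A file, where the first business type is one of the business types and the first preset Q&A file is one of the preset Q&A files.
Regarding the second receiving module and the writing module, the first preset Q&A file can be configured through the back-end management system. Specifically, each business staff member logs in with their account to the back-end management system (part of the intelligent Q&A system), enters an instruction to create a first preset Q&A file, and enters the preset questions and first answers for the business type they are responsible for. For example, a staff member responsible for the basketball business logs in to the back-end management system; the system identifies the account and, according to the preset permissions of that account (editing and browsing permissions for the basketball business only), displays the editable Q&A data table corresponding to those permissions. The staff member enters basketball-related preset questions and first answers in the Q&A data table, for example the preset question "What is the jersey number that the NBA's Rockets retired for Yao Ming?" with the first answer "No. 11". The back-end management system receives the preset questions and first answers and, on the confirmation instruction entered by the user, generates the first preset Q&A file for the basketball business from the Q&A data table. Likewise, a staff member responsible for the football business can edit only the first preset Q&A file for the football business. Because of account permissions, each staff member can see and configure only the content of their own business type, which keeps the data secure, avoids mutual interference, and lets the intelligent Q&A system support multiple business scenarios without interference among them.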
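In an embodiment, the apparatus further includes:
a query module, configured to, if the target texts fail to match the preset questions in the preset Q&A file, send a query request to a preset URL address, the query request carrying the text information;
a first receiving module, configured to receive the returned data corresponding to the query request;
a parsing module, configured to parse the returned data, obtain the several second answers with the highest relevance in the returned data, and use those second answers as the answers corresponding to the text information.
Regarding the query module, the first receiving module, and the parsing module, when the target texts fail to match the preset questions in the preset Q&A file, the Q&A robot cannot answer the user's question from the preset Q&A file. To let the robot answer anyway, the answer is obtained by data crawling. The data crawling technique is a web crawler: a program or script that automatically fetches information from specified URLs according to certain rules. Web crawlers are widely used by Internet search engines and similar sites; they automatically collect all page content they can access in order to obtain or update the content and retrieval methods of those sites. Specifically, when the target texts cannot be matched to a preset question and first answer in the preset Q&A file, the web crawler proceeds according to the business type of the text information: the call address used by a search engine (for example Baidu Search) when searching is analyzed in advance (the preset URL address); the program sends to the call address a simulated search query request that carries the text information and asks for several second answers; the returned data (HTML code) is received from the call address; and the HTML code is parsed with jsoup (a Java HTML parser) to obtain the second answers corresponding to the text information.
In an embodiment, finding answers by crawling is itself a process of filtering for relevant answers. For example, when Baidu Search is used to look up "养老保险的好处是什么", the answer list displayed has already been filtered by the Baidu search engine. To obtain a more accurate answer, the answer with the highest relevance in the answer list is used as the second answer; preferably, the second answers are the several highest-ranked answers with high relevance, for the user's reference. Relevance can be assessed from the answer's position in the search result list, its view count, its like count, and its useful count: the nearer the top of the result list, the more views, the more likes, or the more useful votes an answer has, the more relevant it is considered. If these factors point to different most-relevant answers, the useful count is the primary criterion. Further, the factors can be weighted: answer score = (preset score for position / position in the result list) × weight 1 + view count × weight 2 + like count × weight 3 + useful count × weight 4. The score of each answer in the result list is computed, and the answer with the highest score is used as the second answer. Thus, when the current Q&A data table contains no preset question and first answer for the user's query, data crawling obtains second answers corresponding to the text information, giving the machine a learning capability.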
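In an embodiment, the apparatus further includes:
a generation module, configured to add the second answers to a preset blank list to generate an answer list, and save the answer list;
a first display module, configured to display the answer list together with an option for the user to mark each second answer as useful;
a first accumulation module, configured to accumulate, based on the user's selection of the useful option for a second answer, a first useful count for each second answer;
a first judging module, configured to determine whether the first useful count of a second answer reaches a preset value;
an adding module, configured to, if so, add the second answer whose first useful count has reached the preset value, together with first text information, to the preset Q&A file, where the first text information is the text information corresponding to that second answer and serves as a preset question in the preset Q&A file, and the second answer serves as a first answer in the preset Q&A file.
Regarding the apparatus above, the second answers are displayed as a list; they may also be converted to speech output. Each second answer in the answer list has an option for marking it as useful. A user who agrees with a second answer selects its useful option, and the first useful count of that second answer is accumulated, gathering data on which the intelligent Q&A system can base its answer recommendations for new questions. For example, after viewing a second answer the user judges it: a user who finds it good can select the useful option, and one who finds it poor can select the useless option. The Q&A robot increments the useful count each time the useful option is selected and uses the answer with the highest useful count as the answer to later occurrences of the question. That is, when the first useful count of a second answer reaches the preset value, that second answer is added to the preset Q&A file as the answer to the corresponding question, which strengthens the robot's learning ability and makes it more intelligent.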
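In an embodiment, the apparatus further includes:
a second display module, configured to, when the text information matches the first text information, display the first answer corresponding to the first text information together with options for the user to mark the first answer as useful or useless;
a second accumulation module, configured to accumulate, based on the user's selection of the useful or useless option for the first answer, a second useful count and a useless count for the first answer, the second useful count being accumulated on top of the first useful count;
a second judging module, configured to determine whether the second useful count is less than the useless count;
a deletion module, configured to, if the second useful count is less than the useless count, delete the first answer and its corresponding first text information from the preset Q&A file.
Regarding the apparatus above, when the Q&A robot again receives text information like that handled by the parsing module, the text information matches the first text information, so the first answer corresponding to the first text information is retrieved from the preset Q&A file and returned to the front-end UI, together with useful and useless options for that first answer. Because answers can become outdated, the second useful count is compared with the useless count; when the second useful count is less than the useless count, users no longer agree with the first answer, so the first answer and its corresponding first text information are deleted from the preset Q&A file. For example, people in ancient times believed that the sun revolves around the earth, but today this is considered wrong, so the useless count keeps growing until it exceeds the second useful count. Further, to determine quickly whether a statement such as "the sun revolves around the earth" is still accepted, the useful and useless counts of an answer can be tallied periodically (for example monthly); when the useless count exceeds the useful count, or exceeds it by a certain margin, the recommendation of that answer is withdrawn, that is, the first answer and its corresponding first text information are deleted from the preset Q&A file.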
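A computer device in an embodiment of this application includes a memory and an executor. The memory stores a computer program, and the executor implements the steps of the AIML-based intelligent question answering method described above when executing the computer program.
The computer device may be a server, and its internal structure may be as shown in Figure 3. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer-readable instructions, and a database. The internal memory provides an environment for running the operating system and the computer-readable instructions stored in the non-volatile storage medium. The database of the computer device stores data such as tasks, database tables, and pending tables. The network interface of the computer device communicates with external terminals over a network connection. When executed, the computer-readable instructions carry out the processes of the method embodiments described above. Those skilled in the art will understand that the structure shown in Figure 3 is only a block diagram of the part of the structure relevant to the solution of this application and does not limit the computer devices to which the solution is applied.
A storage medium in an embodiment of this application is a computer-readable storage medium, which may be non-volatile or volatile. It stores a computer program that, when executed by an executor, implements the steps of the AIML-based intelligent question answering method described above.
It should be noted that, in this document, the terms "include", "comprise", and any other variants are intended to cover non-exclusive inclusion, so that a process, apparatus, article, or method that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, apparatus, article, or method. Without further limitation, an element qualified by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, apparatus, article, or method that includes the element.
The above are only preferred embodiments of this application and do not limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of this application, or any direct or indirect application in other related technical fields, is likewise included in the scope of patent protection of this application.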

Claims (21)

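  1. An AIML-based intelligent question answering method, AIML being the Artificial Intelligence Markup Language, the method comprising:
    obtaining question information entered by a user, and obtaining text information from the question information;
    converting the Chinese in the text information to the same Chinese script to obtain a first text corresponding to the text information, the Chinese script being simplified Chinese or traditional Chinese;
    deleting specified symbols from the first text according to a preset filtering rule to obtain a second text;
    performing Chinese word segmentation on the second text according to a preset Chinese word segmentation rule to obtain a plurality of first fields corresponding to the second text;
    performing synonym matching on each of the first fields to obtain a second field corresponding to each of the first fields;
    in the second text, replacing each first field with its corresponding second field to obtain a plurality of target texts;
    matching each of the target texts against preset questions in a preset question-and-answer (Q&A) file, the preset Q&A file containing mapping information between the preset questions and first answers;
    if a target text is successfully matched to a preset question in the preset Q&A file, obtaining the first answer corresponding to the preset question and using the first answer as the answer to the question information.
  2. The AIML-based intelligent question answering method according to claim 1, wherein the step of obtaining question information entered by a user and obtaining text information from the question information comprises:
    obtaining a voice signal of the user, the voice signal carrying the question information;
    performing speech preprocessing on the voice signal to obtain an observation sequence of the voice signal;
    detecting whether the similarity between the observation sequence and the observation sequence corresponding to a preset text is greater than a preset similarity;
    if it is greater than the preset similarity, using the preset text as the text information corresponding to the question information.
  3. The AIML-based intelligent question answering method according to claim 1, wherein the step of matching each of the target texts against preset questions in a preset Q&A file comprises:
    performing semantic analysis on the text information to determine the business type to which the text information corresponds;
    based on the business type corresponding to the text information, finding in the preset Q&A file the first preset Q&A file corresponding to the business type;
    obtaining the similarity between the target text and the question texts in the first preset Q&A file;
    determining whether the similarity reaches a preset threshold;
    if the similarity reaches the preset threshold, determining that the match succeeds, and otherwise determining that the match fails.
  4. The AIML-based intelligent question answering method according to claim 1, wherein the step of replacing, in the second text, each first field with its corresponding second field to obtain a plurality of target texts comprises:
    in the second text, replacing one or more of the first fields with the corresponding second fields to obtain the plurality of target texts.
  5. The AIML-based intelligent question answering method according to claim 1, wherein after the step of matching each of the target texts against preset questions in a preset Q&A file, the method comprises:
    if the target text fails to match the preset questions in the preset Q&A file, sending a query request to a preset URL address, the query request carrying the text information;
    receiving the returned data corresponding to the query request;
    parsing the returned data, obtaining the several second answers with the highest relevance in the returned data, and using the several second answers as the answers corresponding to the text information.
  6. The AIML-based intelligent question answering method according to claim 5, wherein after the step of using the several second answers as the answers corresponding to the text information, the method comprises:
    adding the several second answers to a preset blank list to generate an answer list, and saving the answer list;
    displaying the answer list together with an option for the user to mark each of the second answers as useful;
    based on the user's selection of the useful option for a second answer, accumulating a first useful count for each second answer;
    determining whether the first useful count of the second answer reaches a preset value;
    if so, adding the second answer whose first useful count has reached the preset value, together with first text information, to the preset Q&A file, the first text information being the text information corresponding to the second answer and serving as a preset question in the preset Q&A file, and the second answer whose first useful count has reached the preset value serving as a first answer in the preset Q&A file.
  7. The AIML-based intelligent question answering method according to claim 6, wherein after the step of adding the second answer whose useful count has reached the preset value, together with the text information corresponding to the second answer, to the preset Q&A file, the method further comprises:
    when the text information matches the first text information, displaying the first answer corresponding to the first text information together with options for the user to mark the first answer as useful or useless;
    based on the user's selection of the useful or useless option for the first answer, accumulating a second useful count and a useless count for the first answer, the second useful count being accumulated on top of the first useful count;
    determining whether the second useful count is less than the useless count;
    if the second useful count is less than the useless count, deleting the first answer and the first text information corresponding to the first answer from the preset Q&A file.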
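  8. A computer device comprising a memory and an executor, the memory storing a computer program, wherein the executor, when executing the computer program, implements the following steps:
    obtaining question information entered by a user, and obtaining text information from the question information;
    converting the Chinese in the text information to the same Chinese script to obtain a first text corresponding to the text information, the Chinese script being simplified Chinese or traditional Chinese;
    deleting specified symbols from the first text according to a preset filtering rule to obtain a second text;
    performing Chinese word segmentation on the second text according to a preset Chinese word segmentation rule to obtain a plurality of first fields corresponding to the second text;
    performing synonym matching on each of the first fields to obtain a second field corresponding to each of the first fields;
    in the second text, replacing each first field with its corresponding second field to obtain a plurality of target texts;
    matching each of the target texts against preset questions in a preset question-and-answer (Q&A) file, the preset Q&A file containing mapping information between the preset questions and first answers;
    if a target text is successfully matched to a preset question in the preset Q&A file, obtaining the first answer corresponding to the preset question and using the first answer as the answer to the question information.
  9. The computer device according to claim 8, wherein the step of obtaining question information entered by a user and obtaining text information from the question information comprises:
    obtaining a voice signal of the user, the voice signal carrying the question information;
    performing speech preprocessing on the voice signal to obtain an observation sequence of the voice signal;
    detecting whether the similarity between the observation sequence and the observation sequence corresponding to a preset text is greater than a preset similarity;
    if it is greater than the preset similarity, using the preset text as the text information corresponding to the question information.
  10. The computer device according to claim 8, wherein the step of matching each of the target texts against preset questions in a preset Q&A file comprises:
    performing semantic analysis on the text information to determine the business type to which the text information corresponds;
    based on the business type corresponding to the text information, finding in the preset Q&A file the first preset Q&A file corresponding to the business type;
    obtaining the similarity between the target text and the question texts in the first preset Q&A file;
    determining whether the similarity reaches a preset threshold;
    if the similarity reaches the preset threshold, determining that the match succeeds, and otherwise determining that the match fails.
  11. The computer device according to claim 8, wherein the step of replacing, in the second text, each first field with its corresponding second field to obtain a plurality of target texts comprises:
    in the second text, replacing one or more of the first fields with the corresponding second fields to obtain the plurality of target texts.
  12. The computer device according to claim 8, wherein after the step of matching each of the target texts against preset questions in a preset Q&A file, the steps comprise:
    if the target text fails to match the preset questions in the preset Q&A file, sending a query request to a preset URL address, the query request carrying the text information;
    receiving the returned data corresponding to the query request;
    parsing the returned data, obtaining the several second answers with the highest relevance in the returned data, and using the several second answers as the answers corresponding to the text information.
  13. The computer device according to claim 12, wherein after the step of using the several second answers as the answers corresponding to the text information, the steps comprise:
    adding the several second answers to a preset blank list to generate an answer list, and saving the answer list;
    displaying the answer list together with an option for the user to mark each of the second answers as useful;
    based on the user's selection of the useful option for a second answer, accumulating a first useful count for each second answer;
    determining whether the first useful count of the second answer reaches a preset value;
    if so, adding the second answer whose first useful count has reached the preset value, together with first text information, to the preset Q&A file, the first text information being the text information corresponding to the second answer and serving as a preset question in the preset Q&A file, and the second answer whose first useful count has reached the preset value serving as a first answer in the preset Q&A file.
  14. The computer device according to claim 13, wherein after the step of adding the second answer whose useful count has reached the preset value, together with the text information corresponding to the second answer, to the preset Q&A file, the steps further comprise:
    when the text information matches the first text information, displaying the first answer corresponding to the first text information together with options for the user to mark the first answer as useful or useless;
    based on the user's selection of the useful or useless option for the first answer, accumulating a second useful count and a useless count for the first answer, the second useful count being accumulated on top of the first useful count;
    determining whether the second useful count is less than the useless count;
    if the second useful count is less than the useless count, deleting the first answer and the first text information corresponding to the first answer from the preset Q&A file.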
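  15. A computer storage medium storing a computer program, wherein the computer program, when executed by an executor, implements the following steps:
    obtaining question information entered by a user, and obtaining text information from the question information;
    converting the Chinese in the text information to the same Chinese script to obtain a first text corresponding to the text information, the Chinese script being simplified Chinese or traditional Chinese;
    deleting specified symbols from the first text according to a preset filtering rule to obtain a second text;
    performing Chinese word segmentation on the second text according to a preset Chinese word segmentation rule to obtain a plurality of first fields corresponding to the second text;
    performing synonym matching on each of the first fields to obtain a second field corresponding to each of the first fields;
    in the second text, replacing each first field with its corresponding second field to obtain a plurality of target texts;
    matching each of the target texts against preset questions in a preset question-and-answer (Q&A) file, the preset Q&A file containing mapping information between the preset questions and first answers;
    if a target text is successfully matched to a preset question in the preset Q&A file, obtaining the first answer corresponding to the preset question and using the first answer as the answer to the question information.
  16. The computer storage medium according to claim 15, wherein the step of obtaining question information entered by a user and obtaining text information from the question information comprises:
    obtaining a voice signal of the user, the voice signal carrying the question information;
    performing speech preprocessing on the voice signal to obtain an observation sequence of the voice signal;
    detecting whether the similarity between the observation sequence and the observation sequence corresponding to a preset text is greater than a preset similarity;
    if it is greater than the preset similarity, using the preset text as the text information corresponding to the question information.
  17. The computer storage medium according to claim 15, wherein the step of matching each of the target texts against preset questions in a preset Q&A file comprises:
    performing semantic analysis on the text information to determine the business type to which the text information corresponds;
    based on the business type corresponding to the text information, finding in the preset Q&A file the first preset Q&A file corresponding to the business type;
    obtaining the similarity between the target text and the question texts in the first preset Q&A file;
    determining whether the similarity reaches a preset threshold;
    if the similarity reaches the preset threshold, determining that the match succeeds, and otherwise determining that the match fails.
  18. The computer storage medium according to claim 15, wherein the step of replacing, in the second text, each first field with its corresponding second field to obtain a plurality of target texts comprises:
    in the second text, replacing one or more of the first fields with the corresponding second fields to obtain the plurality of target texts.
  19. The computer storage medium according to claim 15, wherein after the step of matching each of the target texts against preset questions in a preset Q&A file, the steps comprise:
    if the target text fails to match the preset questions in the preset Q&A file, sending a query request to a preset URL address, the query request carrying the text information;
    receiving the returned data corresponding to the query request;
    parsing the returned data, obtaining the several second answers with the highest relevance in the returned data, and using the several second answers as the answers corresponding to the text information.
  20. The computer storage medium according to claim 19, wherein after the step of using the several second answers as the answers corresponding to the text information, the steps comprise:
    adding the several second answers to a preset blank list to generate an answer list, and saving the answer list;
    displaying the answer list together with an option for the user to mark each of the second answers as useful;
    based on the user's selection of the useful option for a second answer, accumulating a first useful count for each second answer;
    determining whether the first useful count of the second answer reaches a preset value;
    if so, adding the second answer whose first useful count has reached the preset value, together with first text information, to the preset Q&A file, the first text information being the text information corresponding to the second answer and serving as a preset question in the preset Q&A file, and the second answer whose first useful count has reached the preset value serving as a first answer in the preset Q&A file.
  21. The computer storage medium according to claim 20, wherein after the step of adding the second answer whose useful count has reached the preset value, together with the text information corresponding to the second answer, to the preset Q&A file, the steps further comprise:
    when the text information matches the first text information, displaying the first answer corresponding to the first text information together with options for the user to mark the first answer as useful or useless;
    based on the user's selection of the useful or useless option for the first answer, accumulating a second useful count and a useless count for the first answer, the second useful count being accumulated on top of the first useful count;
    determining whether the second useful count is less than the useless count;
    if the second useful count is less than the useless count, deleting the first answer and the first text information corresponding to the first answer from the preset Q&A file.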
PCT/CN2020/088052 2019-05-23 2020-04-30 基于aiml的智能问答方法、装置、计算机设备及存储介质 WO2020233386A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910435063.2 2019-05-23
CN201910435063.2A CN110321416A (zh) 2019-05-23 2019-05-23 基于aiml的智能问答方法、装置、计算机设备及存储介质

Publications (1)

Publication Number Publication Date
WO2020233386A1 true WO2020233386A1 (zh) 2020-11-26

Family

ID=68118991

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/088052 WO2020233386A1 (zh) 2019-05-23 2020-04-30 基于aiml的智能问答方法、装置、计算机设备及存储介质

Country Status (2)

Country Link
CN (1) CN110321416A (zh)
WO (1) WO2020233386A1 (zh)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110321416A (zh) * 2019-05-23 2019-10-11 深圳壹账通智能科技有限公司 基于aiml的智能问答方法、装置、计算机设备及存储介质
CN110795548A (zh) * 2019-10-25 2020-02-14 招商局金融科技有限公司 智能问答方法、装置及计算机可读存储介质
CN111582996B (zh) * 2020-05-20 2023-11-24 拉扎斯网络科技(上海)有限公司 业务信息的展示方法及装置
CN113807148B (zh) * 2020-06-16 2024-07-02 阿里巴巴集团控股有限公司 文本识别匹配方法和装置、终端设备
CN112069230B (zh) * 2020-09-07 2023-10-27 中国平安财产保险股份有限公司 数据分析方法、装置、设备及存储介质
CN112667789B (zh) * 2020-12-17 2024-07-26 中国平安人寿保险股份有限公司 用户意图匹配方法装置、终端设备及存储介质
CN116821304B (zh) * 2023-07-07 2023-12-19 国网青海省电力公司信息通信公司 基于大数据的供电所知识智能问答系统

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105677822A (zh) * 2016-01-05 2016-06-15 首都师范大学 一种基于对话机器人的招生自动问答方法及系统
CN107688667A (zh) * 2017-09-30 2018-02-13 平安科技(深圳)有限公司 智能机器人客服方法、电子装置及计算机可读存储介质
CN109241258A (zh) * 2018-08-23 2019-01-18 江苏索迩软件技术有限公司 一种应用税务领域的深度学习智能问答系统
CN109325040A (zh) * 2018-07-13 2019-02-12 众安信息技术服务有限公司 一种faq问答库泛化方法、装置及设备
CN110321416A (zh) * 2019-05-23 2019-10-11 深圳壹账通智能科技有限公司 基于aiml的智能问答方法、装置、计算机设备及存储介质

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9460085B2 (en) * 2013-12-09 2016-10-04 International Business Machines Corporation Testing and training a question-answering system
CN107066541A (zh) * 2017-03-13 2017-08-18 平安科技(深圳)有限公司 客服问答数据的处理方法及系统
CN107301865B (zh) * 2017-06-22 2020-11-03 海信集团有限公司 一种用于语音输入中确定交互文本的方法和装置
CN107220380A (zh) * 2017-06-27 2017-09-29 北京百度网讯科技有限公司 基于人工智能的问答推荐方法、装置和计算机设备
CN107609101B (zh) * 2017-09-11 2020-10-27 远光软件股份有限公司 智能交互方法、设备及存储介质
CN109766423A (zh) * 2018-12-29 2019-05-17 上海智臻智能网络科技股份有限公司 基于神经网络的问答方法及装置、存储介质、终端

Also Published As

Publication number Publication date
CN110321416A (zh) 2019-10-11

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20810174

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20810174

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 18/03/2022)
