CN110569419A - question-answering system optimization method and device, computer equipment and storage medium

Info

Publication number: CN110569419A
Application number: CN201910699484.6A
Authority: CN
Inventors: 王科强; 骆迅; 顾婷婷; 倪渊
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2019-07-31
Filing date: 2019-07-31
Publication date: 2019-12-13

Abstract

The invention discloses a question-answering system optimization method and device, computer equipment and a storage medium, and belongs to the technical field of natural language processing. The method determines whether a question-answer request input by a user is related to the content of the media platform by judging whether the keywords in the request are related to that content. For an unrelated request, a response message indicating that there is no matching content is fed back directly; for a related request, corresponding data is searched in an information base associated with the media platform so as to obtain a search result that accurately matches the request, thereby improving the accuracy of the answers and the user experience.

Description

Question-answering system optimization method and device, computer equipment and storage medium

Technical Field

The invention relates to the technical field of natural language processing, in particular to a question-answering system optimization method, a question-answering system optimization device, computer equipment and a storage medium.

Background

Currently, portal websites, such as the official websites of schools, hospitals and government agencies, hold a large amount of information about the institution, and users (students, patients and citizens) can find the key information they need by navigating or searching the website. With navigation, the user has to click through and view entries one by one to filter out the useful information; if the navigation entries of the portal are confusing, the user spends a lot of time identifying the right one. The search function of a portal website can generally only match keywords, and because most portal websites hold little data, the published information is generally limited. If the keywords used by the user are semantically suitable but do not match the wording on the site, the search engine of the portal site may have difficulty returning results that meet the user's intention. The user then needs to make multiple attempts with different keywords (synonyms) to obtain the desired result, so search performance is poor and the recall rate is low.

Disclosure of Invention

In view of the poor search performance of existing official portal websites, a question-answering system optimization method and device, computer equipment and a storage medium are provided for optimizing search performance and improving search accuracy.

In order to achieve the above object, the present invention provides a question-answering system optimization method, which is applied to a media platform and comprises:

S1, obtaining a question and answer request;

S2, extracting keywords in the question and answer request, and judging whether the keywords are related to the content in the media platform; if yes, executing step S3; if not, executing step S4;

S3, identifying the keywords, searching an information base associated with the media platform according to an identification result, and outputting a search result;

S4, outputting an answer message that the question and answer request is not matched with the content of the media platform.

Preferably, the media platform comprises a web portal platform and/or a WeChat official account platform.

Preferably, the step S2 of determining whether the keyword is related to the content in the media platform includes:

Classifying the keywords by adopting a corpus classification model, acquiring category information of the keywords, matching the category information with type data associated with the media platform, acquiring the matching degree of the category information and the type data, and judging whether the matching degree is greater than a preset threshold value; if yes, executing step S3; if not, executing step S4.

Preferably, the corpus classification model is a model obtained by crawling data from the internet for corpus classification training.

Preferably, the information base adopts an inverted index file, and each entry in an index table of the inverted index file includes an attribute value and an address of each record having the attribute value.

Preferably, the step S3, recognizing the keyword, searching an information base associated with the media platform according to the recognition result, and outputting the search result, includes:

Identifying the keywords, matching the identification result with the attribute values in the inverted index file, acquiring the record address corresponding to the attribute value with the highest degree of correlation with the identification result, and outputting the content at the record address as the search result;

The record address is a web page address or a storage address of a text file.

Recognizing the keywords by adopting a semantic matching model, acquiring search terms related to the keywords, matching the keywords and the search terms with the information base, and outputting data information with the highest matching degree;

The search term is a synonym of the keyword.

In order to achieve the above object, the present invention provides a question answering device applied in a media platform, comprising:

An acquisition unit, configured to acquire the question and answer request;

A judging unit, configured to extract a keyword in the question and answer request, judge whether the keyword is related to content in the media platform, and output an answer message that the question and answer request does not match with the content in the media platform when the keyword is not related to the content in the media platform;

A searching unit, configured to identify the keywords when the keywords are related to the content in the media platform, search an information base associated with the media platform according to the identification result, and output a search result.

To achieve the above object, the present invention provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the above method when executing the computer program.

To achieve the above object, the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, carries out the steps of the above method.

According to the question-answering system optimization method and device, computer equipment and storage medium provided by the invention, whether the question-answer request input by the user is related to the content of the media platform is identified by judging whether the keywords in the request are related to that content. For an unrelated request, a response message indicating no matching content can be fed back directly; for a related request, corresponding data is further searched in an information base associated with the media platform so as to obtain a search result that accurately matches the request, thereby improving the accuracy of the answers and the user experience.

Drawings

FIG. 1 is a block diagram of an embodiment of a question-answering system optimization method according to the present invention;

FIG. 2 is a block diagram of an embodiment of a question answering device according to the present invention;

FIG. 3 is a schematic diagram of a hardware structure of a computer device for executing the method for optimizing the question-answering system according to the embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

The question-answering system optimization method and device, computer equipment and storage medium provided by the invention are suitable for fields such as medical treatment (e.g., hospital portal websites) and education (e.g., university portal websites), and are used for optimizing search performance and improving search accuracy. The method determines whether a question-answer request input by a user is related to the content of the media platform by judging whether the keywords in the request are related to that content. For an unrelated request, a response message indicating no matching content can be fed back directly; for a related request, corresponding data is further searched in an information base associated with the media platform so as to obtain a search result that accurately matches the request, thereby improving the accuracy of the answers and the user experience.

Example one

Referring to FIG. 1, the question-answering system optimization method of this embodiment is applied to a media platform, and the media platform may include a web portal platform and/or a WeChat official account platform.

By way of example and not limitation, the portal platform may be a highly domain-specific platform such as the portal platform of a medical institution or of a university. Similarly, the WeChat official account platform may be a highly domain-specific platform such as the WeChat official account of a medical institution or of a university.

The question-answering system optimization method can comprise the following steps:

S1, obtaining a question and answer request;

In step S2, whether the question-answer request is related to the content of the media platform is determined by identifying whether its keywords are related to the content in the media platform. For example, when a question-answer request about a hospital registration process is submitted on a university website or WeChat official account, identifying the keywords shows that the request is inconsistent with the content of that website or official account, and the user can be prompted that the website or official account has no matching answer.

Specifically, the step S2 of determining whether the keyword is related to the content in the media platform may include:

The corpus classification model is obtained in advance by corpus classification training on a large amount of data crawled from the internet. The model can identify whether the keywords in the question-answer request are related to the content in the media platform, and thereby judge whether the user's request falls within the field covered by that content.

Furthermore, the corpus classification model classifies the corpus using a naive Bayes classification algorithm, a statistical classification method that predicts the probability of class membership, such as the probability that a given tuple belongs to a particular class. The naive Bayes classification algorithm is based on Bayes' theorem and assumes that the effect of an attribute value on a given class is independent of the values of the other attributes, an assumption called class conditional independence.
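As a brief illustration of class conditional independence, the standard naive Bayes decision rule can be written as follows. The symbols are notation introduced here for illustration and do not appear in the source: C_k denotes a class, x_1 to x_n denote the attribute values of the item being classified, and \hat{C} is the predicted class.

```latex
P(C_k \mid x_1, \dots, x_n) \;\propto\; P(C_k) \prod_{i=1}^{n} P(x_i \mid C_k),
\qquad
\hat{C} = \arg\max_{k} \; P(C_k) \prod_{i=1}^{n} P(x_i \mid C_k)
```

The product over the individual conditional probabilities is exactly where the independence assumption is used.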

The process of training the corpus classification model is as follows:

Keywords crawled from the internet are input into a classification model; the characteristic attributes of each keyword with respect to each class are calculated; the conditional probabilities of all partitions of each characteristic attribute are calculated; and the class of the keyword is predicted from these conditional probabilities. The predicted result is compared with the real result, and the parameter values of the classification model are adjusted until training is complete, yielding the corpus classification model. It should be noted that the classes are classes related to the content in the media platform, and whether a keyword is related to the content in the media platform is determined according to its class.

The process of identifying whether the keywords in the question-answering request are related to the content in the media platform or not through the corpus classification model is as follows:

The keywords in the question and answer request are obtained; the characteristic attributes of the keywords with respect to each class are calculated; the conditional probabilities of all partitions of each characteristic attribute are calculated; and the class of the keywords is predicted from the conditional probabilities. The class is one related to the content in the media platform, and whether the keywords are related to the content in the media platform is judged according to the class.
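The training and recognition processes described above can be illustrated with a minimal multinomial naive Bayes sketch. It is not the patent's implementation: the tiny labelled corpus, the class names `education` and `medical`, the use of Laplace smoothing, and the function names `train_naive_bayes`, `predict_category` and `is_related` are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(samples):
    """Estimate class counts and per-class keyword counts from (keywords, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for keywords, label in samples:
        class_counts[label] += 1
        for word in keywords:
            word_counts[label][word] += 1
            vocabulary.add(word)
    return class_counts, word_counts, vocabulary


def predict_category(keywords, model):
    """Return the class with the highest log-posterior under class conditional independence."""
    class_counts, word_counts, vocabulary = model
    total = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total)  # log prior
        denom = sum(word_counts[label].values()) + len(vocabulary)
        for word in keywords:
            # Laplace smoothing (an assumption) keeps unseen words from zeroing the product.
            score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training corpus standing in for data crawled from the internet.
TRAINING_SAMPLES = [
    (["admission", "tuition", "course"], "education"),
    (["exam", "course", "dormitory"], "education"),
    (["registration", "outpatient", "doctor"], "medical"),
    (["doctor", "prescription", "ward"], "medical"),
]
MODEL = train_naive_bayes(TRAINING_SAMPLES)
PLATFORM_CATEGORIES = {"education"}  # classes covered by a hypothetical university portal


def is_related(keywords):
    """Treat a request as related when its predicted class belongs to the platform."""
    return predict_category(keywords, MODEL) in PLATFORM_CATEGORIES
```

With this toy corpus, keywords such as "tuition" and "course" fall in the `education` class and are treated as related to the university portal, while "doctor" and "registration" fall in `medical` and are not.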

By way of example and not limitation, after the keywords in the question and answer request are obtained, they may be filtered against preset filter words, which are one or more of the following: profanity, sensitive words, stop words, etc.
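The filtering step can be sketched as a simple set lookup. The word lists and the name `filter_keywords` below are hypothetical placeholders; a real deployment would presumably load curated lists.

```python
# Hypothetical preset filter words, grouped by kind.
FILTER_WORDS = {
    "stop": {"the", "a", "of", "how", "to", "is", "for"},
    "sensitive": {"password"},
    "profanity": {"damn"},
}


def filter_keywords(keywords, filter_words=FILTER_WORDS):
    """Drop any keyword that appears in one of the preset filter lists."""
    blocked = set().union(*filter_words.values())
    return [word for word in keywords if word.lower() not in blocked]
```

The comparison is case-insensitive, which is a design choice of this sketch rather than something the source specifies.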

The information base adopts an inverted index file. The inverted index arises from the practical need to look up records by the values of their attributes. Each entry in the index table of the inverted index file includes an attribute value and the addresses of the records having that attribute value. Because the record does not determine the attribute value but rather the attribute value determines the position of the record, the structure is called an inverted index. A file with an inverted index is called an inverted index file, or inverted file for short. Posting lists record which documents contain a given word. In general, many documents in a document collection contain a word; for each such document the list records information such as the document number (DocID), the number of times the word appears in the document (term frequency, TF), and the positions at which the word appears. The information relating to one document is called a posting, and the series of postings for a word forms a list structure, which is the posting list of that word. All words that appear in the document collection, together with their posting lists, constitute the inverted index.
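The structure described above can be sketched as a dictionary that maps each word to a list of per-document entries holding the document number, the term frequency and the word positions. The two-document collection, the whitespace tokenisation and the name `build_inverted_index` are assumptions for illustration only.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each word to a list of (doc_id, term_frequency, positions) entries."""
    index = defaultdict(list)
    for doc_id in sorted(documents):
        positions_by_word = defaultdict(list)
        for position, word in enumerate(documents[doc_id].lower().split()):
            positions_by_word[word].append(position)
        for word, positions in positions_by_word.items():
            index[word].append((doc_id, len(positions), positions))
    return dict(index)


# Hypothetical two-document collection keyed by document number.
DOCUMENTS = {
    1: "library opening hours and library rules",
    2: "sports hall opening hours",
}
INDEX = build_inverted_index(DOCUMENTS)
```

Looking up `INDEX["opening"]` returns one entry per document that contains the word, so a query never has to scan the documents themselves.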

As a preferred embodiment, the step S3 of identifying the keyword, and searching the information base associated with the media platform according to the identification result may include:

Identifying the keywords, matching the identification result with the attribute values in the inverted index file, acquiring the record address corresponding to the attribute value with the highest degree of correlation with the identification result, and outputting the content at the record address as the search result.

It should be noted that the record address may be a web page address or a storage address of a text file (e.g., doc, txt, pdf, etc.). The search result output is accordingly the content at the web page address or the content of the text file.
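A minimal sketch of this lookup follows. The source does not specify how the degree of correlation is computed, so the word-set Jaccard overlap used here is an assumption, as are the index table contents, the example addresses and the names `relevance` and `lookup_record_addresses`.

```python
# Hypothetical index table: attribute value -> addresses of records having that value.
INDEX_TABLE = {
    "course registration": ["https://portal.example.edu/registration.html"],
    "library opening hours": ["/data/docs/library_hours.txt"],
    "scholarship application": ["https://portal.example.edu/scholarship.html"],
}


def relevance(query, attribute_value):
    """Jaccard overlap of word sets; an assumed stand-in for the degree of correlation."""
    query_words = set(query.lower().split())
    value_words = set(attribute_value.lower().split())
    if not query_words or not value_words:
        return 0.0
    return len(query_words & value_words) / len(query_words | value_words)


def lookup_record_addresses(recognized_text, index_table=INDEX_TABLE):
    """Return the addresses stored under the attribute value with the highest relevance."""
    best_value = max(index_table, key=lambda value: relevance(recognized_text, value))
    if relevance(recognized_text, best_value) == 0.0:
        return []  # nothing in the index overlaps with the request
    return index_table[best_value]
```

The returned address may be a web page address or a file path; fetching the content behind it is left out of the sketch.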

As a preferred embodiment, the step S3, recognizing the keyword, searching an information base associated with the media platform according to the recognition result, and outputting the search result, includes:

The search term is a synonym of the keyword.

In this embodiment, a real-time semantic index is adopted, and keywords are matched and identified by semantic matching, which improves identification accuracy. Taking the keyword 'sports' as an example, identifying the keyword with a semantic matching model also yields related and similar words such as 'exercise', 'activity' and 'training'. The information base can then be searched for data that closely matches 'sports' and these similar words, so that the returned data can include information on 'exercise', 'activity', 'training' and the like, improving the comprehensiveness and accuracy of the search.
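The 'sports' example can be sketched as a synonym expansion followed by a ranked search. A fixed synonym table stands in for the semantic matching model here, which is a simplification; the table, the three-record information base and the function names are hypothetical.

```python
# Hypothetical synonym table standing in for the output of a semantic matching model.
SYNONYMS = {"sports": ["exercise", "activity", "training"]}

# Hypothetical information base: record title -> record text.
INFORMATION_BASE = {
    "gym": "Exercise and training sessions are held in the gym every evening.",
    "canteen": "The canteen serves breakfast from seven o'clock.",
    "clubs": "Student clubs organise one outdoor activity each month.",
}


def expand_keyword(keyword, synonyms=SYNONYMS):
    """Return the keyword followed by its search terms (synonyms)."""
    return [keyword] + synonyms.get(keyword, [])


def search_with_synonyms(keyword, information_base=INFORMATION_BASE):
    """Rank records by how many of the expanded terms they contain; drop non-matches."""
    terms = expand_keyword(keyword)
    scored = []
    for title, text in information_base.items():
        words = set(text.lower().replace(".", "").split())
        hits = sum(1 for term in terms if term in words)
        if hits:
            scored.append((hits, title))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [title for _, title in scored]
```

A plain keyword search for "sports" would find nothing in this toy information base, whereas the expanded search reaches the records that mention exercise, training or activity.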

In practical application, a semantic matching model is used to identify the keywords: the keywords or the question and answer request are parsed into a vector, the query statements of the information base are likewise parsed into vectors, and the vectors are matched to calculate their similarity. A semantic index is used to speed up retrieval, and the data information corresponding to the vector with the highest similarity is output as the search result.
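The vector matching step can be illustrated with cosine similarity. The source does not say how text is turned into vectors, so the bag-of-words counts below are only a toy stand-in for a trained semantic encoder; the candidate statements and the names `to_vector`, `cosine_similarity` and `most_similar` are hypothetical.

```python
import math
from collections import Counter


def to_vector(text):
    """Toy bag-of-words vector; a real system would use a trained semantic encoder."""
    return Counter(text.lower().split())


def cosine_similarity(left, right):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(count * right[word] for word, count in left.items())
    norm_left = math.sqrt(sum(count * count for count in left.values()))
    norm_right = math.sqrt(sum(count * count for count in right.values()))
    if norm_left == 0 or norm_right == 0:
        return 0.0
    return dot / (norm_left * norm_right)


def most_similar(request, candidates):
    """Return the candidate whose vector is closest to the request vector."""
    request_vector = to_vector(request)
    return max(candidates, key=lambda c: cosine_similarity(request_vector, to_vector(c)))


# Hypothetical statements stored in the information base.
CANDIDATES = [
    "how to apply for a scholarship",
    "library opening hours",
    "how to register for courses",
]
```

This sketch compares the request against every candidate; an index over the vectors would be needed to avoid that linear scan on a larger information base.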

In practical application, a user can log in to the portal website of a media platform and input a question and answer request (i.e., a query statement). Whether the request falls within the field covered by the website is judged from the keywords in the request. If not, a response message can be returned indicating that the website belongs to a certain field and that the request is unrelated to that field; if yes, semantic matching is performed and relevant answers are returned for the user's reference. The user can likewise open the WeChat official account of the media platform and input the question and answer request (i.e., a query statement) by chatting. Whether the request falls within the field covered by the official account is judged from the keywords in the request. If not, a response message can be returned indicating that the official account belongs to a certain field and that the request is unrelated to that field; if yes, semantic matching is performed and relevant answers are returned for the user's reference.
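The overall flow of steps S1 to S4 can be sketched as a small gate in front of the search. Everything concrete below is hypothetical: the message text, the campus term set, the answer table and the `demo_*` helpers merely stand in for the classifier and the information base so that the control flow can be run.

```python
NO_MATCH_MESSAGE = "This platform has no content matching your question."


def answer_request(request, extract_keywords, is_related, search):
    """Steps S1-S4: gate the request on relevance, then search or reply with no match."""
    keywords = extract_keywords(request)  # S2: keyword extraction
    # Treating an empty keyword list as unrelated is an assumption of this sketch.
    if not keywords or not is_related(keywords):
        return NO_MATCH_MESSAGE  # S4: unrelated request
    return search(keywords)  # S3: search the information base


# Hypothetical stand-ins for a university portal.
CAMPUS_TERMS = {"tuition", "course", "dormitory", "scholarship"}
ANSWERS = {"tuition": "Tuition is paid at the start of each semester."}


def demo_extract(request):
    """Split on whitespace and strip simple punctuation."""
    return [w.strip("?.,!").lower() for w in request.split() if w.strip("?.,!")]


def demo_is_related(keywords):
    """Stand-in for the corpus classification model."""
    return any(word in CAMPUS_TERMS for word in keywords)


def demo_search(keywords):
    """Stand-in for the information base search."""
    for word in keywords:
        if word in ANSWERS:
            return ANSWERS[word]
    return NO_MATCH_MESSAGE
```

Passing the three stages in as functions keeps the gate independent of how classification and search are actually implemented.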

In this embodiment, whether the question and answer request input by the user is related to the content of the media platform is identified by judging whether the keywords in the request are related to that content. For an unrelated request, a response message indicating no matching content can be fed back directly; for a related request, corresponding data is further searched in the information base associated with the media platform, so that search results matching the request are obtained accurately. This helps the user find the relevant content more quickly and accurately, and improves the accuracy of the answers and the user experience.

Example two

As shown in FIG. 2, a question answering device 1 applied in a media platform includes an acquisition unit 11, a judging unit 12 and a searching unit 13, wherein:

An acquisition unit 11, configured to acquire a question and answer request;

A judging unit 12, configured to extract a keyword in the question and answer request, judge whether the keyword is related to content in the media platform, and output an answer message that the question and answer request does not match with the content in the media platform when the keyword is not related to the content in the media platform;

Specifically, the judging unit 12 may determine whether the keyword is related to the content in the media platform by:

Classifying the keywords by adopting a corpus classification model, acquiring category information of the keywords, matching the category information with type data associated with the media platform, acquiring matching degree of the category information and the type data, judging whether the matching degree is greater than a preset threshold value, if so, searching corresponding data in an information base associated with the media platform according to the keywords, and outputting a search result; if not, outputting an answer message that the question-answer request is not matched with the content of the media platform.
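The threshold decision made by the judging unit can be sketched as follows. The source does not define how the matching degree is computed, so this sketch assumes that the category information is a set of per-category scores and that the matching degree is the share of the total score falling on the platform's types. The default threshold of 0.5 and the names `matching_degree` and `judge` are likewise hypothetical.

```python
def matching_degree(category_scores, platform_types):
    """Share of the classifier's total score that falls on the platform's types."""
    total = sum(category_scores.values())
    if total == 0:
        return 0.0
    matched = sum(score for category, score in category_scores.items()
                  if category in platform_types)
    return matched / total


def judge(category_scores, platform_types, threshold=0.5):
    """Return True (search the information base) when the degree exceeds the threshold."""
    return matching_degree(category_scores, platform_types) > threshold
```

For example, a request scored mostly as `education` passes the check on a platform whose type data is `{"education"}`, while one scored mostly as `medical` is answered with the no-match message.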

The process of training the corpus classification model is as follows:

A searching unit 13, configured to, when the keyword is related to the content in the media platform, identify the keyword, search an information base associated with the media platform according to an identification result, and output the search result.

As a preferred embodiment, the searching unit 13 identifies the keyword, and searching the information base associated with the media platform according to the identification result may include:

As a preferred embodiment, the searching unit 13 identifies the keyword, searches an information base associated with the media platform according to the identification result, and outputs the search result, including:

The search term is a synonym of the keyword.

Example three

In order to achieve the above object, the present invention further provides a computer device 2. There may be a plurality of computer devices 2, and the components of the question answering device 1 according to the second embodiment may be distributed among different computer devices 2. The computer device 2 may be a smartphone, a tablet computer, a notebook computer, a desktop computer, a rack server, a blade server, or a tower server (including an independent server or a server cluster formed by a plurality of servers) that executes a program. The computer device 2 of the present embodiment includes at least, but is not limited to, a memory 21, a processor 23, a network interface 22, and the question answering device 1 (refer to FIG. 3), which are communicably connected to each other through a system bus. It is noted that FIG. 3 only shows the computer device 2 with some components; not all of the shown components are required to be implemented, and more or fewer components may be implemented instead.

In this embodiment, the memory 21 includes at least one type of computer-readable storage medium, which includes a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, and the like. In some embodiments, the memory 21 may be an internal storage unit of the computer device 2, such as a hard disk or a memory of the computer device 2. In other embodiments, the memory 21 may also be an external storage device of the computer device 2, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), or the like provided on the computer device 2. Of course, the memory 21 may also comprise both an internal storage unit of the computer device 2 and an external storage device thereof. In this embodiment, the memory 21 is generally used to store an operating system installed in the computer device 2 and various application software, such as the program code of the question-answering system optimization method of the first embodiment. Further, the memory 21 may also be used to temporarily store various types of data that have been output or are to be output.

The processor 23 may be a Central Processing Unit (CPU), a controller, a microcontroller, a microprocessor, or other data processing chip in some embodiments. The processor 23 is typically used for controlling the overall operation of the computer device 2, such as performing control and processing related to data interaction or communication with the computer device 2. In this embodiment, the processor 23 is configured to run the program code stored in the memory 21 or to process data, for example, to run the question answering device 1.

The network interface 22 may comprise a wireless network interface or a wired network interface, and the network interface 22 is typically used to establish a communication connection between the computer device 2 and other computer devices 2. For example, the network interface 22 is used to connect the computer device 2 to an external terminal through a network, establish a data transmission channel and a communication connection between the computer device 2 and the external terminal, and the like. The network may be a wireless or wired network such as an intranet, the Internet, the Global System for Mobile Communications (GSM), Wideband Code Division Multiple Access (WCDMA), a 4G network, a 5G network, Bluetooth, Wi-Fi, and the like.

It is noted that FIG. 3 only shows the computer device 2 with components 21-23, but it is to be understood that not all of the shown components are required to be implemented, and that more or fewer components may be implemented instead.

In this embodiment, the question answering device 1 stored in the memory 21 can also be divided into one or more program modules, and the one or more program modules are stored in the memory 21 and executed by one or more processors (in this embodiment, the processor 23) to complete the present invention.

Example four

To achieve the above objects, the present invention also provides a computer-readable storage medium including a plurality of storage media such as a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, a server, an App application store, etc., on which a computer program is stored, which when executed by the processor 23, implements corresponding functions. The computer-readable storage medium of the present embodiment is used to store the question-answering apparatus 1, and when executed by the processor 23, the method for optimizing the question-answering system of the first embodiment is implemented.

The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.

Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner.

The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims

1. A question-answering system optimization method is applied to a media platform and comprises the following steps:

S1, obtaining a question and answer request;

2. The question-answering system optimization method according to claim 1, wherein the media platform comprises a web portal platform and/or a WeChat official account platform.

3. The question-answering system optimization method according to claim 1, wherein the determining whether the keyword is related to the content in the media platform in the step S2 includes:

4. The question-answering system optimization method according to claim 3, wherein the corpus classification model is a model obtained by performing corpus classification training on data crawled from the internet.

5. The question-answering system optimization method according to claim 1, wherein the information base employs an inverted index file, and each entry in an index table of the inverted index file includes an attribute value and an address of each record having the attribute value.

6. The question-answering system optimization method according to claim 5, wherein the step S3 of recognizing the keyword, searching an information base associated with the media platform according to the recognition result, and outputting the search result comprises:

7. The question-answering system optimization method according to claim 1, wherein the step S3 of recognizing the keyword, searching an information base associated with the media platform according to the recognition result, and outputting the search result includes:

The search term is a synonym of the keyword.

8. A question answering device, applied to a media platform, comprising:

the acquisition unit is used for acquiring the question and answer request;

9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method of any one of claims 1 to 7 when executing the computer program.

10. A computer-readable storage medium having stored thereon a computer program, characterized in that: the computer program when executed by a processor implements the steps of the method of any one of claims 1 to 7.