CN110569419A - question-answering system optimization method and device, computer equipment and storage medium - Google Patents

Question-answering system optimization method and device, computer equipment and storage medium

Info

Publication number
CN110569419A
Authority
CN
China
Prior art keywords
question
media platform
keywords
content
keyword
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910699484.6A
Other languages
Chinese (zh)
Inventor
王科强
骆迅
顾婷婷
倪渊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN201910699484.6A priority Critical patent/CN110569419A/en
Publication of CN110569419A publication Critical patent/CN110569419A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/316 Indexing structures
    • G06F16/319 Inverted lists
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines

Abstract

The invention discloses a question-answering system optimization method and device, computer equipment and a storage medium, and belongs to the technical field of natural language processing. The method determines whether a question and answer request input by a user is related to the content of a media platform by judging whether the keywords in the request are related to that content. For an unrelated request, a response message stating that there is no matching content is fed back directly; for a related request, corresponding data are searched in an information base associated with the media platform so as to obtain a search result that matches the request, thereby improving the accuracy of the answer content and the user experience.

Description

Question-answering system optimization method and device, computer equipment and storage medium
Technical Field
The invention relates to the technical field of natural language processing, in particular to a question-answering system optimization method, a question-answering system optimization device, computer equipment and a storage medium.
Background
Currently, portal websites, such as the official websites of schools, hospitals and government agencies, hold a large amount of information about those institutions, and users (students, patients and citizens) can find the key information they need by navigating or searching the website. Navigation requires the user to click through and view pages one by one to judge whether the information is useful. In addition, if the navigation entries of the portal website are confusing, the user spends a lot of time working out which entry to follow. The search function of a portal website generally supports only keyword search, and because the data volume of most portal websites is small, the published information is generally limited. If the keywords chosen by the user are semantically appropriate but do not literally match, the search engine of the portal website may have difficulty returning results that meet the user's intention. The user has to try several different keywords (synonyms) to obtain the desired result, so search performance is poor and the recall rate is low.
Disclosure of Invention
To address the poor search performance of existing official portal websites, a question-answering system optimization method and device, computer equipment and a storage medium are provided that optimize search performance and improve search accuracy.
In order to achieve the above object, the present invention provides a question-answering system optimization method, which is applied to a media platform and comprises:
S1, obtaining a question and answer request;
S2, extracting keywords from the question and answer request and judging whether the keywords are related to the content in the media platform; if so, executing step S3; if not, executing step S4;
S3, identifying the keywords, searching an information base associated with the media platform according to the identification result, and outputting a search result;
S4, outputting an answer message indicating that the question and answer request does not match the content of the media platform.
Preferably, the media platform comprises a portal website platform and/or a WeChat official account platform.
Preferably, the step S2 of determining whether the keyword is related to the content in the media platform includes:
Classifying the keywords by using a corpus classification model to obtain category information of the keywords, matching the category information with type data associated with the media platform to obtain a matching degree between the category information and the type data, and judging whether the matching degree is greater than a preset threshold; if so, executing step S3; if not, executing step S4.
Preferably, the corpus classification model is a model obtained by crawling data from the internet for corpus classification training.
Preferably, the information base adopts an inverted index file, and each entry in an index table of the inverted index file includes an attribute value and an address of each record having the attribute value.
Preferably, the step S3, recognizing the keyword, searching an information base associated with the media platform according to the recognition result, and outputting the search result, includes:
Identifying the keywords, matching the identification result with the attribute values in the inverted index file, acquiring, according to the correlation between the identification result and the attribute values, the record address corresponding to the attribute value with the highest correlation, and outputting the content at the record address as the search result;
wherein the record address is a web page address or a storage address of a text file.
Preferably, the step S3, recognizing the keyword, searching an information base associated with the media platform according to the recognition result, and outputting the search result, includes:
Recognizing the keywords by adopting a semantic matching model, acquiring search terms related to the keywords, matching the keywords and the search terms with the information base, and outputting data information with the highest matching degree;
The search term is a synonym of the keyword.
In order to achieve the above object, the present invention provides a question answering device applied to a media platform, comprising:
an acquisition unit, configured to acquire a question and answer request;
a judging unit, configured to extract keywords from the question and answer request, judge whether the keywords are related to the content in the media platform, and output an answer message indicating that the question and answer request does not match the content of the media platform when the keywords are not related to that content;
and a searching unit, configured to identify the keywords when the keywords are related to the content in the media platform, search an information base associated with the media platform according to the identification result, and output a search result.
To achieve the above object, the present invention provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the above method when executing the computer program.
To achieve the above object, the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, carries out the steps of the above method.
According to the question-answering system optimization method and device, computer equipment and storage medium provided by the invention, whether a question and answer request input by a user is related to the content of the media platform is determined by judging whether the keywords in the request are related to that content. For an unrelated request, a response message stating that there is no matching content can be fed back directly; for a related request, corresponding data are searched in an information base associated with the media platform so as to obtain a search result that matches the request, thereby improving the accuracy of the answer content and the user experience.
Drawings
FIG. 1 is a block diagram of an embodiment of a question-answering system optimization method according to the present invention;
FIG. 2 is a block diagram of an embodiment of a question answering device according to the present invention;
FIG. 3 is a schematic diagram of the hardware structure of a computer device for executing the question-answering system optimization method according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The question-answering system optimization method and device, computer equipment and storage medium provided by the invention are suitable for fields such as medical treatment (for example, hospital portal websites) and education (for example, university portal websites), and optimize search performance and improve search accuracy. The method determines whether a question and answer request input by a user is related to the content of the media platform by judging whether the keywords in the request are related to that content. For an unrelated request, a response message stating that there is no matching content can be fed back directly; for a related request, corresponding data are searched in an information base associated with the media platform so as to obtain a search result that matches the request, thereby improving the accuracy of the answer content and the user experience.
Example one
Referring to FIG. 1, the question-answering system optimization method of this embodiment is applied to a media platform, and the media platform may include a portal website platform and/or a WeChat official account platform.
By way of example and not limitation, the portal website platform may be a strongly domain-specific platform, such as the portal website of a medical institution or of a university. Similarly, the WeChat official account platform may be a strongly domain-specific platform, such as the official account of a medical institution or of a university.
The question-answering system optimization method may comprise the following steps:
S1, obtaining a question and answer request;
S2, extracting keywords from the question and answer request and judging whether the keywords are related to the content in the media platform; if so, executing step S3; if not, executing step S4;
In this step, whether the question and answer request is related to the content of the media platform is determined by identifying whether the keywords are related to that content. For example, when a question and answer request about a hospital registration process is submitted on a university website or WeChat official account, keyword identification shows that the request does not match the content of the university website or official account, and the user can be prompted that the website or official account has no matching answer.
Specifically, the step S2 of determining whether the keyword is related to the content in the media platform may include:
Classifying the keywords by using a corpus classification model to obtain category information of the keywords, matching the category information with type data associated with the media platform to obtain a matching degree between the category information and the type data, and judging whether the matching degree is greater than a preset threshold; if so, executing step S3; if not, executing step S4.
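The threshold test in this step can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the patent does not define how the matching degree is computed, so the function `matching_degree` (the share of the classifier's probability mass that falls on the platform's categories), the default threshold of 0.5 and the category names are all hypothetical.

```python
def matching_degree(category_scores, platform_types):
    """Assumed measure: share of classifier probability that lands on platform categories."""
    total = sum(category_scores.values())
    if total == 0:
        return 0.0
    hit = sum(p for c, p in category_scores.items() if c in platform_types)
    return hit / total


def route(category_scores, platform_types, threshold=0.5):
    """Return "S3" (search) when the matching degree exceeds the threshold, else "S4"."""
    if matching_degree(category_scores, platform_types) > threshold:
        return "S3"
    return "S4"


# Hypothetical type data for a university platform.
university_types = {"education", "admissions"}
print(route({"education": 0.7, "medical": 0.3}, university_types))
print(route({"education": 0.1, "medical": 0.9}, university_types))
```

A request whose keywords are mostly classified as medical is routed to S4 on the university platform, which mirrors the hospital-registration example given for this step.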
The corpus classification model is obtained in advance by corpus classification training on a large amount of data crawled from the Internet. The model can identify whether the keywords in the question and answer request are related to the content in the media platform, and thereby determine whether the user's request falls within the field covered by the platform's content.
Furthermore, the corpus classification model classifies the corpus using a naive Bayes classification algorithm, a statistical classification method that predicts class membership probabilities, such as the probability that a given tuple belongs to a particular class. The naive Bayes classification algorithm is based on Bayes' theorem and assumes that, given the class, the value of one attribute is independent of the values of the other attributes; this assumption is called class conditional independence.
The process of training the corpus classification model is as follows:
Keywords crawled from the Internet are input into a classification model. For each keyword, the feature attributes with respect to each category are computed, and for each feature attribute the conditional probability of every partition is computed. The category of the keyword is predicted from these conditional probabilities, the prediction is compared with the true category, and the parameter values of the classification model are adjusted until training is complete, yielding the corpus classification model. It should be noted that the categories are categories related to the content in the media platform, and whether a keyword is related to the content in the media platform is determined according to its category.
The process of identifying whether the keywords in the question-answering request are related to the content in the media platform or not through the corpus classification model is as follows:
The keywords in the question and answer request are obtained; the feature attributes of the keywords with respect to each category are computed, and for each feature attribute the conditional probability of every partition is computed. The category of the keywords is predicted from these conditional probabilities. Because the categories are categories related to the content in the media platform, whether the keywords are related to the content in the media platform is judged according to the predicted category.
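The training and identification processes described above can be sketched with a small multinomial naive Bayes classifier. This is an illustrative sketch only: the class name `NaiveBayesKeywordClassifier`, the Laplace smoothing parameter `alpha`, and the toy training data are assumptions, and the model here is fitted by counting rather than by the iterative parameter adjustment the description mentions.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesKeywordClassifier:
    """Multinomial naive Bayes over keywords, with Laplace smoothing (an assumption)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter()                # number of samples per category
        self.feature_counts = defaultdict(Counter)   # keyword counts per category
        self.vocabulary = set()

    def fit(self, samples):
        for keywords, label in samples:
            self.class_counts[label] += 1
            for kw in keywords:
                self.feature_counts[label][kw] += 1
                self.vocabulary.add(kw)
        return self

    def log_scores(self, keywords):
        """log P(c) + sum_i log P(keyword_i | c), using class conditional independence."""
        total = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        scores = {}
        for label, n in self.class_counts.items():
            denom = sum(self.feature_counts[label].values()) + self.alpha * vocab_size
            score = math.log(n / total)
            for kw in keywords:
                score += math.log((self.feature_counts[label][kw] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, keywords):
        scores = self.log_scores(keywords)
        return max(scores, key=scores.get)


# Hypothetical crawled training data: (keywords, category).
training = [
    (["registration", "outpatient", "doctor"], "medical"),
    (["ward", "doctor", "appointment"], "medical"),
    (["admissions", "tuition", "course"], "education"),
    (["course", "exam", "campus"], "education"),
]
clf = NaiveBayesKeywordClassifier().fit(training)
print(clf.predict(["doctor", "registration"]))
print(clf.predict(["exam", "tuition"]))
```

Working in log space avoids numeric underflow when many keyword probabilities are multiplied together.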
By way of example and not limitation, after the keywords in the question and answer request are obtained, the keywords may be filtered against a preset filter word list, where the filter words are one or more of the following: profanity, sensitive words, stop words, etc.
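A minimal sketch of this filtering step follows. The word list `FILTER_WORDS` and the function `extract_keywords` are hypothetical, and the whitespace-style tokenization is an assumption that suits English text; Chinese requests would need a word segmenter instead.

```python
import re

# Hypothetical preset filter words (stop words here; profanity and sensitive
# words would be added to the same set).
FILTER_WORDS = {"the", "of", "a", "is", "what", "for", "please"}


def extract_keywords(request, filter_words=FILTER_WORDS):
    """Split the request into lowercase word tokens and drop any filter word."""
    tokens = re.findall(r"\w+", request.lower())
    return [t for t in tokens if t not in filter_words]


print(extract_keywords("What is the registration process for outpatient?"))
```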
S3, identifying the keywords, searching an information base associated with the media platform according to the identification result, and outputting a search result;
The information base adopts an inverted index file. The inverted index arises from the practical need to look up records by the values of their attributes. Each entry in the index table of the inverted index file includes an attribute value and the addresses of the records having that attribute value. Because the position of a record is determined by the attribute value, rather than the attribute value being determined by the record, the structure is called an inverted index. A file with an inverted index is called an inverted index file, or inverted file for short. A posting list records which documents contain a given word. Generally, many documents in a document collection contain the word; for each such document, information such as the document number (DocID), the number of times the word appears in the document (term frequency, TF) and the positions where the word appears is recorded. The information for one document is called a posting, and the series of postings for a word forms a list structure, which is the posting list of that word. All words that appear in the document collection, together with their posting lists, constitute the inverted index.
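The posting-list structure described above can be sketched as follows. This is a simplified in-memory illustration; the function name `build_inverted_index`, the tuple layout `(doc_id, term_frequency, positions)` and the sample documents are assumptions made for the example.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each word to its posting list: [(doc_id, term_frequency, positions), ...]."""
    positions = defaultdict(lambda: defaultdict(list))
    for doc_id, text in documents.items():
        for pos, word in enumerate(text.lower().split()):
            positions[word][doc_id].append(pos)
    return {
        word: [(doc_id, len(pos_list), pos_list)
               for doc_id, pos_list in sorted(docs.items())]
        for word, docs in positions.items()
    }


# Hypothetical document collection keyed by DocID.
documents = {
    1: "outpatient registration process",
    2: "registration fee and registration hours",
}
index = build_inverted_index(documents)
print(index["registration"])  # [(1, 1, [1]), (2, 2, [0, 3])]
```

Each posting carries the DocID, the term frequency and the word positions, so a lookup by word returns every document containing it without scanning the collection.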
As a preferred embodiment, the step S3 of identifying the keyword, and searching the information base associated with the media platform according to the identification result may include:
Identifying the keywords, matching the identification result with the attribute values in the inverted index file, acquiring, according to the correlation between the identification result and the attribute values, the record address corresponding to the attribute value with the highest correlation, and outputting the content at the record address as the search result.
It should be noted that the record address may be a web page address or the storage address of a text file (e.g., doc, txt, pdf, etc.). The output search result is accordingly the content at the web page address or the content of the text file.
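The lookup in this embodiment can be sketched as below. The patent does not say how the correlation is measured, so the standard-library `difflib.SequenceMatcher` ratio is used purely as a stand-in; the index table contents, the `hospital.example` URLs and the file path are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical index table: attribute value -> record addresses
# (web page addresses or text-file storage addresses).
INDEX_TABLE = {
    "registration process": [
        "https://hospital.example/guide/registration",
        "/data/docs/registration.txt",
    ],
    "department directory": ["https://hospital.example/departments"],
}


def correlation(a, b):
    """Stand-in correlation measure: character-level similarity ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def lookup(recognition_result, index_table=INDEX_TABLE):
    """Return the best-matching attribute value and its record addresses."""
    best = max(index_table, key=lambda attr: correlation(recognition_result, attr))
    return best, index_table[best]


attribute, addresses = lookup("registration")
print(attribute, addresses)
```

A real system would then fetch and return the content stored at the selected address rather than the address itself.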
As a preferred embodiment, the step S3, recognizing the keyword, searching an information base associated with the media platform according to the recognition result, and outputting the search result, includes:
Recognizing the keywords by adopting a semantic matching model, acquiring search terms related to the keywords, matching the keywords and the search terms with the information base, and outputting data information with the highest matching degree;
The search term is a synonym of the keyword.
In this embodiment, a real-time semantic index is used and the keywords are identified by semantic matching, which improves identification accuracy. Taking "sports" as the keyword of the question and answer request, the semantic matching model can identify related and similar words such as "exercise", "activity" and "training". The information base can then be searched for data that match "sports" and these similar words closely, so the returned data can include information on "exercise", "activity", "training" and the like, which improves the comprehensiveness and accuracy of the search.
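The synonym expansion in the "sports" example can be sketched as follows. A fixed dictionary `SYNONYMS` stands in for the semantic matching model, and scoring entries by counting term occurrences is an assumption made for illustration; neither is specified by the source.

```python
# Hypothetical synonym table standing in for the semantic matching model.
SYNONYMS = {"sports": ["exercise", "activity", "training"]}


def expand(keyword, synonyms=SYNONYMS):
    """Return the keyword followed by its search terms (synonyms)."""
    return [keyword] + synonyms.get(keyword, [])


def search(keyword, information_base):
    """Return the entry that mentions the keyword or its synonyms most often."""
    if not information_base:
        return None
    terms = expand(keyword)
    scored = [(sum(entry.lower().count(t) for t in terms), entry)
              for entry in information_base]
    score, best = max(scored)
    return best if score > 0 else None


information_base = [
    "Campus training schedule and exercise facilities",
    "Library opening hours",
]
print(search("sports", information_base))
```

The first entry is returned even though it never contains the word "sports", which is the recall improvement the embodiment describes.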
In practical application, the semantic matching model identifies the keywords by parsing the keywords or the question and answer request into a vector. The query statements of the information base are likewise parsed into vectors, the two kinds of vectors are matched to calculate their similarity, a semantic index is used to speed up lookup, and the data information corresponding to the most similar vector is output as the search result.
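The vector matching described here can be sketched with cosine similarity. Bag-of-words count vectors are used as a simple stand-in for whatever vectors the semantic matching model produces, which the source does not specify; the function names and sample entries are hypothetical, and the index acceleration is omitted.

```python
import math
from collections import Counter


def to_vector(text):
    """Stand-in vectorization: word counts (a real model would use learned embeddings)."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0


def most_similar(request, entries):
    """Return the information-base entry whose vector is closest to the request."""
    q = to_vector(request)
    return max(entries, key=lambda e: cosine(q, to_vector(e)))


entries = [
    "how to register for outpatient care",
    "visiting hours for inpatient wards",
]
print(most_similar("outpatient register", entries))
```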
S4, outputting an answer message indicating that the question and answer request does not match the content of the media platform.
In practical application, a user can log in to the portal website of the media platform and input a question and answer request (namely a query statement). Whether the request falls within the field covered by the website is judged according to the keywords in the request. If not, a response message can tell the user that the website belongs to a certain field and that the request is unrelated to that field; if so, semantic matching is performed and relevant answers are returned for the user's reference. The user can also log in to the WeChat official account of the media platform and input the question and answer request (namely a query statement) by chatting. Whether the request falls within the field covered by the official account is judged according to the keywords in the request. If not, a response message can tell the user that the official account belongs to a certain field and that the request is unrelated to that field; if so, semantic matching is performed and relevant answers are returned for the user's reference.
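The overall flow of steps S1 to S4 can be sketched as a small dispatcher. The three helper functions are deliberately trivial placeholders, and the vocabulary, the information base and the message text are hypothetical; they only illustrate the order of the steps, not the models the embodiment uses.

```python
NO_MATCH = "No content on this platform matches your question."

# Hypothetical platform data for a university portal.
PLATFORM_VOCAB = {"admissions", "tuition", "course"}
INFO_BASE = {"tuition": "Tuition is published on the finance office page."}


def extract(request):
    """S2 (first half): naive keyword extraction by lowercasing and splitting."""
    return request.lower().strip("?").split()


def is_relevant(keywords):
    """S2 (second half): placeholder relevance test against the platform vocabulary."""
    return any(k in PLATFORM_VOCAB for k in keywords)


def search(keywords):
    """S3: placeholder search that returns the first entry indexed by a keyword."""
    return next((INFO_BASE[k] for k in keywords if k in INFO_BASE), NO_MATCH)


def answer(request):
    """S1 receives the request; S2 judges relevance; S3 searches or S4 declines."""
    keywords = extract(request)
    if not is_relevant(keywords):
        return NO_MATCH
    return search(keywords)


print(answer("tuition fees?"))
print(answer("hospital registration"))
```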
In this embodiment, whether the question and answer request input by the user is related to the content of the media platform is determined by judging whether the keywords in the request are related to that content, and a response message stating that there is no matching content can be fed back directly for an unrelated request. For a related request, corresponding data are further searched in the information base associated with the media platform, so that search results matching the request are obtained accurately. This helps the user find the relevant content more quickly and accurately, and improves the accuracy of the answer content and the user experience.
Example two
As shown in FIG. 2, a question answering device 1 applied to a media platform includes an acquisition unit 11, a judging unit 12 and a searching unit 13, wherein:
The acquisition unit 11 is configured to acquire a question and answer request;
The judging unit 12 is configured to extract keywords from the question and answer request, judge whether the keywords are related to the content in the media platform, and output an answer message indicating that the question and answer request does not match the content of the media platform when the keywords are not related to that content;
In this unit, whether the question and answer request is related to the content of the media platform is determined by identifying whether the keywords are related to that content. For example, when a question and answer request about a hospital registration process is submitted on a university website or WeChat official account, keyword identification shows that the request does not match the content of the university website or official account, and the user can be prompted that the website or official account has no matching answer.
Specifically, the determining unit 12 may determine whether the keyword is related to the content in the media platform by:
Classifying the keywords by using a corpus classification model to obtain category information of the keywords, matching the category information with type data associated with the media platform to obtain a matching degree between the category information and the type data, and judging whether the matching degree is greater than a preset threshold; if so, searching the information base associated with the media platform for corresponding data according to the keywords and outputting a search result; if not, outputting an answer message indicating that the question and answer request does not match the content of the media platform.
The corpus classification model is obtained in advance by corpus classification training on a large amount of data crawled from the Internet. The model can identify whether the keywords in the question and answer request are related to the content in the media platform, and thereby determine whether the user's request falls within the field covered by the platform's content.
Furthermore, the corpus classification model classifies the corpus using a naive Bayes classification algorithm, a statistical classification method that predicts class membership probabilities, such as the probability that a given tuple belongs to a particular class. The naive Bayes classification algorithm is based on Bayes' theorem and assumes that, given the class, the value of one attribute is independent of the values of the other attributes; this assumption is called class conditional independence.
The process of training the corpus classification model is as follows:
Keywords crawled from the Internet are input into a classification model. For each keyword, the feature attributes with respect to each category are computed, and for each feature attribute the conditional probability of every partition is computed. The category of the keyword is predicted from these conditional probabilities, the prediction is compared with the true category, and the parameter values of the classification model are adjusted until training is complete, yielding the corpus classification model. It should be noted that the categories are categories related to the content in the media platform, and whether a keyword is related to the content in the media platform is determined according to its category.
The process of identifying whether the keywords in the question-answering request are related to the content in the media platform or not through the corpus classification model is as follows:
The keywords in the question and answer request are obtained; the feature attributes of the keywords with respect to each category are computed, and for each feature attribute the conditional probability of every partition is computed. The category of the keywords is predicted from these conditional probabilities. Because the categories are categories related to the content in the media platform, whether the keywords are related to the content in the media platform is judged according to the predicted category.
By way of example and not limitation, after the keywords in the question and answer request are obtained, the keywords may be filtered against a preset filter word list, where the filter words are one or more of the following: profanity, sensitive words, stop words, etc.
In practical application, a user can log in to the portal website of the media platform and input a question and answer request (namely a query statement). Whether the request falls within the field covered by the website is judged according to the keywords in the request. If not, a response message can tell the user that the website belongs to a certain field and that the request is unrelated to that field; if so, semantic matching is performed and relevant answers are returned for the user's reference. The user can also log in to the WeChat official account of the media platform and input the question and answer request (namely a query statement) by chatting. Whether the request falls within the field covered by the official account is judged according to the keywords in the request. If not, a response message can tell the user that the official account belongs to a certain field and that the request is unrelated to that field; if so, semantic matching is performed and relevant answers are returned for the user's reference.
And the searching unit 13 is configured to, when the keyword is related to the content in the media platform, identify the keyword, search an information base associated with the media platform according to an identification result, and output the search result.
The information base adopts an inverted index file. The inverted index arises from the practical need to look up records by the values of their attributes. Each entry in the index table of the inverted index file includes an attribute value and the addresses of the records having that attribute value. Because the position of a record is determined by the attribute value, rather than the attribute value being determined by the record, the structure is called an inverted index. A file with an inverted index is called an inverted index file, or inverted file for short. A posting list records which documents contain a given word. Generally, many documents in a document collection contain the word; for each such document, information such as the document number (DocID), the number of times the word appears in the document (term frequency, TF) and the positions where the word appears is recorded. The information for one document is called a posting, and the series of postings for a word forms a list structure, which is the posting list of that word. All words that appear in the document collection, together with their posting lists, constitute the inverted index.
As a preferred embodiment, the identifying of the keyword by the searching unit 13 and the searching of the information base associated with the media platform according to the identification result may include:
identifying the keywords, matching the identification result with the attribute values in the inverted index file, acquiring the recording address corresponding to the attribute value with the highest correlation degree according to the correlation degree between the identification result and the attribute values, and outputting the content at the recording address as the search result.
It should be noted that the recording address may be a web page address, or may be a storage address of a text file (e.g., doc, txt, pdf, etc.). The search result that is output is accordingly the content at the web page address or the content of the text file.
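By way of illustration only, the lookup of the recording address with the highest correlation degree can be sketched as follows. The embodiment does not specify how the correlation degree is computed; the character-level ratio from Python's standard `difflib.SequenceMatcher` is used here purely as a stand-in, and the index table contents are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical index table: attribute value -> recording address
# (a web page address or the storage address of a text file).
INDEX_TABLE = {
    "sports": "https://example.com/sports.html",
    "finance": "/data/finance.txt",
    "health insurance": "/data/insurance.pdf",
}


def correlation(a, b):
    """Stand-in correlation degree: character-level similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def lookup_record_address(recognized, index_table):
    """Return the attribute value most correlated with the recognition
    result, together with its recording address."""
    best_value = max(index_table, key=lambda v: correlation(recognized, v))
    return best_value, index_table[best_value]


value, address = lookup_record_address("sport", INDEX_TABLE)
print(value, address)
```

In a deployed system the content stored at the returned address would then be read and output as the search result.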
As a preferred embodiment, the searching unit 13 identifies the keyword, searches an information base associated with the media platform according to the identification result, and outputs the search result, including:
recognizing the keywords by adopting a semantic matching model, acquiring search terms related to the keywords, matching the keywords and the search terms with the information base, and outputting data information with the highest matching degree;
wherein the search term is a synonym of the keyword.
In this embodiment, a real-time semantic index is adopted, and keywords are matched and identified by semantic matching, which improves identification accuracy. Taking a question and answer request whose keyword is 'sports' as an example, identifying the keyword with the semantic matching model also yields related, close words such as 'exercise', 'activity', and 'training'. Data information that closely matches 'sports' and its close words can then be searched for in the information base, so the feedback data obtained can include information on 'exercise', 'activity', 'training', and the like, improving the comprehensiveness and accuracy of the search.
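By way of illustration only, the expansion of a keyword into synonymous search terms can be sketched as follows. A fixed synonym table stands in for the semantic matching model, and the matching degree is taken to be the number of query terms found in an entry; both are assumptions of this sketch, as are the names and sample data.

```python
# Hypothetical synonym table standing in for the semantic matching model.
SYNONYMS = {"sports": ["exercise", "activity", "training"]}

# Hypothetical information base: entry id -> text.
INFORMATION_BASE = {
    "a": "weekly training and exercise plan",
    "b": "sports results",
    "c": "stock market report",
}


def expand_keyword(keyword, synonyms=SYNONYMS):
    """Return the keyword followed by its search terms (synonyms)."""
    return [keyword] + synonyms.get(keyword, [])


def search(keyword, information_base):
    """Return the id of the entry with the highest matching degree,
    or None when no entry contains any of the terms."""
    terms = expand_keyword(keyword)
    scores = {}
    for doc_id, text in information_base.items():
        words = set(text.lower().split())
        scores[doc_id] = sum(1 for term in terms if term in words)
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(search("sports", INFORMATION_BASE))
```

Without the expansion step, only entry "b" would match the keyword 'sports'; with it, entries that mention 'training' or 'exercise' also become candidates.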
In practical application, the semantic matching model identifies the keywords by parsing the keywords or the question and answer request into a vector. The query statements in the information base are likewise parsed into vectors, and the two kinds of vectors are matched to calculate their similarity. The semantic index is used to accelerate retrieval, and the data information corresponding to the most similar vector is output as the search result.
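By way of illustration only, the vector matching step can be sketched as follows. The embodiment does not specify the vector representation or the similarity measure; word-count vectors and cosine similarity are assumed here, and the function names and sample statements are hypothetical.

```python
import math
from collections import Counter


def to_vector(text):
    """Parse a statement into a sparse word-count vector (an assumption;
    a trained semantic model would produce dense embeddings instead)."""
    return Counter(text.lower().split())


def cosine_similarity(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(request, stored_statements):
    """Return the stored query statement closest to the request."""
    query = to_vector(request)
    return max(stored_statements,
               key=lambda s: cosine_similarity(query, to_vector(s)))


stored = ["sports training for beginners", "opening a bank account"]
print(most_similar("how to start sports training", stored))
```

This sketch compares the request against every stored statement in turn; the semantic index mentioned above would serve to avoid such an exhaustive scan.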
In this embodiment, whether the question and answer request input by the user is related to the content of the media platform is identified by judging whether the keywords in the question and answer request are related to that content. For an unrelated question and answer request, a response message indicating that there is no matching content can be fed back directly. For a related question and answer request, corresponding data are further searched for in the information base associated with the media platform, so that search results matching the question and answer request are accurately obtained. This helps the user find the relevant content more quickly and accurately, and improves both the accuracy of the answers and the user experience.
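By way of illustration only, the overall flow (obtain the request, extract keywords, judge relevance against a preset threshold, then either search or return a no-match message) can be sketched as follows. The stop-word list, the keyword-to-category table standing in for the corpus classification model, the threshold value of 0.5, and all sample data are assumptions of this sketch.

```python
STOP_WORDS = {"the", "a", "an", "is", "what", "how", "to", "of", "for", "about"}

# Hypothetical stand-in for the corpus classification model.
KEYWORD_CATEGORIES = {
    "premium": "insurance",
    "policy": "insurance",
    "claim": "insurance",
    "football": "sports",
}
# Hypothetical type data associated with the media platform.
PLATFORM_CATEGORIES = {"insurance"}

# Hypothetical information base: keyword -> answer content.
INFORMATION_BASE = {
    "premium": "The premium is the amount paid for an insurance policy.",
    "claim": "A claim is a request for payment under a policy.",
}
NO_MATCH_MESSAGE = "The request does not match the content of this platform."


def extract_keywords(request):
    """Keep the words of the request that are not stop words."""
    words = request.lower().replace("?", " ").split()
    return [w for w in words if w not in STOP_WORDS]


def matching_degree(keywords):
    """Fraction of keywords whose category belongs to the platform."""
    if not keywords:
        return 0.0
    hits = sum(1 for k in keywords
               if KEYWORD_CATEGORIES.get(k) in PLATFORM_CATEGORIES)
    return hits / len(keywords)


def answer(request, threshold=0.5):
    keywords = extract_keywords(request)
    # Unrelated request: feed back the no-match message directly.
    if not matching_degree(keywords) > threshold:
        return NO_MATCH_MESSAGE
    # Related request: search the information base.
    for keyword in keywords:
        if keyword in INFORMATION_BASE:
            return INFORMATION_BASE[keyword]
    return NO_MATCH_MESSAGE


print(answer("What is the premium of a policy?"))
print(answer("football scores"))
```

The early return for unrelated requests is what spares the information base a search that cannot yield a useful answer.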
EXAMPLE III
In order to achieve the above object, the present invention further provides a computer device 2. There may be a plurality of computer devices 2, and the components of the question-answering device 1 according to the second embodiment may be distributed among different computer devices 2. The computer device 2 may be a smartphone, a tablet computer, a notebook computer, a desktop computer, a rack server, a blade server, a tower server, or a cabinet server (including an independent server or a server cluster formed by a plurality of servers) that executes a program. The computer device 2 of the present embodiment includes at least, but is not limited to: a memory 21, a processor 23, a network interface 22, and the question-answering device 1 (refer to fig. 3), which are communicably connected to each other through a system bus. It is noted that fig. 3 only shows the computer device 2 with certain components, but it is to be understood that not all of the shown components are required to be implemented, and that more or fewer components may be implemented instead.
In this embodiment, the memory 21 includes at least one type of computer-readable storage medium, which includes a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, and the like. In some embodiments, the memory 21 may be an internal storage unit of the computer device 2, such as a hard disk or a memory of the computer device 2. In other embodiments, the memory 21 may also be an external storage device of the computer device 2, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), or the like provided on the computer device 2. Of course, the memory 21 may also comprise both an internal storage unit of the computer device 2 and an external storage device thereof. In this embodiment, the memory 21 is generally used to store the operating system installed in the computer device 2 and various application software, such as the program code of the question-answering system optimization method in the first embodiment. Further, the memory 21 may also be used to temporarily store various types of data that have been output or are to be output.
In some embodiments, the processor 23 may be a Central Processing Unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 23 is typically used for controlling the overall operation of the computer device 2, such as performing control and processing related to data interaction or communication with the computer device 2. In this embodiment, the processor 23 is configured to run the program code stored in the memory 21 or to process data, for example, to run the question-answering device 1.
The network interface 22 may comprise a wireless network interface or a wired network interface, and the network interface 22 is typically used to establish a communication connection between the computer device 2 and other computer devices 2. For example, the network interface 22 is used to connect the computer device 2 to an external terminal through a network, and to establish a data transmission channel and a communication connection between the computer device 2 and the external terminal. The network may be a wireless or wired network such as an intranet, the Internet, a Global System for Mobile Communications (GSM) network, a Wideband Code Division Multiple Access (WCDMA) network, a 4G network, a 5G network, Bluetooth, Wi-Fi, and the like.
It is noted that fig. 3 only shows the computer device 2 with components 21-23, but it is to be understood that not all shown components are required to be implemented, and that more or fewer components may be implemented instead.
In this embodiment, the question answering device 1 stored in the memory 21 can also be divided into one or more program modules, and the one or more program modules are stored in the memory 21 and executed by one or more processors (in this embodiment, the processor 23) to complete the present invention.
EXAMPLE IV
To achieve the above objects, the present invention also provides a computer-readable storage medium including a plurality of storage media such as a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, a server, an App application store, etc., on which a computer program is stored, which when executed by the processor 23, implements corresponding functions. The computer-readable storage medium of the present embodiment is used to store the question-answering apparatus 1, and when executed by the processor 23, the method for optimizing the question-answering system of the first embodiment is implemented.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and can certainly also be implemented by hardware, but in many cases the former is the better implementation.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A question-answering system optimization method is applied to a media platform and comprises the following steps:
S1, obtaining a question and answer request;
S2, extracting keywords in the question and answer request, judging whether the keywords are related to the content in the media platform, and if yes, executing step S3; if not, executing step S4;
S3, identifying the keywords, searching an information base associated with the media platform according to an identification result, and outputting a search result;
S4, outputting an answer message indicating that the question and answer request does not match the content of the media platform.
2. The question-answering system optimization method according to claim 1, wherein the media platform comprises a web portal platform and/or a WeChat official account platform.
3. The question-answering system optimization method according to claim 1, wherein the determining whether the keyword is related to the content in the media platform in the step S2 includes:
Classifying the keywords by adopting a corpus classification model, acquiring category information of the keywords, matching the category information with type data associated with the media platform, acquiring matching degree of the category information and the type data, judging whether the matching degree is greater than a preset threshold value or not, and if so, executing a step S3; if not, go to step S4.
4. The question-answering system optimization method according to claim 3, wherein the corpus classification model is a model obtained by performing corpus classification training on data crawled from the internet.
5. The question-answering system optimization method according to claim 1, wherein the information base employs an inverted index file, and each entry in an index table of the inverted index file includes an attribute value and an address of each record having the attribute value.
6. The question-answering system optimization method according to claim 5, wherein the step S3 of recognizing the keyword, searching an information base associated with the media platform according to the recognition result, and outputting the search result comprises:
Identifying the keywords, matching the identification result with the attribute values in the inverted index file, acquiring a recording address corresponding to the attribute value with the highest correlation degree according to the correlation degree of the identification result and the attribute values, and outputting the content in the recording address as the search result;
wherein the recording address is a web page address or a storage address of a text file.
7. The question-answering system optimization method according to claim 1, wherein the step S3 of recognizing the keyword, searching an information base associated with the media platform according to the recognition result, and outputting the search result includes:
Recognizing the keywords by adopting a semantic matching model, acquiring search terms related to the keywords, matching the keywords and the search terms with the information base, and outputting data information with the highest matching degree;
wherein the search term is a synonym of the keyword.
8. A question-answering device, applied to a media platform, comprising:
an acquisition unit, configured to acquire a question and answer request;
a judging unit, configured to extract a keyword in the question and answer request, judge whether the keyword is related to content in the media platform, and output an answer message indicating that the question and answer request does not match the content in the media platform when the keyword is not related to the content in the media platform;
a searching unit, configured to identify the keyword when the keyword is related to the content in the media platform, search an information base associated with the media platform according to an identification result, and output a search result.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method of any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium having stored thereon a computer program, characterized in that: the computer program when executed by a processor implements the steps of the method of any one of claims 1 to 7.
CN201910699484.6A 2019-07-31 2019-07-31 question-answering system optimization method and device, computer equipment and storage medium Pending CN110569419A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910699484.6A CN110569419A (en) 2019-07-31 2019-07-31 question-answering system optimization method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN110569419A true CN110569419A (en) 2019-12-13

Family

ID=68773375

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910699484.6A Pending CN110569419A (en) 2019-07-31 2019-07-31 question-answering system optimization method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110569419A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination