Disclosure of Invention
The specification provides a method for extracting topic keywords, which comprises the following steps:
reading target question-answer items from a question-answer type knowledge base; the target question-answer item comprises question data and answer data;
extracting keywords from the question data and the answer data respectively;
determining whether the same target keyword exists in the keywords extracted from the question data and the keywords extracted from the answer data;
if the same target keyword exists, determining the target keyword as a topic keyword of the target question-answer item.
Optionally, the method further comprises:
adding a classification label to the target question-answer item based on the topic keywords of the target question-answer item.
Optionally, adding a classification label to the target question-answer item based on the topic keywords of the target question-answer item comprises:
if the target question-answer item has a plurality of topic keywords, determining the target topic keyword that occurs most often in the question data and the answer data, and storing the target topic keyword in the question-answer knowledge base as the classification label of the target question-answer item;
and if the target question-answer item has only one topic keyword, storing that topic keyword in the question-answer knowledge base as the classification label of the target question-answer item.
Optionally, the method further comprises:
adding the topic keywords to a search keyword set of a search engine that interfaces with the question-answer knowledge base.
Optionally, a keyword extraction algorithm used for extracting keywords from the question data and the answer data is a TextRank algorithm or a TF-IDF algorithm.
The specification also provides a device for extracting the topic keywords, which comprises:
the reading module is used for reading target question-answer items from the question-answer knowledge base; the target question-answer item comprises question data and answer data;
the extraction module is used for extracting keywords from the question data and the answer data respectively;
a first determining module, configured to determine whether the same target keyword exists in the keywords extracted from the question data and the keywords extracted from the answer data;
and the second determining module is used for determining the target keyword as a topic keyword of the target question-answer item when the same target keyword exists.
Optionally, the apparatus further comprises:
the first adding module is used for adding a classification label to the target question-answer item based on the topic keywords of the target question-answer item.
Optionally, the first adding module is specifically configured to:
if the target question-answer item has a plurality of topic keywords, determining the target topic keyword that occurs most often in the question data and the answer data, and storing the target topic keyword in the question-answer knowledge base as the classification label of the target question-answer item;
and if the target question-answer item has only one topic keyword, storing that topic keyword in the question-answer knowledge base as the classification label of the target question-answer item.
Optionally, the apparatus further comprises:
the second adding module is used for adding the topic keywords to a search keyword set of a search engine that interfaces with the question-answer knowledge base.
Optionally, a keyword extraction algorithm used for extracting keywords from the question data and the answer data is a TextRank algorithm or a TF-IDF algorithm.
The present specification also proposes an electronic device including:
a processor;
a memory for storing machine-executable instructions;
wherein, by reading and executing the machine-executable instructions stored in the memory corresponding to the control logic for keyword extraction, the processor is caused to:
reading target question-answer items from a question-answer type knowledge base; the target question-answer item comprises question data and answer data;
extracting keywords from the question data and the answer data respectively;
determining whether the same target keyword exists in the keywords extracted from the question data and the keywords extracted from the answer data;
if the same target keyword exists, determining the target keyword as a topic keyword of the target question-answer item.
In the above technical solution, for a question-answer knowledge base, keywords may be extracted separately from the question data and the answer data contained in a question-answer item, and the keywords that are extracted from both the question data and the answer data may be determined as the topic keywords of that question-answer item. On the one hand, the topic keywords of each question-answer item can be used to classify the question-answer items in the knowledge base, so that the knowledge base can be searched conveniently and quickly by topic keyword. On the other hand, because the topic keywords are keywords that appear both among those extracted from the question data and among those extracted from the answer data, they reflect the main content of the question-answer item more accurately, which can improve retrieval accuracy for the question-answer knowledge base.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with one or more embodiments of the present specification. Rather, they are merely examples of apparatus and methods consistent with aspects of one or more embodiments of the present description as detailed in the accompanying claims.
The terminology used in the description presented herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the description. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in this specification to describe various information, the information should not be limited by these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of this specification, first information may also be referred to as second information, and similarly, second information may also be referred to as first information. Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to determining".
This specification aims to provide a technical solution that, for a question-answer knowledge base, determines the keywords shared by the question data and the answer data of a question-answer item as the topic keywords of that item.
In particular implementations, the question-answer entries in the question-answer knowledge base may be traversed to read a question-answer entry from the question-answer knowledge base that is not classified. Further, keywords may be extracted from question data contained in the question-answer entry and keywords may be extracted from answer data contained in the question-answer entry.
Subsequently, it is possible to determine whether the same target keyword exists in the keywords extracted from the question data and the keywords extracted from the answer data by comparing the keywords extracted from the question data with the keywords extracted from the answer data.
If the same target keyword exists, the target keyword may be determined as a topic keyword of the question-answer item.
In this way, the question-answer entries in the question-answer knowledge base can be further classified according to the topic keywords of each question-answer entry.
The present specification is described below by way of specific examples.
Referring to fig. 1, fig. 1 is a schematic diagram of a keyword extraction system according to an exemplary embodiment of the present disclosure.
As shown in fig. 1, the keyword extraction system may include a question-answer knowledge base and an electronic device that interfaces with the question-answer knowledge base. The electronic device may extract keywords from the question-answer items stored in the question-answer knowledge base. The electronic device may be a server, a computer, a mobile phone, a tablet device, a notebook computer, a personal digital assistant (PDA), or the like, which is not limited in this specification.
In practical applications, the question-answer knowledge base may be a knowledge base for storing question-answer data. The question-answer data may be stored in the question-answer knowledge base in the form of question-answer items, and one question-answer item may include a question and an answer to that question. For example, the question-answer data stored in the question-answer knowledge base may be as shown in table 1 below:
TABLE 1
Answer 1 may be the answer to question 1, and question 1 and answer 1 form question-answer item 1; answer 2 may be the answer to question 2, and question 2 and answer 2 form question-answer item 2; and so on.
Referring to fig. 2, fig. 2 is a flowchart illustrating a method of extracting a topic keyword according to an exemplary embodiment of the present specification. The method can be applied to the electronic device shown in fig. 1, and comprises the following steps:
step 202, reading target question-answer items from a question-answer type knowledge base; the target question-answer item comprises question data and answer data;
step 204, extracting keywords from the question data and the answer data respectively;
step 206, determining whether the same target keyword exists in the keywords extracted from the question data and the keywords extracted from the answer data;
and step 208, if the same target keyword exists, determining the target keyword as a topic keyword of the target question-answer item.
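Steps 202 to 208 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of this specification: the `QAEntry` class, the function names, and the stand-in `extract_keywords` function (which simply returns the most frequent words of three or more letters) are hypothetical, and any keyword extraction algorithm could be passed in its place.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class QAEntry:
    """Hypothetical question-answer item: question data plus answer data."""
    question: str
    answer: str


def extract_keywords(text, top_k=5):
    """Stand-in extractor (an assumption): the top_k most frequent words of 3+ letters."""
    words = re.findall(r"[a-z]{3,}", text.lower())
    return [word for word, _ in Counter(words).most_common(top_k)]


def topic_keywords(entry, extract=extract_keywords):
    """Return the keywords extracted from both the question data and the answer data."""
    question_keywords = extract(entry.question)    # step 204, question data
    answer_keywords = set(extract(entry.answer))   # step 204, answer data
    # Steps 206 and 208: keywords present in both results are the topic keywords.
    return [kw for kw in question_keywords if kw in answer_keywords]


# Step 202 would read this entry from the knowledge base; here it is written inline.
entry = QAEntry(
    question="How do I reset my password?",
    answer="To reset your password, open settings and choose reset password.",
)
print(topic_keywords(entry))
```

An empty result means that, under this comparison, the entry has no topic keyword.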
In this embodiment, the electronic device may first read a question-answer item (referred to as a target question-answer item) from the question-answer knowledge base with which it interfaces. The target question-answer item may include question data and answer data.
Taking the question-answer knowledge base shown in table 1 as an example, the electronic device interfacing with the question-answer knowledge base may read, from the question-answer knowledge base, a question-answer item 1 including a question 1 (i.e., question data) and an answer 1 (i.e., answer data) as a target question-answer item, may read, from the question-answer knowledge base, a question-answer item 2 including a question 2 (i.e., question data) and an answer 2 (i.e., answer data) as a target question-answer item, and so on.
After the target question-answer item is read, keywords may be further extracted from question data included in the target question-answer item, and keywords may be extracted from answer data included in the target question-answer item.
In one illustrated embodiment, keywords may be extracted from the question data based on a preset keyword extraction algorithm. The keyword extraction algorithm may be preset by a technician and may specifically be a conventional keyword extraction algorithm such as the TextRank algorithm or the TF-IDF (Term Frequency-Inverse Document Frequency, a weighting technique commonly used in information retrieval and data mining) algorithm, which is not described in detail herein.
Likewise, keywords may be extracted from the answer data based on a preset keyword extraction algorithm.
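As one way the TF-IDF option could look, the sketch below ranks the words of a single text against a small corpus. The function names, the regular-expression tokenizer, and the smoothed IDF formula are assumptions made for illustration; the specification does not prescribe a particular TF-IDF variant. The same function can be applied to the question data and to the answer data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (a simplification)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_keywords(doc, corpus, top_k=3):
    """Rank the words of `doc` by TF-IDF against `corpus`, a list of texts."""
    tokens = tokenize(doc)
    term_counts = Counter(tokens)
    corpus_token_sets = [set(tokenize(text)) for text in corpus]
    n_docs = len(corpus_token_sets)
    scores = {}
    for word, count in term_counts.items():
        doc_freq = sum(1 for token_set in corpus_token_sets if word in token_set)
        # Smoothed IDF (an assumed variant) avoids division by zero.
        idf = math.log((1 + n_docs) / (1 + doc_freq)) + 1
        scores[word] = (count / len(tokens)) * idf
    # Highest score first; ties are broken alphabetically so the result is deterministic.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]


corpus = [
    "how do i reset my password",
    "how do i change my email",
    "how do i close my account",
]
print(tfidf_keywords(corpus[0], corpus, top_k=2))  # ['password', 'reset']
```

Words that occur in every text of the corpus ("how", "do", "i", "my") receive the lowest IDF, so the words specific to the first text rise to the top.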
In order to ensure consistency of keyword extraction, the keyword extraction algorithm used for question data may be the same as the keyword extraction algorithm used for answer data. However, in practical applications, the keyword extraction algorithm used for question data may be different from the keyword extraction algorithm used for answer data, and this specification is not limited thereto.
After the keywords are extracted from the question data and the answer data, respectively, the keywords extracted from the question data may be further compared with the keywords extracted from the answer data to determine whether the same keywords (referred to as target keywords) exist in the keywords extracted from the question data and the keywords extracted from the answer data.
If it is determined that the same target keyword exists, the target keyword may be determined as a topic keyword of the target question-answer item. A topic keyword is a keyword that can reflect the main content of the target question-answer item.
For example, assume that the keywords extracted from the question data contained in a certain question-answer item include keyword 1, keyword 2, and keyword 3, and the keywords extracted from the answer data contained in the question-answer item include keyword 2, keyword 3, and keyword 4. In this case, comparing the keywords extracted from the question data with the keywords extracted from the answer data shows that keyword 2 and keyword 3 appear in both, that is, both keyword 2 and keyword 3 are target keywords. Subsequently, keyword 2 and keyword 3 may be determined as the topic keywords of the question-answer item.
In one embodiment, after the topic keywords of the target question-answer item are determined, it may be further determined whether the target question-answer item has a plurality of topic keywords or has one and only one topic keyword.
If the target question-answer item has a unique topic keyword, that is, the target question-answer item has only one topic keyword, the topic keyword can be directly stored as a label of the target question-answer item in the question-answer type knowledge base, that is, the classification label is directly added to the target question-answer item in the question-answer type knowledge base by using the topic keyword.
If a plurality of topic keywords exist in the target question-answer item, the number of times that each topic keyword appears in the question data and the answer data contained in the target question-answer item can be counted respectively to determine the topic keyword (called the target topic keyword) with the largest number of times in the question data and the answer data. Subsequently, the target topic keyword can be stored as a label of the target question-answer item in the question-answer type knowledge base, namely, a classification label is added to the target question-answer item in the question-answer type knowledge base by utilizing the target topic keyword.
For example, assume that the determined topic keywords of a question-answer item include keyword 1 and keyword 2. The numbers of occurrences of keyword 1 and of keyword 2 in the question data and the answer data contained in the question-answer item may be counted respectively. If keyword 1 occurs fewer times in the question data and the answer data than keyword 2 does, keyword 2 may be used as the target topic keyword of the question-answer item, and keyword 2 may be stored as the label of the question-answer item in the question-answer knowledge base where the question-answer item is located.
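The labelling rule described above (a single topic keyword is used directly; among several, the most frequent one is chosen) can be sketched as follows. The function names are hypothetical, occurrences are counted by plain case-insensitive substring matching, and ties are resolved in favour of the keyword listed first. The specification does not fix either detail, so both are assumptions.

```python
def count_occurrences(keyword, text):
    """Count non-overlapping, case-insensitive occurrences of `keyword` in `text`."""
    return text.lower().count(keyword.lower())


def choose_label(topic_keywords, question, answer):
    """Return the classification label for an entry, or None if it has no topic keyword."""
    if not topic_keywords:
        return None
    if len(topic_keywords) == 1:
        # Only one topic keyword: it becomes the label directly.
        return topic_keywords[0]
    # Several topic keywords: pick the one occurring most often in question plus answer.
    # max() keeps the first keyword on a tie (an assumed tie-break).
    return max(
        topic_keywords,
        key=lambda kw: count_occurrences(kw, question) + count_occurrences(kw, answer),
    )


question = "How do I reset my password?"
answer = "Open the password page, enter a new password, and confirm the reset."
print(choose_label(["reset", "password"], question, answer))  # password
```

Here "reset" occurs twice in total and "password" three times, so "password" would be stored as the label.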
Alternatively, a plurality of topic keywords for the target question-answer item may be output to the user via the user interface. The user may select a topic keyword (referred to as a target topic keyword) from among the topic keywords through the user interface. Subsequently, the target topic keyword can be stored as a label of the target question-answer item in the question-answer type knowledge base, namely, a classification label is added to the target question-answer item in the question-answer type knowledge base by utilizing the target topic keyword.
Referring to fig. 3, fig. 3 is a schematic diagram of a user interface according to an exemplary embodiment of the present disclosure.
As shown in fig. 3, the user interface may be a user interface provided by a customer service system that provides online customer service to users. The customer service system may interface with the question-answer knowledge base.
The user may enter a keyword for the information to be retrieved in a text entry box provided by the user interface. After entering the keyword, the user may click the "send" button in the user interface. When the customer service system detects the user's click on the "send" button, it may obtain the keyword currently entered by the user and then search the question-answer knowledge base with which it interfaces for the question-answer items hit by the keyword, that is, the question-answer items whose labels include the keyword. Subsequently, the customer service system may display the found question-answer items to the user for viewing.
The question-answer knowledge base shown in table 2 below is taken as an example:
TABLE 2
Assuming that the user inputs a keyword 1 in a user interface provided by a customer service system interfacing with the question-answering type knowledge base, the customer service system can display the question-answering item 1 and the question-answering item 2 to the user for the user to view because the labels of the question-answering item 1 and the question-answering item 2 both comprise the keyword 1.
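The label-based lookup in this example can be sketched as follows. Because the contents of table 2 are not reproduced here, the rows below are made up for illustration, apart from the stated fact that the labels of question-answer item 1 and question-answer item 2 both include keyword 1. The field names and the `search_by_label` function are likewise hypothetical.

```python
# Illustrative rows only; they are not the contents of table 2.
knowledge_base = [
    {"id": 1, "question": "question 1", "answer": "answer 1", "labels": ["keyword 1"]},
    {"id": 2, "question": "question 2", "answer": "answer 2", "labels": ["keyword 1", "keyword 2"]},
    {"id": 3, "question": "question 3", "answer": "answer 3", "labels": ["keyword 3"]},
]


def search_by_label(entries, keyword):
    """Return the entries hit by `keyword`, i.e. those whose labels include it."""
    return [entry for entry in entries if keyword in entry["labels"]]


hits = search_by_label(knowledge_base, "keyword 1")
print([entry["id"] for entry in hits])  # [1, 2]
```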
In one illustrated embodiment, after the topic keywords of the target question-answer item are determined, the topic keywords may be further added to the search keyword set of a search engine that interfaces with the question-answer knowledge base.
Referring to fig. 4, fig. 4 is a schematic diagram of another user interface shown in an exemplary embodiment of the present disclosure.
As shown in fig. 4, the user interface may be a user interface provided by a customer service system that provides online services to users. The customer service system may interface with the question-answer knowledge base through the search engine.
The customer service system can display the search keyword set of the search engine in the user interface, so that a user can click on a certain keyword displayed in the user interface to acquire information related to the keyword.
For example, the user may click on "keyword 1" in the user interface. When the customer service system detects the user's click on keyword 1, the search engine may search the question-answer knowledge base for the question-answer items hit by that keyword. Subsequently, the search engine may return the found question-answer items to the customer service system, which displays them to the user for viewing.
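One possible shape for the search keyword set on the search engine side is an index from each keyword to the question-answer items it hits, so that the set can be displayed and a click can be answered with a single lookup. The `SearchKeywordIndex` class and its method names are hypothetical; the specification does not describe the search engine's internal structure.

```python
from collections import defaultdict


class SearchKeywordIndex:
    """Hypothetical search keyword set: maps each keyword to the ids of the items it hits."""

    def __init__(self):
        self._hits = defaultdict(list)

    def add(self, entry_id, topic_keywords):
        """Add the topic keywords of one question-answer item to the search keyword set."""
        for keyword in topic_keywords:
            if entry_id not in self._hits[keyword]:
                self._hits[keyword].append(entry_id)

    def keywords(self):
        """Keywords to display in the user interface, in alphabetical order."""
        return sorted(self._hits)

    def lookup(self, keyword):
        """Ids of the items hit by a clicked keyword (empty list if the keyword is unknown)."""
        return list(self._hits.get(keyword, []))


index = SearchKeywordIndex()
index.add(1, ["keyword 1"])
index.add(2, ["keyword 1", "keyword 2"])
print(index.keywords())           # ['keyword 1', 'keyword 2']
print(index.lookup("keyword 1"))  # [1, 2]
```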
Corresponding to the foregoing embodiments of the method for extracting the topic keyword, the present specification also provides embodiments of an apparatus for extracting the topic keyword.
The embodiments of the apparatus for extracting topic keywords may be applied to an electronic device. The apparatus embodiments may be implemented by software, by hardware, or by a combination of hardware and software. Taking a software implementation as an example, the apparatus in a logical sense is formed by the processor of the electronic device where the apparatus is located reading the corresponding computer program instructions from a nonvolatile memory into memory and running them. In terms of hardware, fig. 5 is a hardware structure diagram of the electronic device where the apparatus for extracting topic keywords of this specification is located. In addition to the processor, the memory, the network interface, and the nonvolatile memory shown in fig. 5, the electronic device where the apparatus is located may generally include other hardware depending on the actual function of topic keyword extraction, which is not described in detail herein.
Referring to fig. 6, fig. 6 is a block diagram of an extraction apparatus of topic keywords shown in an exemplary embodiment of the present specification. The apparatus 60 may be applied to the electronic device shown in fig. 5, including:
a reading module 601, configured to read a target question-answer item from a question-answer knowledge base; the target question-answer item comprises question data and answer data;
an extracting module 602, configured to extract keywords from the question data and the answer data, respectively;
a first determining module 603, configured to determine whether the keyword extracted from the question data and the keyword extracted from the answer data have the same target keyword;
and a second determining module 604, configured to determine, when the same target keyword exists, the target keyword as a topic keyword of the target question-answer item.
In this embodiment, the apparatus 60 may further include:
a first adding module 605 is configured to add a classification label to the target question-answer item based on the topic keyword of the target question-answer item.
In this embodiment, the first adding module 605 may specifically be configured to:
if a plurality of topic keywords exist in the target question-answer item, determining a target topic keyword with the largest occurrence frequency in the question data and the answer data, and storing the target topic keyword as a classification label of the target question-answer item into the question-answer type knowledge base;
and if the target question-answer item has only one topic keyword, storing that topic keyword in the question-answer knowledge base as the classification label of the target question-answer item.
In this embodiment, the apparatus 60 may further include:
a second adding module 606 is configured to add the topic keyword to a search keyword set of a search engine that interfaces with the question-and-answer knowledge base.
In this embodiment, the keyword extraction algorithm used to extract keywords from the question data and the answer data is a TextRank algorithm or a TF-IDF algorithm.
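For the TextRank option, a simplified sketch is given below. It links words that are adjacent in the text into an undirected co-occurrence graph and ranks them with a PageRank-style iteration. The function name, the tokenizer, the window size, the damping factor of 0.85, and the fixed iteration count are assumptions for illustration, and the part-of-speech filtering commonly applied in TextRank is omitted.

```python
import re
from collections import defaultdict


def textrank_keywords(text, top_k=3, window=2, damping=0.85, iterations=50):
    """Rank words by a simplified TextRank over a word co-occurrence graph."""
    words = re.findall(r"[a-z]{3,}", text.lower())
    # Undirected graph: connect each word to the words that follow it within the window.
    neighbors = defaultdict(set)
    for i, word in enumerate(words):
        for other in words[i + 1 : i + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        # Each word receives an equal share of every neighbor's previous score.
        scores = {
            word: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            for word in neighbors
        }
    # Highest score first; ties are broken alphabetically so the result is deterministic.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]


text = "reset password link password email password account"
print(textrank_keywords(text, top_k=1))  # ['password']
```

In this text, "password" is adjacent to every other word, so it collects the highest score.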
For the implementation of the functions and roles of each module in the above apparatus, refer to the implementation of the corresponding steps in the above method, which is not repeated here.
Since the apparatus embodiments essentially correspond to the method embodiments, reference may be made to the description of the method embodiments for the relevant points. The apparatus embodiments described above are merely illustrative: the modules described as separate components may or may not be physically separate, and the components shown as modules may or may not be physical modules, that is, they may be located in one place or distributed over a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purposes of this specification. Those of ordinary skill in the art can understand and implement the solution without creative effort.
The system, apparatus or module set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having some function. A typical implementation device is a computer, which may be in the form of a personal computer, laptop computer, cellular telephone, camera phone, smart phone, personal digital assistant, media player, navigation device, email device, game console, tablet computer, wearable device, or a combination of any of these devices.
Corresponding to the embodiment of the method for extracting the topic keywords, the specification also provides an embodiment of the electronic equipment. The electronic device includes: a processor and a memory for storing machine executable instructions; wherein the processor and the memory are typically interconnected by an internal bus. In other possible implementations, the device may also include an external interface to enable communication with other devices or components.
In this embodiment, the processor is caused to, by reading and executing the machine-executable instructions stored in the memory corresponding to the control logic for keyword extraction:
reading target question-answer items from a question-answer type knowledge base; the target question-answer item comprises question data and answer data;
extracting keywords from the question data and the answer data respectively;
determining whether the same target keyword exists in the keywords extracted from the question data and the keywords extracted from the answer data;
if the same target keyword exists, determining the target keyword as a topic keyword of the target question-answer item.
In this embodiment, the processor is further caused to, by reading and executing the machine-executable instructions stored in the memory corresponding to the control logic for keyword extraction:
adding a classification label to the target question-answer item based on the topic keywords of the target question-answer item.
In this embodiment, the processor is caused to, by reading and executing the machine-executable instructions stored in the memory corresponding to the control logic for keyword extraction:
if a plurality of topic keywords exist in the target question-answer item, determining a target topic keyword with the largest occurrence frequency in the question data and the answer data, and storing the target topic keyword as a classification label of the target question-answer item into the question-answer type knowledge base;
and if the target question-answer item has only one topic keyword, storing that topic keyword in the question-answer knowledge base as the classification label of the target question-answer item.
In this embodiment, the processor is further caused to, by reading and executing the machine-executable instructions stored in the memory corresponding to the control logic for keyword extraction:
adding the topic keywords to a search keyword set of a search engine that interfaces with the question-answer knowledge base.
In this embodiment, the keyword extraction algorithm used to extract keywords from the question data and the answer data is a TextRank algorithm or a TF-IDF algorithm.
Other embodiments of the present disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This specification is intended to cover any variations, uses, or adaptations of the specification following, in general, the principles of the specification and including such departures from the present disclosure as come within known or customary practice within the art to which the specification pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the specification being indicated by the following claims.
It is to be understood that the present description is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be made without departing from the scope thereof. The scope of the present description is limited only by the appended claims.
The foregoing description of the preferred embodiments is merely intended to illustrate the embodiments of this specification and is not intended to limit them to the particular embodiments described.