CN110263137B - Theme keyword extraction method and device and electronic equipment - Google Patents

Theme keyword extraction method and device and electronic equipment

Info

Publication number
CN110263137B
Authority
CN
China
Prior art keywords
question
answer
target
keywords
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910468420.5A
Other languages
Chinese (zh)
Other versions
CN110263137A (en)
Inventor
谷银波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Advanced New Technologies Co Ltd
Advantageous New Technologies Co Ltd
Original Assignee
Advanced New Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Advanced New Technologies Co Ltd
Priority to CN201910468420.5A
Publication of CN110263137A
Application granted
Publication of CN110263137B
Status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/355 Class or cluster creation or modification
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

One or more embodiments of the present disclosure provide a method and apparatus for extracting a topic keyword, and an electronic device, where the method includes: reading target question-answer items from a question-answer type knowledge base; the target question-answer item comprises question data and answer data; extracting keywords from the question data and the answer data respectively; determining whether the same target keyword exists in the keywords extracted from the question data and the keywords extracted from the answer data; if the same target keywords exist, the target keywords are determined to be the subject keywords of the target question-answer items.

Description

Theme keyword extraction method and device and electronic equipment
Technical Field
One or more embodiments of the present disclosure relate to the field of computer application technologies, and in particular, to a method and an apparatus for extracting a topic keyword, and an electronic device.
Background
In many technical fields, some common questions and corresponding answers (Frequently Asked Questions, FAQ) are recorded, so that the answers can be found quickly when the same questions are encountered later. As more questions and answers are recorded, they are typically entered into a database to form a knowledge base. As the amount of data in the knowledge base increases, it is often necessary to sort the entries in the knowledge base to facilitate quick retrieval of the knowledge base.
Disclosure of Invention
The specification provides a method for extracting topic keywords, which comprises the following steps:
reading target question-answer items from a question-answer type knowledge base; the target question-answer item comprises question data and answer data;
extracting keywords from the question data and the answer data respectively;
determining whether the same target keyword exists in the keywords extracted from the question data and the keywords extracted from the answer data;
if the same target keywords exist, the target keywords are determined to be the subject keywords of the target question-answer items.
Optionally, the method further comprises:
and adding a classification label to the target question-answer item based on the subject keywords of the target question-answer item.
Optionally, the adding a classification label to the target question-answer item based on the topic keyword of the target question-answer item includes:
if a plurality of topic keywords exist in the target question-answer item, determining a target topic keyword with the largest occurrence frequency in the question data and the answer data, and storing the target topic keyword as a classification label of the target question-answer item into the question-answer type knowledge base;
and if only one topic keyword exists in the target question-answer item, storing that topic keyword as a classification label of the target question-answer item into the question-answer type knowledge base.
Optionally, the method further comprises:
and adding the topic keywords to a search keyword set of a search engine interfaced with the question-answer knowledge base.
Optionally, a keyword extraction algorithm used for extracting keywords from the question data and the answer data is a TextRank algorithm or a TF-IDF algorithm.
The specification also provides a device for extracting the topic keywords, which comprises:
the reading module is used for reading target question-answer items from the question-answer knowledge base; the target question-answer item comprises question data and answer data;
the extraction module is used for extracting keywords from the question data and the answer data respectively;
a first determining module, configured to determine whether the same target keyword exists in the keywords extracted from the question data and the keywords extracted from the answer data;
and the second determining module is used for determining the target keywords as the subject keywords of the target question-answer items when the same target keywords exist.
Optionally, the apparatus further comprises:
and the first adding module is used for adding classification labels to the target question-answer items based on the topic keywords of the target question-answer items.
Optionally, the first adding module is specifically configured to:
if a plurality of topic keywords exist in the target question-answer item, determining a target topic keyword with the largest occurrence frequency in the question data and the answer data, and storing the target topic keyword as a classification label of the target question-answer item into the question-answer type knowledge base;
and if only one topic keyword exists in the target question-answer item, storing that topic keyword as a classification label of the target question-answer item into the question-answer type knowledge base.
Optionally, the apparatus further comprises:
and the second adding module is used for adding the topic keywords to a search keyword set of a search engine interfaced with the question-answer type knowledge base.
Optionally, a keyword extraction algorithm used for extracting keywords from the question data and the answer data is a TextRank algorithm or a TF-IDF algorithm.
The present specification also proposes an electronic device including:
a processor;
a memory for storing machine-executable instructions;
wherein, by reading and executing the machine-executable instructions stored in the memory corresponding to the control logic for keyword extraction, the processor is caused to:
reading target question-answer items from a question-answer type knowledge base; the target question-answer item comprises question data and answer data;
extracting keywords from the question data and the answer data respectively;
determining whether the same target keyword exists in the keywords extracted from the question data and the keywords extracted from the answer data;
if the same target keywords exist, the target keywords are determined to be the subject keywords of the target question-answer items.
In the above technical solution, for a question-answer type knowledge base, keywords may be extracted from both the question data and the answer data contained in a question-answer entry, and any keyword extracted from both the question data and the answer data may be determined as a topic keyword of that entry. On the one hand, the topic keywords of each question-answer entry can be used to classify the entries in the knowledge base, so that the knowledge base can be searched conveniently and quickly by topic keyword. On the other hand, because a topic keyword is extracted from both the question data and the answer data, it reflects the main content of the entry more accurately, which can improve retrieval accuracy for the question-answer type knowledge base.
Drawings
FIG. 1 is a schematic diagram of a topic keyword extraction system according to an exemplary embodiment of the present disclosure;
FIG. 2 is a flowchart of a method for extracting topic keywords as illustrated in an exemplary embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a user interface shown in an exemplary embodiment of the present description;
FIG. 4 is a schematic diagram of another user interface shown in an exemplary embodiment of the present description;
FIG. 5 is a hardware structure diagram of an electronic device in which an apparatus for extracting topic keywords is located, as shown in an exemplary embodiment of the present disclosure;
FIG. 6 is a block diagram of an apparatus for extracting topic keywords shown in an exemplary embodiment of the present specification.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with one or more embodiments of the present specification. Rather, they are merely examples of apparatus and methods consistent with aspects of one or more embodiments of the present description as detailed in the accompanying claims.
The terminology used in the description presented herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the description. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in this specification to describe various information, this information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, the first information may also be referred to as second information, and similarly, the second information may also be referred to as first information, without departing from the scope of the present description. The word "if" as used herein may be interpreted as "when", "upon", or "in response to determining", depending on the context.
This specification aims to provide a technical solution, for a question-answer type knowledge base, in which keywords that appear in both the question data and the answer data of a question-answer item are determined as the topic keywords of that item.
In particular implementations, the question-answer entries in the question-answer knowledge base may be traversed to read a question-answer entry from the question-answer knowledge base that is not classified. Further, keywords may be extracted from question data contained in the question-answer entry and keywords may be extracted from answer data contained in the question-answer entry.
Subsequently, it is possible to determine whether the same target keyword exists in the keywords extracted from the question data and the keywords extracted from the answer data by comparing the keywords extracted from the question data with the keywords extracted from the answer data.
If the same target keyword exists, the target keyword may be determined as the subject keyword of the question-answer item.
In this way, classification of the question-answer entries in the question-answer type knowledge base according to the subject keywords of each question-answer entry in the question-answer type knowledge base can be further achieved.
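The traversal described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the entry layout (a dict with `question`, `answer`, and an optional `topic_keywords` field), the function names, and the frequency-based stand-in extractor are all hypothetical. Any real keyword extraction algorithm could be passed in through the `extract` parameter.

```python
from collections import Counter


def simple_extractor(text, top_k=5):
    """Stand-in extractor (assumption): most frequent words longer than 3 characters."""
    words = [w.strip(".,?!").lower() for w in text.split()]
    counts = Counter(w for w in words if len(w) > 3)
    return [w for w, _ in counts.most_common(top_k)]


def classify_unlabelled(entries, extract=simple_extractor):
    """Assign topic keywords to every entry that has not been classified yet."""
    for entry in entries:
        if entry.get("topic_keywords") is not None:
            continue  # already classified, skip it
        question_kws = extract(entry["question"])
        answer_kws = set(extract(entry["answer"]))
        # Topic keywords are the keywords found in both question and answer.
        entry["topic_keywords"] = [kw for kw in question_kws if kw in answer_kws]
    return entries


# Hypothetical knowledge base: one unclassified entry, one already classified.
entries = [
    {"question": "How can I reset my password?",
     "answer": "Choose reset password in settings."},
    {"question": "question 2", "answer": "answer 2",
     "topic_keywords": ["keyword 2"]},
]
classify_unlabelled(entries)
print(entries[0]["topic_keywords"])
```

Passing the extractor as a parameter keeps the traversal independent of the choice of keyword extraction algorithm.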
In the above technical solution, for a question-answer type knowledge base, keywords may be extracted from both the question data and the answer data contained in a question-answer entry, and any keyword extracted from both the question data and the answer data may be determined as a topic keyword of that entry. On the one hand, the topic keywords of each question-answer entry can be used to classify the entries in the knowledge base, so that the knowledge base can be searched conveniently and quickly by topic keyword. On the other hand, because a topic keyword is extracted from both the question data and the answer data, it reflects the main content of the entry more accurately, which can improve retrieval accuracy for the question-answer type knowledge base.
The present specification is described below by way of specific examples.
Referring to fig. 1, fig. 1 is a schematic diagram of a keyword extraction system according to an exemplary embodiment of the present disclosure.
As shown in FIG. 1, the keyword extraction system may include a question-answer type knowledge base and an electronic device interfacing with the question-answer type knowledge base. The electronic device may extract keywords from the question-answer type knowledge base. The electronic device may be a server, a computer, a mobile phone, a tablet device, a notebook computer, a palmtop computer (PDA, Personal Digital Assistant), or the like, which is not limited in this specification.
In practical applications, the question-answer type knowledge base may be a knowledge base for storing question-answer data. The question-answer data may be stored in the knowledge base in the form of question-answer items, and one question-answer item may include a question and an answer to that question. For example, the question-answer data stored in the question-answer type knowledge base may be as shown in Table 1 below:
[Image of Table 1 not reproduced]
TABLE 1
The answer 1 may be an answer for answering the question 1, and the question 1 and the answer 1 form a question-answer item 1; answer 2 may be an answer for solving question 2, question 2 and answer 2 constituting question-answer item 2; and so on.
Referring to fig. 2, fig. 2 is a flowchart illustrating a method of extracting a topic keyword according to an exemplary embodiment of the present specification. The method can be applied to the electronic device shown in fig. 1, and comprises the following steps:
Step 202: reading a target question-answer item from a question-answer type knowledge base, where the target question-answer item comprises question data and answer data;
Step 204: extracting keywords from the question data and the answer data respectively;
Step 206: determining whether the same target keyword exists in the keywords extracted from the question data and the keywords extracted from the answer data;
Step 208: if the same target keyword exists, determining the target keyword as a topic keyword of the target question-answer item.
In this embodiment, the electronic device may first read a question-answer item (referred to as a target question-answer item) from a question-answer type knowledge base with which it interfaces. Wherein the target question-answer entry may include question data and answer data.
Taking the question-answer knowledge base shown in table 1 as an example, the electronic device interfacing with the question-answer knowledge base may read, from the question-answer knowledge base, a question-answer item 1 including a question 1 (i.e., question data) and an answer 1 (i.e., answer data) as a target question-answer item, may read, from the question-answer knowledge base, a question-answer item 2 including a question 2 (i.e., question data) and an answer 2 (i.e., answer data) as a target question-answer item, and so on.
After the target question-answer item is read, keywords may be further extracted from question data included in the target question-answer item, and keywords may be extracted from answer data included in the target question-answer item.
In one embodiment shown, keywords may be extracted from the question data based on a preset keyword extraction algorithm. The keyword extraction algorithm may be preset by a technician, and may specifically be a conventional keyword extraction algorithm such as the TextRank algorithm or the TF-IDF (Term Frequency-Inverse Document Frequency, a common weighting technique in information retrieval and data mining) algorithm, which is not described in detail herein.
Likewise, keywords may be extracted from the answer data based on a preset keyword extraction algorithm.
In order to ensure consistency of keyword extraction, the keyword extraction algorithm used for question data may be the same as the keyword extraction algorithm used for answer data. However, in practical applications, the keyword extraction algorithm used for question data may be different from the keyword extraction algorithm used for answer data, and this specification is not limited thereto.
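The specification names TF-IDF but does not describe it, so the following is only a rough sketch of how TF-IDF scoring could rank candidate keywords. The whitespace tokenizer, the smoothed IDF formula (one common variant), the choice of background corpus, and the names `tokenize` and `tfidf_keywords` are assumptions made for illustration. Chinese text would need a word segmenter in place of `tokenize`.

```python
import math
from collections import Counter


def tokenize(text):
    """Assumed tokenizer: split on whitespace, strip punctuation, lowercase."""
    tokens = []
    for raw in text.split():
        token = raw.strip(".,?!;:()\"'").lower()
        if token:
            tokens.append(token)
    return tokens


def tfidf_keywords(text, corpus, top_k=3):
    """Return the top_k terms of `text` ranked by TF-IDF against `corpus`."""
    tokens = tokenize(text)
    if not tokens:
        return []
    term_counts = Counter(tokens)
    documents = [set(tokenize(doc)) for doc in corpus]
    n_docs = len(documents)
    scores = {}
    for term, count in term_counts.items():
        doc_freq = sum(1 for doc in documents if term in doc)
        # Smoothed inverse document frequency (assumed variant).
        idf = math.log((1 + n_docs) / (1 + doc_freq)) + 1.0
        scores[term] = (count / len(tokens)) * idf
    # Sort by descending score, breaking ties alphabetically for stability.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:top_k]]


# Hypothetical background corpus, e.g. texts of other knowledge base entries.
corpus = ["password help", "reset help", "other text"]
print(tfidf_keywords("password password reset", corpus, top_k=1))
```

Using the same function for question data and answer data corresponds to the consistent-algorithm case described above.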
After the keywords are extracted from the question data and the answer data, respectively, the keywords extracted from the question data may be further compared with the keywords extracted from the answer data to determine whether the same keywords (referred to as target keywords) exist in the keywords extracted from the question data and the keywords extracted from the answer data.
If it is determined that the same target keyword exists, the target keyword may be determined as a subject keyword of the target question-answer item described above. The topic keywords are keywords which can be used for reflecting the main content of the target question-answer item.
For example, assume that the keywords extracted from the question data contained in a certain question-answer item are keyword 1, keyword 2, and keyword 3, and that the keywords extracted from the answer data contained in the same question-answer item are keyword 2, keyword 3, and keyword 4. In this case, comparing the keywords extracted from the question data with the keywords extracted from the answer data shows that keyword 2 and keyword 3 appear in both, that is, both keyword 2 and keyword 3 are target keywords. Subsequently, keyword 2 and keyword 3 may be determined as the topic keywords of the question-answer item.
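The comparison in this example amounts to an intersection of the two keyword lists. A minimal sketch, assuming keywords are plain strings and using the hypothetical helper name `shared_keywords`:

```python
def shared_keywords(question_keywords, answer_keywords):
    """Return keywords present in both lists, in question order, without repeats."""
    answer_set = set(answer_keywords)
    seen = set()
    shared = []
    for keyword in question_keywords:
        if keyword in answer_set and keyword not in seen:
            shared.append(keyword)
            seen.add(keyword)
    return shared


question_kws = ["keyword 1", "keyword 2", "keyword 3"]
answer_kws = ["keyword 2", "keyword 3", "keyword 4"]
topic_keywords = shared_keywords(question_kws, answer_kws)
print(topic_keywords)  # ['keyword 2', 'keyword 3']
```

An empty result means no target keyword exists, in which case no topic keyword is determined for the item.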
In one embodiment, after the topic keywords of the target question-answer item are determined, it may be further determined whether the target question-answer item has a plurality of topic keywords or has one and only one topic keyword.
If the target question-answer item has a unique topic keyword, that is, the target question-answer item has only one topic keyword, the topic keyword can be directly stored as a label of the target question-answer item in the question-answer type knowledge base, that is, the classification label is directly added to the target question-answer item in the question-answer type knowledge base by using the topic keyword.
If a plurality of topic keywords exist in the target question-answer item, the number of times that each topic keyword appears in the question data and the answer data contained in the target question-answer item can be counted respectively to determine the topic keyword (called the target topic keyword) with the largest number of times in the question data and the answer data. Subsequently, the target topic keyword can be stored as a label of the target question-answer item in the question-answer type knowledge base, namely, a classification label is added to the target question-answer item in the question-answer type knowledge base by utilizing the target topic keyword.
For example, assume that the determined topic keywords of a question-answer item include keyword 1 and keyword 2. The number of occurrences of keyword 1 and of keyword 2 in the question data and the answer data contained in the question-answer item may then be counted respectively. If keyword 1 occurs fewer times in the question data and the answer data than keyword 2, keyword 2 may be used as the target topic keyword of the question-answer item, and keyword 2 may be stored as a label of the question-answer item in the question-answer type knowledge base where the question-answer item is located.
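The label selection described above could be sketched as follows. The function names are hypothetical, and counting occurrences by case-insensitive substring matching is an assumption (it would also count a keyword inside a longer word). The specification does not say how ties are broken. This sketch keeps the first keyword among those with the highest count.

```python
def count_occurrences(keyword, text):
    """Count case-insensitive, non-overlapping substring matches (assumed rule)."""
    return text.lower().count(keyword.lower())


def choose_label(topic_keywords, question, answer):
    """Pick the classification label for one question-answer item."""
    if not topic_keywords:
        return None  # no topic keyword, so no label
    if len(topic_keywords) == 1:
        return topic_keywords[0]  # a unique topic keyword is used directly

    def total(keyword):
        # Occurrences are counted in the question and the answer separately.
        return count_occurrences(keyword, question) + count_occurrences(keyword, answer)

    # max() returns the first keyword among those with the highest total.
    return max(topic_keywords, key=total)


question = "How do I get a refund for my order?"
answer = "Open the order page, select the order, and request a refund."
label = choose_label(["refund", "order"], question, answer)
print(label)  # order
```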
Alternatively, a plurality of topic keywords for the target question-answer item may be output to the user via the user interface. The user may select a topic keyword (referred to as a target topic keyword) from among the topic keywords through the user interface. Subsequently, the target topic keyword can be stored as a label of the target question-answer item in the question-answer type knowledge base, namely, a classification label is added to the target question-answer item in the question-answer type knowledge base by utilizing the target topic keyword.
Referring to fig. 3, fig. 3 is a schematic diagram of a user interface according to an exemplary embodiment of the present disclosure.
As shown in FIG. 3, the user interface may be a user interface provided by a customer service system for providing online customer service to a user. The customer service system can interface with the question-answer type knowledge base.
The user may enter keywords for the information to be retrieved in a text entry box provided by the user interface. After the keyword input is completed, the user may click the "send" button in the user interface. When the customer service system detects the user's click on the "send" button, it can obtain the keyword currently entered by the user and then search the question-answer type knowledge base with which it interfaces for the question-answer items hit by the keyword, that is, the question-answer items whose labels include the keyword. Subsequently, the customer service system can display the question-answer items found to the user for viewing.
The question-answer knowledge base shown in table 2 below is taken as an example:
[Image of Table 2 not reproduced]
TABLE 2
Assuming that the user inputs a keyword 1 in a user interface provided by a customer service system interfacing with the question-answering type knowledge base, the customer service system can display the question-answering item 1 and the question-answering item 2 to the user for the user to view because the labels of the question-answering item 1 and the question-answering item 2 both comprise the keyword 1.
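A label lookup of this kind can be sketched as below. Because Table 2 is not reproduced, the entries here are invented purely for illustration. Only the fact that the labels of question-answer items 1 and 2 both include keyword 1 comes from the text. The dict layout and the name `search_by_label` are likewise assumptions.

```python
def search_by_label(knowledge_base, keyword):
    """Return the question-answer items whose labels include the keyword."""
    return [entry for entry in knowledge_base if keyword in entry["labels"]]


# Hypothetical stand-in for Table 2.
knowledge_base = [
    {"id": 1, "question": "question 1", "answer": "answer 1",
     "labels": ["keyword 1"]},
    {"id": 2, "question": "question 2", "answer": "answer 2",
     "labels": ["keyword 1", "keyword 2"]},
    {"id": 3, "question": "question 3", "answer": "answer 3",
     "labels": ["keyword 3"]},
]

hits = search_by_label(knowledge_base, "keyword 1")
print([entry["id"] for entry in hits])  # [1, 2]
```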
In one embodiment shown, after determining the subject keywords of the target question-answer item, the subject keywords may be further added to a set of search keywords for a search engine that interfaces with the question-answer knowledge base.
Referring to fig. 4, fig. 4 is a schematic diagram of another user interface shown in an exemplary embodiment of the present disclosure.
As shown in FIG. 4, the user interface may be a user interface provided by a customer service system for providing online services to users. The customer service system can interface with the question-answer type knowledge base through the search engine.
The customer service system can display the search keyword set of the search engine in the user interface, so that a user can click on a certain keyword displayed in the user interface to acquire information related to the keyword.
For example, the user may click "keyword 1" in the user interface. When the customer service system detects the user's click on keyword 1, the search engine can search the question-answer type knowledge base for the question-answer items hit by that keyword. Subsequently, the search engine may return the question-answer items found to the customer service system, which displays them to the user for viewing.
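One way to picture the search keyword set and the click handling is a small inverted index from keyword to item identifiers. This is an illustrative sketch only. The class `KeywordSearchIndex`, its method names, and the use of integer item identifiers are assumptions and do not describe any particular search engine.

```python
from collections import defaultdict


class KeywordSearchIndex:
    """Hypothetical search keyword set backed by an inverted index."""

    def __init__(self):
        self.search_keywords = set()  # keywords shown in the user interface
        self._entries_by_keyword = defaultdict(list)

    def add_entry(self, entry_id, topic_keywords):
        """Register an item's topic keywords in the search keyword set."""
        for keyword in topic_keywords:
            self.search_keywords.add(keyword)
            if entry_id not in self._entries_by_keyword[keyword]:
                self._entries_by_keyword[keyword].append(entry_id)

    def on_keyword_clicked(self, keyword):
        """Return the ids of the items hit by the clicked keyword."""
        return list(self._entries_by_keyword.get(keyword, []))


index = KeywordSearchIndex()
index.add_entry(1, ["keyword 1"])
index.add_entry(2, ["keyword 1", "keyword 2"])
hits = index.on_keyword_clicked("keyword 1")
print(sorted(index.search_keywords), hits)
```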
In the above technical solution, for a question-answer type knowledge base, keywords may be extracted from both the question data and the answer data contained in a question-answer entry, and any keyword extracted from both the question data and the answer data may be determined as a topic keyword of that entry. On the one hand, the topic keywords of each question-answer entry can be used to classify the entries in the knowledge base, so that the knowledge base can be searched conveniently and quickly by topic keyword. On the other hand, because a topic keyword is extracted from both the question data and the answer data, it reflects the main content of the entry more accurately, which can improve retrieval accuracy for the question-answer type knowledge base.
Corresponding to the foregoing embodiments of the method for extracting the topic keyword, the present specification also provides embodiments of an apparatus for extracting the topic keyword.
The embodiments of the apparatus for extracting topic keywords can be applied to an electronic device. The apparatus embodiments may be implemented by software, by hardware, or by a combination of hardware and software. Taking software implementation as an example, the apparatus in a logical sense is formed when the processor of the electronic device in which it is located reads the corresponding computer program instructions from nonvolatile memory into memory and runs them. In terms of hardware, FIG. 5 shows a hardware structure diagram of the electronic device in which the apparatus for extracting topic keywords is located. In addition to the processor, memory, network interface, and nonvolatile memory shown in FIG. 5, the electronic device in which the apparatus is located may generally include other hardware according to the actual function of topic keyword extraction, which is not described here.
Referring to fig. 6, fig. 6 is a block diagram of an extraction apparatus of topic keywords shown in an exemplary embodiment of the present specification. The apparatus 60 may be applied to the electronic device shown in fig. 5, including:
a reading module 601, configured to read a target question-answer item from a question-answer knowledge base; the target question-answer item comprises question data and answer data;
an extracting module 602, configured to extract keywords from the question data and the answer data, respectively;
a first determining module 603, configured to determine whether the keyword extracted from the question data and the keyword extracted from the answer data have the same target keyword;
and a second determining module 604, configured to determine, when the same target keyword exists, the target keyword as a subject keyword of the target question-answer item.
In this embodiment, the apparatus 60 may further include:
a first adding module 605 is configured to add a classification label to the target question-answer item based on the topic keyword of the target question-answer item.
In this embodiment, the first adding module 605 may specifically be configured to:
if a plurality of topic keywords exist in the target question-answer item, determining a target topic keyword with the largest occurrence frequency in the question data and the answer data, and storing the target topic keyword as a classification label of the target question-answer item into the question-answer type knowledge base;
and if only one topic keyword exists in the target question-answer item, storing that topic keyword as a classification label of the target question-answer item into the question-answer type knowledge base.
In this embodiment, the apparatus 60 may further include:
a second adding module 606 is configured to add the topic keyword to a search keyword set of a search engine that interfaces with the question-and-answer knowledge base.
In this embodiment, the keyword extraction algorithm used to extract keywords from the question data and the answer data is a TextRank algorithm or a TF-IDF algorithm.
For the implementation of the functions and roles of each module in the above apparatus, refer to the implementation of the corresponding steps in the above method; details are not repeated here.
Because the apparatus embodiments essentially correspond to the method embodiments, refer to the description of the method embodiments for the relevant points. The apparatus embodiments described above are merely illustrative: the modules described as separate components may or may not be physically separate, and the components shown as modules may or may not be physical modules, i.e., they may be located in one place or distributed over a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purposes of this specification. Those of ordinary skill in the art can understand and implement this without creative effort.
The system, apparatus, or module set forth in the above embodiments may specifically be implemented by a computer chip or an entity, or by a product having a certain function. A typical implementation device is a computer, which may take the form of a personal computer, laptop computer, cellular telephone, camera phone, smart phone, personal digital assistant, media player, navigation device, email device, game console, tablet computer, wearable device, or a combination of any of these devices.
Corresponding to the above method embodiments for extracting topic keywords, this specification also provides an embodiment of an electronic device. The electronic device includes a processor and a memory for storing machine-executable instructions, wherein the processor and the memory are typically interconnected by an internal bus. In other possible implementations, the device may also include an external interface to enable communication with other devices or components.
In this embodiment, by reading and executing the machine-executable instructions stored in the memory that correspond to the control logic for keyword extraction, the processor is caused to perform:
reading a target question-answer item from a question-answer knowledge base; the target question-answer item comprises question data and answer data;
extracting keywords from the question data and the answer data, respectively;
determining whether the keywords extracted from the question data and the keywords extracted from the answer data share a same target keyword;
if such a shared target keyword exists, determining the target keyword as a topic keyword of the target question-answer item.
In this embodiment, by reading and executing the machine-executable instructions stored in the memory that correspond to the control logic for keyword extraction, the processor is further caused to perform:
adding a classification label to the target question-answer item based on the topic keyword of the target question-answer item.
In this embodiment, by reading and executing the machine-executable instructions stored in the memory that correspond to the control logic for keyword extraction, the processor is caused to perform:
if a plurality of topic keywords exist for the target question-answer item, determining the target topic keyword that occurs most frequently in the question data and the answer data, and storing the target topic keyword in the question-answer knowledge base as the classification label of the target question-answer item;
and if only one topic keyword exists for the target question-answer item, storing that topic keyword in the question-answer knowledge base as the classification label of the target question-answer item.
In this embodiment, by reading and executing the machine-executable instructions stored in the memory that correspond to the control logic for keyword extraction, the processor is further caused to perform:
adding the topic keyword to a search keyword set of a search engine that interfaces with the question-answer knowledge base.
In this embodiment, the keyword extraction algorithm used to extract keywords from the question data and the answer data is a TextRank algorithm or a TF-IDF algorithm.
Other embodiments of the present disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This specification is intended to cover any variations, uses, or adaptations of the specification following, in general, the principles of the specification and including such departures from the present disclosure as come within known or customary practice within the art to which the specification pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the specification being indicated by the following claims.
It is to be understood that the present description is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be made without departing from the scope thereof. The scope of the present description is limited only by the appended claims.
The foregoing describes merely preferred embodiments of the present invention and is not intended to limit the present invention to the particular embodiments described.

Claims (7)

1. A method for extracting a topic keyword, the method comprising:
reading a target question-answer item from a question-answer knowledge base; the target question-answer item comprises question data and answer data;
extracting keywords from the question data and the answer data, respectively;
determining whether the keywords extracted from the question data and the keywords extracted from the answer data share a same target keyword;
if such a shared target keyword exists, determining the target keyword as a topic keyword of the target question-answer item;
the method further comprising:
if a plurality of topic keywords exist for the target question-answer item, determining the target topic keyword that occurs most frequently in the question data and the answer data, and storing the target topic keyword in the question-answer knowledge base as the classification label of the target question-answer item;
and if only one topic keyword exists for the target question-answer item, storing that topic keyword in the question-answer knowledge base as the classification label of the target question-answer item.
2. The method of claim 1, the method further comprising:
adding the topic keyword to a search keyword set of a search engine that interfaces with the question-answer knowledge base.
3. The method of claim 1, wherein the keyword extraction algorithm used to extract keywords from the question data and the answer data is a TextRank algorithm or a TF-IDF algorithm.
4. An apparatus for extracting topic keywords, the apparatus comprising:
a reading module, configured to read a target question-answer item from a question-answer knowledge base; the target question-answer item comprises question data and answer data;
an extracting module, configured to extract keywords from the question data and the answer data, respectively;
a first determining module, configured to determine whether the keywords extracted from the question data and the keywords extracted from the answer data share a same target keyword;
a second determining module, configured to determine, when such a shared target keyword exists, the target keyword as a topic keyword of the target question-answer item;
the apparatus further comprising a first adding module, specifically configured to:
if a plurality of topic keywords exist for the target question-answer item, determine the target topic keyword that occurs most frequently in the question data and the answer data, and store the target topic keyword in the question-answer knowledge base as the classification label of the target question-answer item;
and if only one topic keyword exists for the target question-answer item, store that topic keyword in the question-answer knowledge base as the classification label of the target question-answer item.
5. The apparatus of claim 4, the apparatus further comprising:
a second adding module, configured to add the topic keyword to a search keyword set of a search engine that interfaces with the question-answer knowledge base.
6. The apparatus of claim 4, wherein the keyword extraction algorithm used to extract keywords from the question data and the answer data is a TextRank algorithm or a TF-IDF algorithm.
7. An electronic device, the electronic device comprising:
a processor;
a memory for storing machine-executable instructions;
wherein, by reading and executing the machine-executable instructions stored in the memory that correspond to the control logic for keyword extraction, the processor is caused to perform:
reading a target question-answer item from a question-answer knowledge base; the target question-answer item comprises question data and answer data;
extracting keywords from the question data and the answer data, respectively;
determining whether the keywords extracted from the question data and the keywords extracted from the answer data share a same target keyword;
if such a shared target keyword exists, determining the target keyword as a topic keyword of the target question-answer item;
if a plurality of topic keywords exist for the target question-answer item, determining the target topic keyword that occurs most frequently in the question data and the answer data, and storing the target topic keyword in the question-answer knowledge base as the classification label of the target question-answer item;
and if only one topic keyword exists for the target question-answer item, storing that topic keyword in the question-answer knowledge base as the classification label of the target question-answer item.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910468420.5A CN110263137B (en) 2019-05-31 2019-05-31 Theme keyword extraction method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910468420.5A CN110263137B (en) 2019-05-31 2019-05-31 Theme keyword extraction method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN110263137A CN110263137A (en) 2019-09-20
CN110263137B true CN110263137B (en) 2023-06-06

Family

ID=67916218

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910468420.5A Active CN110263137B (en) 2019-05-31 2019-05-31 Theme keyword extraction method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN110263137B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114462384B (en) * 2022-04-12 2022-07-12 北京大学 Metadata automatic generation device for digital object modeling

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103902652A (en) * 2014-02-27 2014-07-02 深圳市智搜信息技术有限公司 Automatic question-answering system
CN105528437A (en) * 2015-12-17 2016-04-27 浙江大学 Question-answering system construction method based on structured text knowledge extraction
WO2016101727A1 (en) * 2014-12-23 2016-06-30 北京奇虎科技有限公司 Question-and-answer-based search result adjustment method and device
CN107993724A (en) * 2017-11-09 2018-05-04 易保互联医疗信息科技(北京)有限公司 A kind of method and device of medicine intelligent answer data processing

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10817790B2 (en) * 2016-05-11 2020-10-27 International Business Machines Corporation Automated distractor generation by identifying relationships between reference keywords and concepts

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
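Design of an intelligent question-answering system based on an automatically generated knowledge base (基于自动生成知识库的智能问答系统设计); 王飞鸿; 《中国科技信息》; 20180615 (No. 12); full text *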

Also Published As

Publication number Publication date
CN110263137A (en) 2019-09-20

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20201009

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant after: Innovative advanced technology Co.,Ltd.

Address before: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant before: Advanced innovation technology Co.,Ltd.

Effective date of registration: 20201009

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant after: Advanced innovation technology Co.,Ltd.

Address before: A four-storey 847 mailbox in Grand Cayman Capital Building, British Cayman Islands

Applicant before: Alibaba Group Holding Ltd.

GR01 Patent grant