CN115495560A - Question-answer type interaction method and device and electronic equipment - Google Patents

Question-answer type interaction method and device and electronic equipment

Info

Publication number
CN115495560A
Authority
CN
China
Prior art keywords
question
entity
processed
data
entities
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110682887.7A
Other languages
Chinese (zh)
Inventor
杨杰
周启贤
岳文应
王可泽
陈添水
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
DMAI Guangzhou Co Ltd
Original Assignee
DMAI Guangzhou Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by DMAI Guangzhou Co Ltd
Priority to CN202110682887.7A
Publication of CN115495560A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical field of natural language processing, and in particular to a question-answer type interaction method, a question-answer type interaction device and electronic equipment. The interaction method comprises: obtaining a question to be processed and obtaining a preset knowledge graph, wherein the preset knowledge graph is constructed on the basis of the association relationships between corpus data of different categories; performing entity analysis on the question to be processed, and determining a retrieval sentence corresponding to the question to be processed; and retrieving in the preset knowledge graph based on the retrieval sentence, and determining the answer to the question to be processed. Because the preset knowledge graph is constructed according to the association relationships between corpus data of different categories, it covers knowledge of multiple categories; retrieving in it with the retrieval sentence obtained by entity analysis of the question to be processed therefore allows an accurate answer to be retrieved.

Description

Question-answer type interaction method and device and electronic equipment
Technical Field
The invention relates to the technical field of natural language processing, in particular to a question-answer type interaction method, a question-answer type interaction device and electronic equipment.
Background
Automatic question answering is a technology for processing question texts: the user's question is analyzed and a corresponding answer is returned. However, because natural-language sentences are so broad in scope, the same query can be phrased in many ways, and the content of the answers is equally varied. In recent years, in order to reduce labor cost and improve user interactivity and user experience, more and more automatic question-answering systems have appeared on the market, greatly reducing labor cost and user waiting time.
In the prior art, a knowledge graph is generally constructed for one specific field, so that no effective, accurate answer can be given for knowledge that the knowledge graph does not cover. For example, if a music knowledge-graph question-answering system is asked "What is the weather today?", it returns the answer "I don't know."
Disclosure of Invention
In view of this, embodiments of the present invention provide a question-answering type interaction method, device and electronic device, so as to solve the problem that an existing automatic question-answering system is difficult to provide an accurate answer.
According to a first aspect, an embodiment of the present invention provides a question-answering interaction method, including:
obtaining a question to be processed and obtaining a preset knowledge graph, wherein the preset knowledge graph is constructed based on the association relationships between corpus data of different categories;
performing entity analysis on the question to be processed, and determining a retrieval sentence corresponding to the question to be processed;
and retrieving in the preset knowledge graph based on the retrieval sentence, and determining the answer to the question to be processed.
According to the question-answer interaction method provided by the embodiment of the invention, the preset knowledge graph is constructed according to the association relationships between corpus data of different categories, so that the preset knowledge graph covers knowledge of multiple categories; the retrieval sentence obtained by entity analysis of the question to be processed is used for retrieval in the preset knowledge graph, so that an accurate answer can be retrieved.
With reference to the first aspect, in a first implementation manner of the first aspect, the performing entity analysis on the question to be processed and determining a retrieval sentence corresponding to the question to be processed includes:
acquiring an entity naming set corresponding to the preset knowledge graph;
matching in the question to be processed based on each entity name in the entity name set, and determining entity data in the question to be processed;
determining the category of a target question to which the question to be processed belongs by using the entity data in the question to be processed;
and forming a retrieval sentence corresponding to the question to be processed based on the target question category and the entity data in the question to be processed.
The question-answer interaction method provided by the embodiment of the invention determines the entity data in the question to be processed in a name matching mode, and can ensure the reliability of the determination of the entity data; meanwhile, the question category corresponding to the question to be processed is determined so as to realize accurate retrieval.
With reference to the first implementation manner of the first aspect, in a second implementation manner of the first aspect, the determining entity data in the question to be processed based on matching of each entity name in the entity name set in the question to be processed includes:
matching in the question to be processed by utilizing each entity name in the entity naming set to obtain at least one matched entity;
judging whether a similar matching entity exists in the at least one matching entity;
and when the similar matching entity exists in the at least one matching entity, screening the similar matching entity to determine entity data in the question to be processed.
According to the question-answer interaction method provided by the embodiment of the invention, when similar matching entities exist in the question to be processed, they are screened, which reduces the amount of data to be processed in subsequent retrieval and improves retrieval efficiency and accuracy.
With reference to the second implementation manner of the first aspect, in a third implementation manner of the first aspect, the screening the similar matching entities to determine entity data in the question to be processed includes:
reserving the entity with the longest character length in the similar matching entities, and deleting other similar matching entities;
and extracting the category corresponding to the entity with the longest character length to form entity data in the question to be processed, wherein the entity data comprises an entity name and an entity category.
With reference to the first implementation manner of the first aspect, in a fourth implementation manner of the first aspect, the forming a retrieval sentence corresponding to the question to be processed based on the target question category and the entity data in the question to be processed includes:
determining a target query statement template corresponding to the target question category by using the corresponding relation between the question category and the query statement template;
and filling the target query statement template based on the entity data to form a retrieval statement corresponding to the question to be processed.
According to the question-answer interaction method provided by the embodiment of the invention, the target query statement template corresponding to the question to be processed can be determined through the correspondence between question categories and query statement templates, so that the answer corresponding to the question to be processed can be retrieved accurately and efficiently.
With reference to the first aspect, in a fifth implementation manner of the first aspect, the acquiring a preset knowledge graph includes:
obtaining corpus data of various categories;
and determining the preset knowledge graph based on the category of the corpus data and the association relationship between the corpus data.
According to the question-answer interaction method provided by the embodiment of the invention, the preset knowledge graph is constructed by utilizing the association among the corpus data of different categories, so that the preset knowledge graph can realize the query of the question sentences of different categories to obtain the answers of the question sentences of different categories, and the application scene of the method is expanded.
With reference to the first aspect or any one of the first to fifth embodiments of the first aspect, in a sixth embodiment of the first aspect, the determining the preset knowledge graph based on the category of the corpus data and the association relationship between the corpus data includes:
extracting entities in the corpus data to construct an entity naming set, wherein the different entities have an association relation;
acquiring preset entity categories to construct corresponding files, and storing the entities into the corresponding files based on the categories of the entities in the corpus data, wherein the entities have unique indexes in the files, and the unique indexes comprise the identification of the entities and the category identification of the entities;
and associating files corresponding to the preset entity category based on the association relationship among the different entities, and determining the preset knowledge graph.
The question-answer interaction method provided by the embodiment of the invention uses the categories and attributes of the entities, combined with the association relationships among different entities, to construct a preset knowledge graph suitable for querying questions of multiple categories. Because the entities are stored in the preset knowledge graph under indexes, the efficiency of entity query is improved.
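As a rough illustration of the indexed storage described above, the following sketch keeps one in-memory "file" per entity category and derives each unique index from the category identifier plus the entity identifier. The class name `CategoryFile`, the helper `make_index`, the `"<category>:<id>"` index format and the sample data are all assumptions made for this example; the source does not specify a storage format.

```python
from dataclasses import dataclass, field


def make_index(category_id: str, entity_id: int) -> str:
    """Build a unique index from the category identifier and the entity identifier."""
    return f"{category_id}:{entity_id}"


@dataclass
class CategoryFile:
    """One 'file' per preset entity category (hypothetical in-memory stand-in)."""
    category_id: str
    entities: dict = field(default_factory=dict)  # index -> entity name

    def add(self, name: str) -> str:
        # The entity identifier is simply the insertion position in this sketch.
        index = make_index(self.category_id, len(self.entities))
        self.entities[index] = name
        return index


fruit = CategoryFile("fruit")
company = CategoryFile("company")
fruit_apple = fruit.add("apple")
company_apple = company.add("apple")

# Associations between entities link the category files into one graph.
relations = [(fruit_apple, "same_name_as", company_apple)]
```

Because the category identifier is part of the index, the same entity name can be stored under several categories without collision.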
According to a second aspect, an embodiment of the present invention further provides a question-answering interaction apparatus, including:
an acquisition module, configured to acquire a question to be processed and acquire a preset knowledge graph, wherein the preset knowledge graph is constructed based on the association relationships between corpus data of different categories;
an analysis module, configured to perform entity analysis on the question to be processed and determine a retrieval sentence corresponding to the question to be processed;
and a retrieval module, configured to retrieve in the preset knowledge graph based on the retrieval sentence and determine the answer to the question to be processed.
According to the question-answering type interaction device provided by the embodiment of the invention, the preset knowledge graph is constructed according to the association relationships between corpus data of different categories, so that the preset knowledge graph covers knowledge of multiple categories; the retrieval sentence obtained by entity analysis of the question to be processed is used for retrieval in the preset knowledge graph, so that an accurate answer can be retrieved.
According to a third aspect, embodiments of the present invention provide an electronic device, comprising: a memory and a processor, the memory and the processor being communicatively connected to each other, the memory storing therein computer instructions, and the processor executing the computer instructions to perform the question-answering interaction method according to the first aspect or any one of the embodiments of the first aspect.
According to a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, which stores computer instructions for causing a computer to execute the question-and-answer interaction method described in the first aspect or any one of the implementation manners of the first aspect.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 is a flow diagram of a question-and-answer interaction method according to an embodiment of the present invention;
FIG. 2 is a flow diagram of a question-and-answer interaction method according to an embodiment of the present invention;
FIG. 3 is a flow chart of a question-and-answer interaction method according to an embodiment of the present invention;
FIG. 4 is a block diagram of a question-answering interaction device according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of the hardware structure of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In accordance with an embodiment of the present invention, an embodiment of a question-answer interaction method is provided. It is noted that the steps illustrated in the flowcharts of the drawings may be performed in a computer system, such as one executing a set of computer-executable instructions, and that, although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in a different order.
In this embodiment, a question-answer interaction method is provided, which may be used in electronic devices such as computers, mobile phones, tablet computers and smart devices. FIG. 1 is a flowchart of the question-answer interaction method according to an embodiment of the present invention. As shown in FIG. 1, the flow includes the following steps:
S11, obtaining a question to be processed and obtaining a preset knowledge graph.
The preset knowledge graph is constructed based on the association relationships between corpus data of different categories.
The question to be processed may be input by a user on an interactive interface provided by the electronic device, or may be acquired by the electronic device by using an audio acquisition device, or acquired by the electronic device from a third-party device, or the like. The specific way of obtaining the question sentence to be processed by the electronic device is not limited at all, and the corresponding setting can be performed according to the actual requirement.
The number of categories of corpus data included in the preset knowledge graph is not limited, and the association relationship between corpus data of different categories may be determined by the categories associated with each entity in the corpus. For example, the entity "apple" is related to both the category "fruit" and the category "company name", so the entity "apple" can be used to associate these two categories.
The preset knowledge graph can be obtained by the electronic equipment from the outside, can be constructed by the electronic equipment by utilizing different types of corpus data, or can be obtained by other methods, and is not limited at all, and can be set correspondingly according to actual requirements.
Specifically, each node in the preset knowledge graph may represent an entity, and the node stores the attributes of that entity. Nodes are connected by association relationships, which include both associations within the same category and associations between different categories.
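The node-and-relation structure just described can be sketched as a small in-memory graph. This is only an illustrative data structure: the class `KnowledgeGraph`, its method names, the relation label `makes` and the sample attributes are assumptions, and the source does not say which graph store is used.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Toy in-memory graph: nodes hold entity attributes, edges hold relations."""

    def __init__(self):
        self.nodes = {}                 # (name, category) -> attribute dict
        self.edges = defaultdict(list)  # node key -> [(relation, node key)]

    def add_entity(self, name, category, **attributes):
        self.nodes[(name, category)] = dict(attributes)

    def relate(self, source, relation, target):
        self.edges[source].append((relation, target))

    def categories_of(self, name):
        """All categories in which an entity name appears, sorted alphabetically."""
        return sorted(cat for (n, cat) in self.nodes if n == name)


kg = KnowledgeGraph()
kg.add_entity("apple", "fruit", color="red")           # illustrative attribute
kg.add_entity("apple", "company", sector="technology")  # illustrative attribute
kg.add_entity("phone", "product")
# An edge that crosses categories (company -> product).
kg.relate(("apple", "company"), "makes", ("phone", "product"))
```

Keying nodes by name and category lets one name such as "apple" bridge two categories, which is the cross-category association the text relies on.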
The construction of the preset knowledge graph will be described in detail hereinafter.
S12, performing entity analysis on the question to be processed, and determining a retrieval sentence corresponding to the question to be processed.
After the electronic device obtains the question to be processed, it can first recognize each phrase in the question and then match these phrases against the phrases in an entity library, so as to determine which phrases of the question are entities. The entity library may be constructed from the entity names in the preset knowledge graph, or may be established in other manners.
Optionally, the electronic device may instead remove non-entity words from the question to be processed; the words that remain after removal are then the entities.
After the electronic device determines the entities in the question to be processed, it may construct a retrieval sentence by using the category to which each entity belongs. For example, if the entity in the question to be processed belongs to the fruit category, a query about fruit may be used to form the retrieval sentence corresponding to the question to be processed. Alternatively, after identifying the entities of the question to be processed, the electronic device determines the category of the question to obtain the question category, and forms the retrieval sentence by using the question category.
This step will be described specifically below.
S13, retrieving in the preset knowledge graph based on the retrieval sentence, and determining the answer to the question to be processed.
After the electronic device determines the retrieval sentence, it can directly use the retrieval sentence to retrieve in the preset knowledge graph. As described above, each node in the preset knowledge graph stores the attributes of an entity. Entity matching is therefore performed in the preset knowledge graph first; once the entity is matched, the attribute corresponding to the question expressed in the retrieval sentence is extracted and output, which determines the answer to the question to be processed.
In the question-answer interaction method provided by this embodiment, since the preset knowledge graph is constructed according to the association relationships between corpus data of different categories, it covers knowledge of multiple categories; the retrieval sentence obtained by entity analysis of the question to be processed is used for retrieval in the preset knowledge graph, so that an accurate answer can be retrieved.
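Steps S11 to S13 can be tied together in a deliberately small sketch. Everything concrete here is assumed for illustration: the dictionary `GRAPH` stands in for the preset knowledge graph, `ATTRIBUTE_KEYWORDS` stands in for question analysis, and the retrieval sentence is represented as a plain dictionary rather than a real query language.

```python
# Hypothetical graph: entity name -> (category, attributes).
GRAPH = {
    "apple": ("fruit", {"color": "red", "season": "autumn"}),
}
# Hypothetical mapping from a keyword in the question to the attribute asked for.
ATTRIBUTE_KEYWORDS = {"color": "color", "season": "season"}


def analyze(question: str):
    """S12: find a known entity and the requested attribute; build a retrieval sentence."""
    text = question.lower()
    entity = next((name for name in GRAPH if name in text), None)
    attribute = next(
        (attr for kw, attr in ATTRIBUTE_KEYWORDS.items() if kw in text), None
    )
    if entity is None or attribute is None:
        return None
    return {"entity": entity, "attribute": attribute}


def retrieve(statement):
    """S13: match the entity node first, then read the requested attribute."""
    if statement is None:
        return None
    _category, attributes = GRAPH[statement["entity"]]
    return attributes.get(statement["attribute"])


def answer(question: str):
    """S11 to S13 end to end; GRAPH plays the role of the preset knowledge graph."""
    return retrieve(analyze(question))
```

A question with no known entity or attribute yields `None` here; how an actual system words its fallback reply is outside this sketch.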
In this embodiment, a question-answer interaction method is provided, which may be used in electronic devices such as computers, mobile phones, tablet computers and smart devices. FIG. 2 is a flowchart of the question-answer interaction method according to an embodiment of the present invention. As shown in FIG. 2, the flow includes the following steps:
S21, obtaining a question to be processed and obtaining a preset knowledge graph.
The preset knowledge graph is constructed based on the association relationships between corpus data of different categories.
Please refer to S11 in FIG. 1; details are not repeated here.
S22, performing entity analysis on the question to be processed, and determining a retrieval sentence corresponding to the question to be processed.
Specifically, S22 may include:
S221, acquiring an entity naming set corresponding to the preset knowledge graph.
After the electronic device acquires the preset knowledge graph, the entity name set can be constructed by using the entity names corresponding to the nodes in the preset knowledge graph. Or the electronic device acquires the corresponding entity naming set while acquiring the preset knowledge graph. The specific manner of obtaining the entity name set is not limited at all, and may be set according to actual requirements.
Further, as described above, each entity has an entity category to which it belongs, and one entity may belong to several categories. The entity naming set can therefore be constructed per entity category. For example, entity naming set 1 is constructed for entity category A; entity naming set 2 is constructed for entity category B; entity naming set 3 is constructed for entity category C; and so on. In this way the entity naming set corresponding to the preset knowledge graph can be constructed.
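One way to picture the per-category naming sets is a dictionary from category to a set of names. The function name `build_naming_sets` and the sample node list are assumptions for illustration only.

```python
from collections import defaultdict


def build_naming_sets(nodes):
    """Group entity names by category; `nodes` is an iterable of (name, category) pairs."""
    naming_sets = defaultdict(set)
    for name, category in nodes:
        naming_sets[category].add(name)
    return dict(naming_sets)


# Hypothetical nodes taken from a knowledge graph; "apple" belongs to two categories.
nodes = [("apple", "fruit"), ("banana", "fruit"), ("apple", "company")]
naming_sets = build_naming_sets(nodes)
```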
S222, matching in the question to be processed based on each entity name in the entity naming set, and determining entity data in the question to be processed.
The electronic device can perform entity matching by character string matching to obtain all entity data in the question to be processed. The entity data may include the entities and their entity categories. Of course, the electronic device may also match entities in other ways, which is not limited here.
Specifically, each entity name in the entity naming set is matched against the question to be processed to find the entity names that appear in it; it is also possible that no entity name is found, in which case the entity of the question to be processed is empty. For example, if the entity naming set is (Zhongjiun, linjunjie) and the question asks which songs Zhongjiun has, the entity Zhongjiun is extracted.
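A minimal sketch of this string-matching step, using plain substring search. The function name `match_entities` is hypothetical, and substring search is only one possible matching strategy; the source leaves the algorithm open.

```python
def match_entities(question: str, naming_set):
    """Return the names from `naming_set` that occur in `question`, in order of appearance."""
    found = [(question.find(name), name) for name in naming_set if name in question]
    return [name for _pos, name in sorted(found)]


naming_set = {"Zhongjiun", "linjunjie"}
matched = match_entities("which songs of Zhongjiun", naming_set)
unmatched = match_entities("what is the weather today", naming_set)
```

An empty result corresponds to the case in which the entity of the question to be processed is empty.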
In some optional implementations of this embodiment, before matching the question to be processed, the question to be processed may be normalized. Because the input question forms of different users are various, after the question to be processed is processed in a standardized way, the question is processed into a question expression with a uniform format, so that the subsequent processing is facilitated.
For example, the specific process flow is as follows:
a) Chinese and English characters entered by the user are converted into a uniform format;
b) Runs of more than one space are replaced by a single space;
c) Letter case is unified to all lower case or all upper case;
d) The user's question input is thereby converted into a uniform, standardized format.
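The normalization steps above could be sketched as follows. Interpreting step a) as Unicode NFKC folding (which maps full-width letters and punctuation to their ASCII forms) is an assumption, as are the function name `normalize_question` and the choice of lower case in step c); trimming leading and trailing spaces is an extra convenience not named in the source.

```python
import re
import unicodedata


def normalize_question(question: str) -> str:
    # a) Unify character forms (assumption: NFKC folding of full-width characters).
    text = unicodedata.normalize("NFKC", question)
    # b) Collapse runs of whitespace into a single space and trim the ends.
    text = re.sub(r"\s+", " ", text).strip()
    # c) Unify letter case (lower case chosen here).
    return text.lower()
```

For instance, `normalize_question("  Hello   World ")` gives `"hello world"`.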
In some optional implementations of this embodiment, the step S222 may include:
(1) Matching each entity name in the entity naming set in the question to be processed to obtain at least one matching entity.
The electronic device can use character string matching to match the names in the entity naming set against the question to be processed, obtaining at least one matching entity.
(2) Judging whether similar matching entities exist in the at least one matching entity.
Similar matching entities are two matching entities of which one is part of the other, e.g. "apple" and "apple company". In this case the similar matching entities need to be screened. That is, when similar matching entities exist in the at least one matching entity, step (3) is executed; otherwise, S223 is executed.
(3) Screening the similar matching entities to determine entity data in the question to be processed.
The specific screening mode may be user-defined: for example, the entity with the longest character string length is retained and the others are deleted; or the entity with the shortest character string length is retained and the rest are deleted. The entity data in the question to be processed is thus determined by screening the similar matching entities.
When similar matching entities exist in the question to be processed, screening them reduces the amount of data to be processed in subsequent retrieval and improves retrieval efficiency and accuracy.
As an optional implementation manner of this embodiment, the step (3) may include:
3.1) Retaining the entity with the longest character length among the similar matching entities and deleting the other similar matching entities.
3.2) Extracting the category corresponding to the entity with the longest character length to form the entity data in the question to be processed, wherein the entity data comprises an entity name and an entity category.
Continuing with the above example, in this embodiment the entity with the longest character length among the similar matching entities is taken as the entity in the question to be processed. For example, of the two similar matching entities "apple" and "apple company", "apple company" is taken as the entity in the question to be processed, and "apple" is deleted.
As described above, the entity name set may be constructed according to entity categories, and then after determining the entities in the question to be processed, the entity categories corresponding to the entities may be determined accordingly.
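The longest-match screening and the pairing of each kept entity with its category can be sketched like this. The function name `screen_similar` and the one-category-per-name lookup table `category_of` are simplifying assumptions; in the source an entity name may belong to several categories.

```python
def screen_similar(matches, category_of):
    """Drop every match that is a substring of a longer match; return {name: category}."""
    kept = [
        m for m in matches
        if not any(m != other and m in other for other in matches)
    ]
    return {name: category_of[name] for name in kept}


# Hypothetical lookup from entity name to its category.
category_of = {"apple": "fruit", "apple company": "company", "banana": "fruit"}
entity_data = screen_similar(["apple", "apple company", "banana"], category_of)
```

Here "apple" is discarded because it is contained in "apple company", while "banana" has no similar match and is kept unchanged.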
S223, determining the target question category to which the question to be processed belongs by using the entity data in the question to be processed.
The electronic device may first screen the candidate categories of the question to be processed by using the categories of the entities in the entity data. For example, if the question to be processed includes entities of 3 entity categories, a preliminary screening of categories may be performed in the preset knowledge graph, and the question category is then further determined from the 3 entity categories that were screened out.
Alternatively, the electronic device may directly input the entity names in the question to be processed into a classification model and perform classification to obtain the target question category.
Alternatively, the electronic device may first perform a primary screening of the question categories by using the entities in the question to be processed, and then perform a secondary screening by using the classification model. That is, the target question category may be determined by rule matching alone, by model classification alone, or by rule matching followed by model classification.
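The two-stage variant (rule screening first, then a classifier) might look like the sketch below. The rule table `CANDIDATES`, the question-category names and the keyword scorer are all hypothetical; in particular, the keyword scorer merely stands in for a trained classification model, which the source does not describe.

```python
# Hypothetical rule table: entity category -> candidate question categories.
CANDIDATES = {
    "singer": ["songs_of_singer", "birthday_of_singer"],
    "fruit": ["color_of_fruit"],
}
# Hypothetical keywords standing in for a trained classification model.
KEYWORDS = {
    "songs_of_singer": ["song"],
    "birthday_of_singer": ["born", "birthday"],
    "color_of_fruit": ["color"],
}


def classify(question, candidates):
    """Secondary screening: score each candidate by keyword hits (stand-in for a model)."""
    scores = {c: sum(kw in question for kw in KEYWORDS[c]) for c in candidates}
    best = max(scores, key=scores.get, default=None)
    return best if best is not None and scores[best] > 0 else None


def target_category(question, entity_categories):
    """Primary screening by entity category, then the classifier picks among candidates."""
    candidates = [c for cat in entity_categories for c in CANDIDATES.get(cat, [])]
    return classify(question, candidates)
```

Restricting the classifier to rule-approved candidates keeps it from choosing a question category that no entity in the question supports.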
S224, forming a retrieval sentence corresponding to the question to be processed based on the target question category and the entity data in the question to be processed.
After the electronic device determines the target question category, a retrieval sentence corresponding to the question to be processed can be constructed.
In some optional implementations of this embodiment, the step S224 may include:
(1) Determining a target query statement template corresponding to the target question category by using the correspondence between question categories and query statement templates.
The correspondence between question categories and query statement templates is stored in the electronic device; once the question category to which the question to be processed belongs is determined, the corresponding query statement template is determined as well. Accordingly, the target query statement template corresponding to the target question category can be obtained.
(2) Filling the target query statement template based on the entity data to form the retrieval sentence corresponding to the question to be processed.
The electronic device fills the entities in the question to be processed into the target query statement template to obtain the retrieval sentence corresponding to the question to be processed.
Through the correspondence between question categories and query statement templates, the target query statement template corresponding to the question to be processed can be determined, so that the answer corresponding to the question to be processed can be retrieved accurately and efficiently.
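Template lookup and filling can be sketched with Python's `string.Template`. The question-category name, the Cypher-like query syntax, the relation label `sings` and the function name `build_query` are all assumptions; the source names no query language.

```python
from string import Template

# Hypothetical mapping: question category -> query statement template.
# The Cypher-like syntax is illustrative only.
TEMPLATES = {
    "songs_of_singer": Template(
        "MATCH (s:singer {name: '$name'})-[:sings]->(x:song) RETURN x.name"
    ),
}


def build_query(question_category, entity):
    """Look up the template for the category and fill in the entity name."""
    template = TEMPLATES.get(question_category)
    if template is None:
        return None
    # Note: plain substitution does no escaping; a real system would more
    # likely pass the name as a query parameter.
    return template.substitute(name=entity["name"])


query = build_query("songs_of_singer", {"name": "Zhongjiun", "category": "singer"})
```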
And S23, retrieving in a preset knowledge graph based on the retrieval sentences, and determining answers of the question sentences to be processed.
Please refer to S13 in fig. 1, which is not repeated herein.
The question-answer interaction method provided by this embodiment determines the entity data in the question to be processed in a name matching manner, and can ensure the reliability of the determination of the entity data; meanwhile, the question category corresponding to the question to be processed is determined so as to realize accurate retrieval.
As an optional implementation of this embodiment, the electronic device performs entity extraction on the question to be processed against the entity name set by using a character string matching algorithm. If a plurality of similar matching entities are extracted, the lengths of the elements in the extraction result need to be checked: if an element is part of another, longer element, the current element is treated as a stop word and filtered out. Finally, key-value pairs of entities and entity categories, namely the entity data, are constructed.
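A minimal sketch of this extraction and filtering step is given below. The description does not specify the string matching algorithm, so a plain substring test is assumed here; the function name `extract_entities` and the sample names are hypothetical.

```python
def extract_entities(question, entity_names):
    """Return {entity name: entity category} for names found in `question`.

    `entity_names` maps every known entity name to its category. A plain
    substring test stands in for the unspecified string matching algorithm.
    """
    matched = [name for name in entity_names if name in question]
    # Filter out any match that is part of another, longer match.
    kept = [
        name
        for name in matched
        if not any(name != other and name in other for other in matched)
    ]
    return {name: entity_names[name] for name in kept}
```

With the names `jason`, `jason derulo` and `alone` in the set, a question mentioning `jason derulo` keeps only the longer name, because `jason` is contained in it. For a large entity name set, a multi-pattern matcher such as Aho-Corasick would likely be a better fit than the linear scan assumed here.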
Further, the electronic device performs the query using matching rules that combine entities and templates. The electronic device may match the question against the rule templates of a specific category according to the following rules:
(1) For the case of a single entity:
1.1) perform category division according to the entity category to judge which question categories the question to be processed may belong to;
1.2) segment the question according to the entity and match the rule templates;
1.3) for multiple entities of the same category, the processing rule may be to preferentially match the entity at the end of the sentence, because, with reference to English grammar, the end of a sentence generally carries the core content.
(2) For the case of multiple entities:
2.1) perform classification based on the combination of the mixed entity categories;
2.2) segment the question according to the entities and match the rule templates.
For both the single-entity and multi-entity cases, if rule matching fails, model-based classification can be used, where the model may be either a single model or a fusion of multiple models.
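The rule-first, model-second classification can be sketched as below. The rule templates, the placeholder notation such as `<music>`, and the `model` callable are assumptions made for illustration; the description does not disclose the actual rule templates or the classification model.

```python
import re

# Hypothetical rule templates: (required entity categories, pattern over the
# question with entities replaced by placeholders, resulting question category).
RULES = [
    (("music",), re.compile(r"singer of (the song )?<music>"), "singer_of_song"),
    (("album", "singer"), re.compile(r"<album>.*<singer>"), "album_by_singer"),
]


def classify_question(question, entity_data, model=None):
    """Classify by rule templates first; fall back to `model` if none match."""
    # Replace each entity mention by its category placeholder.
    masked = question
    for name, category in entity_data.items():
        masked = masked.replace(name, f"<{category}>")
    present = tuple(sorted(set(entity_data.values())))
    for required, pattern, question_category in RULES:
        if tuple(sorted(required)) == present and pattern.search(masked):
            return question_category
    # Rule matching failed: use the classification model, if one is supplied.
    return model(question) if model is not None else None
```

Here `model` may be any callable that returns a question category, which covers both a single model and a wrapper that fuses several models.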
Further, a specific query statement template is obtained according to the classification result and filled with the entities. If multiple entities are present, a cascading query is required to obtain a complete query statement. The query statement is then executed and the corresponding answer is returned. When the answer is presented, the reply may be produced by applying a reply template or generated by a model; the answer is filled in and the result is returned.
In this embodiment, a question-answer interaction method is provided, which may be used in electronic devices, such as computers, mobile phones, tablet computers, smart devices, and the like, fig. 3 is a flowchart of the question-answer interaction method according to the embodiment of the present invention, and as shown in fig. 3, the flowchart includes the following steps:
and S31, acquiring the question to be processed and acquiring a preset knowledge graph.
The preset knowledge graph is constructed based on the incidence relation of different categories of corpus data.
Specifically, the above S31 may include:
and S311, obtaining the corpus data of various categories.
Specific corpus data of different application scenarios are acquired, the corpus data comprising entity data and relationship data. The data format is { entity id, attribute 1, attribute 2, …, attribute n, information of another contained entity: { entity id, attribute 1, attribute 2, …, attribute n } }. The corpus data may be obtained by means of a crawler.
S312, determining a preset knowledge graph based on the category of the corpus data and the association relationship between the corpus data.
The corpus data acquired by the electronic equipment correspondingly carries the category to which the corpus data belongs and the association relation between different corpus data. Then, the electronic device can directly determine the preset knowledge graph by using the association relationship of different corpora.
Optionally, after obtaining the corpus data, the electronic device may clean the corpus data and remove garbled characters to obtain data in a standard format. The items to be cleaned include, but are not limited to, garbled codes and symbols in frequently used entity names. Meanwhile, when checking for garbled characters, it is necessary to judge whether an entity is present; if so, the entity needs to be extracted.
The data cleaning and removal of garbled characters may be implemented by the following steps:
a) if a field contains garbled codes or symbols, delete them;
b) check the entity names, verify the correctness of the fields, and delete unnecessary entities.
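Steps a) and b) might look like the following sketch. What counts as a garbled character is not defined in the description, so the character set below (the Unicode replacement character, non-whitespace control characters and a few stray symbols) is purely an assumption, as are the names `clean_field` and `clean_records`.

```python
import re

# Assumption: "garbled" covers the Unicode replacement character, control
# characters other than whitespace, and a small set of stray symbols.
_GARBLED = re.compile(r"[\ufffd\x00-\x08\x0b\x0c\x0e-\x1f\x7f#*^~|]+")


def clean_field(value):
    """Step a): delete garbled characters and normalize whitespace."""
    cleaned = _GARBLED.sub("", value)
    return re.sub(r"\s+", " ", cleaned).strip()


def clean_records(records, name_key="name"):
    """Step b): clean every text field and drop entities left without a name."""
    cleaned = []
    for record in records:
        row = {
            key: clean_field(value) if isinstance(value, str) else value
            for key, value in record.items()
        }
        if row.get(name_key):
            cleaned.append(row)
    return cleaned
```

A real corpus would need its own definition of invalid characters, and the check for "unnecessary entities" would depend on the service scenario.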
In some optional implementations of this embodiment, the step S312 may include:
(1) And extracting entities in the corpus data to construct an entity naming set, wherein the different entities have an association relationship.
(2) And acquiring preset entity categories to construct corresponding files, and storing the entities into the corresponding files based on the categories of the entities in the corpus data.
Wherein the entity has a unique index in the file, and the unique index comprises an entity identification and an entity category identification.
Specifically, a file is constructed according to each entity category, and the content of the file depends on a specific service scenario, which is not limited in any way. If no special description exists, all entity attribute fields of the original corpus data are reserved.
(3) And associating files corresponding to the preset entity category based on the association relationship among different entities to determine the preset knowledge graph.
And simultaneously, constructing an entity relationship file, wherein the format of the entity relationship in the file is as follows: entity id1, entity id2, relationship value. A globally unique index id field may also be added to the current relationship to facilitate lookup of the relationship.
After all types of files are obtained, a batch import command is used to import the data files into the database, completing the construction of the preset knowledge graph.
By using the entity categories and entity attributes in combination with the association relationships among different entities, a preset knowledge graph suitable for multi-category question queries is constructed; the entities are stored in the preset knowledge graph in indexed form, which improves entity query efficiency.
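The entity files and the entity relationship file described above can be sketched as CSV text. The CSV layout, the column names, and the way the globally unique index is formed (category plus primary key) are assumptions; the description only states that the index comprises an entity identification and an entity category identification.

```python
import csv
import io


def global_index(category, entity_id):
    # Assumption: the unique index joins the category and the primary key.
    return f"{category}_{entity_id}"


def entity_file_text(category, rows):
    """Render the file of one entity category, keeping all attribute fields."""
    fields = ["index"] + sorted({key for row in rows for key in row})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow({"index": global_index(category, row["id"]), **row})
    return buffer.getvalue()


def relation_file_text(relations):
    """Render the relationship file: entity id1, entity id2, relationship value."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["index", "entity_id1", "entity_id2", "relation"])
    for number, (id1, id2, relation) in enumerate(relations, start=1):
        # A globally unique index is also added to each relationship.
        writer.writerow([f"rel_{number}", id1, id2, relation])
    return buffer.getvalue()
```

The resulting text can be written to one file per entity category plus one relationship file, which is the shape that batch import tools of graph databases typically expect.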
And S313, acquiring the question to be processed.
Reference is made in detail to the description of the to-be-processed question acquisition in S11 of the embodiment shown in fig. 1, and details are not repeated here.
It should be noted that S311 to S312 may be completed before the question to be processed is acquired. Once the preset knowledge graph has been constructed, it can be used directly after a question to be processed is acquired, and there is no need to construct the preset knowledge graph each time before a question to be processed is acquired.
And S32, performing entity analysis on the question to be processed, and determining a retrieval sentence corresponding to the question to be processed.
Please refer to S22 in fig. 2 for details, which are not described herein.
And S33, retrieving in a preset knowledge graph based on the retrieval sentences, and determining answers of the question sentences to be processed.
Please refer to S23 in fig. 2, which is not repeated herein.
According to the question-answer interaction method provided by the embodiment, the preset knowledge graph is constructed by using the association between the corpus data of different categories, so that the preset knowledge graph can realize the query of the question sentences of different categories to obtain the answers of the question sentences of different categories, and the application scene of the method is expanded.
As a specific application example of this embodiment, data in the English music field is collected, and automatic question answering is then realized by constructing a knowledge graph database. The specific implementation process is as follows:
1) Specific raw data of music is acquired. The data is divided into three files of album, song list and singer. Wherein 3 files all contain music entities and the albums and sings contain genre entities. The original data is cleaned first, and all entities are extracted. A file is constructed according to each type of entity, and 5 entity files of an album, a song list, a singer, music and a style are constructed in total. And finally, generating a global unique index id field according to the unique primary key id and the category of the entity under the file. And simultaneously, constructing an entity relationship file, wherein the format of the entity relationship in the file is as follows: entity id1, entity id2, relationship value. Meanwhile, a global unique index id field is added to the current relationship so as to facilitate the search of the relationship.
And after all the files are obtained, importing the data files into the database by using a batch import command to complete the construction of the knowledge graph.
2) A question input by the user is acquired and standardized. Specifically, a character replacement function is used to change punctuation marks such as the comma, the full stop and the question mark to their English forms; English abbreviations are expanded into their unabbreviated forms; runs of spaces between words are reduced to a single space; and all letters are converted to lowercase, so that the user's question input is converted into a uniform standardized format.
3) The names of the entities are extracted from the 5 entity files to construct an entity naming set. Entity name matching is then performed by using a character string matching algorithm, and the matching result is filtered to remove any entity whose name is part of another entity name. The entity content is extracted to obtain the entity category, and, based on that category, the question is matched against a specific question category according to the question format. If there is no matching result, the question is passed to the classification model, finally yielding the question category result and the entity content.
4) A specific database query statement template is obtained according to the classification result, and the entity content is filled in to obtain a complete database query statement.
5) And executing the database query statement and returning a corresponding question-answer reply.
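The standardization in step 2) can be sketched as follows. The punctuation table assumes that the non-English punctuation in question is full-width Chinese punctuation, and the short contraction table is a hypothetical sample; neither is given in the description.

```python
import re

# Assumption: the punctuation to be replaced is full-width Chinese punctuation.
PUNCTUATION = str.maketrans({"，": ",", "。": ".", "？": "?", "！": "!"})

# Hypothetical sample of English abbreviations and their expanded forms.
CONTRACTIONS = {
    "what's": "what is",
    "who's": "who is",
    "don't": "do not",
    "i'm": "i am",
}


def normalize_question(text):
    """Convert a user question into a uniform, lowercase, single-spaced form."""
    text = text.translate(PUNCTUATION).lower()
    text = text.replace("\u2019", "'")  # typographic apostrophe to ASCII
    for short, full in CONTRACTIONS.items():
        text = re.sub(rf"\b{re.escape(short)}\b", full, text)
    return re.sub(r"\s+", " ", text).strip()
```

Normalizing before entity matching means the entity naming set only needs to store lowercase names with single spaces.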
The practical effects are as follows:
Question: do you knock w of the singer of the song 95minutes aione
Answer: The singel Jason of The singer who sides this song is.
Question: I wave to knock the album of material singer's name
Answer: the markers of this album are Mela Mrange and Kcee Omar Krizbeatz.
The question-answer interaction method provided by this embodiment may be applied to a specific English corpus whose format is as follows: { entity id, attribute 1, attribute 2, …, attribute n, information of another contained entity: { entity id, attribute 1, …, attribute n } }. In combination with the knowledge graph, a classification result is obtained through a series of rules and models. A database query statement is then obtained according to the entity content, and the database is accessed to obtain the final answer. The preset knowledge graph provided by this embodiment is extensible, supports multi-entity queries, and satisfies complex service scenarios. Meanwhile, no expensive server is needed for deployment, so the hardware cost is low.
In this embodiment, a question-answering interactive device is further provided. The device is used to implement the above embodiments and preferred embodiments, and what has already been described is not repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the device described in the embodiments below is preferably implemented in software, an implementation in hardware or in a combination of software and hardware is also possible and contemplated.
The present embodiment provides a question-answering type interactive device, as shown in fig. 4, including:
an obtaining module 41, configured to obtain a question to be processed and obtain a preset knowledge graph, where the preset knowledge graph is constructed based on association relations of different categories of corpus data;
an analysis module 42, configured to perform entity analysis on the question to be processed, and determine a retrieval statement corresponding to the question to be processed;
and the retrieval module 43 is configured to perform retrieval in a preset knowledge graph based on the retrieval statement, and determine an answer of the question to be processed.
In the question-answering interaction device provided by this embodiment, since the preset knowledge graph is constructed according to the association relationship between the corpus data of different categories, the preset knowledge graph has knowledge graphs of multiple categories, and the retrieval sentences obtained by the entity analysis of the sentences to be processed are retrieved in the preset knowledge graph, so that accurate answers can be retrieved.
The interactive question-answering device in this embodiment is in the form of a functional unit, where the unit refers to an ASIC circuit, a processor and memory executing one or more software or fixed programs, and/or other devices that can provide the above-described functionality.
Further functional descriptions of the modules are the same as those of the corresponding embodiments, and are not repeated herein.
An embodiment of the present invention further provides an electronic device, which has the question-answering type interactive apparatus shown in fig. 4.
Referring to fig. 5, fig. 5 is a schematic structural diagram of an electronic device according to an alternative embodiment of the present invention. As shown in fig. 5, the electronic device may include: at least one processor 51, such as a CPU (Central Processing Unit), at least one communication interface 53, a memory 54, and at least one communication bus 52. The communication bus 52 is used to enable connection and communication between these components. The communication interface 53 may include a display and a keyboard, and optionally the communication interface 53 may also include a standard wired interface and a standard wireless interface. The memory 54 may be a high-speed volatile random access memory (RAM) or a non-volatile memory, such as at least one disk memory. Optionally, the memory 54 may also be at least one storage device located remotely from the processor 51. The processor 51 may be combined with the apparatus described in fig. 4; the memory 54 stores an application program, and the processor 51 calls the program code stored in the memory 54 to perform any of the above method steps.
The communication bus 52 may be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus. The communication bus 52 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 5, but this is not intended to represent only one bus or type of bus.
The memory 54 may include a volatile memory, such as a random-access memory (RAM); the memory may also include a non-volatile memory, such as a flash memory, a hard disk drive (HDD) or a solid-state drive (SSD); the memory 54 may also comprise a combination of the above types of memory.
The processor 51 may be a Central Processing Unit (CPU), a Network Processor (NP), or a combination of a CPU and an NP.
The processor 51 may further include a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof. The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), a generic array logic (GAL), or any combination thereof.
Optionally, the memory 54 is also used to store program instructions. The processor 51 may call program instructions to implement the question-and-answer interaction method as shown in the embodiments of fig. 1 to 3 of the present application.
The embodiment of the invention also provides a non-transitory computer storage medium, wherein the computer storage medium stores computer executable instructions, and the computer executable instructions can execute the question-answer interaction method in any method embodiment. The storage medium may be a magnetic Disk, an optical Disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a Flash Memory (Flash Memory), a Hard Disk (Hard Disk Drive, abbreviated as HDD), a Solid State Drive (SSD), or the like; the storage medium may also comprise a combination of memories of the kind described above.
Although the embodiments of the present invention have been described in conjunction with the accompanying drawings, those skilled in the art can make various modifications and variations without departing from the spirit and scope of the invention, and such modifications and variations fall within the scope defined by the appended claims.

Claims (10)

1. A question-answering interaction method, comprising:
the method comprises the steps of obtaining a question to be processed and obtaining a preset knowledge graph, wherein the preset knowledge graph is constructed based on the incidence relation of different types of corpus data;
performing entity analysis on the question to be processed, and determining a retrieval sentence corresponding to the question to be processed;
and searching in a preset knowledge graph based on the search sentences, and determining answers of the question sentences to be processed.
2. The method according to claim 1, wherein the performing entity analysis on the question to be processed to determine the search sentence corresponding to the question to be processed comprises:
acquiring an entity naming set corresponding to the preset knowledge graph;
matching in the question to be processed based on each entity name in the entity name set, and determining entity data in the question to be processed;
determining the type of a target question to which the question to be processed belongs by using the entity data in the question to be processed;
and forming a retrieval sentence corresponding to the question to be processed based on the target question category and the entity data in the question to be processed.
3. The method of claim 2, wherein the matching in the question to be processed based on each entity name in the entity name set to determine entity data in the question to be processed comprises:
matching in the question to be processed by utilizing each entity name in the entity naming set to obtain at least one matched entity;
judging whether a similar matching entity exists in the at least one matching entity;
and when the similar matching entity exists in the at least one matching entity, screening the similar matching entity to determine entity data in the question to be processed.
4. The method of claim 3, wherein the screening the similar matching entities to determine entity data in the question to be processed comprises:
keeping the entity with the longest character length in the similar matching entities, and deleting other similar matching entities;
and extracting the entity corresponding category with the longest character length to form entity data in the question to be processed, wherein the entity data comprises an entity name and an entity category.
5. The method according to claim 2, wherein the forming a search sentence corresponding to the question to be processed based on the target question category and entity data in the question to be processed comprises:
determining a target query statement template corresponding to the target question category by using the corresponding relation between the question category and the query statement template;
and filling the target query statement template based on the entity data to form a retrieval statement corresponding to the question to be processed.
6. The method according to any one of claims 1-5, wherein the obtaining a preset knowledge-graph comprises:
obtaining corpus data of various categories;
and determining the preset knowledge graph based on the category of the corpus data and the incidence relation between the corpus data.
7. The method according to claim 1, wherein the determining the predetermined knowledge-graph based on the category of the corpus data and the association relationship between the corpus data comprises:
extracting entities in the corpus data to construct an entity naming set, wherein the different entities have an incidence relation;
acquiring preset entity categories to construct corresponding files, and storing the entities and the attributes thereof in the corresponding files based on the categories of the entities in the corpus data, wherein the entities have unique indexes in the files, and the unique indexes comprise the identification of the entities and the category identification of the entities;
and associating the files corresponding to the preset entity categories based on the association relationship among the different entities to determine the preset knowledge graph.
8. A question-answering interaction device, comprising:
an acquisition module, configured to acquire a question to be processed and acquire a preset knowledge graph, wherein the preset knowledge graph is constructed based on the association relationship of different categories of corpus data;
the analysis module is used for carrying out entity analysis on the question to be processed and determining a retrieval sentence corresponding to the question to be processed;
and the retrieval module is used for retrieving in a preset knowledge graph based on the retrieval sentences and determining answers of the question sentences to be processed.
9. An electronic device, comprising:
a memory and a processor, the memory and the processor being communicatively connected to each other, the memory storing therein computer instructions, and the processor executing the computer instructions to perform the question-answering interaction method according to any one of claims 1 to 7.
10. A computer-readable storage medium storing computer instructions for causing a computer to execute the method for question-answering interaction according to any one of claims 1 to 7.
CN202110682887.7A 2021-06-17 2021-06-17 Question-answer type interaction method and device and electronic equipment Pending CN115495560A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110682887.7A CN115495560A (en) 2021-06-17 2021-06-17 Question-answer type interaction method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110682887.7A CN115495560A (en) 2021-06-17 2021-06-17 Question-answer type interaction method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN115495560A true CN115495560A (en) 2022-12-20

Family

ID=84465494

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110682887.7A Pending CN115495560A (en) 2021-06-17 2021-06-17 Question-answer type interaction method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN115495560A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination