Disclosure of Invention
In order to solve the problems of poor universality and low efficiency in the prior art, the embodiment of the invention provides a question-answering method, a question-answering device, electronic equipment and a storage medium.
In order to achieve the above purpose, the embodiments of the present application adopt the following technical solutions:
a question-answering method comprising:
acquiring a natural query statement input by a user, and converting the natural query statement into a semantic symbol string;
generating a semantic path corresponding to the natural query statement according to the natural query statement and the semantic symbol string;
generating, according to the semantic path, a machine query statement that can be recognized by a database storing the knowledge graph;
and querying, according to the machine query statement, a question-answer result corresponding to the machine query statement in the database storing the knowledge graph, and feeding back the question-answer result.
In this scheme, the question is first converted into semantic symbols, a semantic path is then generated, and the semantic path is finally converted into a query statement that the database can recognize and execute, so that an accurate query result is returned. The method requires no manually designed question templates, applies to different knowledge graphs without being constrained by their structure, and is therefore both adaptable and efficient.
The acquiring a natural query statement input by a user and converting the natural query statement into a semantic symbol string includes:
acquiring a natural query statement input by a user;
matching corpora corresponding to each effective phrase in the natural query sentence from a data set, wherein the data set comprises an entity library, an attribute library, a tag library and a relation library;
determining semantic symbols corresponding to each effective phrase of the natural query statement according to the data set category corresponding to the corpus;
and obtaining a semantic symbol string corresponding to the natural query statement according to the semantic symbols.
The matching of the corpus corresponding to each effective phrase in the natural query sentence from the data set includes:
and matching the corpora corresponding to each effective phrase in the natural query sentence by using an entity library, an attribute library, a tag library and a relation library in sequence.
The generating a semantic path corresponding to the natural query statement according to the natural query statement and the semantic symbol string includes:
sequentially determining all nodes of the semantic path according to all entities, attributes or labels in the semantic symbol string, wherein each node comprises semantic symbols of the corresponding entities, attributes or labels and phrases corresponding to the entities, attributes or labels;
and linking corresponding nodes in all nodes according to the relation semantic symbols in the semantic symbol string and phrases corresponding to the relation semantic symbols.
In the step of determining in turn all nodes of the semantic path based on all entities, attributes or labels in the semantic string,
when a plurality of entities, attributes or tags are adjacent, an intersection of the plurality of entities, attributes or tags is obtained as a node intersection, and the node intersection includes semantic symbols of the plurality of entities, attributes or tags and phrases corresponding to the plurality of entities, attributes or tags.
The machine query statement is generated from the semantic path in a form that can be recognized by a Neo4j database.
A question answering apparatus comprising:
an acquisition module, configured to acquire a natural query statement input by a user and convert the natural query statement into a semantic symbol string;
an analysis module, configured to generate a semantic path corresponding to the natural query statement according to the natural query statement and the semantic symbol string, and to generate, according to the semantic path, a machine query statement that can be recognized by a database storing the knowledge graph;
and a query module, configured to query, according to the machine query statement, a question-answer result corresponding to the machine query statement in the database storing the knowledge graph, and to feed back the question-answer result.
An electronic device, comprising:
a processor and a memory for storing computer program instructions;
wherein, when said computer program is loaded and run by said processor, said processor performs the question-answering method as described above.
A computer readable storage medium storing computer program instructions which, when loaded and executed by a processor, carry out the question-answering method as described above.
Compared with the prior art, the embodiment of the application has the following beneficial effects:
In the question-answering method provided by the embodiment of the invention, the natural query statement is converted into a semantic path whose link length is not limited and in which each node can carry multiple qualifying descriptions. Many complex questions can therefore be converted into a semantic path composed of nodes, which widens the application scenarios of the embodiment in the field of knowledge question answering and makes the method suitable for queries in various fields and for various complex sentences.
Meanwhile, existing question-answering systems must initialize different logical-expression rules for different knowledge graphs in order to convert specific natural query statements. The knowledge-graph-based question-answering method provided by the embodiment of the invention instead converts natural query statements into semantic symbol strings and then into semantic paths, so no templates need to be preset and no rules need to be re-initialized for each knowledge graph. The method can thus be applied to various knowledge graphs and has better universality.
Detailed Description
In order to make those skilled in the art better understand the technical solution of the present invention, the technical solution in the embodiment of the present invention will be clearly and completely described below with reference to the drawings in the embodiment of the present invention, and it is obvious that the described embodiment is only a part of the embodiment of the present invention, and not all embodiments. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, shall fall within the scope of protection of the present invention.
Referring to fig. 1, fig. 1 is a schematic view of a scenario of a question answering method according to the present invention. The scenario includes a query terminal 10, a question answering device 20 and a database 30. The query terminal 10 communicates with the question answering device 20 in order to send a query request to it; the question answering device 20 searches the database 30 for the corresponding data based on the query request and presents the question-answer result to the query terminal 10. The query terminal 10 provides a user interface with an input window, a query button and a query result display interface, where the input window is used to enter a textual question, i.e., a natural query statement, and the query result display interface is used to visualize the query result. The query terminal 10 may be a user device, such as a mobile phone, a laptop or a tablet, on which a browser is installed. After a user enters a natural query statement in the browser, the browser sends the natural query statement to the question answering device 20; the question answering device 20 identifies and processes the natural query statement, obtains the corresponding question-answer result from the database 30, and finally feeds the result back to the user. The question answering device 20 integrates the knowledge-graph-based question answering apparatus described later and can perform the processing steps of the knowledge-graph-based question answering method described later; it may be a computer, a server, or the like with storage and processing capabilities.
Knowledge graphs describe the entities that exist in the real world and the relationships among them. A knowledge graph is composed of entities, attributes of the entities, tags of the entities, and relationships between the entities, and can be viewed as a huge network. Attributes broadly include attribute values, and an attribute value may be an exact value or a range.
For example, the knowledge graph provided by the embodiment of the invention includes two entity tags: company and natural person. Here, "company" covers various organizations. A company has attributes such as full name, short name, registered capital, business scope, industry and address. A natural person has attributes such as name, age, education and salary. Relationships between companies include investment, supplier and customer. Relationships between a company and a natural person include CEO, natural-person shareholder and employment.
The embodiment of the invention provides a question-answering method which is realized based on a knowledge graph technology. Specifically, referring to fig. 2a, the question answering method according to one embodiment of the present invention includes the following steps:
S100, acquiring a natural query statement input by a user, identifying the semantic symbols in the knowledge graph that correspond to the natural query statement, and converting the natural query statement into a semantic symbol string.
In some embodiments, referring to fig. 2b, the step S100 includes the following steps:
S101, acquiring a natural query statement input by a user.
The user can input the natural query statement through the user interface provided by the terminal, and the question answering device obtains the natural query statement sent by the terminal.
And S102, matching corpora corresponding to each effective phrase in the natural query sentence from a data set of the knowledge graph, wherein the data set comprises an entity library, an attribute library, a tag library and a relation library.
In the embodiment of the invention, the full names and short names of companies and the names of natural persons in the knowledge graph are used to generate the entity library; the attributes of companies and natural persons are used to generate the attribute library; the relationships among companies and natural persons are used to generate the relation library; and the tags "company" and "natural person" form the tag library.
After the natural query statement is obtained, it is segmented into words. The word segmenter loads the words in the data set so that these words are segmented correctly (word segmentation is prior art and is not discussed in detail in this embodiment). Each phrase is then matched against the libraries of the data set. Preferably, the entity library, the attribute library, the tag library and the relation library are used in that order to match the corpus corresponding to each effective phrase in the natural query statement.
For example, in "Who is the Director-General of the World Health Organization?", the effective phrase "World Health Organization" is first matched from the entity library of the knowledge graph, and "Director-General" is then matched from the relation library.
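The ordered lookup can be sketched as follows. This is a minimal illustration rather than the embodiment's implementation: the library contents, the name `match_symbol` and the exact-match strategy are assumptions made for the example, and a real system would build the libraries from the knowledge graph and match after word segmentation.

```python
# Lookup order described in the text: entity, attribute, tag, relation library.
LIBRARY_ORDER = ("E", "P", "L", "R")

# Hypothetical sample libraries; real ones would be generated from the knowledge graph.
LIBRARIES = {
    "E": {"World Health Organization", "company A"},
    "P": {"registered capital", "salary"},
    "L": {"company", "natural person"},
    "R": {"Director-General", "CEO", "investment"},
}


def match_symbol(phrase, libraries=LIBRARIES):
    """Return the symbol of the first library that contains the phrase, else None."""
    for symbol in LIBRARY_ORDER:
        if phrase in libraries.get(symbol, set()):
            return symbol
    return None  # not an effective phrase (e.g. a question word)
```

Because the libraries are consulted in a fixed order, a phrase that appears in more than one library resolves to the earlier one, for instance to the entity library before the tag library.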
S103, determining semantic symbols corresponding to each effective phrase in the natural query sentence according to the data set category corresponding to the corpus.
In the industry, generally, a character E represents an entity, a character R represents a relationship between entities, a character P represents an attribute of an entity, a character L represents a label of an entity, and a character V represents an attribute value, so in this step, a semantic symbol of an entity is E, a semantic symbol of an attribute is P, a semantic symbol of a relationship is R, a semantic symbol of a label is L, and a semantic symbol of an attribute value is V.
For example, the semantic symbol corresponding to "World Health Organization" is determined as E, and the semantic symbol corresponding to "Director-General" is determined as R.
In some embodiments, the natural query statement also contains attribute values, and V is identified as follows: V is the part of the sentence that lies after P and before another semantic symbol and that remains once the stop words are removed. V may be an exact value, a range, or an extreme value (maximum or minimum).
For example, in the natural query statement "Which companies invested in by company A have registered capital greater than 50 million?", the semantic symbol corresponding to the effective phrase "company A" is E, that of "investment" is R, that of "registered capital" is P, and that of "company" is L. The phrase "greater than 50 million", which lies between the effective phrases "registered capital" and "company" in the word order of the query, is then the attribute value, with semantic symbol V; V is used hereinafter to denote the attribute value.
In some embodiments, "registered capital greater than 50 million" may be represented by the combined semantic symbol PV.
It should be noted that matching a corpus with certain characteristics in a natural query statement input by a natural person from a certain data set is a technique known by those skilled in the art, and is not a main point of protection of the embodiments of the present invention, and details are not described here.
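The rule for identifying V can be sketched as follows. This is an illustrative reading of the rule, not the embodiment's code: the stop-word list and the name `tag_values` are hypothetical, and the input is assumed to be the segmented phrases in the word order of the original query, each already tagged with a library symbol or with `None` when no library matched.

```python
# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"of", "the", "is", "are", "which", "what", "have"}


def tag_values(tagged):
    """Insert V tokens: unmatched text after a P and before the next symbol.

    `tagged` is a list of (symbol_or_None, phrase) pairs in sentence order.
    Returns a list of (symbol, phrase) pairs containing only effective phrases.
    """
    result = []
    buffer = []       # unmatched phrases collected after the most recent P
    after_p = False   # True while the last effective phrase seen was a P
    for symbol, phrase in tagged:
        if symbol is None:
            if after_p and phrase.lower() not in STOP_WORDS:
                buffer.append(phrase)
            continue
        if buffer:
            # Another semantic symbol follows, so the buffered text is a V.
            result.append(("V", " ".join(buffer)))
            buffer = []
        result.append((symbol, phrase))
        after_p = symbol == "P"
    # Text after a trailing P is not followed by another symbol, so it is dropped.
    return result
```

For the query about companies with registered capital greater than 50 million, this yields the symbols E, R, P, V and L in turn.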
And S104, obtaining a semantic symbol string corresponding to the natural query statement according to the semantic symbols. That is, the semantic symbol string is obtained by concatenating the semantic symbols in the order in which they were obtained.
For example, the semantic symbols corresponding to "Who is the Director-General of the World Health Organization?" are E and R in turn, so the semantic symbol string is ER.
For example, the semantic symbols corresponding to "What is the salary of company A's CEO?" are E, R and P in turn, so the semantic symbol string is ERP.
Therefore, the knowledge-graph-based method provided by the embodiment of the present invention divides all query statements into 2 types: a semantic symbol string ending with P indicates that the natural query statement asks for an attribute of an entity, and a semantic symbol string not ending with P indicates that it asks for an entity.
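Concatenating the symbols and classifying the query by its last symbol can be sketched in a few lines. The function names and the `(symbol, phrase)` token representation are assumptions made for this illustration.

```python
def symbol_string(tokens):
    """Concatenate the semantic symbols of (symbol, phrase) tokens in order."""
    return "".join(symbol for symbol, _ in tokens)


def query_kind(symbols):
    """A string ending with P asks for an attribute; any other string asks for an entity."""
    return "attribute" if symbols.endswith("P") else "entity"
```

With these helpers, a string such as ER is classified as an entity query and ERP as an attribute query.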
For example, the semantic symbol string corresponding to "Who is the Director-General of the World Health Organization?" is ER, which does not end with P and indicates that the natural query statement asks for an entity. The semantic symbol string corresponding to "What is the salary of company A's CEO?" is ERP, which ends with P and indicates that the natural query statement asks for an attribute.
S200, generating a semantic path corresponding to the natural query sentence according to the natural query sentence and the semantic symbol string. The semantic path comprises at least two nodes, the at least two nodes are linked through a relation semantic symbol and a phrase corresponding to the relation semantic symbol, and each node comprises a semantic symbol of a corresponding entity, attribute or label and a phrase corresponding to the entity, attribute or label.
This step is used to dynamically construct semantic paths, eliminating the need to use various rules to construct logical expressions for various knowledge graphs, or to construct different question and answer templates for different knowledge graphs.
For example, after the semantic symbol string ERP corresponding to "What is the salary of company A's CEO?" is generated, the corresponding semantic path may be constructed, as shown in fig. 3.
For example, after the semantic symbol string ERPVL corresponding to "Which companies invested in by company A have registered capital greater than 50 million?" is generated, the corresponding semantic path may be constructed, as shown in fig. 4.
For example, after the semantic symbol string RELRL corresponding to "Which companies did the companies that invested in company A also invest in?" is generated, the corresponding semantic path may be constructed, as shown in fig. 5. Preferably, in this RE case, node E may be placed first when constructing the semantic path, so that the semantic path reads more smoothly.
For example, after the semantic symbol string PVLRP corresponding to "What is the salary of the CEO of the company with the most registered capital?" is generated, the corresponding semantic path may be constructed, as shown in fig. 6.
As these semantic paths show, the object queried by a natural query statement is generally represented by the last node of the constructed semantic path. A node may carry several PV and L constraints, each qualified by its corresponding phrase. The length of the semantic path corresponding to a natural query statement is theoretically unlimited, so a very complex question can be converted in this way into a simple semantic path and then into a machine language that a machine can recognize.
Specifically, the step S200 includes the following steps:
S201, sequentially determining all nodes of the semantic path according to all entities, attributes or labels in the semantic symbol string, wherein each node comprises the semantic symbols of the corresponding entities, attributes or labels and the phrases corresponding to the entities, attributes or labels.
For example, referring to FIG. 3, the semantic symbol string of "What is the salary of company A's CEO?" is ERP, where E and P are constructed in sequence as the nodes (E: company A) and (P: salary).
For example, referring to fig. 5, the semantic symbol string corresponding to "Which companies did the companies that invested in company A also invest in?" is RELRL, where E, L and L are constructed in sequence as 3 nodes: (E: company A), (L: company) and (L: company).
In some embodiments, in this step S201, when multiple entities, attributes or labels are adjacent (occur together), it is considered as a description of the same object (E or PV or L), and therefore their node intersection is taken, where the node intersection includes semantic symbols of multiple entities, attributes or labels and phrases corresponding to multiple entities, attributes or labels.
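The grouping of adjacent symbols into one node can be sketched with `itertools.groupby`. This is a minimal illustration under stated assumptions: tokens are `(symbol, phrase)` pairs, the name `group_nodes` is hypothetical, and the sketch does not treat the leading "RE" case (as in RELRL), where E is built as a separate node placed first.

```python
from itertools import groupby


def group_nodes(tokens):
    """Merge each run of adjacent non-R tokens into one node (a node intersection).

    Relation tokens (symbol "R") only separate nodes and are not part of any node.
    """
    nodes = []
    for is_relation, run in groupby(tokens, key=lambda token: token[0] == "R"):
        if not is_relation:
            nodes.append(list(run))
    return nodes
```

For a token sequence with symbols E, R, P, V, L this produces two nodes: one holding E and one holding P, V and L together.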
For example, referring to FIG. 4, the semantic symbol string corresponding to "Which companies invested in by company A have registered capital greater than 50 million?" is ERPVL. P, V and L are adjacent, with R on one side and nothing on the other, so PVL can be regarded as one node intersection. Thus, 2 nodes can be constructed in sequence: (E: company A) and (PV: registered capital > 50 million; L: company).
S202, linking corresponding nodes in all nodes according to the relation semantic symbols in the semantic symbol string and phrases corresponding to the relation semantic symbols.
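The linking step can be sketched as follows. This is one possible reading, not a definitive implementation: the edge representation `(source_index, relation_phrase, target_index)` and the name `build_semantic_path` are assumptions, and a string that begins with "RE" is interpreted as a relation pointing from the following node back to E, with E built first.

```python
def build_semantic_path(tokens):
    """Return (nodes, edges) for a list of (symbol, phrase) tokens.

    An edge is (source_index, relation_phrase, target_index).
    """
    rest = list(tokens)
    nodes, edges = [], []
    reversed_relation = None
    # Leading "R E": build node E first; the relation will point at it.
    if len(rest) >= 2 and rest[0][0] == "R" and rest[1][0] == "E":
        reversed_relation = rest[0][1]
        nodes.append([rest[1]])
        rest = rest[2:]
    # Split the remaining tokens at each R; every run between relations is one node.
    groups, relations = [[]], []
    for symbol, phrase in rest:
        if symbol == "R":
            relations.append(phrase)
            groups.append([])
        else:
            groups[-1].append((symbol, phrase))
    base = len(nodes)
    nodes.extend(groups)
    if reversed_relation is not None:
        edges.append((base, reversed_relation, 0))
    for offset, phrase in enumerate(relations):
        edges.append((base + offset, phrase, base + offset + 1))
    return nodes, edges
```

A string that ends with R, such as ER, produces an empty final node in this sketch, which stands for the unconstrained object being asked about.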
For example, the semantic symbol string of "What is the salary of company A's CEO?" is ERP, and R and its corresponding phrase "CEO" are used to link the two nodes (E: company A) and (P: salary).
For example, the semantic symbol string corresponding to "Which companies did the companies that invested in company A also invest in?" is RELRL. The first R and its corresponding phrase "investment" are used to link the second node (L: company) to the first node (E: company A), and the second R and its corresponding phrase "investment" are used to link the second node (L: company) to the third node (L: company).
For example, the semantic symbol string corresponding to "Which companies invested in by company A have registered capital greater than 50 million?" is ERPVL, from which 2 nodes can be constructed in sequence: (E: company A) and (PV: registered capital > 50 million; L: company). R and its corresponding phrase "investment" are used to link the two nodes.
And S300, generating a machine query statement corresponding to the knowledge graph according to the semantic path.
This step is used to convert the semantic path into a query statement of the database in which the knowledge graph of this embodiment is located. For example, if the knowledge-graph is stored in the Neo4j database, the semantic path may be generated into a machine query statement using the Neo4j database.
For example, "Which companies did the companies that invested in company A also invest in?" is converted into the following query statement according to its semantic path:
MATCH (n0 {name: 'company A'})<-[:investment]-(n1)-[:investment]->(n2:company)
RETURN n2
For example, "What is the salary of the CEO of the company with the most registered capital?" is converted into the following query statement according to its semantic path:
MATCH (n0:company)-[:CEO]->(n1)
WITH n0, n1, n0.`registered capital` AS pv
ORDER BY pv DESC
RETURN n1.salary LIMIT 1
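The conversion from a semantic path to a Cypher statement can be sketched as follows. This is a restricted illustration, not the embodiment's generator: it assumes nodes are lists of `(symbol, phrase)` pairs and edges are `(source_index, relation_phrase, target_index)` tuples linking consecutive nodes, it assumes entities are stored under a `name` property, and it handles only E and L constraints plus a trailing P. Attribute-value constraints (V), including extreme values such as "most", are deliberately left out. All names are hypothetical, and identifiers are assumed to contain no backticks.

```python
def node_pattern(index, node):
    """Render one node as a Cypher pattern such as (n1:`company`)."""
    text = f"n{index}"
    label = next((phrase for symbol, phrase in node if symbol == "L"), None)
    name = next((phrase for symbol, phrase in node if symbol == "E"), None)
    if label is not None:
        text += f":`{label}`"
    if name is not None:
        escaped = name.replace("\\", "\\\\").replace("'", "\\'")
        text += f" {{name: '{escaped}'}}"
    return f"({text})"


def to_cypher(nodes, edges):
    """Build a MATCH ... RETURN statement for a chain-shaped semantic path."""
    if any(symbol == "V" for node in nodes for symbol, _ in node):
        raise NotImplementedError("attribute-value constraints are outside this sketch")
    # Index each edge by the pair of node positions it connects.
    by_pair = {frozenset((src, dst)): (src, rel) for src, rel, dst in edges}
    parts = [node_pattern(0, nodes[0])]
    for i in range(1, len(nodes)):
        src, rel = by_pair[frozenset((i - 1, i))]
        arrow = f"-[:`{rel}`]->" if src == i - 1 else f"<-[:`{rel}`]-"
        parts.append(arrow + node_pattern(i, nodes[i]))
    last = len(nodes) - 1
    prop = next((phrase for symbol, phrase in nodes[last] if symbol == "P"), None)
    target = f"n{last}.`{prop}`" if prop is not None else f"n{last}"
    return "MATCH " + "".join(parts) + f" RETURN {target}"
```

Unlike the first statement above, this sketch also writes the tag of the middle node into the pattern. Building the statement by string formatting is done here only to make the mapping visible; passing values as query parameters is generally the safer choice.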
S400, according to the machine query statement, querying a question and answer result corresponding to the machine query statement in a Neo4j database, and feeding back the question and answer result.
Those skilled in the art can understand that the machine query statement is a query statement matched with a database corresponding to a knowledge graph, and for the obtained machine query statement, query can be performed on the database corresponding to the knowledge graph to obtain a question and answer result.
For example, the user enters the natural query statement "Who is the Director-General of the World Health Organization?" in the search box, and the query terminal 10 sends the natural query statement to the question answering device 20 (for example, a server). The question answering device 20 identifies and processes the natural query statement and obtains the question-answer result "Tedros Adhanom" from the database 30.
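Step S400 can be sketched as a small helper that runs the machine query statement and collects the answers. The helper name `fetch_answers` is hypothetical. It accepts any session-like object whose `run(query)` method returns an iterable of records that support integer indexing, which is the shape offered by a session of the official Neo4j Python driver; the connection details in the comment are placeholders.

```python
def fetch_answers(session, cypher):
    """Run the statement and return the first column of every returned record."""
    return [record[0] for record in session.run(cypher)]


# Possible use with the official Neo4j Python driver (placeholder URI and credentials):
#
#   from neo4j import GraphDatabase
#   driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
#   with driver.session() as session:
#       answers = fetch_answers(session, "MATCH (n0)-[:CEO]->(n1) RETURN n1.salary")
#   driver.close()
```

Keeping the database session as a parameter lets the same helper be exercised against a stub in tests and against a real database in deployment.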
In the embodiment of the invention, each effective phrase in the natural query statement is matched and converted into its corresponding semantic symbol, a semantic path is then generated based on the semantic symbol string formed by the semantic symbols, and the semantic path is finally converted into a query statement of the database, for example a Neo4j database, so that an accurate query result can be found in the database and returned. The method requires no manually designed question templates, which improves efficiency. The semantic path is constructed dynamically from the natural query statement and is not tied to a particular knowledge graph, so the method is universal across different knowledge graphs. The semantic link length of the question is not limited, and each node can carry multiple qualifying descriptions, so complex question types can be handled. Because it can be used with any knowledge graph, the method has wide application prospects in the field of knowledge question answering.
To facilitate a better implementation of the above-described aspects of embodiments of the present invention, the following also provides relevant means for implementing the above-described aspects.
The embodiment of the invention provides a question answering device which is realized based on a knowledge graph technology. Specifically, referring to fig. 7, the question answering device provided in this embodiment includes:
the obtaining module 101 obtains a natural query statement input by a user, and converts the natural query statement into a semantic symbol string. According to one specific implementation mode, after a natural query statement is acquired, semantic symbols in a knowledge graph corresponding to the natural query statement are identified, wherein the semantic symbols comprise entities, attributes of the entities, tags of the entities and relations among the entities, and then semantic symbol strings are generated based on the semantic symbols.
And the analysis module 102 is used for generating a semantic path corresponding to the natural query statement according to the natural query statement and the semantic symbol string, and generating a machine query statement which can be identified by a database for storing the knowledge graph according to the semantic path. The semantic path comprises at least two nodes, the at least two nodes are linked through a relation semantic symbol and a phrase corresponding to the relation semantic symbol, and each node comprises a semantic symbol of a corresponding entity, attribute or label and a phrase corresponding to the entity, attribute or label.
And the query module 103 queries the question and answer result corresponding to the machine query statement in the knowledge graph according to the machine query statement, and feeds back the question and answer result.
The question answering device in the embodiment is a software system for implementing the question answering method, and executes each step of the question answering method, so that for undescribed parts, reference can be made to the related description in the question answering method.
Referring to fig. 8, an embodiment of the present application further provides an electronic device, which may be a smart phone, a tablet computer, a notebook computer, a desktop computer, a server, and the like.
Generally, an electronic device includes: at least one processor 301; and a memory 302 for storing computer program instructions.
The processor 301 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and so on. The processor 301 may be implemented in at least one hardware form of a DSP (Digital Signal Processor), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 301 may also include a main processor and a coprocessor, where the main processor is a processor for processing data in an awake state, and is also called a Central Processing Unit (CPU); a coprocessor is a low power processor for processing data in a standby state. In some embodiments, the processor 301 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content required to be displayed on the display screen. The processor 301 may also include an AI (Artificial Intelligence) processor for processing computational operations related to machine learning, so that the knowledge-graph-based question answering can be trained and learned autonomously, improving efficiency and accuracy.
Memory 302 may include one or more computer-readable storage media, which may be non-transitory. Memory 302 may also include high speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices or flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 302 is used to store at least one instruction for execution by processor 301 to implement the knowledge-graph based question-answering method provided by the method embodiments herein.
In some embodiments, the following may be optionally included: a communication interface 303 and at least one peripheral device. The processor 301, the memory 302 and the communication interface 303 may be connected by a bus or signal lines. Various peripheral devices may be connected to communication interface 303 via a bus, signal line, or circuit board. Specifically, the peripheral device includes: at least one of radio frequency circuitry 304, a display screen 305, and a power source 306.
The communication interface 303 may be used to connect at least one peripheral device related to I/O (Input/Output) to the processor 301 and the memory 302. In some embodiments, processor 301, memory 302, and communication interface 303 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 301, the memory 302 and the communication interface 303 may be implemented on a single chip or circuit board, which is not limited in this embodiment.
The radio frequency circuit 304 is used for receiving and transmitting RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuit 304 communicates with communication networks and other communication devices via electromagnetic signals. The radio frequency circuit 304 converts an electrical signal into an electromagnetic signal for transmission, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 304 comprises: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuit 304 may communicate with other terminals via at least one wireless communication protocol. The wireless communication networks and protocols include, but are not limited to: metropolitan area networks, the various generations of mobile communication networks (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 304 may further include NFC (Near Field Communication) related circuits, which are not limited in this application.
The display screen 305 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 305 is a touch display screen, the display screen 305 also has the ability to capture touch signals on or over the surface of the display screen 305. The touch signal may be input to the processor 301 as a control signal for processing. In this case, the display screen 305 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there may be one display screen 305, disposed on the front panel of the electronic device; in other embodiments, there may be at least two display screens 305, respectively disposed on different surfaces of the electronic device or in a folded design; in still other embodiments, the display screen 305 may be a flexible display screen disposed on a curved surface or a folded surface of the electronic device. Furthermore, the display screen 305 may be arranged in a non-rectangular irregular shape, i.e. a shaped screen. The display screen 305 may be made of materials such as LCD (Liquid Crystal Display) or OLED (Organic Light-Emitting Diode).
The power supply 306 is used to power the various components in the electronic device. The power supply 306 may be an alternating current supply, a direct current supply, a disposable battery or a rechargeable battery. When the power supply 306 includes a rechargeable battery, the rechargeable battery may support wired or wireless charging. The rechargeable battery may also support fast charge technology.
It should be noted that the above-described embodiments of the apparatus are merely schematic, where the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. In addition, in the drawings of the embodiment of the apparatus provided by the present invention, the connection relationship between the modules indicates that there is a communication connection between them, and may be specifically implemented as one or more communication buses or signal lines. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that the present invention may be implemented by software plus necessary general hardware, and may also be implemented by special hardware including special integrated circuits, special CPUs, special memories, special components and the like. Generally, functions performed by computer programs can be easily implemented by corresponding hardware, and specific hardware structures for implementing the same functions may be various, such as analog circuits, digital circuits, or dedicated circuits. However, the implementation of a software program is a more preferable embodiment for the present invention. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, where the computer software product is stored in a readable storage medium, such as a floppy disk, a usb disk, a removable hard disk, a Read-only memory (ROM), a random-access memory (RAM), a magnetic disk or an optical disk of a computer, and includes instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute the methods according to the embodiments of the present invention.
It is to be understood that the above description is intended to be illustrative, and not restrictive. Many embodiments and many applications other than the examples provided will be apparent to those of skill in the art upon reading the above description. The scope of the present teachings should, therefore, be determined not with reference to the above description, but should instead be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled. The disclosures of all articles and references, including patent applications and publications, are hereby incorporated by reference for all purposes. The omission in the foregoing claims of any aspect of subject matter that is disclosed herein is not intended to forego such subject matter, nor should the inventors be construed as having contemplated such subject matter as being part of the disclosed subject matter.