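WO2022222716A1 - Method and apparatus for constructing a chemical industry knowledge graph, and intelligent question answering method and apparatus - Google Patents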

Method and apparatus for constructing a chemical industry knowledge graph, and intelligent question answering method and apparatus

Info

Publication number
WO2022222716A1
Authority
WO
WIPO (PCT)
Prior art keywords
knowledge
data
graph
entity
chemical
Prior art date
Application number
PCT/CN2022/083978
Other languages
English (en)
French (fr)
Inventor
杜文莉
唐漾
王冰
Original Assignee
华东理工大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华东理工大学
Priority to US18/556,617 (US20240256924A1)
Publication of WO2022222716A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/02 Knowledge representation; Symbolic representation
    • G06N5/022 Knowledge engineering; Knowledge acquisition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/04 Inference or reasoning models

Definitions

  • the invention relates to the technical fields of knowledge graphs, natural language processing and the like, and in particular to a method for constructing a chemical industry knowledge graph, a device for constructing a chemical industry knowledge graph, an intelligent question answering method for chemical industry knowledge, an intelligent question answering device for chemical industry knowledge, and two computer-readable storage media.
  • the chemical industry plays an important role in promoting the construction of the Belt and Road Initiative, the national "13th Five-Year Plan", China's economic development, and improving the competitiveness of China's economy in the international market.
  • the level of technical equipment in the chemical industry is also improving. This creates conditions for enterprises to reduce energy consumption, reduce pollution and improve efficiency. Through the comprehensive utilization of resources and energy, good economic and social benefits have been obtained.
  • the chemical industry is also one of the more dangerous industries in China. The consequences of chemical accidents directly affect people's personal safety and the safety of the national economy and property.
  • the present invention provides a method for constructing a chemical knowledge graph, a device for constructing a chemical knowledge graph, an intelligent question answering method for chemical knowledge, an intelligent question answering device for chemical knowledge, and two computer-readable storage media.
  • the construction method of the above-mentioned chemical industry knowledge graph includes the following steps: acquiring knowledge data in the field of chemical industry; preprocessing the knowledge data to obtain entity data and attribute data related to the chemical industry knowledge therein; determining a preliminary knowledge representation according to the entity data and the attribute data; performing entity alignment on the preliminary knowledge representation to obtain a standard knowledge representation; and constructing the chemical industry knowledge graph according to the standard knowledge representation.
  • by executing these steps, the construction method can automatically collect relevant knowledge in the chemical field and construct the chemical knowledge graph based on natural language processing, big data and artificial intelligence technology, thereby greatly improving the construction speed of the chemical knowledge graph and reducing its manual construction cost.
  • the knowledge data may include structured data, semi-structured data and/or unstructured data.
  • the step of preprocessing the knowledge data may include: performing data integration on the structured data to obtain entity data and attribute data related to chemical knowledge therein; and/or performing knowledge extraction on the semi-structured data and/or the unstructured data to obtain entity data and attribute data related to chemical knowledge therein.
  • the attribute data may include data attribute data and relationship attribute data.
  • the data attribute data is used to describe the attribute value of one of the entity data in the same preliminary knowledge representation.
  • the relationship attribute data is used to describe the relationship between the two entity data in the same preliminary knowledge representation.
  • the step of determining the preliminary knowledge representation according to the entity data and the attribute data may include: constructing, according to the acquired entity data and attribute data, a preliminary knowledge representation in the form of a triple, either "entity, data attribute, attribute value" or "first entity, relationship attribute, second entity".
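  • the two triple forms can be sketched as a small data structure. This is only an illustration: the class name `Triple`, its field names and the relation label "main product" are assumptions made for this sketch, while "gasoline, density, 0.7-0.78" is the example given in the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One preliminary knowledge representation (hypothetical layout)."""
    subject: str    # entity data
    predicate: str  # data attribute or relationship attribute
    obj: str        # attribute value or second entity
    kind: str       # "data" or "relation"


def make_data_triple(entity: str, attribute: str, value: str) -> Triple:
    # entity, data attribute, attribute value
    return Triple(entity, attribute, value, "data")


def make_relation_triple(first: str, relation: str, second: str) -> Triple:
    # first entity, relationship attribute, second entity
    return Triple(first, relation, second, "relation")


preliminary = [
    make_data_triple("gasoline", "density", "0.7-0.78"),
    # "main product" is an assumed relation label, not taken from the source.
    make_relation_triple("coking unit", "main product", "gasoline"),
]
```

    Keeping the triple kind explicit lets later steps treat attribute values as literals rather than as entities.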
  • the step of performing entity alignment on the preliminary knowledge representation to obtain a standard knowledge representation may include: analyzing a plurality of preliminary knowledge representations to determine a plurality of different entity data therein that indicate the same chemical entity; and resolving the plurality of different entity data indicating the same chemical entity into the same entity data, to obtain a standard knowledge representation that uses the same entity data to indicate the same chemical entity.
  • the step of constructing the chemical industry knowledge graph according to the standard knowledge representation includes: performing knowledge discovery according to the entity data and the attribute data in a plurality of the standard knowledge representations to obtain at least one high-reliability standard knowledge representation; performing knowledge reasoning according to the entity data and the attribute data in the plurality of standard knowledge representations to obtain a plurality of standard knowledge representations of unknown reliability; performing quality assessment on the plurality of standard knowledge representations of unknown reliability to determine the high-reliability standard knowledge representations among them; and constructing the chemical industry knowledge graph according to each of the high-reliability standard knowledge representations.
  • the step of performing quality assessment on the plurality of standard knowledge representations of unknown reliability may include: performing text matching between each of the plurality of standard knowledge representations of unknown reliability and the knowledge data in the chemical field to obtain the text matching degree of each of the standard knowledge representations; and determining the standard knowledge representations whose text matching degree is higher than a preset matching degree threshold as the high-reliability standard knowledge representations.
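  • a minimal sketch of entity alignment, assuming the different names for one chemical entity have already been collected in an alias table. The table contents, the function name `align_entities` and the relation label "located in" are hypothetical; only the canonical name "raw material pump P2101" comes from the description.

```python
def align_entities(triples, alias_to_canonical):
    """Rewrite every entity name to its canonical form and drop duplicates."""
    def canonical(name):
        # Names without a known alias are kept unchanged.
        return alias_to_canonical.get(name, name)

    seen = set()
    aligned = []
    for subject, predicate, obj in triples:
        triple = (canonical(subject), predicate, canonical(obj))
        if triple not in seen:
            seen.add(triple)
            aligned.append(triple)
    return aligned


# Hypothetical alias table: both names indicate the same pump.
ALIASES = {
    "feed pump P2101": "raw material pump P2101",
    "pump P2101": "raw material pump P2101",
}

preliminary = [
    ("feed pump P2101", "consequence", "raw material buffer tank level too high"),
    ("pump P2101", "consequence", "raw material buffer tank level too high"),
    ("raw material pump P2101", "located in", "coking unit"),
]

standard = align_entities(preliminary, ALIASES)
```

    After alignment the first two triples collapse into one, so every attribute of the pump is attached to a single entity name.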
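  • the threshold-based quality assessment can be sketched as follows. The source does not specify how the text matching degree is computed, so the token-overlap score used here is an assumption chosen for simplicity, as are the function names, the sample corpus and the threshold value.

```python
def text_matching_degree(triple, corpus):
    """Best fraction of the triple's tokens found in any one source sentence.

    Token overlap is an assumed stand-in for the matching method.
    """
    tokens = set(" ".join(triple).lower().split())
    if not tokens:
        return 0.0
    best = 0.0
    for sentence in corpus:
        words = set(sentence.lower().split())
        best = max(best, len(tokens & words) / len(tokens))
    return best


def filter_high_reliability(candidates, corpus, threshold):
    # Keep only representations whose matching degree exceeds the threshold.
    return [t for t in candidates if text_matching_degree(t, corpus) > threshold]


corpus = ["the main products of the coking unit are dry gas and gasoline"]
candidates = [
    ("coking unit", "product", "gasoline"),  # 3 of 4 tokens matched
    ("coking unit", "product", "ammonia"),   # 2 of 4 tokens matched
]
kept = filter_high_reliability(candidates, corpus, threshold=0.6)
```

    Any other matching measure could replace `text_matching_degree` without changing the filtering step.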
  • the apparatus for constructing the chemical knowledge graph provided according to the second aspect of the present invention includes a memory and a processor.
  • the processor is connected to the memory and is configured to implement the method for constructing a chemical knowledge graph provided by the first aspect of the present invention.
  • the construction device can automatically collect relevant knowledge in the chemical industry to construct a chemical knowledge graph based on natural language processing, big data and artificial intelligence technologies, thereby greatly improving the construction speed of the chemical knowledge graph and reducing its manual construction cost.
  • the above computer-readable storage medium provided according to the third aspect of the present invention has computer instructions stored thereon.
  • when the computer instructions are executed by a processor, the method for constructing a chemical knowledge graph provided by the first aspect of the present invention is implemented.
  • the computer-readable storage medium can automatically collect relevant knowledge in the chemical industry to construct a chemical knowledge graph based on natural language processing, big data and artificial intelligence technologies, thereby greatly improving the construction speed of the chemical knowledge graph and reducing its manual construction cost.
  • the above-mentioned intelligent question answering method for chemical industry knowledge includes the following steps: obtaining a question raised by a user; preprocessing the question, identifying the question entity data and question attribute data related to chemical industry knowledge therein, and identifying the intent of the question; determining the first graph entity data associated with the question entity data from the graph entity data of the chemical industry knowledge graph, wherein the chemical industry knowledge graph is constructed by the method for constructing a chemical industry knowledge graph provided by the first aspect of the present invention.
  • the intelligent question answering method provides an intelligent question answering function for the chemical industry by combining the chemical knowledge graph with natural language processing technology, and can more accurately understand the user's real needs, so as to provide more accurate and efficient solutions.
  • the step of identifying the question entity data and question attribute data related to chemical knowledge may include: inputting the question into a pre-trained question parsing module to obtain the question entity data and question attribute data related to chemical knowledge therein, wherein the question parsing module is a deep learning model trained on question samples of chemical knowledge.
  • the question parsing module may include an entity link dictionary and an attribute dictionary.
  • the step of identifying the question entity data and the question attribute data related to chemical knowledge therein may further include: inputting the acquired question entity data into the entity link dictionary, and mapping the question entity data, based on synonym and/or machine learning fuzzy matching, to data consistent with the description in the chemical knowledge graph; and inputting the acquired question attribute data into the attribute dictionary, and mapping the question attribute data, based on synonym and/or machine learning fuzzy matching, to data consistent with the description in the chemical knowledge graph.
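  • a sketch of the dictionary-based mapping, assuming a synonym table is tried first and string similarity is used as the fuzzy fallback. Python's standard `difflib.get_close_matches` stands in for the machine learning fuzzy matching mentioned above; the dictionary entries, the function name `link` and the cutoff are hypothetical.

```python
import difflib

# Hypothetical entity link dictionary: question wording -> graph entity name.
ENTITY_LINK_DICT = {
    "feed pump": "raw material pump P2101",
    "petrol": "gasoline",
}
GRAPH_ENTITIES = ["raw material pump P2101", "gasoline", "heating furnace F9101"]


def link(mention, synonym_dict, graph_names, cutoff=0.6):
    """Map a question mention to the name used in the knowledge graph."""
    key = mention.strip().lower()
    if key in synonym_dict:  # exact synonym hit
        return synonym_dict[key]
    lowered = {name.lower(): name for name in graph_names}
    # Fuzzy fallback: closest graph name by character similarity.
    close = difflib.get_close_matches(key, list(lowered), n=1, cutoff=cutoff)
    return lowered[close[0]] if close else None


by_synonym = link("Petrol", ENTITY_LINK_DICT, GRAPH_ENTITIES)
by_fuzzy = link("heating furnace F-9101", ENTITY_LINK_DICT, GRAPH_ENTITIES)
no_match = link("xyz", ENTITY_LINK_DICT, GRAPH_ENTITIES)
```

    The same function could serve the attribute dictionary by passing a synonym table and name list of graph attributes instead.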
  • the step of identifying the intent of the question may include: in response to identifying, from the question, one question entity data related to chemical knowledge and one corresponding question attribute data, determining that the intent of the question is to retrieve the corresponding second entity according to the first entity and the attribute; and in response to identifying, from the question, two question entity data related to chemical knowledge, determining that the intent of the question is to retrieve the corresponding attribute according to the first entity and the second entity.
  • the step of performing knowledge inference according to the first graph entity data, the intent of the question, and the standard knowledge representations in the chemical knowledge graph to obtain multiple candidate paths may include: selecting, according to the intent of the question and the standard knowledge representations in the chemical knowledge graph, all second graph attribute data or second graph entity data related to the first graph entity data; and combining the first graph entity data with each of the second graph attribute data or each of the second graph entity data to obtain a plurality of candidate paths.
  • the step of selecting all second graph attribute data or second graph entity data related to the first graph entity data may include: in response to the intent of the question being to retrieve the corresponding second entity according to the first entity and the attribute, selecting, according to the standard knowledge representations in the chemical knowledge graph, all second graph attribute data related to the first graph entity data; and in response to the intent of the question being to retrieve the corresponding attribute according to the first entity and the second entity, selecting, according to the standard knowledge representations in the chemical knowledge graph, all second graph entity data related to the first graph entity data.
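  • the two intent rules can be written as a short decision function. The constant names and the fallback value for questions matching neither rule are assumptions of this sketch.

```python
FIND_ENTITY = "find_entity"        # first entity + attribute -> second entity
FIND_ATTRIBUTE = "find_attribute"  # first entity + second entity -> attribute
UNKNOWN = "unknown"                # assumed fallback, not from the source


def identify_intent(question_entities, question_attributes):
    """Decide the question intent from what was recognised in the question."""
    if len(question_entities) == 2:
        return FIND_ATTRIBUTE
    if len(question_entities) == 1 and len(question_attributes) >= 1:
        return FIND_ENTITY
    return UNKNOWN


intent_a = identify_intent(["gasoline"], ["density"])
intent_b = identify_intent(["coking unit", "gasoline"], [])
intent_c = identify_intent([], [])
```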
  • the second graph attribute data related to the first graph entity data may include second graph attribute data that is first-degree or second-degree related to the first graph entity data, wherein first-degree related means that the first graph entity data can be associated with the second graph attribute data through one of the standard knowledge representations, and second-degree related means that the first graph entity data can be associated with the second graph attribute data through two of the standard knowledge representations.
  • the second graph entity data related to the first graph entity data may include second graph entity data that is first-degree or second-degree related to the first graph entity data, wherein first-degree related means that the first graph entity data can be associated with the second graph entity data through one of the standard knowledge representations, and second-degree related means that the first graph entity data can be associated with the second graph entity data through two of the standard knowledge representations.
  • the step of combining the first graph entity data with each of the second graph attribute data or each of the second graph entity data to obtain multiple candidate paths may include: in response to the intent of the question being to retrieve the corresponding second entity according to the first entity and the attribute, combining the first graph entity data with each of the selected second graph attribute data to obtain a plurality of candidate paths; and in response to the intent of the question being to retrieve the corresponding attribute according to the first entity and the second entity, combining the first graph entity data with each of the selected second graph entity data to obtain a plurality of candidate paths.
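  • candidate path generation over one or two standard knowledge representations can be sketched as a bounded graph walk. Several choices here are assumptions: edges are followed only from subject to object, a two-step path keeps its intermediate element so the route stays traceable, and the relation "processed by" and the function name `candidate_paths` are hypothetical.

```python
def candidate_paths(triples, first_entity, intent):
    """Enumerate paths reachable from first_entity through one or two triples."""
    outgoing = {}
    for subject, predicate, obj in triples:
        outgoing.setdefault(subject, []).append((predicate, obj))

    use_attributes = intent == "find_entity"
    paths = []
    for p1, o1 in outgoing.get(first_entity, []):
        # One triple away: pair the entity with the attribute or the entity.
        paths.append((first_entity, p1 if use_attributes else o1))
        for p2, o2 in outgoing.get(o1, []):
            # Two triples away: extend the path by one more step.
            if use_attributes:
                paths.append((first_entity, p1, p2))
            else:
                paths.append((first_entity, o1, o2))
    return list(dict.fromkeys(paths))  # de-duplicate, keep first-seen order


graph = [
    ("coking unit", "main product", "gasoline"),
    ("coking unit", "main product", "coke"),
    ("gasoline", "processed by", "downstream unit"),  # assumed relation label
]
attribute_paths = candidate_paths(graph, "coking unit", "find_entity")
entity_paths = candidate_paths(graph, "coking unit", "find_attribute")
```

    Limiting the walk to two steps keeps the candidate set small enough to score every path against the question.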
  • the step of separately calculating the text matching degree between the multiple candidate paths and the question may include: inputting the question into a word vector model pre-trained on chemical knowledge samples to obtain a first vector of the question; inputting each of the obtained candidate paths into the word vector model to obtain a second vector of each candidate path; and calculating the cosine of the angle between each second vector and the first vector as the text matching degree between the corresponding candidate path and the question.
  • the step of searching the chemical industry knowledge graph according to the best search path to obtain an answer corresponding to the question may include: searching the chemical industry knowledge graph according to the best search path to determine the corresponding standard knowledge representation; determining the position of the answer in the standard knowledge representation according to the intent of the question; and arranging the answer in combination with the question to obtain an answer in standard form.
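  • the cosine-based matching degree can be illustrated with a toy word vector table. The three-dimensional vectors, the averaging of word vectors into a text vector, and the choice of the highest-scoring path as the best one are all assumptions of this sketch; a real system would use the pre-trained word vector model.

```python
import math

# Hypothetical 3-dimensional word vectors standing in for a trained model.
WORD_VECTORS = {
    "gasoline": (0.9, 0.1, 0.0),
    "density": (0.1, 0.9, 0.0),
    "product": (0.0, 0.2, 0.9),
}


def embed(text):
    """Average the vectors of the known words in the text."""
    vectors = [WORD_VECTORS[w] for w in text.lower().split() if w in WORD_VECTORS]
    if not vectors:
        return (0.0, 0.0, 0.0)
    return tuple(sum(component) / len(vectors) for component in zip(*vectors))


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_paths(question, paths):
    """Score each candidate path against the question, best first."""
    first_vector = embed(question)
    scored = [(cosine(embed(" ".join(p)), first_vector), p) for p in paths]
    return sorted(scored, key=lambda item: item[0], reverse=True)


ranked = rank_paths(
    "what is the density of gasoline",
    [("gasoline", "product"), ("gasoline", "density")],
)
best_score, best_path = ranked[0]
```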
  • the intelligent question answering method may further include the following step: returning the answer in the standard form to the user.
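  • a sketch of how the intent can determine where the answer sits in a standard knowledge representation: the object position when the second entity is sought, the predicate position when the attribute is sought. It covers one-triple paths only, and the function names and the comma-separated output format are assumptions.

```python
def find_answers(triples, path, intent):
    """Look up the answer slot of the matching triples (one-triple paths only)."""
    first, second = path
    if intent == "find_entity":
        # Path is (entity, attribute): the answer is in the object position.
        return [o for s, p, o in triples if s == first and p == second]
    if intent == "find_attribute":
        # Path is (entity, entity): the answer is in the predicate position.
        return [p for s, p, o in triples if s == first and o == second]
    raise ValueError(f"unsupported intent: {intent}")


def format_answers(path, intent, answers):
    """Arrange each answer with the question elements as a full triple string."""
    first, second = path
    if intent == "find_entity":
        return [f"{first}, {second}, {a}" for a in answers]
    return [f"{first}, {a}, {second}" for a in answers]


graph = [
    ("gasoline", "density", "0.7-0.78"),
    ("coking unit", "main product", "gasoline"),  # assumed relation label
]
value = find_answers(graph, ("gasoline", "density"), "find_entity")
relation = find_answers(graph, ("coking unit", "gasoline"), "find_attribute")
reply = format_answers(("gasoline", "density"), "find_entity", value)
```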
  • the above-mentioned intelligent question answering device for chemical industry knowledge includes a memory and a processor.
  • the processor is connected to the memory and is configured to implement the intelligent question answering method for chemical engineering knowledge provided by the fourth aspect of the present invention.
  • the intelligent question answering device can provide an intelligent question answering function for the chemical industry by combining the chemical knowledge graph with natural language processing technology, and can more accurately understand the real needs of users, thereby providing more accurate and more efficient solutions.
  • the above computer-readable storage medium provided according to the sixth aspect of the present invention has computer instructions stored thereon.
  • when the computer instructions are executed by the processor, the above-mentioned intelligent question answering method for chemical engineering knowledge provided by the fourth aspect of the present invention can be implemented.
  • the computer-readable storage medium can provide an intelligent question answering function for the chemical industry by combining the chemical knowledge graph with natural language processing technology, and can more accurately understand the real needs of users, thereby providing more accurate and more efficient solutions.
  • FIG. 1 shows a schematic structural diagram of an intelligent question answering device based on chemical knowledge graph provided according to some embodiments of the present invention.
  • FIG. 2 shows a schematic flowchart of constructing a chemical industry knowledge graph according to some embodiments of the present invention.
  • FIG. 3 shows a schematic flowchart of performing intelligent question and answer according to some embodiments of the present invention.
  • the terms “installed” and “connected” should be understood in a broad sense unless otherwise expressly specified and limited. For example, a connection may be a fixed connection, a detachable connection, or an integral connection; it may be a mechanical connection or an electrical connection; it may be a direct connection or an indirect connection through an intermediate medium; and it may be internal communication between two elements.
  • although the terms “first,” “second,” “third,” etc. may be used herein to describe various components, regions, layers and/or sections, these components, regions, layers and/or sections should not be limited by these terms, which are only used to distinguish different components, regions, layers and/or sections. Thus, a first component, region, layer and/or section discussed below could be termed a second component, region, layer and/or section without departing from some embodiments of the present invention.
  • the chemical industry is currently one of the more dangerous industries in China, and the consequences of chemical accidents directly affect people's personal safety and the safety of the national economy and property. Because knowledge sources in the chemical industry are wide-ranging, varied and numerous, it is difficult for those skilled in the art to fully grasp the relevant knowledge of all branches. In the event of an emergency, technicians often do not know how to deal with it. In response to this problem, the prior art provides some solutions that realize intelligent question answering based on retrieval technology or deep learning matching technology. However, these existing technologies do not involve knowledge in the chemical field, so it is difficult to apply them directly to the chemical field; they cannot grasp the real needs of users and cannot really solve users' problems in the chemical field.
  • the present invention provides a concept of combining knowledge graphs with natural language processing technology in the chemical industry. Compared with the prior art based on retrieval technology or deep learning matching technology, the present invention can construct a knowledge graph in the chemical field by mining data in the chemical field, and perform inference-based intelligent question answering based on the constructed chemical knowledge graph. It can therefore more efficiently help chemical production personnel make decisions and quickly solve some complex problems, thereby reducing the incidence of safety accidents and better protecting the interests of enterprises and the country.
  • the present invention provides a method for constructing a chemical knowledge graph, a device for constructing a chemical knowledge graph, an intelligent question answering method for chemical knowledge, an intelligent question answering device for chemical knowledge, and two computer-readable storage media.
  • the method for constructing a chemical knowledge graph provided by the first aspect of the present invention may be implemented by the apparatus for constructing a chemical knowledge graph provided by the second aspect of the present invention.
  • a memory and a processor may be configured in the construction device.
  • the memory includes, but is not limited to, the computer-readable storage medium provided by the third aspect of the present invention, on which computer instructions are stored.
  • the processor is connected to the memory, and is configured to execute computer instructions stored on the memory, so as to implement the method for constructing a chemical knowledge graph provided by the first aspect of the present invention.
  • the intelligent question answering method for chemical knowledge provided by the fourth aspect of the present invention can be implemented by the intelligent question answering device for chemical knowledge provided by the fifth aspect of the present invention.
  • the smart question answering device may also be configured with a memory and a processor.
  • the memory includes, but is not limited to, the computer-readable storage medium provided by the sixth aspect of the present invention, on which computer instructions are stored.
  • the processor is connected to the memory and is configured to execute computer instructions stored on the memory to implement the intelligent question answering method for chemical engineering knowledge provided by the fourth aspect of the present invention.
  • FIG. 1 shows a schematic diagram of the architecture of an intelligent question answering system based on chemical knowledge graph provided according to some embodiments of the present invention.
  • the above-mentioned intelligent question answering device 10 provided by the fifth aspect of the present invention may be configured with a question preprocessing module 12, a question analysis module 13, a question post-processing module 14, an auxiliary dictionary 15, and the above-mentioned apparatus 11 for constructing a chemical knowledge graph provided by the second aspect of the present invention.
  • the constructing device 11 may be configured inside the intelligent question answering device 10 in the form of a module.
  • alternatively, the constructing device 11 may be connected to the intelligent question answering device 10 from the outside, temporarily or on a long-term basis, through a communication interface, a data line, a wireless network connection, or the like.
  • FIG. 2 shows a schematic flowchart of constructing a chemical industry knowledge graph according to some embodiments of the present invention.
  • the construction device 11 can first obtain the original knowledge data in the chemical industry through a human-computer interaction interface, a communication interface with an external storage medium and/or a network interface.
  • the original knowledge data can be triple structured data satisfying the "subject-predicate-object" form, semi-structured data recorded in other structures, or unstructured data recorded in natural language.
  • the above-mentioned original knowledge data in the chemical industry includes, but is not limited to, knowledge related to chemical processes.
  • the construction device 11 may first preprocess the original knowledge data to construct an initial data set, and then determine a preliminary ontology knowledge representation according to the constructed initial data set. Specifically, for triple structured data that satisfies the "subject-predicate-object" form, the construction device 11 can perform data integration and directly add the entity data and attribute data related to chemical knowledge therein to the initial data set as the preliminary knowledge representation of the structured data.
  • for semi-structured data and unstructured data, the construction device 11 first performs knowledge extraction to extract the entity data and attribute data related to chemical knowledge, and then adds the mutually associated entity data and attribute data to the initial data set as the preliminary knowledge representations of these semi-structured and unstructured data.
  • the main products of the coking unit are dry gas, liquefied gas, gasoline, diesel oil, wax oil and coke.
  • the products of the coking unit are all semi-finished products, which require further processing by downstream units, and do not have high requirements on product properties.
  • the above relationship attribute data is only a non-limiting example of attribute data, and does not limit the protection scope of the present invention.
  • the above attribute data may further include data attribute (Data Property) data describing the attribute value of a corresponding entity data, such as "gasoline, density, 0.7-0.78".
  • the construction device 11 can perform entity alignment on the preliminary knowledge representations to obtain a plurality of standard knowledge representations in a unified form.
  • the entity alignment of chemical knowledge mainly includes the operation of coreference resolution, which is used to solve the problem that multiple references point to the same named entity.
  • the construction device 11 can perform coreference resolution on these two entities, and resolve all data attributes and relationship attributes related to the data of these two entities so that they refer to the same entity (for example, "raw material pump P2101"), thereby solving the problem that multiple references point to the same named entity.
  • the construction device 11 can perform knowledge discovery and knowledge reasoning according to these standard knowledge representations to obtain new chemical knowledge, and incorporate the new knowledge with high credibility into the constructed chemical knowledge graph.
  • the above-mentioned knowledge discovery refers to the process of masking the tedious details of the original data and identifying effective, novel, potentially useful and understandable knowledge from the data set.
  • the new knowledge obtained by this method is often highly credible.
  • the above-mentioned knowledge reasoning refers to the process of obtaining new knowledge or conclusions that satisfy semantics through various methods. This method can often obtain unexpected new knowledge, but it cannot guarantee the credibility of the new knowledge.
  • the pumping amount of the raw material pump P2101 is too small, and as a result, the liquid level of the raw material buffer tank is too high;
  • the bottom circulation flow of fractionation tower C-9102 is too small. The reason is that the valve FV9133 is closed due to the failure of FIC9133 of the bottom circulation flow control loop of fractionation tower C-9102;
  • heating furnace F9101 is equipped with low feed flow interlock: when the feed flow is lower than 27.5T/H, the main burner of this group of heating furnaces is extinguished;
  • by means of knowledge discovery, the construction device 11 can combine the two standard knowledge representations "the bottom circulation flow of fractionation tower C-9102 is too small, cause, the failure of FIC9133 of the bottom circulation flow control loop of fractionation tower C-9102 closes the valve FV9133" and "the bottom circulation flow of fractionation tower C-9102 is too small, safety measure, heating furnace F9101 is equipped with low feed flow interlock: when the feed flow is lower than 27.5T/H, the main burner of this group of heating furnaces is extinguished" to obtain the new knowledge "the failure of FIC9133 of the bottom circulation flow control loop of fractionation tower C-9102 closes the valve FV9133, safety measure, heating furnace F9101 is equipped with low feed flow interlock: when the feed flow is lower than 27.5T/H, the main burner of this group of heating furnaces is extinguished".
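  • the knowledge discovery example above amounts to joining two triples that share a subject: if a deviation has a known cause and a known safety measure, the safety measure can be attached to the cause. The sketch below assumes the relation labels "cause" and "safety measure" and a hypothetical function name `discover`; the strings are shortened from the example.

```python
def discover(triples, cause_rel="cause", target_rel="safety measure"):
    """Attach each deviation's safety measure to the deviation's known causes."""
    causes = {}
    for subject, predicate, obj in triples:
        if predicate == cause_rel:
            causes.setdefault(subject, []).append(obj)

    discovered = []
    for subject, predicate, obj in triples:
        if predicate == target_rel:
            for cause in causes.get(subject, []):
                discovered.append((cause, target_rel, obj))
    return discovered


known = [
    ("C-9102 bottom circulation flow too small", "cause",
     "FIC9133 failure closes valve FV9133"),
    ("C-9102 bottom circulation flow too small", "safety measure",
     "F9101 low feed flow interlock"),
]
new_knowledge = discover(known)
```

    Because the derived triple only recombines facts already stated about the same deviation, it tends to be more credible than knowledge obtained by free-form reasoning.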
  • by means of knowledge reasoning, the construction device 11 can also infer the new knowledge "the liquid level of the raw material buffer tank is too high, cause, the amount of raw material added is too large" from the semantics of the known standard knowledge representation "the amount of raw material added is too large, consequence, the liquid level of the raw material buffer tank is too high"; infer the new knowledge "the liquid level of the raw material buffer tank is too high, cause, the pumping amount of the raw material pump P2101 is too small" from the semantics of the known standard knowledge representation "the pumping amount of the raw material pump P2101 is too small, consequence, the liquid level of the raw material buffer tank is too high"; and, combining the semantics of these two known standard knowledge representations, infer the new knowledge of "
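  • the first two inferences in this example invert a "consequence" relation into a "cause" relation. A minimal sketch of that one rule, assuming the relation labels "consequence" and "cause" and a hypothetical function name `infer_inverse`; it is not a general reasoning engine.

```python
# Assumed pairs of mutually inverse relation labels.
INVERSE = {"consequence": "cause", "cause": "consequence"}


def infer_inverse(triples):
    """Derive (B, inverse relation, A) from (A, relation, B) when not yet known."""
    known = set(triples)
    inferred = []
    for subject, predicate, obj in triples:
        inverse = INVERSE.get(predicate)
        if inverse is None:
            continue
        candidate = (obj, inverse, subject)
        if candidate not in known and candidate not in inferred:
            inferred.append(candidate)
    return inferred


known = [
    ("raw material amount added too large", "consequence",
     "raw material buffer tank level too high"),
    ("raw material pump P2101 pumping amount too small", "consequence",
     "raw material buffer tank level too high"),
]
inferred = infer_inverse(known)
```

    The inferred triples have unknown reliability, which is why they pass through quality assessment before entering the graph.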
  • the construction device 11 may text-match each of the standard knowledge representations of unknown confidence obtained through knowledge reasoning against the original knowledge data, to obtain a text matching degree for each standard knowledge representation.
  • in response to the text matching degrees of "raw material buffer tank level is too high, cause, raw material feed amount is too large" and "raw material buffer tank level is too high, cause, withdrawal of raw material pump P2101 is too low" being lower than the preset matching degree threshold, the construction device 11 can determine them to be low-confidence standard knowledge representations.
  • conversely, in response to the text matching degree of "raw material buffer tank level is too high, cause, raw material feed amount is too large or withdrawal of raw material pump P2101 is too low" being higher than or equal to the preset matching degree threshold, the construction device 11 may determine it to be a high-confidence standard knowledge representation.
  • the construction device 11 can construct the chemical knowledge graph from the high-confidence standard knowledge representations obtained through quality evaluation screening and the high-confidence standard knowledge representations obtained through knowledge discovery, so that the intelligent question answering device 10 for chemical knowledge can call it.
  • the construction device 11 may continuously acquire chemical knowledge data while the intelligent question answering device 10 is in use, form new high-confidence standard knowledge representations, and add them to the constructed chemical knowledge graph in real time to update it.
  • the intelligent question answering device 10 can thus automatically collect relevant knowledge in the chemical field during its daily use and, based on natural language processing, big data and artificial intelligence technologies, build that knowledge into the chemical knowledge graph, further improving the comprehensiveness, accuracy and timeliness of the chemical knowledge in the graph.
  • FIG. 3 shows a schematic flowchart of intelligent question answering according to some embodiments of the present invention.
  • the intelligent question answering device 10 can quickly and accurately provide corresponding answers to the questions raised by the user in the chemical field based on the constructed chemical knowledge graph.
  • the intelligent question answering device 10 can first obtain the question input by the user through a human-computer interaction interface such as a keyboard or a microphone, and then use the question preprocessing module 12 to parse the question so as to identify the question entity data and question attribute data therein that relate to chemical knowledge, and to identify the intent of the question.
  • the above step of question parsing may be implemented by a pre-trained question parsing module.
  • for voice data input by the user through a microphone, the intelligent question answering device 10 can first use a pre-trained speech recognition module and semantic recognition module to convert the voice data into corresponding text data, and then input the converted text data into the pre-trained question parsing module to identify the question entity data and question attribute data that relate to chemical knowledge, and to identify the intent of the question.
  • the above-mentioned speech recognition module and semantic recognition module are prior art in this field, and their details are not described here.
  • as for the question parsing module, a deep learning model can be used. Technicians can first produce a large number of chemical knowledge question samples by annotating relevant knowledge in the chemical field, and then train the question parsing module on these samples so that it acquires the ability to identify entity data and attribute data in chemical knowledge.
  • the attribute data may include relation property (Relation Property) data and data property (Data Property) data, wherein the relation property data describes the relationship between two corresponding entities, and the data property data describes an attribute value of one corresponding entity.
  • for example, for the user's question "How is the too-low reflux flow of fractionator C-9102 caused", the question parsing module can identify the entity data "reflux flow of fractionator C-9102 is too low" and the attribute data "how is it caused".
  • the intelligent question answering device 10 can use the auxiliary dictionary module 15 to further map and transform the identified entity data and attribute data.
  • the auxiliary dictionary module 15 may be configured with an entity link dictionary and an attribute dictionary.
  • the intelligent question answering device 10 may first call the entity link dictionary to check whether there is a synonym of the entity data recorded therein. If a synonym of the entity data is recorded in the entity link dictionary, the intelligent question answering device 10 can use the synonym to replace the entity data, so as to map the question entity data into data consistent with the chemical knowledge graph description.
  • conversely, if no synonym of the entity data is recorded in the entity link dictionary, the intelligent question answering device 10 can further use machine-learning-based fuzzy matching to query the entity link dictionary for records that satisfy the fuzzy matching rules, and use the fuzzy-matched record (such as "bottom circulating reflux flow of fractionator C-9102 is too low") to replace the entity data, so as to map the question entity data to data consistent with the description in the chemical knowledge graph.
  • similarly, in response to identifying the "how is it caused" attribute data in the question, the intelligent question answering device 10 may also call the attribute dictionary and, based on synonyms and/or machine-learning-based fuzzy matching, map the "how is it caused" attribute data to the "cause" attribute data recorded in the knowledge graph.
  • the question parsing module can then identify the intent of the question from the identified question entity data and question attribute data. Specifically, for the above embodiment, in response to identifying in the question one piece of question entity data relating to chemical knowledge (namely "bottom circulating reflux flow of fractionator C-9102 is too low") and one corresponding piece of question attribute data (namely "cause"), the question parsing module can determine that the intent of the question is to retrieve the corresponding second entity according to the first entity and the attribute. Optionally, in some other embodiments, in response to identifying in the question two pieces of question entity data relating to chemical knowledge, the question parsing module may instead determine that the intent of the question is to retrieve the corresponding attribute according to the first entity and the second entity; the details are not repeated here.
  • the intelligent question answering device 10 can use the question analysis and reasoning module 13 to first associate the results output by the question preprocessing module 12 with the knowledge in the chemical knowledge graph, then perform knowledge reasoning in combination with the relevant standard knowledge representations in the graph, and thereby obtain candidate paths for finding the answer to the question.
  • the question analysis and reasoning module 13 can first query the chemical knowledge graph with the question entity data to associate it with the corresponding first graph entity data in the graph, and then determine all standard knowledge representations in the graph that relate to the first graph entity data. Afterwards, based on the intent of retrieving the corresponding second entity according to the first entity and the attribute, the question analysis and reasoning module 13 can select, through these standard knowledge representations, all second graph attribute data related to the first graph entity data, and then combine the first graph entity data with each piece of second graph attribute data to obtain a plurality of candidate paths.
  • the question analysis and reasoning module 13 can first associate the question entity data with the first graph entity data "bottom circulating reflux flow of fractionator C-9102 is too low" recorded in the chemical knowledge graph, and then query the graph for all standard knowledge representations related to that first graph entity data.
  • the standard knowledge representations related to the first graph entity data "bottom circulating reflux flow of fractionator C-9102 is too low" may include: "bottom circulating reflux flow of fractionator C-9102 is too low, cause, failure of the bottom circulating reflux flow control loop FIC9133 of fractionator C-9102"; "bottom circulating reflux flow of fractionator C-9102 is too low, consequence, coking at the bottom of fractionator C-9102 causes fluctuation of the heating furnace feed flow"; and "bottom circulating reflux flow of fractionator C-9102 is too low, safety measure, heating furnace F9101 is equipped with a low feed flow interlock: when the feed flow falls below 27.5 T/H, the main burners of that furnace group are shut off".
  • the question analysis and reasoning module 13 can select all second graph attribute data related to the first graph entity data (namely the above "cause", "consequence" and "safety measure") to construct a plurality of candidate paths. Specifically, it can combine the first graph entity data "bottom circulating reflux flow of fractionator C-9102 is too low" with the second graph attribute data "cause" into the first candidate path "bottom circulating reflux flow of fractionator C-9102 is too low, cause"; with the second graph attribute data "consequence" into the second candidate path "bottom circulating reflux flow of fractionator C-9102 is too low, consequence"; and with the second graph attribute data "safety measure" into the third candidate path "bottom circulating reflux flow of fractionator C-9102 is too low, safety measure".
  • the second graph attribute data related to the first graph entity data includes not only the above first-degree related second graph attribute data (that is, second graph attribute data that can be associated with the first graph entity data through one standard knowledge representation), but may also include second-degree related second graph attribute data (that is, second graph attribute data that can be associated with the first graph entity data only through two standard knowledge representations).
  • for example, from the standard knowledge representations "bottom circulating reflux flow of fractionator C-9102 is too low, cause, failure of the bottom circulating reflux flow control loop FIC9133 of fractionator C-9102" and "failure of the bottom circulating reflux flow control loop FIC9133 of fractionator C-9102, consequence, valve FV9133 partially closes", the question analysis and reasoning module 13 can further infer the new knowledge "bottom circulating reflux flow of fractionator C-9102 is too low, cause, valve FV9133 partially closes", and determine from this new knowledge the second graph attribute data "cause" that is second-degree related to the first graph entity data.
  • the question analysis and reasoning module 13 can combine the first graph entity data "bottom circulating reflux flow of fractionator C-9102 is too low" with the second-degree related second graph attribute data "cause" into the fourth candidate path "bottom circulating reflux flow of fractionator C-9102 is too low, cause".
  • the question analysis and reasoning module 13 may first query the chemical knowledge graph with the first graph entity data to determine all standard knowledge representations in it that relate to the first graph entity data. Afterwards, based on the intent of retrieving the corresponding attribute according to the first entity and the second entity, the question analysis and reasoning module 13 can select, through these standard knowledge representations, all second graph entity data that is first-degree or second-degree related to the first graph entity data.
  • first-degree related second graph entity data refers to second graph entity data that the first graph entity data can be associated with through one standard knowledge representation.
  • second-degree related second graph entity data refers to second graph entity data that the first graph entity data can be associated with through two standard knowledge representations.
  • the question analysis and reasoning module 13 may then combine the first graph entity data with each selected piece of second graph entity data to form a plurality of candidate paths. These candidate paths are combined in the same way as in the above embodiment, which is not repeated here.
  • the intelligent question answering device 10 can use the question post-processing module 14 to perform path matching on the candidate paths to determine the best search path among them. The question post-processing module 14 then searches the chemical knowledge graph along the best search path to obtain the answer to the question.
  • the question post-processing module 14 can first input the text of the user's question into a word vector model pre-trained on chemical knowledge samples to obtain the first vector corresponding to the question. It may then input the first to fourth candidate paths into the word vector model to obtain the second vector of each candidate path. Finally, it can calculate the cosine of the angle between each second vector and the first vector as the text matching degree between each candidate path and the question.
  • the text matching degree of the first candidate path "bottom circulating reflux flow of fractionator C-9102 is too low, cause" is 0.98; that of the second candidate path "bottom circulating reflux flow of fractionator C-9102 is too low, consequence" is 0.85; that of the third candidate path "bottom circulating reflux flow of fractionator C-9102 is too low, safety measure" is 0.74; and that of the fourth candidate path "bottom circulating reflux flow of fractionator C-9102 is too low, cause" is also 0.98.
  • the question post-processing module 14 can rank the candidate paths by text matching degree and select the first and fourth candidate paths, which have the highest text matching degree (namely entity: bottom circulating reflux flow of fractionator C-9102 is too low; attribute: cause), as the best search path. It can then search the chemical knowledge graph along the best search path to determine the corresponding standard knowledge representations, namely "bottom circulating reflux flow of fractionator C-9102 is too low, cause, failure of the bottom circulating reflux flow control loop FIC9133 of fractionator C-9102" and "bottom circulating reflux flow of fractionator C-9102 is too low, cause, valve FV9133 partially closes".
  • based on the intent of retrieving the corresponding second entity according to the first entity and the attribute, the question post-processing module 14 can determine that the answer lies in the second entity of the relevant standard knowledge representations, namely "failure of the bottom circulating reflux flow control loop FIC9133 of fractionator C-9102" and "valve FV9133 partially closes". Finally, the question post-processing module 14 can organize the obtained answers in light of the question to produce the standard-form answer "failure of the bottom circulating reflux flow control loop FIC9133 of fractionator C-9102, causing valve FV9133 to partially close", and return this standard-form answer to the user through a human-computer interaction interface such as a speaker or a display screen.
  • the above-mentioned intelligent question answering device 10 can combine the chemical knowledge graph with natural language processing technology to provide technicians in the chemical field with an intelligent question answering function for chemical knowledge.
  • by combining the chemical knowledge graph with reasoning-based intelligent question answering, the present invention can understand the real needs of technicians in the chemical field more accurately and efficiently, assist them in making decisions and in quickly solving complex problems, and thereby reduce the incidence of safety accidents and better protect the interests of enterprises and the nation.
  • DSPs digital signal processors
  • ASICs application specific integrated circuits
  • FPGAs field programmable gate arrays
  • a general-purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine.
  • a processor may also be implemented as a combination of computing devices, such as a combination of a DSP and a microprocessor, multiple microprocessors, one or more microprocessors cooperating with a DSP core, or any other such configuration.


Abstract

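The present invention relates to a method and a device for constructing a chemical knowledge graph, an intelligent question answering method and device for chemical knowledge, and two computer-readable storage media. The construction method comprises the following steps: acquiring knowledge data in the chemical field; preprocessing the knowledge data to obtain the entity data and attribute data therein that relate to chemical knowledge; determining preliminary knowledge representations from the entity data and the attribute data; performing entity alignment on the preliminary knowledge representations to obtain standard knowledge representations; and constructing the chemical knowledge graph from the standard knowledge representations. Based on natural language processing, big data and artificial intelligence technologies, the construction method can automatically collect relevant knowledge in the chemical field to construct the chemical knowledge graph, thereby greatly increasing the speed of constructing knowledge graphs in the chemical field and reducing the cost of constructing them manually.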

Description

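Method and Device for Constructing a Chemical Knowledge Graph, and Intelligent Question Answering Method and Device
Technical Field
The present invention relates to the technical fields of knowledge graphs and natural language processing, and in particular to a method for constructing a chemical knowledge graph, a device for constructing a chemical knowledge graph, an intelligent question answering method for chemical knowledge, an intelligent question answering device for chemical knowledge, and two computer-readable storage media.
Background
As one of the emerging industries that China is currently prioritizing, the chemical industry plays an important role in advancing the Belt and Road Initiative, the national 13th Five-Year Plan, China's economic development, and the competitiveness of China's economy in the international market. As China's level of economic development rises, so does the level of technology and equipment in the chemical industry. This creates the conditions for enterprises to reduce energy consumption, reduce pollution and improve efficiency, and the comprehensive utilization of resources and energy has yielded good economic and social benefits. At the same time, the chemical industry is one of the more dangerous industries in China today, and the outcome of a chemical accident bears directly on people's personal safety and on the nation's economic and property security.
Because knowledge in the chemical field comes from a wide range of sources, is of many kinds and is large in volume, it is difficult for those skilled in the art to master the relevant knowledge of every branch. When an emergency occurs, technicians often do not know how to handle it. To address this, the prior art provides some solutions that implement intelligent question answering based on retrieval techniques or deep learning matching techniques. However, first, these solutions do not involve knowledge of the chemical field and are therefore hard to apply to it directly; second, they cannot cope efficiently with the wide range of sources, many kinds and large volume of chemical knowledge, often fail to understand the user's real needs, and cannot truly solve the user's problems in the chemical field.
Therefore, there is an urgent need in the art for a complete, accurate and efficient chemical knowledge management technique that provides, in real time, relevant explanations and solutions for the problems faced by technicians in the chemical field, so as to better assist them in making decisions and in quickly solving complex chemical problems, thereby reducing the incidence of safety accidents and better protecting people's personal safety and the economic and property security of enterprises and the nation.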
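Summary of the Invention
A brief summary of one or more aspects is given below to provide a basic understanding of these aspects. This summary is not an exhaustive overview of all contemplated aspects, and is intended neither to identify key or critical elements of all aspects nor to delineate the scope of any or all aspects. Its sole purpose is to present some concepts of one or more aspects in a simplified form as a prelude to the more detailed description given later.
To better protect people's personal safety and the nation's economic and property security, the present invention provides a method for constructing a chemical knowledge graph, a device for constructing a chemical knowledge graph, an intelligent question answering method for chemical knowledge, an intelligent question answering device for chemical knowledge, and two computer-readable storage media.
The method for constructing a chemical knowledge graph provided according to the first aspect of the present invention comprises the following steps: acquiring knowledge data in the chemical field; preprocessing the knowledge data to obtain the entity data and attribute data therein that relate to chemical knowledge; determining preliminary knowledge representations from the entity data and the attribute data; performing entity alignment on the preliminary knowledge representations to obtain standard knowledge representations; and constructing the chemical knowledge graph from the standard knowledge representations. Based on natural language processing, big data and artificial intelligence technologies, the construction method can automatically collect relevant knowledge in the chemical field to construct the chemical knowledge graph, thereby greatly increasing the speed of constructing knowledge graphs in the chemical field and reducing the cost of constructing them manually.
Preferably, in some embodiments of the present invention, the knowledge data may include structured data, semi-structured data and/or unstructured data. The step of preprocessing the knowledge data may include: performing data integration on the structured data to obtain the entity data and attribute data therein that relate to chemical knowledge; and/or performing knowledge extraction on the semi-structured data and/or the unstructured data to obtain the entity data and attribute data therein that relate to chemical knowledge.
Preferably, in some embodiments of the present invention, the attribute data may include data property data and relation property data. The data property data describes an attribute value of one piece of entity data in a given preliminary knowledge representation. The relation property data describes the relationship between two pieces of entity data in a given preliminary knowledge representation.
Preferably, in some embodiments of the present invention, the step of determining preliminary knowledge representations from the entity data and the attribute data includes: constructing, from the obtained entity data and attribute data, preliminary knowledge representations as triples of the form entity - data property - attribute value or first entity - relation property - second entity.
Optionally, in some embodiments of the present invention, the step of performing entity alignment on the preliminary knowledge representations to obtain standard knowledge representations may include: analyzing a plurality of the preliminary knowledge representations to determine a plurality of different pieces of entity data therein that refer to the same chemical entity; and resolving those different pieces of entity data into a single piece of entity data, so as to obtain standard knowledge representations in which the same chemical entity is referred to by the same entity data.
Optionally, in some embodiments of the present invention, the step of constructing the chemical knowledge graph from the standard knowledge representations includes: performing knowledge discovery on the entity data and attribute data in a plurality of the standard knowledge representations to obtain at least one high-confidence standard knowledge representation; performing knowledge reasoning on the entity data and attribute data in a plurality of the standard knowledge representations to obtain a plurality of standard knowledge representations of unknown confidence; performing quality evaluation on the standard knowledge representations of unknown confidence to determine the high-confidence ones among them; and constructing the chemical knowledge graph from the high-confidence standard knowledge representations.
Preferably, in some embodiments of the present invention, the step of performing quality evaluation on the standard knowledge representations of unknown confidence may include: text-matching each of them against the knowledge data of the chemical field to obtain a text matching degree for each; and determining those whose text matching degree is higher than a preset matching degree threshold to be the high-confidence standard knowledge representations.
The device for constructing a chemical knowledge graph provided according to the second aspect of the present invention includes a memory and a processor. The processor is connected to the memory and is configured to implement the method for constructing a chemical knowledge graph provided by the first aspect of the present invention. By implementing this construction method, the construction device can, based on natural language processing, big data and artificial intelligence technologies, automatically collect relevant knowledge in the chemical field to construct the chemical knowledge graph, thereby greatly increasing the speed of constructing knowledge graphs in the chemical field and reducing the cost of constructing them manually.
The computer-readable storage medium provided according to the third aspect of the present invention has computer instructions stored thereon. When executed by a processor, the computer instructions implement the method for constructing a chemical knowledge graph provided by the first aspect of the present invention. By implementing this construction method, the computer-readable storage medium can, based on natural language processing, big data and artificial intelligence technologies, automatically collect relevant knowledge in the chemical field to construct the chemical knowledge graph, thereby greatly increasing the speed of constructing knowledge graphs in the chemical field and reducing the cost of constructing them manually.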
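The intelligent question answering method for chemical knowledge provided according to the fourth aspect of the present invention comprises the following steps: acquiring a question raised by a user; preprocessing the question to identify the question entity data and question attribute data therein that relate to chemical knowledge, and identifying the intent of the question; determining, from the graph entity data of a chemical knowledge graph, first graph entity data associated with the question entity data, wherein the chemical knowledge graph is constructed by the construction method provided by the first aspect of the present invention; performing knowledge reasoning according to the first graph entity data, the intent of the question and the standard knowledge representations in the chemical knowledge graph to obtain a plurality of candidate paths; calculating the text matching degree between each candidate path and the question, and selecting the candidate path with the highest text matching degree as the best search path; and searching the chemical knowledge graph along the best search path to obtain the answer to the question. Compared with the prior art that performs intelligent question answering based on retrieval techniques or deep learning matching techniques, this method provides intelligent question answering for the chemical field by combining a chemical knowledge graph with natural language processing, and can understand the user's real needs more accurately, thereby providing more accurate and more effective solutions.
Preferably, in some embodiments of the present invention, the step of identifying the question entity data and question attribute data that relate to chemical knowledge may include: inputting the question into a pre-trained question parsing module to obtain the question entity data and question attribute data therein that relate to chemical knowledge, wherein the question parsing module is a deep learning model trained on question samples of chemical knowledge.
Preferably, in some embodiments of the present invention, the question parsing module may include an entity link dictionary and an attribute dictionary. The step of identifying the question entity data and question attribute data that relate to chemical knowledge may further include: inputting the obtained question entity data into the entity link dictionary and, based on synonyms and/or machine-learning-based fuzzy matching, mapping the question entity data to data consistent with the description in the chemical knowledge graph; and inputting the obtained question attribute data into the attribute dictionary and, based on synonyms and/or machine-learning-based fuzzy matching, mapping the question attribute data to data consistent with the description in the chemical knowledge graph.
Optionally, in some embodiments of the present invention, the step of identifying the intent of the question may include: in response to identifying in the question one piece of question entity data relating to chemical knowledge and one corresponding piece of question attribute data, determining that the intent of the question is to retrieve the corresponding second entity according to the first entity and the attribute; and in response to identifying in the question two pieces of question entity data relating to chemical knowledge, determining that the intent of the question is to retrieve the corresponding attribute according to the first entity and the second entity.
Optionally, in some embodiments of the present invention, the step of performing knowledge reasoning according to the first graph entity data, the intent of the question and the standard knowledge representations in the chemical knowledge graph to obtain a plurality of candidate paths may include: selecting, according to the intent of the question and the standard knowledge representations in the chemical knowledge graph, all second graph attribute data or second graph entity data related to the first graph entity data; and combining the first graph entity data with each piece of the second graph attribute data or second graph entity data to obtain a plurality of candidate paths.
Preferably, in some embodiments of the present invention, the step of selecting all second graph attribute data or second graph entity data related to the first graph entity data may include: in response to the intent of the question being to retrieve the corresponding second entity according to the first entity and the attribute, selecting, according to the standard knowledge representations in the chemical knowledge graph, all second graph attribute data related to the first graph entity data; and in response to the intent of the question being to retrieve the corresponding attribute according to the first entity and the second entity, selecting, according to the standard knowledge representations in the chemical knowledge graph, all second graph entity data related to the first graph entity data.
Preferably, in some embodiments of the present invention, the second graph attribute data related to the first graph entity data may include second graph attribute data that is first-degree or second-degree related to the first graph entity data, wherein first-degree related means that the first graph entity data can be associated with the second graph attribute data through one standard knowledge representation, and second-degree related means that the first graph entity data can be associated with the second graph attribute data through two standard knowledge representations. The second graph entity data related to the first graph entity data may include second graph entity data that is first-degree or second-degree related to the first graph entity data, wherein first-degree related means that the first graph entity data can be associated with the second graph entity data through one standard knowledge representation, and second-degree related means that the first graph entity data can be associated with the second graph entity data through two standard knowledge representations.
Optionally, in some embodiments of the present invention, the step of combining the first graph entity data with each piece of the second graph attribute data or second graph entity data to obtain a plurality of candidate paths may include: in response to the intent of the question being to retrieve the corresponding second entity according to the first entity and the attribute, combining the first graph entity data with each selected piece of second graph attribute data to obtain a plurality of candidate paths; and in response to the intent of the question being to retrieve the corresponding attribute according to the first entity and the second entity, combining the first graph entity data with each selected piece of second graph entity data to obtain a plurality of candidate paths.
Optionally, in some embodiments of the present invention, the step of calculating the text matching degree between each candidate path and the question may include: inputting the question into a word vector model pre-trained on chemical knowledge samples to obtain a first vector of the question; inputting each of the candidate paths into the word vector model to obtain a second vector of each candidate path; and calculating the cosine of the angle between each second vector and the first vector as the text matching degree between each candidate path and the question.
Optionally, in some embodiments of the present invention, the step of searching the chemical knowledge graph along the best search path to obtain the answer to the question may include: searching the chemical knowledge graph along the best search path to determine the corresponding standard knowledge representation; determining the position of the answer in the standard knowledge representation according to the intent of the question; and organizing the answer in light of the question to obtain an answer in standard form.
Preferably, in some embodiments of the present invention, the intelligent question answering method may further include the following step: returning the answer in standard form to the user.
The intelligent question answering device for chemical knowledge provided according to the fifth aspect of the present invention includes a memory and a processor. The processor is connected to the memory and is configured to implement the intelligent question answering method for chemical knowledge provided by the fourth aspect of the present invention. By implementing this method, the intelligent question answering device can provide intelligent question answering for the chemical field by combining a chemical knowledge graph with natural language processing, and can understand the user's real needs more accurately, thereby providing more accurate and more effective solutions.
The computer-readable storage medium provided according to the sixth aspect of the present invention has computer instructions stored thereon. When executed by a processor, the computer instructions can implement the intelligent question answering method for chemical knowledge provided by the fourth aspect of the present invention. By implementing this method, the computer-readable storage medium can provide intelligent question answering for the chemical field by combining a chemical knowledge graph with natural language processing, and can understand the user's real needs more accurately, thereby providing more accurate and more effective solutions.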
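Brief Description of the Drawings
FIG. 1 shows a schematic architecture diagram of an intelligent question answering device based on a chemical knowledge graph according to some embodiments of the present invention.
FIG. 2 shows a schematic flowchart of constructing a chemical knowledge graph according to some embodiments of the present invention.
FIG. 3 shows a schematic flowchart of intelligent question answering according to some embodiments of the present invention.
Reference Numerals
10             intelligent question answering device;
11             chemical knowledge graph construction module;
12             question preprocessing module;
13             question analysis and reasoning module;
14             question post-processing module;
15             auxiliary dictionary;
S1~S14        steps.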
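Detailed Description
The embodiments of the present invention are described below through specific examples, and those skilled in the art can readily understand other advantages and effects of the present invention from the content disclosed in this specification. Although the invention is described in conjunction with preferred embodiments, this does not mean that the features of the invention are limited to those embodiments. On the contrary, the purpose of describing the invention in conjunction with embodiments is to cover other alternatives or modifications that may be derived from the claims of the present invention. To provide a thorough understanding of the present invention, the following description contains many specific details. The present invention may also be practiced without these details. In addition, some specific details are omitted from the description to avoid confusing or obscuring the focus of the present invention.
In the description of the present invention, it should be noted that, unless otherwise expressly specified and limited, the terms "installed", "coupled" and "connected" should be understood broadly. For example, a connection may be fixed, detachable or integral; it may be mechanical or electrical; it may be direct or indirect through an intermediate medium, or it may be internal communication between two elements. Those of ordinary skill in the art can understand the specific meanings of these terms in the present invention according to the specific situation.
In addition, the terms "upper", "lower", "left", "right", "top", "bottom", "horizontal" and "vertical" used in the following description should be understood as the orientations depicted in the relevant passage and the associated drawings. These relative terms are used only for convenience of description and do not mean that the device described must be manufactured or operated in a specific orientation; they should therefore not be construed as limiting the present invention.
It will be understood that although the terms "first", "second", "third" and so on may be used herein to describe various components, regions, layers and/or parts, these components, regions, layers and/or parts should not be limited by these terms, which are used only to distinguish different components, regions, layers and/or parts. Therefore, a first component, region, layer and/or part discussed below could be termed a second component, region, layer and/or part without departing from some embodiments of the present invention.
As mentioned above, the chemical industry is one of the more dangerous industries in China today, and the outcome of a chemical accident bears directly on people's personal safety and on the nation's economic and property security. Because knowledge in the chemical field comes from a wide range of sources, is of many kinds and is large in volume, it is difficult for those skilled in the art to master the relevant knowledge of every branch. When an emergency occurs, technicians often do not know how to handle it. To address this, the prior art provides some solutions that implement intelligent question answering based on retrieval techniques or deep learning matching techniques. However, first, these solutions do not involve knowledge of the chemical field and are therefore hard to apply to it directly; second, they cannot cope efficiently with the wide range of sources, many kinds and large volume of chemical knowledge, often fail to understand the user's real needs, and cannot truly solve the user's problems in the chemical field.
To better protect people's personal safety and the nation's economic and property security, the present invention proposes applying knowledge graphs together with natural language processing technology in the chemical field. Compared with the prior art based on retrieval techniques or deep learning matching techniques, the present invention can construct a knowledge graph of the chemical field by mining chemical-field data, and perform reasoning-based intelligent question answering on the constructed chemical knowledge graph. It can therefore better assist personnel in the chemical production industry in making decisions and in quickly solving complex problems, thereby reducing the incidence of safety accidents and better protecting the interests of enterprises and the nation.
Specifically, the present invention provides a method for constructing a chemical knowledge graph, a device for constructing a chemical knowledge graph, an intelligent question answering method for chemical knowledge, an intelligent question answering device for chemical knowledge, and two computer-readable storage media.
In some non-limiting embodiments, the method for constructing a chemical knowledge graph provided by the first aspect of the present invention may be implemented by the device for constructing a chemical knowledge graph provided by the second aspect of the present invention. The construction device may be configured with a memory and a processor. The memory includes, but is not limited to, the computer-readable storage medium provided by the third aspect of the present invention, on which computer instructions are stored. The processor is connected to the memory and is configured to execute the computer instructions stored in the memory, so as to implement the construction method provided by the first aspect of the present invention.
Correspondingly, the intelligent question answering method for chemical knowledge provided by the fourth aspect of the present invention may be implemented by the intelligent question answering device for chemical knowledge provided by the fifth aspect of the present invention. The intelligent question answering device may also be configured with a memory and a processor. The memory includes, but is not limited to, the computer-readable storage medium provided by the sixth aspect of the present invention, on which computer instructions are stored. The processor is connected to the memory and is configured to execute the computer instructions stored in the memory, so as to implement the intelligent question answering method provided by the fourth aspect of the present invention.
Please refer to FIG. 1, which shows a schematic architecture diagram of an intelligent question answering system based on a chemical knowledge graph according to some embodiments of the present invention.
In the embodiment shown in FIG. 1, the intelligent question answering device 10 provided by the fifth aspect of the present invention may be configured with a question preprocessing module 12, a question analysis and reasoning module 13, a question post-processing module 14, an auxiliary dictionary 15, and the device 11 for constructing a chemical knowledge graph provided by the second aspect of the present invention. In some embodiments, the construction device 11 is configured inside the intelligent question answering device 10 in the form of a module. In other embodiments, the construction device 11 is communicatively connected to the intelligent question answering device 10 from outside, temporarily or permanently, through a communication interface, a data cable, a wireless network or the like.
Please refer further to FIG. 2, which shows a schematic flowchart of constructing a chemical knowledge graph according to some embodiments of the present invention.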
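As shown in FIG. 1 and FIG. 2, in the process of constructing the chemical knowledge graph, the construction device 11 may first acquire original knowledge data of the chemical field through a human-computer interaction interface, a communication interface to an external storage medium and/or a network interface. The original knowledge data may be structured data in triple form satisfying the "subject-predicate-object" pattern, semi-structured data recorded in other structures, or unstructured data recorded in natural language. In some embodiments, the original knowledge data of the chemical field includes, but is not limited to, knowledge relating to chemical processes.
After obtaining the original knowledge data of the chemical field, the construction device 11 may first preprocess it to build an initial data set, and then determine preliminary ontological knowledge representations from that data set. Specifically, for structured triple data satisfying the "subject-predicate-object" pattern, the construction device 11 may perform data integration, directly adding the entity data and attribute data therein that relate to chemical knowledge to the initial data set in association with each other, as the preliminary knowledge representation of that structured data. For semi-structured and unstructured data that do not satisfy the "subject-predicate-object" pattern, the construction device 11 first needs to perform knowledge extraction to extract the entity data and attribute data that relate to chemical knowledge, and then adds the extracted entity data and attribute data to the initial data set in association with each other, as the preliminary knowledge representations of the semi-structured and unstructured data.
Take unstructured data recorded in natural language as an example:
(1) The products of a coking unit mainly include dry gas, liquefied gas, gasoline, diesel, wax oil and coke. All products of the coking unit are semi-finished products that require further processing by downstream units, so the requirements on product properties are not high.
(2) The main causes of a high raw material buffer tank level are: raw material feed amount too large, withdrawal of pump P-2101 too low, water in the raw material or steam leaking into the pipeline, pressure build-up caused by a blocked connecting line between the tank top and C-2102, and failure of raw material pump P2101 or of instruments.
(3) The cause of the bottom circulating reflux flow of fractionator C-9102 being too low is "failure of the bottom circulating reflux flow control loop FIC9133 of fractionator C-9102 causes valve FV9133 to partially close." Its consequence is "coking at the bottom of fractionator C-9102, causing fluctuation of the heating furnace feed flow and burn-through of the furnace tubes of heating furnace F-9101". The safety measure is "heating furnace F9101 is equipped with a low feed flow interlock: when the feed flow falls below 27.5 T/H, the main burners of that furnace group are shut off."
By performing knowledge extraction on the original knowledge data (1) to (3) above, entity data and attribute data relating to chemical knowledge can be obtained, such as "coking unit, product, dry gas"; "coking unit, product, liquefied gas"; "coking unit, product, gasoline"; "raw material buffer tank level is too high, cause, raw material feed amount is too large"; and "raw material buffer tank level is too high, cause, withdrawal of pump P-2101 is too low". Here, "coking unit, product, dry gas" is one preliminary knowledge representation, "coking unit" and "dry gas" are the entity data in it, and "product" is relation property (Relation Property) data describing the relationship between the two pieces of entity data "coking unit" and "dry gas".
Those skilled in the art will understand that the relation property data above is only one non-limiting example of attribute data and does not limit the scope of protection of the present invention. Optionally, in other embodiments, the attribute data may also include data property (Data Property) data describing an attribute value of one corresponding piece of entity data, for example "gasoline, density, 0.7~0.78".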
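The two triple forms described above (first entity, relation property, second entity; and entity, data property, attribute value) can be sketched as a small data structure. This is only an illustrative sketch: the `Triple` class, its `kind` field and the `product_triples` helper are hypothetical names introduced here, not part of the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One knowledge representation (hypothetical structure for illustration)."""
    head: str               # entity data
    attribute: str          # relation property or data property
    tail: str               # second entity (relation) or attribute value (data)
    kind: str = "relation"  # "relation" or "data"; this field is an assumption


def product_triples(unit, products):
    """Turn an extracted product list into one relation-property triple per product."""
    return [Triple(unit, "product", p) for p in products]


# Triples extracted from sentence (1), plus the data-property example.
triples = product_triples("coking unit", ["dry gas", "liquefied gas", "gasoline"])
triples.append(Triple("gasoline", "density", "0.7~0.78", kind="data"))
```

Keeping both forms in one structure lets later steps (alignment, reasoning, path search) treat every knowledge representation uniformly.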
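As shown in FIG. 1 and FIG. 2, after preprocessing the original knowledge data and obtaining preliminary knowledge representations of the structured, semi-structured and unstructured data, the construction device 11 may perform entity alignment on these preliminary knowledge representations to obtain standard knowledge representations in a unified form. In some embodiments, entity alignment of chemical knowledge mainly consists of coreference resolution, which addresses the problem of multiple attributes pointing to the same named entity.
For example, in the entity data "failure of raw material pump P2101 or of instruments" and "withdrawal of pump P-2101 is too low", "raw material pump P2101" and "pump P-2101" are in fact the same entity. The construction device 11 may perform coreference resolution on these two entities, resolving all data properties and relation properties involving them to a single entity (for example "raw material pump P2101"), thereby solving the problem of multiple attributes pointing to the same named entity.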
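As a rough illustration of this coreference resolution step, the sketch below rewrites every known alias to one canonical entity name. The alias table is assumed to be given; the patent does not specify how coreferent mentions are detected, and `ALIASES`, `align` and `align_triple` are hypothetical names.

```python
# Hypothetical alias table: alias -> canonical entity name.
# How such pairs are discovered is not specified in the source.
ALIASES = {"pump P-2101": "raw material pump P2101"}


def align(text, aliases):
    """Replace every known alias in `text` with its canonical entity name."""
    for alias, canonical in aliases.items():
        text = text.replace(alias, canonical)
    return text


def align_triple(triple, aliases):
    """Apply alias replacement to each part of a (head, attribute, tail) triple."""
    return tuple(align(part, aliases) for part in triple)


raw = ("raw material buffer tank level is too high", "cause",
       "pump P-2101 withdrawal is too low")
aligned = align_triple(raw, ALIASES)
```

After alignment, both mentions of the pump carry the same name, so the triples that involve it attach to a single node in the graph.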
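As shown in FIG. 1 and FIG. 2, after completing entity alignment and obtaining standard knowledge representations of chemical knowledge, the construction device 11 may perform knowledge discovery and knowledge reasoning on these standard knowledge representations to obtain new chemical knowledge, and incorporate the new knowledge of higher confidence into the constructed chemical knowledge graph.
Knowledge discovery refers to the process of masking the tedious details of the raw data and identifying valid, novel, potentially useful and understandable knowledge from a data set; new knowledge obtained this way tends to have high confidence. Knowledge reasoning refers to the process of obtaining new knowledge or conclusions that satisfy the semantics through various methods; it can often yield unexpected new knowledge, but cannot guarantee the confidence of that knowledge.
For example, consider the following standard knowledge representations of chemical knowledge obtained after entity alignment:
raw material feed amount is too large, consequence, raw material buffer tank level is too high;
withdrawal of raw material pump P2101 is too low, consequence, raw material buffer tank level is too high;
bottom circulating reflux flow of fractionator C-9102 is too low, cause, failure of the bottom circulating reflux flow control loop FIC9133 of fractionator C-9102 causes valve FV9133 to partially close;
bottom circulating reflux flow of fractionator C-9102 is too low, safety measure, heating furnace F9101 is equipped with a low feed flow interlock: when the feed flow falls below 27.5 T/H, the main burners of that furnace group are shut off;
……
By means of knowledge discovery, the construction device 11 can combine the two standard knowledge representations "bottom circulating reflux flow of fractionator C-9102 is too low, cause, failure of the bottom circulating reflux flow control loop FIC9133 of fractionator C-9102 causes valve FV9133 to partially close" and "bottom circulating reflux flow of fractionator C-9102 is too low, safety measure, heating furnace F9101 is equipped with a low feed flow interlock: when the feed flow falls below 27.5 T/H, the main burners of that furnace group are shut off" to discover the new knowledge "failure of the bottom circulating reflux flow control loop FIC9133 of fractionator C-9102 causes valve FV9133 to partially close, safety measure, heating furnace F9101 is equipped with a low feed flow interlock: when the feed flow falls below 27.5 T/H, the main burners of that furnace group are shut off". Because this new knowledge is obtained by combining two known standard knowledge representations through a necessary and sufficient logical relationship, it usually has high confidence.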
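One possible reading of the knowledge discovery example is a rule that joins two triples sharing the same head: from (A, cause, B) and (A, safety measure, C), derive (B, safety measure, C). The sketch below implements only that single rule as an assumption; the patent does not state the general discovery procedure, and `discover` is a hypothetical name. Entity strings are abbreviated for readability.

```python
def discover(triples):
    """Join rule (assumed): (A, cause, B) + (A, safety measure, C) -> (B, safety measure, C)."""
    causes = [(h, t) for h, a, t in triples if a == "cause"]
    measures = [(h, t) for h, a, t in triples if a == "safety measure"]
    return [(cause, "safety measure", measure)
            for subject, cause in causes
            for other, measure in measures
            if subject == other]


known = [
    ("C-9102 reflux flow too low", "cause", "FIC9133 failure closes valve FV9133"),
    ("C-9102 reflux flow too low", "safety measure", "F9101 low feed flow interlock"),
]
found = discover(known)
```

Here `found` holds one derived triple that attaches the interlock to the control loop failure directly.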
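In addition, by means of knowledge reasoning, the construction device 11 can infer the new knowledge "raw material buffer tank level is too high, cause, raw material feed amount is too large" from the semantics of the known standard knowledge representation "raw material feed amount is too large, consequence, raw material buffer tank level is too high"; infer the new knowledge "raw material buffer tank level is too high, cause, withdrawal of raw material pump P2101 is too low" from the semantics of the known standard knowledge representation "withdrawal of raw material pump P2101 is too low, consequence, raw material buffer tank level is too high"; and, combining the semantics of both known representations, infer the new knowledge "raw material buffer tank level is too high, cause, raw material feed amount is too large or withdrawal of raw material pump P2101 is too low". Because this new knowledge is obtained by semantic inference, its confidence cannot be guaranteed, so it must be screened through further quality evaluation.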
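The inferences in this example can be sketched as two simple rules: invert a "consequence" triple into a "cause" triple, and merge several causes of the same effect with "or". Both rules and the names `INVERSE`, `infer_inverse` and `merge_causes` are assumptions made for illustration; the patent does not prescribe a particular reasoning algorithm. Entity strings are abbreviated.

```python
from collections import defaultdict

# Assumed inverse-relation table: (X, consequence, Y) implies (Y, cause, X).
INVERSE = {"consequence": "cause"}


def infer_inverse(triples):
    """Derive inverse triples for every attribute that has a known inverse."""
    return [(t, INVERSE[a], h) for h, a, t in triples if a in INVERSE]


def merge_causes(triples):
    """Merge several causes of the same effect into one 'A or B' triple."""
    grouped = defaultdict(list)
    for h, a, t in triples:
        if a == "cause":
            grouped[h].append(t)
    return [(h, "cause", " or ".join(ts)) for h, ts in grouped.items() if len(ts) > 1]


known = [
    ("feed amount too large", "consequence", "buffer tank level too high"),
    ("pump P2101 withdrawal too low", "consequence", "buffer tank level too high"),
]
inverted = infer_inverse(known)
merged = merge_causes(inverted)
```

All three derived triples are of unknown confidence at this point and still need the quality evaluation step.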
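As shown in FIG. 1 and FIG. 2, in some embodiments of the present invention, the construction device 11 may text-match each of the standard knowledge representations of unknown confidence obtained through knowledge reasoning against the original knowledge data, to obtain a text matching degree for each. In response to the text matching degrees of "raw material buffer tank level is too high, cause, raw material feed amount is too large" and "raw material buffer tank level is too high, cause, withdrawal of raw material pump P2101 is too low" being lower than the preset matching degree threshold, the construction device 11 may determine them to be low-confidence standard knowledge representations. Conversely, in response to the text matching degree of "raw material buffer tank level is too high, cause, raw material feed amount is too large or withdrawal of raw material pump P2101 is too low" being higher than or equal to the preset matching degree threshold, the construction device 11 may determine it to be a high-confidence standard knowledge representation.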
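The threshold screening described above can be sketched as follows. The patent does not say how the text matching degree is computed, so the `matching_degree` function here is a deliberately simple stand-in (the share of a triple's parts that occur verbatim in the source text); it, the `screen` helper, the sample text and the threshold value are all assumptions.

```python
def matching_degree(triple, source_text):
    """Toy score (assumed): share of the triple's parts found verbatim in the source text."""
    text = source_text.lower()
    parts = [p.lower() for p in triple]
    return sum(p in text for p in parts) / len(parts)


def screen(candidates, source_text, threshold):
    """Keep candidates whose matching degree is at or above the threshold."""
    return [c for c in candidates if matching_degree(c, source_text) >= threshold]


source = "Cause of high buffer tank level: feed amount too large."
candidates = [
    ("high buffer tank level", "cause", "feed amount too large"),
    ("high buffer tank level", "cause", "valve stuck"),
]
high_confidence = screen(candidates, source, threshold=0.9)
```

Only candidates at or above the threshold are passed on to graph construction; the rest are treated as low confidence.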
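Afterwards, the construction device 11 may construct the chemical knowledge graph from the high-confidence standard knowledge representations obtained through quality evaluation screening and the high-confidence standard knowledge representations obtained through knowledge discovery, so that the intelligent question answering device 10 for chemical knowledge can call it.
Further, in some embodiments, the construction device 11 may continuously acquire chemical knowledge data while the intelligent question answering device 10 is in use, form new high-confidence standard knowledge representations, and add them to the constructed chemical knowledge graph in real time to update it. In this way, by configuring the construction device 11, the intelligent question answering device 10 can automatically collect relevant knowledge in the chemical field during its daily use and, based on natural language processing, big data and artificial intelligence technologies, build that knowledge into the chemical knowledge graph, further improving the comprehensiveness, accuracy and timeliness of the chemical knowledge in the graph.
Please refer further to FIG. 3, which shows a schematic flowchart of intelligent question answering according to some embodiments of the present invention.
As shown in FIG. 1 and FIG. 3, once the chemical knowledge graph has been constructed, the intelligent question answering device 10 can use it to provide answers quickly and accurately to the user's questions in the chemical field.
Specifically, in the process of intelligent question answering for chemical knowledge, the intelligent question answering device 10 may first obtain the question input by the user through a human-computer interaction interface such as a keyboard or a microphone, and then use the question preprocessing module 12 to parse the question so as to identify the question entity data and question attribute data therein that relate to chemical knowledge, and to identify the intent of the question. In some embodiments, the question parsing step may be implemented by a pre-trained question parsing module. Specifically, for voice data input by the user through a microphone, the intelligent question answering device 10 may first use a pre-trained speech recognition module and semantic recognition module to convert the voice data into corresponding text data, and then input the converted text data into the pre-trained question parsing module to identify the question entity data and question attribute data that relate to chemical knowledge, and to identify the intent of the question.
It will be understood that the speech recognition module and semantic recognition module are prior art in this field, and their details are not described here. As for the question parsing module, a deep learning model can be used. Technicians can first produce a large number of chemical knowledge question samples by annotating relevant knowledge in the chemical field, and then train the question parsing module on these samples so that it acquires the ability to identify entity data and attribute data in chemical knowledge. As mentioned above, the attribute data may include relation property (Relation Property) data and data property (Data Property) data, wherein the relation property data describes the relationship between two corresponding entities, and the data property data describes an attribute value of one corresponding entity.
For example, for the user's question "How is the too-low reflux flow of fractionator C-9102 caused", the question parsing module can identify the entity data "reflux flow of fractionator C-9102 is too low" and the attribute data "how is it caused".
Because a user's colloquial question can be hard to associate with the standardized chemical knowledge data in the chemical knowledge graph, the intelligent question answering device 10 may use the auxiliary dictionary module 15 to further map and transform the identified entity data and attribute data. In some embodiments, the auxiliary dictionary module 15 may be configured with an entity link dictionary and an attribute dictionary. In response to identifying the entity data "reflux flow of fractionator C-9102 is too low" in the question, the intelligent question answering device 10 may first call the entity link dictionary to check whether a synonym of the entity data is recorded in it. If so, the intelligent question answering device 10 may replace the entity data with that synonym, so as to map the question entity data to data consistent with the description in the chemical knowledge graph. Conversely, if no synonym of the entity data is recorded in the entity link dictionary, the intelligent question answering device 10 may further use machine-learning-based fuzzy matching to query the entity link dictionary for records that satisfy the fuzzy matching rules, and use the fuzzy-matched record (for example "bottom circulating reflux flow of fractionator C-9102 is too low") to replace the entity data, so as to map the question entity data to data consistent with the description in the chemical knowledge graph. Similarly, in response to identifying the "how is it caused" attribute data in the question, the intelligent question answering device 10 may also call the attribute dictionary and, based on synonyms and/or machine-learning-based fuzzy matching, map the "how is it caused" attribute data to the "cause" attribute data recorded in the knowledge graph.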
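The lookup order described above (exact synonym first, fuzzy match as a fallback) can be sketched with the Python standard library. Note the substitution: `difflib.get_close_matches` is a plain string-similarity function used here as a stand-in for the machine-learning-based fuzzy matching the patent mentions. The `link` helper, the dictionary contents and the cutoff value are hypothetical.

```python
import difflib


def link(term, synonyms, vocabulary, cutoff=0.6):
    """Map a question term to a graph term: synonym lookup first, then fuzzy match."""
    if term in synonyms:
        return synonyms[term]
    # Stand-in for ML-based fuzzy matching: plain string similarity from difflib.
    matches = difflib.get_close_matches(term, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


ATTRIBUTE_SYNONYMS = {"how is it caused": "cause"}   # hypothetical attribute dictionary
GRAPH_ENTITIES = [                                   # hypothetical entity link dictionary
    "fractionator C-9102 bottom circulating reflux flow is too low",
    "raw material buffer tank level is too high",
]

entity = link("fractionator C-9102 reflux flow is too low", {}, GRAPH_ENTITIES)
attribute = link("how is it caused", ATTRIBUTE_SYNONYMS,
                 ["cause", "consequence", "safety measure"])
```

Returning `None` when nothing clears the cutoff lets the caller decide how to handle a question whose entity is not in the graph.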
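Afterwards, the question parsing module can identify the intent of the question from the identified question entity data and question attribute data. Specifically, for the above embodiment, in response to identifying in the question one piece of question entity data relating to chemical knowledge (namely "bottom circulating reflux flow of fractionator C-9102 is too low") and one corresponding piece of question attribute data (namely "cause"), the question parsing module can determine that the intent of the question is to retrieve the corresponding second entity according to the first entity and the attribute. Optionally, in other embodiments, in response to identifying in the question two pieces of question entity data relating to chemical knowledge, the question parsing module may instead determine that the intent of the question is to retrieve the corresponding attribute according to the first entity and the second entity; the details are not repeated here.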
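The two intent rules above amount to a small decision function. The sketch below is illustrative only: the intent labels and the `identify_intent` name are hypothetical, and the fallback branch is an assumption, since the source covers only the two stated cases.

```python
def identify_intent(entities, attributes):
    """Decide the question intent from what the parser found (labels are hypothetical)."""
    if len(entities) == 1 and len(attributes) == 1:
        # One entity plus one attribute: look up the second entity.
        return "retrieve_second_entity"
    if len(entities) == 2:
        # Two entities: look up the attribute that relates them.
        return "retrieve_attribute"
    return "unknown"  # assumption: other combinations are not covered by the source


intent = identify_intent(
    ["bottom circulating reflux flow of fractionator C-9102 is too low"], ["cause"])
```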
如图1及图3所示,在识别到问题涉及的实体数据、属性数据及意图后,智能问答装置10可以利用问题分析推理模块13先将问题预处理模块12输出的结果与化工知识图谱中的知识进行关联,再结合化工知识图谱中相关的标准知识表示进行知识推理,进而得到获取问题答案的候选路径。
具体来说,问题分析推理模块13可以首先根据该问题实体数据查询化工知识图谱,以将其与化工知识图谱中对应的第一图谱实体数据进行关联,再确定化工知识图谱中所有相关于该第一图谱实体数据的多条标准知识表示。之后,问题分析推理模块13可以基于上述根据第一实体及属性检索对应的第二实体的意图,通过该多条标准知识表示选择所有与该第一图谱实体数据相关的第二图谱属性数据,再分别将该第一图谱实体数据与各第二图谱属性数据进行组合,以获取多条候选路径。
举例来说,针对上述“分馏塔C-9102底循回流流量过少”的实施例,问题分析推理模块13可以首先将其与化工知识图谱中记载的“分馏塔C-9102底循回流流量过少”的第一图谱实体数据进行关联,再在化工知识图谱中查询所有与该第一图谱实体数据相关的标准知识表示。在一些实施例中,与该“分馏塔C-9102底循回流流量过少”的第一图谱实体数据相关的标准知识表示可以包括:“分馏塔C-9102底循回流流量过少,原因,分馏塔C-9102底循回流流量控制回路FIC9133故障”;“分馏塔C-9102底循回流流量过少,后果,分馏塔C-9102塔底结焦引起加热炉进料流量波动”;以及“分馏塔C-9102底循回流流量过少,安全措施,加热炉F9101设有进料低流量联锁:进料流量低于27.5T/H时熄该组的加热炉主火嘴”。
之后,问题分析推理模块13可以基于上述根据第一实体及属性检索对应的第二实体的意图,从查询到的标准知识表示中选择所有与该第一图谱实体数据相关的第二图谱属性数据(即上述“原因”、“后果”及“安全措施”),以构建多条候选路径。具体来说,问题分析推理模块13可以将该“分馏塔C-9102底循回流流量过少”的第一图谱实体数据与第二图谱属性数据“原因”组合为“分馏塔C-9102底循回流流量过少原因”的第一候选路径;可以将该“分馏塔C-9102底循回流流量过少”的第一图谱实体数据与第二图谱属性数据“后果”组合为“分馏塔C-9102底循回流流量过少后果”的第二候选路径;也可以将该“分馏塔C-9102底循回流流量过少”的第一图谱实体数据与第二图谱属性数据“安全措施”组合为“分馏塔C-9102底循回流流量过少安全措施”的第三候选路径。
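Further, in some embodiments, the second graph attribute data related to the first graph entity data includes not only the above first-degree related second graph attribute data (that is, second graph attribute data that can be associated with the first graph entity data through a single standard knowledge representation), but may also include second graph attribute data that is second-degree related to the first graph entity data (that is, second graph attribute data that requires two standard knowledge representations to be associated with the first graph entity data). For example, given the standard knowledge representations "insufficient bottom circulating reflux flow of fractionation tower C-9102, cause, failure of the bottom circulating reflux flow control loop FIC9133 of fractionation tower C-9102" and "failure of the bottom circulating reflux flow control loop FIC9133 of fractionation tower C-9102, consequence, valve FV9133 partially closed", the question analysis and reasoning module 13 may further infer the new knowledge "insufficient bottom circulating reflux flow of fractionation tower C-9102, cause, valve FV9133 partially closed", and determine from this new knowledge the second graph attribute data "cause" that is second-degree related to the first graph entity data. Afterwards, the question analysis and reasoning module 13 may combine the first graph entity data "insufficient bottom circulating reflux flow of fractionation tower C-9102" with this second-degree related second graph attribute data "cause" into the fourth candidate path "insufficient bottom circulating reflux flow of fractionation tower C-9102 cause".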
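The first-degree and second-degree candidate path construction can be sketched over a small list of triples. The shortened entity strings and the `candidate_paths` helper are hypothetical. The composition rule used for the second hop (the inferred triple keeps the attribute of the first hop) is an assumption generalized from the single FIC9133 example above; the description does not state a general reasoning rule.

```python
FLOW_LOW = "insufficient bottom circulating reflux flow of fractionation tower C-9102"
LOOP_FAULT = "failure of flow control loop FIC9133"
VALVE = "valve FV9133 partially closed"

# Standard knowledge representations as (head entity, attribute, tail entity).
TRIPLES = [
    (FLOW_LOW, "cause", LOOP_FAULT),
    (FLOW_LOW, "consequence", "coking at the tower bottom"),
    (FLOW_LOW, "safety measure", "low feed flow interlock on furnace F9101"),
    (LOOP_FAULT, "consequence", VALVE),
]


def candidate_paths(entity, triples):
    """Return (entity, attribute, degree) candidate paths for one graph entity."""
    paths = []
    for head, attribute, tail in triples:
        if head != entity:
            continue
        # First-degree: reachable through one standard knowledge representation.
        paths.append((entity, attribute, 1))
        # Second-degree: the tail is itself the head of another representation.
        # Assumed rule: the inferred path keeps the first hop's attribute.
        for head2, _attribute2, _tail2 in triples:
            if head2 == tail:
                paths.append((entity, attribute, 2))
    return paths


paths = candidate_paths(FLOW_LOW, TRIPLES)
```

With these four triples the sketch yields four candidate paths, two of which carry the attribute "cause" (one first-degree, one second-degree), mirroring the first and fourth candidate paths of the embodiment.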
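Those skilled in the art will understand that the above scheme of generating multiple candidate paths based on the intent of retrieving the corresponding second entity according to a first entity and an attribute is only one non-limiting implementation provided by the present invention. It is intended to clearly present the main concept of the present invention and to provide a specific scheme that is convenient for the public to implement, and is not intended to limit the scope of protection of the present invention.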
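Optionally, in other embodiments, based on the above intent of retrieving the corresponding attribute according to a first entity and a second entity, the question analysis and reasoning module 13 may first query the chemical engineering knowledge graph according to the first graph entity data to determine all of the standard knowledge representations therein that relate to the first graph entity data. Afterwards, based on the intent of retrieving the corresponding attribute according to a first entity and a second entity, the question analysis and reasoning module 13 may select, through these standard knowledge representations, all second graph entity data that is first-degree related or second-degree related to the first graph entity data. First-degree related second graph entity data refers to second graph entity data that the first graph entity data can be associated with through one standard knowledge representation. Second-degree related second graph entity data refers to second graph entity data that the first graph entity data can be associated with through two standard knowledge representations. After that, the question analysis and reasoning module 13 may combine the first graph entity data with each selected piece of second graph entity data to form multiple candidate paths. These candidate paths are combined in the same way as in the above embodiment, which is not described in detail here.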
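As shown in FIG. 1 and FIG. 3, after the multiple candidate paths have been generated, the intelligent question answering device 10 may use the question post-processing module 14 to perform path matching on these candidate paths in order to determine the best search path among them. Afterwards, the question post-processing module 14 searches the chemical engineering knowledge graph according to the best search path to obtain the answer corresponding to the question.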
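Specifically, for the above embodiment of "insufficient bottom circulating reflux flow of fractionation tower C-9102", the question post-processing module 14 may first input the text of the user's question into a word vector model pre-trained on chemical engineering knowledge samples to obtain a first vector corresponding to the question. Afterwards, the question post-processing module 14 may input the above first to fourth candidate paths into the word vector model one by one to obtain a second vector for each candidate path. After that, the question post-processing module 14 may compute the cosine value between each second vector and the first vector and use it as the text matching degree between the corresponding candidate path and the question.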
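The matching step above can be sketched as follows. This is a simplified illustration, not the pre-trained word vector model of the embodiment: the `embed` function is a hypothetical bag-of-words stand-in, so the resulting scores will differ from the matching degrees reported for the embodiment. Only the cosine computation and the ranking follow the description directly.

```python
import math
from collections import Counter


def embed(text):
    """Assumed stand-in for the word vector model: token counts as a sparse vector."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(u[token] * v[token] for token in u.keys() & v.keys())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def rank_paths(question, paths):
    """Return (text matching degree, path) pairs, best match first."""
    first_vector = embed(question)
    scored = [(cosine(first_vector, embed(path)), path) for path in paths]
    return sorted(scored, reverse=True)


ENTITY = "insufficient bottom circulating reflux flow of fractionation tower C-9102"
QUESTION = "what is the cause of " + ENTITY
PATHS = [ENTITY + " cause", ENTITY + " consequence", ENTITY + " safety measure"]
ranked = rank_paths(QUESTION, PATHS)
```

Even with this crude embedding, the path ending in "cause" ranks first for the sample question because it shares one more token with the question than the other two paths do.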
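In the above embodiment, for the question text "What is the cause of the insufficient circulating reflux flow of fractionation tower C-9102", the text matching degree of the first candidate path "insufficient bottom circulating reflux flow of fractionation tower C-9102 cause" is 0.98, that of the second candidate path "insufficient bottom circulating reflux flow of fractionation tower C-9102 consequence" is 0.85, that of the third candidate path "insufficient bottom circulating reflux flow of fractionation tower C-9102 safety measure" is 0.74, and that of the fourth candidate path "insufficient bottom circulating reflux flow of fractionation tower C-9102 cause" is also 0.98.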
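In this way, the question post-processing module 14 can rank the candidate paths by text matching degree and select the first candidate path and the fourth candidate path, which have the highest text matching degree (namely entity: insufficient bottom circulating reflux flow of fractionation tower C-9102; attribute: cause), as the best search path. Afterwards, the question post-processing module 14 may search the chemical engineering knowledge graph according to the best search path to determine the corresponding standard knowledge representations, namely "insufficient bottom circulating reflux flow of fractionation tower C-9102, cause, failure of the bottom circulating reflux flow control loop FIC9133 of fractionation tower C-9102" and "insufficient bottom circulating reflux flow of fractionation tower C-9102, cause, valve FV9133 partially closed". After that, based on the above intent of retrieving the corresponding second entity according to a first entity and an attribute, the question post-processing module 14 may determine that the answer is located at the second entity of the relevant standard knowledge representations, namely the above "failure of the bottom circulating reflux flow control loop FIC9133 of fractionation tower C-9102" and "valve FV9133 partially closed". Finally, the question post-processing module 14 may organize the obtained answers in light of the question to obtain the answer in standard form, "failure of the bottom circulating reflux flow control loop FIC9133 of fractionation tower C-9102, causing valve FV9133 to partially close", and return this standard-form answer to the user through a human-machine interface such as a loudspeaker or a display screen.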
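The answer lookup for the best search path can be sketched as below, assuming the intent is to retrieve the second entity. The `answers` helper and the shortened entity strings are hypothetical. As an assumption, the second-degree answer is obtained by following one further hop from each direct answer instead of reading back a stored inferred triple, and the final phrasing step is reduced to a simple string join.

```python
FLOW_LOW = "insufficient bottom circulating reflux flow of fractionation tower C-9102"
LOOP_FAULT = "failure of flow control loop FIC9133"
VALVE = "valve FV9133 partially closed"

TRIPLES = [
    (FLOW_LOW, "cause", LOOP_FAULT),
    (FLOW_LOW, "consequence", "coking at the tower bottom"),
    (LOOP_FAULT, "consequence", VALVE),
]


def answers(entity, attribute, triples):
    """Collect second entities for the best search path (entity, attribute)."""
    # The answer sits in the tail position of each matching representation.
    direct = [tail for head, attr, tail in triples if head == entity and attr == attribute]
    # Assumed second-degree step: follow one more hop from each direct answer.
    second = [tail2 for tail in direct for head2, _attr2, tail2 in triples if head2 == tail]
    return direct + second


found = answers(FLOW_LOW, "cause", TRIPLES)
# Assumed formatting of the standard-form answer.
standard_answer = ", causing ".join(found)
```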
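In this way, the intelligent question answering device 10 provided by the present invention can combine the chemical engineering knowledge graph with natural language processing technology to provide technicians in the chemical engineering field with intelligent question answering on chemical engineering knowledge. Compared with the prior art, which implements intelligent question answering based on retrieval technology or deep learning matching technology, the present invention performs reasoning-based intelligent question answering in combination with the chemical engineering knowledge graph. It can understand the real needs of technicians in the chemical engineering field more accurately and efficiently, assist them in decision-making, and quickly solve complex problems, thereby reducing the probability of safety accidents and better safeguarding the interests of enterprises and the country.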
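Although the above methods are illustrated and described as a series of actions for simplicity of explanation, it should be understood and appreciated that these methods are not limited by the order of the actions, because according to one or more embodiments, some actions may occur in a different order and/or concurrently with other actions that are illustrated and described herein, or that are not illustrated and described herein but can be understood by those skilled in the art.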
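Those skilled in the art will understand that information, signals, and data may be represented using any of a variety of different technologies and techniques. For example, the data, instructions, commands, information, signals, bits, symbols, and chips referenced throughout the above description may be represented by voltages, currents, electromagnetic waves, magnetic fields or magnetic particles, optical fields or optical particles, or any combination thereof.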
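Those skilled in the art will further appreciate that the various illustrative logic blocks, modules, circuits, and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or a combination of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends on the particular application and the design constraints imposed on the overall system. Technicians may implement the described functionality in different ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.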
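The various illustrative logic modules and circuits described in connection with the embodiments disclosed herein may be implemented or performed with a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general-purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, for example a combination of a DSP and a microprocessor, multiple microprocessors, one or more microprocessors in cooperation with a DSP core, or any other such configuration.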
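The previous description of the present disclosure is provided to enable any person skilled in the art to make or use the present disclosure. Various modifications to the present disclosure will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other variations without departing from the spirit or scope of the present disclosure. Thus, the present disclosure is not intended to be limited to the examples and designs described herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.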

Claims (22)

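  1. A method for constructing a chemical engineering knowledge graph, characterized by comprising the following steps:
    acquiring knowledge data of the chemical engineering field;
    preprocessing the knowledge data to obtain entity data and attribute data therein that are related to chemical engineering knowledge;
    determining preliminary knowledge representations according to the entity data and the attribute data;
    performing entity alignment on the preliminary knowledge representations to obtain standard knowledge representations; and
    constructing the chemical engineering knowledge graph according to the standard knowledge representations.
  2. The construction method according to claim 1, characterized in that the knowledge data comprises structured data, semi-structured data, and/or unstructured data, and the step of preprocessing the knowledge data comprises:
    performing data integration on the structured data to obtain entity data and attribute data therein that are related to chemical engineering knowledge; and/or
    performing knowledge extraction on the semi-structured data and/or the unstructured data to obtain entity data and attribute data therein that are related to chemical engineering knowledge.
  3. The construction method according to claim 2, characterized in that the attribute data comprises data property data and relation property data, wherein the data property data is used to describe an attribute value of one piece of the entity data in a given preliminary knowledge representation, and the relation property data is used to describe the relationship between two pieces of the entity data in a given preliminary knowledge representation.
  4. The construction method according to claim 3, characterized in that the step of determining preliminary knowledge representations according to the entity data and the attribute data comprises:
    constructing preliminary knowledge representations in triple form from the acquired entity data and attribute data, in the form of entity, data property, attribute value or first entity, relation property, second entity.
  5. The construction method according to claim 1, characterized in that the step of performing entity alignment on the preliminary knowledge representations to obtain standard knowledge representations comprises:
    analyzing multiple preliminary knowledge representations to determine multiple different pieces of entity data therein that indicate the same chemical engineering entity; and
    resolving the multiple different pieces of entity data that indicate the same chemical engineering entity into a single piece of entity data, so as to obtain standard knowledge representations in which the same entity data indicates the same chemical engineering entity.
  6. The construction method according to claim 1, characterized in that the step of constructing the chemical engineering knowledge graph according to the standard knowledge representations comprises:
    performing knowledge discovery according to the entity data and the attribute data in multiple standard knowledge representations to obtain at least one high-confidence standard knowledge representation;
    performing knowledge reasoning according to the entity data and the attribute data in multiple standard knowledge representations to obtain multiple standard knowledge representations of unknown confidence;
    performing quality assessment on the multiple standard knowledge representations of unknown confidence to determine the high-confidence standard knowledge representations among them; and
    constructing the chemical engineering knowledge graph according to each of the high-confidence standard knowledge representations.
  7. The construction method according to claim 6, characterized in that the step of performing quality assessment on the multiple standard knowledge representations of unknown confidence comprises:
    performing text matching between each of the multiple standard knowledge representations of unknown confidence and the knowledge data of the chemical engineering field to obtain a text matching degree for each standard knowledge representation; and
    determining the standard knowledge representations whose text matching degree is higher than a preset matching degree threshold to be the high-confidence standard knowledge representations.
  8. A device for constructing a chemical engineering knowledge graph, characterized by comprising:
    a memory; and
    a processor, the processor being connected to the memory and configured to implement the method for constructing a chemical engineering knowledge graph according to any one of claims 1 to 7.
  9. A computer-readable storage medium having computer instructions stored thereon, characterized in that the computer instructions, when executed by a processor, implement the method for constructing a chemical engineering knowledge graph according to any one of claims 1 to 7.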
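  10. An intelligent question answering method for chemical engineering knowledge, characterized by comprising the following steps:
    acquiring a question raised by a user;
    preprocessing the question, recognizing question entity data and question attribute data therein that are related to chemical engineering knowledge, and identifying the intent of the question;
    determining, from the graph entity data of a chemical engineering knowledge graph, first graph entity data associated with the question entity data, wherein the chemical engineering knowledge graph is constructed by the method for constructing a chemical engineering knowledge graph according to any one of claims 1 to 7;
    performing knowledge reasoning according to the first graph entity data, the intent of the question, and the standard knowledge representations in the chemical engineering knowledge graph to obtain multiple candidate paths;
    computing the text matching degree between each of the multiple candidate paths and the question, and selecting the candidate path with the highest text matching degree as the best search path; and
    searching the chemical engineering knowledge graph according to the best search path to obtain the answer corresponding to the question.
  11. The intelligent question answering method according to claim 10, characterized in that the step of recognizing question entity data and question attribute data therein that are related to chemical engineering knowledge comprises:
    inputting the question into a pre-trained question parsing module to obtain the question entity data and question attribute data therein that are related to chemical engineering knowledge, wherein the question parsing module is a deep learning model trained on question samples of chemical engineering knowledge.
  12. The intelligent question answering method according to claim 11, characterized in that the question parsing module comprises an entity linking dictionary and an attribute dictionary, and the step of recognizing question entity data and question attribute data therein that are related to chemical engineering knowledge further comprises:
    inputting the acquired question entity data into the entity linking dictionary, and mapping the question entity data to data consistent with the description in the chemical engineering knowledge graph based on synonyms and/or machine-learning-based fuzzy matching; and
    inputting the acquired question attribute data into the attribute dictionary, and mapping the question attribute data to data consistent with the description in the chemical engineering knowledge graph based on synonyms and/or machine-learning-based fuzzy matching.
  13. The intelligent question answering method according to claim 10, characterized in that the step of identifying the intent of the question comprises:
    in response to recognizing in the question one piece of the question entity data related to chemical engineering knowledge and one corresponding piece of question attribute data, determining that the intent of the question is to retrieve the corresponding second entity according to a first entity and an attribute; and
    in response to recognizing in the question two pieces of the question entity data related to chemical engineering knowledge, determining that the intent of the question is to retrieve the corresponding attribute according to a first entity and a second entity.
  14. The intelligent question answering method according to claim 10, characterized in that the step of performing knowledge reasoning according to the first graph entity data, the intent of the question, and the standard knowledge representations in the chemical engineering knowledge graph to obtain multiple candidate paths comprises:
    selecting, according to the intent of the question and the standard knowledge representations in the chemical engineering knowledge graph, all second graph attribute data or second graph entity data related to the first graph entity data; and
    combining the first graph entity data with each piece of the second graph attribute data or each piece of the second graph entity data to obtain multiple candidate paths.
  15. The intelligent question answering method according to claim 14, characterized in that the step of selecting, according to the intent of the question and the standard knowledge representations in the chemical engineering knowledge graph, all second graph attribute data or second graph entity data related to the first graph entity data comprises:
    in response to the intent of the question being to retrieve the corresponding second entity according to a first entity and an attribute, selecting, according to the standard knowledge representations in the chemical engineering knowledge graph, all second graph attribute data related to the first graph entity data; and
    in response to the intent of the question being to retrieve the corresponding attribute according to a first entity and a second entity, selecting, according to the standard knowledge representations in the chemical engineering knowledge graph, all second graph entity data related to the first graph entity data.
  16. The intelligent question answering method according to claim 15, characterized in that the second graph attribute data related to the first graph entity data comprises second graph attribute data that is first-degree related or second-degree related to the first graph entity data, wherein first-degree related means that the first graph entity data can be associated with the second graph attribute data through one of the standard knowledge representations, and second-degree related means that the first graph entity data can be associated with the second graph attribute data through two of the standard knowledge representations; and the second graph entity data related to the first graph entity data comprises second graph entity data that is first-degree related or second-degree related to the first graph entity data, wherein first-degree related means that the first graph entity data can be associated with the second graph entity data through one of the standard knowledge representations, and second-degree related means that the first graph entity data can be associated with the second graph entity data through two of the standard knowledge representations.
  17. The intelligent question answering method according to claim 15, characterized in that the step of combining the first graph entity data with each piece of the second graph attribute data or each piece of the second graph entity data to obtain multiple candidate paths comprises:
    in response to the intent of the question being to retrieve the corresponding second entity according to a first entity and an attribute, combining the first graph entity data with each selected piece of the second graph attribute data to obtain multiple candidate paths; and
    in response to the intent of the question being to retrieve the corresponding attribute according to a first entity and a second entity, combining the first graph entity data with each selected piece of the second graph entity data to obtain multiple candidate paths.
  18. The intelligent question answering method according to claim 10, characterized in that the step of computing the text matching degree between each of the multiple candidate paths and the question comprises:
    inputting the question into a word vector model pre-trained on chemical engineering knowledge samples to obtain a first vector of the question;
    inputting the acquired multiple candidate paths into the word vector model one by one to obtain a second vector of each candidate path; and
    computing the cosine value between each second vector and the first vector as the text matching degree between the corresponding candidate path and the question.
  19. The intelligent question answering method according to claim 10, characterized in that the step of searching the chemical engineering knowledge graph according to the best search path to obtain the answer corresponding to the question comprises:
    searching the chemical engineering knowledge graph according to the best search path to determine the corresponding standard knowledge representation;
    determining the position of the answer in the standard knowledge representation according to the intent of the question; and
    organizing the answer in light of the question to obtain an answer in standard form.
  20. The intelligent question answering method according to claim 19, characterized by further comprising the following step:
    returning the answer in standard form to the user.
  21. An intelligent question answering device for chemical engineering knowledge, characterized by comprising:
    a memory; and
    a processor, the processor being connected to the memory and configured to implement the intelligent question answering method for chemical engineering knowledge according to any one of claims 10 to 20.
  22. A computer-readable storage medium having computer instructions stored thereon, characterized in that the computer instructions, when executed by a processor, implement the intelligent question answering method for chemical engineering knowledge according to any one of claims 10 to 20.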
PCT/CN2022/083978 2021-04-21 2022-03-30 Construction method and device of chemical engineering knowledge graph and intelligent question answering method and device WO2022222716A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US18/556,617 US20240256924A1 (en) 2021-04-21 2022-03-30 Construction method and device of chemical engineering knowledge graph and intelligent question answering method and device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110431113.7 2021-04-21
CN202110431113.7A CN112948566B (zh) 2021-04-21 2021-04-21 Construction method and device of chemical engineering knowledge graph and intelligent question answering method and device

Publications (1)

Publication Number Publication Date
WO2022222716A1 true WO2022222716A1 (zh) 2022-10-27

Family

ID=76233120

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/083978 WO2022222716A1 (zh) 2021-04-21 2022-03-30 Construction method and device of chemical engineering knowledge graph and intelligent question answering method and device

Country Status (3)

Country Link
US (1) US20240256924A1 (zh)
CN (1) CN112948566B (zh)
WO (1) WO2022222716A1 (zh)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115618947A (zh) * 2022-12-05 2023-01-17 中国人民解放军总医院 医疗知识图谱质量评估系统、装置、设备、介质及产品
CN115809311A (zh) * 2022-12-22 2023-03-17 企查查科技有限公司 知识图谱的数据处理方法、装置及计算机设备
CN116054910A (zh) * 2022-12-20 2023-05-02 中国人民解放军63819部队 基于知识图谱构建的地球站设备故障分析及装置
CN116150929A (zh) * 2023-04-17 2023-05-23 中南大学 一种铁路选线知识图谱的构建方法
CN116821712A (zh) * 2023-08-25 2023-09-29 中电科大数据研究院有限公司 非结构化文本与知识图谱的语义匹配方法及装置
CN117171332A (zh) * 2023-11-02 2023-12-05 江西拓世智能科技股份有限公司 基于ai的智能问答方法及系统
CN117271754A (zh) * 2023-11-17 2023-12-22 杭州海康威视数字技术股份有限公司 数据检索方法、装置及设备
CN117313849A (zh) * 2023-10-12 2023-12-29 湖北华中电力科技开发有限责任公司 一种基于多源异构数据融合技术的能源行业知识图谱构建方法及装置
CN117669718A (zh) * 2023-12-05 2024-03-08 广州鸿蒙信息科技有限公司 一种基于人工智能的消防知识训练模型及训练方法

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112948566B (zh) * 2021-04-21 2024-02-02 华东理工大学 化工知识图谱的构建方法及装置以及智能问答方法及装置

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107491555A (zh) * 2017-09-01 2017-12-19 北京纽伦智能科技有限公司 知识图谱构建方法和系统
CN109492077A (zh) * 2018-09-29 2019-03-19 北明智通(北京)科技有限公司 基于知识图谱的石化领域问答方法及系统
CN110008353A (zh) * 2019-04-09 2019-07-12 福建奇点时空数字科技有限公司 一种动态知识图谱的构建方法
CN110837550A (zh) * 2019-11-11 2020-02-25 中山大学 基于知识图谱的问答方法、装置、电子设备及存储介质
CN112100351A (zh) * 2020-09-11 2020-12-18 陕西师范大学 一种通过问题生成数据集构建智能问答系统的方法及设备
CN112948566A (zh) * 2021-04-21 2021-06-11 华东理工大学 化工知识图谱的构建方法及装置以及智能问答方法及装置

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180232443A1 (en) * 2017-02-16 2018-08-16 Globality, Inc. Intelligent matching system with ontology-aided relation extraction
CN108268581A (zh) * 2017-07-14 2018-07-10 广东神马搜索科技有限公司 知识图谱的构建方法及装置
CN110597969B (zh) * 2019-08-12 2022-05-24 中国农业大学 一种农业知识智能问答方法、系统以及电子设备
CN111339267A (zh) * 2020-02-17 2020-06-26 京东方科技集团股份有限公司 基于知识图谱的问答方法及系统、计算机设备及介质
CN111613277A (zh) * 2020-05-22 2020-09-01 重庆大学 一种危险化学品领域的知识表示方法
CN112258044A (zh) * 2020-10-23 2021-01-22 上海印钞有限公司 图像判废分析反馈系统
CN112182252B (zh) * 2020-11-09 2021-08-31 浙江大学 基于药品知识图谱的智能用药问答方法及其设备
CN112463926A (zh) * 2020-12-07 2021-03-09 广东电网有限责任公司佛山供电局 一种数据检索/智能问答方法、装置、存储介质
CN112287095A (zh) * 2020-12-30 2021-01-29 中航信移动科技有限公司 确定问题答案的方法、装置、计算机设备及存储介质
CN113821588A (zh) * 2021-06-02 2021-12-21 腾讯科技(深圳)有限公司 文本处理方法、装置、电子设备及存储介质

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115618947A (zh) * 2022-12-05 2023-01-17 中国人民解放军总医院 医疗知识图谱质量评估系统、装置、设备、介质及产品
CN116054910A (zh) * 2022-12-20 2023-05-02 中国人民解放军63819部队 基于知识图谱构建的地球站设备故障分析及装置
CN116054910B (zh) * 2022-12-20 2024-05-14 中国人民解放军63819部队 基于知识图谱构建的地球站设备故障分析及装置
CN115809311A (zh) * 2022-12-22 2023-03-17 企查查科技有限公司 知识图谱的数据处理方法、装置及计算机设备
CN116150929A (zh) * 2023-04-17 2023-05-23 中南大学 一种铁路选线知识图谱的构建方法
CN116150929B (zh) * 2023-04-17 2023-07-07 中南大学 一种铁路选线知识图谱的构建方法
CN116821712A (zh) * 2023-08-25 2023-09-29 中电科大数据研究院有限公司 非结构化文本与知识图谱的语义匹配方法及装置
CN116821712B (zh) * 2023-08-25 2023-12-19 中电科大数据研究院有限公司 非结构化文本与知识图谱的语义匹配方法及装置
CN117313849A (zh) * 2023-10-12 2023-12-29 湖北华中电力科技开发有限责任公司 一种基于多源异构数据融合技术的能源行业知识图谱构建方法及装置
CN117171332A (zh) * 2023-11-02 2023-12-05 江西拓世智能科技股份有限公司 基于ai的智能问答方法及系统
CN117271754A (zh) * 2023-11-17 2023-12-22 杭州海康威视数字技术股份有限公司 数据检索方法、装置及设备
CN117271754B (zh) * 2023-11-17 2024-06-04 杭州海康威视数字技术股份有限公司 数据检索方法、装置及设备
CN117669718A (zh) * 2023-12-05 2024-03-08 广州鸿蒙信息科技有限公司 一种基于人工智能的消防知识训练模型及训练方法

Also Published As

Publication number Publication date
CN112948566B (zh) 2024-02-02
CN112948566A (zh) 2021-06-11
US20240256924A1 (en) 2024-08-01

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22790825

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22790825

Country of ref document: EP

Kind code of ref document: A1