CN114840685A

CN114840685A - Emergency plan knowledge graph construction method

Info

Publication number: CN114840685A
Application number: CN202210447707.1A
Authority: CN
Inventors: 关城; 于振; 徐希源; 冯杰; 张泽浩; 张志珍; 翁蔚; 郭雨松; 许鑫; 房殿阁; 唐诗洋; 米昕禾; 严屹然; 刘泽宇; 张业欣; 马钢
Original assignee: State Grid Smart Grid Research Institute Co ltd; State Grid Fujian Electric Power Co Ltd
Current assignee: State Grid Smart Grid Research Institute Co ltd; State Grid Fujian Electric Power Co Ltd
Priority date: 2022-04-26
Filing date: 2022-04-26
Publication date: 2022-08-02

Abstract

The application relates to a method, apparatus, device, and storage medium for constructing an emergency plan knowledge graph, and in particular to the technical field of natural language processing. The method comprises: constructing a target knowledge graph ontology graph; acquiring an emergency plan text; performing similarity matching between the emergency plan text and each candidate topic to determine, among the candidate topics, a target topic corresponding to the emergency plan text; dividing the emergency plan text into regions based on the target topic, and acquiring the target text of the body text region; performing knowledge extraction on the target text through a target knowledge extraction model to obtain target entities and the relations between the target entities; and associating the target entities and the relations between them with the ontology graph of the emergency plan knowledge graph to construct the emergency plan knowledge graph. This scheme can improve the accuracy of entity extraction during knowledge graph construction.

Description

Emergency plan knowledge graph construction method

Technical Field

The invention relates to the technical field of natural language processing, in particular to a construction method of an emergency plan knowledge graph.

Background

An emergency plan is an important part of emergency management work: it is a plan made in advance for possible emergencies, covering emergency management, emergency command, and emergency rescue.

When an emergency occurs, most traditional text-based plans exist only on paper, and the plan texts have many shortcomings. As a result, they are difficult for command and decision-making departments and action departments to use effectively; the command and decision-making department cannot keep track of the handling status of the emergency and the response of each department in time, and cannot make accurate decisions. Actively exploring modern technical means such as computer, network, and simulation technology to improve emergency plans is the current research focus. Applying a knowledge graph to the digital management and intelligent pushing of emergency plans, and building a digital emergency plan platform, allows the departments responsible for emergency plans to manage the plan system and plan text content in a standardized way, which facilitates plan knowledge queries and assists emergency command decisions.

However, because emergency plan text is unstructured and its text structure is not standardized, the accuracy of entity extraction for the knowledge graph is low when the extraction is performed by a machine learning model.

Disclosure of Invention

The application provides an emergency plan knowledge graph construction method, apparatus, device, and storage medium, which improve the training efficiency of a machine learning model.

In one aspect, a method for constructing an emergency plan knowledge graph is provided, and the method includes:

constructing a target knowledge graph ontology graph; the target knowledge graph ontology graph comprises entities and relations among the entities;

acquiring an emergency plan text;

performing similarity matching between the emergency plan text and each candidate topic to obtain matching degrees, and determining, among the candidate topics, a target topic corresponding to the emergency plan text based on the matching degrees;

dividing the emergency plan text into regions based on the target topic, and acquiring the target text of the body text region;

performing knowledge extraction on the target text through a target knowledge extraction model to obtain a target entity and a relation between the target entities;

and associating the target entities and the relations between the target entities with the ontology graph of the emergency plan knowledge graph to construct the emergency plan knowledge graph.

In another aspect, an emergency plan knowledge graph building apparatus is provided, the apparatus including:

the ontology construction module is used for constructing a target knowledge graph ontology graph; the target knowledge graph ontology graph comprises entities and relations among the entities;

the plan text acquisition module is used for acquiring an emergency plan text;

the similarity matching module is used for performing similarity matching between the emergency plan text and each candidate topic to obtain matching degrees, and determining, among the candidate topics, a target topic corresponding to the emergency plan text based on the matching degrees;

the region division module is used for dividing the emergency plan text into regions based on the target topic and acquiring the target text of the body text region;

the knowledge extraction module is used for extracting knowledge from the target text through a target knowledge extraction model to obtain a target entity and a relation between the target entities;

and the graph construction module is used for associating the target entities and the relations between the target entities with the ontology graph of the emergency plan knowledge graph to construct the emergency plan knowledge graph.

In one possible implementation manner, the similarity matching module is further configured to,

for each candidate topic, performing similarity matching between the file name of the emergency plan text and the topic name of the candidate topic to obtain a first matching value;

performing similarity matching between the title of the emergency plan text and the title of the candidate topic to obtain a second matching value;

performing similarity matching between the keywords of the emergency plan text and the keywords of the candidate topic to obtain a third matching value, where the keywords of the emergency plan text are those whose occurrence frequency in the emergency plan text is higher than a target threshold;

computing a weighted sum of the first matching value, the second matching value, and the third matching value to obtain the matching degree between the candidate topic and the emergency plan text;

and determining the target topic of the emergency plan text according to the matching degree between each candidate topic and the emergency plan text.
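The weighted matching described above can be sketched as follows. This is a minimal illustration, not the method of the application itself: the source does not say which similarity measure, weights, or threshold are used, so the character-level ratio from `difflib`, the Jaccard overlap for keywords, the weights `(0.3, 0.3, 0.4)`, and all class and function names are assumptions made for the example.

```python
from collections import Counter
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class CandidateTopic:
    name: str      # topic name, compared with the plan's file name
    title: str     # topic title, compared with the plan's title
    keywords: set  # topic keywords, compared with the plan's frequent keywords


def text_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] (an assumed measure)."""
    return SequenceMatcher(None, a, b).ratio()


def frequent_keywords(words: list, threshold: int) -> set:
    """Keep words whose occurrence count in the plan text exceeds the threshold."""
    counts = Counter(words)
    return {word for word, count in counts.items() if count > threshold}


def keyword_similarity(a: set, b: set) -> float:
    """Jaccard overlap of two keyword sets (an assumed measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def matching_degree(file_name, title, words, topic,
                    weights=(0.3, 0.3, 0.4), threshold=1) -> float:
    """Weighted sum of the first, second, and third matching values."""
    first = text_similarity(file_name, topic.name)
    second = text_similarity(title, topic.title)
    third = keyword_similarity(frequent_keywords(words, threshold), topic.keywords)
    w1, w2, w3 = weights
    return w1 * first + w2 * second + w3 * third


def select_target_topic(file_name, title, words, topics, **kwargs):
    """Pick the candidate topic with the highest matching degree."""
    return max(topics, key=lambda t: matching_degree(file_name, title, words, t, **kwargs))
```

Here `words` is assumed to be the already segmented plan text; in practice the weights and the frequency threshold would be tuned on real plan documents.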

In one possible implementation manner, the region dividing module is further configured to,

acquiring at least one pair of a start keyword and an end keyword corresponding to the target topic;

and for each pair of a start keyword and an end keyword, acquiring, for the target topic, the target text of the text region between the start keyword and the end keyword.
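The keyword-pair extraction above can be sketched as follows. The function name, the choice to use the first occurrence of each keyword, and the choice to exclude the keywords themselves from the returned region are assumptions made for illustration; the source does not specify them.

```python
def extract_regions(plan_text: str, keyword_pairs: list) -> list:
    """Return the text between each (start keyword, end keyword) pair.

    Pairs whose start or end keyword cannot be found are skipped.
    """
    regions = []
    for start_keyword, end_keyword in keyword_pairs:
        start = plan_text.find(start_keyword)
        if start == -1:
            continue
        body_start = start + len(start_keyword)
        # Search for the end keyword only after the start keyword.
        end = plan_text.find(end_keyword, body_start)
        if end == -1:
            continue
        regions.append(plan_text[body_start:end].strip())
    return regions


plan = (
    "1 General\nPurpose of the plan.\n"
    "2 Emergency Response\nDispatch repair crews.\n"
    "3 Appendix\nContact list."
)
# Hypothetical keyword pair for one topic.
print(extract_regions(plan, [("2 Emergency Response", "3 Appendix")]))
```

A real deployment would presumably keep one list of keyword pairs per candidate topic, since plans under the same topic share a similar section layout.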

In one possible implementation, the apparatus further includes:

the part of speech analysis module is used for carrying out word segmentation processing on the target text and carrying out part of speech analysis on each obtained target word;

and the sentence dividing module is used for dividing the target text according to the part of speech relationship among the target words and punctuation marks in the target text to obtain the target text after the sentence division.

The knowledge extraction module is further used for

performing knowledge extraction on the sentence-divided target text through the target knowledge extraction model to obtain the target entities and the relations between the target entities.
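A minimal sketch of the punctuation half of the sentence division step is shown below. It deliberately omits the part-of-speech relationships, which would require a word segmenter and tagger that the source does not name; the set of sentence-ending marks and the function name are assumptions.

```python
import re

# Split after Chinese or Western sentence-ending punctuation (assumed set).
_SENTENCE_BOUNDARY = re.compile(r"(?<=[。！？；.!?;])\s*")


def split_sentences(target_text: str) -> list:
    """Divide text into sentences at punctuation marks, dropping empty pieces."""
    pieces = _SENTENCE_BOUNDARY.split(target_text)
    return [piece.strip() for piece in pieces if piece.strip()]


print(split_sentences(
    "The dispatch center issues a warning. Notify the repair crew; report to the command office."
))
```

Because a plain period also ends a sentence here, decimal numbers such as "3.5" would be split; a production splitter would need to guard against that.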

In a possible implementation manner, the target text after the sentence division includes at least one target sentence;

the device further comprises:

the part-of-speech relation detection module is used for detecting, for each target sentence, the part-of-speech relationships among the words in the target sentence;

and the sentence completion module is used for performing a completion operation on the target sentence, according to the target part-of-speech relationship and the target paragraph in which the target sentence is located, when the target sentence lacks the target part-of-speech relationship.

In one possible implementation, the sentence completion module is further configured to,

when the target sentence has a verb-object relationship but no subject-predicate relationship, completing the subject of the target sentence according to a candidate subject in the target paragraph;

and when the target sentence has a subject-predicate relationship but no verb-object relationship, merging the sentence following the target sentence into the target sentence.
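The two completion rules above can be sketched as follows. The sketch assumes that an upstream parser has already marked each sentence with whether it contains a subject-predicate relationship and a verb-object relationship; the `ParsedSentence` class, its field names, and the way the candidate subject is passed in are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ParsedSentence:
    text: str
    has_subject_predicate: bool  # assumed to come from a dependency parser
    has_verb_object: bool        # assumed to come from a dependency parser


def complete_sentences(sentences: list, candidate_subject: str) -> list:
    """Apply the two completion rules to the sentences of one paragraph."""
    completed = []
    i = 0
    while i < len(sentences):
        sentence = sentences[i]
        if sentence.has_verb_object and not sentence.has_subject_predicate:
            # Rule 1: missing subject, so prepend the paragraph's candidate subject.
            completed.append(f"{candidate_subject} {sentence.text}")
        elif (sentence.has_subject_predicate and not sentence.has_verb_object
              and i + 1 < len(sentences)):
            # Rule 2: missing object, so merge the next sentence into this one.
            completed.append(f"{sentence.text} {sentences[i + 1].text}")
            i += 1  # the next sentence has been consumed
        else:
            completed.append(sentence.text)
        i += 1
    return completed
```

For example, with the candidate subject "The dispatch center", the fragment "notifies the repair crew." becomes "The dispatch center notifies the repair crew.".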

In one possible implementation, the apparatus further includes:

the training text acquisition module is used for acquiring a training plan text; the training plan text comprises entity labeling information; the entity labeling information indicates the sentence in which each entity is located, the entity category, and the entity content;

and the model training module is used for performing knowledge extraction on the training plan text through an initial knowledge extraction model, and performing a target number of iterative updates on the initial knowledge extraction model according to the extraction result and the entity labeling information, so as to generate the target knowledge extraction model.

In still another aspect, a computer device is provided, where the computer device includes a processor and a memory, where the memory stores at least one instruction, at least one program, a code set, or a set of instructions, and the at least one instruction, at least one program, a code set, or a set of instructions is loaded and executed by the processor to implement the emergency plan knowledge graph building method.

In yet another aspect, a computer-readable storage medium is provided, in which at least one instruction is stored, and the at least one instruction is loaded and executed by a processor to implement the emergency plan knowledge-graph constructing method described above.

In yet another aspect, a computer program product or computer program is provided, comprising computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the emergency plan knowledge graph construction method described above.

The technical scheme provided by the application can comprise the following beneficial effects:

When constructing the emergency plan knowledge graph, the computer device first constructs a target knowledge graph ontology graph to define the entities and the relations between them. The computer device then obtains the emergency plan text to be entered, performs similarity matching between the emergency plan text and the candidate topics, and determines the target topic corresponding to the emergency plan text. Next, it divides the emergency plan text according to the target topic to determine the text of the body region, performs knowledge extraction on that text through a knowledge extraction model, and uses the extracted entities to construct the emergency plan knowledge graph. Because the emergency plan text is matched against preset candidate topics to determine its body region, the influence of an irregular text structure on the knowledge extraction model is reduced, and the accuracy of entity extraction during knowledge graph construction is improved.

Drawings

To illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort.

Fig. 1 is a schematic structural diagram of an emergency plan knowledge graph construction system according to an exemplary embodiment.

Fig. 2 is a flow diagram of an emergency plan knowledge graph construction method according to an exemplary embodiment.

Fig. 3 is a flow diagram of an emergency plan knowledge graph construction method according to an exemplary embodiment.

Fig. 4 shows a topological schematic diagram of an ontology graph of an emergency plan knowledge graph according to an embodiment of the present application.

Fig. 5 shows a flow chart of topic identification related to the embodiment of the present application.

Fig. 6 shows a flow diagram of intelligent segmentation according to an embodiment of the present application.

Fig. 7 shows a schematic diagram of an intelligent sentence division and completion flow according to an embodiment of the present application.

Fig. 8 is a block diagram illustrating the structure of an emergency plan knowledge graph construction apparatus according to an exemplary embodiment.

Fig. 9 is a schematic diagram of a computer device provided in accordance with an exemplary embodiment of the present application.

Detailed Description

The technical solutions of the present application will be described clearly and completely with reference to the accompanying drawings, and it should be understood that the described embodiments are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

It should be understood that an "indication" mentioned in the embodiments of the present application may be a direct indication, an indirect indication, or an indication of an association relationship. For example, "A indicates B" may mean that A directly indicates B, e.g., B can be obtained from A; it may mean that A indirectly indicates B, e.g., A indicates C and B can be obtained from C; or it may mean that there is an association between A and B.

In the description of the embodiments of the present application, the term "correspond" may indicate a direct or indirect correspondence between two items, an association between them, or a relationship such as indicating and being indicated, or configuring and being configured.

In the embodiment of the present application, "predefining" may be implemented by saving a corresponding code, table, or other manners that may be used to indicate related information in advance in a device (for example, including a terminal device and a network device), and the present application is not limited to a specific implementation manner thereof.

Before describing the various embodiments shown herein, several concepts related to the present application will be described.

1) AI (Artificial Intelligence)

Artificial intelligence, abbreviated AI, is a technical science that studies and develops theories, methods, technologies, and application systems for simulating, extending, and expanding human intelligence. It is a branch of computer science that attempts to understand the essence of intelligence and to produce intelligent machines that can react in a manner similar to human intelligence. Research in the field includes robotics, language recognition, image recognition, natural language processing, and expert systems, among others. Since its birth, the theory and technology of artificial intelligence have steadily matured and its application fields have continued to expand, and it can be envisioned that the technological products it brings in the future will be "containers" of human intelligence. Artificial intelligence can simulate the information processes of human consciousness and thinking. It is not human intelligence, but it can think like a human and may also exceed human intelligence.

The main material basis for studying artificial intelligence and the machines that can implement the technical platform of artificial intelligence are computers. In addition to computer science, artificial intelligence also relates to a plurality of disciplines such as information theory, cybernetics, automation, bionics, biology, psychology, mathematical logic, linguistics, medicine, philosophy, and the like. The main contents of the artificial intelligence subject research comprise: knowledge representation, automatic reasoning and searching methods, machine learning and knowledge acquisition, knowledge processing systems, natural language understanding, computer vision, intelligent robots, automatic programming, and the like.

2) Machine Learning (ML)

Machine learning is an interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and other disciplines. It specifically studies how a computer can simulate or realize human learning behavior so as to acquire new knowledge or skills, and reorganize existing knowledge structures to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.

3) Knowledge Graph

In the field of library and information science, a knowledge graph is a series of different graphs that show the development process and structural relationships of knowledge. It uses visualization technology to describe knowledge resources and their carriers, and to mine, analyze, construct, draw, and display knowledge and the interrelations among knowledge resources and knowledge carriers. The knowledge graph is a modern theory that combines the theories and methods of disciplines such as applied mathematics, graphics, information visualization, and information science with methods such as bibliometric citation analysis and co-occurrence analysis, and uses visual graphs to vividly display the core structure, development history, frontier fields, and overall knowledge architecture of a discipline, thereby achieving multi-disciplinary integration.

As described in the background, an emergency plan, which is an important task of emergency management, is a plan made in advance for possible emergency situations in connection with emergency management, emergency command, and emergency rescue.

To address the problem described in the background, an embodiment of the present application provides a method for constructing an emergency plan knowledge graph. The method includes: constructing a target knowledge graph ontology graph; acquiring an emergency plan text; performing similarity matching between the emergency plan text and each candidate topic to obtain matching degrees, and determining, among the candidate topics, a target topic corresponding to the emergency plan text based on the matching degrees; dividing the emergency plan text into regions based on the target topic, and acquiring the target text of the body text region; performing knowledge extraction on the target text through a target knowledge extraction model to obtain target entities and the relations between the target entities; and associating the target entities and the relations between them with the ontology graph of the emergency plan knowledge graph to construct the emergency plan knowledge graph. This scheme can improve the accuracy of entity extraction during knowledge graph construction, and is described in detail below.

Fig. 1 is a schematic structural diagram of an emergency plan knowledge graph construction system according to an exemplary embodiment. Optionally, the system includes a server 110 and a terminal 120, which communicate over a communication network that may be wired or wireless.

Optionally, the emergency plan knowledge graph construction system may construct the emergency plan knowledge graph in the server 110; that is, the data corresponding to the knowledge graph (for example, entity attributes and entity relationships) is stored in a graph database in the server 110.

Optionally, the server 110 includes a machine learning model for performing knowledge extraction. The model may be trained in the server 110 on training plan text, or it may be trained on training plan text in another computer device (e.g., a model training device). In the latter case, after the model training device has trained the model, it can send the structure and parameter information of the machine learning model to the server 110, so that the server 110 can construct the machine learning model for knowledge extraction.

Optionally, the knowledge extraction process may be executed on the terminal 120, that is, the terminal 120 may receive the parameter information of the machine learning model and the structure information of the machine learning model sent by the model training device or the server 110, and construct a corresponding machine learning model on the terminal 120. After the terminal 120 receives the emergency plan text requiring knowledge extraction, the machine learning model may be invoked through the application program to extract knowledge from the emergency plan text, and each data (e.g., entity attributes, entity relationships, etc.) obtained by the knowledge extraction is sent and stored in the server 110, so that the server 110 constructs a knowledge graph.

Alternatively, the terminal 120 may be a terminal device having an instruction input component, where the instruction input component may include a touch display screen, a mouse, a keyboard, and other components that generate instruction information according to a user operation, and the user may perform a specified operation on the instruction input component to control the terminal 120 to perform the specified operation (e.g., obtaining an emergency plan text, performing knowledge extraction on the emergency plan text, and the like).

Optionally, the terminal 120 may be a mobile terminal such as a smart phone, a tablet computer, a laptop portable notebook computer, or the like, or a terminal such as a desktop computer, a projection computer, or the like, or an intelligent terminal having a data processing component, which is not limited in this embodiment of the application.

The server 110 may be implemented as one server, or may be implemented as a server cluster formed by a group of servers, which may be physical servers or cloud servers. In one possible implementation, the server 110 is a backend server for applications in the terminal 120.

Optionally, the server may be an independent physical server, or a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server that provides technical computation services such as cloud service, a cloud database, cloud computing, a cloud function, cloud storage, Network service, cloud communication, middleware service, domain name service, security service, CDN (Content Delivery Network), big data, and an artificial intelligence platform.

Optionally, the system may further include a management device configured to manage the system (e.g., to manage the connection states between the modules and the server). The management device is connected to the server through a communication network. Optionally, the communication network is a wired network or a wireless network.

Optionally, the wireless or wired network described above uses standard communication technologies and/or protocols. The network is typically the Internet, but may be any other network, including but not limited to a local area network, a metropolitan area network, a wide area network, a mobile, wired, or wireless network, a private network, a virtual private network, or any combination of these. In some embodiments, data exchanged over the network is represented using technologies and/or formats including Hypertext Markup Language (HTML) and Extensible Markup Language (XML). All or some of the links may also be encrypted using conventional encryption technologies such as Secure Sockets Layer (SSL), Transport Layer Security (TLS), virtual private network (VPN), and Internet Protocol Security (IPsec). In other embodiments, custom and/or dedicated data communication technologies may be used in place of, or in addition to, those described above.

Fig. 2 is a flow diagram of an emergency plan knowledge graph construction method according to an exemplary embodiment. The method is performed by a computer device, which may be the server or the terminal in the system shown in Fig. 1. Taking the server as the executing device as an example, as shown in Fig. 2, the method may include the following steps:

Step 201, constructing a target knowledge graph ontology graph.

The target knowledge-graph ontology graph comprises entities and relations among the entities.

Ontologies are semantic data models that are used to define the types of things and the attributes that can be used to describe them. In this embodiment, the computer device may define the attributes of the instance to which the knowledge graph relates before constructing the knowledge graph.

Step 202, acquiring an emergency plan text.

Step 203, performing similarity matching between the emergency plan text and each candidate topic to obtain matching degrees, and determining, among the candidate topics, a target topic corresponding to the emergency plan text based on the matching degrees.

In a possible implementation manner of the embodiment of the application, after the computer device obtains the emergency plan text, the computer device may perform similarity matching between the emergency plan text and each candidate topic predefined in the computer device, so as to determine a target topic corresponding to the emergency plan text.

Taking the power industry as an example, emergency plan texts indicate the specific action measures and command decision schemes to be taken during a power emergency. Power emergencies are of many types, however, and the content of emergency plan texts for different types may differ considerably. For example, the emergency measures for a power generation equipment failure obviously differ considerably from those for a power line failure caused by a natural disaster, so the corresponding emergency plan texts also differ greatly. The computer device can therefore classify emergency plan texts by topic, so that emergency plan texts classified under the same topic have the same or similar text structure.

Step 204, dividing the emergency plan text into regions based on the target topic, and acquiring the target text of the body text region.

After the target topic corresponding to the emergency plan text is obtained, the emergency plan text can be divided into regions according to the text structure indicated by the target topic. The computer device can directly determine the body region of the emergency plan text from that text structure, take the target text of the body region as the body text, and perform operations such as knowledge extraction on the body text.

Step 205, performing knowledge extraction on the target text through a target knowledge extraction model to obtain target entities and the relations between the target entities.

In the embodiment of the application, the target text is the body text extracted according to the target topic. When the target knowledge extraction model performs knowledge extraction on this body text, the influence of structural differences among emergency plan texts on the extraction is reduced as far as possible, which improves the accuracy of the target knowledge extraction model.

Step 206, associating the target entities and the relations between the target entities with the ontology graph of the emergency plan knowledge graph to construct the emergency plan knowledge graph.

After the target entities and the relations between them are extracted, they can be associated with the ontology graph of the emergency plan knowledge graph (that is, entered into the graph database corresponding to the emergency plan knowledge graph for storage), thereby constructing the emergency plan knowledge graph.
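The association step can be sketched as a check of each extracted triple against the relation types allowed by the ontology graph, followed by storage. In this sketch an in-memory set stands in for the graph database, and the class names, the `associate` method, and the example entities are all hypothetical; the source does not describe the storage interface.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    name: str
    type: str  # ontology type, e.g. "item" or "item department"


@dataclass
class EmergencyPlanGraph:
    # Allowed (head type, relation, tail type) triples taken from the ontology graph.
    schema: frozenset
    nodes: set = field(default_factory=set)
    edges: set = field(default_factory=set)

    def associate(self, head: Entity, relation: str, tail: Entity) -> bool:
        """Store the triple only if the ontology permits it; report success."""
        if (head.type, relation, tail.type) not in self.schema:
            return False
        self.nodes.update((head, tail))
        self.edges.add((head, relation, tail))
        return True


graph = EmergencyPlanGraph(
    schema=frozenset({("item", "related department", "item department")})
)
repair = Entity("Restore damaged line", "item")
crew = Entity("Repair crew", "item department")
accepted = graph.associate(repair, "related department", crew)   # allowed by the schema
rejected = graph.associate(crew, "related department", repair)   # direction not in the schema
```

Rejecting triples that do not fit the ontology is one possible design choice; an implementation could equally log them for manual review.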

After the emergency plan knowledge graph is constructed, because the essence of the knowledge graph is a graph-based semantic network and the incidence relation between the entities is stored, the computer equipment can perform path retrieval on the entities in the graph according to the query condition input by the user, so that the entities corresponding to the query condition are obtained and returned to the user, and the user can quickly and accurately query the content corresponding to the query condition in the knowledge graph.
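Path retrieval over the stored relations can be illustrated with a breadth-first search. This is a sketch under assumptions: the graph is given as a list of (head, relation, tail) triples, edges are followed only in their stored direction, and the entity and relation names in the example are invented; a real graph database would expose its own query language instead.

```python
from collections import deque


def find_path(edges: list, source: str, target: str):
    """Return the shortest path as [entity, relation, entity, ...], or None."""
    adjacency = {}
    for head, relation, tail in edges:
        adjacency.setdefault(head, []).append((relation, tail))

    queue = deque([(source, [source])])
    visited = {source}
    while queue:
        node, path = queue.popleft()
        if node == target:
            return path
        for relation, neighbor in adjacency.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, path + [relation, neighbor]))
    return None


edges = [
    ("Line repair", "belongs to link", "Emergency response"),
    ("Emergency response", "belongs to stage", "Response stage"),
    ("Response stage", "belongs to event", "Typhoon"),
]
print(find_path(edges, "Line repair", "Typhoon"))
```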

In summary, when the emergency plan knowledge graph is constructed, the computer device may first construct a target knowledge graph ontology graph to define a relationship between an ontology and an ontology, at this time, the computer device may obtain an emergency plan text to be input, perform similarity matching between the emergency plan text and a candidate topic, and determine a target topic corresponding to the emergency plan text, and at this time, the computer device may divide the emergency plan text according to the target topic, thereby determining a text of a body region, and perform knowledge extraction on the text of the body region through a knowledge extraction model, thereby extracting an entity to construct the emergency plan knowledge graph. In the scheme, the text of the emergency plan is matched with the preset candidate theme, so that the text area corresponding to the emergency plan is determined, the influence of an irregular text structure on a knowledge extraction model is reduced, and the accuracy of entity extraction in the process of building the knowledge map is improved.

Fig. 3 is a flowchart illustrating an emergency plan knowledge graph construction method in accordance with an exemplary embodiment. The method is performed by a computer device, which may be a server or a terminal in the model training system shown in fig. 1. Taking execution by the server as an example, as shown in fig. 3, the emergency plan knowledge graph construction method may include the following steps:

Step 301, constructing a target knowledge graph ontology graph.

In a possible implementation manner of the embodiment of the present application, in the process of constructing the emergency plan knowledge graph, the computer device may first construct an initial ontology graph of the emergency plan knowledge graph, that is, a target knowledge graph ontology graph, so as to define the entities in the knowledge graph and the relationships between the entities.

Please refer to fig. 4, which shows a topological diagram of the ontology graph of an emergency plan knowledge graph according to an embodiment of the present application. As shown in fig. 4, the attribute ontologies that may be involved in constructing the emergency plan knowledge graph may include an upper-level department, an execution department, a consultation department, task content, task preconditions, a source system, and a target system.

At this time, the attribute information of the entity contained in the ontology may also be defined in advance, for example, the attribute information of the entity contained in the ontology may include: department { id, name, department level }, item department { name, role }, item { original text, id, task premise, task content, target system, source system }, link { id, serial number, name }, stage { id, name }, subject { id, name, local city, keyword }, company { id, name, level }, item propulsion { item propulsion }.

The relationships between the ontologies can then be characterized as relationships between the entities contained in the ontologies. For example, these relationships may include: { item-related department-item department }, { item department-corresponding department-department }, { item-belonging link-link }, { item-preceding item-item propulsion }, { item propulsion-following item-item }, { link-belonging stage-stage }, { stage-belonging event-subject }, and { department-belonging company-company }.
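The ontology definitions above can be written down as plain data. The following is a minimal sketch, assuming a dictionary-based schema; the names `ONTOLOGY_ATTRIBUTES`, `ONTOLOGY_RELATIONS`, and `is_valid_triple` are hypothetical, the relation labels are simplified English renderings, and the code only illustrates how an extracted triple could be checked against the ontology graph before it is stored.

```python
# Entity types and their attributes, following the attribute list in the text.
ONTOLOGY_ATTRIBUTES = {
    "department": ["id", "name", "department level"],
    "item department": ["name", "role"],
    "item": ["original text", "id", "task premise", "task content",
             "target system", "source system"],
    "link": ["id", "serial number", "name"],
    "stage": ["id", "name"],
    "subject": ["id", "name", "local city", "keyword"],
    "company": ["id", "name", "level"],
    "item propulsion": ["item propulsion"],
}

# Allowed (head type, relation, tail type) combinations.
# The relation labels are illustrative renderings, not official names.
ONTOLOGY_RELATIONS = {
    ("item", "related department", "item department"),
    ("item department", "corresponding department", "department"),
    ("item", "belonging link", "link"),
    ("item", "preceding item", "item propulsion"),
    ("item propulsion", "following item", "item"),
    ("link", "belonging stage", "stage"),
    ("stage", "belonging event", "subject"),
    ("department", "belonging company", "company"),
}


def is_valid_triple(head_type: str, relation: str, tail_type: str) -> bool:
    """Return True if the triple is permitted by the ontology sketch above."""
    return (head_type, relation, tail_type) in ONTOLOGY_RELATIONS
```

Checking triples against a fixed set like this keeps malformed extraction results out of the graph database; a real system would likely load the schema from configuration instead of hard-coding it.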

Step 302, acquiring an emergency plan text.

Step 303, performing similarity matching between the emergency plan text and each candidate topic to obtain a matching degree, and determining, based on the matching degree, the target topic corresponding to the emergency plan text among the candidate topics.

In this embodiment of the application, each candidate topic may be pre-stored in the computer device, and the computer device may store configuration information corresponding to each candidate topic, for example, for any candidate topic, topic information such as a topic name corresponding to the candidate topic, a title of the candidate topic, and a keyword of the candidate topic may be stored in the computer device, so that the computer device performs similarity matching with the emergency plan text according to the topic information of the candidate topic.

In a possible implementation manner, for each candidate topic, similarity matching is performed between the file name of the emergency plan text and the topic name of the candidate topic to obtain a first matching value;

similarity matching is performed between the title in the emergency plan text and the title of the candidate topic to obtain a second matching value;

similarity matching is performed between the keywords of the emergency plan text and the keywords of the candidate topic to obtain a third matching value, where the keywords of the emergency plan text are words whose occurrence frequency in the emergency plan text is higher than a target threshold;

the first matching value, the second matching value and the third matching value are weighted and summed to obtain the matching degree between the candidate topic and the emergency plan text.

Please refer to fig. 5, which illustrates a flowchart of topic identification according to an embodiment of the present application. As shown in fig. 5, when a PDF file (i.e., the emergency plan text in the embodiment of the present application) is obtained, file name similarity matching, in-file title matching, and in-file keyword matching may be performed on the PDF file, and the three matching results are combined according to certain weights to obtain the matching degree between the PDF file and each candidate topic.
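The weighted matching described above can be sketched as follows. This is only an illustration under stated assumptions: the weights in `WEIGHTS` are hypothetical (the text says only that the results are weighted), string similarity is computed with Python's standard `difflib.SequenceMatcher` as a stand-in for whatever measure is actually used, and the dictionary keys `file_name`, `title`, `name`, and `keywords` are invented for the example.

```python
from difflib import SequenceMatcher

# Hypothetical weights for file name, title, and keyword matching.
WEIGHTS = (0.3, 0.3, 0.4)


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; a stand-in similarity measure."""
    return SequenceMatcher(None, a, b).ratio()


def keyword_overlap(doc_keywords, topic_keywords) -> float:
    """Fraction of the topic's keywords that also occur among the document keywords."""
    if not topic_keywords:
        return 0.0
    hits = sum(1 for keyword in topic_keywords if keyword in doc_keywords)
    return hits / len(topic_keywords)


def matching_degree(doc: dict, topic: dict, weights=WEIGHTS) -> float:
    """Weighted sum of file name, title, and keyword matching values."""
    first = similarity(doc["file_name"], topic["name"])
    second = similarity(doc["title"], topic["title"])
    third = keyword_overlap(doc["keywords"], topic["keywords"])
    w1, w2, w3 = weights
    return w1 * first + w2 * second + w3 * third


def pick_target_topic(doc: dict, topics: list) -> dict:
    """Return the candidate topic with the highest matching degree."""
    return max(topics, key=lambda topic: matching_degree(doc, topic))
```

In practice the document keywords would be the words whose frequency in the plan text exceeds the target threshold; here they are simply passed in as a list.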

Topic identification means determining the topic of the input document content; after the topic is determined, the key text content to be analyzed is acquired according to a topic structure model preset by the system.

Topic identification (i.e., topic matching) can be based on keyword matching within a specific field of the document. Specifically, a topic keyword model is designed, in which each topic keyword entry is used to match a topic and contains a keyword and a specific field. For example, if the keyword is "typhoon" and the specific field is the title, then when a document is input, the system obtains the title field of the document and matches it against the keyword; if the match hits, the document is assigned to the defined topic. Topics are distinguished because different topics correspond to different paragraph structures; once the topic is determined, the sentences to be extracted can be located according to the corresponding paragraph structure, which can greatly improve the efficiency and accuracy of the program.
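A topic keyword model of this kind can be sketched as a small lookup table. The structure below is an assumption for illustration only: `TOPIC_KEYWORD_MODEL`, its keys, and `identify_topic` are hypothetical names, and the document is represented as a dictionary of already extracted fields.

```python
# Each entry pairs a keyword with the document field in which it must occur.
# The entries are illustrative, not taken from a real configuration.
TOPIC_KEYWORD_MODEL = [
    {"topic": "typhoon", "keyword": "typhoon", "field": "title"},
    {"topic": "flood", "keyword": "flood", "field": "title"},
]


def identify_topic(document: dict, model=TOPIC_KEYWORD_MODEL):
    """Return the first topic whose keyword occurs in its field, else None."""
    for entry in model:
        field_text = document.get(entry["field"], "")
        if entry["keyword"] in field_text:
            return entry["topic"]
    return None
```

For example, `identify_topic({"title": "typhoon response plan"})` yields the `"typhoon"` topic, while a document whose title contains none of the keywords is left unclassified.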

Step 304, at least one pair of start keywords and end keywords corresponding to the target topic is obtained.

When topic classification is performed on the emergency plan text, that is, when the target topic matching the emergency plan text is determined, the target topic stored in the computer device contains a start keyword and an end keyword for each paragraph structure, and the computer device can divide the emergency plan text into paragraphs according to these start keywords and end keywords.

Step 305, for each pair of the start keyword and the end keyword in the target topic, acquiring a target text of the text area between the start keyword and the end keyword.

In a possible implementation manner, after the start keyword and the end keyword are obtained, for each pair of the start keyword and the end keyword, the computer device may determine content between the start keyword and the end keyword as a text region of the paragraph;

or, in another possible implementation manner, after the start keyword and the end keyword are obtained, for each pair of the start keyword and the end keyword, the computer device may use a sentence in which the start keyword is located as a start sentence, use a sentence in which the end keyword is located as an end sentence, determine a text region from the start sentence to the end sentence as a body region, and determine a text from the start sentence to the end sentence as a body text.

Please refer to fig. 6, which illustrates a flowchart of intelligent segmentation according to an embodiment of the present application. As shown in fig. 6, after the computer device obtains the PDF file, it may first process the table of contents and the tables in the file, for example by ignoring them and reading only the main content of the file. Because a PDF file is organized page by page, the computer device reads one page at a time and determines whether the start keyword and the end keyword are on that page. When both are on the page, the text between the keywords is extracted as the body text. When the start keyword has been read but the end keyword has not, the computer device reads the next page and determines whether the end keyword appears there; if it does, the text between the start keyword and the end keyword is extracted as the body text; if it does not, the computer device continues reading and checking subsequent pages until it reaches a page containing the end keyword.
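The page-by-page search can be sketched as below. This is a minimal illustration that assumes the PDF has already been converted to a list of page strings with the table of contents and tables removed; the function name `extract_body_text` is hypothetical and no particular PDF library is implied.

```python
def extract_body_text(pages, start_kw, end_kw):
    """Return the text between start_kw and end_kw, which may span pages.

    pages: page texts in reading order. Returns None if the start keyword
    is never found or no end keyword follows it.
    """
    collected = []
    started = False
    for page in pages:
        text = page
        if not started:
            pos = text.find(start_kw)
            if pos == -1:
                continue  # start keyword not on this page, read the next one
            started = True
            text = text[pos + len(start_kw):]
        end = text.find(end_kw)
        if end != -1:
            collected.append(text[:end])
            return "".join(collected)
        # End keyword not on this page: keep the text and read the next page.
        collected.append(text)
    return None
```

Returning `None` when the end keyword is missing makes the failure explicit, so the caller can decide whether to skip the region or fall back to another rule.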

After the computer device determines the text area for each pair of start keyword and end keyword, the emergency plan text is in effect divided into the body region, made up of the paragraphs formed by these text areas, and a non-body region outside them. The non-body region can be regarded as text of low importance, so to avoid the influence of an irregular text structure on semantics, knowledge extraction can be performed only on the text content of the body region in the subsequent knowledge extraction process. Although different topics correspond to different text structures, the body region of the emergency plan text is screened out through the keywords of the topic, so the computer device can extract the more important text content from different text structures and reduce the influence of irregular text structures on subsequent knowledge extraction as far as possible.

In the embodiment of the application, in order to further improve the accuracy of knowledge extraction of the emergency plan text, after segmenting the emergency plan text to obtain the body regions of the emergency plan text, the sentence segmentation can be performed on each body region, so that the text of the body region is divided into the sentences, and knowledge extraction can be performed from the sentence level in the knowledge extraction process.

Step 306, performing word segmentation processing on the target text, and performing part-of-speech analysis on each obtained target word.

To divide the segmented paragraphs into sentences, word segmentation is first performed on the target text (for example, with jieba). Segmentation based on the paddle mode of jieba can, on the one hand, perform Chinese word segmentation (including stop words) and, on the other hand, annotate each word with its part of speech and semantic label (the paddle mode can output 20 part-of-speech tags by default, for example n for nouns and v for verbs).

Step 307, dividing the target text into sentences according to the part-of-speech relationships among the target words and the punctuation marks in the target text, to obtain the target text after sentence division.

After the part of speech of each target word is obtained, a dependency parsing tool (such as DDParser) can be used to analyze the dependency relationships between the words in a sentence (i.e., the part-of-speech relationships between target words, such as the subject-predicate relationship, hereinafter also referred to as the "predicate relationship").

Once the part-of-speech relationships between the target words in a sentence are obtained, automatic sentence division can be performed according to these relationships and the punctuation marks. After division, it can be determined whether a parallel relationship or a disjunctive relationship exists between sentences; such sentences generally belong to the same item, so sentences in a parallel relationship and sentences in a disjunctive relationship can be recombined into one sentence.

Because the format of plan texts is not standardized, the acquired text content often lacks sentence components, for example a missing task execution subject or a missing task operation object. Therefore, in a possible implementation manner of the embodiment of the application, after acquiring the text content, the computer device may further divide it into sentences, perform part-of-speech analysis on each resulting sentence, determine whether any sentence component is missing, and supplement the missing component when one is.

In a possible implementation manner, the target text after sentence division comprises at least one target sentence;

the computer equipment detects the part-of-speech relationship among words in the target sentence aiming at each target sentence;

and when the target sentence lacks the target part-of-speech relationship, performing completion operation on the target sentence according to the target part-of-speech relationship and the target paragraph in which the target sentence is located.

In a possible implementation manner, when a verb-object relationship exists in the target sentence and a subject-predicate relationship does not, the subject of the target sentence is completed according to the candidate subjects in the target paragraph;

and when a subject-predicate relationship exists in the target sentence and a verb-object relationship does not, the sentence following the target sentence is merged with the target sentence.

In one possible implementation, the candidate subject may be the subject of a sentence located before the target sentence in the paragraph to which the target sentence belongs;

alternatively, the candidate subject may be the subtitle that is closest to the target sentence in the paragraph to which the target sentence belongs.

Please refer to fig. 7, which illustrates a schematic diagram of an intelligent sentence division and completion flow according to an embodiment of the present application. As shown in fig. 7, in a possible implementation manner, after reading the text, the computer device first performs an initial sentence division on the text according to its punctuation marks to obtain candidate sentences, and then analyzes the part-of-speech relationships of each candidate sentence using DDParser.

For any candidate sentence, when the candidate sentence contains both SBV (subject-predicate relationship) and VOB (verb-object relationship), it is a complete sentence and is directly taken as a target sentence, and its subject is extracted as a candidate subject for subsequent sentences that lack a subject.

When the candidate sentence contains VOB (verb-object relationship) but not SBV (subject-predicate relationship), it evidently lacks a subject; in this case, a subject can be selected from the candidate subjects to complete the candidate sentence.

When the candidate sentence contains no VOB (verb-object relationship), the next candidate sentence is directly merged with it to form the target sentence.
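The three cases above can be sketched as a single pass over the candidate sentences. The sketch assumes that dependency parsing has already been done elsewhere and that each candidate is a dictionary with hypothetical keys `text`, `relations` (the set of dependency labels found in the sentence), and `subject`; it does not call DDParser itself, and it uses the most recent subject as the candidate subject.

```python
def complete_sentences(candidates):
    """Apply the SBV/VOB completion rules to pre-parsed candidate sentences."""
    targets = []
    candidate_subject = None  # most recent subject seen in the paragraph
    pending = None            # sentence without VOB, waiting to be merged
    for cand in candidates:
        text, relations = cand["text"], cand["relations"]
        if pending is not None:
            # The previous sentence had no VOB: merge it with this one.
            targets.append(pending + text)
            pending = None
            continue
        if "SBV" in relations and cand.get("subject"):
            candidate_subject = cand["subject"]
        if "VOB" not in relations:
            pending = text
        elif "SBV" in relations:
            targets.append(text)  # complete sentence, keep as is
        else:
            # VOB without SBV: prepend the candidate subject if one exists.
            targets.append((candidate_subject or "") + text)
    if pending is not None:
        targets.append(pending)  # nothing left to merge with
    return targets
```

A fuller version would re-parse each merged sentence and could also fall back to the nearest subtitle as the candidate subject, as described earlier.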

In actual emergency plan texts, writing habits mean that the subject is often replaced by a referring word (e.g., "it", "this"), which may prevent the computer device from understanding the true semantics during knowledge extraction. Therefore, before knowledge extraction, the computer device uses DDParser to determine whether the sentence structure is complete and meaningful, and uses jieba word segmentation to extract the subject of a sentence to replace the referring word. The dictionaries of jieba and DDParser are also adapted to plan texts to improve accuracy; in this way the goal is achieved well, and the resulting sentences are basically the same as those selected during manual labeling.

Step 308, performing knowledge extraction on the target text after sentence division through a target knowledge extraction model to obtain target entities and the relationships between the target entities.

In one possible implementation, a training plan text is obtained; the training plan text contains entity labeling information, which indicates the sentence in which each entity is located, the entity category, and the entity content;

and knowledge extraction is performed on the training plan text through an initial knowledge extraction model, and the initial knowledge extraction model is iteratively updated a target number of times according to the extraction result and the entity labeling information to generate the target knowledge extraction model.

Optionally, the labels on the training plan text are manual labels. Manual labeling is mainly based on the entity categories in attribute management and follows a key-value rule; the labeled content includes the sentence, the attribute, and the value, where the sentence is the sentence in which the entity is located, the attribute is the entity category, and the value is the entity content. When labeling entities, sentences with incomplete sentence components need to be completed manually. For example, for the sentence "The office of the company's leading group for handling typhoon, flood and other disasters reports to the disaster handling leading group.", the manually labeled content is: "sentence": "The office of the company's leading group for handling typhoon, flood and other disasters reports to the disaster handling leading group.", "execution department": "leading group office", "task content": "report", and "upper-level department": "disaster handling leading group".

Optionally, the knowledge extraction model used in the embodiment of the present application may be a BiLSTM + CRF model, which combines a deep learning model with a machine learning model. The optimization of the algorithm parameters mainly includes the following:

(1) Setting the Dropout parameter to prevent overfitting. When Dropout is not used to drop neurons, the network fits the data of a small data set until the training accuracy approaches 1, and overfitting occurs. Dropout is therefore set to a random deactivation rate of 0.5, so that half of the neurons are dropped during training, which prevents overfitting.

(2) Decaying the learning rate in stages. Too small a learning rate can leave the neural network unable to learn at all, while too large a learning rate easily leads to overfitting. A staged learning rate decay is therefore used: the initial learning rate is set to 1e-3 and is controlled by the decay parameter of the optimizer class. The learning rate is large in the early stage of learning so that the model learns quickly, and it is continuously decayed in the later stage to prevent overfitting, so that the model can keep learning and fine-tuning.

(3) Setting the mini-batch. A mini-batch is the batch of training data used in one training step. If batch gradient descent is used, the whole training set must be processed before a single gradient descent step is taken, which is very slow. Mini-batch gradient descent divides the samples into several small mini-batches, so that each step processes a single small batch rather than the entire X and Y training sets, which improves the learning speed. The value is set to 128.

(4) Setting the epoch (number of training rounds). Iterating over all the data once is not enough; fitting convergence is reached only by repeating the training many times. Too many rounds consume resources without improving the model, while too few rounds give a poor training effect. The number of rounds is therefore set to 150, at which the model converges and no ineffective training is continued.
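The hyperparameters listed above can be collected in one place, together with a sketch of staged learning rate decay and mini-batch splitting. The values 0.5, 1e-3, 128, and 150 come from the text; the decay factor and the stage length are hypothetical, since the text does not state them, and the helper names are invented for illustration.

```python
DROPOUT_RATE = 0.5     # random deactivation rate from the text
INITIAL_LR = 1e-3      # initial learning rate from the text
BATCH_SIZE = 128       # mini-batch size from the text
EPOCHS = 150           # number of training rounds from the text

# Hypothetical values: the text does not give the decay schedule.
DECAY_FACTOR = 0.5
DECAY_EVERY = 50       # epochs per stage


def staged_learning_rate(epoch: int) -> float:
    """Learning rate for a given epoch, reduced once per completed stage."""
    return INITIAL_LR * (DECAY_FACTOR ** (epoch // DECAY_EVERY))


def mini_batches(samples, batch_size=BATCH_SIZE):
    """Yield consecutive slices of at most batch_size samples."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]
```

With these assumed settings the learning rate stays at 1e-3 for the first 50 epochs and is halved at each later stage; a deep learning framework's own scheduler would normally replace `staged_learning_rate`.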

Optionally, data preprocessing is required first in the training process of the knowledge extraction model: the manually labeled data is converted into "BIO" tags. The model is then trained on the loss function, which is shown below.
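The conversion of key-value labels into "BIO" tags can be sketched as follows. This is an assumption-laden illustration: tagging is done per character (a common choice for Chinese text), only the first occurrence of each value is tagged, and the function name `to_bio` and the tag format `B-<attribute>` / `I-<attribute>` are hypothetical.

```python
def to_bio(sentence: str, annotations: dict) -> list:
    """Convert {attribute: value} labels into one BIO tag per character."""
    tags = ["O"] * len(sentence)
    for attribute, value in annotations.items():
        if not value:
            continue
        start = sentence.find(value)
        if start == -1:
            continue  # value does not occur in the sentence, skip it
        tags[start] = "B-" + attribute
        for index in range(start + 1, start + len(value)):
            tags[index] = "I-" + attribute
    return tags
```

Characters that belong to no labeled value keep the tag `O`, so the tag sequence always has the same length as the sentence.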

The purpose of model training is to minimize the loss function, i.e., maximize the probability. The parameters after training are saved in the ckpt file.
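For reference, a loss commonly used with a BiLSTM + CRF tagger is the negative log-likelihood of the labeled tag sequence. The form below is that standard formulation, stated here as an assumption rather than as the formula of this application; the symbols are introduced only for this illustration: X is the input sentence of length n, y is its labeled tag sequence, P is the matrix of emission scores output by the BiLSTM, A is the CRF transition score matrix, and Y_X is the set of all possible tag sequences for X.

```latex
s(X, y) = \sum_{t=1}^{n} P_{t, y_t} + \sum_{t=1}^{n-1} A_{y_t, y_{t+1}}

p(y \mid X) = \frac{\exp\bigl(s(X, y)\bigr)}{\sum_{\tilde{y} \in Y_X} \exp\bigl(s(X, \tilde{y})\bigr)}

\mathcal{L} = -\log p(y \mid X) = \log \sum_{\tilde{y} \in Y_X} \exp\bigl(s(X, \tilde{y})\bigr) - s(X, y)
```

Under this formulation, minimizing the loss is the same as maximizing the probability p(y | X) of the labeled sequence, which matches the statement that minimizing the loss function maximizes the probability.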

Step 309, associating the target entities and the relationships between the target entities with the ontology graph of the emergency plan knowledge graph to construct the emergency plan knowledge graph.

Optionally, after the emergency plan knowledge graph is constructed, association analysis and association query can be performed on it. Because a knowledge graph is essentially a graph-based semantic network that stores the association relationships between entities, an association query is essentially a path retrieval over the entities in the graph, performed with depth-first search (DFS) and breadth-first search (BFS) algorithms. Association analysis covers various graph analysis algorithms, such as PageRank, cycle detection, shortest and longest paths, K-hop reachability queries, and community discovery.
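As an illustration of path retrieval, a K-hop reachability query can be answered with breadth-first search. The sketch below assumes the graph is held in memory as an adjacency dictionary, which is a simplification; a graph database would normally run such a query itself. The function name `k_hop_reachable` is hypothetical.

```python
from collections import deque


def k_hop_reachable(graph: dict, start, k: int) -> list:
    """Return the nodes reachable from start within k hops, in BFS order.

    graph maps each node to a list of neighbor nodes; start is excluded.
    """
    seen = {start}
    frontier = deque([(start, 0)])
    reached = []
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue  # do not expand beyond k hops
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                reached.append(neighbor)
                frontier.append((neighbor, depth + 1))
    return reached
```

Tracking visited nodes in `seen` keeps the search from looping when the graph contains cycles.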

Optionally, in a possible implementation manner, the emergency plan knowledge graph may further include an organization management module, and the computer device may, according to the company organization structure, expand a department name in the plan to "company name + department name" so as to enable association queries between companies of different levels. For example, "A company, B county power supply company emergency office" can be associated by query with "A company, C city power supply company emergency office".
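One way to picture the name expansion is shown below. This is a hypothetical sketch: the text does not specify how companies are linked, so the example assumes each department record carries the list of company names from the top level down, and treats departments as associated when they share the top-level company and the department name. Both function names are invented.

```python
def expand_department(company_path, department: str) -> str:
    """Build "company name + department name" from a company path."""
    return ", ".join(company_path) + " " + department


def associated_departments(records, company_path, department: str) -> list:
    """Find the same department under other companies of the same group.

    records: iterable of (company_path, department) pairs.
    """
    top_level = company_path[0]
    return [
        expand_department(path, name)
        for path, name in records
        if name == department and path[0] == top_level and path != company_path
    ]
```

With records for the emergency offices of "B county power supply company" and "C city power supply company" under "A company", a query for the former returns the expanded name of the latter.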

Optionally, the knowledge graph query may also use the Viterbi algorithm, which uses two matrices T1 and T2. T1 records the maximum probability of being in each hidden state at the current moment, where T1ij represents the maximum probability of being in hidden state i at the j-th moment (i.e., the j-th text character). T2 records which hidden state at the previous moment the maximum probability was transferred from, i.e., it records the transition path. Finally, the optimal path with the maximum probability is found by tracing back from the end, which yields the labels of the text sequence, and the resulting text sequence is taken as the query result.
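The two-matrix procedure can be sketched in plain Python. The function below is a generic Viterbi decoder, not code from this application: it assumes a non-empty observation sequence and takes start, transition, and emission probabilities as nested dictionaries, all of which are illustrative inputs.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (best state path, its probability) for the observations."""
    n = len(observations)
    # T1[i][j]: maximum probability of any path that is in state i at time j.
    # T2[i][j]: index of the previous state on that best path.
    T1 = [[0.0] * n for _ in states]
    T2 = [[0] * n for _ in states]
    for i, state in enumerate(states):
        T1[i][0] = start_p[state] * emit_p[state][observations[0]]
    for j in range(1, n):
        for i, state in enumerate(states):
            best_prev = max(
                range(len(states)),
                key=lambda k: T1[k][j - 1] * trans_p[states[k]][state],
            )
            T1[i][j] = (T1[best_prev][j - 1]
                        * trans_p[states[best_prev]][state]
                        * emit_p[state][observations[j]])
            T2[i][j] = best_prev
    # Trace back from the most probable final state.
    last = max(range(len(states)), key=lambda k: T1[k][n - 1])
    path = [last]
    for j in range(n - 1, 0, -1):
        path.append(T2[path[-1]][j])
    path.reverse()
    return [states[i] for i in path], T1[last][n - 1]
```

For long sequences, implementations usually work with log probabilities to avoid numerical underflow; plain products are used here to stay close to the description of T1 and T2.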

Optionally, after the emergency plan knowledge graph is constructed, it can be visualized. A knowledge graph processes complex information, through computation, into knowledge that can be represented in a structured way, and this knowledge can be displayed by drawing graphs, which provides a valuable reference for learning and makes information retrieval more convenient. Because the knowledge graph is stored and managed as a graph, it is usually displayed as a relationship graph.

Fig. 8 is a block diagram illustrating the structure of an emergency plan knowledge graph construction apparatus according to an exemplary embodiment. The apparatus comprises:

an ontology construction module 801, configured to construct a target knowledge graph ontology graph; the target knowledge graph ontology graph comprises entities and relations among the entities;

a plan text acquisition module 802, configured to acquire an emergency plan text;

a similarity matching module 803, configured to perform similarity matching between the emergency plan text and each candidate topic to obtain a matching degree, and determine, based on the matching degree, a target topic corresponding to the emergency plan text in each candidate topic;

the area dividing module 804 is configured to perform area division on the emergency plan text based on the target subject, and obtain a target text of a text area;

a knowledge extraction module 805, configured to perform knowledge extraction on the target text through a target knowledge extraction model, so as to obtain a target entity and a relationship between the target entities;

and a graph construction module 806, configured to associate the target entities and the relationships between the target entities with the ontology graph of the emergency plan knowledge graph, so as to construct the emergency plan knowledge graph.

and for each pair of the starting keyword and the ending keyword, acquiring a target text of a text region between the starting keyword and the ending keyword in a target subject.

In one possible implementation, the apparatus further includes:

The knowledge extraction module is also used for extracting the knowledge,

the device further comprises:

when a verb-object relationship exists in the target sentence and a subject-predicate relationship does not, completing the subject of the target sentence according to the candidate subject in the target paragraph;

In one possible implementation, the apparatus further includes:

Refer to fig. 9, which is a schematic diagram of a computer device according to an exemplary embodiment of the present application. The computer device includes a memory and a processor; the memory stores a computer program, and the computer program implements the above method when executed by the processor.

The processor may be a Central Processing Unit (CPU). The processor may also be another general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or a combination thereof.

The memory, which is a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules corresponding to the methods of the embodiments of the present invention. The processor executes various functional applications and data processing of the processor by executing non-transitory software programs, instructions and modules stored in the memory, that is, the method in the above method embodiment is realized.

The memory may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created by the processor, and the like. Further, the memory may include high speed random access memory, and may also include non-transitory memory, such as at least one disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory optionally includes memory located remotely from the processor, and such remote memory may be coupled to the processor via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

In an exemplary embodiment, a computer readable storage medium is also provided for storing at least one computer program, which is loaded and executed by a processor to implement all or part of the steps of the above method. For example, the computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a Compact Disc Read-Only Memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, and the like.

Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.

It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims

1. A construction method of an emergency plan knowledge graph is characterized by comprising the following steps:

acquiring an emergency plan text;

2. The method according to claim 1, wherein the matching degree is obtained by performing similarity matching between the emergency plan text and each candidate topic, and a target topic corresponding to the emergency plan text is determined in each candidate topic based on the matching degree, including:

3. The method according to claim 1 or 2, wherein the performing area division on the emergency plan text based on the target subject and obtaining the target text of a body area comprises:

4. The method of claim 3, wherein before the extracting knowledge of the target text by the target knowledge extraction model to obtain the target entities and the relationships between the target entities, the method further comprises:

performing word segmentation processing on the target text, and performing part-of-speech analysis on each obtained target word;

according to the part-of-speech relationship among the target words and punctuation marks in the target text, the target text is divided into sentences to obtain a target text after the sentences are divided;

the knowledge extraction of the target text through the target knowledge extraction model to obtain the target entity and the relationship between the target entities comprises the following steps:

5. The method according to claim 4, wherein the target text after the sentence division comprises at least one target sentence;

before extracting knowledge from the target text after the sentence division through a target knowledge extraction model to obtain the target entity and the relation between the target entities, the method further comprises the following steps:

for each target sentence, detecting part-of-speech relations among words in the target sentence;

and when the target sentence lacks a target part-of-speech relationship, performing completion operation on the target sentence according to the target part-of-speech relationship and the target paragraph in which the target sentence is located.

6. The method of claim 5, wherein when the target sentence lacks a target part-of-speech relationship, performing a completion operation on the target sentence according to the target part-of-speech relationship and a target paragraph in which the target sentence is located comprises:

7. The method according to claim 1 or 2, characterized in that the method further comprises:

acquiring a training plan text; the training plan text comprises entity marking information; the entity marking information is used for indicating sentences, entity categories and entity contents where the entities are located;

and extracting knowledge from the training plan text through an initial knowledge extraction model, and performing target times of iterative updating on the initial knowledge extraction model according to an extraction result and the entity labeling information to generate the target knowledge extraction model.

8. An emergency plan knowledge-graph construction apparatus, the apparatus comprising:

the plan text acquisition module is used for acquiring an emergency plan text;

9. A computer device comprising a processor and a memory, the memory having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, the at least one instruction, at least one program, set of codes, or set of instructions being loaded and executed by the processor to implement the emergency plan knowledge graph construction method of any one of claims 1 to 7.

10. A computer-readable storage medium having stored therein at least one instruction, which is loaded and executed by a processor to implement the emergency plan knowledge graph construction method according to any one of claims 1 to 7.