Disclosure of Invention
It is an object of the present invention to provide a method of constructing a knowledge-graph-based marine domain expert library that overcomes or at least alleviates at least one of the above-mentioned disadvantages of the prior art.
In one aspect of the invention, a method for constructing a knowledge-graph-based marine domain expert library is provided, the method comprising the following steps:
constructing a marine domain expert semantic model, wherein the semantic model comprises at least one keyword information group and association relationship information;
acquiring a marine information database;
extracting, according to each keyword information group, the information in the marine information database that is related to that keyword information group to serve as an extracted knowledge information base, wherein each keyword information group yields one extracted knowledge information base, and each extracted knowledge information base comprises at least one piece of knowledge information;
extracting, according to the association relationship information, the relationships between one or more pieces of knowledge information and other pieces of knowledge information, thereby generating a knowledge graph;
and generating a visual graph according to the knowledge graph.
Optionally, the method for constructing a knowledge graph-based marine domain expert library further includes:
generating a knowledge question-answering base according to the knowledge graph;
and performing man-machine interaction with the user according to the knowledge question-answering library.
Optionally, extracting, according to the keyword information groups, the information in the marine information database that is related to the keyword information to serve as extracted knowledge information bases, wherein each keyword information group yields one extracted knowledge information base and each extracted knowledge information base comprises at least one piece of knowledge information, comprises:
identifying the text or picture content in the marine information database;
and extracting, according to the keyword information groups, the information related to the keyword information from the identified marine information database to serve as the extracted knowledge information bases.
Optionally, the keyword information group includes:
a location keyword group, an organization keyword group, an academic achievement keyword group, a reference document keyword group, a research field keyword group, a thesis keyword group, a marine field news keyword group, an education experience keyword group, a work experience keyword group, and a name keyword group;
the extracted knowledge information bases correspond one-to-one to the keyword groups, and comprise a location knowledge information base extracted according to the location keyword group, an organization knowledge information base extracted according to the organization keyword group, an academic achievement knowledge information base extracted according to the academic achievement keyword group, a reference document knowledge information base extracted according to the reference document keyword group, a research field knowledge information base extracted according to the research field keyword group, and so on through a work experience knowledge information base extracted according to the work experience keyword group and a name knowledge information base extracted according to the name keyword group.
Optionally, identifying the text or picture content in the marine information database comprises:
identifying the text content in the marine information database by a Bi-LSTM-CRF algorithm and a vocabulary-based Bidirectional Maximum Matching (BMM) algorithm.
Optionally, extracting, according to the association relationship information, the relationships between one or more pieces of knowledge information and other pieces of knowledge information so as to generate the knowledge graph comprises: extracting the relationships between one or more pieces of knowledge information and other pieces of knowledge information using a PrTransH algorithm.
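As an illustration of the vocabulary-based bidirectional maximum matching mentioned above, the following is a minimal sketch only. The tiny lexicon, the sample sentence and the tie-breaking rule (prefer fewer words, then fewer single-character words) are assumptions chosen for the example; the application does not state them.

```python
def forward_max_match(text, vocab, max_len):
    """Scan left to right, always taking the longest lexicon word available."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:  # fall back to a single character
                words.append(piece)
                i += size
                break
    return words


def backward_max_match(text, vocab, max_len):
    """Scan right to left, always taking the longest lexicon word available."""
    words, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in vocab:
                words.append(piece)
                j -= size
                break
    return words[::-1]


def bidirectional_max_match(text, vocab):
    """Run both scans and keep the segmentation with the lower cost."""
    max_len = max(map(len, vocab))
    fwd = forward_max_match(text, vocab, max_len)
    bwd = backward_max_match(text, vocab, max_len)

    def cost(seg):
        # Assumed heuristic: fewer words first, then fewer single characters.
        return (len(seg), sum(1 for w in seg if len(w) == 1))

    return bwd if cost(bwd) < cost(fwd) else fwd


# Hypothetical lexicon and sentence, used only to show an ambiguous case.
vocab = {"研究", "研究生", "生命", "起源"}
segments = bidirectional_max_match("研究生命起源", vocab)
```

Here the forward scan leaves a stray single character, so the backward segmentation is kept.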
Optionally, the generating a visualization graph according to the knowledge graph comprises:
establishing a clustering model;
and inputting part of or all information in the knowledge graph to the clustering model so as to generate a visual clustering graph.
The application also provides a knowledge-graph-based marine domain expert library construction device, which comprises:
a marine domain expert semantic model construction module, used for constructing a marine domain expert semantic model, wherein the semantic model comprises at least one keyword information group and association relationship information;
a marine information database acquisition module, used for extracting, according to the keyword information groups, the information in the marine information database that is related to the keyword information groups to serve as extracted knowledge information bases, wherein each keyword information group yields one extracted knowledge information base, and each extracted knowledge information base comprises at least one piece of knowledge information;
a knowledge graph generation module, used for extracting, according to the association relationship information, the relationships between one or more pieces of knowledge information and other pieces of knowledge information so as to generate a knowledge graph;
and a visual graph generation module, used for generating a visual graph according to the knowledge graph.
Optionally, the knowledge-graph-based marine domain expert library construction device further comprises: a knowledge question-answering base generation module, used for generating a knowledge question-answering base according to the knowledge graph;
and a human-computer interaction module, used for performing human-computer interaction with the user according to the knowledge question-answering base.
The application also provides an electronic device, which comprises a memory, a processor and a computer program stored in the memory and capable of running on the processor, wherein the processor executes the computer program to realize the method for constructing the knowledge-graph-based marine domain expert library.
The present application also provides a computer-readable storage medium storing a computer program which, when executed by a processor, is capable of implementing the method for constructing a knowledge-graph-based marine domain expert library as described above.
Advantageous effects
The method of the present application constructs a knowledge graph of the marine domain, thereby providing relevant personnel with efficient knowledge search, laying a foundation for discovering association relationships among pieces of knowledge, and ultimately supporting marine professional knowledge services; the visual graph makes the result convenient for users to inspect.
Detailed Description
In order to make the implementation objects, technical solutions and advantages of the present application clearer, the technical solutions in the embodiments of the present application will be described in more detail below with reference to the drawings in the embodiments of the present application. In the drawings, the same or similar reference numerals denote the same or similar elements or elements having the same or similar functions throughout. The described embodiments are a subset of the embodiments in the present application and not all embodiments in the present application. The embodiments described below with reference to the drawings are exemplary and intended to be used for explaining the present application and should not be construed as limiting the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application. Embodiments of the present application will be described in detail below with reference to the accompanying drawings.
In the description of the present application, it is to be understood that the terms "central," "longitudinal," "lateral," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," and the like are used in the orientation or positional relationship indicated in the drawings for convenience in describing the present application and for simplicity in description, and are not intended to indicate or imply that the referenced devices or elements must have a particular orientation, be constructed in a particular orientation, and be operated in a particular manner and are not to be considered limiting of the scope of the present application.
Fig. 1 is a schematic flow chart of a method for constructing a knowledge-graph-based marine domain expert database according to a first embodiment of the present invention.
The method for constructing the marine field expert database based on the knowledge graph as shown in FIG. 1 comprises the following steps:
Step 1: constructing a marine field expert semantic model, wherein the marine field expert semantic model comprises at least one group of keyword information groups and incidence relation information;
Step 2: acquiring a marine information database;
Step 3: extracting information related to the keyword information groups in the marine information database as an extracted knowledge information base according to the keyword information groups, wherein one extracted knowledge information base can be extracted from one group of keyword information groups, and each extracted knowledge information base comprises at least one piece of knowledge information;
Step 4: extracting the relation between one or more knowledge information and other knowledge information according to the incidence relation information, thereby generating a knowledge graph;
Step 5: generating a visual graph according to the knowledge graph.
The method of the present application constructs a knowledge graph of the marine domain, thereby providing relevant personnel with efficient knowledge search, laying a foundation for discovering association relationships among pieces of knowledge, and ultimately supporting marine professional knowledge services; the visual graph makes the result convenient for users to inspect.
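The five steps of the method can be sketched end to end as follows. This is a minimal illustration under stated assumptions: plain substring matching and document-level co-occurrence stand in for the learned recognition and relation extraction described later in the application, and all names (`SemanticModel`, `works_at`, the sample documents) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SemanticModel:
    """Step 1: keyword groups plus association relationship information."""
    keyword_groups: dict   # group name -> list of keywords
    relation_rules: list   # (head group, relation name, tail group)


def extract_knowledge_bases(model, documents):
    """Step 3: build one extracted knowledge base per keyword group."""
    bases = {group: set() for group in model.keyword_groups}
    for doc in documents:
        for group, keywords in model.keyword_groups.items():
            bases[group].update(k for k in keywords if k in doc)
    return bases


def build_knowledge_graph(model, documents, bases):
    """Step 4: link items of related groups that occur in the same document."""
    triples = set()
    for doc in documents:
        for head_group, relation, tail_group in model.relation_rules:
            for head in bases[head_group]:
                for tail in bases[tail_group]:
                    if head in doc and tail in doc:
                        triples.add((head, relation, tail))
    return triples


def to_visual_graph(triples):
    """Step 5: turn triples into a node list and an edge list for display."""
    nodes = sorted({e for h, _, t in triples for e in (h, t)})
    edges = [{"source": h, "target": t, "label": r} for h, r, t in sorted(triples)]
    return {"nodes": nodes, "edges": edges}


# Hypothetical semantic model (step 1) and marine information database (step 2).
model = SemanticModel(
    keyword_groups={"name": ["Expert A", "Expert B"],
                    "organization": ["Institute X"]},
    relation_rules=[("name", "works_at", "organization")],
)
documents = ["Expert A is a researcher at Institute X.",
             "Expert B studies marine sensors."]

bases = extract_knowledge_bases(model, documents)
triples = build_knowledge_graph(model, documents, bases)
graph = to_visual_graph(triples)
```

Keeping the semantic model as data separate from the extraction code means keyword groups or relation rules can be added without touching the pipeline itself.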
In this embodiment, the method for constructing the marine domain expert database based on the knowledge graph further includes:
Step 6: generating a knowledge question-answering base according to the knowledge graph;
Step 7: performing man-machine interaction with the user according to the knowledge question-answering library.
In the method for constructing the knowledge-graph-based marine domain expert library of the present application, all knowledge information is first acquired; named entity recognition, relation extraction and attribute extraction are then performed by machine learning algorithms, followed by data fusion and data storage of the knowledge information. The knowledge graph is generated from the knowledge information and the association relationship information, achieving effective integration of marine domain expert knowledge, and the visual graph and intelligent question answering are implemented on the basis of the constructed knowledge graph.
In this embodiment, extracting, according to the keyword information groups, the information in the marine information database that is related to the keyword information to serve as extracted knowledge information bases, wherein each keyword information group yields one extracted knowledge information base and each extracted knowledge information base comprises at least one piece of knowledge information, comprises:
identifying the text or picture content in the marine information database;
and extracting, according to the keyword information groups, the information related to the keyword information from the identified marine information database to serve as the extracted knowledge information bases.
In this embodiment, the keyword information group includes:
a location keyword group, an organization keyword group, an academic achievement keyword group, a reference document keyword group, a research field keyword group, a thesis keyword group, a marine field news keyword group, an education experience keyword group, a work experience keyword group, and a name keyword group;
the extracted knowledge information bases correspond one-to-one to the keyword groups, and comprise a location knowledge information base extracted according to the location keyword group, an organization knowledge information base extracted according to the organization keyword group, an academic achievement knowledge information base extracted according to the academic achievement keyword group, a reference document knowledge information base extracted according to the reference document keyword group, a research field knowledge information base extracted according to the research field keyword group, and so on through a work experience knowledge information base extracted according to the work experience keyword group and a name knowledge information base extracted according to the name keyword group.
For example, a user may need to know detailed information about expert professors in the field of marine biological big data; the main content of expert information collection comprises three parts: basic information, achievement information and contact information. The basic information comprises the expert's name, gender, date of birth, educational background, academic degree, professional technical title and the like; the achievement information comprises the expert's papers, published works, patents, projects undertaken and the like.
As another example, a user may need to know which experts in the field of marine sensors can provide domain knowledge, and the names of the specific institutions concerned.
The method for acquiring marine domain expert information specifically comprises the following steps:
Step 1: constructing an expert semantic model, wherein the expert semantic model comprises keyword information and association relationship information. For example, the keyword information includes name (e.g., the names of well-known scholars may be added), gender (male, female), date of birth (e.g., numeric and textual information ordered by year and month), educational background (e.g., bachelor's, master's, doctorate), professional title (e.g., researcher, etc.), papers, patents (e.g., marked with a "ZL" number, or as an invention, utility model, etc.), address, contact telephone number, and e-mail address.
Step 2: acquiring expert information according to the semantic model, wherein the acquired content mainly comprises three parts: basic information, achievement information and contact information. The basic information comprises the expert's name, gender, date of birth, educational background, academic degree, professional technical title and the like; the achievement information comprises the expert's papers, published works, patents, projects undertaken and the like; the contact information comprises the expert's mailing address, contact telephone number, e-mail address and the like.
The construction of the expert knowledge graph is the core part of the application. It starts by abstracting the entities, such as experts and patents, together with their attributes and the relationships between them. From the analysis of expert attributes and of the relationships between experts, the schema diagram of the system is essentially determined: experts are the principal subjects of documents, patents, news and projects; documents, patents, news and projects belong to experts; relationships such as colleague and cooperation exist among experts; and establishing the relationship between every pair of experts yields the expert graph network. In the relational model diagram, for example, an expert is an entity and a patent is an entity; the expert owns the patent and the owner of the patent is the expert; and the patent entity has its own attributes, such as the author, content, organization and time of the patent. For example, if the author attribute of a document lists expert 1 and expert 2, the two are in a cooperative relationship; if the organization names in the basic information of expert 1 and expert 2 are the same, the two are colleagues; if the subjects of two patents are the same, their authors, expert 1 and expert 2, are related through that shared subject; and so on.
Step 3: generating a visual graph according to the association graph information or the expert information. Specifically, in one embodiment, a visual graph such as a star graph or a relationship graph may be produced with graphing software.
It will be appreciated that such graphing is based on the attributes and relationships described above.
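The attribute-based relationship rules described above (shared authorship implies cooperation, a shared organization name implies a colleague relationship) can be sketched as follows. The record layout and the relation names `cooperates_with` and `colleague_of` are assumptions made for the example, not part of the application.

```python
from itertools import combinations


def infer_relations(experts, documents):
    """Return (expert, relation, expert) triples derived from shared attributes."""
    triples = set()
    # Rule 1: experts listed as authors of the same document cooperate.
    for doc in documents:
        for a, b in combinations(sorted(doc["authors"]), 2):
            triples.add((a, "cooperates_with", b))
    # Rule 2: experts whose organization names match are colleagues.
    for a, b in combinations(sorted(experts), 2):
        if experts[a]["organization"] == experts[b]["organization"]:
            triples.add((a, "colleague_of", b))
    return triples


# Hypothetical expert records and one hypothetical document.
experts = {
    "Expert 1": {"organization": "Institute X"},
    "Expert 2": {"organization": "Institute X"},
    "Expert 3": {"organization": "Institute Y"},
}
documents = [{"title": "Paper A", "authors": ["Expert 1", "Expert 3"]}]

relations = infer_relations(experts, documents)
```

Sorting the names before pairing keeps each undirected relation in a single canonical direction, so the same pair is never stored twice.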
In this embodiment, identifying the text or picture content in the marine information database comprises identifying the text content in the marine information database by a Bi-LSTM-CRF algorithm and a vocabulary-based Bidirectional Maximum Matching (BMM) algorithm. Specifically, the text content in the marine information database is identified based on the BiLSTM+CRF algorithm (FIG. 5). A BiLSTM predicts, for each word, the probability that it belongs to each label, and Softmax then selects the label with the highest probability as the prediction for that position. Because each position is predicted independently, a BiLSTM alone ignores the dependencies between labels. BiLSTM+CRF therefore adds a CRF layer on top of the BiLSTM output so that the model can take the correlation between labels into account; this correlation is the transition matrix of the CRF, which represents the probability of moving from one label state to another. BiLSTM+CRF scores the whole label path rather than each label individually, so adding the CRF to the BiLSTM output layer makes the identification of marine domain experts more accurate.
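The difference between per-position prediction and whole-path scoring can be illustrated with a small Viterbi decoder over emission scores and a transition matrix. This is a sketch only: the scores below are made up rather than produced by a trained BiLSTM, and the three-label set is an assumption for the example.

```python
def viterbi_decode(emissions, transitions):
    """Return the label index sequence with the highest total path score.

    emissions[t][k]   : score of label k at position t (e.g. a BiLSTM output)
    transitions[j][k] : score of moving from label j to label k (CRF matrix)
    """
    n_labels = len(transitions)
    score = list(emissions[0])
    backpointers = []
    for emit in emissions[1:]:
        new_score, pointers = [], []
        for k in range(n_labels):
            # Best previous label for ending at label k in this position.
            best_prev = max(range(n_labels),
                            key=lambda j: score[j] + transitions[j][k])
            pointers.append(best_prev)
            new_score.append(score[best_prev] + transitions[best_prev][k] + emit[k])
        score = new_score
        backpointers.append(pointers)
    # Trace the best path back from the best final label.
    best = max(range(n_labels), key=lambda k: score[k])
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]
        path.append(best)
    return path[::-1]


labels = ["O", "B-PER", "I-PER"]          # hypothetical label set
emissions = [[2.0, 1.8, 0.1],             # made-up per-token scores
             [0.5, 0.2, 1.0],
             [1.5, 0.1, 0.3]]
transitions = [[0.0, 0.0, -10.0],         # O -> I-PER is heavily penalised
               [0.0, -1.0, 1.0],
               [0.0, 0.0, 0.5]]

greedy = [max(range(3), key=lambda k: row[k]) for row in emissions]
best_path = viterbi_decode(emissions, transitions)
```

With these numbers the per-position choice is O, I-PER, O, which starts an entity with an inside tag, while the path-level choice is B-PER, I-PER, O.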
In this embodiment, extracting, according to the association relationship information, the relationships between one or more pieces of knowledge information and other pieces of knowledge information so as to generate the knowledge graph comprises: extracting the relationships between one or more pieces of knowledge information and other pieces of knowledge information using the Bi-LSTM-CRF algorithm.
In this embodiment, the generating a visualization graph according to the knowledge graph includes:
establishing a clustering model;
and inputting part of or all information in the knowledge graph to the clustering model so as to generate a visual clustering graph.
The application first constructs entities with an improved entity construction method and identifies entities, attributes and relations with named entity recognition (NER) based on the Bi-LSTM-CRF algorithm, achieving efficient and accurate acquisition and extraction from the marine information database. On the basis of the established knowledge base, the marine expert knowledge base is visualized using ECharts; on top of this visualization, natural-language questions are converted into the Cypher query language of the Neo4j graph database by corpus matching, so that knowledge in the marine domain expert knowledge base can be queried.
The constructed marine domain expert knowledge graph is centered on the expert entity, so data extraction must take into account each expert's research field, research direction and achievements, and the provinces and research institutions involved in marine research are used as the organizing structure for classifying information. Accordingly, an expert list and the corresponding URL list are crawled from vertical-domain websites; the listed URLs are then visited in turn, and the page content is parsed to obtain the province and city, the organization and the academic achievements of each marine expert, locating in turn the original information to be acquired.
The original marine expert data obtained in this way are stored in a Neo4j database. Owing to the particularity of marine domain expert data, entity recognition must be performed on them; it is carried out using the Bi-LSTM-CRF algorithm and the vocabulary-based Bidirectional Maximum Matching (BMM) algorithm, and the entity recognition model is evaluated by its precision, recall and F1 value. The precision is P = TP/(TP + FP), the recall is R = TP/(TP + FN), and the F1 value is F1 = 2PR/(P + R), where TP, FP and FN denote the numbers of true positives, false positives and false negatives, respectively.
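The three evaluation measures can be computed from sets of gold and predicted entities as in the sketch below. Representing an entity as a (start, end, type) tuple and requiring an exact match are assumptions made for the example; the sample entities are made up.

```python
def precision_recall_f1(gold, predicted):
    """Compute precision, recall and F1 over exactly matching entities."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)      # predicted and correct
    fp = len(predicted - gold)      # predicted but wrong
    fn = len(gold - predicted)      # missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Hypothetical entities as (start, end, type) spans.
gold = {(0, 2, "PER"), (5, 7, "ORG"), (9, 10, "LOC")}
predicted = {(0, 2, "PER"), (5, 7, "LOC")}

p, r, f1 = precision_recall_f1(gold, predicted)
```

In this example one of two predictions is correct and one of three gold entities is found, so precision is 1/2, recall is 1/3 and F1 is 0.4.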
Relation extraction for marine domain experts: the purpose of relation extraction is to extract triples, each consisting of two entities and the relation between them. In this application nine relations are extracted, namely the relations between a marine domain expert and, respectively, a region, an organization, a research field, a reference document in the field, a thesis (as its author), marine field news, education experience, work experience, and awards received.
The relation extraction in this application uses the Bi-LSTM-CRF algorithm.
A clustering model is established for the expert information. To verify the effectiveness of the trained embedding vectors, they are used to cluster the experts: DBSCAN clusters 200 experts in the embedding space, and the embedding vectors are then projected into a two-dimensional space to visualize the expert clusters. As shown in FIG. 2, experts in the same field are grouped into the same cluster while experts in different fields fall into different clusters, which shows that the learned graph embedding vectors can serve as semantic representations of experts. In FIG. 2, four typical clusters are circled and labeled with the corresponding marine expert field categories.
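The density-based clustering step can be sketched with a small pure-Python DBSCAN. The points below are made-up two-dimensional stand-ins for expert embedding vectors, and the `eps` and `min_pts` values are assumptions; the projection to two dimensions is not shown, and in practice a library implementation would normally be used.

```python
import math


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    labels = [None] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1           # provisional noise, may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            more = neighbors(j)
            if len(more) >= min_pts:  # only core points expand the cluster
                queue.extend(more)
    return labels


# Hypothetical 2-D embeddings: two dense groups and one outlier.
points = [(0, 0), (0, 1), (1, 0),
          (10, 10), (10, 11), (11, 10),
          (50, 50)]
cluster_labels = dbscan(points, eps=1.5, min_pts=3)
```

DBSCAN does not need the number of clusters in advance and marks isolated experts as noise, which suits an embedding space whose number of research fields is not known beforehand.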
After the marine domain knowledge information has been extracted, the marine domain expert information needs to be displayed visually. The knowledge graph is visualized using ECharts, and the resulting display is shown in FIG. 3.
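One possible way to hand the knowledge graph to an ECharts front end is to serialize the triples as the option of a `graph` series, as sketched below. The entity names and category names are hypothetical, and the `relation` field on each link is an extra field assumed for this example rather than a documented ECharts property; how it is rendered is left to the front-end configuration.

```python
import json


def triples_to_echarts_option(triples, categories):
    """Build an option dict for an ECharts `graph` series.

    triples    : iterable of (head, relation, tail)
    categories : entity name -> category name (e.g. "expert", "organization")
    """
    category_names = sorted(set(categories.values()))
    names = sorted({e for h, _, t in triples for e in (h, t)})
    nodes = [{"name": n, "category": category_names.index(categories[n])}
             for n in names]
    # "relation" is a custom field carried along for the front end to use.
    links = [{"source": h, "target": t, "relation": r}
             for h, r, t in sorted(triples)]
    return {"series": [{
        "type": "graph",
        "layout": "force",
        "roam": True,
        "label": {"show": True},
        "categories": [{"name": c} for c in category_names],
        "data": nodes,
        "links": links,
    }]}


# Hypothetical triple and entity categories.
triples = [("Expert 1", "works_at", "Institute X")]
categories = {"Expert 1": "expert", "Institute X": "organization"}

option = triples_to_echarts_option(triples, categories)
option_json = json.dumps(option, ensure_ascii=False)
```

The resulting JSON string can be passed to the chart instance on the page; grouping nodes by category lets experts and organizations be drawn in different colours.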
Referring to FIG. 4, once a complete marine domain expert library has been constructed for a professional institution, a human-computer interaction interface needs to be built according to user requirements. Because question answering in natural language is the way people habitually communicate, a Chinese question-answering model based on the marine domain expert knowledge graph is constructed using machine learning techniques.
The application processes natural-language questions with word vectors: a question is first converted into a vector sequence by word embeddings, attribute linking is then performed by an entity alignment method based on a traditional probabilistic model, and an expert knowledge corpus is finally obtained. Natural-language questions are converted into the Cypher query language of the Neo4j graph database by corpus matching, the query is run against the marine domain expert knowledge graph, and the query result is returned to the user in visual form, realizing information retrieval and result display for marine experts. The invention improves knowledge reasoning in the marine domain and provides a human-computer interaction service for users.
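The conversion of a natural-language question into a Cypher query by matching against a corpus of question patterns can be sketched as follows. The English templates, the node labels (`Expert`, `Organization`, `Paper`) and the relationship types are hypothetical; the application targets Chinese questions and does not publish its schema. The sketch only builds the query text and its parameters and does not connect to Neo4j.

```python
import re

# Each entry pairs a question pattern with a parameterised Cypher query.
QUESTION_TEMPLATES = [
    (re.compile(r"^which experts work at (?P<org>.+?)\??$", re.IGNORECASE),
     "MATCH (e:Expert)-[:WORKS_AT]->(o:Organization {name: $org}) "
     "RETURN e.name"),
    (re.compile(r"^what papers did (?P<name>.+?) write\??$", re.IGNORECASE),
     "MATCH (e:Expert {name: $name})-[:AUTHOR_OF]->(p:Paper) "
     "RETURN p.title"),
]


def question_to_cypher(question):
    """Return (cypher, parameters) for the first matching template, else None."""
    for pattern, cypher in QUESTION_TEMPLATES:
        match = pattern.match(question.strip())
        if match:
            return cypher, match.groupdict()
    return None


result = question_to_cypher("Which experts work at Institute X?")
unmatched = question_to_cypher("How deep is the ocean?")
```

Passing the extracted entity as a query parameter (`$org`, `$name`) rather than splicing it into the query text keeps user input from altering the query structure.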
In summary, the invention establishes a process for constructing a marine expert knowledge graph from a large amount of scattered network data, and verifies, through the domain clustering of the marine expert knowledge graph, the effectiveness of using embedding vectors as semantic representations of entities. Combined with a naive Bayes method, the application infers the field in which a marine domain expert works and recommends related collaborators and papers. The application can also be applied to searching and indexing marine experts' papers and team members. In addition, the embedding vectors are fed into a neural network to transfer knowledge, realizing the application of neural networks to knowledge graphs.
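The naive Bayes inference of an expert's field can be sketched as a small multinomial classifier over keywords with add-one smoothing. The training samples, keywords and field names below are made up for illustration, and the smoothing choice is an assumption; the application does not specify its features or training data.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(samples):
    """samples: list of (keywords, field). Return the counts used for prediction."""
    field_counts = Counter(field for _, field in samples)
    word_counts = defaultdict(Counter)
    for keywords, field in samples:
        word_counts[field].update(keywords)
    vocab = {w for keywords, _ in samples for w in keywords}
    return field_counts, word_counts, vocab


def predict_field(model, keywords):
    """Return the field with the highest log-posterior for the given keywords."""
    field_counts, word_counts, vocab = model
    total = sum(field_counts.values())
    best_field, best_score = None, -math.inf
    for field, count in sorted(field_counts.items()):
        denom = sum(word_counts[field].values()) + len(vocab)
        score = math.log(count / total)           # log prior
        for w in keywords:
            if w in vocab:                        # skip words unseen in training
                score += math.log((word_counts[field][w] + 1) / denom)
        if score > best_score:
            best_field, best_score = field, score
    return best_field


# Hypothetical training data: keywords from experts' papers and their fields.
samples = [
    (["sensor", "buoy", "signal"], "marine sensors"),
    (["sensor", "calibration"], "marine sensors"),
    (["plankton", "genome"], "marine biology"),
    (["genome", "sequencing", "coral"], "marine biology"),
]
model = train_naive_bayes(samples)
field_a = predict_field(model, ["buoy", "sensor"])
field_b = predict_field(model, ["coral", "genome"])
```

Working in log space avoids numerical underflow when an expert has many keywords, and the add-one smoothing keeps a single unseen keyword from ruling a field out.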
The application also provides a knowledge graph-based marine field expert database construction device, which comprises a marine field expert semantic model construction module, a marine information database acquisition module, a knowledge graph generation module and a visual graph generation module, wherein,
the ocean field expert semantic model building module is used for building an ocean field expert semantic model, and the ocean field expert semantic model comprises at least one group of keyword information groups and incidence relation information;
the marine information database acquisition module is used for extracting information related to the keyword information groups in the marine information database according to the keyword information groups to serve as an extracted knowledge information base, wherein one extracted knowledge information base can be extracted from one group of keyword information groups, and each extracted knowledge information base comprises at least one piece of knowledge information;
the knowledge graph generating module is used for extracting the relation between one or more knowledge information and other knowledge information according to the incidence relation information so as to generate a knowledge graph;
and the visual map generation module is used for generating a visual map according to the knowledge map.
In this embodiment, the apparatus for constructing an ocean domain expert base based on a knowledge graph further comprises a knowledge question and answer base generating module and a human-computer interaction module, wherein the knowledge question and answer base generating module is used for generating a knowledge question and answer base according to the knowledge graph; and the human-computer interaction module is used for performing human-computer interaction with the user according to the knowledge question-answering base.
It should be noted that the foregoing explanations of the method embodiments also apply to the apparatus of this embodiment, and are not repeated herein.
The application also provides an electronic device, which comprises a memory, a processor and a computer program stored in the memory and capable of running on the processor, wherein the processor executes the computer program to realize the above method for constructing the knowledge-graph-based marine domain expert library.
For example, an electronic device includes an input device, an input interface, a central processing unit, a memory, an output interface, and an output device. The input interface, the central processing unit, the memory and the output interface are mutually connected through a bus, and the input equipment and the output equipment are respectively connected with the bus through the input interface and the output interface and further connected with other components of the computing equipment. Specifically, the input device receives input information from the outside and transmits the input information to the central processing unit through the input interface; the central processing unit processes the input information based on the computer executable instructions stored in the memory to generate output information, temporarily or permanently stores the output information in the memory, and then transmits the output information to the output device through the output interface; the output device outputs the output information to an exterior of the computing device for use by a user.
The application also provides a computer readable storage medium, which stores a computer program, and the computer program can realize the above method for constructing the ocean domain expert database based on the knowledge graph when being executed by a processor.
Although the present application has been described with reference to the preferred embodiments, it is not intended to limit the present application, and those skilled in the art can make variations and modifications without departing from the spirit and scope of the present application.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as Random Access Memory (RAM), and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include both permanent and non-permanent, removable and non-removable media that implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Furthermore, it will be obvious that the term "comprising" does not exclude other elements or steps. A plurality of units, modules or devices recited in the device claims may also be implemented by a single unit or device through software or hardware. The terms first, second, etc. are used to denote names and do not indicate any particular order.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The processor in this embodiment may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory may be used to store computer programs and/or modules, and the processor may implement various functions of the apparatus/terminal device by running or executing the computer programs and/or modules stored in the memory, as well as by invoking data stored in the memory. The memory may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the data storage area may store data (such as audio data, a phonebook, etc.) created according to the use of the terminal device, and the like. In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, an internal memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash memory card (Flash Card), at least one magnetic disk storage device, a flash memory device, or other non-volatile solid-state storage device.
In this embodiment, the module/unit integrated with the apparatus/terminal device may be stored in a computer-readable storage medium if it is implemented in the form of a software functional unit and sold or used as a separate product. Based on such understanding, all or part of the flow in the method according to the embodiments of the present invention may also be implemented by a computer program instructing related hardware, where the computer program may be stored in a computer-readable storage medium, and when the computer program is executed by a processor, it may implement the steps of the embodiments of the method. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), electrical carrier wave signals, telecommunications signals, software distribution media, and the like. It should be noted that the content contained in the computer-readable medium may be appropriately increased or decreased as required by legislation and patent practice in the jurisdiction.
Although the invention has been described in detail hereinabove with respect to a general description and specific embodiments thereof, it will be apparent to those skilled in the art that modifications or improvements may be made thereto based on the invention. Accordingly, such modifications and improvements are intended to be within the scope of the invention as claimed.