Disclosure of Invention
(I) Technical problem to be solved
In view of the technical problems in the prior art, the invention provides a knowledge base system based on a multi-source knowledge acquisition technology, a construction method thereof, an electronic device and a storage medium, so as to at least partially solve these technical problems.
(II) Technical solution
The invention provides a knowledge base system construction method based on a multi-source knowledge acquisition technology, comprising the following steps: crawling open-source knowledge from a first data source using a distributed crawler technique; obtaining expert knowledge from a second data source via a structured questionnaire; unifying the terms and concepts in the open-source knowledge and the expert knowledge, and unifying their data formats, to obtain standardized knowledge; and classifying the standardized knowledge by level, creating instances according to the professional domain and/or important terms of the standardized knowledge, constructing a domain ontology of the knowledge base system, and constructing a parameter template according to the domain ontology to obtain the knowledge base system.
Optionally, the open-source knowledge is acquired using a distributed crawler technology based on a distributed cloud architecture, the underlying infrastructure of which is built on a Docker container cloud cluster.
Optionally, a master-slave distributed crawler model provides a URL distribution service for acquiring the open-source knowledge, enabling multi-pattern structured data extraction from web pages.
Optionally, the experts are assisted in collective discussion by a brainwriting method or an electronic brainstorming method, and second data are collected from the second data source; the second data are then collated by the Delphi method or the nominal group technique to obtain the expert knowledge.
Optionally, the domain ontology is constructed using the Protégé ontology construction tool.
Optionally, the method further comprises maintaining the knowledge base system, including describing parameter information and/or adding, modifying or hiding parameters and/or adjusting the parameter order and/or applying parameter templates, as well as creating, editing, deleting and/or hierarchically managing classifications.
Optionally, the first data source comprises web pages and/or databases, and the second data source comprises experience summarized by experts in the target knowledge domain.
The invention also provides a knowledge base system based on the multi-source knowledge acquisition technology, comprising: an open-source knowledge acquisition module, configured to crawl open-source knowledge from a first data source using a distributed crawler technology; an expert knowledge acquisition module, configured to acquire expert knowledge from a second data source via a structured questionnaire; a knowledge standardization module, configured to unify the terms and concepts in the open-source knowledge and the expert knowledge and unify their data formats to obtain standardized knowledge; and a knowledge base construction and maintenance module, configured to classify the standardized knowledge by level, create instances according to the professional domain and/or important terms of the standardized knowledge, construct a domain ontology of the knowledge base system, construct a parameter template according to the domain ontology to obtain the knowledge base system, and maintain the knowledge base system.
The present invention also provides an electronic device comprising: one or more processors; and a memory for storing one or more programs, wherein when the one or more programs are executed by the one or more processors, the one or more processors implement the above-mentioned knowledge base system construction method based on the multi-source knowledge acquisition technology according to the embodiment of the present invention.
The invention also provides a computer-readable storage medium, which stores computer-executable instructions, and the instructions are used for implementing the knowledge base system construction method based on the multi-source knowledge acquisition technology according to the embodiment of the invention.
(III) Advantageous effects
The invention provides a knowledge base system based on a multi-source knowledge acquisition technology and a construction method thereof, which acquire open-source knowledge and expert knowledge using a distributed crawler technology and a structured questionnaire, respectively; unify the terms and concepts in the open-source knowledge and the expert knowledge, and unify their data formats, to obtain standardized knowledge; and classify the standardized knowledge by level, create instances according to the professional domain and/or important terms of the standardized knowledge, build a domain ontology of the knowledge base system, and build parameter templates according to the domain ontology, thereby constructing the knowledge base system.
Building on the standardization of knowledge, the construction of a domain ontology, and the construction and maintenance of parameter templates, the knowledge base system provided by the invention achieves automatic induction of knowledge. By collecting and managing multi-source target knowledge, including open-source knowledge and expert knowledge, in a unified way, it achieves unified verification and standardized construction, which facilitates third-party output and exchange of knowledge and greatly improves the availability of knowledge.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to specific embodiments and the accompanying drawings.
It should be noted that in the drawings or description, the same drawing reference numerals are used for similar or identical parts. Features of the embodiments illustrated in the description may be freely combined to form new embodiments without conflict, and each claim may be individually referred to as an embodiment or features of the claims may be combined to form a new embodiment, and in the drawings, the shape or thickness of the embodiment may be enlarged and simplified or conveniently indicated. Further, elements or implementations not shown or described in the drawings are of a form known to those of ordinary skill in the art. Additionally, while exemplifications of parameters including particular values may be provided herein, it is to be understood that the parameters need not be exactly equal to the respective values, but may be approximated to the respective values within acceptable error margins or design constraints.
Unless a technical obstacle or contradiction exists, the above-described various embodiments of the present invention may be freely combined to form further embodiments, which are within the scope of the present invention.
Although the present invention has been described in connection with the accompanying drawings, the embodiments disclosed in the drawings are intended to be illustrative of preferred embodiments of the present invention and should not be construed as limiting the invention. The dimensional proportions in the figures are merely schematic and are not to be understood as limiting the invention.
Although a few embodiments of the present general inventive concept have been shown and described, it would be appreciated by those skilled in the art that changes may be made in these embodiments without departing from the principles and spirit of the general inventive concept, the scope of which is defined in the claims and their equivalents.
Fig. 1 schematically shows a flow chart of a knowledge base system construction method based on a multi-source knowledge acquisition technology according to an embodiment of the present invention. As shown in Fig. 1, the method includes:
S101: crawling open-source knowledge from a first data source using a distributed crawler technology, and acquiring expert knowledge from a second data source through a structured questionnaire.
According to an embodiment of the invention, the target knowledge mainly comprises two parts: open-source network data, including for example military forums, military weapons databases and related communities spontaneously organized by experts; and knowledge summarized from experience by experts such as military enthusiasts and researchers. The open-source knowledge may be obtained, for example, with a distributed crawler technology based on a distributed cloud architecture, whose underlying infrastructure is built on a Docker container cloud cluster instead of virtual machines; container cloud technology enables rapid deployment, easy operation and maintenance, and elastic scaling of the crawler service. For example, a master-slave distributed crawler model can provide the URL distribution service for acquiring open-source knowledge and achieve multi-pattern structured data extraction from web pages; the processing work is distributed to the slave nodes, which relieves the load on the master node and achieves load balancing.
According to the embodiment of the invention, an open-source target knowledge acquisition technology based on a distributed cloud architecture is adopted: automated distributed deployment and operation are applied to a large-scale data acquisition and analysis system, offering simple visual operation and elastic scaling of resources such as computing and storage. Distributed task distribution uses a master-slave distributed crawler model to provide the URL distribution service. As shown in Fig. 2, to keep the load balanced among the fetch service nodes, tasks are distributed by consistent hashing: each fetch service is responsible for downloading the URLs in one segment of the hash ring, and when a service node becomes abnormal, the tasks in its address segment are handed to the next service node found clockwise on the ring. Consistent hashing satisfies the balance, monotonicity, spread and load properties required in a distributed system. As shown in Fig. 3, the crawler network indexes URLs with an MD5 deduplication tree, so the MD5 value already computed for deduplication can be used directly as the consistent hash value when URLs are distributed to services, thereby combining deduplication and distribution.
According to embodiments of the present invention, expert-oriented acquisition of, for example, military knowledge requires the support of structured forms. The questionnaire follows certain principles of rationality, enabling the collection and partial collation of, for example, military expert knowledge. For expert knowledge collection, a brainwriting method and an electronic brainstorming method are adopted. In brainwriting, ideas are written on paper media such as cards and passed around and shared among the members; typical variants include the silent-writing brainstorming method and the card brainstorming method. Electronic brainstorming uses a network for communication: group members can view the opinions of other members in real time and post their own opinions without interference. The Delphi method and the nominal group technique are used to collate the expert knowledge. The Delphi method is an expert questionnaire consultation method characterized mainly by anonymity and multiple rounds of iterative feedback. The keys to implementing it are to word the questionnaire unambiguously, to avoid asking too many questions in any one round, and to group similar opinions among those collected and count how many opinions fall into each group. The nominal group technique restricts discussion and interpersonal communication among group members during decision-making so that each member forms independent ideas.
According to the embodiment of the invention, for example, construction of an aviation target knowledge base system includes knowledge acquisition, which is divided into public knowledge acquisition and expert knowledge acquisition. Public knowledge acquisition mainly consists of crawling open-source knowledge such as aircraft models, aircraft structural characteristics and aircraft carrying capacity from aviation websites. Expert knowledge is obtained through a structured questionnaire whose questions cover professional knowledge such as aircraft performance indexes and the advantages and shortcomings of each aircraft.
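The Delphi collation step described above (anonymous answers, grouping similar opinions, counting each group, feeding the result back over several rounds) can be sketched as below. This is only an illustration under assumptions: the similarity rule in `similarity_key` is deliberately crude, and the consensus threshold used to stop iterating is a hypothetical stopping rule that the description does not specify.

```python
import re
from collections import Counter


def similarity_key(opinion: str) -> str:
    """Crude grouping rule (assumption): case-fold and strip punctuation and extra spaces."""
    return re.sub(r"[\W_]+", " ", opinion.casefold()).strip()


def summarize_round(responses: dict) -> list:
    """Group similar opinions from one anonymous round and count each group.

    `responses` maps an anonymous respondent code to a free-text opinion. Only the
    opinions are kept, so the feedback returned to the panel carries no identities.
    """
    counts = Counter(similarity_key(text) for text in responses.values())
    return counts.most_common()       # [(opinion group, count), ...], largest first


def has_consensus(summary: list, total: int, threshold: float = 0.75) -> bool:
    """Hypothetical stopping rule: the largest group holds at least `threshold` of answers."""
    return bool(summary) and total > 0 and summary[0][1] / total >= threshold


def run_delphi(rounds: list, threshold: float = 0.75):
    """Process successive rounds; stop at the first round that reaches consensus."""
    summary = []
    for number, responses in enumerate(rounds, start=1):
        summary = summarize_round(responses)   # this summary is fed back to the experts
        if has_consensus(summary, len(responses), threshold):
            return number, summary
    return len(rounds), summary
```

In practice the grouping of "similar" opinions is usually done by the organizer; an automated key such as this one only catches trivial variations in wording.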
S102: unifying the terms and concepts in the open-source knowledge and the expert knowledge, and unifying their data formats, to obtain standardized knowledge.
According to an embodiment of the invention, taking aviation targets as an example, target attributes are divided into common attributes and target-specific attributes. Common attributes are shared by all aviation targets and come mainly from open-source knowledge acquisition. Target-specific attributes, such as aircraft range and aircraft performance indexes, draw on both open-source knowledge and expert knowledge; the two kinds of knowledge are thus combined through the target-specific attributes, and their formats are unified.
According to the embodiment of the invention, data cleaning is performed on the acquired target-related knowledge: the collected data are converted into a uniform format, erroneous characters are removed, and so on.
S103: classifying the standardized knowledge by level, creating instances according to the professional domain and/or important terms of the standardized knowledge, constructing a domain ontology of the knowledge base system, and constructing parameter templates according to the domain ontology to obtain the knowledge base system.
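The merge through target-specific attributes can be pictured as building one record per target, with common attributes taken from open-source knowledge and each target-specific attribute keeping every value together with its source. The record layout and function name below are hypothetical, chosen only to make the split between the two attribute kinds concrete.

```python
def merge_target_record(name: str, open_source: dict, expert: dict, common_keys) -> dict:
    """Combine open-source and expert knowledge about one target into a single record.

    Common attributes are taken from open-source knowledge only; target-specific
    attributes may come from either source and keep their provenance.
    """
    record = {"name": name, "common": {}, "specific": {}}
    for key in common_keys:
        if key in open_source:
            record["common"][key] = open_source[key]
    for source_name, source in (("open_source", open_source), ("expert", expert)):
        for key, value in source.items():
            if key in common_keys:
                continue                      # common attributes handled above
            record["specific"].setdefault(key, []).append(
                {"value": value, "source": source_name}
            )
    return record
```

Keeping the provenance of each value leaves the reconciliation of conflicting open-source and expert values to a later verification step instead of silently overwriting one with the other.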
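A small part of such cleaning can be sketched with the Python standard library: Unicode NFKC normalization (which, for example, turns full-width digits into ASCII digits), removal of invisible control and format characters, whitespace collapsing, and conversion of distances to one unit. The unit table and the target unit (kilometres) are assumptions; the sketch does not attempt spelling correction of genuinely mistyped words.

```python
import re
import unicodedata

# Hypothetical conversion table to kilometres; the description does not fix units.
_TO_KM = {"km": 1.0, "m": 0.001, "nmi": 1.852, "mi": 1.609344}


def clean_text(s: str) -> str:
    """NFKC-normalize, drop invisible control/format characters, collapse whitespace."""
    s = unicodedata.normalize("NFKC", s)
    s = "".join(ch for ch in s if ch.isspace() or not unicodedata.category(ch).startswith("C"))
    return re.sub(r"\s+", " ", s).strip()


def parse_distance_km(raw: str):
    """Parse strings such as '3,300 nmi' into kilometres; return None if not recognised."""
    m = re.fullmatch(r"([\d.,]+)\s*(km|m|nmi|mi)", clean_text(raw).lower())
    if m is None:
        return None
    try:
        value = float(m.group(1).replace(",", ""))
    except ValueError:              # e.g. '1.2.3'
        return None
    return round(value * _TO_KM[m.group(2)], 1)
```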
According to the embodiment of the invention, the domain ontology is constructed with the Protégé ontology construction tool (ontology editing and knowledge acquisition software developed in Java by the Stanford Center for Biomedical Informatics Research at the Stanford University School of Medicine). The construction involves seven steps, hence the name seven-step method, described below with aviation targets as the example:
determine the domain and scope of the aviation target knowledge ontology, e.g. the civil aviation target domain;
consider reusing existing aviation ontologies, which can reduce the workload of knowledge acquisition;
list the important terms in the ontology, such as takeoff weight and maximum range;
define individuals (the objects of interest in the domain), e.g. the aviation target individual Airbus A320;
define classes (a class is a collection of individuals), e.g. propeller aircraft and jet aircraft;
define the properties of classes (binary relations connecting different classes), e.g. the relation between propeller aircraft and 2-blade propeller aircraft is an inclusion relation;
create instances.
According to the embodiment of the invention, after the domain ontology is built, the parameter template of the knowledge base system is built; the parameter template includes, for example, parameter group, parameter name, parameter type and parameter remark. Parameter group: used to group parameters. Parameter name: identifies a parameter; the same parameter name should not appear twice in one template. Parameter type: defines the type of the parameter; types include "rich text", "picture", "document", "three-dimensional model", "general file", "field template" and the like. The domain ontology and the parameter template together form the framework of the knowledge base system, into which the standardized knowledge classified by level is entered to obtain the knowledge base system.
According to the embodiment of the invention, in the aviation target knowledge base system embodiment, the aviation target is taken as the top-level concept class; propeller aircraft, jet aircraft, piston aircraft and the like form the second-level classes; and 2-blade propeller aircraft, 3-blade propeller aircraft, turbojet aircraft, turbofan aircraft, light piston aircraft, ultralight piston aircraft and the like form the third-level classes, with further subdivision level by level.
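The parameter template fields and the rule that a parameter name may not repeat within a template can be sketched as a small data structure. Class and method names are hypothetical, and restricting types to the six listed above is an assumption, since the description leaves the list open ("and the like").

```python
from dataclasses import dataclass, field

# Parameter types named in the description (the list there is open-ended).
PARAM_TYPES = {"rich text", "picture", "document", "three-dimensional model",
               "general file", "field template"}


@dataclass
class Parameter:
    group: str                     # parameter group
    name: str                      # parameter name, unique within a template
    ptype: str                     # parameter type
    remark: str = ""               # parameter remark


@dataclass
class ParameterTemplate:
    name: str
    params: list = field(default_factory=list)

    def add(self, param: Parameter) -> None:
        """Add a parameter, enforcing a known type and a unique name within the template."""
        if param.ptype not in PARAM_TYPES:
            raise ValueError(f"unknown parameter type: {param.ptype}")
        if any(p.name == param.name for p in self.params):
            raise ValueError(f"duplicate parameter name: {param.name}")
        self.params.append(param)

    def grouped(self) -> dict:
        """Return parameter names keyed by group, preserving insertion order."""
        groups = {}
        for p in self.params:
            groups.setdefault(p.group, []).append(p.name)
        return groups
```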
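The three-level class hierarchy and the inclusion relation between classes can be illustrated with a plain parent map. In the embodiment the hierarchy lives in the ontology built with Protégé; this in-memory encoding, the helper names, and the assignment of the Airbus A320 individual to the turbofan class are illustrative assumptions only.

```python
# Illustrative encoding of the three-level hierarchy (child class -> parent class).
SUBCLASS_OF = {
    "propeller aircraft": "aviation target",
    "jet aircraft": "aviation target",
    "piston aircraft": "aviation target",
    "2-blade propeller aircraft": "propeller aircraft",
    "3-blade propeller aircraft": "propeller aircraft",
    "turbojet aircraft": "jet aircraft",
    "turbofan aircraft": "jet aircraft",
    "light piston aircraft": "piston aircraft",
    "ultralight piston aircraft": "piston aircraft",
}

# Illustrative individual -> class assignment (assumed for the example).
INSTANCE_OF = {"Airbus A320": "turbofan aircraft"}


def ancestors(cls: str) -> list:
    """Return the superclasses from the direct parent up to the top-level class."""
    chain = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        chain.append(cls)
    return chain


def level(cls: str) -> int:
    """The top-level concept class is level 1."""
    return len(ancestors(cls)) + 1


def is_a(cls: str, other: str) -> bool:
    """Inclusion (subclass) relation, reflexive and transitive."""
    return cls == other or other in ancestors(cls)


def children(cls: str) -> list:
    return [c for c, parent in SUBCLASS_OF.items() if parent == cls]


def instance_of(individual: str, cls: str) -> bool:
    return individual in INSTANCE_OF and is_a(INSTANCE_OF[individual], cls)
```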
According to the embodiment of the invention, once constructed, the knowledge base system must be maintained. Maintenance includes describing parameter information and/or adding, modifying or hiding parameters and/or adjusting the parameter order and/or applying parameter templates, as well as creating, editing, deleting and/or hierarchically managing classifications; for example, it covers classification maintenance, data organization and management, knowledge management, and knowledge services for expert knowledge.
According to the embodiment of the invention, the classification maintenance technology provides classification management of document-type knowledge information and supports creating, editing, deleting and hierarchically managing classifications. Adding a classification covers adding a root classification and adding a subclassification; before a subclassification is added, its parent classification must be selected. The classification name is then edited as required and the required parameter template is designated to complete the addition. Once data have been entered under a classification, its attribute template can no longer be modified. Obsolete and outdated classification templates must be deleted manually.
In summary, the embodiment of the present invention provides a knowledge base system construction method based on a multi-source knowledge acquisition technology. Open-source knowledge and expert knowledge are acquired using a distributed crawler technology and a structured questionnaire, respectively; the terms and concepts in the open-source knowledge and the expert knowledge are unified, and their data formats are unified, to obtain standardized knowledge; the standardized knowledge is classified by level, instances are created according to the professional domain and/or important terms of the standardized knowledge, a domain ontology of the knowledge base system is built, and parameter templates are built according to the domain ontology, thereby constructing the knowledge base system based on the multi-source knowledge acquisition technology. The method achieves automatic induction of knowledge and carries out unified collection, unified management, unified verification and standardized construction of multi-source target knowledge, including open-source knowledge and expert knowledge, which facilitates third-party output and exchange of knowledge and greatly improves the availability of knowledge.
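The classification maintenance rules (root and subclassifications, a parent that must be selected first, and a template that is locked once data have been entered) can be sketched as a small tree. All class and method names are hypothetical; templates are represented only by their names, and `delete` simply removes a node with its subtree, since the description does not specify any further deletion policy.

```python
class Classification:
    def __init__(self, name: str, template: str, parent=None):
        self.name = name
        self.template = template      # name of the designated parameter template
        self.parent = parent
        self.children = []
        self.records = []             # knowledge entries already entered under this node


class ClassificationTree:
    def __init__(self):
        self.roots = []

    def add_root(self, name: str, template: str) -> Classification:
        node = Classification(name, template)
        self.roots.append(node)
        return node

    def add_child(self, parent, name: str, template: str) -> Classification:
        """A subclassification can only be added under an already selected parent."""
        if parent is None:
            raise ValueError("select a parent classification first")
        node = Classification(name, template, parent)
        parent.children.append(node)
        return node

    def rename(self, node: Classification, new_name: str) -> None:
        node.name = new_name

    def set_template(self, node: Classification, template: str) -> None:
        """Changing the template is refused once data have been entered."""
        if node.records:
            raise PermissionError("classification already holds data; template is locked")
        node.template = template

    def delete(self, node: Classification) -> None:
        """Manual deletion: remove the node (and its subtree) from its parent or the roots."""
        siblings = node.parent.children if node.parent else self.roots
        siblings.remove(node)
```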
Fig. 4 is a block diagram schematically illustrating a knowledge base system based on a multi-source knowledge acquisition technology according to an embodiment of the present invention. As shown in Fig. 4, the knowledge base system 400 includes: an open source knowledge acquisition module 410, an expert knowledge acquisition module 420, a knowledge standardization module 430, and a knowledge base construction and maintenance module 440. The knowledge base system may perform the methods described above with reference to the method embodiments, which are not repeated here.
Specifically, the open source knowledge acquisition module 410 is configured to crawl open-source knowledge from a first data source using a distributed crawler technology.
The expert knowledge acquisition module 420 is configured to obtain expert knowledge from a second data source via a structured questionnaire.
The knowledge standardization module 430 is configured to unify the terms and concepts in the open-source knowledge and the expert knowledge and unify their data formats to obtain standardized knowledge.
The knowledge base construction and maintenance module 440 is configured to classify the standardized knowledge by level, create instances according to the professional domain and/or important terms of the standardized knowledge, construct a domain ontology of the knowledge base system, construct parameter templates according to the domain ontology to obtain the knowledge base system, and maintain the knowledge base system.
It should be noted that the embodiments of the apparatus portion and the method portion are similar to each other, and the achieved technical effects are also similar to each other, which are not described herein again.
Any of the modules according to embodiments of the present disclosure, or at least part of the functionality of any of them, may be implemented in one module. Any one or more of the modules according to the embodiments of the present disclosure may be implemented by being split into a plurality of modules. Any one or more of the modules according to the embodiments of the present disclosure may be implemented at least in part as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system in a package, or an Application Specific Integrated Circuit (ASIC), or by any other reasonable means of integrating or packaging a circuit in hardware or firmware, or in any one of software, hardware and firmware, or in any suitable combination of them. Alternatively, one or more of the modules according to embodiments of the disclosure may be implemented at least partly as computer program modules which, when executed, perform the corresponding functions.
For example, any of the open source knowledge acquisition module 410, the expert knowledge acquisition module 420, the knowledge standardization module 430 and the knowledge base construction and maintenance module 440 may be combined into one module, or any one of them may be split into multiple modules. Alternatively, at least part of the functionality of one or more of these modules may be combined with at least part of the functionality of other modules and implemented in one module. According to embodiments of the present disclosure, at least one of the open source knowledge acquisition module 410, the expert knowledge acquisition module 420, the knowledge standardization module 430 and the knowledge base construction and maintenance module 440 may be implemented at least partially as hardware circuitry, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system in a package, or an Application Specific Integrated Circuit (ASIC), or by any other reasonable means of integrating or packaging a circuit in hardware or firmware, or in any one of software, hardware and firmware, or in any suitable combination of them. Alternatively, at least one of the open source knowledge acquisition module 410, the expert knowledge acquisition module 420, the knowledge standardization module 430 and the knowledge base construction and maintenance module 440 may be implemented at least partially as a computer program module that, when executed, performs the corresponding functions.
Fig. 5 schematically shows a block diagram of an electronic device according to an embodiment of the invention. The electronic device shown in Fig. 5 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present disclosure.
As shown in Fig. 5, the electronic device 500 includes a processor 510 and a computer-readable storage medium 520. The electronic device 500 may perform a method according to an embodiment of the present disclosure.
In particular, processor 510 may include, for example, a general purpose microprocessor, an instruction set processor and/or related chip set and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), and/or the like. The processor 510 may also include on-board memory for caching purposes. Processor 510 may be a single processing unit or a plurality of processing units for performing different actions of a method flow according to embodiments of the disclosure.
Computer-readable storage media 520, for example, may be non-volatile computer-readable storage media, specific examples including, but not limited to: magnetic storage devices, such as magnetic tape or Hard Disk Drives (HDDs); optical storage devices, such as compact disks (CD-ROMs); a memory, such as a Random Access Memory (RAM) or a flash memory; and so on.
The computer-readable storage medium 520 may include a computer program 521, which computer program 521 may include code/computer-executable instructions that, when executed by the processor 510, cause the processor 510 to perform a method according to an embodiment of the disclosure, or any variation thereof.
The computer program 521 may be configured with, for example, computer program code comprising computer program modules. For example, in an example embodiment, the code in computer program 521 may include one or more program modules, for example module 521A, module 521B, …. It should be noted that the division into modules and their number are not fixed; those skilled in the art may use suitable program modules or combinations of program modules according to the actual situation, and when these program modules are executed by the processor 510, the processor 510 may execute the method according to the embodiment of the present disclosure or any variation thereof.
According to an embodiment of the present disclosure, at least one of the open source knowledge acquisition module 410, the expert knowledge acquisition module 420, the knowledge standardization module 430 and the knowledge base construction and maintenance module 440 may be implemented as a computer program module described with reference to Fig. 5, which, when executed by the processor 510, performs the corresponding operations described above.
The present disclosure also provides a computer-readable storage medium, which may be contained in the apparatus/device/system described in the above embodiments; or may exist separately and not be assembled into the device/apparatus/system. The computer-readable storage medium carries one or more programs which, when executed, implement the method according to an embodiment of the disclosure.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention, and are not intended to limit the present invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the protection scope of the present invention.