CN112667606A - Knowledge base system based on multi-source knowledge acquisition technology and construction method thereof - Google Patents

Knowledge base system based on multi-source knowledge acquisition technology and construction method thereof

Info

Publication number
CN112667606A
Authority
CN
China
Prior art keywords
knowledge
source
base system
knowledge base
expert
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110059145.9A
Other languages
Chinese (zh)
Inventor
孙显
金力
李树超
张泽群
刘庆
刘康
张雅楠
张敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Aerospace Information Research Institute of CAS
Original Assignee
Aerospace Information Research Institute of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Aerospace Information Research Institute of CAS
Priority to CN202110059145.9A
Publication of CN112667606A
Legal status: Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a knowledge base system based on a multi-source knowledge acquisition technology and a construction method thereof. The method comprises the following steps: acquiring open source knowledge and expert knowledge by using a distributed crawler technology and a structured questionnaire, respectively; unifying the terms and concepts in the open source knowledge and the expert knowledge, and unifying the data formats, to obtain standardized knowledge; and classifying the standardized knowledge by level, creating instances according to the professional fields and/or important terms of the standardized knowledge, constructing a domain ontology of the knowledge base system, and constructing parameter templates according to the domain ontology, thereby building the knowledge base system based on the multi-source knowledge acquisition technology. The knowledge base system and its construction method enable automatic induction and summarization of knowledge, and provide unified collection, unified management, unified verification and standardized construction for multi-source target knowledge comprising open source knowledge and expert knowledge, which facilitates third-party output and exchange of knowledge and greatly improves the availability of knowledge.

Description

Knowledge base system based on multi-source knowledge acquisition technology and construction method thereof
Technical Field
The invention relates to the technical field of system management, in particular to a knowledge base system based on a multi-source knowledge acquisition technology and a construction method thereof.
Background
When solving problems with artificial intelligence, procedural methods alone are not sufficient; declarative methods and empirical knowledge accumulated in the past are sometimes also needed. In addition, it is impractical to look only for an efficient search method without considering how the amount of search grows as the number of combinations increases without bound. Knowledge is therefore of unique importance compared with efficient search methods. Knowledge-based artificial intelligence starts from a knowledge system, that is, a system that solves practical problems on the basis of knowledge. Unlike simple retrieval and ranking, a knowledge system can summarize automatically on the basis of its knowledge, so the core of a knowledge system is its knowledge base. Against this background, it is necessary to establish a knowledge base system for target knowledge.
Most existing target knowledge bases are operated by vendors or enthusiasts themselves, and the knowledge submitted by contributors is reviewed and verified manually. This brings risks such as excessive autonomy in community operation, a lack of third-party review and verification, and review and verification that is inefficient and unreliable. Moreover, target knowledge is highly specialized, and much empirical knowledge is scattered among individual target experts.
The types of knowledge files generated during the acquisition of the target knowledge are not uniform, the content formats are not uniform, the knowledge types are not uniform, the configuration is complex, and the data is complex. In addition, knowledge data is strong in heterogeneity, different data sources contain different knowledge representation forms, such as texts, tables and the like, and a uniform specification constraint is needed for the collection of knowledge.
There are currently few mature target knowledge bases. The main reasons are that target knowledge comes from a single source, that the review and verification of target knowledge is inefficient, and that target knowledge bases are closed, so target knowledge cannot be widely shared.
Disclosure of Invention
(I) Technical problem to be solved
Aiming at the technical problems in the prior art, the invention provides a knowledge base system based on a multi-source knowledge acquisition technology, a construction method thereof, electronic equipment and a storage medium, which are used for at least partially solving the technical problems.
(II) technical scheme
The invention provides a knowledge base system construction method based on a multi-source knowledge acquisition technology, which comprises the following steps: crawling open source knowledge from a first data source using a distributed crawler technology; obtaining expert knowledge from a second data source via a structured questionnaire; unifying the terms and concepts in the open source knowledge and the expert knowledge, and unifying the data formats, to obtain standardized knowledge; and classifying the standardized knowledge by level, creating instances according to the professional fields and/or important terms of the standardized knowledge, constructing a domain ontology of the knowledge base system, and constructing parameter templates according to the domain ontology to obtain the knowledge base system.
Optionally, the open-source knowledge is acquired by using a distributed crawler technology based on a distributed cloud architecture, wherein a bottom architecture of the distributed cloud architecture is constructed by using a Docker container cloud cluster.
Optionally, a master-slave distributed crawler model is used for providing a URL distribution service to acquire open source knowledge, so as to realize multi-mode-based webpage structured data extraction.
Optionally, the experts are assisted in collective discussion through a brainwriting method or an electronic brainstorming method, and second data is collected from the second data source; the second data is then collated by the Delphi method or the nominal group technique to obtain the expert knowledge.
Optionally, the domain ontology is constructed using the Protégé ontology construction tool.
Optionally, the method further comprises: maintaining the knowledge base system, including parameter information description and/or adding parameters and/or modifying parameters and/or hiding parameters and/or adjusting parameter sequence and/or applying parameter templates, and creating and/or editing and/or deleting and/or managing hierarchy of classification.
Optionally, the first data source comprises a web page and/or a database and the second data source comprises a summary of the experience of the target knowledge expert.
The invention also provides a knowledge base system based on the multi-source knowledge acquisition technology, which comprises: an open source knowledge acquisition module for crawling open source knowledge from a first data source using a distributed crawler technology; an expert knowledge acquisition module for obtaining expert knowledge from a second data source via a structured questionnaire; a knowledge standardization module for unifying the terms and concepts in the open source knowledge and the expert knowledge and unifying the data formats to obtain standardized knowledge; and a knowledge base construction and maintenance module for classifying the standardized knowledge by level, creating instances according to the professional fields and/or important terms of the standardized knowledge, constructing a domain ontology of the knowledge base system, constructing parameter templates according to the domain ontology to obtain the knowledge base system, and maintaining the knowledge base system.
The present invention also provides an electronic device comprising: one or more processors; and a memory for storing one or more programs, wherein when the one or more programs are executed by the one or more processors, the one or more processors implement the above-mentioned knowledge base system construction method based on the multi-source knowledge acquisition technology according to the embodiment of the present invention.
The invention also provides a computer-readable storage medium, which stores computer-executable instructions, and the instructions are used for implementing the knowledge base system construction method based on the multi-source knowledge acquisition technology according to the embodiment of the invention.
(III) advantageous effects
The invention provides a knowledge base system based on a multi-source knowledge acquisition technology and a construction method thereof, in which open source knowledge and expert knowledge are acquired using a distributed crawler technology and a structured questionnaire, respectively; the terms and concepts in the open source knowledge and the expert knowledge are unified, and the data formats are unified, to obtain standardized knowledge; and the standardized knowledge is classified by level, instances are created according to the professional fields and/or important terms of the standardized knowledge, a domain ontology of the knowledge base system is constructed, and parameter templates are constructed according to the domain ontology, thereby building the knowledge base system.
Based on the standardized processing of knowledge, the construction of a domain ontology and the construction and maintenance of a parameter template, the knowledge base system provided by the invention realizes the automatic summary of knowledge, and by uniformly collecting and managing multi-source target knowledge including open source knowledge and expert knowledge, unified verification and standardized construction are realized, so that the third-party output of knowledge is facilitated, the communication of knowledge is facilitated, and the availability of knowledge is greatly improved.
Drawings
FIG. 1 schematically shows a flow chart of a knowledge base system construction method based on a multi-source knowledge acquisition technology according to an embodiment of the invention;
FIG. 2 schematically shows an overall structure diagram of a cloud crawler architecture according to an embodiment of the invention;
FIG. 3 schematically shows a task distribution policy architecture diagram according to an embodiment of the invention;
FIG. 4 schematically shows a block diagram of a knowledge base system based on a multi-source knowledge acquisition technology according to an embodiment of the invention;
FIG. 5 schematically shows a block diagram of an electronic device according to an embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to specific embodiments and the accompanying drawings.
It should be noted that in the drawings or description, the same drawing reference numerals are used for similar or identical parts. Features of the embodiments illustrated in the description may be freely combined to form new embodiments without conflict, and each claim may be individually referred to as an embodiment or features of the claims may be combined to form a new embodiment, and in the drawings, the shape or thickness of the embodiment may be enlarged and simplified or conveniently indicated. Further, elements or implementations not shown or described in the drawings are of a form known to those of ordinary skill in the art. Additionally, while exemplifications of parameters including particular values may be provided herein, it is to be understood that the parameters need not be exactly equal to the respective values, but may be approximated to the respective values within acceptable error margins or design constraints.
Unless a technical obstacle or contradiction exists, the above-described various embodiments of the present invention may be freely combined to form further embodiments, which are within the scope of the present invention.
Although the present invention has been described in connection with the accompanying drawings, the embodiments disclosed in the drawings are intended to be illustrative of preferred embodiments of the present invention and should not be construed as limiting the invention. The dimensional proportions in the figures are merely schematic and are not to be understood as limiting the invention.
Although a few embodiments of the present general inventive concept have been shown and described, it would be appreciated by those skilled in the art that changes may be made in these embodiments without departing from the principles and spirit of the general inventive concept, the scope of which is defined in the claims and their equivalents.
Fig. 1 schematically shows a flow chart of a knowledge base system construction method based on a multi-source knowledge acquisition technology according to an embodiment of the present invention, and as shown in fig. 1, the method includes:
S101, crawling open source knowledge from a first data source by using a distributed crawler technology, and acquiring expert knowledge from a second data source through a structured questionnaire.
According to an embodiment of the invention, the target knowledge mainly comprises two parts: open source network data, for example from military forums, military weapon databases and related communities organized spontaneously by their members; and knowledge summarized from experience by experts such as military enthusiasts and researchers. The open source knowledge may be obtained, for example, with a distributed crawler technology based on a distributed cloud architecture, where the underlying architecture is built on a Docker container cloud cluster instead of virtual machines. Container cloud technology allows the crawler service to be deployed, operated and maintained quickly and scaled flexibly. For example, a master-slave distributed crawler model can provide a URL distribution service for obtaining open source knowledge and can support multi-mode extraction of structured data from web pages; the construction process is distributed to the slave nodes, which relieves the pressure on the master node and achieves load balancing.
According to an embodiment of the invention, an open source target knowledge acquisition technology based on a distributed cloud architecture is adopted. Automatic distributed deployment and operation are applied to a large-scale data acquisition and analysis system, which offers simple visual operation and elastic scaling of resources such as computing and storage. Distributed task distribution uses a master-slave distributed crawler model to provide the URL distribution service. As shown in FIG. 2, to keep the load balanced among the fetch service nodes, tasks are distributed by consistent hashing: each fetch service node is responsible for downloading the URLs that fall in one segment of the hash ring, and when a service node becomes abnormal, the tasks in its address segment are assigned to the next service node found clockwise on the ring. The consistent hashing algorithm satisfies the requirements of balance, monotonicity, spread and load in a distributed system. As shown in FIG. 3, the crawler network indexes URLs with an MD5 deduplication tree, so the MD5 value produced for deduplication can be used directly as the consistent hash value when URLs are distributed, thereby combining deduplication and distribution.
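The distribution scheme described above can be illustrated with a minimal sketch. It is not the patented implementation: the class names `HashRing` and `Dispatcher`, the node names, and the use of virtual points per node (`replicas`) are assumptions introduced here for illustration. The sketch shows a single MD5 value per URL being used both for deduplication and for picking the fetch node clockwise on the ring.

```python
import bisect
import hashlib


def md5_int(text):
    """Return the MD5 digest of `text` as an integer position on the ring."""
    return int(hashlib.md5(text.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Consistent-hash ring that maps hash values to fetch-service nodes."""

    def __init__(self, nodes, replicas=100):
        # `replicas` (virtual points per node) is an assumption; the source
        # text does not mention virtual nodes.
        self.replicas = replicas
        self._keys = []    # sorted ring positions
        self._owner = {}   # ring position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self.replicas):
            key = md5_int(f"{node}#{i}")
            self._owner[key] = node
            bisect.insort(self._keys, key)

    def remove(self, node):
        # Removing an abnormal node hands each of its segments to the next
        # node found clockwise; other assignments are unchanged.
        self._keys = [k for k in self._keys if self._owner[k] != node]
        self._owner = {k: n for k, n in self._owner.items() if n != node}

    def node_for_key(self, key):
        # First ring position clockwise from `key`, wrapping around the ring.
        idx = bisect.bisect_right(self._keys, key) % len(self._keys)
        return self._owner[self._keys[idx]]

    def node_for(self, url):
        return self.node_for_key(md5_int(url))


class Dispatcher:
    """Master-side URL distribution that reuses one MD5 value per URL."""

    def __init__(self, ring):
        self.ring = ring
        self.seen = set()  # MD5 values of URLs already dispatched

    def dispatch(self, url):
        digest = md5_int(url)
        if digest in self.seen:
            return None                        # duplicate URL, skip it
        self.seen.add(digest)
        return self.ring.node_for_key(digest)  # same value picks the node


ring = HashRing(["fetch-1", "fetch-2", "fetch-3"])  # hypothetical node names
dispatcher = Dispatcher(ring)
target = dispatcher.dispatch("http://example.org/page/1")
```

A plain set stands in for the MD5 deduplication tree mentioned in the text; any index keyed by the MD5 value would serve the same purpose in this sketch.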
According to an embodiment of the invention, expert-oriented acquisition of, for example, military knowledge requires the support of structured forms. The questionnaire follows certain rationality principles so that expert knowledge, for example military expert knowledge, can be collected and partially collated. Expert knowledge is collected using the brainwriting method and the electronic brainstorming method. In brainwriting, ideas are written on paper media such as cards and are passed among and shared by the members; typical variants include the silent-writing brainstorming method and the card-based brainstorming method. In electronic brainstorming, members communicate over a network, so group members can view the opinions of other members promptly and can also post their own opinions without interference. Expert knowledge is collated using the Delphi method and the nominal group technique. The Delphi method is an expert questionnaire consultation method whose main features are anonymity and multiple rounds of iterative feedback. The keys to applying the Delphi method are to design the questionnaire so that its questions are unambiguous, to avoid asking too many questions at a time, and to group similar opinions among those collected and count how many opinions fall into each group. The nominal group technique restricts discussion and interpersonal communication among group members during decision-making, so that the members think independently.
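The grouping-and-counting step of a Delphi round can be sketched as follows. This is only an illustration under stated assumptions: the function name `tally_round`, the anonymized expert identifiers, the sample opinions and the hand-written `canonical` mapping of similar phrasings are all hypothetical, and deciding which opinions count as similar would in practice be done by the organizers.

```python
from collections import Counter


def tally_round(responses, canonical):
    """Group similar opinions from one questionnaire round and count them.

    responses: dict of anonymized expert id -> opinion text
    canonical: dict mapping a variant phrasing to the opinion it is grouped with
    """
    counts = Counter()
    for opinion in responses.values():
        key = " ".join(opinion.lower().split())   # normalize case and spacing
        counts[canonical.get(key, key)] += 1      # merge similar phrasings
    return counts.most_common()                   # most frequent opinion first


# Illustrative data only; not taken from the source.
responses = {
    "e1": "Long range",
    "e2": "long  range",
    "e3": "extended range",
    "e4": "low maintenance cost",
}
canonical = {"extended range": "long range"}
summary = tally_round(responses, canonical)
```

The resulting counts could then be fed back to the experts in the next anonymous round.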
According to an embodiment of the invention, the construction of, for example, an aviation target knowledge base system includes knowledge acquisition, which is divided into public knowledge acquisition and expert knowledge acquisition. Public knowledge acquisition mainly consists of crawling open source knowledge, such as aircraft models, aircraft structural characteristics and aircraft passenger capacity, from aviation websites. Expert knowledge is obtained through a structured questionnaire whose questions cover professional knowledge such as aircraft performance indexes and aircraft advantages and shortcomings.
S102, unifying the terms and concepts in the open source knowledge and the expert knowledge, and unifying the data formats, to obtain standardized knowledge.
According to an embodiment of the invention, taking aviation targets as an example, the attribute concepts of a target are divided into common attributes and target-specific attributes. Common attributes are attributes shared by all aviation targets and come mainly from open source knowledge. Target-specific attributes, such as aircraft range and aircraft performance indexes, draw on both open source knowledge and expert knowledge; the two kinds of knowledge are merged through the target-specific attributes and their formats are unified.
According to an embodiment of the invention, knowledge data cleaning is performed on the acquired target-related knowledge: the collected data is converted to a uniform format, wrongly written characters are removed, and so on.
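A minimal sketch of this standardization step is given below, assuming each source delivers a flat mapping of term to value. The alias table `TERM_ALIASES`, the function names and the sample values are hypothetical; the sketch keeps every value together with its source rather than choosing between them, because the text does not specify a conflict-resolution rule.

```python
# Hypothetical alias table that maps variant terms to one unified term.
TERM_ALIASES = {
    "max range": "maximum_range",
    "maximum range": "maximum_range",
    "mtow": "takeoff_weight",
    "takeoff weight": "takeoff_weight",
}


def canonical_term(term):
    """Unify a term: normalize case and spacing, then apply the alias table."""
    key = " ".join(term.lower().split())
    return TERM_ALIASES.get(key, key.replace(" ", "_"))


def clean(value):
    """Basic cleaning: collapse stray whitespace in text values."""
    return " ".join(value.split()) if isinstance(value, str) else value


def standardize(open_source, expert):
    """Merge both sources into one record keyed by unified attribute names."""
    merged = {}
    for source, raw in (("open_source", open_source), ("expert", expert)):
        for term, value in raw.items():
            entry = {"value": clean(value), "source": source}
            merged.setdefault(canonical_term(term), []).append(entry)
    return merged


# Illustrative values only; not real aircraft data.
record = standardize(
    {"Model": "  Example-1 ", "Max  Range": "1000 km"},
    {"maximum range": "about 1000 km", "Advantages": "low fuel  consumption"},
)
```

In `record`, the attribute `maximum_range` holds one entry from each source, which is where open source knowledge and expert knowledge meet on a target-specific attribute.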
S103, classifying the standardized knowledge by level, creating instances according to the professional fields and/or important terms of the standardized knowledge, constructing a domain ontology of the knowledge base system, and constructing parameter templates according to the domain ontology to obtain the knowledge base system.
According to an embodiment of the invention, the domain ontology is constructed with the Protégé ontology construction tool (ontology editing and knowledge acquisition software developed in Java by the biomedical informatics research center of the Stanford University School of Medicine). The construction involves seven steps in total and is therefore called the seven-step method. It is described below using aviation targets as an example:
determining the professional field and scope of the aviation target knowledge ontology, such as the civil aviation target field;
examining the possibility of reusing existing aviation knowledge ontologies, which can reduce the workload of acquiring knowledge;
listing the important terms in the ontology, such as takeoff weight and maximum range;
defining individuals (the objects of interest in the domain), for example an aviation target individual such as the Airbus A320;
defining classes (collections of individuals), such as propeller aircraft and jet aircraft;
defining the properties of classes (binary relations that connect different classes); for example, the relation between propeller aircraft and 2-blade propeller aircraft is an inclusion relation;
creating instances.
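The outcome of the seven steps can be pictured with a small sketch in plain Python. It does not use Protégé or any OWL library; the class names `OntClass` and `Individual` and their methods are hypothetical, and the attribute values are left empty rather than filled with invented figures.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class OntClass:
    """A class (collection of individuals) with an optional parent class."""
    name: str
    parent: Optional["OntClass"] = None   # inclusion relation: parent includes self

    def ancestors(self):
        """Names of all classes that include this one, nearest first."""
        names, node = [], self.parent
        while node is not None:
            names.append(node.name)
            node = node.parent
        return names


@dataclass
class Individual:
    """An instance of a class, described by the important terms of the domain."""
    name: str
    cls: OntClass
    values: dict = field(default_factory=dict)

    def is_a(self, class_name):
        return class_name == self.cls.name or class_name in self.cls.ancestors()


# Classes and the inclusion relation between them.
aviation_target = OntClass("aviation target")
propeller = OntClass("propeller aircraft", aviation_target)
jet = OntClass("jet aircraft", aviation_target)
two_blade = OntClass("2-blade propeller aircraft", propeller)

# Creating an instance; values are placeholders for knowledge entered later.
a320 = Individual("Airbus A320", jet,
                  {"takeoff_weight": None, "maximum_range": None})
```

Here `a320.is_a("aviation target")` follows the inclusion relation upward, which is the kind of query a class hierarchy in an ontology editor supports.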
According to an embodiment of the invention, after the domain ontology has been built, the parameter templates of the knowledge base system are built. A parameter template comprises, for example, a parameter group, a parameter name, a parameter type and a parameter remark. The parameter group is used to group parameters. The parameter name identifies a parameter, and the same parameter name must not appear twice in one template. The parameter type defines the type of the parameter; parameter types include "rich text", "picture", "document", "three-dimensional model", "general file", "field template" and the like. The domain ontology and the parameter templates together form the framework of the knowledge base system, and the knowledge base system is obtained by entering the standardized knowledge classified by level.
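A parameter template of this kind could be modeled roughly as below. The class names, method name and example parameters are hypothetical; the type list contains only the types named in the text, which presents them as a non-exhaustive list.

```python
from dataclasses import dataclass, field

# Types named in the text; the text lists them as examples ("and the like").
PARAMETER_TYPES = {
    "rich text", "picture", "document",
    "three-dimensional model", "general file", "field template",
}


@dataclass(frozen=True)
class Parameter:
    group: str        # parameter group
    name: str         # identifies the parameter within a template
    type: str         # one of PARAMETER_TYPES
    remark: str = ""  # free-text remark


@dataclass
class ParameterTemplate:
    name: str
    parameters: list = field(default_factory=list)

    def add(self, param):
        """Add a parameter, enforcing known types and unique names."""
        if param.type not in PARAMETER_TYPES:
            raise ValueError(f"unknown parameter type: {param.type!r}")
        if any(p.name == param.name for p in self.parameters):
            raise ValueError(f"duplicate parameter name: {param.name!r}")
        self.parameters.append(param)


template = ParameterTemplate("aircraft")  # hypothetical template name
template.add(Parameter("performance", "maximum range", "rich text"))
template.add(Parameter("media", "side view", "picture"))
```

Adding a second parameter named "maximum range" to `template` raises `ValueError`, which reflects the rule that a name may appear only once per template.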
According to an embodiment of the invention, in the example of constructing an aviation target knowledge base system, the aviation target is the top-level concept class; propeller aircraft, jet aircraft, piston aircraft and the like are second-level classes; and 2-blade propeller aircraft, 3-blade propeller aircraft, turbojet aircraft, turbofan aircraft, light piston aircraft, ultra-light piston aircraft and the like are third-level classes, so that the classes are subdivided downward level by level.
According to the embodiment of the invention, after the knowledge base system is constructed, the knowledge base system is required to be maintained, and the maintenance comprises parameter information description and/or parameter adding and/or parameter modifying and/or parameter hiding and/or parameter sequence adjusting and/or parameter template applying, and the creation and/or editing and/or deletion and/or hierarchical management of classification, for example, the classification maintenance, data organization and management, knowledge management and knowledge service of expert knowledge are included.
According to an embodiment of the invention, the classification maintenance function provides classification management for file-type knowledge information and supports creating, editing, deleting and hierarchically managing classifications. Adding a classification covers adding a root classification and adding a sub-classification; before a sub-classification is added, its parent classification must be selected. The classification name is edited as required and the required parameter template is specified, which completes the addition of the classification. Once data has been entered under a classification, the parameter template of that classification can no longer be modified. Useless or outdated classification templates must be deleted manually.
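These maintenance rules can be sketched as a small class. The class name `Category`, its methods and the template names are hypothetical, and a template is represented by a plain string to keep the sketch short.

```python
class Category:
    """A classification node with a parameter template and entered data."""

    def __init__(self, name, template, parent=None):
        self.name = name
        self.template = template
        self.parent = parent          # None marks a root classification
        self.children = []
        self.entries = []             # knowledge entered under this classification
        if parent is not None:
            parent.children.append(self)

    def rename(self, new_name):
        self.name = new_name

    def add_entry(self, entry):
        self.entries.append(entry)

    def set_template(self, template):
        # Once data has been entered, the template may no longer change.
        if self.entries:
            raise RuntimeError(f"template of {self.name!r} is locked")
        self.template = template

    def delete(self):
        # Manual deletion: detach this classification from its parent.
        if self.parent is not None:
            self.parent.children.remove(self)
            self.parent = None


# A root classification, then a sub-classification under a selected parent.
root = Category("aviation target", "base template")
propeller = Category("propeller aircraft", "aircraft template", parent=root)
propeller.set_template("aircraft template v2")   # allowed: no data yet
propeller.add_entry({"name": "example entry"})   # from now on the template is locked
```

Calling `propeller.set_template(...)` again at this point raises `RuntimeError`, mirroring the rule that a template cannot be modified after data entry.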
In summary, the embodiment of the invention provides a knowledge base system construction method based on a multi-source knowledge acquisition technology. Open source knowledge and expert knowledge are acquired using a distributed crawler technology and a structured questionnaire, respectively; the terms and concepts in the open source knowledge and the expert knowledge are unified, and the data formats are unified, to obtain standardized knowledge; and the standardized knowledge is classified by level, instances are created according to the professional fields and/or important terms of the standardized knowledge, a domain ontology of the knowledge base system is constructed, and parameter templates are constructed according to the domain ontology, thereby building the knowledge base system based on the multi-source knowledge acquisition technology. The method enables automatic induction and summarization of knowledge, and provides unified collection, unified management, unified verification and standardized construction for multi-source target knowledge comprising open source knowledge and expert knowledge, which facilitates third-party output and exchange of knowledge and greatly improves the availability of knowledge.
Fig. 4 is a block diagram schematically illustrating a knowledge base system based on a multi-source knowledge acquisition technology according to an embodiment of the present invention, and as shown in fig. 4, the knowledge base system 400 includes: an open source knowledge acquisition module 410, an expert knowledge acquisition module 420, a knowledge standardization module 430, and a knowledge base construction and maintenance module 440. The knowledge base system may perform the methods described above with reference to the method embodiments, and will not be described further herein.
Specifically, the open source knowledge acquisition module 410 is configured to crawl open source knowledge from a first data source using a distributed crawler technology.
An expert knowledge acquisition module 420 for obtaining expert knowledge from a second data source via a structured questionnaire.
And the knowledge standardization module 430 is used for unifying terms and concepts in the open source knowledge and the expert knowledge and unifying data formats to obtain standardized knowledge.
The knowledge base construction and maintenance module 440 is used for classifying the standardized knowledge by level, creating instances according to the professional fields and/or important terms of the standardized knowledge, constructing a domain ontology of the knowledge base system, constructing parameter templates according to the domain ontology to obtain the knowledge base system, and maintaining the knowledge base system.
It should be noted that the embodiments of the apparatus portion and the method portion are similar to each other, and the achieved technical effects are also similar to each other, which are not described herein again.
Any number of the modules according to embodiments of the present disclosure, or at least part of the functionality of any number of them, may be implemented in one module. Any one or more of the modules according to embodiments of the present disclosure may be split into a plurality of modules for implementation. Any one or more of the modules according to embodiments of the present disclosure may be implemented at least in part as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package or an Application Specific Integrated Circuit (ASIC), or in any other reasonable manner of hardware or firmware that integrates or packages a circuit, or in any one of, or any suitable combination of, software, hardware and firmware. Alternatively, one or more of the modules according to embodiments of the present disclosure may be implemented at least partly as computer program modules which, when executed, perform the corresponding functions.
For example, any number of the open source knowledge acquisition module 410, the expert knowledge acquisition module 420, the knowledge standardization module 430 and the knowledge base construction and maintenance module 440 may be combined into one module for implementation, or any one of them may be split into multiple modules. Alternatively, at least part of the functionality of one or more of these modules may be combined with at least part of the functionality of other modules and implemented in one module. At least one of the open source knowledge acquisition module 410, the expert knowledge acquisition module 420, the knowledge standardization module 430 and the knowledge base construction and maintenance module 440 may be implemented at least partially as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package or an Application Specific Integrated Circuit (ASIC), or in any other reasonable manner of hardware or firmware that integrates or packages a circuit, or in any one of, or any suitable combination of, software, hardware and firmware. Alternatively, at least one of these modules may be implemented at least in part as a computer program module that, when executed, performs the corresponding functions.
Fig. 5 schematically shows a block diagram of an electronic device according to an embodiment of the invention. The electronic device shown in fig. 5 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 5, the electronic device 500 includes a processor 510, a computer-readable storage medium 520. The electronic device 500 may perform a method according to an embodiment of the present disclosure.
In particular, processor 510 may include, for example, a general purpose microprocessor, an instruction set processor and/or related chip set and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), and/or the like. The processor 510 may also include on-board memory for caching purposes. Processor 510 may be a single processing unit or a plurality of processing units for performing different actions of a method flow according to embodiments of the disclosure.
Computer-readable storage media 520, for example, may be non-volatile computer-readable storage media, specific examples including, but not limited to: magnetic storage devices, such as magnetic tape or Hard Disk Drives (HDDs); optical storage devices, such as compact disks (CD-ROMs); a memory, such as a Random Access Memory (RAM) or a flash memory; and so on.
The computer-readable storage medium 520 may include a computer program 521, which computer program 521 may include code/computer-executable instructions that, when executed by the processor 510, cause the processor 510 to perform a method according to an embodiment of the disclosure, or any variation thereof.
The computer program 521 may be configured with, for example, computer program code comprising computer program modules. For example, in an example embodiment, the code in the computer program 521 may include one or more program modules, for example module 521A, module 521B, and so on. It should be noted that the division and number of modules are not fixed, and those skilled in the art may use suitable program modules or combinations of program modules according to the actual situation. When these program modules are executed by the processor 510, the processor 510 may perform the method according to the embodiment of the present disclosure or any variation thereof.
According to an embodiment of the present disclosure, at least one of the open-source knowledge acquisition module 410, the expert knowledge acquisition module 420, the knowledge normalization module 430, and the knowledge base construction and maintenance module 440 may be implemented as a computer program module described with reference to fig. 5, which, when executed by the processor 510, may perform the corresponding operations described above.
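As an illustration only, the four modules might be realised as computer program modules along the lines of the following minimal Python sketch. Every function name, the synonym table `CANONICAL_TERMS`, and the sample records are hypothetical and are not taken from the disclosure; the crawler and the questionnaire are replaced by in-memory stand-ins so that the sketch runs on its own.

```python
from collections import defaultdict

# Hypothetical synonym table mapping variant terms to one canonical term.
CANONICAL_TERMS = {
    "uav": "unmanned aerial vehicle",
    "drone": "unmanned aerial vehicle",
}


def acquire_open_source_knowledge():
    """Stand-in for module 410; a real system would crawl the first data source."""
    return [{"term": "UAV", "field": "Aviation", "source": "web"}]


def acquire_expert_knowledge():
    """Stand-in for module 420; a real system would parse structured questionnaires."""
    return [{"term": "Drone ", "field": "aviation", "source": "expert"}]


def normalize(records):
    """Stand-in for module 430: unify terms and give every record the same format."""
    normalized = []
    for record in records:
        term = record["term"].strip().lower()
        normalized.append({
            "term": CANONICAL_TERMS.get(term, term),
            "field": record["field"].strip().lower(),
            "source": record["source"],
        })
    return normalized


def build_knowledge_base(records):
    """Stand-in for module 440: group records by field, then by term."""
    knowledge_base = defaultdict(lambda: defaultdict(list))
    for record in records:
        knowledge_base[record["field"]][record["term"]].append(record["source"])
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {field: dict(terms) for field, terms in knowledge_base.items()}


def run_pipeline():
    """Chain the four stand-in modules in the order described above."""
    raw = acquire_open_source_knowledge() + acquire_expert_knowledge()
    return build_knowledge_base(normalize(raw))
```

In this sketch each module is an ordinary function, so modules can be merged or split without changing the pipeline's interface, which mirrors the statement that the division and number of modules are not fixed.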
The present disclosure also provides a computer-readable storage medium, which may be contained in the apparatus/device/system described in the above embodiments; or may exist separately and not be assembled into the device/apparatus/system. The computer-readable storage medium carries one or more programs which, when executed, implement the method according to an embodiment of the disclosure.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
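The remark that two blocks shown in succession may be executed substantially concurrently can be sketched as follows, assuming the two acquisition steps do not depend on each other's output. The two functions are hypothetical stand-ins, and a thread pool is only one possible way to run them in parallel.

```python
from concurrent.futures import ThreadPoolExecutor


def crawl_open_source_knowledge():
    """Hypothetical stand-in for the crawling block of the flowchart."""
    return ["open-source item"]


def collect_expert_knowledge():
    """Hypothetical stand-in for the questionnaire block of the flowchart."""
    return ["expert item"]


def acquire_concurrently():
    # Neither block consumes the other's output, so both can be submitted at once.
    with ThreadPoolExecutor(max_workers=2) as pool:
        open_future = pool.submit(crawl_open_source_knowledge)
        expert_future = pool.submit(collect_expert_knowledge)
        # result() blocks until each task finishes; the combined order is fixed here.
        return open_future.result() + expert_future.result()
```

The downstream standardization step would start only after both results are available, so the overall outcome is the same as for sequential execution.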
The above embodiments describe the objects, technical solutions, and advantages of the present invention in further detail. It should be understood that they are only exemplary embodiments of the present invention and are not intended to limit it; any modification, equivalent substitution, improvement, or the like made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (10)

1. A method for constructing a knowledge base system based on multi-source knowledge acquisition technology, characterized by comprising the following steps:
crawling open-source knowledge from a first data source using a distributed crawler technology;
obtaining expert knowledge from a second data source via a structured questionnaire;
unifying terms and concepts in the open-source knowledge and the expert knowledge, and unifying data formats, to obtain standardized knowledge; and
hierarchically classifying the standardized knowledge, creating instances according to professional fields and/or important terms of the standardized knowledge, constructing a domain ontology of the knowledge base system, and constructing a parameter template according to the domain ontology, to obtain the knowledge base system.
2. The knowledge base system construction method based on the multi-source knowledge acquisition technology according to claim 1, characterized in that the open-source knowledge is acquired by a distributed crawler technology based on a distributed cloud architecture, wherein the underlying architecture of the distributed cloud architecture is built on a Docker container cloud cluster.
3. The knowledge base system construction method based on the multi-source knowledge acquisition technology according to claim 1, characterized in that a master-slave distributed crawler model is used for providing URL distribution service to obtain the open-source knowledge, so as to realize multi-mode-based webpage structured data extraction.
4. The knowledge base system construction method based on the multi-source knowledge acquisition technology according to claim 1, characterized in that experts are assisted in collective discussion through a brainstorming method or an electronic brainstorming method, and second data are collected from the second data source;
and the second data are sorted by the Delphi method or the nominal group technique to obtain the expert knowledge.
5. The knowledge base system construction method based on the multi-source knowledge acquisition technology according to claim 1, characterized in that the domain ontology is constructed using the Protégé ontology construction tool.
6. The knowledge base system construction method based on the multi-source knowledge acquisition technology according to claim 1, characterized in that the method further comprises:
and maintaining the knowledge base system, wherein the maintenance comprises parameter information description and/or parameter adding and/or parameter modifying and/or parameter hiding and/or parameter sequence adjusting and/or parameter template application, and classification creation and/or editing and/or deletion and/or hierarchical management.
7. The knowledge base system construction method based on the multi-source knowledge acquisition technology according to claim 1, wherein the first data source comprises a webpage and/or a database, and the second data source comprises experience summaries of target knowledge experts.
8. A knowledge base system based on multi-source knowledge acquisition technology, comprising:
an open-source knowledge acquisition module for crawling open-source knowledge from a first data source using a distributed crawler technology;
an expert knowledge acquisition module for acquiring expert knowledge from a second data source via a structured questionnaire;
a knowledge standardization module for unifying terms and concepts in the open-source knowledge and the expert knowledge, and unifying data formats, to obtain standardized knowledge; and
a knowledge base construction and maintenance module for hierarchically classifying the standardized knowledge, creating instances according to professional fields and/or important terms of the standardized knowledge, constructing a domain ontology of the knowledge base system, constructing a parameter template according to the domain ontology to obtain the knowledge base system, and maintaining the knowledge base system.
9. An electronic device, comprising:
one or more processors;
a memory for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of knowledge base system construction based on multi-source knowledge acquisition techniques of any of claims 1-7.
10. A computer-readable storage medium storing computer-executable instructions for implementing the method of knowledge base system construction based on multi-source knowledge acquisition techniques of any one of claims 1 to 7 when executed.
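By way of illustration, the master-slave URL distribution recited in claim 3 might be sketched as below. This is a single-process approximation under stated assumptions: worker threads stand in for slave nodes (which the claims place on a Docker container cloud cluster), the `fetch` callable, the `FAKE_WEB` table, and all URLs are hypothetical, and no real network access is performed.

```python
import queue
import threading


def run_master_slave_crawl(seed_urls, fetch, worker_count=3):
    """Master distributes de-duplicated URLs; worker threads act as slaves."""
    url_queue = queue.Queue()
    seen = set()
    results = {}
    lock = threading.Lock()

    def enqueue(url):
        # Master-side de-duplication: each URL is handed out at most once.
        with lock:
            if url in seen:
                return
            seen.add(url)
        url_queue.put(url)

    def slave():
        while True:
            url = url_queue.get()
            if url is None:  # sentinel: no more work for this slave
                url_queue.task_done()
                return
            content, links = fetch(url)
            with lock:
                results[url] = content
            # Newly discovered links go back to the master before this task is
            # marked done, so the queue never looks empty prematurely.
            for link in links:
                enqueue(link)
            url_queue.task_done()

    for url in seed_urls:
        enqueue(url)
    workers = [threading.Thread(target=slave) for _ in range(worker_count)]
    for worker in workers:
        worker.start()
    url_queue.join()  # wait until every distributed URL has been processed
    for _ in workers:
        url_queue.put(None)
    for worker in workers:
        worker.join()
    return results


# Hypothetical in-memory "web": URL -> (page content, outgoing links).
FAKE_WEB = {
    "http://example.com/a": ("page A", ["http://example.com/b", "http://example.com/c"]),
    "http://example.com/b": ("page B", ["http://example.com/a"]),
    "http://example.com/c": ("page C", []),
}


def fake_fetch(url):
    """Stand-in for an HTTP request plus link extraction."""
    return FAKE_WEB[url]
```

Calling `run_master_slave_crawl(["http://example.com/a"], fake_fetch)` visits each of the three pages exactly once even though page B links back to page A, because de-duplication is centralised in the master.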
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110059145.9A CN112667606A (en) 2021-01-15 2021-01-15 Knowledge base system based on multi-source knowledge acquisition technology and construction method thereof

Publications (1)

Publication Number Publication Date
CN112667606A true CN112667606A (en) 2021-04-16

Family

ID=75415463

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110059145.9A Pending CN112667606A (en) 2021-01-15 2021-01-15 Knowledge base system based on multi-source knowledge acquisition technology and construction method thereof

Country Status (1)

Country Link
CN (1) CN112667606A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113282731A (en) * 2021-06-09 2021-08-20 中国农业银行股份有限公司 Knowledge data maintenance method and device
CN117271795A (en) * 2023-09-20 2023-12-22 四川大学 Cross-domain migration knowledge management method and knowledge base system

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012091541A1 (en) * 2010-12-28 2012-07-05 Mimos Berhad A semantic web constructor system and a method thereof
CN104182454A (en) * 2014-07-04 2014-12-03 重庆科技学院 Multi-source heterogeneous data semantic integration model constructed based on domain ontology and method
US20150134573A1 (en) * 2013-11-13 2015-05-14 International Business Machines Corporation Method for enhancing a mind map with different streams of information
CN105868381A (en) * 2016-04-06 2016-08-17 无锡中科富农物联科技有限公司 Knowledge base retrieval system for agricultural information service
CN109284394A (en) * 2018-09-12 2019-01-29 青岛大学 A method of Company Knowledge map is constructed from multi-source data integration visual angle
CN109460460A (en) * 2018-11-05 2019-03-12 国家计算机网络与信息安全管理中心 A kind of Methodologies for Building Domain Ontology towards intelligent use
CN109581981A (en) * 2018-12-06 2019-04-05 山东大学 A kind of data fusion system and its working method based on data assessment Yu system coordination module
CN110209589A (en) * 2019-06-05 2019-09-06 北京百度网讯科技有限公司 Knowledge base system test method, device, equipment and medium
CN110489395A (en) * 2019-07-27 2019-11-22 西南电子技术研究所(中国电子科技集团公司第十研究所) Automatically the method for multi-source heterogeneous data knowledge is obtained

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
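Lai Ru et al.: "Research on constructing a crowdsourcing-based Uyghur multi-source semantic knowledge base", Computer Applications and Software (《计算机应用与软件》) *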

Similar Documents

Publication Publication Date Title
US10740396B2 (en) Representing enterprise data in a knowledge graph
CN106663101A (en) Ontology mapping method and apparatus
Breitman et al. Open government data in Brazil
DE102013216273A1 (en) Transformation of database table formats based on user data access patterns in a networked computing environment
US20140351241A1 (en) Identifying and invoking applications based on data in a knowledge graph
Sawadogo et al. Metadata management for textual documents in data lakes
Xiao et al. SWEclat: a frequent itemset mining algorithm over streaming data using Spark Streaming
CN112667606A (en) Knowledge base system based on multi-source knowledge acquisition technology and construction method thereof
CN108776672A (en) Knowledge Management System based on SOLR
Chen et al. A big data analysis and application platform for civil aircraft health management
US11003640B2 (en) Mining of policy data source description based on file, storage and application meta-data
CN113626447B (en) Civil aviation data management platform and method
CN116992887A (en) Metadata data catalog processing method, device and processing equipment
DE112021005210T5 (en) Indexing metadata to manage information
Liu et al. A general multi-source data fusion framework
DE112021000621T5 (en) MULTIVALUE PRIMARY KEYS FOR MULTIPLE UNIQUE IDENTIFIERS OF ENTITIES
Engle et al. Evaluation Criteria for Selecting NoSQL Databases in a Single Box Environment
Fischer et al. Timely semantics: a study of a stream-based ranking system for entity relationships
CN108205564B (en) Knowledge system construction method and system
CN112488642B (en) Cloud file management method based on structured labels and taking object as core
CN113986545A (en) Method and device for associating user with role
Engle A Methodology for Evaluating Relational and NoSQL Databases for Small-Scale Storage and Retrieval
CN103106556A (en) Model automatic combination method and model automatic combination system based on artificial intelligent planning
de Souza Campos et al. Review and comparison of works on heterogeneous data and semantic analysis in big data
Maw et al. Efficient approach to database integration for an aerospace vehicle design and certification framework

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210416