CN112328812B - Domain knowledge extraction method and system based on self-adjusting parameters and electronic equipment - Google Patents

Domain knowledge extraction method and system based on self-adjusting parameters and electronic equipment

Info

Publication number
CN112328812B
CN112328812B (application CN202110006928.0A)
Authority
CN
China
Prior art keywords
data
domain
model
training
knowledge
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110006928.0A
Other languages
Chinese (zh)
Other versions
CN112328812A (en)
Inventor
姚苗
查琳
冶莎
张晨
周智海
王芳杰
覃晨
黄庆娇
王振宇
陈刚
何青松
向波
杨志勤
邢尚合
周凡吟
Current Assignee
Chengdu Business Big Data Technology Co Ltd
Original Assignee
Chengdu Business Big Data Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Chengdu Business Big Data Technology Co Ltd filed Critical Chengdu Business Big Data Technology Co Ltd
Priority to CN202110006928.0A priority Critical patent/CN112328812B/en
Publication of CN112328812A publication Critical patent/CN112328812A/en
Application granted granted Critical
Publication of CN112328812B publication Critical patent/CN112328812B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367Ontology

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a domain knowledge extraction method and system based on self-adjusting parameters, and to electronic equipment, comprising the following steps: constructing a domain ontology knowledge base from collected domain data, wherein the domain ontology knowledge base comprises a domain entity base, a domain relation base and a domain attribute base; vectorizing the constructed domain ontology knowledge base and using the result as the input of a pre-training model, so as to train the pre-training model and obtain a knowledge extraction model; acquiring the adjustable parameters of the knowledge extraction model, adjusting them according to the business data, preprocessing the business data, and inputting the preprocessed business data together with the adjusted parameters into the knowledge extraction model to obtain the extraction result of the business data; the extraction result is a list of <entity, relation, attribute> triples. The scheme can complete a knowledge base, extract entities, relations and attributes in a unified way, and respond quickly to different business requirements.

Description

Domain knowledge extraction method and system based on self-adjusting parameters and electronic equipment
Technical Field
The invention relates to the technical field of knowledge graphs, and in particular to a method, a system and electronic equipment for extracting domain knowledge based on self-adjusting parameters.
Background
A knowledge graph (Knowledge Graph), a concept originating in library and information science, is a family of diagrams that display the development and structural relationships of knowledge, using visualization techniques to describe knowledge resources and their carriers, and to mine, analyze, construct, draw and display knowledge and the relationships among knowledge resources and knowledge carriers. Knowledge extraction is a prerequisite step of knowledge graph construction: the quantity and quality of the extraction results directly affect the quality of the generated knowledge graph, and for building graphs from unstructured data in particular, knowledge extraction is an indispensable link.
Knowledge extraction covers entity extraction, relation extraction and attribute extraction. In industry, applications of knowledge extraction mainly focus on entity extraction; for text data, for example, it is mainly applied through named entity recognition to identify person names, organization names and the like, while for image data it is mainly applied through OCR combined with templates to generate a knowledge graph. Relation extraction and attribute extraction are mainly performed according to rules written for the corresponding domain.
However, the current knowledge extraction has the following problems:
First, public knowledge bases are difficult to complete
The accuracy of knowledge extraction depends strongly on the completeness and standardization of the knowledge base. Open-source public knowledge bases only collect general concepts, so their knowledge of a specific business domain is incomplete, and they cannot be used directly in actual business scenarios.
Second, knowledge extraction is fragmented
Entity extraction, relation extraction and attribute extraction are split into different subtasks that must be executed in sequence. At present no single knowledge extraction model can satisfy entity, relation and attribute extraction simultaneously, yet in a knowledge graph the three are mutually dependent and associated, so splitting the task inevitably causes loss of local information.
Thirdly, it is difficult to respond to changes quickly
Existing knowledge extraction models are difficult to reuse: each time knowledge extraction is performed, retraining, verification and testing must be carried out on the specific business data. The construction period of a knowledge extraction model is therefore long, and rapid changes in business requirements are hard to accommodate.
Disclosure of Invention
The invention aims to solve the three problems above: it completes the domain ontology knowledge base, extracts entities, relations and attributes in a unified way, and responds quickly to different business requirements.
To achieve this aim, the embodiments of the invention provide the following technical solutions:
the field knowledge extraction method based on the self-adjusting parameters is characterized by comprising the following steps: the method comprises the following steps:
preprocessing collected domain data according to an open source knowledge base, and constructing a domain ontology knowledge base through the preprocessed domain data, wherein the domain ontology knowledge base comprises a domain ontology, and the domain ontology comprises a domain entity base, a domain relation base and a domain attribute base;
vectorizing the constructed domain ontology knowledge base, and then taking the vectorized domain ontology knowledge base as the input of a pre-training model to train the pre-training model to obtain a knowledge extraction model;
acquiring the adjustable parameters of the knowledge extraction model, adjusting them according to the business data, preprocessing the business data, and inputting the preprocessed business data and the adjusted parameters into the knowledge extraction model to obtain the extraction result of the business data; the extraction result is a list of <entity, relation, attribute> triples.
In this scheme, the method can be built into any platform based on knowledge extraction. The domain ontology knowledge base is constructed with the help of an open-source knowledge base, and the semantic features of the knowledge data are supplemented, yielding a knowledge base rich in semantic information; this solves the semantic sparseness that existing open-source knowledge bases show when representing business data, and achieves the purpose of completing the domain ontology knowledge base. For each domain, a domain ontology knowledge base is constructed and a knowledge extraction model belonging to that domain is trained; when extracting from business data of that domain, only the adjustable parameters of the knowledge extraction model need to be changed to obtain the extraction result. This avoids repeatedly training the knowledge extraction model for changing business data within the same domain, and the resulting waste of computing resources. For example, once a domain ontology knowledge base and a knowledge extraction model have been constructed for the financial field, business data of the same field in a financial business scenario, such as financial credit data or financial fraud data, only requires setting a few adjustable parameters to obtain its extraction result. Entities, relations and attributes are extracted in a unified way rather than in three separate modules, which guarantees the integrity of the data. Meanwhile, the development and implementation cycle for business data is greatly shortened, different business data are responded to quickly, the landing of knowledge applications is accelerated, and business lines are served better, realizing the real value of knowledge.
The step of preprocessing the collected domain data according to an open-source knowledge base and constructing the domain ontology knowledge base from the preprocessed domain data comprises:
collecting domain data based on the domain keywords;
performing word segmentation and cleaning processing on the acquired field data by combining an open source knowledge base;
inputting the domain data after segmentation and cleaning into a labeling model, and labeling the domain data to obtain labeled data; the labeled content comprises one or more pieces of domain data labeled with their entities, relations and attributes, and a unique ID is added to each piece of labeled domain data;
and loading the domain data marked as the entities into the domain entity library, loading the domain data marked as the relations into the domain relation library, and loading the domain data marked as the attributes into the domain attribute library, thereby constructing a domain ontology knowledge base.
In this scheme, before the collected domain data are labeled, they undergo word segmentation and cleaning with the open-source knowledge base corresponding to the domain keywords, which enriches their semantic information and alleviates the semantic sparseness of existing open-source knowledge bases. The labeled domain data are one or more records of the form <entity, relation, attribute>; a unique ID is added to each labeled record so that, when it is called later, a complete record can be retrieved directly by its unique ID. This avoids the loss of data information caused in the prior art by extracting entities, relations and attributes separately.
The labeling model is a domain data template of the form <entity, relation, attribute 1, attribute 2, …, attribute n>, where n is the number of attributes of the entity and n ≥ 1; or of the form <entity 1, entity 2, …, entity i, relation, attribute 1, attribute 2, …, attribute j>, where i is the number of entities, i ≥ 2, j is the number of attributes of the relation, and j ≥ 1.
In this scheme, the domain data are labeled according to the templates of the labeling model. The two templates yield one or more labeled domain data records, each containing entity, relation and attribute information, which guarantees the integrity of the data and avoids the information loss caused in the prior art by extracting entities, relations and attributes separately.
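As an illustration, the first template can be represented as a plain tuple with a unique ID attached, as the labeling step requires. This is a minimal sketch; the function name, record shape and example values are hypothetical, not taken from the patent:

```python
import uuid

def label_record(entity, relation, attributes):
    """Build a labeled domain-data record of the form
    <entity, relation, attribute 1, ..., attribute n> (n >= 1)
    and attach a unique ID, as the labeling step requires."""
    if not attributes:
        raise ValueError("template requires n >= 1 attributes")
    record = (entity, relation, *attributes)
    return str(uuid.uuid4()), record

rid, rec = label_record("enterprise 1", "partnership", ["2020-12-04", "Chengdu"])
print(rec)  # ('enterprise 1', 'partnership', '2020-12-04', 'Chengdu')
```

The second template differs only in allowing several entities before the relation; the same tuple representation applies.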
The step of performing word segmentation and cleaning on the collected domain data with an open-source knowledge base comprises:
performing word segmentation processing on the collected field data by combining an open source knowledge base to obtain field data after word segmentation processing;
and cleaning the field data after word segmentation processing by using a public stop list, and filtering stop words to form the field data in a vocabulary table form.
In this scheme, segmenting the domain data with an existing open-source knowledge base enriches their semantic information and alleviates the semantic sparseness of existing open-source knowledge bases. Filtering the stop words in the domain data with a public stop-word list makes the domain data more effective.
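The segmentation-and-cleaning step can be sketched with forward maximum matching against a small vocabulary standing in for the open-source knowledge base. The vocabulary and stop list below are toy assumptions; the patent does not name a specific segmenter:

```python
def fmm_segment(text, vocab, max_len=4):
    """Forward-maximum-matching word segmentation: at each position,
    take the longest vocabulary word, falling back to one character."""
    tokens, i = [], 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            word = text[i:i + length]
            if length == 1 or word in vocab:
                tokens.append(word)
                i += length
                break
    return tokens

def clean(tokens, stopwords):
    """Drop stop words, keeping the order of the remaining tokens."""
    return [t for t in tokens if t not in stopwords]

vocab = {"金融", "信贷", "数据"}   # toy domain vocabulary
stopwords = {"的"}                 # toy public stop list
tokens = fmm_segment("金融信贷的数据", vocab)
print(clean(tokens, stopwords))    # ['金融', '信贷', '数据']
```

The cleaned token list is what the patent calls domain data "in a vocabulary-table form", ready for labeling.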
The step of vectorizing the constructed domain ontology knowledge base, using it as the input of a pre-training model, and training the pre-training model to obtain the knowledge extraction model comprises:
inputting the constructed domain ontology knowledge base into a word2vec model to obtain vectorization expression of knowledge;
and taking the vectorization expression of the knowledge as the input of the pre-training model, carrying out iterative training on the pre-training model, and testing the pre-training model after the iterative training is completed to obtain the knowledge extraction model.
In the scheme, the constructed domain ontology knowledge base is complete knowledge, the knowledge is expressed in a vectorization mode, a pre-training model is input to carry out training and testing, and finally a universal knowledge extraction model in the domain is obtained.
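The patent uses a word2vec model for the vectorized representation; as a self-contained stand-in, the toy co-occurrence vectorizer below illustrates the same idea of mapping every knowledge-base token to a fixed-length vector (the function and the example corpus are illustrative assumptions, not the patent's implementation):

```python
from collections import Counter

def cooc_vectors(sentences, window=1):
    """Toy stand-in for word2vec: represent each word by its
    co-occurrence counts with every vocabulary word."""
    vocab = sorted({w for s in sentences for w in s})
    counts = {w: Counter() for w in vocab}
    for s in sentences:
        for k, w in enumerate(s):
            for j in range(max(0, k - window), min(len(s), k + window + 1)):
                if j != k:
                    counts[w][s[j]] += 1
    # one dense, fixed-length vector per word, ordered by the vocabulary
    return {w: [counts[w][v] for v in vocab] for w in vocab}, vocab

vecs, vocab = cooc_vectors([["bank", "loan"], ["bank", "fraud"]])
# every word now has a vector of length len(vocab) = 3
```

In practice a trained word2vec model (with its word-vector dimension `vec_size`) would replace this, but the downstream interface is the same: token in, fixed-length vector out.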
The step of taking the vectorization representation of the knowledge as the input of the pre-training model, performing iterative training on the pre-training model, and testing the pre-training model after completing the iterative training to obtain the knowledge extraction model comprises the following steps:
taking the vectorization expression of knowledge as the input of a pre-training model based on the combination of a Bi-LSTM model, an Attention model and a CRF model, and carrying out iterative batch training on the pre-training model;
after each epoch of iterative training, verifying the pre-training model and optimizing the weight parameters connecting the neurons in the pre-training model with the back-propagation (BP) algorithm and the Adam optimizer; repeating this training and verification process until the iterative training is complete;
and testing the pre-training model after iterative training by precision, recall and F1 score, so as to generate the knowledge extraction model.
In this scheme, the specific training process of the pre-training model is given, finally yielding a knowledge extraction model that can be used generally within the domain.
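The testing step compares extracted triples with gold triples. A minimal sketch of the precision, recall and F1 computation follows; exact set-based matching of triples is an assumption here, since the patent does not specify the matching criterion:

```python
def prf1(predicted, gold):
    """Precision, recall and F1 over sets of extracted triples."""
    tp = len(predicted & gold)                      # true positives
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

gold = {("A", "partner", "B"), ("A", "owns", "C")}
pred = {("A", "partner", "B"), ("A", "owns", "D")}
print(prf1(pred, gold))  # (0.5, 0.5, 0.5)
```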
The adjustable parameters acquired from the knowledge extraction model include: batch size (batch_size), learning rate (learn_rate), number of Bi-LSTM layers (r_layer), number of neurons per Bi-LSTM layer (r_nums), and number of Attention layers (a_layer).
In this scheme, the adjustable parameters are collected and displayed as a list, so that they can be changed more quickly when subsequent business data are input.
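Collecting the adjustable parameters in one structure makes a per-business override a one-line change. A sketch of that idea, where the default values are placeholders of my own and not taken from the patent:

```python
# Default adjustable parameters of the trained knowledge extraction
# model; the values are illustrative placeholders, not from the patent.
DEFAULTS = {
    "batch_size": 64,    # batch size
    "learn_rate": 1e-3,  # learning rate
    "r_layer": 2,        # number of Bi-LSTM layers
    "r_nums": 128,       # neurons per Bi-LSTM layer
    "a_layer": 1,        # number of Attention layers
}

def adjust(overrides):
    """Return a fresh parameter set with business-specific overrides,
    rejecting any key that is not an adjustable parameter."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"not adjustable: {unknown}")
    return {**DEFAULTS, **overrides}

params = adjust({"batch_size": 32, "learn_rate": 5e-4})
```

Only the overrides change between business data sets; the trained model itself is reused untouched, which is the reuse the patent claims.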
The step of preprocessing the service data includes:
performing word segmentation processing on the service data by combining the constructed domain ontology knowledge base to obtain the service data after the word segmentation processing;
and cleaning the service data after word segmentation processing by using a public stop table, and filtering stop words to obtain the preprocessed service data.
In the above scheme, since the constructed domain ontology knowledge base is already complete, the incoming business data can be preprocessed (segmented and cleaned) using the domain ontology knowledge base alone; the preprocessed business data and the adjusted parameters are then input into the knowledge extraction model to obtain the extraction result. For any different business data within the same domain, only the adjustable parameters need to be changed to obtain the extraction result; a knowledge extraction model does not have to be trained for each new set of business data, so the model can be reused and computing resources are not wasted.
A domain knowledge extraction system based on self-adjusting parameters comprises:
the knowledge base construction system is used for preprocessing the acquired field data according to the open source knowledge base and constructing a field ontology knowledge base through the preprocessed field data; the constructed domain ontology knowledge base comprises a domain ontology, and the domain ontology comprises a domain entity base, a domain relation base and a domain attribute base;
the data input end of the extraction model training system is connected with the data output end of the knowledge base construction system, and the extraction model training system is used for vectorizing the constructed domain ontology knowledge base and then used as the input of a pre-training model to train the pre-training model to obtain a knowledge extraction model;
the data input end of the business application system is connected with the data output end of the extraction model training system, and the business application system is used for acquiring the adjustable parameters of the knowledge extraction model, adjusting them according to the business data, preprocessing the business data, and inputting the preprocessed business data and the adjusted parameters into the knowledge extraction model to obtain the extraction result of the business data; the extraction result is a list of <entity, relation, attribute> triples.
The knowledge base construction system comprises a data acquisition unit, a first preprocessing unit and a labeling unit, wherein,
the data acquisition unit is used for acquiring field data based on the field keywords;
the data input end of the first preprocessing unit is connected with the data output end of the data acquisition unit, and the first preprocessing unit is used for performing word segmentation and cleaning processing on acquired field data;
the data input end of the labeling unit is connected with the data output end of the first preprocessing unit, and the labeling unit is used for inputting field data subjected to word segmentation and cleaning into the labeling model to obtain labeling data; the marked content comprises one or more pieces of domain data which are marked as entities, relations and attributes of the domain data, and unique ID is added to each piece of marked domain data; loading the domain data marked as entities into the domain entity library, loading the domain data marked as relations into the domain relation library, and loading the domain data marked as attributes into the domain attribute library, thereby constructing a domain ontology knowledge base;
the labeling model is a domain data template of the form <entity, relation, attribute 1, attribute 2, …, attribute n>, where n is the number of attributes of the entity and n ≥ 1; or of the form <entity 1, entity 2, …, entity i, relation, attribute 1, attribute 2, …, attribute j>, where i is the number of entities, i ≥ 2, j is the number of attributes of the relation, and j ≥ 1.
The first preprocessing unit is used for carrying out word segmentation processing on the collected field data by combining an open source knowledge base when carrying out word segmentation processing on the collected field data so as to obtain the field data after word segmentation processing; when the first preprocessing unit cleans collected field data, the public stop list is used for cleaning the field data after word segmentation, and stop words are filtered to form the field data in a vocabulary form.
The extraction model training system comprises a vector processing unit, a training unit and a testing unit, wherein,
the vector processing unit is used for inputting the constructed domain ontology knowledge base into a word2vec model to obtain vectorization expression of knowledge;
the data input end of the training unit is connected with the data output end of the vector processing unit, and the training unit is used for taking the vectorization representation of knowledge as the input of a pre-training model and carrying out iterative training on the pre-training model;
and the data input end of the test unit is connected with the data output end of the training unit, and the test unit is used for testing the pre-training model after the iterative training is completed to obtain a knowledge extraction model.
When the training unit trains the pre-training model, the vectorized representation of knowledge is used as the input of a pre-training model based on the combination of a Bi-LSTM model, an Attention model and a CRF model, and the pre-training model is trained iteratively in batches; after each epoch of iterative training, the pre-training model is verified and the weight parameters connecting the neurons in the pre-training model are optimized with the back-propagation (BP) algorithm and the Adam optimizer; this training and verification process is repeated until the iterative training is complete.
When the testing unit tests the pre-training model after iterative training, it does so by precision, recall and F1 score, thereby generating the knowledge extraction model.
The business application system comprises a second preprocessing unit, a parameter adjusting unit and an extracting unit, wherein,
the data input end of the second preprocessing unit is accessed with service data and used for preprocessing the service data;
the data input end of the parameter adjusting unit is connected with the data output end of the second preprocessing unit, and the parameter adjusting unit is used for acquiring the adjustable parameters of the knowledge extraction model and adjusting them according to the business data; the adjustable parameters include: batch size (batch_size), learning rate (learn_rate), number of Bi-LSTM layers (r_layer), number of neurons per Bi-LSTM layer (r_nums) and number of Attention layers (a_layer);
the data input end of the extraction unit is connected respectively with the data output ends of the second preprocessing unit and the parameter adjusting unit, and the extraction unit is used for inputting the preprocessed business data and the adjusted parameters into the knowledge extraction model to obtain the extraction result of the business data; the extraction result is a list of <entity, relation, attribute> triples.
When the second preprocessing unit preprocesses the accessed service data, the second preprocessing unit performs word segmentation processing on the service data by combining the established domain ontology knowledge base to obtain the service data after the word segmentation processing; and cleaning the service data after word segmentation processing by using a public stop table, and filtering stop words to obtain the preprocessed service data.
An electronic device, comprising:
a memory storing program instructions;
and the processor is connected with the memory and executes the program instructions in the memory to realize the steps of the self-adjusting parameter-based domain knowledge extraction method in any embodiment of the invention.
Compared with the prior art, the invention has the following beneficial effects:
(1) The method can be built into any platform based on knowledge extraction. The domain ontology knowledge base is constructed with the help of an open-source knowledge base and the semantic features of the knowledge data are supplemented, yielding a knowledge base rich in semantic information; this solves the semantic sparseness that existing open-source knowledge bases show when representing business data, and achieves the purpose of completing the domain ontology knowledge base.
(2) For each domain, a domain ontology knowledge base is constructed and a knowledge extraction model belonging to that domain is trained. When extracting from business data of that domain, only the adjustable parameters of the knowledge extraction model need to be changed to obtain the extraction result; this avoids repeatedly training the knowledge extraction model for changing business data within the same domain and the resulting waste of computing resources. For example, once a domain ontology knowledge base and a knowledge extraction model have been constructed for the financial field, business data of the same field in a financial business scenario, such as financial credit data or financial fraud data, only requires setting a few adjustable parameters to obtain its extraction result. Entities, relations and attributes are extracted in a unified way rather than in three separate modules, which guarantees the integrity of the data.
(3) The scheme also greatly shortens the development and implementation cycle for business data, responds quickly to different business data, accelerates the landing of knowledge applications, and better serves business lines, thereby realizing the real value of knowledge.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.
FIG. 1 is a flow chart of a knowledge extraction method of the present invention;
FIG. 2 is a schematic diagram illustrating training of a pre-training model according to an embodiment of the present invention;
FIG. 3 is a block diagram of the knowledge extraction system of the present invention;
fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
The invention is realized by the following technical scheme, as shown in figure 1, the method for extracting the domain knowledge based on the self-adjusting parameters comprises the following three steps:
step S1: and constructing a domain ontology knowledge base through the collected domain data.
The constructed domain ontology knowledge base comprises a domain entity base, a domain relation base and a domain attribute base.
Data sources of the corresponding domain can be crawled with crawler technology using domain keywords to serve as domain data. For the financial domain, for example, the data sources include but are not limited to financial and economic news websites such as Securities Daily, Blue Whale Finance, online-lending news portals and China Times; data of the education domain, the sports domain, the clothing domain and so on can be collected likewise.
And combining the open source knowledge base to perform word segmentation processing on the collected field data to obtain the field data after word segmentation processing. And then, the public stop list is utilized to clean the field data after the word segmentation processing, stop words are filtered, and the field data in the form of a vocabulary table is formed.
Inputting the field data subjected to word segmentation and cleaning into a labeling model, and labeling the field data to obtain labeled data; the marked content comprises one or more pieces of domain data which are marked as entities, relations and attributes, and unique ID is added to each piece of marked domain data. The template of the labeling model comprises two types:
1. <entity, relation, attribute 1, attribute 2, …, attribute n>
2. <entity 1, entity 2, …, entity i, relation, attribute 1, attribute 2, …, attribute j>
For example, after a piece of financial data is labeled, a domain data record such as <enterprise 1, enterprise 2, partnership, 2020-12-04, Chengdu> is obtained, and a unique ID is then added to this record.
The domain data labeled as entities is then loaded into the domain entity library, the domain data labeled as relationships into the domain relationship library, and the domain data labeled as attributes into the domain attribute library, thereby constructing the domain ontology knowledge base.
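The labeling-and-loading step above can be sketched as follows. This is a minimal illustrative sketch: the in-memory dictionary libraries, the load_labeled_record function and the counter-based ID scheme are assumptions made for illustration, not the patent's actual storage implementation.

```python
import itertools

# Illustrative in-memory stand-ins for the domain entity library,
# domain relationship library and domain attribute library; the patent
# does not prescribe a storage format.
entity_lib, relation_lib, attribute_lib = {}, {}, {}
_ids = itertools.count(1)

def load_labeled_record(record):
    """Add a unique ID to one labeled record (template type 1:
    <entity, relationship, attribute 1, ..., attribute n>) and load
    each element into the corresponding library."""
    uid = next(_ids)
    entity, relation, *attributes = record
    entity_lib.setdefault(entity, []).append(uid)
    relation_lib.setdefault(relation, []).append(uid)
    for attr in attributes:
        attribute_lib.setdefault(attr, []).append(uid)
    return uid

# The financial example from the description, reduced to template type 1.
uid = load_labeled_record(("enterprise 1", "partnership",
                           "2020-12-04", "Chengdu"))
print(uid, sorted(entity_lib), sorted(relation_lib))
```

Each labeled record thus carries one unique ID, shared by its entries across the three libraries, which is what later lets the ontology be reassembled into triples.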
Step S2: and vectorizing the constructed domain ontology knowledge base, and then taking the vectorized domain ontology knowledge base as the input of a pre-training model to train the pre-training model to obtain a knowledge extraction model.
The constructed domain ontology knowledge base is input into a word2vec model to obtain a vectorized representation of the knowledge. The adjustable parameter in the word2vec model is the word vector dimension vec_size.
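The vectorization step might look as follows. Since the patent only specifies a word2vec model with an adjustable word-vector dimension vec_size, the sketch below uses a deterministic hash-based stand-in for trained embeddings; in practice one would train real embeddings, for example with gensim's Word2Vec, whose vector_size argument plays the role of vec_size.

```python
import hashlib
import math

def toy_vectorize(tokens, vec_size=8):
    """Deterministic stand-in for word2vec: hash each token into a
    unit-length vector whose dimension is controlled by vec_size,
    mirroring the adjustable word-vector-dimension parameter. Real
    embeddings would be learned from co-occurrence statistics."""
    vectors = {}
    for tok in tokens:
        digest = hashlib.sha256(tok.encode("utf-8")).digest()
        raw = [digest[i % len(digest)] - 127.5 for i in range(vec_size)]
        norm = math.sqrt(sum(x * x for x in raw))
        vectors[tok] = [x / norm for x in raw]
    return vectors

vecs = toy_vectorize(["entity", "relationship", "attribute"], vec_size=16)
print(len(vecs["entity"]))  # prints 16: the dimension follows vec_size
```

The point of the sketch is only the interface: every knowledge token maps to a fixed-length vector, and changing vec_size changes that length.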
Referring to fig. 2, the vectorized representation of knowledge is used as the input of a pre-training model based on the combination of a Bi-LSTM model, an Attention model and a CRF model, and the pre-training model is trained iteratively in batches. After each period of iterative training, the pre-training model is validated, and the weight parameters connecting the neurons in the pre-training model are optimized using the BP algorithm and the Adam optimizer. This training and validation process is repeated until the iterative training is completed. The pre-training model that has completed iterative training is then tested on precision, recall and the F1 value, thereby generating the knowledge extraction model.
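The final testing step scores the model on precision, recall and the F1 value. A minimal sketch of that evaluation, assuming the extracted triples are compared as sets against a gold standard (the comparison protocol is an assumption, since the patent does not detail it):

```python
def precision_recall_f1(predicted, gold):
    """Score extracted triples against gold-standard triples:
    precision = TP / (TP + FP), recall = TP / (TP + FN),
    F1 = harmonic mean of precision and recall."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

gold = [("enterprise 1", "partnership", "Chengdu"),
        ("enterprise 3", "subsidiary", "Beijing")]
pred = [("enterprise 1", "partnership", "Chengdu"),
        ("enterprise 4", "investor", "Shanghai")]
print(precision_recall_f1(pred, gold))  # prints (0.5, 0.5, 0.5)
```

A model version is accepted as the knowledge extraction model once these three scores on the held-out test set meet the required thresholds.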
The adjustable parameters of the knowledge extraction model comprise the batch size batch_size, the learning rate learning_rate, the number of Bi-LSTM layers r_layer, the number of neurons per Bi-LSTM layer r_nums, the number of Attention layers a_layer, and so on; these parameters are collected and presented in a list.
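The collected parameter list can be represented as a simple name/value structure; the default values below are purely illustrative and are not taken from the patent.

```python
# The adjustable parameters of the knowledge extraction model,
# collected in a list as the description states. The default values
# here are illustrative only.
ADJUSTABLE_PARAMS = [
    ("batch_size",    64),    # training/inference batch size
    ("learning_rate", 1e-3),  # optimizer step size
    ("r_layer",       2),     # number of Bi-LSTM layers
    ("r_nums",        128),   # neurons per Bi-LSTM layer
    ("a_layer",       1),     # number of Attention layers
]

def as_dict(params):
    """View the collected parameter list as a name -> value mapping."""
    return dict(params)

print(sorted(as_dict(ADJUSTABLE_PARAMS)))
```

Keeping the parameters in one list is what makes step S3 possible: the business application only needs to look up and override entries in this list rather than rebuild the model.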
Step S3: and acquiring adjustable parameters in the knowledge extraction model, adjusting the adjustable parameters according to the business data, preprocessing the business data, and inputting the preprocessed business data and the adjusted adjustable parameters into the knowledge extraction model to obtain an extraction result of the business data.
Before knowledge extraction is performed on the business data, the business data must be preprocessed: word segmentation is carried out on the business data in combination with the constructed domain ontology knowledge base to obtain segmented business data, and the segmented business data is then cleaned using a public stop-word list, filtering out stop words to obtain the preprocessed business data.
The domain ontology knowledge base can improve the word segmentation result to a certain extent. For example, if the domain entity library contains the entity 'listed company' but is not consulted during segmentation, 'listed company' would be split into the two words 'listed' and 'company'. Combining the domain ontology knowledge base therefore significantly improves the handling of domain-specific terms.
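The effect of consulting the domain entity library during segmentation can be sketched with a simple forward-maximum-matching segmenter. This whitespace-token version is an illustrative stand-in, not the patent's segmenter; it shows how 'listed company' survives as one token when the domain lexicon is consulted.

```python
def segment(text, domain_lexicon, max_len=4):
    """Forward maximum matching over whitespace tokens: at each
    position, prefer the longest multi-word term found in the domain
    lexicon; otherwise emit a single token."""
    words, tokens, i = text.split(), [], 0
    while i < len(words):
        for length in range(min(max_len, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + length])
            if length == 1 or candidate in domain_lexicon:
                tokens.append(candidate)
                i += length
                break
    return tokens

lexicon = {"listed company"}
print(segment("the listed company announced results", lexicon))
# prints ['the', 'listed company', 'announced', 'results']
```

Without the lexicon the same call yields 'listed' and 'company' as separate tokens, which is exactly the failure mode the description warns about.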
Because business data in the same field is continuously updated and changed while still sharing the common characteristics of the field, adjusting the adjustable parameters in the knowledge extraction model enables a quick response to rapid changes in the business data and simplifies the construction process of knowledge extraction.
For example, suppose business data 1, business data 2, ..., business data n are currently input. The business data is preprocessed and input into the knowledge extraction model, and the corresponding or required adjustable parameters in the adjustable parameter list are adjusted to meet the requirements of the current business data. Finally, the knowledge extraction model outputs the extraction result in the form of a list of <entity, relationship, attribute> triples.
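The application step, adjusting a subset of the adjustable parameters and then running extraction, might be sketched as follows. The adjust_params helper, the stub extract function and the default values are illustrative assumptions; the real model is the trained Bi-LSTM + Attention + CRF network.

```python
def adjust_params(base, overrides):
    """Return a copy of the adjustable parameters with only the
    requested entries overridden; unknown parameter names are rejected."""
    params = dict(base)
    for name, value in overrides.items():
        if name not in params:
            raise KeyError("unknown adjustable parameter: " + name)
        params[name] = value
    return params

def extract(business_data, params, model=None):
    """Stub for the knowledge extraction step: returns a list of
    <entity, relationship, attribute> triples. The real model is the
    trained Bi-LSTM + Attention + CRF network."""
    model = model or (lambda record, p: tuple(record[:3]))
    return [model(record, params) for record in business_data]

base = {"batch_size": 64, "learning_rate": 1e-3,
        "r_layer": 2, "r_nums": 128, "a_layer": 1}
tuned = adjust_params(base, {"batch_size": 16, "learning_rate": 5e-4})
triples = extract([["enterprise 1", "partnership", "Chengdu"]], tuned)
print(tuned["batch_size"], triples)
```

Only the overridden entries change per business batch; the trained model itself is reused, which is the "quick response" the description claims for self-adjusting parameters.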
The embodiment further provides a domain knowledge extraction system based on self-adjusting parameters, please refer to fig. 3, which includes:
the knowledge base construction system is used for preprocessing the acquired field data according to the open source knowledge base and constructing a field ontology knowledge base through the preprocessed field data; the constructed domain ontology knowledge base comprises a domain ontology, and the domain ontology comprises a domain entity base, a domain relation base and a domain attribute base;
the data input end of the extraction model training system is connected with the data output end of the knowledge base construction system, and the extraction model training system is used for vectorizing the constructed domain ontology knowledge base and then used as the input of a pre-training model to train the pre-training model to obtain a knowledge extraction model;
the data input end of the business application system is connected with the data output end of the extraction model training system and is used for acquiring adjustable parameters in the knowledge extraction model, adjusting the adjustable parameters according to business data, preprocessing the business data, and inputting the preprocessed business data and the adjusted adjustable parameters into the knowledge extraction model so as to obtain an extraction result of the business data; the extraction result is a three-tuple list of < entity, relationship, attribute >.
The knowledge base construction system comprises a data acquisition unit, a first preprocessing unit and a labeling unit, wherein,
the data acquisition unit is used for acquiring field data based on the field keywords;
the data input end of the first preprocessing unit is connected with the data output end of the data acquisition unit, and the first preprocessing unit is used for performing word segmentation and cleaning processing on acquired field data.
The first preprocessing unit is used for carrying out word segmentation processing on the collected field data by combining an open source knowledge base when carrying out word segmentation processing on the collected field data so as to obtain the field data after word segmentation processing; when the first preprocessing unit cleans collected field data, the public stop list is used for cleaning the field data after word segmentation, and stop words are filtered to form the field data in a vocabulary form.
The data input end of the labeling unit is connected with the data output end of the first preprocessing unit, and the labeling unit is used for inputting field data subjected to word segmentation and cleaning into the labeling model to obtain labeling data; the marked content comprises one or more pieces of domain data which are marked as entities, relations and attributes of the domain data, and unique ID is added to each piece of marked domain data; loading the domain data marked as entities into the domain entity library, loading the domain data marked as relations into the domain relation library, and loading the domain data marked as attributes into the domain attribute library, thereby constructing a domain ontology knowledge base;
the marking model is a field data template of < entity, relation, attribute 1, attribute 2, > and attribute n >, wherein n is the number of attributes of the entity, and n is more than or equal to 1; or a field data template of < entity 1, entity 2,. > entity i, relationship, attribute 1, attribute 2, > attribute j >, wherein i is the number of entities, i is more than or equal to 2, j is the number of attributes of relationship, and j is more than or equal to 1.
The extraction model training system comprises a vector processing unit, a training unit and a testing unit, wherein,
the vector processing unit is used for inputting the constructed domain ontology knowledge base into a word2vec model to obtain vectorization expression of knowledge;
the data input end of the training unit is connected with the data output end of the vector processing unit, and the training unit is used for performing iterative training on the pre-training model by taking the vectorization representation of knowledge as the input of the pre-training model.
When the training unit trains the pre-training model, the vectorization expression of knowledge is used as the input of the pre-training model based on the combination of the Bi-LSTM model, the Attention model and the CRF model, and the pre-training model is subjected to iterative batch training; after one period of iterative training is finished, verifying the pre-training model, and optimizing weight parameters connected between neurons in the pre-training model by using a BP algorithm and an Adam optimizer; and repeating the training and verifying processes of the pre-training model until the iterative training is completed.
And the data input end of the test unit is connected with the data output end of the training unit, and the test unit is used for testing the pre-training model after the iterative training is completed to obtain a knowledge extraction model.
When the testing unit tests the pre-training model that has completed iterative training, it tests the model on precision, recall and the F1 value, thereby generating the knowledge extraction model.
The business application system comprises a second preprocessing unit, a parameter adjusting unit and an extracting unit, wherein,
the data input end of the second preprocessing unit is accessed with the service data and used for preprocessing the service data.
When the second preprocessing unit preprocesses the accessed business data, it performs word segmentation on the business data in combination with the constructed domain ontology knowledge base to obtain segmented business data; the segmented business data is then cleaned using a public stop-word list, and stop words are filtered out to obtain the preprocessed business data.
The data input end of the parameter adjusting unit is connected with the data output end of the second preprocessing unit, and the parameter adjusting unit is used for acquiring the adjustable parameters in the knowledge extraction model and adjusting them according to the business data; the adjustable parameters include the batch size batch_size, the learning rate learning_rate, the number of Bi-LSTM layers r_layer, the number of neurons per Bi-LSTM layer r_nums and the number of Attention layers a_layer;
the data input end of the extraction unit is respectively connected with the data output ends of the second preprocessing unit and the parameter adjusting unit, and the extraction unit is used for inputting the preprocessed business data and the adjusted adjustable parameters into the knowledge extraction model so as to obtain the extraction result of the business data; the extraction result is a three-tuple list of < entity, relationship, attribute >.
Referring to fig. 4, the present embodiment also provides an electronic device, which may include a processor 71 and a memory 72, wherein the memory 72 is coupled to the processor 71. It is noted that this figure is exemplary and that other types of structures may be used in addition to or in place of this structure.
As shown in fig. 4, the electronic device may further include: an input unit 73, a display unit 74, and a power supply 75. It is to be noted that the electronic device does not necessarily have to comprise all the components shown in fig. 4. Furthermore, the electronic device may also comprise components not shown in fig. 4, reference being made to the prior art.
The processor 71, sometimes referred to as a controller or operational control, may comprise a microprocessor or other processor device and/or logic device, the processor 71 receiving input and controlling operation of the various components of the electronic device.
The memory 72 may be one or more of a buffer, a flash memory, a hard drive, a removable medium, a volatile memory, a non-volatile memory, or other suitable devices, and may store the configuration information of the processor 71, the instructions executed by the processor 71, the recorded table data, and other information. The processor 71 may execute programs stored in the memory 72 to implement information storage or processing, and the like. In one embodiment, memory 72 also includes a buffer memory, i.e., a buffer, to store intermediate information.
The input unit 73 is used, for example, to provide the processor 71 with ontology data or data owned by the data holder. The display unit 74 is used to display various results of the processing procedure, such as the entities, relationships and attributes shown on a page, and may be, for example, an LCD display, but the present invention is not limited thereto. The power supply 75 is used to provide power to the electronic device.
Embodiments of the present invention further provide computer-readable instructions which, when executed in an electronic device, cause the electronic device to execute the operation steps included in the method of the present invention.
Embodiments of the present invention further provide a storage medium storing computer-readable instructions, where the computer-readable instructions cause an electronic device to execute the operation steps included in the method of the present invention.
Those of ordinary skill in the art will appreciate that the elements and algorithm steps of the examples described in connection with the embodiments disclosed herein may be embodied in electronic hardware, computer software, or combinations of both, and that the components and steps of the examples have been described in a functional general in the foregoing description for the purpose of illustrating clearly the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention essentially or partially contributes to the prior art, or all or part of the technical solution can be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (17)

1. A domain knowledge extraction method based on self-adjusting parameters, characterized by comprising the following steps:
preprocessing collected domain data according to an open source knowledge base, and constructing a domain ontology knowledge base through the preprocessed domain data, wherein the domain ontology knowledge base comprises a domain ontology, and the domain ontology comprises a domain entity base, a domain relation base and a domain attribute base;
the method comprises the following steps of preprocessing collected field data according to an open source knowledge base, and constructing a field ontology knowledge base through the preprocessed field data, wherein the steps comprise:
inputting the field data subjected to word segmentation and cleaning into a labeling model, and labeling the field data to obtain labeled data; the marked content comprises one or more pieces of domain data which are marked as entities, relations and attributes of the domain data, and unique ID is added to each piece of marked domain data;
loading the domain data marked as entities into the domain entity library, loading the domain data marked as relations into the domain relation library, and loading the domain data marked as attributes into the domain attribute library, thereby constructing a domain ontology knowledge base;
vectorizing the constructed domain ontology knowledge base, and then taking the vectorized domain ontology knowledge base as the input of a pre-training model to train the pre-training model to obtain a knowledge extraction model;
acquiring adjustable parameters in a knowledge extraction model, adjusting the adjustable parameters according to business data, preprocessing the business data, and inputting the preprocessed business data and the adjusted adjustable parameters into the knowledge extraction model to obtain an extraction result of the business data; the extraction result is a three-tuple list of < entity, relationship, attribute >.
2. The self-adjusting parameter based domain knowledge extraction method as claimed in claim 1, wherein: the method comprises the following steps of preprocessing collected field data according to an open source knowledge base, and before the step of constructing a field ontology knowledge base through the preprocessed field data, further comprising the following steps:
collecting domain data based on the domain keywords;
and (4) combining an open source knowledge base to perform word segmentation and cleaning treatment on the collected field data.
3. The self-adjusting parameter based domain knowledge extraction method as claimed in claim 2, wherein: the labeling model is a field data template of <entity, relationship, attribute 1, attribute 2, ..., attribute n>, wherein n is the number of attributes of the entity and n is greater than or equal to 1; or a field data template of <entity 1, entity 2, ..., entity i, relationship, attribute 1, attribute 2, ..., attribute j>, wherein i is the number of entities, i is greater than or equal to 2, j is the number of attributes of the relationship, and j is greater than or equal to 1.
4. The self-adjusting parameter based domain knowledge extraction method of claim 3, wherein: the method comprises the following steps of combining an open source knowledge base, performing word segmentation and cleaning processing on collected field data, and comprises the following steps:
performing word segmentation processing on the collected field data by combining an open source knowledge base to obtain field data after word segmentation processing;
and cleaning the field data after word segmentation processing by using a public stop list, and filtering stop words to form the field data in a vocabulary table form.
5. The self-adjusting parameter based domain knowledge extraction method as claimed in claim 1, wherein: the step of training the pre-training model to obtain the knowledge extraction model after vectorizing the constructed domain ontology knowledge base as the input of the pre-training model comprises the following steps of:
inputting the constructed domain ontology knowledge base into a word2vec model to obtain vectorization expression of knowledge;
and taking the vectorization expression of the knowledge as the input of the pre-training model, carrying out iterative training on the pre-training model, and testing the pre-training model after the iterative training is completed to obtain the knowledge extraction model.
6. The self-adjusting parameter based domain knowledge extraction method of claim 5, wherein: the step of taking the vectorization representation of the knowledge as the input of the pre-training model, performing iterative training on the pre-training model, and testing the pre-training model after completing the iterative training to obtain the knowledge extraction model comprises the following steps:
taking the vectorization expression of knowledge as the input of a pre-training model based on the combination of a Bi-LSTM model, an Attention model and a CRF model, and carrying out iterative batch training on the pre-training model;
after one period of iterative training is finished, verifying the pre-training model, and optimizing weight parameters connected between neurons in the pre-training model by using a BP algorithm and an Adam optimizer; repeating the training and verifying process of the pre-training model until the iterative training is completed;
and testing the pre-training model after the iterative training through the accuracy, the recall rate and the F1 value so as to generate a knowledge extraction model.
7. The self-adjusting parameter based domain knowledge extraction method as claimed in claim 1, wherein: the adjustable parameters in the acquired knowledge extraction model comprise the batch size batch_size, the learning rate learning_rate, the number of Bi-LSTM layers r_layer, the number of neurons per Bi-LSTM layer r_nums and the number of Attention layers a_layer.
8. The self-adjusting parameter based domain knowledge extraction method as claimed in claim 1, wherein: the step of preprocessing the service data includes:
performing word segmentation processing on the service data by combining the constructed domain ontology knowledge base to obtain the service data after the word segmentation processing;
and cleaning the service data after word segmentation processing by using a public stop table, and filtering stop words to obtain the preprocessed service data.
9. A domain knowledge extraction system based on self-adjusting parameters, characterized by comprising:
the knowledge base construction system is used for preprocessing the acquired field data according to the open source knowledge base and constructing a field ontology knowledge base through the preprocessed field data; the constructed domain ontology knowledge base comprises a domain ontology, and the domain ontology comprises a domain entity base, a domain relation base and a domain attribute base;
the knowledge base construction system comprises a labeling unit, wherein the labeling unit is used for inputting field data subjected to word segmentation and cleaning into a labeling model to obtain labeled data; the marked content comprises one or more pieces of domain data which are marked as entities, relations and attributes of the domain data, and unique ID is added to each piece of marked domain data; loading the domain data marked as entities into the domain entity library, loading the domain data marked as relations into the domain relation library, and loading the domain data marked as attributes into the domain attribute library, thereby constructing a domain ontology knowledge base;
the data input end of the extraction model training system is connected with the data output end of the knowledge base construction system, and the extraction model training system is used for vectorizing the constructed domain ontology knowledge base and then used as the input of a pre-training model to train the pre-training model to obtain a knowledge extraction model;
the data input end of the business application system is connected with the data output end of the extraction model training system and is used for acquiring adjustable parameters in the knowledge extraction model, adjusting the adjustable parameters according to business data, preprocessing the business data, and inputting the preprocessed business data and the adjusted adjustable parameters into the knowledge extraction model so as to obtain an extraction result of the business data; the extraction result is a three-tuple list of < entity, relationship, attribute >.
10. The self-tuning parameter based domain knowledge extraction system of claim 9, wherein: the knowledge base construction system also comprises a data acquisition unit and a first preprocessing unit, wherein,
the data acquisition unit is used for acquiring field data based on the field keywords;
the data input end of the first preprocessing unit is connected with the data output end of the data acquisition unit, the data output end of the first preprocessing unit is connected with the data input end of the labeling unit, and the first preprocessing unit is used for performing word segmentation and cleaning processing on acquired field data;
the labeling model is a field data template of <entity, relationship, attribute 1, attribute 2, ..., attribute n>, wherein n is the number of attributes of the entity and n is greater than or equal to 1; or a field data template of <entity 1, entity 2, ..., entity i, relationship, attribute 1, attribute 2, ..., attribute j>, wherein i is the number of entities, i is greater than or equal to 2, j is the number of attributes of the relationship, and j is greater than or equal to 1.
11. The self-tuning parameter based domain knowledge extraction system of claim 10, wherein: the first preprocessing unit is used for carrying out word segmentation processing on the collected field data by combining an open source knowledge base when carrying out word segmentation processing on the collected field data so as to obtain the field data after word segmentation processing; when the first preprocessing unit cleans collected field data, the public stop list is used for cleaning the field data after word segmentation, and stop words are filtered to form the field data in a vocabulary form.
12. The self-tuning parameter based domain knowledge extraction system of claim 9, wherein: the extraction model training system comprises a vector processing unit, a training unit and a testing unit, wherein,
the vector processing unit is used for inputting the constructed domain ontology knowledge base into a word2vec model to obtain vectorization expression of knowledge;
the data input end of the training unit is connected with the data output end of the vector processing unit, and the training unit is used for taking the vectorization representation of knowledge as the input of a pre-training model and carrying out iterative training on the pre-training model;
and the data input end of the test unit is connected with the data output end of the training unit, and the test unit is used for testing the pre-training model after the iterative training is completed to obtain a knowledge extraction model.
13. The self-tuning parameter based domain knowledge extraction system of claim 12, wherein: when the training unit trains the pre-training model, the vectorization expression of knowledge is used as the input of the pre-training model based on the combination of the Bi-LSTM model, the Attention model and the CRF model, and the pre-training model is subjected to iterative batch training; after one period of iterative training is finished, verifying the pre-training model, and optimizing weight parameters connected between neurons in the pre-training model by using a BP algorithm and an Adam optimizer; and repeating the training and verifying processes of the pre-training model until the iterative training is completed.
14. The self-tuning parameter based domain knowledge extraction system of claim 12, wherein: when the testing unit tests the pre-training model which completes the iterative training, the testing unit tests the pre-training model which completes the iterative training through the accuracy, the recall rate and the F1 value, and therefore the knowledge extraction model is generated.
15. The self-tuning parameter based domain knowledge extraction system of claim 9, wherein: the business application system comprises a second preprocessing unit, a parameter adjusting unit and an extracting unit, wherein,
the data input end of the second preprocessing unit is accessed with service data and used for preprocessing the service data;
the data input end of the parameter adjusting unit is connected with the data output end of the second preprocessing unit, and the parameter adjusting unit is used for acquiring the adjustable parameters in the knowledge extraction model and adjusting them according to the business data; the adjustable parameters include the batch size batch_size, the learning rate learning_rate, the number of Bi-LSTM layers r_layer, the number of neurons per Bi-LSTM layer r_nums and the number of Attention layers a_layer;
the data input end of the extraction unit is respectively connected with the data output ends of the second preprocessing unit and the parameter adjusting unit, and the extraction unit is used for inputting the preprocessed business data and the adjusted adjustable parameters into the knowledge extraction model so as to obtain the extraction result of the business data; the extraction result is a three-tuple list of < entity, relationship, attribute >.
16. The self-tuning parameter based domain knowledge extraction system of claim 15, wherein: when the second preprocessing unit preprocesses the accessed service data, the second preprocessing unit performs word segmentation processing on the service data by combining the established domain ontology knowledge base to obtain the service data after the word segmentation processing; and cleaning the service data after word segmentation processing by using a public stop table, and filtering stop words to obtain the preprocessed service data.
17. An electronic device, comprising:
a memory storing program instructions;
a processor, connected to the memory, for executing the program instructions in the memory to implement the steps of the method for extracting domain knowledge based on self-adjusting parameters as claimed in any one of claims 1 to 8.
CN202110006928.0A 2021-01-05 2021-01-05 Domain knowledge extraction method and system based on self-adjusting parameters and electronic equipment Active CN112328812B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110006928.0A CN112328812B (en) 2021-01-05 2021-01-05 Domain knowledge extraction method and system based on self-adjusting parameters and electronic equipment


Publications (2)

Publication Number Publication Date
CN112328812A CN112328812A (en) 2021-02-05
CN112328812B true CN112328812B (en) 2021-03-26

Family

ID=74302237

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110006928.0A Active CN112328812B (en) 2021-01-05 2021-01-05 Domain knowledge extraction method and system based on self-adjusting parameters and electronic equipment

Country Status (1)

Country Link
CN (1) CN112328812B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI807400B (en) * 2021-08-27 2023-07-01 台達電子工業股份有限公司 Apparatus and method for generating an entity-relation extraction model

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108363716A (en) * 2017-12-28 2018-08-03 广州索答信息科技有限公司 Domain information classification model generation method, classification method, device and storage medium
CN110287334A (en) * 2019-06-13 2019-09-27 Huaiyin Institute of Technology School-domain knowledge graph construction method based on an entity recognition and attribute extraction model
CN110334212A (en) * 2019-07-01 2019-10-15 Nanjing Audit University Domain audit knowledge graph construction method based on machine learning
WO2020007224A1 (en) * 2018-07-06 2020-01-09 ZTE Corporation Knowledge graph construction and smart response method and apparatus, device, and storage medium
CN111143536A (en) * 2019-12-30 2020-05-12 Tencent Technology (Shenzhen) Co., Ltd. Information extraction method based on artificial intelligence, storage medium and related device
CN111832307A (en) * 2020-07-09 2020-10-27 Beijing University of Technology Entity relationship extraction method and system based on knowledge enhancement

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106250412B (en) * 2016-07-22 2019-04-23 Zhejiang University Knowledge graph construction method based on multi-source entity fusion
CN107967267A (en) * 2016-10-18 2018-04-27 ZTE Corporation Knowledge graph construction method, apparatus and system
CN108984683B (en) * 2018-06-29 2021-06-25 Beijing Baidu Netcom Science and Technology Co., Ltd. Method, system, equipment and storage medium for extracting structured data
CN110750652A (en) * 2019-10-21 2020-02-04 Guangxi University Story ending generation method combining context entity words and knowledge
CN111192680B (en) * 2019-12-25 2021-06-01 山东众阳健康科技集团有限公司 Intelligent auxiliary diagnosis method based on deep learning and collective classification


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
SeMi: A SEmantic Modeling machIne to build Knowledge Graphs with graph neural networks; Giuseppe Futia et al.; SoftwareX; 2020-07-31; vol. 12; pp. 1-10 *
A Survey of Large-Scale Enterprise Knowledge Graph Practice; Wang Haofen et al.; Computer Engineering; 2020-07-15; vol. 46, no. 7; pp. 1-13 *
Design and Implementation of a Knowledge Extraction Service for Building a Financial Knowledge Graph; An Lei; China Masters' Theses Full-text Database, Information Science and Technology; 2019-07-15; no. 7; I138-1505 *

Also Published As

Publication number Publication date
CN112328812A (en) 2021-02-05

Similar Documents

Publication Publication Date Title
CN109522556B (en) Intention recognition method and device
CN109190110A (en) A kind of training method of Named Entity Extraction Model, system and electronic equipment
CN107766371A (en) A kind of text message sorting technique and its device
CN109299258A (en) A kind of public sentiment event detecting method, device and equipment
CN109408821B (en) Corpus generation method and device, computing equipment and storage medium
CN107368521B (en) Knowledge recommendation method and system based on big data and deep learning
CN111767725A (en) Data processing method and device based on emotion polarity analysis model
CN107844558A (en) The determination method and relevant apparatus of a kind of classification information
CN107463935A (en) Application class methods and applications sorter
US9830533B2 (en) Analyzing and exploring images posted on social media
CN114647713A (en) Knowledge graph question-answering method, device and storage medium based on virtual confrontation
Kortum et al. Dissection of AI job advertisements: A text mining-based analysis of employee skills in the disciplines computer vision and natural language processing
CN112328812B (en) Domain knowledge extraction method and system based on self-adjusting parameters and electronic equipment
Hasanati et al. Implementation of support vector machine with lexicon based for sentiment analysis on Twitter
CN117520503A (en) Financial customer service dialogue generation method, device, equipment and medium based on LLM model
Putra et al. Document Classification using Naïve Bayes for Indonesian Translation of the Quran
CN115934899A (en) IT industry resume recommendation method and device, electronic equipment and storage medium
CN110750712A (en) Software security requirement recommendation method based on data driving
CN115757720A (en) Project information searching method, device, equipment and medium based on knowledge graph
CN117501283A (en) Text-to-question model system
CN112434126B (en) Information processing method, device, equipment and storage medium
Pinto et al. A Systematic Review of Facial Expression Detection Methods
CN113468176A (en) Information input method and device, electronic equipment and computer readable storage medium
CN112395855A (en) Comment-based evaluation method and device
CN109815313A (en) Personalization technology survey data processing method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant