CN112328812B - Domain knowledge extraction method and system based on self-adjusting parameters and electronic equipment - Google Patents

Domain knowledge extraction method and system based on self-adjusting parameters and electronic equipment

Info

Publication number
CN112328812B
CN112328812B (application CN202110006928.0A)
Authority
CN
China
Prior art keywords
data
domain
model
training
knowledge
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110006928.0A
Other languages
Chinese (zh)
Other versions
CN112328812A (en)
Inventor
姚苗
查琳
冶莎
张晨
周智海
王芳杰
覃晨
黄庆娇
王振宇
陈刚
何青松
向波
杨志勤
邢尚合
周凡吟
Current Assignee
Chengdu Business Big Data Technology Co Ltd
Original Assignee
Chengdu Business Big Data Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Chengdu Business Big Data Technology Co Ltd filed Critical Chengdu Business Big Data Technology Co Ltd
Priority to CN202110006928.0A priority Critical patent/CN112328812B/en
Publication of CN112328812A publication Critical patent/CN112328812A/en
Application granted granted Critical
Publication of CN112328812B publication Critical patent/CN112328812B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367Ontology

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a domain knowledge extraction method and system based on self-adjusting parameters, and to electronic equipment, comprising the following steps: constructing a domain ontology knowledge base from collected domain data, wherein the domain ontology knowledge base comprises a domain entity base, a domain relation base and a domain attribute base; vectorizing the constructed domain ontology knowledge base and using the result as the input of a pre-training model, so as to train the pre-training model and obtain a knowledge extraction model; acquiring the adjustable parameters of the knowledge extraction model, adjusting them according to the business data, preprocessing the business data, and inputting the preprocessed business data together with the adjusted parameters into the knowledge extraction model to obtain the extraction result of the business data; the extraction result is a list of <entity, relation, attribute> triples. The scheme can complete a knowledge base, extract entities, relations and attributes in a unified way, and respond quickly to different business requirements.

Description

Domain knowledge extraction method and system based on self-adjusting parameters and electronic equipment
Technical Field
The invention relates to the technical field of knowledge graphs, and in particular to a method, a system and electronic equipment for extracting domain knowledge based on self-adjusting parameters.
Background
A knowledge graph (Knowledge Graph), a concept originating in library and information science, is a family of diagrams that display the development and structural relationships of knowledge, using visualization techniques to describe knowledge resources and their carriers, and to mine, analyze, construct, draw and display knowledge and the relationships among knowledge resources and knowledge carriers. Knowledge extraction is a prerequisite step of knowledge graph construction: the quantity and quality of the extraction results directly affect the quality of the generated knowledge graph, and for building graphs from unstructured data in particular, knowledge extraction is an indispensable link.
Knowledge extraction covers entity extraction, relation extraction and attribute extraction. In industry, applications of knowledge extraction mainly focus on entity extraction; for text data, for example, it is mainly applied through named entity recognition to identify person names, organization names and the like, while for image data it is mainly applied through OCR combined with templates to generate a knowledge graph. Relation extraction and attribute extraction are mainly performed according to rules written for the corresponding domain.
However, the current knowledge extraction has the following problems:
First, public knowledge bases are difficult to complete
The accuracy of knowledge extraction depends strongly on the completeness and standardization of the knowledge base. Open-source public knowledge bases only collect general concepts, so their knowledge of a specific business domain is incomplete, and they cannot be used directly in actual business scenarios.
Second, knowledge extraction is fragmented
Entity extraction, relation extraction and attribute extraction are split into different subtasks that must be executed in sequence. At present no single knowledge extraction model can satisfy entity, relation and attribute extraction simultaneously, yet in a knowledge graph the three are mutually dependent and associated, so splitting the task inevitably causes loss of local information.
Thirdly, it is difficult to respond to changes quickly
Existing knowledge extraction models are difficult to reuse: each time knowledge extraction is performed, retraining, verification and testing must be carried out on the specific business data. The construction period of a knowledge extraction model is therefore long, and rapid changes in business requirements are hard to accommodate.
Disclosure of Invention
The invention aims to solve the three problems above: it completes the domain ontology knowledge base, extracts entities, relations and attributes in a unified way, and responds quickly to different business requirements.
To achieve this aim, the embodiments of the invention provide the following technical solutions:
the field knowledge extraction method based on the self-adjusting parameters is characterized by comprising the following steps: the method comprises the following steps:
preprocessing collected domain data according to an open source knowledge base, and constructing a domain ontology knowledge base through the preprocessed domain data, wherein the domain ontology knowledge base comprises a domain ontology, and the domain ontology comprises a domain entity base, a domain relation base and a domain attribute base;
vectorizing the constructed domain ontology knowledge base, and then taking the vectorized domain ontology knowledge base as the input of a pre-training model to train the pre-training model to obtain a knowledge extraction model;
acquiring the adjustable parameters of the knowledge extraction model, adjusting them according to the business data, preprocessing the business data, and inputting the preprocessed business data and the adjusted parameters into the knowledge extraction model to obtain the extraction result of the business data; the extraction result is a list of <entity, relation, attribute> triples.
In this scheme, the method can be built into any platform based on knowledge extraction. The domain ontology knowledge base is constructed with the help of an open-source knowledge base, and the semantic features of the knowledge data are supplemented, yielding a knowledge base rich in semantic information; this solves the semantic sparseness that existing open-source knowledge bases show when representing business data, and achieves the purpose of completing the domain ontology knowledge base. For each domain, a domain ontology knowledge base is constructed and a knowledge extraction model belonging to that domain is trained; when extracting from business data of that domain, only the adjustable parameters of the knowledge extraction model need to be changed to obtain the extraction result. This avoids repeatedly training the knowledge extraction model for changing business data within the same domain, and the resulting waste of computing resources. For example, once a domain ontology knowledge base and a knowledge extraction model have been constructed for the financial field, business data of the same field in a financial business scenario, such as financial credit data or financial fraud data, only requires setting a few adjustable parameters to obtain its extraction result. Entities, relations and attributes are extracted in a unified way rather than in three separate modules, which guarantees the integrity of the data. Meanwhile, the development and implementation cycle for business data is greatly shortened, different business data are responded to quickly, the landing of knowledge applications is accelerated, and business lines are served better, realizing the real value of knowledge.
The step of preprocessing the collected domain data according to an open-source knowledge base and constructing the domain ontology knowledge base from the preprocessed domain data comprises:
collecting domain data based on the domain keywords;
performing word segmentation and cleaning processing on the acquired field data by combining an open source knowledge base;
inputting the domain data after segmentation and cleaning into a labeling model, and labeling the domain data to obtain labeled data; the labeled content comprises one or more pieces of domain data labeled with their entities, relations and attributes, and a unique ID is added to each piece of labeled domain data;
and loading the domain data marked as the entities into the domain entity library, loading the domain data marked as the relations into the domain relation library, and loading the domain data marked as the attributes into the domain attribute library, thereby constructing a domain ontology knowledge base.
In this scheme, before the collected domain data are labeled, they undergo word segmentation and cleaning with the open-source knowledge base corresponding to the domain keywords, which enriches their semantic information and alleviates the semantic sparseness of existing open-source knowledge bases. The labeled domain data are one or more records of the form <entity, relation, attribute>; a unique ID is added to each labeled record so that, when it is called later, a complete record can be retrieved directly by its unique ID. This avoids the loss of data information caused in the prior art by extracting entities, relations and attributes separately.
The labeling model is a domain data template of the form <entity, relation, attribute 1, attribute 2, …, attribute n>, where n is the number of attributes of the entity and n ≥ 1; or of the form <entity 1, entity 2, …, entity i, relation, attribute 1, attribute 2, …, attribute j>, where i is the number of entities, i ≥ 2, j is the number of attributes of the relation, and j ≥ 1.
In this scheme, the domain data are labeled according to the templates of the labeling model. The two templates yield one or more labeled domain data records, each containing entity, relation and attribute information, which guarantees the integrity of the data and avoids the information loss caused in the prior art by extracting entities, relations and attributes separately.
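As an illustration, the first template can be represented as a plain tuple with a unique ID attached, as the labeling step requires. This is a minimal sketch; the function name, record shape and example values are hypothetical, not taken from the patent:

```python
import uuid

def label_record(entity, relation, attributes):
    """Build a labeled domain-data record of the form
    <entity, relation, attribute 1, ..., attribute n> (n >= 1)
    and attach a unique ID, as the labeling step requires."""
    if not attributes:
        raise ValueError("template requires n >= 1 attributes")
    record = (entity, relation, *attributes)
    return str(uuid.uuid4()), record

rid, rec = label_record("enterprise 1", "partnership", ["2020-12-04", "Chengdu"])
print(rec)  # ('enterprise 1', 'partnership', '2020-12-04', 'Chengdu')
```

The second template differs only in allowing several entities before the relation; the same tuple representation applies.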
The step of performing word segmentation and cleaning on the collected domain data with an open-source knowledge base comprises:
performing word segmentation processing on the collected field data by combining an open source knowledge base to obtain field data after word segmentation processing;
and cleaning the field data after word segmentation processing by using a public stop list, and filtering stop words to form the field data in a vocabulary table form.
In this scheme, segmenting the domain data with an existing open-source knowledge base enriches their semantic information and alleviates the semantic sparseness of existing open-source knowledge bases. Filtering the stop words in the domain data with a public stop-word list makes the domain data more effective.
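The segmentation-and-cleaning step can be sketched with forward maximum matching against a small vocabulary standing in for the open-source knowledge base. The vocabulary and stop list below are toy assumptions; the patent does not name a specific segmenter:

```python
def fmm_segment(text, vocab, max_len=4):
    """Forward-maximum-matching word segmentation: at each position,
    take the longest vocabulary word, falling back to one character."""
    tokens, i = [], 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            word = text[i:i + length]
            if length == 1 or word in vocab:
                tokens.append(word)
                i += length
                break
    return tokens

def clean(tokens, stopwords):
    """Drop stop words, keeping the order of the remaining tokens."""
    return [t for t in tokens if t not in stopwords]

vocab = {"金融", "信贷", "数据"}   # toy domain vocabulary
stopwords = {"的"}                 # toy public stop list
tokens = fmm_segment("金融信贷的数据", vocab)
print(clean(tokens, stopwords))    # ['金融', '信贷', '数据']
```

The cleaned token list is what the patent calls domain data "in a vocabulary-table form", ready for labeling.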
The step of vectorizing the constructed domain ontology knowledge base, using it as the input of a pre-training model, and training the pre-training model to obtain the knowledge extraction model comprises:
inputting the constructed domain ontology knowledge base into a word2vec model to obtain vectorization expression of knowledge;
and taking the vectorization expression of the knowledge as the input of the pre-training model, carrying out iterative training on the pre-training model, and testing the pre-training model after the iterative training is completed to obtain the knowledge extraction model.
In the scheme, the constructed domain ontology knowledge base is complete knowledge, the knowledge is expressed in a vectorization mode, a pre-training model is input to carry out training and testing, and finally a universal knowledge extraction model in the domain is obtained.
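The patent uses a word2vec model for the vectorized representation; as a self-contained stand-in, the toy co-occurrence vectorizer below illustrates the same idea of mapping every knowledge-base token to a fixed-length vector (the function and the example corpus are illustrative assumptions, not the patent's implementation):

```python
from collections import Counter

def cooc_vectors(sentences, window=1):
    """Toy stand-in for word2vec: represent each word by its
    co-occurrence counts with every vocabulary word."""
    vocab = sorted({w for s in sentences for w in s})
    counts = {w: Counter() for w in vocab}
    for s in sentences:
        for k, w in enumerate(s):
            for j in range(max(0, k - window), min(len(s), k + window + 1)):
                if j != k:
                    counts[w][s[j]] += 1
    # one dense, fixed-length vector per word, ordered by the vocabulary
    return {w: [counts[w][v] for v in vocab] for w in vocab}, vocab

vecs, vocab = cooc_vectors([["bank", "loan"], ["bank", "fraud"]])
# every word now has a vector of length len(vocab) = 3
```

In practice a trained word2vec model (with its word-vector dimension `vec_size`) would replace this, but the downstream interface is the same: token in, fixed-length vector out.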
The step of taking the vectorization representation of the knowledge as the input of the pre-training model, performing iterative training on the pre-training model, and testing the pre-training model after completing the iterative training to obtain the knowledge extraction model comprises the following steps:
taking the vectorization expression of knowledge as the input of a pre-training model based on the combination of a Bi-LSTM model, an Attention model and a CRF model, and carrying out iterative batch training on the pre-training model;
after each epoch of iterative training, verifying the pre-training model and optimizing the weight parameters connecting the neurons in the pre-training model with the back-propagation (BP) algorithm and the Adam optimizer; repeating this training and verification process until the iterative training is complete;
and testing the pre-training model after iterative training by precision, recall and F1 score, so as to generate the knowledge extraction model.
In this scheme, the specific training process of the pre-training model is given, finally yielding a knowledge extraction model that can be used generally within the domain.
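The testing step compares extracted triples with gold triples. A minimal sketch of the precision, recall and F1 computation follows; exact set-based matching of triples is an assumption here, since the patent does not specify the matching criterion:

```python
def prf1(predicted, gold):
    """Precision, recall and F1 over sets of extracted triples."""
    tp = len(predicted & gold)                      # true positives
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

gold = {("A", "partner", "B"), ("A", "owns", "C")}
pred = {("A", "partner", "B"), ("A", "owns", "D")}
print(prf1(pred, gold))  # (0.5, 0.5, 0.5)
```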
The adjustable parameters acquired from the knowledge extraction model include: batch size (batch_size), learning rate (learn_rate), number of Bi-LSTM layers (r_layer), number of neurons per Bi-LSTM layer (r_nums), and number of Attention layers (a_layer).
In this scheme, the adjustable parameters are collected and displayed as a list, so that they can be changed more quickly when subsequent business data are input.
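Collecting the adjustable parameters in one structure makes a per-business override a one-line change. A sketch of that idea, where the default values are placeholders of my own and not taken from the patent:

```python
# Default adjustable parameters of the trained knowledge extraction
# model; the values are illustrative placeholders, not from the patent.
DEFAULTS = {
    "batch_size": 64,    # batch size
    "learn_rate": 1e-3,  # learning rate
    "r_layer": 2,        # number of Bi-LSTM layers
    "r_nums": 128,       # neurons per Bi-LSTM layer
    "a_layer": 1,        # number of Attention layers
}

def adjust(overrides):
    """Return a fresh parameter set with business-specific overrides,
    rejecting any key that is not an adjustable parameter."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"not adjustable: {unknown}")
    return {**DEFAULTS, **overrides}

params = adjust({"batch_size": 32, "learn_rate": 5e-4})
```

Only the overrides change between business data sets; the trained model itself is reused untouched, which is the reuse the patent claims.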
The step of preprocessing the service data includes:
performing word segmentation processing on the service data by combining the constructed domain ontology knowledge base to obtain the service data after the word segmentation processing;
and cleaning the service data after word segmentation processing by using a public stop table, and filtering stop words to obtain the preprocessed service data.
In the above scheme, since the constructed domain ontology knowledge base is already complete, the incoming business data can be preprocessed (segmented and cleaned) using the domain ontology knowledge base alone; the preprocessed business data and the adjusted parameters are then input into the knowledge extraction model to obtain the extraction result. For any different business data within the same domain, only the adjustable parameters need to be changed to obtain the extraction result; a knowledge extraction model does not have to be trained for each new set of business data, so the model can be reused and computing resources are not wasted.
A domain knowledge extraction system based on self-adjusting parameters comprises:
the knowledge base construction system is used for preprocessing the acquired field data according to the open source knowledge base and constructing a field ontology knowledge base through the preprocessed field data; the constructed domain ontology knowledge base comprises a domain ontology, and the domain ontology comprises a domain entity base, a domain relation base and a domain attribute base;
the data input end of the extraction model training system is connected with the data output end of the knowledge base construction system, and the extraction model training system is used for vectorizing the constructed domain ontology knowledge base and then used as the input of a pre-training model to train the pre-training model to obtain a knowledge extraction model;
the data input end of the business application system is connected with the data output end of the extraction model training system, and the business application system is used for acquiring the adjustable parameters of the knowledge extraction model, adjusting them according to the business data, preprocessing the business data, and inputting the preprocessed business data and the adjusted parameters into the knowledge extraction model to obtain the extraction result of the business data; the extraction result is a list of <entity, relation, attribute> triples.
The knowledge base construction system comprises a data acquisition unit, a first preprocessing unit and a labeling unit, wherein,
the data acquisition unit is used for acquiring field data based on the field keywords;
the data input end of the first preprocessing unit is connected with the data output end of the data acquisition unit, and the first preprocessing unit is used for performing word segmentation and cleaning processing on acquired field data;
the data input end of the labeling unit is connected with the data output end of the first preprocessing unit, and the labeling unit is used for inputting field data subjected to word segmentation and cleaning into the labeling model to obtain labeling data; the marked content comprises one or more pieces of domain data which are marked as entities, relations and attributes of the domain data, and unique ID is added to each piece of marked domain data; loading the domain data marked as entities into the domain entity library, loading the domain data marked as relations into the domain relation library, and loading the domain data marked as attributes into the domain attribute library, thereby constructing a domain ontology knowledge base;
the labeling model is a domain data template of the form <entity, relation, attribute 1, attribute 2, …, attribute n>, where n is the number of attributes of the entity and n ≥ 1; or of the form <entity 1, entity 2, …, entity i, relation, attribute 1, attribute 2, …, attribute j>, where i is the number of entities, i ≥ 2, j is the number of attributes of the relation, and j ≥ 1.
The first preprocessing unit is used for carrying out word segmentation processing on the collected field data by combining an open source knowledge base when carrying out word segmentation processing on the collected field data so as to obtain the field data after word segmentation processing; when the first preprocessing unit cleans collected field data, the public stop list is used for cleaning the field data after word segmentation, and stop words are filtered to form the field data in a vocabulary form.
The extraction model training system comprises a vector processing unit, a training unit and a testing unit, wherein,
the vector processing unit is used for inputting the constructed domain ontology knowledge base into a word2vec model to obtain vectorization expression of knowledge;
the data input end of the training unit is connected with the data output end of the vector processing unit, and the training unit is used for taking the vectorization representation of knowledge as the input of a pre-training model and carrying out iterative training on the pre-training model;
and the data input end of the test unit is connected with the data output end of the training unit, and the test unit is used for testing the pre-training model after the iterative training is completed to obtain a knowledge extraction model.
When the training unit trains the pre-training model, the vectorized representation of knowledge is used as the input of a pre-training model based on the combination of a Bi-LSTM model, an Attention model and a CRF model, and the pre-training model is trained iteratively in batches; after each epoch of iterative training, the pre-training model is verified and the weight parameters connecting the neurons in the pre-training model are optimized with the back-propagation (BP) algorithm and the Adam optimizer; this training and verification process is repeated until the iterative training is complete.
When the testing unit tests the pre-training model after iterative training, it does so by precision, recall and F1 score, thereby generating the knowledge extraction model.
The business application system comprises a second preprocessing unit, a parameter adjusting unit and an extracting unit, wherein,
the data input end of the second preprocessing unit is accessed with service data and used for preprocessing the service data;
the data input end of the parameter adjusting unit is connected with the data output end of the second preprocessing unit, and the parameter adjusting unit is used for acquiring the adjustable parameters of the knowledge extraction model and adjusting them according to the business data; the adjustable parameters include: batch size (batch_size), learning rate (learn_rate), number of Bi-LSTM layers (r_layer), number of neurons per Bi-LSTM layer (r_nums) and number of Attention layers (a_layer);
the data input end of the extraction unit is connected respectively with the data output ends of the second preprocessing unit and the parameter adjusting unit, and the extraction unit is used for inputting the preprocessed business data and the adjusted parameters into the knowledge extraction model to obtain the extraction result of the business data; the extraction result is a list of <entity, relation, attribute> triples.
When the second preprocessing unit preprocesses the accessed service data, the second preprocessing unit performs word segmentation processing on the service data by combining the established domain ontology knowledge base to obtain the service data after the word segmentation processing; and cleaning the service data after word segmentation processing by using a public stop table, and filtering stop words to obtain the preprocessed service data.
An electronic device, comprising:
a memory storing program instructions;
and the processor is connected with the memory and executes the program instructions in the memory to realize the steps of the self-adjusting parameter-based domain knowledge extraction method in any embodiment of the invention.
Compared with the prior art, the invention has the following beneficial effects:
(1) The method can be built into any platform based on knowledge extraction. The domain ontology knowledge base is constructed with the help of an open-source knowledge base and the semantic features of the knowledge data are supplemented, yielding a knowledge base rich in semantic information; this solves the semantic sparseness that existing open-source knowledge bases show when representing business data, and achieves the purpose of completing the domain ontology knowledge base.
(2) For each domain, a domain ontology knowledge base is constructed and a knowledge extraction model belonging to that domain is trained. When extracting from business data of that domain, only the adjustable parameters of the knowledge extraction model need to be changed to obtain the extraction result; this avoids repeatedly training the knowledge extraction model for changing business data within the same domain and the resulting waste of computing resources. For example, once a domain ontology knowledge base and a knowledge extraction model have been constructed for the financial field, business data of the same field in a financial business scenario, such as financial credit data or financial fraud data, only requires setting a few adjustable parameters to obtain its extraction result. Entities, relations and attributes are extracted in a unified way rather than in three separate modules, which guarantees the integrity of the data.
(3) The scheme also greatly shortens the development and implementation cycle for business data, responds quickly to different business data, accelerates the landing of knowledge applications, and better serves business lines, thereby realizing the real value of knowledge.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.
FIG. 1 is a flow chart of a knowledge extraction method of the present invention;
FIG. 2 is a schematic diagram illustrating training of a pre-training model according to an embodiment of the present invention;
FIG. 3 is a block diagram of the knowledge extraction system of the present invention;
fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
The invention is realized by the following technical scheme, as shown in figure 1, the method for extracting the domain knowledge based on the self-adjusting parameters comprises the following three steps:
step S1: and constructing a domain ontology knowledge base through the collected domain data.
The constructed domain ontology knowledge base comprises a domain entity base, a domain relation base and a domain attribute base.
Data sources of the corresponding domain can be crawled with crawler technology using domain keywords to serve as domain data. For the financial domain, for example, the data sources include but are not limited to financial and economic news websites such as Securities Daily, Blue Whale Finance, online-lending news portals and China Times; data of the education domain, the sports domain, the clothing domain and so on can be collected likewise.
And combining the open source knowledge base to perform word segmentation processing on the collected field data to obtain the field data after word segmentation processing. And then, the public stop list is utilized to clean the field data after the word segmentation processing, stop words are filtered, and the field data in the form of a vocabulary table is formed.
Inputting the field data subjected to word segmentation and cleaning into a labeling model, and labeling the field data to obtain labeled data; the marked content comprises one or more pieces of domain data which are marked as entities, relations and attributes, and unique ID is added to each piece of marked domain data. The template of the labeling model comprises two types:
1. <entity, relation, attribute 1, attribute 2, …, attribute n>
2. <entity 1, entity 2, …, entity i, relation, attribute 1, attribute 2, …, attribute j>
For example, after a piece of financial data is labeled, a domain data record such as <enterprise 1, enterprise 2, partnership, 2020-12-04, Chengdu> is obtained, and a unique ID is then added to this record.
The domain data labeled as entities is then loaded into the domain entity library, the domain data labeled as relationships into the domain relationship library, and the domain data labeled as attributes into the domain attribute library, thereby constructing the domain ontology knowledge base.
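The labeling-and-loading step above can be sketched as follows. This is a minimal illustrative sketch: the in-memory dictionary libraries, the load_labeled_record function and the counter-based ID scheme are assumptions made for illustration, not the patent's actual storage implementation.

```python
import itertools

# Illustrative in-memory stand-ins for the domain entity library,
# domain relationship library and domain attribute library; the patent
# does not prescribe a storage format.
entity_lib, relation_lib, attribute_lib = {}, {}, {}
_ids = itertools.count(1)

def load_labeled_record(record):
    """Add a unique ID to one labeled record (template type 1:
    <entity, relationship, attribute 1, ..., attribute n>) and load
    each element into the corresponding library."""
    uid = next(_ids)
    entity, relation, *attributes = record
    entity_lib.setdefault(entity, []).append(uid)
    relation_lib.setdefault(relation, []).append(uid)
    for attr in attributes:
        attribute_lib.setdefault(attr, []).append(uid)
    return uid

# The financial example from the description, reduced to template type 1.
uid = load_labeled_record(("enterprise 1", "partnership",
                           "2020-12-04", "Chengdu"))
print(uid, sorted(entity_lib), sorted(relation_lib))
```

Each labeled record thus carries one unique ID, shared by its entries across the three libraries, which is what later lets the ontology be reassembled into triples.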
Step S2: and vectorizing the constructed domain ontology knowledge base, and then taking the vectorized domain ontology knowledge base as the input of a pre-training model to train the pre-training model to obtain a knowledge extraction model.
The constructed domain ontology knowledge base is input into a word2vec model to obtain a vectorized representation of the knowledge. The adjustable parameter in the word2vec model is the word vector dimension vec_size.
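The vectorization step might look as follows. Since the patent only specifies a word2vec model with an adjustable word-vector dimension vec_size, the sketch below uses a deterministic hash-based stand-in for trained embeddings; in practice one would train real embeddings, for example with gensim's Word2Vec, whose vector_size argument plays the role of vec_size.

```python
import hashlib
import math

def toy_vectorize(tokens, vec_size=8):
    """Deterministic stand-in for word2vec: hash each token into a
    unit-length vector whose dimension is controlled by vec_size,
    mirroring the adjustable word-vector-dimension parameter. Real
    embeddings would be learned from co-occurrence statistics."""
    vectors = {}
    for tok in tokens:
        digest = hashlib.sha256(tok.encode("utf-8")).digest()
        raw = [digest[i % len(digest)] - 127.5 for i in range(vec_size)]
        norm = math.sqrt(sum(x * x for x in raw))
        vectors[tok] = [x / norm for x in raw]
    return vectors

vecs = toy_vectorize(["entity", "relationship", "attribute"], vec_size=16)
print(len(vecs["entity"]))  # prints 16: the dimension follows vec_size
```

The point of the sketch is only the interface: every knowledge token maps to a fixed-length vector, and changing vec_size changes that length.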
Referring to fig. 2, the vectorized representation of knowledge is used as the input of a pre-training model based on the combination of a Bi-LSTM model, an Attention model and a CRF model, and the pre-training model is trained iteratively in batches. After each period of iterative training, the pre-training model is validated, and the weight parameters connecting the neurons in the pre-training model are optimized using the BP algorithm and the Adam optimizer. This training and validation process is repeated until the iterative training is completed. The pre-training model that has completed iterative training is then tested on precision, recall and the F1 value, thereby generating the knowledge extraction model.
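The final testing step scores the model on precision, recall and the F1 value. A minimal sketch of that evaluation, assuming the extracted triples are compared as sets against a gold standard (the comparison protocol is an assumption, since the patent does not detail it):

```python
def precision_recall_f1(predicted, gold):
    """Score extracted triples against gold-standard triples:
    precision = TP / (TP + FP), recall = TP / (TP + FN),
    F1 = harmonic mean of precision and recall."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

gold = [("enterprise 1", "partnership", "Chengdu"),
        ("enterprise 3", "subsidiary", "Beijing")]
pred = [("enterprise 1", "partnership", "Chengdu"),
        ("enterprise 4", "investor", "Shanghai")]
print(precision_recall_f1(pred, gold))  # prints (0.5, 0.5, 0.5)
```

A model version is accepted as the knowledge extraction model once these three scores on the held-out test set meet the required thresholds.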
The adjustable parameters of the knowledge extraction model comprise the batch size batch_size, the learning rate learning_rate, the number of Bi-LSTM layers r_layer, the number of neurons per Bi-LSTM layer r_nums, the number of Attention layers a_layer, and so on; these parameters are collected and presented in a list.
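The collected parameter list can be represented as a simple name/value structure; the default values below are purely illustrative and are not taken from the patent.

```python
# The adjustable parameters of the knowledge extraction model,
# collected in a list as the description states. The default values
# here are illustrative only.
ADJUSTABLE_PARAMS = [
    ("batch_size",    64),    # training/inference batch size
    ("learning_rate", 1e-3),  # optimizer step size
    ("r_layer",       2),     # number of Bi-LSTM layers
    ("r_nums",        128),   # neurons per Bi-LSTM layer
    ("a_layer",       1),     # number of Attention layers
]

def as_dict(params):
    """View the collected parameter list as a name -> value mapping."""
    return dict(params)

print(sorted(as_dict(ADJUSTABLE_PARAMS)))
```

Keeping the parameters in one list is what makes step S3 possible: the business application only needs to look up and override entries in this list rather than rebuild the model.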
Step S3: and acquiring adjustable parameters in the knowledge extraction model, adjusting the adjustable parameters according to the business data, preprocessing the business data, and inputting the preprocessed business data and the adjusted adjustable parameters into the knowledge extraction model to obtain an extraction result of the business data.
Before knowledge extraction is performed on the business data, the business data must be preprocessed: word segmentation is carried out on the business data in combination with the constructed domain ontology knowledge base to obtain segmented business data, and the segmented business data is then cleaned using a public stop-word list, filtering out stop words to obtain the preprocessed business data.
The domain ontology knowledge base can improve the word segmentation result to a certain extent. For example, if the domain entity library contains the entity 'listed company' but is not consulted during segmentation, 'listed company' would be split into the two words 'listed' and 'company'. Combining the domain ontology knowledge base therefore significantly improves the handling of domain-specific terms.
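The effect of consulting the domain entity library during segmentation can be sketched with a simple forward-maximum-matching segmenter. This whitespace-token version is an illustrative stand-in, not the patent's segmenter; it shows how 'listed company' survives as one token when the domain lexicon is consulted.

```python
def segment(text, domain_lexicon, max_len=4):
    """Forward maximum matching over whitespace tokens: at each
    position, prefer the longest multi-word term found in the domain
    lexicon; otherwise emit a single token."""
    words, tokens, i = text.split(), [], 0
    while i < len(words):
        for length in range(min(max_len, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + length])
            if length == 1 or candidate in domain_lexicon:
                tokens.append(candidate)
                i += length
                break
    return tokens

lexicon = {"listed company"}
print(segment("the listed company announced results", lexicon))
# prints ['the', 'listed company', 'announced', 'results']
```

Without the lexicon the same call yields 'listed' and 'company' as separate tokens, which is exactly the failure mode the description warns about.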
Because business data in the same field is continuously updated and changed while still sharing the common characteristics of the field, adjusting the adjustable parameters in the knowledge extraction model enables a quick response to rapid changes in the business data and simplifies the construction process of knowledge extraction.
For example, suppose business data 1, business data 2, ..., business data n are currently input. The business data is preprocessed and input into the knowledge extraction model, and the corresponding or required adjustable parameters in the adjustable parameter list are adjusted to meet the requirements of the current business data. Finally, the knowledge extraction model outputs the extraction result in the form of a list of <entity, relationship, attribute> triples.
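The application step, adjusting a subset of the adjustable parameters and then running extraction, might be sketched as follows. The adjust_params helper, the stub extract function and the default values are illustrative assumptions; the real model is the trained Bi-LSTM + Attention + CRF network.

```python
def adjust_params(base, overrides):
    """Return a copy of the adjustable parameters with only the
    requested entries overridden; unknown parameter names are rejected."""
    params = dict(base)
    for name, value in overrides.items():
        if name not in params:
            raise KeyError("unknown adjustable parameter: " + name)
        params[name] = value
    return params

def extract(business_data, params, model=None):
    """Stub for the knowledge extraction step: returns a list of
    <entity, relationship, attribute> triples. The real model is the
    trained Bi-LSTM + Attention + CRF network."""
    model = model or (lambda record, p: tuple(record[:3]))
    return [model(record, params) for record in business_data]

base = {"batch_size": 64, "learning_rate": 1e-3,
        "r_layer": 2, "r_nums": 128, "a_layer": 1}
tuned = adjust_params(base, {"batch_size": 16, "learning_rate": 5e-4})
triples = extract([["enterprise 1", "partnership", "Chengdu"]], tuned)
print(tuned["batch_size"], triples)
```

Only the overridden entries change per business batch; the trained model itself is reused, which is the "quick response" the description claims for self-adjusting parameters.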
The embodiment further provides a domain knowledge extraction system based on self-adjusting parameters, please refer to fig. 3, which includes:
the knowledge base construction system is used for preprocessing the acquired field data according to the open source knowledge base and constructing a field ontology knowledge base through the preprocessed field data; the constructed domain ontology knowledge base comprises a domain ontology, and the domain ontology comprises a domain entity base, a domain relation base and a domain attribute base;
the data input end of the extraction model training system is connected with the data output end of the knowledge base construction system, and the extraction model training system is used for vectorizing the constructed domain ontology knowledge base and then used as the input of a pre-training model to train the pre-training model to obtain a knowledge extraction model;
the data input end of the business application system is connected with the data output end of the extraction model training system and is used for acquiring adjustable parameters in the knowledge extraction model, adjusting the adjustable parameters according to business data, preprocessing the business data, and inputting the preprocessed business data and the adjusted adjustable parameters into the knowledge extraction model so as to obtain an extraction result of the business data; the extraction result is a three-tuple list of < entity, relationship, attribute >.
The knowledge base construction system comprises a data acquisition unit, a first preprocessing unit and a labeling unit, wherein,
the data acquisition unit is used for acquiring field data based on the field keywords;
the data input end of the first preprocessing unit is connected with the data output end of the data acquisition unit, and the first preprocessing unit is used for performing word segmentation and cleaning processing on acquired field data.
The first preprocessing unit is used for carrying out word segmentation processing on the collected field data by combining an open source knowledge base when carrying out word segmentation processing on the collected field data so as to obtain the field data after word segmentation processing; when the first preprocessing unit cleans collected field data, the public stop list is used for cleaning the field data after word segmentation, and stop words are filtered to form the field data in a vocabulary form.
The data input end of the labeling unit is connected with the data output end of the first preprocessing unit, and the labeling unit is used for inputting field data subjected to word segmentation and cleaning into the labeling model to obtain labeling data; the marked content comprises one or more pieces of domain data which are marked as entities, relations and attributes of the domain data, and unique ID is added to each piece of marked domain data; loading the domain data marked as entities into the domain entity library, loading the domain data marked as relations into the domain relation library, and loading the domain data marked as attributes into the domain attribute library, thereby constructing a domain ontology knowledge base;
the marking model is a field data template of < entity, relation, attribute 1, attribute 2, > and attribute n >, wherein n is the number of attributes of the entity, and n is more than or equal to 1; or a field data template of < entity 1, entity 2,. > entity i, relationship, attribute 1, attribute 2, > attribute j >, wherein i is the number of entities, i is more than or equal to 2, j is the number of attributes of relationship, and j is more than or equal to 1.
The extraction model training system comprises a vector processing unit, a training unit and a testing unit, wherein,
the vector processing unit is used for inputting the constructed domain ontology knowledge base into a word2vec model to obtain vectorization expression of knowledge;
the data input end of the training unit is connected with the data output end of the vector processing unit, and the training unit is used for performing iterative training on the pre-training model by taking the vectorization representation of knowledge as the input of the pre-training model.
When the training unit trains the pre-training model, the vectorization expression of knowledge is used as the input of the pre-training model based on the combination of the Bi-LSTM model, the Attention model and the CRF model, and the pre-training model is subjected to iterative batch training; after one period of iterative training is finished, verifying the pre-training model, and optimizing weight parameters connected between neurons in the pre-training model by using a BP algorithm and an Adam optimizer; and repeating the training and verifying processes of the pre-training model until the iterative training is completed.
And the data input end of the test unit is connected with the data output end of the training unit, and the test unit is used for testing the pre-training model after the iterative training is completed to obtain a knowledge extraction model.
When the testing unit tests the pre-training model that has completed iterative training, it tests the model on precision, recall and the F1 value, thereby generating the knowledge extraction model.
The business application system comprises a second preprocessing unit, a parameter adjusting unit and an extracting unit, wherein,
the data input end of the second preprocessing unit is accessed with the service data and used for preprocessing the service data.
When the second preprocessing unit preprocesses the accessed business data, it performs word segmentation on the business data in combination with the constructed domain ontology knowledge base to obtain segmented business data; the segmented business data is then cleaned using a public stop-word list, and stop words are filtered out to obtain the preprocessed business data.
The data input end of the parameter adjusting unit is connected with the data output end of the second preprocessing unit, and the parameter adjusting unit is used for acquiring the adjustable parameters in the knowledge extraction model and adjusting them according to the business data; the adjustable parameters include the batch size batch_size, the learning rate learning_rate, the number of Bi-LSTM layers r_layer, the number of neurons per Bi-LSTM layer r_nums and the number of Attention layers a_layer;
the data input end of the extraction unit is respectively connected with the data output ends of the second preprocessing unit and the parameter adjusting unit, and the extraction unit is used for inputting the preprocessed business data and the adjusted adjustable parameters into the knowledge extraction model so as to obtain the extraction result of the business data; the extraction result is a three-tuple list of < entity, relationship, attribute >.
Referring to fig. 4, the present embodiment also provides an electronic device, which may include a processor 71 and a memory 72, wherein the memory 72 is coupled to the processor 71. It is noted that this figure is exemplary and that other types of structures may be used in addition to or in place of this structure.
As shown in fig. 4, the electronic device may further include: an input unit 73, a display unit 74, and a power supply 75. It is to be noted that the electronic device does not necessarily have to comprise all the components shown in fig. 4. Furthermore, the electronic device may also comprise components not shown in fig. 4, reference being made to the prior art.
The processor 71, sometimes referred to as a controller or operational control, may comprise a microprocessor or other processor device and/or logic device, the processor 71 receiving input and controlling operation of the various components of the electronic device.
The memory 72 may be one or more of a buffer, a flash memory, a hard drive, a removable medium, a volatile memory, a non-volatile memory, or other suitable devices, and may store the configuration information of the processor 71, the instructions executed by the processor 71, the recorded table data, and other information. The processor 71 may execute programs stored in the memory 72 to implement information storage or processing, and the like. In one embodiment, memory 72 also includes a buffer memory, i.e., a buffer, to store intermediate information.
The input unit 73 is used, for example, to provide the processor 71 with ontology data or data owned by the data holder. The display unit 74 is used to display various results of the processing procedure, such as the entities, relationships and attributes shown on a page, and may be, for example, an LCD display, but the present invention is not limited thereto. The power supply 75 is used to provide power to the electronic device.
Embodiments of the present invention further provide computer-readable instructions which, when executed in an electronic device, cause the electronic device to execute the operation steps included in the method of the present invention.
Embodiments of the present invention further provide a storage medium storing computer-readable instructions, where the computer-readable instructions cause an electronic device to execute the operation steps included in the method of the present invention.
Those of ordinary skill in the art will appreciate that the elements and algorithm steps of the examples described in connection with the embodiments disclosed herein may be embodied in electronic hardware, computer software, or combinations of both, and that the components and steps of the examples have been described in a functional general in the foregoing description for the purpose of illustrating clearly the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention essentially or partially contributes to the prior art, or all or part of the technical solution can be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (17)

1. A domain knowledge extraction method based on self-adjusting parameters, characterized by comprising the following steps:
preprocessing collected domain data according to an open source knowledge base, and constructing a domain ontology knowledge base through the preprocessed domain data, wherein the domain ontology knowledge base comprises a domain ontology, and the domain ontology comprises a domain entity base, a domain relation base and a domain attribute base;
the method comprises the following steps of preprocessing collected field data according to an open source knowledge base, and constructing a field ontology knowledge base through the preprocessed field data, wherein the steps comprise:
inputting the field data subjected to word segmentation and cleaning into a labeling model, and labeling the field data to obtain labeled data; the marked content comprises one or more pieces of domain data which are marked as entities, relations and attributes of the domain data, and unique ID is added to each piece of marked domain data;
loading the domain data marked as entities into the domain entity library, loading the domain data marked as relations into the domain relation library, and loading the domain data marked as attributes into the domain attribute library, thereby constructing a domain ontology knowledge base;
vectorizing the constructed domain ontology knowledge base, and then taking the vectorized domain ontology knowledge base as the input of a pre-training model to train the pre-training model to obtain a knowledge extraction model;
acquiring adjustable parameters in a knowledge extraction model, adjusting the adjustable parameters according to business data, preprocessing the business data, and inputting the preprocessed business data and the adjusted adjustable parameters into the knowledge extraction model to obtain an extraction result of the business data; the extraction result is a three-tuple list of < entity, relationship, attribute >.
2. The self-adjusting parameter based domain knowledge extraction method as claimed in claim 1, wherein: the method comprises the following steps of preprocessing collected field data according to an open source knowledge base, and before the step of constructing a field ontology knowledge base through the preprocessed field data, further comprising the following steps:
collecting domain data based on the domain keywords;
and (4) combining an open source knowledge base to perform word segmentation and cleaning treatment on the collected field data.
3. The self-adjusting parameter based domain knowledge extraction method as claimed in claim 2, wherein: the labeling model is a field data template of <entity, relationship, attribute 1, attribute 2, ..., attribute n>, wherein n is the number of attributes of the entity and n is greater than or equal to 1; or a field data template of <entity 1, entity 2, ..., entity i, relationship, attribute 1, attribute 2, ..., attribute j>, wherein i is the number of entities, i is greater than or equal to 2, j is the number of attributes of the relationship, and j is greater than or equal to 1.
4. The self-adjusting parameter based domain knowledge extraction method of claim 3, wherein: the method comprises the following steps of combining an open source knowledge base, performing word segmentation and cleaning processing on collected field data, and comprises the following steps:
performing word segmentation processing on the collected field data by combining an open source knowledge base to obtain field data after word segmentation processing;
and cleaning the field data after word segmentation processing by using a public stop list, and filtering stop words to form the field data in a vocabulary table form.
5. The self-adjusting parameter based domain knowledge extraction method as claimed in claim 1, wherein: the step of training the pre-training model to obtain the knowledge extraction model after vectorizing the constructed domain ontology knowledge base as the input of the pre-training model comprises the following steps of:
inputting the constructed domain ontology knowledge base into a word2vec model to obtain vectorization expression of knowledge;
and taking the vectorization expression of the knowledge as the input of the pre-training model, carrying out iterative training on the pre-training model, and testing the pre-training model after the iterative training is completed to obtain the knowledge extraction model.
6. The self-adjusting parameter based domain knowledge extraction method of claim 5, wherein: the step of taking the vectorization representation of the knowledge as the input of the pre-training model, performing iterative training on the pre-training model, and testing the pre-training model after completing the iterative training to obtain the knowledge extraction model comprises the following steps:
taking the vectorization expression of knowledge as the input of a pre-training model based on the combination of a Bi-LSTM model, an Attention model and a CRF model, and carrying out iterative batch training on the pre-training model;
after one period of iterative training is finished, verifying the pre-training model, and optimizing weight parameters connected between neurons in the pre-training model by using a BP algorithm and an Adam optimizer; repeating the training and verifying process of the pre-training model until the iterative training is completed;
and testing the pre-training model after the iterative training through the accuracy, the recall rate and the F1 value so as to generate a knowledge extraction model.
7. The self-adjusting parameter based domain knowledge extraction method as claimed in claim 1, wherein: the adjustable parameters in the acquired knowledge extraction model comprise the batch size batch_size, the learning rate learning_rate, the number of Bi-LSTM layers r_layer, the number of neurons per Bi-LSTM layer r_nums and the number of Attention layers a_layer.
8. The self-adjusting parameter based domain knowledge extraction method as claimed in claim 1, wherein: the step of preprocessing the service data includes:
performing word segmentation processing on the service data by combining the constructed domain ontology knowledge base to obtain the service data after the word segmentation processing;
and cleaning the service data after word segmentation processing by using a public stop table, and filtering stop words to obtain the preprocessed service data.
9. A domain knowledge extraction system based on self-adjusting parameters, characterized by comprising:
the knowledge base construction system is used for preprocessing the acquired field data according to the open source knowledge base and constructing a field ontology knowledge base through the preprocessed field data; the constructed domain ontology knowledge base comprises a domain ontology, and the domain ontology comprises a domain entity base, a domain relation base and a domain attribute base;
the knowledge base construction system comprises a labeling unit, wherein the labeling unit is used for inputting field data subjected to word segmentation and cleaning into a labeling model to obtain labeled data; the marked content comprises one or more pieces of domain data which are marked as entities, relations and attributes of the domain data, and unique ID is added to each piece of marked domain data; loading the domain data marked as entities into the domain entity library, loading the domain data marked as relations into the domain relation library, and loading the domain data marked as attributes into the domain attribute library, thereby constructing a domain ontology knowledge base;
the data input end of the extraction model training system is connected with the data output end of the knowledge base construction system, and the extraction model training system is used for vectorizing the constructed domain ontology knowledge base and then used as the input of a pre-training model to train the pre-training model to obtain a knowledge extraction model;
the data input end of the business application system is connected with the data output end of the extraction model training system and is used for acquiring adjustable parameters in the knowledge extraction model, adjusting the adjustable parameters according to business data, preprocessing the business data, and inputting the preprocessed business data and the adjusted adjustable parameters into the knowledge extraction model so as to obtain an extraction result of the business data; the extraction result is a three-tuple list of < entity, relationship, attribute >.
10. The self-tuning parameter based domain knowledge extraction system of claim 9, wherein: the knowledge base construction system also comprises a data acquisition unit and a first preprocessing unit, wherein,
the data acquisition unit is used for acquiring field data based on the field keywords;
the data input end of the first preprocessing unit is connected with the data output end of the data acquisition unit, the data output end of the first preprocessing unit is connected with the data input end of the labeling unit, and the first preprocessing unit is used for performing word segmentation and cleaning processing on acquired field data;
the labeling model is a field data template of <entity, relationship, attribute 1, attribute 2, ..., attribute n>, wherein n is the number of attributes of the entity and n is greater than or equal to 1; or a field data template of <entity 1, entity 2, ..., entity i, relationship, attribute 1, attribute 2, ..., attribute j>, wherein i is the number of entities, i is greater than or equal to 2, j is the number of attributes of the relationship, and j is greater than or equal to 1.
11. The self-tuning parameter based domain knowledge extraction system of claim 10, wherein: the first preprocessing unit is used for carrying out word segmentation processing on the collected field data by combining an open source knowledge base when carrying out word segmentation processing on the collected field data so as to obtain the field data after word segmentation processing; when the first preprocessing unit cleans collected field data, the public stop list is used for cleaning the field data after word segmentation, and stop words are filtered to form the field data in a vocabulary form.
12. The self-tuning parameter based domain knowledge extraction system of claim 9, wherein: the extraction model training system comprises a vector processing unit, a training unit and a testing unit, wherein,
the vector processing unit is used for inputting the constructed domain ontology knowledge base into a word2vec model to obtain vectorization expression of knowledge;
the data input end of the training unit is connected with the data output end of the vector processing unit, and the training unit is used for taking the vectorization representation of knowledge as the input of a pre-training model and carrying out iterative training on the pre-training model;
and the data input end of the test unit is connected with the data output end of the training unit, and the test unit is used for testing the pre-training model after the iterative training is completed to obtain a knowledge extraction model.
13. The self-tuning parameter based domain knowledge extraction system of claim 12, wherein: when the training unit trains the pre-training model, the vectorization expression of knowledge is used as the input of the pre-training model based on the combination of the Bi-LSTM model, the Attention model and the CRF model, and the pre-training model is subjected to iterative batch training; after one period of iterative training is finished, verifying the pre-training model, and optimizing weight parameters connected between neurons in the pre-training model by using a BP algorithm and an Adam optimizer; and repeating the training and verifying processes of the pre-training model until the iterative training is completed.
14. The self-tuning parameter based domain knowledge extraction system of claim 12, wherein: when the testing unit tests the pre-training model which completes the iterative training, the testing unit tests the pre-training model which completes the iterative training through the accuracy, the recall rate and the F1 value, and therefore the knowledge extraction model is generated.
15. The self-tuning parameter based domain knowledge extraction system of claim 9, wherein: the business application system comprises a second preprocessing unit, a parameter adjusting unit and an extracting unit, wherein,
the data input end of the second preprocessing unit is accessed with service data and used for preprocessing the service data;
the data input end of the parameter adjusting unit is connected with the data output end of the second preprocessing unit, and the parameter adjusting unit is used for acquiring the adjustable parameters in the knowledge extraction model and adjusting them according to the business data; the adjustable parameters include the batch size batch_size, the learning rate learning_rate, the number of Bi-LSTM layers r_layer, the number of neurons per Bi-LSTM layer r_nums and the number of Attention layers a_layer;
the data input end of the extraction unit is respectively connected with the data output ends of the second preprocessing unit and the parameter adjusting unit, and the extraction unit is used for inputting the preprocessed business data and the adjusted adjustable parameters into the knowledge extraction model so as to obtain the extraction result of the business data; the extraction result is a three-tuple list of < entity, relationship, attribute >.
16. The self-tuning parameter based domain knowledge extraction system of claim 15, wherein: when the second preprocessing unit preprocesses the accessed service data, the second preprocessing unit performs word segmentation processing on the service data by combining the established domain ontology knowledge base to obtain the service data after the word segmentation processing; and cleaning the service data after word segmentation processing by using a public stop table, and filtering stop words to obtain the preprocessed service data.
17. An electronic device, comprising:
a memory storing program instructions;
a processor, connected to the memory, for executing the program instructions in the memory to implement the steps of the method for extracting domain knowledge based on self-adjusting parameters as claimed in any one of claims 1 to 8.
CN202110006928.0A 2021-01-05 2021-01-05 Domain knowledge extraction method and system based on self-adjusting parameters and electronic equipment Active CN112328812B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110006928.0A CN112328812B (en) 2021-01-05 2021-01-05 Domain knowledge extraction method and system based on self-adjusting parameters and electronic equipment


Publications (2)

Publication Number Publication Date
CN112328812A CN112328812A (en) 2021-02-05
CN112328812B true CN112328812B (en) 2021-03-26

Family

ID=74302237

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110006928.0A Active CN112328812B (en) 2021-01-05 2021-01-05 Domain knowledge extraction method and system based on self-adjusting parameters and electronic equipment

Country Status (1)

Country Link
CN (1) CN112328812B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI807400B (en) * 2021-08-27 2023-07-01 台達電子工業股份有限公司 Apparatus and method for generating an entity-relation extraction model

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108363716A (en) * 2017-12-28 2018-08-03 广州索答信息科技有限公司 Domain information classification model generation method, classification method, device and storage medium
CN110287334A (en) * 2019-06-13 2019-09-27 Huaiyin Institute of Technology School-domain knowledge graph construction method based on an entity recognition and attribute extraction model
CN110334212A (en) * 2019-07-01 2019-10-15 Nanjing Audit University Domain audit knowledge graph construction method based on machine learning
WO2020007224A1 (en) * 2018-07-06 2020-01-09 ZTE Corporation Knowledge graph construction and smart response method and apparatus, device, and storage medium
CN111143536A (en) * 2019-12-30 2020-05-12 Tencent Technology (Shenzhen) Co., Ltd. Information extraction method based on artificial intelligence, storage medium and related device
CN111832307A (en) * 2020-07-09 2020-10-27 Beijing University of Technology Entity relationship extraction method and system based on knowledge enhancement

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106250412B (en) * 2016-07-22 2019-04-23 Zhejiang University Knowledge graph construction method based on multi-source entity fusion
CN107967267A (en) * 2016-10-18 2018-04-27 ZTE Corporation Knowledge graph construction method, apparatus and system
CN108984683B (en) * 2018-06-29 2021-06-25 Beijing Baidu Netcom Science and Technology Co., Ltd. Method, system, equipment and storage medium for extracting structured data
CN110750652A (en) * 2019-10-21 2020-02-04 Guangxi University Story ending generation method combining context entity words and knowledge
CN111192680B (en) * 2019-12-25 2021-06-01 山东众阳健康科技集团有限公司 Intelligent auxiliary diagnosis method based on deep learning and collective classification


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
SeMi: A SEmantic Modeling machIne to build Knowledge Graphs with graph neural networks; Giuseppe Futia et al.; SoftwareX; 2020-07-31; vol. 12; pp. 1-10 *
A Survey of Large-Scale Enterprise Knowledge Graph Practice; Wang Haofen et al.; Computer Engineering; 2020-07-15; vol. 46, no. 7; pp. 1-13 *
Design and Implementation of a Knowledge Extraction Service for Building a Financial Knowledge Graph; An Lei; China Masters' Theses Full-text Database, Information Science and Technology; 2019-07-15; no. 7; I138-1505 *

Also Published As

Publication number Publication date
CN112328812A (en) 2021-02-05

Similar Documents

Publication Publication Date Title
CN109522556B (en) Intention recognition method and device
CN109190110A (en) A kind of training method of Named Entity Extraction Model, system and electronic equipment
CN107766371A (en) A kind of text message sorting technique and its device
CN109299258A (en) A kind of public sentiment event detecting method, device and equipment
CN109408821B (en) Corpus generation method and device, computing equipment and storage medium
CN107368521B (en) Knowledge recommendation method and system based on big data and deep learning
CN111767725A (en) Data processing method and device based on emotion polarity analysis model
CN107844558A (en) The determination method and relevant apparatus of a kind of classification information
CN107463935A (en) Application class methods and applications sorter
US9830533B2 (en) Analyzing and exploring images posted on social media
CN114647713A (en) Knowledge graph question-answering method, device and storage medium based on virtual confrontation
Kortum et al. Dissection of AI job advertisements: A text mining-based analysis of employee skills in the disciplines computer vision and natural language processing
CN112328812B (en) Domain knowledge extraction method and system based on self-adjusting parameters and electronic equipment
Hasanati et al. Implementation of support vector machine with lexicon based for sentiment analysis on Twitter
CN117520503A (en) Financial customer service dialogue generation method, device, equipment and medium based on LLM model
Putra et al. Document Classification using Naïve Bayes for Indonesian Translation of the Quran
CN115934899A (en) IT industry resume recommendation method and device, electronic equipment and storage medium
CN110750712A (en) Software security requirement recommendation method based on data driving
CN115757720A (en) Project information searching method, device, equipment and medium based on knowledge graph
CN117501283A (en) Text-to-question model system
CN112434126B (en) Information processing method, device, equipment and storage medium
Pinto et al. A Systematic Review of Facial Expression Detection Methods
CN113468176A (en) Information input method and device, electronic equipment and computer readable storage medium
CN112395855A (en) Comment-based evaluation method and device
CN109815313A (en) Personalization technology survey data processing method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant