CN115510246A - Electric power marketing knowledge completion method and system based on depth sequence model - Google Patents

Electric power marketing knowledge completion method and system based on depth sequence model

Info

Publication number
CN115510246A
Authority
CN
China
Prior art keywords
knowledge
graph
entity
vector
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211273644.9A
Other languages
Chinese (zh)
Inventor
康雨萌
钱旭盛
俞阳
何玮
李雅超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Jiangsu Electric Power Co ltd Marketing Service Center
Original Assignee
State Grid Jiangsu Electric Power Co ltd Marketing Service Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Jiangsu Electric Power Co ltd Marketing Service Center filed Critical State Grid Jiangsu Electric Power Co ltd Marketing Service Center
Priority to CN202211273644.9A
Publication of CN115510246A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Electricity, gas or water supply

Abstract

A method and a system for power marketing knowledge completion based on a depth sequence model are disclosed. Triples are obtained from an original knowledge graph and used as training corpora; a low-dimensional vector representation of each triple is acquired by a graph vector embedding technique; exploiting the characteristics of triples in the knowledge graph, the sequence length is set to 3 to construct semantic units; and entity relation prediction is added to triple prediction. The method predicts the missing parts of triples, completes the knowledge graph, and improves both the usability of the graph and the capability of the model.

Description

Electric power marketing knowledge completion method and system based on depth sequence model
Technical Field
The invention belongs to the technical field of natural language processing in the technical field of computers, and particularly relates to a power marketing knowledge completion method and system based on a depth sequence model.
Background
With the continuous development of artificial intelligence technology, knowledge graphs have been widely applied in the internet industry, for example in recommendation systems, search engines and intelligent dialogue systems. The main research directions of knowledge graphs include knowledge acquisition, knowledge representation, temporal knowledge graphs and knowledge application. The invention focuses on a subtask of knowledge acquisition: power marketing knowledge completion.
Knowledge graphs have been widely used in various natural language processing tasks, but missing relationships among entities cause many problems in practical applications. In constructing a knowledge graph, a great amount of knowledge comes from documents and web pages, and deviations are often produced when knowledge is extracted from them. These deviations come from two aspects: (1) documents contain much noise, i.e. useless information, which may originate from the knowledge extraction algorithm itself or from the language itself; (2) the amount of information in documents is limited and cannot cover all knowledge, especially common-sense knowledge. All of this results in incomplete knowledge graphs, so knowledge graph completion has become increasingly important when constructing a knowledge graph.
Knowledge graph completion predicts the missing part of a triple so that the knowledge graph becomes more complete. Triples are not natural language; they are structures that describe relationships between entities in the fixed expression (h, r, t). Such short sequences may not provide sufficient context for prediction, while the huge number of paths makes constructing valuable long sequences costly and difficult. Moreover, in a triple, relations and entities are two different types of elements that appear in a fixed order, so treating them as elements of the same type is likely inappropriate.
The prior art CN110147450A, "knowledge completion method and device for knowledge graph", discloses a knowledge completion method and device for a knowledge graph. The method includes: determining the space vectors corresponding to entities and relations; computing semantic relations from the entities and the space vectors of the relations to obtain new relations between entities and complete the knowledge graph; randomly generating negative examples with a generative adversarial network and training a first knowledge representation model together with the derived fact triples; concept-layering the obtained fact triples, randomly selecting entities under the same sub-concept to construct negative examples, and training a second knowledge representation model with the derived fact triples by a maximum-margin method; and taking the second knowledge representation model as the input of the discriminator of the first model, optimizing the first model through the generative adversarial network to obtain a target knowledge representation model for knowledge completion.
The prior art CN112148892A, "knowledge completion method and apparatus for dynamic knowledge graph, and computer equipment", discloses a knowledge completion method for a dynamic knowledge graph, which includes: acquiring the structure information and attribute information of an entity in a dynamic knowledge graph in the cyberspace field, and fusing the entity through a hyper-parameter according to that information to obtain an incremental representation of the entity in the dynamic knowledge graph; obtaining a sample head-entity vector representation from the incremental representation and the corresponding edge, and a sample tail-entity vector representation of the sample tail entity; determining a relation representation between the two; and outputting the inferred tail entity according to the relation representation and the input query head entity.
The prior art CN113742488A, "embedded knowledge graph completion method and apparatus based on multitask learning", discloses an embedded knowledge graph completion method based on multitask learning, which inputs any object entity and the corresponding relation entity in the knowledge graph to be completed into an entity embedded representation model and outputs the representation vector of the other corresponding object entity. The model is trained on sample head entities, sample relation entities and corresponding tail-entity labels; the neural network constructed during training comprises a global sharing layer, a task-specific representation layer for each preset knowledge graph, and a corresponding other-object-entity representation prediction module, the preset knowledge graphs comprising at least N knowledge graphs besides the one to be completed. If any object entity, the relation entity and the other object entity are determined to constitute a new triple in the knowledge graph to be completed, the triple is added to it.
The knowledge completion methods in the three patents above are all based on homogeneous data of a single type. Knowledge in the power marketing field, however, mostly exists in a multi-source heterogeneous form, embodied in the coexistence of structured and unstructured data: for example, the information on dialogue personnel in a marketing customer-service system exists in structured form, while chat records and work-order data exist in unstructured form. It is therefore necessary to exploit the expressive advantage of graph embedding for multi-source heterogeneous knowledge graphs, to explore the global consistency among the different types of power marketing knowledge objects, and to balance the quality of the different object types during knowledge source processing.
Therefore, the invention provides a knowledge completion technology based on a depth sequence model, strengthens the characterization capability of downstream tasks and improves the overall capability of the model.
Disclosure of Invention
To remedy the defects of the prior art, the invention provides a method and a system for power marketing knowledge completion based on a depth sequence model: triples are obtained from an original knowledge graph and used as training corpora; a low-dimensional vector representation of each triple is acquired by a graph vector embedding technique; exploiting the characteristics of triples in the knowledge graph, the sequence length is set to 3 to construct semantic units; and entity relation prediction is added to triple prediction, predicting the missing parts of triples, completing the knowledge graph, and improving the usability of the graph and the capability of the model.
The invention adopts the following technical scheme.
A power marketing knowledge completion method based on a depth sequence model comprises the following steps:
step 1, acquiring triples in an original power knowledge graph as training materials;
step 2, obtaining the low-dimensional vector representation of the triple by adopting a graph vector embedding method;
step 3, setting the sequence length to be 3 by utilizing the characteristics of the triples in the knowledge graph, and constructing a semantic unit;
and 4, adding entity relation prediction for triple prediction to strengthen the characterization capability of downstream tasks and improve the overall capability of the model.
Preferably, in step 1, the extraction of marketing knowledge triples comprises structured data extraction and unstructured data extraction. For structured data, a schema model structure of the power marketing knowledge graph is defined, and the relational power marketing data are converted into graph data by a graph database execution script according to the schema format. For unstructured data, the text is encoded through an encoding layer, a head entity is extracted and encoded, the text encoding is reused, the operation is repeated, and the tail entity and the relation are predicted.
In step 1, a semi-pointer semi-labeling method is adopted for predicting the head entity, the tail entity and the relationship.
Preferably, in step 2, the whole power marketing knowledge graph is given a low-dimensional vector representation by a graph vector embedding method, i.e. the whole graph is represented by one vector; specifically, step 2 comprises:
step 2.1, sampling and re-labeling all subgraphs in the power marketing knowledge graph, wherein a subgraph is the set of nodes appearing around a selected graph node in the marketing knowledge graph, and the subgraph range is delimited by a surrounding-node depth of 3;
step 2.2: a context prediction model is trained. A graph is very similar to a document: a document is a collection of words, and a graph is a collection of subgraphs. Training can therefore proceed by maximizing the probability of a subgraph of the input graph given its context.
The optimized target log-likelihood function of the context prediction model is defined as:

L = Σ_t log p(ω_t | ω_{t−c}, …, ω_{t+c})

wherein ω_{t−c}, …, ω_{t+c} denote the c words before and after the current word ω_t. Finally, a one-hot encoded one-dimensional vector representation of the input graph can be obtained.
Step 2.3: after model training is finished, the one-hot encoded vector of a graph is obtained simply by providing the graph's ID, and the hidden layer activation is the embedding result.
Preferably, in step 4, for a knowledge graph KG = {E, R, T}, an entity triple (h, r, t) ∈ T. h_s and t_s are the head- and tail-entity structure representation vectors obtained by training the knowledge representation learning model; h_d and t_d are the description-text representations of the head and tail entities. The model loss function can be defined as:
E = E_s + α·E_d
wherein E_s is the loss function of the knowledge representation model based on the entity triple structure, E_d is the loss function based on the description text, and the hyper-parameter α weighs the information loss of the description text. E_d can be defined in different ways; to keep E_s and E_d synchronized, the text information loss function E_d is defined as:
E_d = f_r(h_s, t_d) + f_r(h_d, t_s) + f_r(h_d, t_d)
wherein:

f_r(h_s, t_d) = ||h_s + l_r − t_d||²
f_r(h_d, t_s) = ||h_d + l_r − t_s||²
f_r(h_d, t_d) = ||h_d + l_r − t_d||²
f_r(h_s, t_d) and f_r(h_d, t_s) denote head–tail entity vector representations in which one of the original structure representation vectors is replaced by the corresponding text description vector; in f_r(h_d, t_d) both entity structure vectors are replaced by entity description text vectors. Computing the two types of entity vector representations in the same vector space makes them interact.
The power marketing knowledge completion system based on the depth sequence model comprises an original knowledge graph acquisition module, a triple low-dimensional vector representation module, a semantic unit construction module and a triple prediction module.
The original knowledge graph acquisition module extracts the structured and unstructured data of power marketing and acquires the triples in the original power marketing knowledge graph;
the triple low-dimensional vector representation module represents the whole graph through a vector, and the graph vector embedding method uses a context prediction network for training by using the idea of word vector embedding, so that the single-hot coded low-dimensional vector representation is obtained;
the semantic unit construction module maps the actual values of the triples in the knowledge graph according to the properties of entities, attributes, relations and the like;
and the triple prediction module performs entity prediction on triples with missing parts based on a vector projection model, completing the knowledge of the power marketing knowledge graph by minimizing a target loss function.
Compared with the prior art, the method provides sufficient context information for prediction, expands the graph, alleviates the problem of massive missing relations in the knowledge graph, and improves graph usability. In addition, compared with traditional knowledge completion methods, entity relation prediction is added to triple prediction, strengthening the representation capability for downstream tasks and improving the overall capability of the model.
Drawings
Fig. 1 is a flow chart of a power marketing knowledge completion method based on a depth sequence model according to the invention.
FIG. 2 is a flow chart of power marketing unstructured data extraction in the present invention.
FIG. 3 is a schematic diagram of a physical prediction TransH projection vector.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be described clearly and completely in the following with reference to the accompanying drawings in the embodiments of the present invention. The embodiments described in this application are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art without inventive step, are within the scope of protection of the present invention.
A power marketing knowledge completion method based on a depth sequence model specifically comprises the following steps:
step 1: and acquiring triples in the original power knowledge graph as training materials.
A knowledge graph typically records hundreds of millions of real-world facts as triples, written in the form (h, r, t), where h and t represent entities and r represents the relationship between them.
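As an illustration only (the entity and relation names below are hypothetical, not taken from the patent), such (h, r, t) facts can be held in a simple structure:

```python
from collections import namedtuple

# A knowledge-graph fact (h, r, t): head entity, relation, tail entity.
Triple = namedtuple("Triple", ["h", "r", "t"])

# Hypothetical power-marketing facts, purely for illustration.
triples = [
    Triple("user_0417", "has_tariff", "agricultural_production_10kV"),
    Triple("user_0417", "filed", "work_order_88216"),
    Triple("work_order_88216", "handled_by", "agent_A"),
]

# Entity and relation vocabularies derived from the triples.
entities = sorted({e for tr in triples for e in (tr.h, tr.t)})
relations = sorted({tr.r for tr in triples})
```

Completion then amounts to predicting the missing slot of a partial triple such as ("user_0417", "has_tariff", ?).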
The extraction of marketing knowledge triples in the power marketing knowledge graph comprises two types: structured data extraction and unstructured data extraction. For example, in the customer-service business of the marketing domain, user identity information and electricity charge information are structured data, while chat records and work-order logs are unstructured data.
(1) Power marketing structured data extraction: for structured data such as user identity information and electricity charge information, a schema model structure of the power marketing knowledge graph is defined. According to the schema format, the relational power marketing data are converted into graph data by a graph database execution script.
(2) Power marketing unstructured data extraction: extraction is performed through a model.
The flow of power marketing unstructured data triple extraction is shown in Fig. 2. The input is unstructured power marketing text, which is encoded through an encoder layer; a head entity (subject) is extracted and encoded, the text encoding is reused, the operation is repeated, and the tail entity (object) and the relation (predicate) are predicted simultaneously. Note that head entity, tail entity and relation prediction can be implemented in various forms, including the BIO scheme and the semi-pointer/semi-label scheme. To exploit the advantage of small-sample semi-supervised learning and save model training cost, this embodiment adopts the semi-pointer/semi-label implementation. This is, however, only a preferred and non-limiting embodiment; those skilled in the art can realize entity prediction in any other way within the spirit of the present invention, and all such ways fall within its scope of protection.
In semi-pointer/semi-label entity prediction, the head entity is predicted first, then passed into the model to predict its corresponding tail entity, and finally the head and tail entities are passed in to predict the relation between them; triples of unstructured data are obtained in this way. More specifically, the steps of semi-pointer/semi-label entity prediction are as follows:
step 1.1, embed the input text with a BERT pre-trained text model, convert it into a one-dimensional vector, and predict the start and end positions of the corresponding head entity in semi-pointer/semi-label fashion;
and step 1.2, concatenate the text embedding vector of the predicted head entity with an intermediate hidden layer of the BERT pre-training process, continue to predict the relation and the tail entity, and build a semi-pointer/semi-label structure for each relation to predict the start and end positions of the corresponding tail entity, i.e. the tail entity and the relation are predicted simultaneously.
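A minimal sketch of the semi-pointer/semi-label decoding step, assuming per-token start/end probabilities have already been produced by the encoder (the probability values below are made up for illustration, not model output):

```python
import numpy as np

def decode_span(start_probs, end_probs, threshold=0.5):
    """Semi-pointer/semi-label decoding: each token gets two independent
    sigmoid scores (span start, span end); the entity is the span between
    the first start above threshold and the next end at or after it."""
    starts = np.where(start_probs >= threshold)[0]
    if len(starts) == 0:
        return None
    s = starts[0]
    ends = np.where(end_probs[s:] >= threshold)[0]
    if len(ends) == 0:
        return None
    return (s, s + ends[0])  # inclusive token span

# Toy probabilities over an 8-token sentence (illustrative values).
start_p = np.array([0.1, 0.9, 0.2, 0.1, 0.1, 0.1, 0.1, 0.1])
end_p   = np.array([0.1, 0.2, 0.1, 0.8, 0.1, 0.1, 0.1, 0.1])
span = decode_span(start_p, end_p)
```

Here the decoded head-entity span covers tokens 1 through 3; in step 1.2 the same structure is built once per relation to decode the tail entity.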
Step 2: and acquiring the low-dimensional vector representation of the triplet by using a map vector embedding method.
Graph embedding uses a set of easy-to-compute low-dimensional dense vectors to help perform graph analysis efficiently. This set of vectors consists of a representation for each node in the graph, obtained by embedding the structural information around each node into a vector space so that the structural characteristics of the nodes in the original graph are preserved as much as possible: the structure around each node is mapped into one low-dimensional feature vector while the relationships among nodes are maintained.
The whole power marketing knowledge graph is given a low-dimensional vector representation by the graph vector embedding method, i.e. the whole graph is represented by one vector; borrowing the idea of word vector embedding, the method trains with a context prediction network. Specifically, step 2 comprises the following substeps:
step 2.1: all subgraphs in the power marketing knowledge graph are sampled and re-labeled. A subgraph is the set of nodes that appear around a selected graph node in the marketing knowledge graph; in the present implementation, the subgraph range is delimited by a surrounding-node depth of 3.
Step 2.2: a context prediction model is trained. A graph is very similar to a document: a document is a collection of words, and a graph is a collection of subgraphs. The model is therefore trained by maximizing the probability of a subgraph of the input graph given its context.
The optimized target log-likelihood function of the context prediction model is defined as:

L = Σ_t log p(ω_t | ω_{t−c}, …, ω_{t+c})

wherein ω_{t−c}, …, ω_{t+c} denote the c words before and after the current word ω_t. Finally, a one-hot (one-hot encoded) one-dimensional vector representation of the input graph can be obtained.
Step 2.3: after the model training is completed, only the ID of the graph needs to be provided to obtain the one-hot vector of the graph, and the hidden layer is the embedding result.
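The ID-to-embedding lookup of step 2.3 can be sketched as follows; W_hidden stands in for the trained hidden-layer weights (random here, since training is out of scope for this sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
num_graphs, dim = 5, 8

# Stand-in for the trained hidden-layer weight matrix of the context
# prediction model; in practice these weights are learned in step 2.2.
W_hidden = rng.normal(size=(num_graphs, dim))

def graph_embedding(graph_id):
    """One-hot encode the graph ID; multiplying by the hidden-layer weights
    selects that graph's row, which is the embedding result."""
    one_hot = np.zeros(num_graphs)
    one_hot[graph_id] = 1.0
    return one_hot @ W_hidden

emb = graph_embedding(2)
```

Because the input is one-hot, the matrix product simply reads out one row of the hidden layer, which is why only the graph's ID is needed after training.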
And 3, setting the sequence length to be 3 by utilizing the characteristics of the triples in the knowledge graph to construct a semantic unit.
In the field of power marketing, the knowledge graph triple representation G = (E, R, S) is used, where E = {e1, e2, …, e|E|} is the entity set in the marketing-domain knowledge base, containing |E| different entities; R = {r1, r2, …, r|R|} is the relation set in the marketing-domain knowledge base, containing |R| different relations;
and S ⊆ E × R × E represents the set of triples in the knowledge base. The basic form of a power marketing triple mainly comprises entity 1, relation, entity 2, as well as concepts, attributes and attribute values; the power marketing entity is the most basic element in the power marketing knowledge graph, and different relations exist among different entities. A concept refers to a category, such as an electricity-usage category; an attribute mainly refers to a category that a class may have, such as the agricultural production electricity price; an attribute value is a specific value of a classification, such as electricity for agricultural production (10 kV) or primary processing of agricultural products (10 kV). Constructing semantic units in the power marketing field means mapping the triples in the knowledge graph to actual values according to characteristics such as entities, attributes and relations.
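A toy illustration of semantic-unit construction under these conventions (the attribute names and values are invented examples in the spirit of the ones above, not data from the patent):

```python
# A semantic unit is a fixed-length-3 sequence mapping a knowledge-graph
# triple onto actual values: (entity/concept, attribute or relation, value).
def to_semantic_unit(head, rel_or_attr, value):
    unit = (head, rel_or_attr, value)
    assert len(unit) == 3  # sequence length fixed to 3, per step 3
    return unit

units = [
    to_semantic_unit("agricultural_production_electricity", "voltage_level", "10 kV"),
    to_semantic_unit("agricultural_products_primary_processing", "voltage_level", "10 kV"),
    to_semantic_unit("electricity_usage_category", "instance", "agricultural_production"),
]
```

Fixing the sequence length to 3 matches the triple structure itself, so each semantic unit can be fed to the sequence model without path construction.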
Step 4: on the basis of the above representation, entity relation prediction is added to triple prediction to strengthen the representation capability for downstream tasks and improve the overall capability of the model.
The triple prediction task takes triple data with a missing part, such as a triple (h, r, ?) whose tail entity is missing, and predicts the missing element so as to complete the knowledge graph.
Furthermore, triple prediction is mainly entity prediction. In entity prediction, a hyperplane with normal vector W_r associated with the relation r is established through the TransH model (a hyperplane-based knowledge representation learning model), and the head and tail entity vectors are projected onto this hyperplane to obtain the projection vectors h_s and t_s, as shown in Fig. 3. The TransH model expects the projections of the head and tail entities to be connected, with small error, by the representation vector l_r of the relation r on the hyperplane. The projected head and tail entity vectors are:
h_s = l_h − W_r^T l_h W_r
t_s = l_t − W_r^T l_t W_r
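The two projection equations, together with the translation score used in the next paragraph, can be sketched numerically; the vectors here are random placeholders for trained embeddings, and the hyperplane normal is assumed unit-norm as in TransH:

```python
import numpy as np

def project(v, w_r):
    """Project entity vector v onto the relation hyperplane with normal w_r:
    v - (w_r^T v) w_r, matching h_s = l_h - W_r^T l_h W_r above."""
    w_r = w_r / np.linalg.norm(w_r)  # enforce the unit-norm assumption
    return v - np.dot(w_r, v) * w_r

rng = np.random.default_rng(1)
dim = 4
l_h, l_t = rng.normal(size=dim), rng.normal(size=dim)  # head/tail entity vectors
l_r, w_r = rng.normal(size=dim), rng.normal(size=dim)  # relation vector, hyperplane normal

h_s = project(l_h, w_r)  # head projection on the hyperplane
t_s = project(l_t, w_r)  # tail projection on the hyperplane

# Translation score ||h_s + l_r - t_s||^2: small for plausible triples.
score = float(np.sum((h_s + l_r - t_s) ** 2))
```

Both projections are orthogonal to the hyperplane normal by construction, which is the defining property of the TransH projection.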
and then can pass through the loss function h s +l r -t s || 2 The accuracy of the newly obtained entity triples is judged. To this end, the objective function is defined as:
min Σ_{(h,r,t)∈T} Σ_{(h′,r′,t′)∈T′} [f_r(h, t) + γ − f_{r′}(h′, t′)]_+ , where [x]_+ = max(0, x)
wherein T is the set of correct (positive-sample) triples, T′ the set of corrupted (negative-sample) triples, and γ the margin between positive and negative scores. The objective expects each negative sample's score to exceed the corresponding positive sample's score by at least γ; it is used to update the parameters and improve the model's ability to distinguish positive from negative samples.
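A minimal numeric sketch of this margin objective (the scores are made up; in practice they come from f_r over trained embeddings):

```python
import numpy as np

def margin_loss(pos_scores, neg_scores, gamma=1.0):
    """Hinge-style margin ranking loss: each corrupted (negative) triple's
    score should exceed the matching correct triple's score by at least gamma."""
    return float(np.sum(np.maximum(0.0, pos_scores + gamma - neg_scores)))

pos = np.array([0.2, 0.5])  # f_r(h, t) for correct triples (lower is better)
neg = np.array([1.5, 0.9])  # f_r'(h', t') for corrupted triples
loss = margin_loss(pos, neg, gamma=1.0)
```

The first pair already satisfies the margin (1.5 − 0.2 > 1), so only the second pair contributes to the loss.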
Further, for a knowledge graph KG = {E, R, T}, an entity triple (h, r, t) ∈ T. h_s and t_s are the head- and tail-entity structure representation vectors obtained by training the knowledge representation learning model; h_d and t_d are the description-text representations of the head and tail entities. The loss function of the model is defined as:
E = E_s + α·E_d
wherein E_s is the loss function of the knowledge representation model based on the entity triple structure, E_d is the loss function based on the description text, and the hyper-parameter α weighs the information loss of the description text. E_d can be defined in different ways; to keep E_s and E_d synchronized, the text information loss function E_d is defined as follows:
E_d = f_r(h_s, t_d) + f_r(h_d, t_s) + f_r(h_d, t_d)
wherein:
f_r(h_s, t_d) = ||h_s + l_r − t_d||²
f_r(h_d, t_s) = ||h_d + l_r − t_s||²
f_r(h_d, t_d) = ||h_d + l_r − t_d||²
f_r(h_s, t_d) and f_r(h_d, t_s) denote head–tail entity vector representations in which one of the original structure representation vectors is replaced by the corresponding text description vector; in f_r(h_d, t_d) both entity structure vectors are replaced by entity description text vectors. Computing the two types of entity vector representations in the same vector space makes them interact.
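The joint loss E = E_s + α·E_d can be sketched end-to-end; the structure and description vectors are random placeholders, assumed (as the text requires) to live in the same vector space:

```python
import numpy as np

def f_r(h, t, l_r):
    """Translation score ||h + l_r - t||^2, the form shared by all terms above."""
    return float(np.sum((h + l_r - t) ** 2))

rng = np.random.default_rng(2)
dim = 4
# Structure vectors (h_s, t_s) and description-text vectors (h_d, t_d);
# random placeholders here, standing in for trained representations.
h_s, t_s = rng.normal(size=dim), rng.normal(size=dim)
h_d, t_d = rng.normal(size=dim), rng.normal(size=dim)
l_r = rng.normal(size=dim)

E_s = f_r(h_s, t_s, l_r)                                             # structure loss
E_d = f_r(h_s, t_d, l_r) + f_r(h_d, t_s, l_r) + f_r(h_d, t_d, l_r)   # text loss
alpha = 0.5  # hyper-parameter weighing the description-text loss (illustrative value)
E = E_s + alpha * E_d
```

Minimizing E jointly over both representation types is what lets the structure and description vectors interact during training.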
Through this process the prediction of triples is completed, and thereby the knowledge completion of the marketing knowledge graph is achieved.
A power marketing knowledge completion system based on the depth sequence model comprises an original knowledge graph acquisition module, a triple low-dimensional vector representation module, a semantic unit construction module and a triple prediction module.
The original knowledge graph acquisition module extracts the structured and unstructured data of power marketing to acquire the triples in the original power marketing knowledge graph;
the triple low-dimensional vector representation module represents the whole graph through a vector, and the graph vector embedding method uses a context prediction network for training by using the idea of word vector embedding, so that the single-hot coded low-dimensional vector representation is obtained;
the semantic unit construction module maps the actual values of the triples in the knowledge graph according to the properties of entities, attributes, relations and the like;
and the triple prediction module performs entity prediction on triples with missing parts based on a vector projection model, completing the knowledge of the power marketing knowledge graph by minimizing a target loss function.
The present disclosure may be systems, methods, and/or computer program products. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied thereon for causing a processor to implement various aspects of the present disclosure.
The computer readable storage medium may be a tangible device that can hold and store the instructions for use by the instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical coding device, such as a punch card or an in-groove protruding structure with instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media as used herein is not to be interpreted as a transitory signal per se, such as a radio wave or other freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or other transmission medium (e.g., optical pulses through a fiber optic cable), or an electrical signal transmitted through an electrical wire.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device over a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.
Computer program instructions for carrying out operations of the present disclosure may be assembler instructions, Instruction Set Architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state-setting data, or source or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk or C++ and conventional procedural programming languages such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry, such as a programmable logic circuit, a field-programmable gate array (FPGA), or a programmable logic array (PLA), can execute the computer-readable program instructions and implement aspects of the present disclosure by utilizing the state information of the computer-readable program instructions to personalize the electronic circuitry.
Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention and not to limit them. Although the present invention is described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that modifications and equivalent substitutions may be made to the embodiments of the invention without departing from the spirit and scope of the invention, which is intended to be covered by the claims.

Claims (8)

1. A power marketing knowledge completion method based on a depth sequence model is characterized by comprising the following steps:
step 1, acquiring triples in an original power knowledge graph as training materials;
step 2, obtaining the low-dimensional vector representation of the triple by adopting a graph vector embedding method;
step 3, setting the sequence length to be 3 by utilizing the characteristics of the triples in the knowledge graph, and constructing a semantic unit;
and 4, adding entity relation prediction to the triple prediction task to strengthen the characterization capability for downstream tasks and improve the overall capability of the model.
2. The electric power marketing knowledge completion method based on the depth sequence model, according to claim 1, is characterized in that:
in the step 1, extracting marketing knowledge triples comprises structured data extraction and unstructured data extraction; the structured data extraction defines a schema model structure of the electric power marketing knowledge graph, and converts the relational data of electric power marketing into graph data through a graph database execution script according to the format of the schema; the unstructured data extraction encodes the unstructured data through a coding layer, extracts and encodes the head entity, multiplexes the text codes, and repeats the operation to predict the tail entity and the relation.
3. The electric power marketing knowledge completion method based on the depth sequence model, according to claim 1, is characterized in that:
in step 1, a semi-pointer semi-labeling method is adopted for predicting the head entity, the tail entity and the relationship.
4. The electric power marketing knowledge completion method based on the depth sequence model, according to claim 1, is characterized in that:
step 2, carrying out low-dimensional vector representation on the whole power marketing knowledge graph by a graph vector embedding method, namely representing the whole graph by one vector; specifically, step 2 comprises:
step 2.1, sampling and re-labeling all sub-graphs in the electric power marketing knowledge graph, wherein a sub-graph is a group of nodes appearing around a graph node selected in the marketing knowledge graph, the depth of the surrounding nodes is 3, and this depth delimits the sub-graph range;
step 2.2: training a context prediction model by a method of maximizing the probability of inputting a graph subgraph;
the optimized target log-likelihood function of the context prediction model is defined as:
L = (1/T) Σ_t log p(ω_t | ω_{t-c}, ..., ω_{t+c})
wherein ω_{t-c}, ..., ω_{t+c} denote the c words before and after the current word ω_t; finally, a one-hot coded one-dimensional vector representation of the input graph can be obtained;
step 2.3: after the model training is completed, the ID of the graph is provided to obtain the one-hot coded vector of the graph, and the hidden layer output is the embedding result.
5. The electric power marketing knowledge completion method based on the depth sequence model, according to claim 1, is characterized in that:
in step 4, regarding a knowledge graph KG = {E, R, T}, an entity triple (h, r, t) belongs to T; h_s, t_s are the head and tail entity structure representation vectors obtained by training a knowledge representation learning model, and h_d, t_d are the head and tail entity description text representations; the model loss function can be defined as:
E = E_s + αE_d
wherein E_s is the loss function of the knowledge representation model based on the entity triple structure, E_d is the loss function based on the description text, and the hyperparameter α is used to weigh the information loss of the description text; E_d can be defined in different forms to ensure that E_s and E_d remain synchronized, and the text information loss function E_d is defined as:
E_d = f_r(h_s, t_d) + f_r(h_d, t_s) + f_r(h_d, t_d)
wherein:
f_r(h_s, t_d) = ||h_s + r - t_d||
f_r(h_d, t_s) = ||h_d + r - t_s||
f_r(h_d, t_d) = ||h_d + r - t_d||
f_r(h_s, t_d) and f_r(h_d, t_s) respectively replace one of the original structure representation vectors in the head-tail entity vector representation with the corresponding text description vector, and f_r(h_d, t_d) replaces both entity structure vectors with the entity description text vectors; by computing the two types of entity vector representations in the same vector space, they are made to interact.
6. A power marketing knowledge completion system based on a depth sequence model, running the power marketing knowledge completion method based on the depth sequence model according to any one of claims 1 to 5, and comprising an original knowledge graph acquisition module, a triple low-dimensional vector representation module, a semantic unit construction module and a triple prediction module; characterized in that:
the original knowledge graph acquisition module extracts the structured data and the unstructured data of power marketing to acquire the triples in the original power marketing knowledge graph;
the triple low-dimensional vector representation module represents the whole graph by one vector; the graph vector embedding method uses the idea of word vector embedding and trains a context prediction network, thereby obtaining the one-hot coded low-dimensional vector representation;
the semantic unit construction module maps the actual values of the triples in the knowledge graph according to properties such as entities, attributes and relations;
and the triple prediction module performs entity prediction on triples with a missing element based on a vector projection model, and completes the knowledge completion of the power marketing knowledge graph by minimizing the target loss function.
7. A terminal, comprising a processor and a storage medium; characterized in that:
the storage medium is to store instructions;
the processor is configured to operate in accordance with the instructions to perform the steps of the method according to any one of claims 1 to 5.
8. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 5.
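The joint loss of claim 5 can be sketched as below. This is a minimal illustration, assuming the translation-based convention f_r(h, t) = ||h + r - t|| for every pairing of structure vectors (h_s, t_s) and description-text vectors (h_d, t_d); the value of α and the toy vectors are hypothetical placeholders, not values from the patent.

```python
import numpy as np

def f_r(h, r, t):
    # Translation-based score shared by all pairings of structure and text vectors
    return np.linalg.norm(h + r - t)

def total_loss(h_s, h_d, t_s, t_d, r, alpha=0.5):
    # E_s: structure-only loss; E_d: the three mixed/description-text terms of claim 5
    E_s = f_r(h_s, r, t_s)
    E_d = f_r(h_s, r, t_d) + f_r(h_d, r, t_s) + f_r(h_d, r, t_d)
    return E_s + alpha * E_d  # E = E_s + α·E_d

# Toy head/tail vectors: structure ( _s ) and description-text ( _d ) representations
h_s = np.array([0.4, 0.6]); h_d = np.array([0.5, 0.5])
t_s = np.array([1.0, 1.0]); t_d = np.array([1.1, 0.9])
r   = np.array([0.6, 0.4])
print(round(total_loss(h_s, h_d, t_s, t_d, r), 4))  # → 0.1414
```

Because every term is evaluated in the same vector space, gradients from E_d pull the description-text vectors toward the structure vectors and vice versa, which is the interaction the claim describes.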
CN202211273644.9A 2022-10-08 2022-10-08 Electric power marketing knowledge completion method and system based on depth sequence model Pending CN115510246A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211273644.9A CN115510246A (en) 2022-10-08 2022-10-08 Electric power marketing knowledge completion method and system based on depth sequence model

Publications (1)

Publication Number Publication Date
CN115510246A true CN115510246A (en) 2022-12-23

Family

ID=84510028



Similar Documents

Publication Publication Date Title
CN109960810B (en) Entity alignment method and device
CN111026842B (en) Natural language processing method, natural language processing device and intelligent question-answering system
CN106502985B (en) neural network modeling method and device for generating titles
CN113505244B (en) Knowledge graph construction method, system, equipment and medium based on deep learning
CN110889556A (en) Enterprise operation risk prediction method and system
US11907675B2 (en) Generating training datasets for training neural networks
CN111444340A (en) Text classification and recommendation method, device, equipment and storage medium
Olmezogullari et al. Representation of click-stream datasequences for learning user navigational behavior by using embeddings
CN114820871B (en) Font generation method, model training method, device, equipment and medium
CN116661805B (en) Code representation generation method and device, storage medium and electronic equipment
CN114330966A (en) Risk prediction method, device, equipment and readable storage medium
CN111339765A (en) Text quality evaluation method, text recommendation method and device, medium and equipment
CN115238045B (en) Method, system and storage medium for extracting generation type event argument
CN115563297A (en) Food safety knowledge graph construction and completion method based on graph neural network
CN115688920A (en) Knowledge extraction method, model training method, device, equipment and medium
CN112528654A (en) Natural language processing method and device and electronic equipment
CN111597816A (en) Self-attention named entity recognition method, device, equipment and storage medium
CN116956929B (en) Multi-feature fusion named entity recognition method and device for bridge management text data
CN116776881A (en) Active learning-based domain entity identification system and identification method
CN116383741A (en) Model training method and cross-domain analysis method based on multi-source domain data
CN115700492A (en) Program semantic representation learning and prediction method based on inter-graph neural network
CN115510246A (en) Electric power marketing knowledge completion method and system based on depth sequence model
CN114897183A (en) Problem data processing method, and deep learning model training method and device
CN114328956A (en) Text information determination method and device, electronic equipment and storage medium
CN113535946A (en) Text identification method, device and equipment based on deep learning and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination