CN115982385A

CN115982385A - Knowledge graph based relation graph neural network patent quality assessment method

Info

Publication number: CN115982385A
Application number: CN202310094703.4A
Authority: CN
Inventors: 雷方元; 周木平; 蒋健健; 梁敏靖
Original assignee: Guangdong Polytechnic Normal University
Current assignee: Guangdong Polytechnic Normal University
Priority date: 2023-02-07
Filing date: 2023-02-07
Publication date: 2023-04-18

Abstract

The invention discloses a knowledge graph-based relation graph neural network patent quality assessment method, belongs to the technical field of patent quality assessment, and solves the technical problem that an attribute network used in the current patent quality assessment is single, and the method comprises the following steps: acquiring a patent data sample set; dividing a patent data sample set into a training set and a verification set; constructing a patent knowledge graph, and initializing embedding of a relation entity of the patent knowledge graph by using a knowledge graph embedding model; defining a forward propagation process of a relation graph neural network model based on a patent knowledge graph, and defining a loss function for iterative optimization; inputting the training set into a relational graph neural network model for training; inputting the verification set into a trained relational graph neural network model to obtain a patent quality evaluation model; inputting a patent to be evaluated into a patent quality evaluation model to obtain a patent entity node characteristic representation, and performing Softmax prediction labeling on the patent entity node characteristic representation to obtain a prediction result.

Description

Knowledge graph based relation graph neural network patent quality assessment method

Technical Field

The invention relates to the technical field of patent quality assessment, in particular to a knowledge graph-based relation graph neural network patent quality assessment method.

Background

Patents are one of the major sources of information in the big data era and one of the important digital assets of individuals and businesses. High-quality patents improve the competitiveness of individuals and companies, and efficient patent quality assessment supports rapid policy making. Millions of patent applications are currently filed each year, and the resulting large patent databases present both opportunities and challenges for automatic patent quality assessment.

Recent advances in computer science, such as machine learning and deep learning, have had a significant impact on many areas of automation and offer solutions for the automated analysis of patent data. So far these techniques have seen only limited use in patent analysis. The dominant approach feeds patent features, manually engineered composite features or the patent text into an artificial neural network to predict patent quality; other approaches evaluate patent quality from the patent citation network.

Current patent quality assessment relies on a single attribute network. Patent data contain not only the patent citation network but also networks of inventors, applicants, classification numbers and the like, and these networks can be linked through the patents themselves to build a patent knowledge graph. A knowledge graph is essentially graph-structured data, so the rich semantic information of the patent knowledge graph can be extracted with a graph neural network model to better support patent quality assessment. Traditional graph neural networks target homogeneous graphs, in which every relation between nodes is treated identically, whereas the patent knowledge graph is a heterogeneous graph containing many kinds of relation information. Therefore, to make better use of the relation information in the knowledge graph, a knowledge graph-based relation graph neural network model is proposed; by encoding the relations of the knowledge graph together with neighbor entity information, the method captures high-order neighborhood semantic information across the different relations of a patent and thereby better predicts the patent quality class.

Disclosure of Invention

The invention aims to solve the technical problem of the prior art, and aims to provide a relation graph neural network patent quality assessment method based on a knowledge graph.

The technical scheme of the invention is as follows: a relation graph neural network patent quality assessment method based on a knowledge graph comprises the following steps:

S1, acquiring a patent data sample set;

S2, dividing the patent data sample set into a training set and a verification set;

S3, constructing a patent knowledge graph, and initializing the entity and relation embeddings of the patent knowledge graph by using a knowledge graph embedding model;

S4, defining the forward propagation process of a relation graph neural network model based on the patent knowledge graph, and defining a loss function for iterative optimization;

S5, inputting the training set into the relation graph neural network model for training; inputting the verification set into the trained relation graph neural network model; adjusting the network of the relation graph neural network model to obtain an optimized relation graph neural network model, and taking the optimized relation graph neural network model as the patent quality evaluation model;

S6, inputting the patent to be evaluated into the patent quality evaluation model to obtain the feature representation of the patent entity node, and applying Softmax prediction to that feature representation to obtain the prediction result.

As a further improvement, in step S1, the patents are classified into several quality classes according to the number of forward citations of the patents within 5 years after release, and the greater the number of forward citations, the better the quality of the patents is indicated.

Further, in step S2, a training set and a verification set are divided according to the ratio of the patent quality labels of different grades.

Further, the step S3 includes the following steps:

S31, first defining an ontology model of the patent knowledge graph, the ontology model comprising the entity, relation and attribute types contained in the patent knowledge graph;

then performing instance-to-ontology mapping, in which the patent data are mapped into the <head entity, relation or attribute, tail entity> triple format of the knowledge graph for storage, thereby completing construction of the patent knowledge graph;

S32, initializing the entity and relation embeddings of the patent knowledge graph constructed in step S31 by using the existing knowledge graph embedding model TransE.
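TransE represents a relation as a translation in embedding space, so a plausible triple should satisfy e_h + e_r ≈ e_t. The following minimal sketch shows only the scoring idea; the function name `transe_score` and the choice of the L2 norm are illustrative assumptions, and the text does not specify how TransE is trained or configured here.

```python
import math

def transe_score(e_h, e_r, e_t):
    """Distance between the translated head (e_h + e_r) and the tail e_t.

    A lower score means the triple (h, r, t) is more plausible.
    The L2 norm is an assumption; TransE can also be used with the L1 norm.
    """
    return math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(e_h, e_r, e_t)))

# Toy 2-dimensional embeddings (hypothetical values).
perfect = transe_score([1.0, 0.0], [0.0, 1.0], [1.0, 1.0])
far = transe_score([0.0, 0.0], [0.0, 0.0], [3.0, 4.0])
```

Embeddings trained so that true triples score low can then serve as the initial entity and relation vectors for the graph neural network.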

Further, the step S31 includes the steps of:

S311, the ontology model of the patent knowledge graph takes the patent entity as its core and defines 6 entity concepts, 12 relations and 11 patent attributes;

the 6 entity concepts are the patent identifier, applicant, inventor, country, classification number and publication time;

the 12 relations, in triple form, are <patent, family, patent>, <patent, backward citation, patent>, <patent, first inventor, inventor>, <patent, secondary inventor, inventor>, <inventor, research collaboration, inventor>, <patent, first applicant, applicant>, <patent, secondary applicant, applicant>, <applicant, application collaboration, applicant>, <patent, applicant country, country>, <patent, first IPC classification number, classification number>, <patent, secondary IPC classification number, classification number> and <patent, published at, publication time>;

the 11 patent attributes are the number of independent claims, the number of dependent claims, the number of inventors, the number of patent family members, the number of family countries, the number of backward citations, the number of applicants, the number of CPC classifications, the number of IPC classifications, the abstract length and the claim length;

S312, after the ontology model of the patent knowledge graph is established, the patent data are mapped into triples; because the patent data are structured or semi-structured, they can be converted directly into triples through a schema-based data mapping mechanism, and the triples are stored in a Neo4j graph database to complete construction of the patent knowledge graph.
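The mapping from a structured patent record to triples can be sketched as follows. This is only an illustration under stated assumptions: the record field names (`publication_number`, `inventors`, `ipc_codes`, `backward_citations`), the relation identifiers and the sample record are all hypothetical, only a few of the 12 relations and 11 attributes are covered, and writing the triples to Neo4j is left out.

```python
def patent_to_triples(record):
    """Convert one structured patent record into (head, relation, tail) triples."""
    pid = record["publication_number"]
    triples = [(pid, "published_at", record["publication_date"])]
    # The first listed inventor gets its own relation; the rest are secondary.
    for rank, name in enumerate(record.get("inventors", [])):
        relation = "first_inventor" if rank == 0 else "secondary_inventor"
        triples.append((pid, relation, name))
    # Same convention for IPC classification numbers.
    for rank, code in enumerate(record.get("ipc_codes", [])):
        relation = "first_ipc" if rank == 0 else "secondary_ipc"
        triples.append((pid, relation, code))
    for cited in record.get("backward_citations", []):
        triples.append((pid, "backward_citation", cited))
    # Numeric attributes are stored as <patent, attribute, value>.
    triples.append((pid, "num_inventors", len(record.get("inventors", []))))
    return triples

# Hypothetical record, for illustration only.
record = {
    "publication_number": "PATENT-001",
    "publication_date": "2020-01-01",
    "inventors": ["Inventor A", "Inventor B"],
    "ipc_codes": ["H04L"],
    "backward_citations": ["PATENT-000"],
}
triples = patent_to_triples(record)
```

Each resulting tuple corresponds to one <head entity, relation or attribute, tail entity> entry that would be loaded into the graph database.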

Further, the following steps are included in step S4:

S41, obtaining the graph node embedding features of the patent knowledge graph constructed in step S3, and representing the patent knowledge graph as a set of triples

$$G = \{(h, r, t) \mid h, t \in \mathcal{E},\ r \in \mathcal{R}\}$$

wherein, for each triple (h, r, t) in the set, h and t denote the head entity and the tail entity respectively, and r denotes the relation or attribute between h and t; $\mathcal{E}$ and $\mathcal{R}$ are the sets of entities and relations, respectively;

S42, defining the propagation sampling process of the knowledge graph-based relation graph neural network model in the patent knowledge graph G, and acquiring the training neighbor triple sets of the patent entity;

S43, defining the relation aggregation process of the knowledge graph-based relation graph neural network model, in which the first-order neighbor entities and relation embeddings of a node h are encoded to extract the contribution of the different relations to the node h;

S44, defining the prediction process of the knowledge graph-based relation graph neural network model;

S45, defining a loss function according to the forward propagation process of the knowledge graph-based relation graph neural network model.

Further, in step S42,

propagation starts from a patent node A taken as the head entity h, and $N_h = \{(h, r, t) \mid (h, r, t) \in G\}$ denotes the first-order neighbor triple set of the patent node A; then, taking each tail entity t in that set as a new head entity yields the second-order neighbor triple set of the patent node A; iterating this propagation yields the k-order neighbor triple set of the patent node A;

by sampling a fixed number of neighbors for each node, the computation pattern of model training stays uniform across nodes and training becomes more efficient.
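The propagation sampling described above can be sketched as a hop-by-hop expansion from a patent node. The helper names `build_adjacency` and `sample_neighbor_sets`, the parameter `n_samples`, and the choice to sample with replacement (so that every head contributes the same number of triples) are assumptions for illustration; the text only states that a limited number of neighbors is sampled.

```python
import random
from collections import defaultdict

def build_adjacency(triples):
    """Index triples by head entity."""
    adj = defaultdict(list)
    for h, r, t in triples:
        adj[h].append((h, r, t))
    return adj

def sample_neighbor_sets(adj, start, k=2, n_samples=2, seed=0):
    """Return a list of k sampled triple sets, one per hop, starting from `start`."""
    rng = random.Random(seed)
    heads = [start]
    hops = []
    for _ in range(k):
        hop = []
        for h in heads:
            candidates = adj.get(h, [])
            if not candidates:
                continue  # entity with no outgoing triples
            # Sampling with replacement keeps the per-head neighbor count fixed.
            hop.extend(rng.choices(candidates, k=n_samples))
        hops.append(hop)
        # Tail entities of this hop become the head entities of the next hop.
        heads = [t for _, _, t in hop]
    return hops

# Toy graph with hypothetical entities.
toy_triples = [
    ("A", "first_inventor", "I1"),
    ("A", "first_ipc", "C1"),
    ("A", "backward_citation", "B"),
    ("B", "first_inventor", "I1"),
    ("B", "first_ipc", "C2"),
]
hops = sample_neighbor_sets(build_adjacency(toy_triples), "A", k=2, n_samples=2)
```

Here `hops[0]` plays the role of the sampled first-order set and `hops[1]` the sampled second-order set of node A.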

Further, the following steps are included in step S43:

step S431, encoding the neighbor and relation information of the node h through a learnable transformation, expressed as:

$$e_{rt} = W \cdot \mathrm{Concat}(e_t, e_r) + b$$

where W and b are the trainable weight and bias, Concat(·) denotes the concatenation operation, and $e_t$ and $e_r$ are the embeddings of the tail entity and the relation in the triple, respectively;

step S432, aggregating the neighborhood representation of the node with a mean aggregation function, expressed as:

$$e_{N_h} = \frac{1}{|N_h|} \sum_{(h, r, t) \in N_h} e_{rt}$$

step S433, after the neighborhood representation $e_{N_h}$ of the node is obtained, concatenating the representation of the node itself with the neighborhood representation and passing the result through a linear layer to obtain the first-order hidden representation of the node:

$$e_h^{(1)} = \sigma\big(W_1 \cdot \mathrm{Concat}(e_h, e_{N_h}) + b_1\big)$$

wherein $e_h$ is the embedding of the head entity, $W_1$ and $b_1$ are the weight and bias of the linear layer, and σ(·) is a nonlinear activation function, here the ReLU activation function;

step S434, further stacking multi-order neighbor information propagation to explore the high-order representation of the node; when the number of iterations k ≥ 2, the entity embedding is expressed recursively as:

$$e_h^{(k)} = f\big(e_h^{(k-1)},\ e_{N_h}^{(k-1)}\big)$$

where f(·) abbreviates the formula of step S433, $e_h^{(k-1)}$ is the layer-(k−1) embedding of the head entity, and $e_{N_h}^{(k-1)}$ is the layer-(k−1) neighborhood embedding; k is set to 2 by default.
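A single relation-aware aggregation layer (encode each relation and tail pair, average over the neighborhood, then concatenate with the node's own embedding and apply a linear layer with ReLU) can be sketched in plain Python. The function names, the separate weights `W_agg` and `b_agg` for the final linear layer, and the toy values are assumptions for illustration; a real implementation would use a tensor library and learned parameters.

```python
def linear(W, b, x):
    """Compute W·x + b for a weight matrix W (list of rows) and vectors b, x."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def relu(x):
    return [max(0.0, v) for v in x]

def aggregate(e_h, neighbors, W_rel, b_rel, W_agg, b_agg):
    """One aggregation layer for a head embedding e_h.

    neighbors: list of (e_r, e_t) embedding pairs from the sampled triple set.
    """
    # Encode each neighbor: e_rt = W·Concat(e_t, e_r) + b
    encoded = [linear(W_rel, b_rel, e_t + e_r) for e_r, e_t in neighbors]
    # Mean aggregation over the neighborhood.
    dim = len(encoded[0])
    e_nh = [sum(v[i] for v in encoded) / len(encoded) for i in range(dim)]
    # Concatenate self and neighborhood, apply a linear layer, then ReLU.
    return relu(linear(W_agg, b_agg, e_h + e_nh))

# Toy 2-dimensional example; these weights simply add the two concatenated halves.
W_sum = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
zero = [0.0, 0.0]
e_h = [1.0, -3.0]
neighbors = [([0.5, 0.5], [1.0, 0.0]), ([-0.5, 0.5], [0.0, 1.0])]
hidden = aggregate(e_h, neighbors, W_sum, zero, W_sum, zero)
```

Applying `aggregate` again, with the first-layer outputs of the node and of its neighbors as inputs, gives the second-order representation used when k = 2.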

Further, in step S44, the high-order representation $e_h^{(k)}$ of the patent entity is input into an MLP network, and the probability distribution $P_h$ over the classification labels of the patent entity h is then obtained through a softmax function:

$$P_h = \mathrm{softmax}\big(\mathrm{MLP}(e_h^{(k)})\big)$$

the predicted category y′ of the patent entity h is the category corresponding to the index with the maximum probability:

$$y' = \arg\max(P_h).$$
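The prediction step can be illustrated with a small softmax and argmax routine. In this sketch the input `logits` stands in for the output of the MLP, which is not modeled here, and the class names L1 to L3 follow the three quality levels used in the practical application; the function names are hypothetical.

```python
import math

def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [v / total for v in exps]

def predict(logits, classes=("L1", "L2", "L3")):
    """Return the class with the highest probability and the full distribution."""
    p_h = softmax(logits)
    best = max(range(len(p_h)), key=p_h.__getitem__)
    return classes[best], p_h

label, p_h = predict([0.2, 2.0, -1.0])
```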

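Further, in step S45, the final loss function is expressed as:

$$\mathcal{L} = -\sum_{i \in \mathcal{T}} Y_i^{\top} \log P_i$$

wherein $Y_i$ is the one-hot vector of the patent quality classification label of patent i, $P_i$ is the predicted probability distribution for patent i, and $\mathcal{T}$ is the training set.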
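A loss defined over one-hot labels and softmax outputs is commonly the cross-entropy, and the sketch below assumes that form; the text itself does not spell out the formula, so this is an assumption rather than the confirmed loss of the method. The function name, the small `eps` guard and the sample values are illustrative.

```python
import math

def cross_entropy(one_hot_labels, probabilities, eps=1e-12):
    """Sum of -Y_i · log(P_i) over all training samples."""
    total = 0.0
    for y, p in zip(one_hot_labels, probabilities):
        # eps guards against log(0) for zero-probability entries.
        total -= sum(yi * math.log(pi + eps) for yi, pi in zip(y, p))
    return total

# Two hypothetical training samples with three quality classes.
Y = [[1, 0, 0], [0, 0, 1]]
P = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]
loss = cross_entropy(Y, P)
```

Only the probability assigned to the true class of each sample contributes, so the loss shrinks as the model grows more confident in the correct label.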

Advantageous effects

Compared with the prior art, the invention has the advantages that:

Compared with traditional patent quality assessment methods, the invention predicts patent quality by constructing a patent knowledge graph that exploits the semantic information of patent data. The model encodes the relations of the knowledge graph together with the neighbor entities and aggregates this graph information, so that it perceives the distinct semantic information of each neighbor entity of a node. The accuracy is improved in patent quality assessment experiments, which is significant for effective and rapid patent quality assessment.

Drawings

FIG. 1 is a flow chart of the present invention;

FIG. 2 is a flow chart of a relational graph neural network of the present invention.

Detailed Description

The invention will be further described with reference to specific embodiments shown in the drawings.

Referring to fig. 1 to 2, a knowledge-graph-based relation neural network patent quality assessment method includes the following steps S1 to S6:

s1, acquiring a patent data sample set, wherein the patent data can be acquired in the same field or in different fields, the patents can be classified into a plurality of quality grades according to the forward reference times of the patents within 5 years after release, and the more the forward reference times, the better the patent quality is.

And S2, dividing the patent data sample set into a training set and a verification set, wherein the training set and the verification set are divided according to the proportion of patent quality labels of different grades in practical application due to different acquisition difficulties of the patent quality labels of different grades.

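S3, constructing a patent knowledge graph, and initializing the entity and relation embeddings of the patent knowledge graph by using a knowledge graph embedding model, which specifically comprises the following steps S31 to S32:

Step S31, first defining an ontology model of the patent knowledge graph, the ontology model comprising the entity, relation and attribute types contained in the patent knowledge graph; then performing instance-to-ontology mapping, in which the patent data are mapped into the <head entity, relation or attribute, tail entity> triple format of the knowledge graph for storage, thereby completing construction of the patent knowledge graph.

Step S32, initializing the entity and relation embeddings of the patent knowledge graph constructed in step S31 by using the existing knowledge graph embedding model TransE.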

Step S31 specifically includes the following steps S311 to S312:

Step S311, the ontology model of the patent knowledge graph takes the patent entity as its core and defines 6 entity concepts, 12 relations and 11 patent attributes.

The 6 entity concepts are the patent identifier, applicant, inventor, country, classification number and publication time.

The 12 relations, in triple form, are <patent, family, patent>, <patent, backward citation, patent>, <patent, first inventor, inventor>, <patent, secondary inventor, inventor>, <inventor, research collaboration, inventor>, <patent, first applicant, applicant>, <patent, secondary applicant, applicant>, <applicant, application collaboration, applicant>, <patent, applicant country, country>, <patent, first IPC classification number, classification number>, <patent, secondary IPC classification number, classification number> and <patent, published at, publication time>.

The 11 patent attributes are the number of independent claims, the number of dependent claims, the number of inventors, the number of patent family members, the number of family countries, the number of backward citations, the number of applicants, the number of CPC classifications, the number of IPC classifications, the abstract length and the claim length.

S4, defining a forward propagation process of a relation graph neural network model based on a patent knowledge graph, and defining a loss function for iterative optimization, wherein the method specifically comprises the following steps S41-S45:

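S41, obtaining the graph node embedding features of the patent knowledge graph constructed in step S3, and representing the patent knowledge graph as a set of triples

$$G = \{(h, r, t) \mid h, t \in \mathcal{E},\ r \in \mathcal{R}\}$$

wherein, for each triple (h, r, t) in the set, h and t denote the head entity and the tail entity respectively, and r denotes the relation or attribute between h and t; $\mathcal{E}$ and $\mathcal{R}$ are the sets of entities and relations, respectively.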

S42, defining the propagation sampling process of the knowledge graph-based relation graph neural network model in the patent knowledge graph G, and acquiring the training neighbor triple sets of the patent entity. Propagation starts from a patent node A taken as the head entity h, and $N_h = \{(h, r, t) \mid (h, r, t) \in G\}$ denotes the first-order neighbor triple set of the patent node A; then, taking each tail entity t in that set as a new head entity yields the second-order neighbor triple set of the patent node A; iterating this propagation yields the k-order neighbor triple set of the patent node A.

In practice, the number of neighbors varies greatly from node to node; by sampling a fixed number of neighbors for each node, the computation pattern of model training stays uniform across nodes and training becomes more efficient.

S43, defining the relation aggregation process of the knowledge graph-based relation graph neural network model, in which the first-order neighbor entities and relation embeddings of a node h are encoded to extract the contribution of the different relations to the node h, comprising the following steps S431 to S434:

Step S431, encoding the neighbor and relation information of the node h through a learnable transformation, expressed as:

$$e_{rt} = W \cdot \mathrm{Concat}(e_t, e_r) + b$$

where W and b are the trainable weight and bias, Concat(·) denotes the concatenation operation, and $e_t$ and $e_r$ are the embeddings of the tail entity and the relation in the triple, respectively.

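Step S432, aggregating the neighborhood representation of the node with a mean aggregation function, expressed as:

$$e_{N_h} = \frac{1}{|N_h|} \sum_{(h, r, t) \in N_h} e_{rt}$$

Step S433, after the neighborhood representation $e_{N_h}$ of the node is obtained, concatenating the representation of the node itself with the neighborhood representation and passing the result through a linear layer to obtain the first-order hidden representation of the node:

$$e_h^{(1)} = \sigma\big(W_1 \cdot \mathrm{Concat}(e_h, e_{N_h}) + b_1\big)$$

wherein $e_h$ is the embedding of the head entity, $W_1$ and $b_1$ are the weight and bias of the linear layer, and σ(·) is a nonlinear activation function, here the ReLU activation function.

Step S434, further stacking multi-order neighbor information propagation to explore the high-order representation of the node; when the number of iterations k ≥ 2, the entity embedding is expressed recursively as:

$$e_h^{(k)} = f\big(e_h^{(k-1)},\ e_{N_h}^{(k-1)}\big)$$

where f(·) abbreviates the formula of step S433, $e_h^{(k-1)}$ is the layer-(k−1) embedding of the head entity, and $e_{N_h}^{(k-1)}$ is the layer-(k−1) neighborhood embedding; k is set to 2 by default.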

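S44, defining the prediction process of the knowledge graph-based relation graph neural network model. According to step S434, the k-order representation $e_h^{(k)}$ of the patent entity can be obtained. This high-order representation of the patent entity is input into an MLP network, and the probability distribution $P_h$ over the classification labels of the patent entity h is then obtained through a softmax function:

$$P_h = \mathrm{softmax}\big(\mathrm{MLP}(e_h^{(k)})\big)$$

The predicted category y′ of the patent entity h is the category corresponding to the index with the maximum probability:

$$y' = \arg\max(P_h).$$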

S45, defining a loss function according to the forward propagation process of the knowledge graph-based relation graph neural network model. The final loss function is expressed as:

$$\mathcal{L} = -\sum_{i \in \mathcal{T}} Y_i^{\top} \log P_i$$

wherein $Y_i$ is the one-hot vector of the patent quality classification label of patent i, $P_i$ is the predicted probability distribution for patent i, and $\mathcal{T}$ is the training set.

And S5, inputting the training set into a relation graph neural network model for training.

and S6, inputting the patent to be evaluated into a patent quality evaluation model to obtain a patent entity node characteristic representation, and performing a Softmax prediction label on the patent entity node characteristic representation to obtain a prediction result.

Practical application

The following is the procedure for evaluating the field of "digital information transmission":

s1, obtaining 'digital information transmission' field application patent with main IPC classification number H04L as sample data, including forward citation patent, IPC classification number, applicant, inventor, applicant country, release time, number of independent claims, number of dependent claims, number of inventor, number of patent family, number of family country, number of reverse citation, number of applicant, CPC number, IPC number, abstract length and claim length information of the patent, and forming a data set.

Processing the forward citation patent of the patent, and labeling the patent data; the method comprises the steps of obtaining the number of forward citations of patents within 5 years after the patents are issued, dividing the patents into 3 quality levels, and labeling the patents according to the following rules: more than or equal to 10 is L1 grade, 2-9 is L2 grade, and 0-1 is L3 grade.
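The labeling rule above maps directly to a small function. The function name `quality_label` and the sample citation counts are illustrative; the thresholds are the ones stated in the rule.

```python
def quality_label(forward_citations_5y: int) -> str:
    """Map the 5-year forward-citation count of a patent to a quality level."""
    if forward_citations_5y < 0:
        raise ValueError("citation count must be non-negative")
    if forward_citations_5y >= 10:
        return "L1"  # 10 or more forward citations
    if forward_citations_5y >= 2:
        return "L2"  # 2 to 9 forward citations
    return "L3"      # 0 or 1 forward citations

labels = [quality_label(n) for n in (0, 1, 2, 9, 10, 37)]
```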

S2, dividing a patent data sample set into a training set and a verification set, and setting the proportion of each classification label as 8:2.
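Splitting each quality level separately at 8:2 keeps the label proportions the same in both sets. The following is a minimal sketch of such a stratified split; the function name, the fixed random seed and the toy sample identifiers are assumptions for illustration.

```python
import random
from collections import defaultdict

def stratified_split(samples, train_ratio=0.8, seed=0):
    """Split (patent_id, label) pairs so each label keeps the same train/validation ratio."""
    by_label = defaultdict(list)
    for patent_id, label in samples:
        by_label[label].append((patent_id, label))
    rng = random.Random(seed)
    train, val = [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        cut = round(len(group) * train_ratio)
        train.extend(group[:cut])
        val.extend(group[cut:])
    return train, val

# Hypothetical sample set: 10 patents labeled L1 and 20 labeled L3.
samples = [(f"P{i}", "L1") for i in range(10)] + [(f"Q{i}", "L3") for i in range(20)]
train, val = stratified_split(samples)
```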

S3, building a patent knowledge graph and initializing the embeddings of the graph's entities and relations by using a knowledge graph embedding model; according to the defined patent knowledge graph ontology model, the patent data are converted directly into triples through a schema-based data mapping mechanism; for example, the patent with publication number CN205829662U was published on 2016-12-21, which maps directly to the triple <CN205829662U, published at, 2016-12-21>.

And S4, defining a forward propagation process of the relation graph neural network model based on the knowledge graph, and defining a loss function for iterative optimization.

S5, inputting the training set into a relation graph neural network model based on the knowledge graph; and then inputting the verification set into the trained relation graph neural network model based on the knowledge graph, adjusting the network, and obtaining the optimized relation graph neural network model based on the knowledge graph.

Inputting the training set into a relation graph neural network model based on the knowledge graph, performing parameter training on the network, and acquiring the trained parameter weight.

Inputting the verification set into a trained relation graph neural network model based on the knowledge graph, optimizing the trained parameter weight, and obtaining the optimized parameter weight.

And S6, inputting the patent to be evaluated into the optimized relation graph neural network model based on the knowledge graph, obtaining node feature representation of the test set according to the optimized parameter weight, and performing Softmax prediction labeling on the node features to obtain a prediction result.

The above is only a preferred embodiment of the present invention, and it should be noted that it is obvious to those skilled in the art that several variations and modifications can be made without departing from the structure of the present invention, which will not affect the effect of the implementation of the present invention and the utility of the patent.

Claims

1. A relation graph neural network patent quality assessment method based on a knowledge graph is characterized by comprising the following steps:

s1, acquiring a patent data sample set;

and S6, inputting the patent to be evaluated into the patent quality evaluation model to obtain a patent entity node characteristic representation, and performing a Softmax prediction label on the patent entity node characteristic representation to obtain a prediction result.

2. The method for patent quality assessment based on a knowledge-graph neural network as claimed in claim 1, wherein in step S1, the patents are classified into several quality grades according to the number of forward citations of the patents within 5 years after release, and the greater the number of forward citations, the better the patent quality is.

3. The knowledge-graph-based relation neural network patent quality assessment method according to claim 2, wherein in step S2, a training set and a verification set are divided according to the proportion of patent quality labels of different grades.

4. The knowledge-graph-based relational neural network patent quality assessment method according to claim 1, wherein the step S3 comprises the following steps:

then, performing example-ontology mapping, and mapping the patent data into a triple format of < head entity, relation or attribute, tail entity > of the knowledge graph for storage to complete construction of the patent knowledge graph;

5. The method for evaluating the patent quality based on the relational graph neural network of the knowledge graph as claimed in claim 4, wherein the step S31 comprises the following steps:

s311, defining 6 entity concepts, 12 relations and 11 patent attributes by using a patent entity as a core through the ontology model of the patent knowledge graph;

the triple form of the 12 relations comprises < patent, family, patent >, < patent, back citation, patent >, < patent, first inventor, inventor >, < patent, secondary inventor, inventor >, < inventor, research cooperative relation, inventor >, < patent, first applicant, applicant >, < patent, secondary applicant, applicant >, < applicant, application cooperative relation, applicant >, < patent, applicant country, country >, < patent, first IPC classification number, classification number >, < patent, secondary IPC classification number, classification number > and < patent, issued in, issue time >;

the 11 patent attributes comprise the number of independent claims, the number of dependent claims, the number of inventors, the number of patent families, the number of family countries, the number of back references, the number of applicants, the number of CPCs, the number of IPCs, the length of abstract and the length of claims;

step S312, after the ontology model of the patent knowledge graph is established, mapping the patent data into triples; because the patent data are structured or semi-structured, they can be converted directly into triples through a schema-based data mapping mechanism, and the triples are stored in a Neo4j graph database to complete construction of the patent knowledge graph.

6. The knowledge-graph-based relational neural network patent quality assessment method according to claim 4, wherein the step S4 comprises the following steps:

S41, obtaining the graph node embedding features of the patent knowledge graph constructed in step S3, and representing the patent knowledge graph as a set of triples

$$G = \{(h, r, t) \mid h, t \in \mathcal{E},\ r \in \mathcal{R}\}$$

wherein, for each triple (h, r, t) in the set, h and t denote the head entity and the tail entity respectively, and r denotes the relation or attribute between h and t; $\mathcal{E}$ and $\mathcal{R}$ are the sets of entities and relations, respectively;

s43, defining a relation aggregation process of a relation graph neural network model based on a knowledge graph; defining a first-order neighbor entity of the h node and embedding the relationship to carry out coding so as to extract the contribution of different relationships to the h node;

7. The method for assessing the patent quality of the neural network based on the knowledge-graph as claimed in claim 6, wherein in step S42,

propagation starts from a patent node A taken as the head entity h, and $N_h = \{(h, r, t) \mid (h, r, t) \in G\}$ denotes the first-order neighbor triple set of the patent node A; then, taking each tail entity t in that set as a new head entity yields the second-order neighbor triple set of the patent node A; iterating this propagation yields the k-order neighbor triple set of the patent node A;

by sampling a fixed number of neighbors for each node, the computation pattern of model training stays uniform across nodes and training becomes more efficient.

8. The method for assessing the patent quality of the knowledge-graph-based relational graph neural network according to claim 7, wherein the step S43 comprises the following steps:

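step S431, encoding the neighbor and relation information of the node h through a learnable transformation, expressed as:

$$e_{rt} = W \cdot \mathrm{Concat}(e_t, e_r) + b$$

where W and b are the trainable weight and bias, Concat(·) denotes the concatenation operation, and $e_t$ and $e_r$ are the embeddings of the tail entity and the relation in the triple, respectively;

step S432, aggregating the neighborhood representation of the node with a mean aggregation function, expressed as:

$$e_{N_h} = \frac{1}{|N_h|} \sum_{(h, r, t) \in N_h} e_{rt}$$

step S433, after the neighborhood representation $e_{N_h}$ of the node is obtained, concatenating the representation of the node itself with the neighborhood representation and passing the result through a linear layer to obtain the first-order hidden representation of the node:

$$e_h^{(1)} = \sigma\big(W_1 \cdot \mathrm{Concat}(e_h, e_{N_h}) + b_1\big)$$

wherein $e_h$ is the embedding of the head entity, $W_1$ and $b_1$ are the weight and bias of the linear layer, and σ(·) is a nonlinear activation function, here the ReLU activation function;

step S434, further stacking multi-order neighbor information propagation to explore the high-order representation of the node; when the number of iterations k ≥ 2, the entity embedding is expressed recursively as:

$$e_h^{(k)} = f\big(e_h^{(k-1)},\ e_{N_h}^{(k-1)}\big)$$

where f(·) abbreviates the formula of step S433, $e_h^{(k-1)}$ is the layer-(k−1) embedding of the head entity, and $e_{N_h}^{(k-1)}$ is the layer-(k−1) neighborhood embedding; k is set to 2 by default.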

9. The knowledge-graph-based relation graph neural network patent quality assessment method according to claim 8, wherein in step S44, the high-order representation $e_h^{(k)}$ of the patent entity is input into an MLP network, and the probability distribution $P_h$ over the classification labels of the patent entity h is then obtained through a softmax function:

$$P_h = \mathrm{softmax}\big(\mathrm{MLP}(e_h^{(k)})\big)$$

the predicted category y′ of the patent entity h is the category corresponding to the index with the maximum probability:

$$y' = \arg\max(P_h).$$

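10. The knowledge-graph-based relation graph neural network patent quality assessment method according to claim 9, wherein in step S45, the final loss function is expressed as:

$$\mathcal{L} = -\sum_{i \in \mathcal{T}} Y_i^{\top} \log P_i$$

wherein $Y_i$ is the one-hot vector of the patent quality classification label of patent i, $P_i$ is the predicted probability distribution for patent i, and $\mathcal{T}$ is the training set.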