CN112307165A - Core patent judgment method and device

Info

Publication number: CN112307165A
Application number: CN202011178049.8A
Authority: CN
Inventors: 程艳
Original assignee: Wuhan Chanlue Technology Co ltd
Current assignee: Wuhan Chanlue Technology Co ltd
Priority date: 2020-10-29
Filing date: 2020-10-29
Publication date: 2021-02-02

Abstract

The invention discloses a method and a device for identifying core patents. The method comprises: obtaining a database of granted patents in the same field, extracting the technical features of the independent claims of each patent document in the database, and constructing a knowledge graph for each patent document from those technical features; fusing the knowledge graphs of the patent documents to form a total knowledge graph of the database; counting the number of branches of each node in the total knowledge graph, and selecting the nodes whose number of branches is larger than a first threshold to form a first node set; randomly selecting multiple groups of candidate nodes from the first node set, in order of group size from largest to smallest, and selecting from the database the patent document that contains all candidate nodes of a given group and has the smallest sum of the branch numbers of those candidate nodes, as the core patent corresponding to that group; and ranking the core patents. The method identifies the core technical points of the field by constructing the knowledge graph and selects core patents based on those points, which improves accuracy.

Description

Core patent judgment method and device

Technical Field

The invention relates to the technical field of patent text mining, in particular to a method and a device for judging a core patent.

Background

With the development of the patent industry in China, the number and quality of patents have become an important measure of the core competitiveness of an enterprise or organization. A core patent is a patent covering a technology that must be used to manufacture certain products in a given technical field and that cannot be bypassed by design-around measures. Mining the core patents of a technical field from large patent databases lays the foundation for tracing the course of technical development, analyzing industry technology and planning technical development.

Most current patent mining tools offer functions such as patent retrieval and statistical analysis. Retrieval is mostly performed with keywords, classification numbers, or retrieval formulas generated from combinations of screening conditions, while statistical analysis generally counts patents from different angles such as patent classification, patentee, year and country. These retrieval and analysis methods cannot effectively extract core patents from a database, and manual analysis suffers from limited accuracy.

Disclosure of Invention

In view of this, the present invention provides a core patent determination method for extracting a core patent from a target database.

In a first aspect of the present invention, a method for determining a core patent is disclosed, the method comprising:

obtaining a database of granted patents in the same field, extracting the technical features of the independent claims of each patent document in the database, and constructing a knowledge graph for each patent document from the technical features;

fusing the knowledge graphs of the patent documents to form a total knowledge graph of the database;

counting the number of branches of each node in the total knowledge graph, and selecting the nodes whose number of branches is larger than a first threshold to form a first node set;

randomly selecting multiple groups of candidate nodes from the first node set, in order of group size from largest to smallest, and selecting from the database the patent document that contains all candidate nodes of a given group and has the smallest sum of the branch numbers of those candidate nodes, as the core patent corresponding to the current group of candidate nodes;

and sorting the core patents in descending order of the number of candidate nodes.

Preferably, obtaining the database of granted patents in the same field includes: analyzing an input retrieval instruction to generate a retrieval formula, and retrieving the granted patents in the target field according to the retrieval formula to form the database.

Preferably, the extracting technical features of the claims of each patent document in the database, and constructing a knowledge graph for each patent document according to the technical features specifically includes:

performing word segmentation and stop-word removal on the independent claims of each patent document, extracting keywords to form key features, and extracting the association relations among the key features;

extracting one or more first sub-features further defining the key feature from the independent claim according to the key feature;

establishing a knowledge graph of each patent document by taking the key features as entity nodes and taking first sub-features corresponding to the key features as attribute nodes;

if the first sub-feature further comprises one or more second sub-features which are further defined, taking the second sub-features as attribute nodes of the first sub-features; all technical features of the same independent claim are extracted in this way until all technical features are added to the knowledge-graph.
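The construction described above can be sketched in Python. This is a minimal illustration, not the patented implementation: the nested-dictionary input format, the function name build_patent_graph and the sample features are assumptions, and the association relations among key features are not modeled.

```python
from collections import defaultdict

def build_patent_graph(features):
    """Turn nested features into {node: set(attribute nodes)}.

    `features` maps each feature name to a dict of its sub-features
    (hypothetical input format; an empty dict means no further limitation).
    """
    graph = defaultdict(set)

    def add(name, subs):
        graph[name]  # register the node even if it has no attribute nodes
        for sub_name, sub_subs in subs.items():
            graph[name].add(sub_name)  # sub-feature becomes an attribute node
            add(sub_name, sub_subs)    # recurse into second-level sub-features

    for key_feature, subs in features.items():
        add(key_feature, subs)
    return dict(graph)

# Independent claim with key features A, B and C; A is further limited by
# a1 and a2, and a1 is further limited by a11.
claim = {"A": {"a1": {"a11": {}}, "a2": {}}, "B": {}, "C": {}}
graph = build_patent_graph(claim)
```

Each key of the result is a node, and the size of its value set is the number of attribute nodes directly attached to it.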

Preferably, the step of fusing the knowledge graphs of the patent documents to form the total knowledge graph of the database specifically comprises:

performing coreference resolution, entity disambiguation and entity linking on the knowledge graphs of the different patent documents, determining which entity nodes or attribute nodes are the same, and merging them to form the total knowledge graph of the database.
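A simplified sketch of the fusion step follows. Real coreference resolution, entity disambiguation and entity linking are replaced here by a plain label normalization (lower-casing and whitespace collapsing), which is an assumption made only to keep the example short; the function names and sample data are hypothetical.

```python
from collections import defaultdict

def normalize(label):
    """Stand-in for entity disambiguation: case- and whitespace-insensitive match."""
    return " ".join(label.lower().split())

def fuse_graphs(patent_graphs):
    """Merge per-patent graphs {node: set(attrs)} into one total graph.

    Returns (total_graph, node_to_patents), where node_to_patents records
    which patents contain each merged node.
    """
    total = defaultdict(set)
    node_to_patents = defaultdict(set)
    for patent_id, graph in patent_graphs.items():
        for node, attrs in graph.items():
            key = normalize(node)
            total[key].update(normalize(a) for a in attrs)
            node_to_patents[key].add(patent_id)
    return dict(total), dict(node_to_patents)

patent_graphs = {
    "P1": {"Motor": {"rotor", "stator"}, "rotor": set(), "stator": set()},
    "P2": {"motor": {"Rotor", "cooling fan"}, "Rotor": set(), "cooling fan": set()},
}
total, owners = fuse_graphs(patent_graphs)
```

In this example "Motor" and "motor" are merged into one node whose attribute nodes are the union of those found in both patents.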

Preferably, the number of branches of each node is the number of attribute nodes associated with each node in the knowledge-graph.
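Using this definition of branch number, the screening of the first node set can be sketched as below. The graph representation ({node: set of attribute nodes}), the function name and the threshold value are assumptions for illustration.

```python
def first_node_set(total_graph, first_threshold):
    """Return the nodes whose branch number exceeds the threshold, with their counts."""
    branch_counts = {node: len(attrs) for node, attrs in total_graph.items()}
    return {n: c for n, c in branch_counts.items() if c > first_threshold}

total_graph = {
    "motor": {"rotor", "stator", "cooling fan"},
    "rotor": {"magnet"},
    "controller": {"sensor", "driver"},
    "stator": set(),
}
core_points = first_node_set(total_graph, first_threshold=1)
```

The comparison is strict, matching "larger than a first threshold" in the text.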

Preferably, the step of selecting from the database the patent document that contains all candidate nodes of a given group and has the smallest sum of the branch numbers of those candidate nodes, as the core patent corresponding to the current group of candidate nodes, includes:

determining the total number N of nodes in the first node set, and letting n take the values N, N-1, …, 2 and 1 in turn;

taking n candidate nodes from the first node set, selecting from the database the patent documents that simultaneously contain the n candidate nodes, and forming a corresponding first document set;

when n is less than N, for each n, sampling from the first node set without replacement, extracting n candidate nodes each time, until all nodes in the first node set are covered;

and taking the patent document with the smallest sum of the branch numbers of the candidate nodes from the first document set as the core patent corresponding to the current group of candidate nodes.
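The group-generation part of the steps above might look as follows. This is a sketch under stated assumptions: the text does not say what happens when N is not divisible by n, so the last short group is padded with already-used nodes here, and the function name and seed parameter are hypothetical.

```python
import random

def candidate_groups(first_nodes, seed=None):
    """Yield (n, group) for n = N, N-1, ..., 1.

    For each n the node set is shuffled and cut into groups of n nodes
    (sampling without replacement) until every node is covered. If N is not
    divisible by n, the last group is topped up with already-used nodes;
    this padding rule is an assumption, not taken from the source.
    """
    rng = random.Random(seed)
    nodes = sorted(first_nodes)
    total = len(nodes)
    for n in range(total, 0, -1):
        shuffled = nodes[:]
        rng.shuffle(shuffled)
        for start in range(0, total, n):
            group = shuffled[start:start + n]
            if len(group) < n:  # pad the final short group
                group += rng.sample(shuffled[:start], n - len(group))
            yield n, frozenset(group)

groups = list(candidate_groups({"A", "B", "C"}, seed=0))
```

For three nodes this produces one group of size 3, two groups of size 2 and three groups of size 1, in that order.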

Preferably, when the core patents are sorted in descending order of the number of candidate nodes, core patents with the same number of candidate nodes are sorted in ascending order of the sum of the branch numbers of their candidate nodes.
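This two-level ordering maps directly onto a compound sort key. The tuple layout (patent identifier, number of candidate nodes, branch sum) and the sample values below are assumptions for illustration.

```python
def rank_core_patents(core_patents):
    """Sort by candidate-node count (descending), then by branch sum (ascending).

    Each entry is a tuple (patent_id, number_of_candidate_nodes, branch_sum).
    """
    return sorted(core_patents, key=lambda p: (-p[1], p[2]))

core_patents = [("P3", 2, 5), ("P1", 3, 9), ("P7", 2, 4), ("P5", 1, 2)]
ranked = rank_core_patents(core_patents)
```

Here P1 ranks first because it covers three candidate nodes, and P7 precedes P3 because its branch sum is smaller.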

In a second aspect of the present invention, a core patent determination apparatus is disclosed, the apparatus comprising:

the graph building module: obtaining a database of granted patents in the same field, extracting the technical features of the independent claims of each patent document in the database, and constructing a knowledge graph for each patent document from the technical features;

a graph fusion module: fusing the knowledge graphs of the patent documents to form a total knowledge graph of the database;

a node screening module: counting the number of branches of each node in the total knowledge graph, and selecting the nodes whose number of branches is larger than a first threshold to form a first node set;

the document filtering module: randomly selecting multiple groups of candidate nodes from the first node set, in order of group size from largest to smallest, and selecting from the database the patent document that contains all candidate nodes of a given group and has the smallest sum of the branch numbers of those candidate nodes, as the core patent corresponding to the current group of candidate nodes;

a document ranking module: sorting the core patents in descending order of the number of candidate nodes.

Compared with the prior art, the invention has the following beneficial effects:

1) The invention extracts the technical features of the independent claims of each patent and constructs a knowledge graph for each patent document from those features, with each node of the knowledge graph representing one technical feature. Fusing the knowledge graphs of the patent documents into a total knowledge graph of the database shows clearly and intuitively the technical points of the field and the association relations among them. The number of branches of each node in the total knowledge graph is then counted: the more branches a node has, the more improvements have been built around that technical feature and the more likely the feature is a core technical point of the field, so the core technical points of the field can be located accurately.

2) The method selects from the database the patent documents that simultaneously contain several core technical points and takes the one with the smallest total number of branches of those core technical points as the corresponding core patent. Because core patents are selected on the basis of the core technical points, the accuracy is high.

3) When a plurality of core patents exist, the core patents are sorted according to the sequence of the number of the core technology points contained at the same time, and corresponding core patents are screened out aiming at different core technology point combinations, so that reliable bases are provided for the technical research in different directions and the research in the development direction of the technical branch in the field.

4) The core patents are finally judged through multiple screening and multiple sorting, and the accuracy is improved.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.

Fig. 1 is a flow chart of a core patent determination method of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.

As shown in fig. 1, the present invention provides a method for determining a core patent, the method comprising:

S1, obtaining a database of granted patents in the same field, extracting the technical features of the independent claims of each patent document in the database, and constructing a knowledge graph for each patent document from the technical features;

specifically, an input retrieval instruction is analyzed to generate a retrieval formula, and the granted patents in the same field are retrieved from a patent database according to the retrieval formula to form the target database; for example, the patents granted in a certain field in the last 10 years are retrieved by a combination of keywords to form the target database.
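As a rough illustration of generating a retrieval formula from keyword combinations, the sketch below composes a boolean query string. The query syntax and the field names TI_AB_CLM, GRANT_DATE and STATUS are hypothetical and do not correspond to any particular patent search system; the source does not specify a formula syntax.

```python
def build_retrieval_formula(keyword_groups, start_year, granted_only=True):
    """Compose a boolean retrieval formula from keyword groups.

    Keywords within a group are OR-ed (treated as synonyms); groups are AND-ed.
    The field names TI_AB_CLM, GRANT_DATE and STATUS are hypothetical.
    """
    clauses = []
    for group in keyword_groups:
        terms = " OR ".join(f'"{kw}"' for kw in group)
        clauses.append(f"TI_AB_CLM=({terms})")
    clauses.append(f"GRANT_DATE>={start_year}0101")
    if granted_only:
        clauses.append("STATUS=granted")
    return " AND ".join(clauses)

formula = build_retrieval_formula(
    [["knowledge graph", "knowledge map"], ["patent"]], start_year=2010
)
```

With a filing year of 2020, a start year of 2010 corresponds to the "last 10 years" window mentioned in the example.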

The independent claims of each patent document are extracted from the target database and semantically analyzed, including word segmentation, stop-word removal and other processing, to extract the key features and the association relations among them; in particular, keywords can be extracted by TF-IDF and used as the key features.

specifically, the independent claims of most granted patents consist of several technical features, and each technical feature may include further-defined sub-features. For example, the technical features of an independent claim include A, B and C, where A further includes two sub-features, a1 and a2; the invention extracts all the technical features and the association relations among them through semantic analysis. When the further-defined sub-features are extracted through semantic analysis, the sub-features that further define a key feature can be identified by locating the key feature and/or by setting specific cue words such as 'including' and 'dividing'.
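The cue-word idea can be illustrated with a small pattern match. This is only a sketch: the English cue words, the splitting on "and", the article stripping and the sample claim are assumptions, and real claim language would need proper semantic analysis.

```python
import re

# Assumed English cue words; the source only gives translated examples.
CUE_WORDS = ("comprising", "comprises", "including", "includes")

def sub_features(claim_text, key_feature):
    """Return the sub-features that follow '<key_feature> <cue word>' in the claim."""
    cues = "|".join(CUE_WORDS)
    pattern = rf"\b{re.escape(key_feature)}\s+(?:{cues})\s+([^,;.]+)"
    found = []
    for match in re.finditer(pattern, claim_text, flags=re.IGNORECASE):
        for part in re.split(r"\s+and\s+", match.group(1)):
            # Drop a leading article so that "a rotor" becomes "rotor".
            part = re.sub(r"^(?:a|an|the)\s+", "", part.strip(), flags=re.IGNORECASE)
            if part:
                found.append(part)
    return found

claim = "A device having a motor comprising a rotor and a stator, the rotor including a magnet."
motor_subs = sub_features(claim, "motor")
rotor_subs = sub_features(claim, "rotor")
```

Locating the key feature first and then reading what follows the cue word yields the first-level sub-features of "motor" and the second-level sub-feature of "rotor".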

The invention extracts all technical features of the independent claims of each patent document, together with the association relations among them, through semantic analysis, and constructs the knowledge graph with the technical features as nodes.
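The TF-IDF keyword extraction mentioned for this step can be sketched in plain Python. Whitespace tokenization stands in for word segmentation (Chinese text would need a dedicated segmenter), and the stop-word list, function names and sample claims are assumptions for illustration.

```python
import math
from collections import Counter

# Tiny illustrative stop-word list (assumption).
STOP_WORDS = {"a", "an", "the", "and", "of", "to", "is", "with", "for", "in", "on"}

def tokenize(text):
    """Whitespace tokenization plus stop-word removal."""
    return [w for w in text.lower().split() if w.isalpha() and w not in STOP_WORDS]

def tfidf_keywords(claims, top_k=2):
    """Return the top_k TF-IDF terms of each claim, keyed by patent identifier."""
    docs = {pid: tokenize(text) for pid, text in claims.items()}
    n_docs = len(docs)
    doc_freq = Counter(term for tokens in docs.values() for term in set(tokens))
    keywords = {}
    for pid, tokens in docs.items():
        tf = Counter(tokens)
        scores = {
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in tf.items()
        }
        # Highest score first; ties broken alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        keywords[pid] = ranked[:top_k]
    return keywords

claims = {
    "P1": "a motor with a rotor and a stator",
    "P2": "a motor with a cooling fan",
    "P3": "a controller for a motor",
}
keywords = tfidf_keywords(claims)
```

A term that appears in every claim, such as "motor" here, gets an inverse document frequency of zero and so ranks below terms that distinguish one claim from the others.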

S2, fusing the knowledge graphs of the patent documents to form a total knowledge graph of the database;

S3, counting the number of branches of each node in the total knowledge graph, and screening out nodes with the number of branches larger than a first threshold value to form a first node set;

In the total knowledge graph, each node represents a technical feature. The branch number of each node, that is, the number of attribute nodes associated with the node in the knowledge graph, is counted. The more branches a node has, the more improvements have been built around that technical feature and the more likely the feature is a core technical point of the field. The nodes whose branch number is larger than the first threshold are selected to form the first node set, which is the set of core technical points of the field.

S4, randomly selecting multiple groups of candidate nodes from the first node set, in order of group size from largest to smallest, and selecting from the database the patent document that contains all candidate nodes of a given group and has the smallest sum of the branch numbers of the selected nodes, as the core patent corresponding to the current group of candidate nodes;

firstly, determining the total number N of nodes in the first node set, and letting n take the values N, N-1, …, 2 and 1 in turn;

when n < N, for each n, sampling from the first node set without replacement, extracting n candidate nodes each time, until all nodes in the first node set are covered.

Extracting the patent document with the smallest sum of the branch numbers of the candidate nodes from the first document set as the core patent corresponding to the current group of candidate nodes;

specifically, if the 3 nodes with the largest numbers of branches are selected from the knowledge graph, these 3 nodes are taken to be the 3 core technical points of the field, and patents containing the 3 core technical points are retrieved from the target database. If patents contain all 3 core technical points at the same time, the one for which the total number of branches of the 3 core technical points is smallest is the core patent. If no patent containing all 3 core technical points is retrieved from the target database, the number of core technical points is reduced: 2 of the 3 core technical points are selected, patents containing both of them are retrieved from the target database, and the patent with the smallest sum of branch numbers is selected as the core patent, and so on.
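The fallback from 3 core technical points to 2 can be sketched as follows. Two assumptions are made and should be read as such: the branch numbers are taken from each patent's own knowledge graph (the text does not state which graph the sum is computed on), and all subsets of a given size are enumerated rather than drawn at random. Function names and sample data are hypothetical.

```python
from itertools import combinations

def core_patent_for_group(patent_graphs, group):
    """Among the patents containing every node in `group`, pick the smallest branch sum.

    Branch numbers are read from each patent's own graph (an assumption).
    Ties are broken by patent identifier.
    """
    matches = [
        (sum(len(graph[p]) for p in group), pid)
        for pid, graph in patent_graphs.items()
        if all(p in graph for p in group)
    ]
    return min(matches)[1] if matches else None

def core_patents_with_fallback(patent_graphs, core_points):
    """Try all core points together, then progressively smaller subsets."""
    for size in range(len(core_points), 0, -1):
        found = {}
        for group in combinations(sorted(core_points), size):
            pid = core_patent_for_group(patent_graphs, group)
            if pid is not None:
                found[group] = pid
        if found:
            return found  # stop at the largest size that matches any patent
    return {}

patent_graphs = {
    "P1": {"A": {"a1", "a2"}, "B": {"b1"}, "a1": set(), "a2": set(), "b1": set()},
    "P2": {"A": set(), "B": set()},
    "P3": {"B": {"b1"}, "C": set(), "b1": set()},
}
result = core_patents_with_fallback(patent_graphs, {"A", "B", "C"})
```

No patent in the sample data contains A, B and C together, so the search falls back to pairs: P2 is chosen for (A, B) because its branch sum is smaller than that of P1, and P3 is chosen for (B, C).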

S5, sorting the core patents corresponding to each group of candidate nodes in descending order of the number of candidate nodes. Core patents with the same number of candidate nodes are sorted in ascending order of the sum of the branch numbers of their candidate nodes.

Since different combinations of core technical points may correspond to different technical branches of the field, the field may have more than one core patent, and each group of candidate nodes therefore has its own corresponding core patent. Determining the core patents through this multi-stage screening and sorting yields a final ranking of the core patents and improves accuracy; the higher a patent ranks, the more important it is. This provides a reliable basis for technical research in different directions and for research on the technical branches of the field.

Corresponding to the method embodiment, the invention also discloses a core patent judgment device, which comprises:

the document filtering module: randomly selecting multiple groups of candidate nodes from the first node set, in order of group size from largest to smallest, and selecting from the database the patent documents that contain all candidate nodes of a given group to form a corresponding first document set; extracting the patent document with the smallest sum of the branch numbers of the candidate nodes from the first document set as the core patent corresponding to the current group of candidate nodes;

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims

1. A core patent judgment method is characterized by comprising the following steps:

counting the number of branches of each node in the total knowledge graph, and screening out nodes with the number of branches larger than a first threshold value to form a core technology point set;

and sequencing the core patents corresponding to each group of candidate nodes according to the sequence of the number of the candidate nodes.

2. The method for judging core patents according to claim 1, wherein obtaining the database of granted patents in the same field comprises: analyzing an input retrieval instruction to generate a retrieval formula, and retrieving the granted patents in the target field according to the retrieval formula to form the database.

3. The method for judging core patents according to claim 1, wherein the extracting technical features of the claims of each patent document in the database, and constructing a knowledge graph for each patent document according to the technical features specifically comprises:

4. The method for judging core patents according to claim 3, wherein fusing the knowledge graphs of the patent documents to form the total knowledge graph of the database specifically comprises:

5. The method for judging core patents of claim 1, wherein each node in the knowledge-graph represents a technical feature, and the branch number of each node is the number of attribute nodes associated with each node in the knowledge-graph.

6. The method for determining core patents according to claim 3, wherein the step of filtering out from the database a patent document that contains a group of candidate nodes and has the least total number of branches of the candidate nodes as the core patent corresponding to the current group of candidate nodes comprises:

7. The method for judging core patents according to claim 6, wherein in the process of sorting the core patents according to the sequence of the number of the candidate nodes from the largest to the smallest, for the core patents with the same number of the candidate nodes, the core patents are sorted according to the total number of the branches of the candidate nodes from the smallest to the largest.

8. A core patent determination apparatus, comprising:

the graph building module: used for obtaining a database of granted patents in the same field, extracting the technical features of the independent claims of each patent document in the database, and constructing a knowledge graph for each patent document from the technical features;

a graph fusion module: used for fusing the knowledge graphs of the patent documents to form a total knowledge graph of the database;

a node screening module: used for counting the number of branches of each node in the total knowledge graph, and selecting the nodes whose number of branches is larger than a first threshold to form a first node set;

the document filtering module: used for randomly selecting multiple groups of candidate nodes from the first node set, in order of group size from largest to smallest, and selecting from the database the patent document that contains all candidate nodes of a given group and has the smallest sum of the branch numbers of those candidate nodes, as the core patent corresponding to the current group of candidate nodes;

a document ranking module: used for sorting the core patents corresponding to each group of candidate nodes in descending order of the number of candidate nodes.