CN112307165A - Core patent judgment method and device - Google Patents

Info

Publication number
CN112307165A
Authority
CN
China
Prior art keywords
core
nodes
candidate nodes
database
node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202011178049.8A
Other languages
Chinese (zh)
Inventor
程艳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Chanlue Technology Co ltd
Original Assignee
Wuhan Chanlue Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Chanlue Technology Co ltd filed Critical Wuhan Chanlue Technology Co ltd
Priority to CN202011178049.8A priority Critical patent/CN112307165A/en
Publication of CN112307165A publication Critical patent/CN112307165A/en
Withdrawn legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G06F16/3335 Syntactic pre-processing, e.g. stopword elimination, stemming
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/338 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/18 Legal services; Handling legal documents
    • G06Q50/184 Intellectual property management

Abstract

The invention discloses a method and a device for judging a core patent. The method comprises the following steps: obtaining a database of granted patents in the same field, extracting the technical features of the independent claims of each patent document in the database, and constructing a knowledge graph for each patent document from those technical features; fusing the knowledge graphs of the patent documents to form a total knowledge graph of the database; counting the number of branches of each node in the total knowledge graph, and screening out the nodes whose number of branches is greater than a first threshold to form a first node set; randomly drawing multiple groups of candidate nodes from the first node set, in descending order of the number of nodes per group, and selecting from the database the one patent document that contains all candidate nodes of a given group and has the smallest sum of branch numbers for those candidate nodes, as the core patent corresponding to that group; and ranking the core patents. By constructing the knowledge graph, the method identifies the core technical points of the field and then screens out core patents based on those points, achieving higher accuracy.

Description

Core patent judgment method and device
Technical Field
The invention relates to the technical field of patent text mining, in particular to a method and a device for judging a core patent.
Background
With the development of the patent industry in China, the number and quality of patents have become an important measure of the core competitiveness of an enterprise or institution. A core patent is a patent covering a technology that must be used to manufacture a certain product in a given technical field and that cannot be bypassed by design-around measures. Mining the core patents of a technical field from a large patent database lays the foundation for tracing the field's technical development, analyzing industry technology, and planning technical development.
Most current patent mining technologies provide functions such as patent retrieval and statistical analysis. Retrieval is mostly performed with keywords, classification numbers, or retrieval formulas generated from combinations of screening conditions, and statistical analysis generally counts patents from different angles, such as patent classification, patentee, year, and country. These retrieval and analysis methods cannot effectively extract core patents from a database, and manual analysis suffers from accuracy problems.
Disclosure of Invention
In view of this, the present invention provides a core patent determination method for extracting a core patent from a target database.
In a first aspect of the present invention, a method for determining a core patent is disclosed, the method comprising:
obtaining a database of granted patents in the same field, extracting the technical features of the independent claims of each patent document in the database, and constructing a knowledge graph for each patent document from the technical features;
fusing the knowledge graphs of the patent documents to form a total knowledge graph of the database;
counting the number of branches of each node in the total knowledge graph, and screening out the nodes whose number of branches is greater than a first threshold to form a first node set;
randomly drawing multiple groups of candidate nodes from the first node set, in descending order of the number of nodes per group, and selecting from the database the one patent document that contains all candidate nodes of a given group and has the smallest sum of branch numbers for those candidate nodes, as the core patent corresponding to that group;
and sorting the core patents in descending order of the number of candidate nodes.
Preferably, obtaining the database of granted patents in the same field includes: parsing an input retrieval instruction to generate a retrieval formula, and retrieving the granted patents in the target field according to the retrieval formula to form the database.
Preferably, extracting the technical features of the claims of each patent document in the database and constructing a knowledge graph for each patent document from the technical features specifically includes:
performing word segmentation and stop-word removal on the independent claims of each patent document, extracting keywords as key features, and extracting the association relations among the key features;
extracting one or more first sub-features further defining the key feature from the independent claim according to the key feature;
establishing a knowledge graph of each patent document by taking the key features as entity nodes and taking first sub-features corresponding to the key features as attribute nodes;
if the first sub-feature further comprises one or more second sub-features which are further defined, taking the second sub-features as attribute nodes of the first sub-features; all technical features of the same independent claim are extracted in this way until all technical features are added to the knowledge-graph.
Preferably, fusing the knowledge graphs of the patent documents to form the total knowledge graph of the database specifically comprises:
performing coreference resolution, entity disambiguation and entity linking on the knowledge graphs of the different patent documents to identify entity nodes or attribute nodes that are the same, and merging them to form the total knowledge graph of the database.
Preferably, the number of branches of each node is the number of attribute nodes associated with each node in the knowledge-graph.
Preferably, selecting from the database the patent document that contains all candidate nodes of a given group and has the smallest sum of branch numbers for those candidate nodes, as the core patent corresponding to that group, includes:
determining the total number N of nodes in the first node set, and letting n take the values N, N-1, …, 2 and 1 in turn;
taking n candidate nodes from the first node set, and screening out of the database the patent documents that contain all n candidate nodes to form a corresponding first document set;
when n < N, for each n, sampling from the first node set without replacement, drawing n candidate nodes each time, until all nodes in the first node set are covered;
and taking from the first document set the patent document with the smallest sum of branch numbers of the candidate nodes as the core patent corresponding to the current group of candidate nodes.
Preferably, when the core patents are sorted in descending order of the number of candidate nodes, core patents with the same number of candidate nodes are sorted in ascending order of the sum of the branch numbers of the candidate nodes.
In a second aspect of the present invention, a core patent determination apparatus is disclosed, the apparatus comprising:
a graph construction module: obtaining a database of granted patents in the same field, extracting the technical features of the independent claims of each patent document in the database, and constructing a knowledge graph for each patent document from the technical features;
a graph fusion module: fusing the knowledge graphs of the patent documents to form a total knowledge graph of the database;
a node screening module: counting the number of branches of each node in the total knowledge graph, and screening out the nodes whose number of branches is greater than a first threshold to form a first node set;
a document filtering module: randomly drawing multiple groups of candidate nodes from the first node set, in descending order of the number of nodes per group, and selecting from the database the one patent document that contains all candidate nodes of a given group and has the smallest sum of branch numbers for those candidate nodes, as the core patent corresponding to that group;
a document ranking module: sorting the core patents in descending order of the number of candidate nodes.
Compared with the prior art, the invention has the following beneficial effects:
1) The invention extracts the technical features of the independent claims of each patent and constructs a knowledge graph for each patent document from them, with each node representing one technical feature. Fusing these knowledge graphs yields a total knowledge graph of the database that clearly and intuitively reflects the technical points of the field and the associations among them. The number of branches of each node in the total knowledge graph is then counted: the more branches a node has, the more improvements surround that technical feature and the more likely the feature is a core technical point of the field, so the core technical points can be located accurately.
2) The method selects from the database the patent documents that contain several core technical points at once and takes the one with the smallest total number of branches as the corresponding core patent. Because the core patents are screened on the basis of the core technical points, accuracy is high.
3) When there are several core patents, they are sorted by the number of core technical points each contains, and a core patent is screened out for each combination of core technical points, which provides a reliable basis for research in different technical directions and on the development of the field's technical branches.
4) The core patents are finally determined through multiple rounds of screening and sorting, which improves accuracy.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a flow chart of a core patent determination method of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.
As shown in fig. 1, the present invention provides a method for determining a core patent, the method comprising:
S1, obtaining a database of granted patents in the same field, extracting the technical features of the independent claims of each patent document in the database, and constructing a knowledge graph for each patent document from the technical features;
Specifically, an input retrieval instruction is parsed to generate a retrieval formula, and the granted patents in the same field are retrieved from a patent database according to the retrieval formula to form the target database. For example, the granted patents of a certain field from the last 10 years are retrieved by a combination of keywords to form the target database.
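As a rough illustration of turning a keyword combination into a retrieval formula, the sketch below builds a boolean query string. The function name `build_retrieval_formula` and the field names (`TI`, `AB`, `CL`, `STATUS`, `GRANT_DATE`) are hypothetical; the source does not specify a query syntax, and a real patent search service defines its own.

```python
from datetime import date


def build_retrieval_formula(keywords, years_back=10, today=None):
    """Combine keywords into a boolean query restricted to granted patents.

    The field names used here are hypothetical placeholders, not the syntax
    of any particular patent search service.
    """
    today = today or date.today()
    start_year = today.year - years_back
    # Each keyword may appear in the title, abstract, or claims.
    keyword_clause = " AND ".join(
        f'(TI="{kw}" OR AB="{kw}" OR CL="{kw}")' for kw in keywords
    )
    return f"{keyword_clause} AND STATUS=granted AND GRANT_DATE>={start_year}-01-01"


formula = build_retrieval_formula(
    ["knowledge graph", "patent"], today=date(2020, 10, 29)
)
```

The 10-year window mirrors the example in the text; any other screening condition would be appended as a further clause in the same way.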
The independent claims of each patent document are extracted from the target database and semantically analyzed, including word segmentation, stop-word removal and other processing, to extract the key features and the association relations among them; in particular, keywords can be extracted by TF-IDF as the key features;
extracting one or more first sub-features further defining the key feature from the independent claim according to the key feature;
Specifically, the independent claims of most granted patents consist of several technical features, and each technical feature may include further-defining sub-features. For example, the technical features of an independent claim include A, B and C, where A further includes two sub-features, a1 and a2. The invention extracts all the technical features and the association relations among them through semantic analysis. When the further-defining sub-features are extracted, the sub-features that further define a key feature can be identified by locating the key feature and/or by matching specific cue words such as 'including' and 'dividing'.
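The TF-IDF keyword step mentioned in S1 can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes English text split on whitespace (Chinese claims would need a word segmenter), a tiny hand-written stop-word list, and the plain unsmoothed weighting tf = count / length, idf = ln(number of claims / document frequency). The names `tokenize`, `tfidf_keywords` and `STOP_WORDS` are hypothetical.

```python
import math
from collections import Counter

# Tiny illustrative stop-word list (assumption, not from the source).
STOP_WORDS = {"a", "an", "the", "and", "of", "to", "for", "is", "in", "with"}


def tokenize(text):
    """Lowercase, split on whitespace, strip punctuation, drop stop words."""
    tokens = (tok.strip(".,;:()").lower() for tok in text.split())
    return [tok for tok in tokens if tok and tok not in STOP_WORDS]


def tfidf_keywords(claims, top_k=3):
    """Return the top_k highest-scoring terms for each claim text."""
    docs = [tokenize(c) for c in claims]
    n_docs = len(docs)
    # Document frequency: number of claims in which each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    results = []
    for doc in docs:
        tf = Counter(doc)
        scores = {
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
        # Highest score first; ties broken alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append(ranked[:top_k])
    return results


claims = [
    "A battery comprising an anode and a cathode.",
    "A battery comprising a separator and an electrolyte.",
]
keywords = tfidf_keywords(claims, top_k=2)
# [['anode', 'cathode'], ['electrolyte', 'separator']]
```

Terms shared by every claim, such as "battery" here, score zero and fall to the bottom, which is why TF-IDF tends to surface the features that distinguish one claim from another.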
Establishing a knowledge graph of each patent document by taking the key features as entity nodes and taking first sub-features corresponding to the key features as attribute nodes;
if the first sub-feature further comprises one or more second sub-features which are further defined, taking the second sub-features as attribute nodes of the first sub-features; all technical features of the same independent claim are extracted in this way until all technical features are added to the knowledge-graph.
The invention extracts all technical features of independent claims of each patent document and the incidence relation among the technical features through semantic analysis, and constructs the knowledge graph by taking the technical features as nodes.
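A minimal sketch of the per-document graph described in S1, assuming the semantic analysis has already produced a nested feature tree. The graph is stored as a plain dictionary mapping each node to the set of its attribute nodes; the representation and the name `build_document_graph` are assumptions, and the association relations among key features are left out for brevity.

```python
def build_document_graph(features):
    """Flatten a nested feature tree into {node: set of attribute nodes}.

    `features` maps each feature name to a dict of its further-defining
    sub-features, nested to any depth.
    """
    graph = {}

    def add(name, sub_features):
        children = graph.setdefault(name, set())
        for sub_name, sub_sub in sub_features.items():
            children.add(sub_name)   # sub-feature becomes an attribute node
            add(sub_name, sub_sub)   # recurse into second-level sub-features

    for key_feature, subs in features.items():
        add(key_feature, subs)
    return graph


# Features A, B, C, where A has sub-features a1 and a2, and a2 has a21.
claim_features = {"A": {"a1": {}, "a2": {"a21": {}}}, "B": {}, "C": {}}
doc_graph = build_document_graph(claim_features)
```

Here `doc_graph["A"]` holds the two first sub-features and `doc_graph["a2"]` holds the second-level sub-feature, matching the entity-node and attribute-node layering in the text.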
S2, fusing the knowledge graphs of the patent documents to form a total knowledge graph of the database;
Coreference resolution, entity disambiguation and entity linking are performed on the knowledge graphs of the different patent documents to identify entity nodes or attribute nodes that are the same, and these are merged to form the total knowledge graph of the database.
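The merge in S2 can be sketched under a strong simplification: a hand-written alias table stands in for the coreference resolution, entity disambiguation and entity linking that the text calls for. The function `fuse_graphs`, the alias table and the sample documents are all hypothetical.

```python
def fuse_graphs(doc_graphs, aliases=None):
    """Merge per-document graphs into one total graph.

    doc_graphs: {doc_id: {node: set of attribute nodes}}
    aliases:    {surface name: canonical name}, a stand-in for real
                entity resolution (assumption, not from the source).
    Returns (total_graph, node_docs), where node_docs records which
    documents contain each canonical node.
    """
    aliases = aliases or {}

    def canon(name):
        return aliases.get(name, name)

    total_graph, node_docs = {}, {}
    for doc_id, graph in doc_graphs.items():
        for node, attrs in graph.items():
            key = canon(node)
            total_graph.setdefault(key, set()).update(canon(a) for a in attrs)
            node_docs.setdefault(key, set()).add(doc_id)
    return total_graph, node_docs


doc_graphs = {
    "P1": {"anode": {"graphite"}, "graphite": set()},
    "P2": {"negative electrode": {"silicon"}, "silicon": set()},
}
total_graph, node_docs = fuse_graphs(
    doc_graphs, aliases={"negative electrode": "anode"}
)
```

After fusion the single node "anode" carries the attribute nodes from both documents, which is what lets the branch count in the next step reflect the whole database.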
S3, counting the number of branches of each node in the total knowledge graph, and screening out nodes with the number of branches larger than a first threshold value to form a first node set;
In the total knowledge graph, each node represents one technical feature. The number of branches of a node is the number of attribute nodes associated with it in the knowledge graph. The more branches a node has, the more improvements surround that technical feature, and the more likely the feature is a core technical point of the field. The nodes whose number of branches is greater than a first threshold are screened out to form a first node set, which is the set of core technical points of the field.
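The branch count and threshold filter of S3 reduce to a few lines once the total graph is a node-to-attributes mapping. The sketch below assumes that representation; the function name `select_core_nodes`, the sample graph and the threshold value 1 are illustrative, since the source does not say how the first threshold is chosen.

```python
def select_core_nodes(total_graph, first_threshold):
    """Return {node: branch count} for nodes whose count exceeds the threshold."""
    branch_counts = {node: len(attrs) for node, attrs in total_graph.items()}
    return {n: c for n, c in branch_counts.items() if c > first_threshold}


total_graph = {
    "anode": {"graphite", "silicon", "coating"},
    "cathode": {"nmc", "lfp"},
    "graphite": set(),
}
first_node_set = select_core_nodes(total_graph, first_threshold=1)
# {'anode': 3, 'cathode': 2}
```

The comparison is strict (`>`), following the wording "larger than a first threshold".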
S4, randomly drawing multiple groups of candidate nodes from the first node set, in descending order of the number of nodes per group, and selecting from the database the one patent document that contains all candidate nodes of a given group and has the smallest sum of branch numbers for those candidate nodes, as the core patent corresponding to that group;
first, determining the total number N of nodes in the first node set, and letting n take the values N, N-1, …, 2 and 1 in turn;
taking n candidate nodes from the first node set, and screening out of the database the patent documents that contain all n candidate nodes to form a corresponding first document set;
when n < N, for each n, sampling from the first node set without replacement, drawing n candidate nodes each time, until all nodes in the first node set are covered;
extracting from the first document set the patent document with the smallest sum of branch numbers of the candidate nodes as the core patent corresponding to the current group of candidate nodes;
Specifically, suppose the 3 nodes with the most branches are screened out of the knowledge graph; these 3 nodes are taken to be the 3 core technical points of the field, and patents containing all 3 core technical points are retrieved from the target database. If a patent contains all 3 core technical points and the sum of the branch numbers of those 3 points is the smallest, that patent is the core patent. If no patent containing all 3 core technical points is retrieved from the target database, the number of core technical points is reduced: 2 of the 3 core technical points are selected, patents containing both are retrieved from the target database, and the one with the smallest sum of branch numbers is selected as the core patent, and so on.
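The S4 procedure can be sketched as below, under two assumptions the source leaves open. First, the branch numbers being summed are taken from each document's own graph, because counts taken from the total graph would be identical for every document containing the same group. Second, when the last sampled group is shorter than n, it is topped up with randomly chosen other nodes. The names `sample_groups`, `branch_sum` and `find_core_patents` are hypothetical.

```python
import random


def sample_groups(nodes, n, rng):
    """Split shuffled nodes into groups of size n so every node is covered.

    A short final group is topped up with other nodes (assumption).
    """
    shuffled = list(nodes)
    rng.shuffle(shuffled)
    groups = []
    for start in range(0, len(shuffled), n):
        group = shuffled[start:start + n]
        if len(group) < n:
            others = [x for x in shuffled if x not in group]
            group += rng.sample(others, n - len(group))
        groups.append(frozenset(group))
    return groups


def branch_sum(doc_graph, group):
    """Sum of the group's branch numbers within one document's graph."""
    return sum(len(doc_graph[node]) for node in group)


def find_core_patents(core_nodes, doc_graphs, seed=0):
    """Return (group, doc_id, branch_sum) for every group that has a match."""
    rng = random.Random(seed)
    results = []
    for n in range(len(core_nodes), 0, -1):  # n = N, N-1, ..., 1
        for group in sample_groups(core_nodes, n, rng):
            # First document set: documents containing every node in the group.
            matches = [d for d, g in doc_graphs.items() if group.issubset(g)]
            if not matches:
                continue
            best = min(matches, key=lambda d: (branch_sum(doc_graphs[d], group), d))
            results.append((group, best, branch_sum(doc_graphs[best], group)))
    return results


doc_graphs = {
    "P1": {"anode": {"graphite"}, "cathode": {"nmc"}},
    "P2": {"anode": {"graphite", "silicon"}, "cathode": {"nmc", "lfp"}},
}
results = find_core_patents(["anode", "cathode"], doc_graphs)
```

In this toy database P1 is chosen for every group because it carries the fewest branches on the shared nodes. Groups with no matching document are simply skipped, which corresponds to the "reduce the number of core technical points" fallback described in the text.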
S5, sorting the core patents corresponding to each group of candidate nodes in descending order of the number of candidate nodes. Core patents with the same number of candidate nodes are sorted in ascending order of the sum of the branch numbers of the candidate nodes.
Since different combinations of core technical points may correspond to different technical branches of the field, the field may have more than one core patent, and each group of candidate nodes therefore has its own corresponding core patent. The core patents are finally determined through multiple rounds of screening and sorting, which yields their final ranking and improves accuracy; the higher a patent ranks, the more important it is. This provides a reliable basis for research in different technical directions and on the technical branches of the field.
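The two-level ordering in S5 maps directly onto a composite sort key. The sketch below assumes each core patent is recorded as a tuple of (candidate node group, document id, sum of branch numbers); that tuple layout, the function name `rank_core_patents` and the sample data are illustrative.

```python
def rank_core_patents(results):
    """Sort by group size (descending), then by branch sum (ascending)."""
    return sorted(results, key=lambda r: (-len(r[0]), r[2]))


results = [
    (frozenset({"anode"}), "P3", 1),
    (frozenset({"anode", "cathode"}), "P2", 4),
    (frozenset({"anode", "separator"}), "P1", 2),
]
ranking = [doc_id for _, doc_id, _ in rank_core_patents(results)]
# ['P1', 'P2', 'P3']
```

P1 and P2 both cover two candidate nodes, so the smaller branch sum puts P1 first; P3 covers only one node and ranks last despite having the smallest sum.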
Corresponding to the method embodiment, the invention also discloses a core patent judgment device, which comprises the following modules:
a graph construction module: obtaining a database of granted patents in the same field, extracting the technical features of the independent claims of each patent document in the database, and constructing a knowledge graph for each patent document from the technical features;
a graph fusion module: fusing the knowledge graphs of the patent documents to form a total knowledge graph of the database;
a node screening module: counting the number of branches of each node in the total knowledge graph, and screening out the nodes whose number of branches is greater than a first threshold to form a first node set;
a document filtering module: randomly drawing multiple groups of candidate nodes from the first node set, in descending order of the number of nodes per group, and screening out of the database the patent documents that contain all candidate nodes of a given group to form a corresponding first document set; extracting from the first document set the patent document with the smallest sum of branch numbers of the candidate nodes as the core patent corresponding to the current group of candidate nodes;
a document ranking module: sorting the core patents in descending order of the number of candidate nodes.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (8)

1. A core patent judgment method is characterized by comprising the following steps:
obtaining a database of granted patents in the same field, extracting the technical features of the independent claims of each patent document in the database, and constructing a knowledge graph for each patent document from the technical features;
fusing the knowledge graphs of the patent documents to form a total knowledge graph of the database;
counting the number of branches of each node in the total knowledge graph, and screening out the nodes whose number of branches is greater than a first threshold to form a core technology point set;
randomly drawing multiple groups of candidate nodes from a first node set, in descending order of the number of nodes per group, and selecting from the database the one patent document that contains all candidate nodes of a given group and has the smallest sum of branch numbers for those candidate nodes, as the core patent corresponding to that group;
and sorting the core patents corresponding to each group of candidate nodes in descending order of the number of candidate nodes.
2. The method for judging core patents according to claim 1, wherein obtaining the database of granted patents in the same field comprises: parsing an input retrieval instruction to generate a retrieval formula, and retrieving the granted patents in the target field according to the retrieval formula to form the database.
3. The method for judging core patents according to claim 1, wherein the extracting technical features of the claims of each patent document in the database, and constructing a knowledge graph for each patent document according to the technical features specifically comprises:
performing word segmentation and stop-word removal on the independent claims of each patent document, extracting keywords as key features, and extracting the association relations among the key features;
extracting one or more first sub-features further defining the key feature from the independent claim according to the key feature;
establishing a knowledge graph of each patent document by taking the key features as entity nodes and taking first sub-features corresponding to the key features as attribute nodes;
if the first sub-feature further comprises one or more second sub-features which are further defined, taking the second sub-features as attribute nodes of the first sub-features; all technical features of the same independent claim are extracted in this way until all technical features are added to the knowledge-graph.
4. The method for judging core patents according to claim 3, wherein fusing the knowledge graphs of the patent documents to form the total knowledge graph of the database specifically comprises:
performing coreference resolution, entity disambiguation and entity linking on the knowledge graphs of the different patent documents to identify entity nodes or attribute nodes that are the same, and merging them to form the total knowledge graph of the database.
5. The method for judging core patents of claim 1, wherein each node in the knowledge-graph represents a technical feature, and the branch number of each node is the number of attribute nodes associated with each node in the knowledge-graph.
6. The method for judging core patents according to claim 3, wherein selecting from the database the patent document that contains all candidate nodes of a given group and has the smallest sum of branch numbers for those candidate nodes, as the core patent corresponding to that group, comprises:
determining the total number N of nodes in the first node set, and letting n take the values N, N-1, …, 2 and 1 in turn;
taking n candidate nodes from the first node set, and screening out of the database the patent documents that contain all n candidate nodes to form a corresponding first document set;
when n < N, for each n, sampling from the first node set without replacement, drawing n candidate nodes each time, until all nodes in the first node set are covered;
and taking from the first document set the patent document with the smallest sum of branch numbers of the candidate nodes as the core patent corresponding to the current group of candidate nodes.
7. The method for judging core patents according to claim 6, wherein, when the core patents are sorted in descending order of the number of candidate nodes, core patents with the same number of candidate nodes are sorted in ascending order of the sum of the branch numbers of the candidate nodes.
8. A core patent determination apparatus, comprising:
a graph construction module: configured to obtain a database of granted patents in the same field, extract the technical features of the independent claims of each patent document in the database, and construct a knowledge graph for each patent document from the technical features;
a graph fusion module: configured to fuse the knowledge graphs of the patent documents to form a total knowledge graph of the database;
a node screening module: configured to count the number of branches of each node in the total knowledge graph, and screen out the nodes whose number of branches is greater than a first threshold to form a first node set;
a document filtering module: configured to randomly draw multiple groups of candidate nodes from the first node set, in descending order of the number of nodes per group, and to select from the database the one patent document that contains all candidate nodes of a given group and has the smallest sum of branch numbers for those candidate nodes, as the core patent corresponding to that group;
a document ranking module: configured to sort the core patents corresponding to each group of candidate nodes in descending order of the number of candidate nodes.
CN202011178049.8A 2020-10-29 2020-10-29 Core patent judgment method and device Withdrawn CN112307165A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011178049.8A CN112307165A (en) 2020-10-29 2020-10-29 Core patent judgment method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011178049.8A CN112307165A (en) 2020-10-29 2020-10-29 Core patent judgment method and device

Publications (1)

Publication Number Publication Date
CN112307165A (en) 2021-02-02

Family

ID=74330595

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011178049.8A Withdrawn CN112307165A (en) 2020-10-29 2020-10-29 Core patent judgment method and device

Country Status (1)

Country Link
CN (1) CN112307165A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20210202