CN114357086A

CN114357086A - Patent IPC classification number recommendation method and device based on knowledge graph

Info

Publication number: CN114357086A
Application number: CN202111009919.3A
Authority: CN
Inventors: 石振锋; 王嘉瑜; 孙赟星
Original assignee: Heilongjiang Yangguang Huiyuan Information Technology Co ltd
Current assignee: Heilongjiang Yangguang Huiyuan Information Technology Co ltd
Priority date: 2021-08-31
Filing date: 2021-08-31
Publication date: 2022-04-15

Abstract

A patent IPC classification number recommendation method and device based on a knowledge graph, relating to the field of data analysis. The invention aims to solve three problems with existing methods for determining the technical field of a patent: they depend on manual analysis, they are time-consuming and inefficient, and they cannot meet the needs of enterprises and users. The method comprises the following steps. A patent knowledge graph is constructed, and the entities in the graph are vectorized with a TransE model to obtain a vectorized representation of each invention name. These representations are used to calculate the similarity between the query patent and each patent in the database, and the M patents most similar to the query patent are taken as recommended similar patents. The N IPC codes that occur most frequently among the similar patents are taken as the recommended IPC codes. The device comprises a patent knowledge graph construction module, an entity vectorization module, a similarity calculation module and an IPC classification number recommendation module.

Description

Patent IPC classification number recommendation method and device based on knowledge graph

Technical Field

The application relates to the field of data analysis, and in particular to technology for predicting the technical field of a patent.

Background

Faced with a huge volume of patent data, enterprises urgently need to do three things: extract information about different fields from that data effectively, track the current state of scientific and technological development in their industries accurately, and keep abreast of their industry's most advanced technology. As worldwide competition in science and technology intensifies, patent analysis of various kinds has become an increasingly popular field.

During patent application, the technical field to which a patent belongs must be assigned according to the patent's basic information. This work is complicated and tedious, so effective recommendation of a patent's technical field has become a problem worth studying for enterprises and users.

Conventionally, the technical field of a patent is determined by manually analyzing the information in the patent text and comparing it with the prior art, with the technical scope then fixed under the guidance of professional technicians. However, as patent data grows rapidly, manual analysis becomes slower and more expensive and sometimes cannot meet the needs of enterprises and users. Determining the technical field of a patent efficiently and accurately has therefore become a research direction.

Disclosure of Invention

A knowledge-graph-based patent IPC classification number recommendation method and device are provided to solve the following problems with existing methods for determining the technical field of a patent: they depend on manual analysis, they are time-consuming and inefficient, and they cannot meet the needs of enterprises and users.

The patent IPC classification number recommendation method based on the knowledge graph comprises the following steps:

constructing a patent knowledge graph, wherein the patent knowledge graph comprises the entities of a query patent and of a plurality of patents in the same technical field as the query patent, together with the relationships among those entities, and the entities comprise applicants, inventors, IPC classification numbers, invention names and keywords;

vectorizing the entities in the patent knowledge graph with a TransE model to obtain a vectorized representation of the invention name of each patent in the patent knowledge graph;

calculating the similarity between the query patent and each patent in a database using the vectorized representations of the invention names, and taking the M patents with the highest similarity to the query patent as recommended similar patents;

and counting the number of occurrences of the IPC codes of all the recommended similar patents, and taking the N IPC codes with the most occurrences as the recommended IPC codes.

Optionally, the constructing the patent knowledge graph includes:

retrieving, from a patent retrieval database, a plurality of patents in the same technical field as the query patent, and merging these patents and the query patent into a patent domain database;

extracting the applicant, inventor, IPC classification number, invention name and keywords of each patent in the patent domain database as entities;

and storing the entities of each patent and the relationship among the entities into a Neo4j database to form a patent knowledge graph.

Optionally, the similarity is calculated from the Euclidean distance between the query patent and each patent in the patent knowledge graph, using the vectorized representations of the invention names.

Optionally, M ≥ 10.

Optionally, N has a value of 3.

The patent IPC classification number recommendation device based on the knowledge graph comprises:

a patent knowledge graph construction module configured to construct a patent knowledge graph containing entities of a query patent and several patents having the same technical field as the query patent and relationships between the entities, the entities including an applicant, an inventor, an IPC classification number, an invention name, and keywords;

an entity vectorization module configured to vectorize the entities in the patent knowledge graph with a TransE model to obtain a vectorized representation of the invention name of each patent in the patent knowledge graph;

a similarity calculation module configured to calculate the similarity between the query patent and each patent in a database using the vectorized representations of the invention names, and to take the M patents with the highest similarity to the query patent as recommended similar patents; and

an IPC classification number recommendation module configured to count the number of occurrences of the IPC classification numbers of all the recommended similar patents, and to take the N IPC classification numbers with the most occurrences as the recommended IPC classification numbers.

Optionally, the patent knowledge graph construction module includes:

a patent domain database construction sub-module configured to retrieve a number of patents having the same technical field as the query patent from a patent retrieval database, and merge the number of patents and the query patent into a patent domain database;

an entity extraction sub-module configured to extract an applicant, an inventor, an IPC classification number, an invention name, and a keyword of each patent in the patent domain database as an entity; and

a patent knowledge graph construction sub-module configured to save the entities of each patent and the relationships between the entities into a Neo4j database to form a patent knowledge graph.

Optionally, the similarity is calculated from the Euclidean distance between the query patent and each patent in the patent knowledge graph, using the vectorized representations of the invention names.

Optionally, M ≥ 10.

Optionally, N has a value of 3.

The knowledge-graph-based patent IPC classification number recommendation method and device construct a patent knowledge graph that relates the applicant, inventor, IPC classification numbers, invention name and keywords of each patent. The entities are then vectorized with a TransE model to obtain vectorized representations of the invention names, and these representations encode the relationships among the entities. Consequently, taking the Euclidean distance between the invention-name vectors of two patents as their similarity reflects how similar the two patents are more accurately. The patents most similar to the query patent are recommended, and the IPC classification numbers that occur most frequently among them are selected as the recommended IPC classification numbers. In the experiments of the first embodiment, the accuracy of the method and device is markedly higher than that of a conventional content-based patent recommendation algorithm.

Drawings

FIG. 1 is a schematic flowchart of a knowledge-graph-based patent IPC classification number recommendation method according to the first embodiment of the present application;

FIG. 2 shows a patent knowledge graph used in the knowledge-graph-based patent IPC classification number recommendation method according to the first embodiment of the present application;

FIG. 3 is a flow chart of negative sampling in the first embodiment of the present application;

FIG. 4 is a graph comparing the prediction accuracy of the two methods in the first embodiment of the present application;

FIG. 5 is a schematic structural diagram of a knowledge-graph-based patent IPC classification number recommendation device according to the second embodiment of the present application.

Detailed Description

The first embodiment is as follows: in this embodiment, the technical field to which a patent belongs is represented by its IPC classification number. As shown in FIG. 1, the knowledge-graph-based patent IPC classification number recommendation method of this embodiment may generally comprise the following steps S1 to S4.

Step S1, constructing a patent knowledge graph

For a query patent whose technical field is to be determined, a technical field that can be determined directly is identified first. This field generally covers a broad range, such as physics, chemistry or biology, but may also be a subdivided field, such as optics, mechanics or electromagnetism within physics. Once this technical field is determined, patents belonging to it are searched for in a patent search database, and a number of patents are selected from the search results.

The query patent and the selected patents are combined into a patent domain database, and the applicant, inventor, IPC classification numbers, invention name and keywords of each patent in the patent domain database are extracted as entities.

A patent may have several IPC codes. The IPC code entity may be either the patent's main code or all of its IPC codes; selecting all of a patent's IPC codes as entities yields more accurate IPC code recommendations for the query patent.

A patent may have several inventors, so the inventor data must be split into one-to-one (inventor, patent) pairs before use.

The main purpose of this embodiment is to recommend the technical field of a patent. Keywords extracted from a patent's invention name and abstract play an important role in that recommendation, so the keyword information in the invention name and abstract should be fully extracted. This embodiment combines the strengths of the TF-IDF and TextRank algorithms: each algorithm extracts the top 10 keywords of each patent, the weights of the extracted keywords are fused by weighted averaging, and the top 5 segmented words by fused weight are used as the patent's keywords. For example, for the patents with publication numbers CN102058606B and CN102151264B, the top 10 keywords extracted with the TF-IDF algorithm are shown in Table 1, the top 10 keywords extracted with the TextRank algorithm are shown in Table 2, and the top 5 keywords obtained by weighted fusion of the TF-IDF and TextRank weights are shown in Table 3.
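The keyword fusion described above can be sketched as follows. This is a minimal, hedged illustration, not the embodiment's implementation. It assumes that the invention name and abstract have already been segmented into word lists (the Chinese word segmenter is out of scope), and that TF-IDF uses a smoothed IDF. It also assumes that TextRank uses a co-occurrence window of 3 and a damping factor of 0.85, that each method's weights are rescaled by their maximum, and that the two are fused with an equal-weight average (`alpha = 0.5`). All function names, parameter values and the toy corpus are hypothetical.

```python
import math
from collections import Counter, defaultdict

def _ranked(scores, k):
    # Sort by descending weight, breaking ties alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]

def tfidf_top_k(doc, corpus, k=10):
    """Top-k (term, weight) pairs of `doc` by TF-IDF over `corpus` (lists of tokens)."""
    df = Counter(term for d in corpus for term in set(d))
    tf = Counter(doc)
    n_docs = len(corpus)
    scores = {
        t: (c / len(doc)) * (math.log((1 + n_docs) / (1 + df[t])) + 1.0)  # smoothed IDF
        for t, c in tf.items()
    }
    return _ranked(scores, k)

def textrank_top_k(doc, k=10, window=3, damping=0.85, iterations=50):
    """Top-k (term, weight) pairs of `doc` by TextRank on a word co-occurrence graph."""
    neighbors = defaultdict(set)
    for i, w in enumerate(doc):
        for other in doc[i + 1:i + window]:
            if other != w:
                neighbors[w].add(other)
                neighbors[other].add(w)
    scores = {w: 1.0 for w in doc}
    for _ in range(iterations):
        # PageRank-style update on the undirected graph, using the previous scores.
        scores = {
            w: (1 - damping) + damping * sum(scores[v] / len(neighbors[v]) for v in neighbors[w])
            for w in scores
        }
    return _ranked(scores, k)

def fuse_keywords(tfidf, textrank, alpha=0.5, top_n=5):
    """Weighted average of max-normalized weights; a term missing from one list scores 0 there."""
    def normalize(pairs):
        top = max(weight for _, weight in pairs)
        return {term: weight / top for term, weight in pairs}
    a, b = normalize(tfidf), normalize(textrank)
    fused = {t: alpha * a.get(t, 0.0) + (1 - alpha) * b.get(t, 0.0) for t in set(a) | set(b)}
    return [t for t, _ in _ranked(fused, top_n)]

corpus = [  # hypothetical, already-segmented invention names and abstracts
    ["freeze", "dried", "preparation", "injection", "esomeprazole", "sodium", "freeze", "dried"],
    ["azacitidine", "freeze", "dried", "powder", "injection"],
    ["tadalafil", "enteric", "tablet", "preparation"],
]
keywords = fuse_keywords(tfidf_top_k(corpus[0], corpus), textrank_top_k(corpus[0]))
```

Normalizing each method by its own maximum keeps the two weight scales comparable before averaging. The fusion weight `alpha` would need tuning against labelled data.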

Table 1 Top 10 keywords extracted using the TF-IDF algorithm

Table 2 Top 10 keywords extracted using the TextRank algorithm

Table 3 Top 5 keywords by fused weight

After the entities are extracted, their attributes must be defined. As shown in Table 4, the attributes comprise object attributes, which describe the relationships between objects, and data attributes, which describe the inherent properties of an entity. Next, the relationships between entities are defined. This embodiment defines four relationships: "application" between an applicant and a patent, "invention" between an inventor and a patent, "inclusion" between a patent and a keyword, and "technical field" between a patent and an IPC classification number.
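As an illustration of the four relations just defined, the sketch below turns one patent record into (head, relation, tail) triples. It splits multiple inventors into one-to-one pairs as described above and keeps all IPC codes rather than only the main one. The `PatentRecord` fields, the English relation labels and the edge directions are assumptions for illustration: applicant → patent, inventor → patent, patent → IPC code, and patent → keyword. The record itself is hypothetical, and the patent node is identified by a placeholder publication number.

```python
from dataclasses import dataclass, field

@dataclass
class PatentRecord:
    """Hypothetical container for the entities extracted from one patent."""
    publication_number: str
    applicants: list = field(default_factory=list)
    inventors: list = field(default_factory=list)
    ipc_codes: list = field(default_factory=list)
    keywords: list = field(default_factory=list)

def patent_to_triples(p):
    """Return de-duplicated (head, relation, tail) triples for one patent."""
    pid = p.publication_number
    triples = (
        [(a, "application", pid) for a in p.applicants]
        + [(i, "invention", pid) for i in p.inventors]        # one triple per inventor
        + [(pid, "technical_field", c) for c in p.ipc_codes]  # all IPC codes of the patent
        + [(pid, "inclusion", k) for k in p.keywords]
    )
    return list(dict.fromkeys(triples))  # drop duplicates, keep first-seen order

record = PatentRecord(
    publication_number="P1",  # placeholder, not a real publication number
    applicants=["Applicant A"],
    inventors=["Inventor X", "Inventor Y"],
    ipc_codes=["A61K9/19", "A61K31/56"],
    keywords=["freeze-dried", "esomeprazole", "injection"],
)
triples = patent_to_triples(record)
```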

Table 4 Entity attributes

Once the entities of each patent have been extracted and the relationships among them defined, the construction of the patent-domain ontology library is complete. The entities of each patent in the ontology library and the relationship data between them are then stored in the Neo4j database, which completes the construction of the patent knowledge graph. FIG. 2 shows part of a patent knowledge graph involving 14 patents. Nodes of five colors represent, respectively, the publication numbers, keywords, inventors, applicants and IPC classification numbers of the patents. Each publication-number node in the patent knowledge graph represents one patent and may be replaced by the patent's invention name or patent number. The patent knowledge graph displays the relationships among all the entities visually.

Following the construction approach of the patent-domain ontology library, the semantic information contained in the patent texts is extracted so that the entity and relationship information in the patent knowledge graph is displayed comprehensively and completely. The resulting patent knowledge graph allows the required patent information to be retrieved quickly and comprehensively for different user needs. A patent knowledge graph built on the Neo4j graph database therefore contains, and visually displays, the entity, relationship and attribute information of the patents.
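Storing the triples in Neo4j can be sketched with parameterized Cypher `MERGE` statements, so that each entity node and each relationship is created only once even when several patents share an applicant, inventor, IPC code or keyword. The node labels, relationship type names and relation keys below are assumptions. Relationship types cannot be passed as Cypher parameters, so they are written into the query text from a fixed mapping.

```python
# Hypothetical mapping: relation -> (head label, relationship type, tail label).
SCHEMA = {
    "application": ("Applicant", "APPLICATION", "Patent"),
    "invention": ("Inventor", "INVENTION", "Patent"),
    "technical_field": ("Patent", "TECHNICAL_FIELD", "IPC"),
    "inclusion": ("Patent", "INCLUSION", "Keyword"),
}

def triple_to_cypher(head, relation, tail):
    """Build one (query, parameters) pair that idempotently stores a triple."""
    head_label, rel_type, tail_label = SCHEMA[relation]  # KeyError for unknown relations
    query = (
        f"MERGE (h:{head_label} {{name: $head}}) "
        f"MERGE (t:{tail_label} {{name: $tail}}) "
        f"MERGE (h)-[:{rel_type}]->(t)"
    )
    return query, {"head": head, "tail": tail}

query, params = triple_to_cypher("Inventor X", "invention", "P1")
```

With the official Neo4j Python driver, each pair could be executed with `session.run(query, params)`. For a large patent domain database, batching triples with `UNWIND` would reduce round trips.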

Step S2, vectorized representation of entities

Constructing the patent knowledge graph links the semantic information of different patents, but the information in the graph cannot be used for recommendation directly. To recommend patent IPC classification numbers, the entities in the graph must first be vectorized. Vectorizing the patent knowledge graph means converting its nodes (entities) and edges (the lines representing the relationship between two entities) into vectors while retaining the original semantic information. In this embodiment, a TransE model is used to vectorize the entities in the patent knowledge graph, and the invention name of the $i$-th patent is mapped to a $d$-dimensional vector $I_i = (E_{1i}, E_{2i}, \dots, E_{di})^{T}$.
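To make the TransE step concrete, the sketch below trains entity and relation vectors so that $h + r \approx t$ holds for positive triples. It minimizes the margin ranking loss $\max(0, \gamma + \lVert h + r - t\rVert_2 - \lVert h' + r - t'\rVert_2)$ by stochastic gradient descent and renormalizes entity vectors to unit length, as in the original TransE formulation. It is a dependency-free toy, not the embodiment's implementation. The dimension, margin, learning rate, epoch count and toy graph are all assumptions, as is the uniform head-or-tail corruption used for negatives; the Bernoulli scheme described in this step would replace that corruption rule. After training, `model.ent[patent_id]` plays the role of $I_i$.

```python
import math
import random

def _norm(v):
    return math.sqrt(sum(x * x for x in v))

def _unit(v):
    n = _norm(v) or 1.0
    return [x / n for x in v]

class TransE:
    def __init__(self, entities, relations, dim=16, margin=1.0, lr=0.01, seed=0):
        self.rng = random.Random(seed)
        bound = 6 / math.sqrt(dim)
        self.ent = {e: _unit([self.rng.uniform(-bound, bound) for _ in range(dim)]) for e in entities}
        self.rel = {r: _unit([self.rng.uniform(-bound, bound) for _ in range(dim)]) for r in relations}
        self.margin, self.lr = margin, lr

    def _diff(self, h, r, t):
        return [a + b - c for a, b, c in zip(self.ent[h], self.rel[r], self.ent[t])]

    def distance(self, h, r, t):
        """||h + r - t||_2: small for plausible triples."""
        return _norm(self._diff(h, r, t))

    def step(self, pos, neg):
        """One SGD step on max(0, margin + d(pos) - d(neg)); returns the loss."""
        loss = self.margin + self.distance(*pos) - self.distance(*neg)
        if loss <= 0:
            return 0.0
        # Gradients of +d(pos) and -d(neg), all computed before any update.
        grads = []
        for (h, r, t), sign in ((pos, 1.0), (neg, -1.0)):
            diff = self._diff(h, r, t)
            n = _norm(diff) or 1.0
            grads.append(((h, r, t), [sign * x / n for x in diff]))
        for (h, r, t), g in grads:
            self.ent[h] = [a - self.lr * gi for a, gi in zip(self.ent[h], g)]
            self.rel[r] = [a - self.lr * gi for a, gi in zip(self.rel[r], g)]
            self.ent[t] = [a + self.lr * gi for a, gi in zip(self.ent[t], g)]
        for e in {pos[0], pos[2], neg[0], neg[2]}:
            self.ent[e] = _unit(self.ent[e])  # keep entity vectors on the unit sphere
        return loss

    def train(self, triples, epochs=300):
        entities = list(self.ent)
        for _ in range(epochs):
            for h, r, t in triples:
                e = self.rng.choice(entities)
                neg = (e, r, t) if self.rng.random() < 0.5 else (h, r, e)  # uniform corruption
                if neg != (h, r, t):
                    self.step((h, r, t), neg)

triples = [  # hypothetical toy graph
    ("Inventor X", "invention", "P1"), ("Inventor X", "invention", "P2"),
    ("P1", "technical_field", "A61K9/19"), ("P2", "technical_field", "A61K9/19"),
    ("Inventor Y", "invention", "P3"), ("P3", "technical_field", "A61K9/36"),
]
entities = sorted({x for h, _, t in triples for x in (h, t)})
model = TransE(entities, ["invention", "technical_field"])
model.train(triples)
vectors = {p: model.ent[p] for p in ("P1", "P2", "P3")}  # invention-name vectors I_i
```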

Training the TransE model requires optimizing an objective function. This requires correct (positive) triples from the entity and relationship data, and negative triples must also be introduced. Negative sampling usually replaces an entity at random, which often produces erroneous negative samples. To address this, this embodiment optimizes the negative sampling algorithm so that the final patent IPC classification number recommendations are more accurate.

All positive (original) triples used by the TransE model already exist in the constructed patent knowledge graph. When negative sampling faces complex one-to-many or many-to-many relations, random replacement can generate many erroneous negative samples and degrade the training of the model. Consider one-to-many data that contains the triples (h, r, t) and (h, r, t'). When a negative sample is generated for (h, r, t) by replacing t with t', the result is (h, r, t'). Because (h, r, t') exists in the positive triple set, it cannot be regarded as a negative sample. To make the sampling process more reasonable, this embodiment introduces a Bernoulli sampling algorithm, which, for triples whose relation is not one-to-one, replaces the head or tail entity with a certain probability. For each relation in the patent knowledge graph, the existing triples are used to count two averages under that relation: the average number of tail entities per head entity, $N_{tp}$, and the average number of head entities per tail entity, $N_{hp}$. The probability $p$ of replacing the head entity is calculated as:

The choice of which entity to replace then follows a Bernoulli distribution with parameter $p$. Letting $X$ denote the replaced entity, the distribution law of $X$ is:

$P\{X = x\} = p^{x}(1-p)^{1-x}$

where $x \in \{0, 1\}$; $x = 1$ means the head entity is replaced, and $x = 0$ means the tail entity is replaced.

With the improved negative sampling algorithm, entities are no longer replaced at random, which largely avoids generating too many erroneous negative samples. The relatively complex semantic associations among the original correct triples are thus preserved, and the vectors learned by the TransE model better reflect the actual data. The improved negative sampling process is shown in FIG. 3.

The patent knowledge graph is vectorized with the TransE model to obtain a vectorized representation of the invention name of each patent.
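The improved negative sampling can be sketched as follows. The closed form of $p$ is not reproduced in the text above. The code therefore assumes the standard Bernoulli rule introduced with TransH (Wang et al.), $p = N_{tp} / (N_{tp} + N_{hp})$. This rule replaces the head more often for one-to-many relations and the tail more often for many-to-one relations, which is consistent with $x = 1$ meaning "replace the head". In the spirit of the (h, r, t') example, the sampler also rejects candidates that are already positive triples. The function names, the retry limit and the toy data are hypothetical.

```python
import random
from collections import defaultdict

def bernoulli_head_probs(triples):
    """Per relation: p = N_tp / (N_tp + N_hp) (assumed standard Bernoulli rule)."""
    tails_per_head = defaultdict(set)  # (r, h) -> {t}
    heads_per_tail = defaultdict(set)  # (r, t) -> {h}
    for h, r, t in triples:
        tails_per_head[(r, h)].add(t)
        heads_per_tail[(r, t)].add(h)
    probs = {}
    for rel in {r for _, r, _ in triples}:
        tph = [len(ts) for (r, _), ts in tails_per_head.items() if r == rel]
        hpt = [len(hs) for (r, _), hs in heads_per_tail.items() if r == rel]
        n_tp, n_hp = sum(tph) / len(tph), sum(hpt) / len(hpt)
        probs[rel] = n_tp / (n_tp + n_hp)  # P(X = 1): replace the head entity
    return probs

def bernoulli_negative(triple, probs, entities, known, rng, max_tries=100):
    """Corrupt the head (prob p) or the tail (prob 1 - p); skip known positive triples."""
    h, r, t = triple
    for _ in range(max_tries):
        e = rng.choice(entities)
        neg = (e, r, t) if rng.random() < probs[r] else (h, r, e)
        if neg not in known:
            return neg
    return None  # no valid negative found within max_tries

triples = [("Applicant A", "application", p) for p in ("P1", "P2", "P3")]  # one-to-many
entities = ["Applicant A", "Applicant B", "P1", "P2", "P3"]
probs = bernoulli_head_probs(triples)  # N_tp = 3, N_hp = 1, so p = 0.75
neg = bernoulli_negative(triples[0], probs, entities, set(triples), random.Random(0))
```

For this one-applicant, three-patent relation, the head is replaced three times out of four on average. This rarely turns a true (applicant, patent) pair into a false negative.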

Step S3, similarity calculation

After the vectorized representation of each invention name has been obtained in step S2, these representations are used to calculate the Euclidean distance $d(I_i, I_j)$ between the invention-name entity of the query patent and each other invention-name entity in the patent domain database:

$d(I_i, I_j) = \sqrt{\sum_{k=1}^{d} (E_{ki} - E_{kj})^2}$

The Euclidean distance is a non-negative number. It is normalized to the interval (0, 1] to obtain the similarity $\mathrm{sim}(I_i, I_j)_{KG}$, which is calculated as follows:

According to this formula, the closer the calculated value is to 1, the closer the semantics of the two patent entities and the higher their similarity.

After the similarity calculation, all similarities are sorted in descending order. The M patents (for example, M = 10, 20 or 30) most similar to the invention-name entity of the query patent are taken as recommended similar patents.
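The distance, normalization and top-M selection of step S3 can be sketched as below. The normalization formula itself is not reproduced above, so `sim = 1 / (1 + d)` is an assumption. It maps a distance in $[0, \infty)$ onto $(0, 1]$ and gives 1 for identical vectors, which matches the behaviour described. Vectors are plain lists, and all names and values are hypothetical.

```python
import math

def euclidean(u, v):
    """d(I_i, I_j) = sqrt(sum_k (E_ki - E_kj)^2)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def similarity(u, v):
    """Assumed normalization of the distance into (0, 1]."""
    return 1.0 / (1.0 + euclidean(u, v))

def top_m_similar(query_id, vectors, m=10):
    """Return the m (patent_id, similarity) pairs most similar to the query patent."""
    q = vectors[query_id]
    scored = [(pid, similarity(q, v)) for pid, v in vectors.items() if pid != query_id]
    scored.sort(key=lambda kv: kv[1], reverse=True)  # descending similarity
    return scored[:m]

vectors = {"Q": [0.0, 0.0], "P1": [3.0, 4.0], "P2": [1.0, 0.0], "P3": [0.0, 2.0]}
top = top_m_similar("Q", vectors, m=2)  # [('P2', 0.5), ('P3', 0.333...)]
```

Any strictly decreasing map from distance to similarity produces the same ranking, so the choice of normalization affects the reported scores but not which M patents are selected.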

Step S4, IPC classification number recommendation

The IPC classification numbers of the M recommended similar patents are collected, and the number of occurrences of each IPC classification number among them is counted. The N IPC classification numbers with the most occurrences are taken as the recommended IPC classification numbers.
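The counting in step S4 is a frequency count over the IPC codes of the M similar patents. In the sketch below, each similar patent contributes a list of codes, either all of its IPC codes or only its main one. For illustration, the sketch is applied to the ten main classification numbers listed in Table 5. Ties are broken by first appearance; this is an assumption, since the text does not specify tie-breaking. The function name is hypothetical.

```python
from collections import Counter

def recommend_ipc(similar_patent_codes, n=3):
    """Return the n most frequent IPC codes among the similar patents' code lists."""
    counts = Counter(code for codes in similar_patent_codes for code in codes)
    # Counter.most_common orders equal counts by first appearance.
    return [code for code, _ in counts.most_common(n)]

# Main classification numbers of the ten patents in Table 5, in rank order.
table5 = ["A61K9/19", "A61K31/495", "A61K9/06", "A61K9/19", "A61K9/19",
          "A61K9/08", "A61K9/36", "A61K9/19", "A61K9/19", "A61K9/19"]
recommended = recommend_ipc([[code] for code in table5], n=1)  # ['A61K9/19']
```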

As an example, take the invention patent entitled "an esomeprazole sodium freeze-dried preparation for injection and a preparation method thereof", whose main classification number is A61K9/19. This patent is used as the query patent, and recommendations are made with both the existing Content-Based Patent Recommendation algorithm (CB-PR) and the knowledge-graph-based patent IPC classification number recommendation method of this embodiment.

Table 5 shows the top 10 patents most similar in content to the query patent, obtained with the content-based patent recommendation algorithm.

Table 5 Top 10 similar patents obtained with the content-based patent recommendation algorithm

Rank	Title	Main classification number
1	Azacitidine freeze-dried preparation for injection and preparation method thereof	A61K9/19
2	Novel application of TRPML1 specific small molecule inhibitor ML-SI3	A61K31/495
3	Active oxygen responsive gel storage and preparation method and application thereof	A61K9/06
4	Fludarabine phosphate freeze-drying agent and preparation method thereof	A61K9/19
5	Lansoprazole freeze-dried preparation for injection and preparation method thereof	A61K9/19
6	Injection of forsythin and forsythiaside and derivatives thereof for children	A61K9/08
7	Tadalafil enteric-coated tablet and preparation method thereof	A61K9/36
8	Freeze-drying process of bortezomib freeze-dried powder injection for injection	A61K9/19
9	Azacitidine freeze-dried powder injection and preparation method thereof	A61K9/19
10	Somatostatin freeze-dried powder injection pharmaceutical composition and preparation method thereof	A61K9/19

As Table 5 shows, only 6 of the top 10 patents most similar in content to the query patent have the main classification number A61K9/19; the main classification numbers A61K31/495, A61K9/06, A61K9/08 and A61K9/36 each occur once.

Table 6 shows the top 10 (M = 10) patents most similar to the query patent, obtained with the knowledge-graph-based patent IPC classification number recommendation method of this embodiment.

Table 6 Top 10 similar patents obtained with the method of this embodiment

As Table 6 shows, among the top 10 patents recommended by the knowledge-graph-based method of this embodiment, IPC classification number A61K9/19 occurs 9 times and A61K31/56 occurs once. With N = 1, A61K9/19 is taken as the recommended main classification number.

The main classification numbers of the top 10 recommended patents show that the accuracy of the knowledge-graph-based method of this embodiment is clearly higher than that of the content-based patent recommendation algorithm: 9 of 10 patents, versus 6 of 10, share the query patent's main classification number A61K9/19.

The knowledge-graph-based method of this embodiment was used to recommend the main classification number of the invention entitled "an esomeprazole sodium freeze-dried preparation for injection and a preparation method thereof" with N = 1 and M = 10, 20, 30, 50 and 100. The recommendation results are shown in Table 7.

Table 7 IPC classification number recommendations with N = 1 for different values of M

Value of M	Recommended IPC classification number
10	A61K9/19
20	A61K9/19
30	A61K9/19
50	A61K9/19
100	A61K9/19

To verify accuracy, 100 published patents were selected as query patents for both the content-based patent recommendation algorithm and the knowledge-graph-based method of this embodiment. The patents ranked in the top 10, 20, 30, 50 and 100 by similarity were taken in turn as the similar-patent recommendations, with the IPC classification number representing the technical field. The recommendations of both methods were compared with the actual technical fields, and the proportion of correct results was calculated. The prediction accuracy of the two methods for the technical fields of the 100 query patents is shown in FIG. 4. As FIG. 4 shows, the prediction accuracy of the knowledge-graph-based patent IPC classification number recommendation method (KG-PR) is 20% higher than that of the content-based patent recommendation algorithm (CB-PR). The KG-PR algorithm is therefore more practical for recommending the technical field of a patent, which further shows that the patent knowledge graph constructed in this embodiment is effective for this task.

A patent usually relates to more than one technical field. Providing the user with several IPC classification numbers as the prediction result saves the time the user spends determining a patent's technical field. It also determines the technical field of each patent more accurately and makes it easier for the user to analyze the patent from multiple perspectives. In this embodiment, in line with users' actual needs, the three (N = 3) IPC classification numbers that occur most frequently among the 30 (M = 30) recommended similar patents are recommended to the user as the prediction result for each query patent. Statistics on these predictions are as follows. Comparing the recommended IPC classification numbers with the actual main IPC classification number at the subgroup level gives a technical-field prediction accuracy of 78% for the 100 query patents. Comparing them at the main-group level gives an accuracy of 98%.
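The two accuracy figures compare the top three recommended codes with the actual main IPC classification number at two granularities. The sketch below counts a query as correct when the actual main code appears among the recommendations. The comparison is made either on full codes or after truncating every code to its main group (for example, A61K9/19 → A61K9/00). Treating the two levels as the full (subgroup) code and the IPC main group is an interpretation. The function names and data are hypothetical.

```python
def main_group(code):
    """Truncate an IPC code to its main group, e.g. 'A61K9/19' -> 'A61K9/00'."""
    return code.split("/")[0] + "/00"

def accuracy(predictions, actual_main, level="subgroup"):
    """Share of queries whose actual main IPC code appears among the recommended codes."""
    key = main_group if level == "main_group" else (lambda code: code)
    hits = sum(
        key(actual_main[q]) in {key(c) for c in codes} for q, codes in predictions.items()
    )
    return hits / len(predictions)

predictions = {  # hypothetical top-3 recommendations (N = 3)
    "q1": ["A61K9/19", "A61K9/08", "A61K31/56"],
    "q2": ["A61K9/14", "A61K9/16", "A61K9/20"],
}
actual_main = {"q1": "A61K9/19", "q2": "A61K9/19"}
subgroup_acc = accuracy(predictions, actual_main)                  # 0.5
main_group_acc = accuracy(predictions, actual_main, "main_group")  # 1.0
```

The coarser main-group comparison can only match more queries, never fewer, which is consistent with the 98% figure exceeding the 78% figure.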

The knowledge-graph-based method was used to predict the technical fields of query patents, with 100 patents selected as query patents. For each query patent, the patents were sorted in descending order of semantic similarity, and the top 30 were selected as recommended similar patents. The occurrences of the IPC classification numbers among these 30 patents were then counted to predict the technical field. The predictions for the 100 patents were compared with the actual technical fields to obtain the accuracy in two cases: recommending a single main IPC classification number and recommending several IPC classification numbers. The data are shown in Table 8.

Table 8 Technical-field recommendation accuracy for 100 patents

As Table 8 shows, recommending three IPC classification numbers for each query patent greatly improves the accuracy of technical-field prediction.

The second embodiment is as follows: as shown in FIG. 5, this embodiment provides a knowledge-graph-based patent IPC classification number recommendation device, comprising:

a patent knowledge graph construction module 1 configured to construct a patent knowledge graph including entities of a query patent and several patents having the same technical field as the query patent and relationships between the entities, the entities including an applicant, an inventor, an IPC classification number, an invention name, and keywords;

an entity vectorization module 2 configured to vectorize the entities in the patent knowledge graph with a TransE model to obtain a vectorized representation of the invention name of each patent in the patent knowledge graph;

a similarity calculation module 3 configured to calculate the similarity between the query patent and each patent in the database using the vectorized representations of the invention names, and to take the M patents with the highest similarity to the query patent as recommended similar patents; and

an IPC classification number recommendation module 4 configured to count the number of occurrences of the IPC classification numbers of all the recommended similar patents, and to take the N IPC classification numbers with the most occurrences as the recommended IPC classification numbers.

As a preferred embodiment of the present application, the patent knowledge graph construction module 1 includes:

a patent domain database construction sub-module 11 configured to retrieve a plurality of patents having the same technical field as the query patent from a patent retrieval database, and merge the plurality of patents and the query patent into a patent domain database;

an entity extraction sub-module 12 configured to extract an applicant, an inventor, an IPC classification number, an invention name, and a keyword of each patent in the patent domain database as an entity; and

a patent knowledge graph construction sub-module 13 configured to save the entities of each patent and the relationships between the entities into a Neo4j database to form a patent knowledge graph.

As a preferred embodiment of the present application, the similarity is calculated from the Euclidean distance between the query patent and each patent in the patent knowledge graph, using the vectorized representations of the invention names.

As a preferred embodiment of the present application, M ≥ 10.

As a preferred embodiment of the present application, N has a value of 3.

The principle and effects of the knowledge-graph-based patent IPC classification number recommendation device of this embodiment are the same as those of the knowledge-graph-based method of the first embodiment and are not repeated here.

The above embodiments may be implemented wholly or partly by software, hardware, firmware or any combination thereof. When implemented in software, they may be implemented wholly or partly in the form of a computer program product. The computer program product comprises one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are produced wholly or partly. The computer may be a general-purpose computer, a special-purpose computer, a computer network or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, they may be transmitted from one website, computer, server or data center to another website, computer, server or data center by wired means (e.g., coaxial cable, optical fiber or Digital Subscriber Line (DSL)) or wireless means (e.g., infrared, radio or microwave). The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, hard disk or magnetic tape), an optical medium (e.g., a DVD) or a semiconductor medium (e.g., a Solid State Disk (SSD)), among others.

Those of skill would further appreciate that the various illustrative components and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.

Those skilled in the art will understand that all or part of the steps of the methods in the above embodiments may be implemented by a program, which may be stored in a computer-readable storage medium. The storage medium is a non-transitory medium, such as random access memory, read-only memory, flash memory, a hard disk, a solid state disk, a magnetic tape, a floppy disk, an optical disk or any combination thereof.

The above description is only for the preferred embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present application should be covered within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

1. A patent IPC classification number recommendation method based on knowledge graph is characterized by comprising the following steps:

2. The method of claim 1, wherein the constructing a patent knowledge graph comprises:

3. The method according to claim 1 or 2, wherein the similarity is calculated from the Euclidean distance between the query patent and each patent in the patent knowledge graph, using the vectorized representations of the invention names.

4. The method of claim 1, wherein M ≥ 10.

5. The method of claim 1, wherein N has a value of 3.

6. A patent IPC classification number recommendation device based on knowledge graph is characterized by comprising:

a patent knowledge graph construction module configured to construct a patent knowledge graph including entities of a query patent and several patents having the same technical field as the query patent and relationships between the entities, the entities including an applicant, an inventor, an IPC classification number, an invention name, and keywords;

an entity vectorization module configured to vectorize the entities in the patent knowledge graph with a TransE model to obtain a vectorized representation of the invention name of each patent in the patent knowledge graph;

a similarity calculation module configured to calculate the similarity between the query patent and each patent in a database using the vectorized representations of the invention names, and to take the M patents with the highest similarity to the query patent as recommended similar patents; and

7. The apparatus of claim 6, wherein the patent knowledge graph construction module comprises:

and a patent knowledge graph construction sub-module configured to store the entities of each patent and the relationships among the entities in a Neo4j database to form a patent knowledge graph.

8. The apparatus according to claim 6 or 7, wherein the similarity is calculated from the Euclidean distance between the query patent and each patent in the patent knowledge graph, using the vectorized representations of the invention names.

9. The device of claim 6, wherein M ≥ 10.

10. The apparatus of claim 6, wherein N has a value of 3.