CN112307009A - Method for inquiring technical digital assets - Google Patents

Info

Publication number
CN112307009A
CN112307009A CN201910684857.2A CN201910684857A CN112307009A CN 112307009 A CN112307009 A CN 112307009A CN 201910684857 A CN201910684857 A CN 201910684857A CN 112307009 A CN112307009 A CN 112307009A
Authority
CN
China
Prior art keywords
technical
code
scheme
target
patent classification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910684857.2A
Other languages
Chinese (zh)
Other versions
CN112307009B (en
Inventor
白杰
李冬云
吴先锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Aowei Information Technology Jiangsu Co ltd
Original Assignee
Aowei Information Technology Jiangsu Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Aowei Information Technology Jiangsu Co ltd filed Critical Aowei Information Technology Jiangsu Co ltd
Priority to CN201910684857.2A priority Critical patent/CN112307009B/en
Priority to PCT/CN2020/094407 priority patent/WO2021017640A1/en
Priority to FR2007744A priority patent/FR3099601A1/en
Publication of CN112307009A publication Critical patent/CN112307009A/en
Application granted granted Critical
Publication of CN112307009B publication Critical patent/CN112307009B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/242Query formulation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2455Query execution
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28Databases characterised by their database models, e.g. relational or object models
    • G06F16/284Relational databases
    • G06F16/285Clustering or classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/04Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10Services
    • G06Q50/18Legal services
    • G06Q50/184Intellectual property management
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Technology Law (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Tourism & Hospitality (AREA)
  • Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Human Resources & Organizations (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • General Health & Medical Sciences (AREA)
  • Operations Research (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Computational Linguistics (AREA)
  • Development Economics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The method calculates a similarity index between a target technical scheme set A, obtained from a query request and its corresponding target digital asset technical description file, and the technical scheme subset B corresponding to all technical points of each digital asset data packet in a set of digital asset data packets to be detected. The data packets in that set are then sorted and output according to the similarity index, so that the user can readily obtain the digital asset data packet closest to their requirements. The present application enables querying for a technical system within a set of technical systems, each of which may comprise arbitrarily complex technologies in different directions or domains.

Description

Method for inquiring technical digital assets
Technical Field
The application relates to the field of internet data processing, in particular to a technical digital asset query method.
Background
In a digital asset financial transaction center, tradable digital assets, namely authenticated digital assets, are stored in the center's digital asset registration platform. The platform manages and uses the digital asset information before or after a transaction, for example by sharing information with other data platforms or by querying or verifying, on request, the digital assets stored on the platform.
Digital assets are usually digitized assets with intellectual achievements at their core. They mainly comprise technical, design, and expression digital assets: technical digital assets such as patent digital assets, design digital assets such as copyright or appearance digital assets, and expression digital assets such as trademark digital assets. A technical digital asset, such as a patent digital asset, may be composed of a plurality of patents having competitive and complementary relationships, together with their accompanying or dependent parent assets. FIG. 1 is a diagram of an application scenario of a digital asset registration platform. In the figure, a user accesses a digital asset registration platform 2 over the internet through software such as clients 11 and 12 or an app installed on a terminal 1. Generally, an authenticated digital asset registered by the digital asset registration platform 2 takes the form of a data package; see FIG. 2, which shows an example of the structure and content of an authenticated digital asset data package. The data package comprises two parts, a digital asset bibliographic item 21 and a digital asset entity 22. The digital asset bibliographic item 21 is stored in the digital asset registration platform 2, and the digital asset entity 22 is stored, in centralized or decentralized form, on a local server or a third-party server 3. If the digital asset registration platform 2 is a public chain node or a private sub-chain node of the blockchain network 4, the digital asset entity 22 is also stored in the private sub-chain of the blockchain. Further, the digital asset bibliographic item 21 includes a digital asset entry 211 and a digital asset technical description 212. Taking a patent digital asset as an example, the digital asset entry 211 may include one or more items, such as the certificate code data and bibliographic data of hundreds of competitive or complementary patents.
The digital asset technical description 212 is a comprehensive description of the technical, legal, and market information of the former, in a form that includes an abstract and a detailed description.
In practice, a user often needs to use the client software 11, 12 to browse and query the digital asset information stored in the digital asset registration platform 2. For technical digital assets, the purpose of the query is to find a target patent asset package. However, a patent asset package typically includes a plurality of patents or patent combinations of disparate nature. A patent asset package for an engine, for example, may include structural, material, control, software, and even chemical patents, which makes content-based querying of the package as a whole difficult. Take a keyword-based patent query method as an example. The user provides either a description of the required technical schemes or a set of keywords (in the former case, the keywords of the technical schemes must first be extracted). For each keyword in the set, keywords having a superordinate, equivalent, or subordinate relationship to it are obtained to form a new keyword set. Several search formulas are then generated from the keywords in the new set according to experience, the formulas are combined according to experience and the query results, and the results finally obtained are screened manually to arrive at the target results. Such a keyword-driven method is inefficient and cannot be applied when the object of the overall query is a patent digital asset comprising hundreds of patents, all the more so because the number, types, and contents of the patents in such a set are uncertain. There are also semantic-based digital file similarity query methods, which are suitable for comparing two definite texts. For a query whose target is a collection of an indefinite number of patents, i.e. a patent digital asset data package whose bibliographic items and contents are stored separately, a semantic-based method is still unsuitable or cannot be used at all. In addition, if the technical digital asset entity itself is taken as the operation object, query efficiency drops and resource usage becomes excessive.
Disclosure of Invention
Based on the above technical problems, the present application aims to provide a method for efficiently querying a target digital patent asset package in an automated manner by using the entire digital patent asset package as a query object.
The first technical digital asset query method provided by the application comprises the following steps:
acquiring a query request and a corresponding target digital asset technical description file, wherein the technical description file comprises the technical schemes corresponding to all technical key points of a target digital asset, and obtaining from it a target technical scheme set A;
acquiring a set of digital asset data packets to be detected, and determining a technical scheme subset B corresponding to all technical key points of each digital asset data packet in the set;
calculating a similarity index of the target technical scheme set A and each technical scheme subset B;
and sorting and outputting the data packets in the digital asset data packet set to be detected according to the similarity index.
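The four steps above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names `similarity_index` and `rank_packets`, the dictionary layout of the packets, and the toy word-overlap similarity are all assumptions made for the example. Aggregating pairwise similarities by summation follows the worked example in the detailed description.

```python
from typing import Callable, Dict, List, Tuple

Similarity = Callable[[str, str], float]


def similarity_index(target_set: List[str], subset: List[str], sim: Similarity) -> float:
    """Sum the pairwise similarities between every scheme in target set A
    and every scheme in one packet's subset B."""
    return sum(sim(a, b) for a in target_set for b in subset)


def rank_packets(
    target_set: List[str], packets: Dict[str, List[str]], sim: Similarity
) -> List[Tuple[str, float]]:
    """Return (packet id, similarity index) pairs, most similar first."""
    scored = [(pid, similarity_index(target_set, subset, sim)) for pid, subset in packets.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


def shared_words(a: str, b: str) -> float:
    """Toy similarity (hypothetical): number of distinct words in common."""
    return float(len(set(a.lower().split()) & set(b.lower().split())))


# Hypothetical data: two target schemes and two candidate packets.
target_set_a = ["engine piston cooling", "fuel injection control"]
packets = {
    "packet-1": ["piston ring material", "engine cooling jacket"],
    "packet-2": ["trademark logo design"],
}
ranking = rank_packets(target_set_a, packets, shared_words)
print(ranking)  # [('packet-1', 3.0), ('packet-2', 0.0)]
```

Passing the similarity function as a parameter keeps the ranking logic independent of whether a keyword-based or a semantic-based measure is used.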
The second technical digital asset query method provided by the application comprises the following steps:
acquiring a query request and a corresponding target digital asset technical description file, wherein the technical description file comprises the technical schemes corresponding to all technical key points of a target digital asset, and obtaining from it a target technical scheme set A;
acquiring a set of digital asset data packets to be detected, and determining a technical scheme subset B corresponding to all technical key points of each digital asset data packet in the set;
determining a patent classification number set A corresponding to the target technical scheme set A and a patent classification number set B corresponding to each technical scheme subset B;
calculating the similarity index of the target technical scheme set A and each technical scheme subset B according to the patent classification number set A and each patent classification number set B;
and sorting and outputting the data packets in the digital asset data packet set according to the similarity index.
In the present application, the relative similarity between two technical schemes is calculated from the technical classification of the technical schemes of the digital asset data packets and from the information that classification conveys, such as technical direction and field. The relative similarity between two technical scheme sets is then derived, so that a digital asset data packet can be queried online as a whole. The most distinctive feature of the application is that relative, fuzzy, or inexact similarity indexes between individual technical schemes are used to query, and to express quantitatively, a data packet or technical system within a set of technical systems, that is, within a set of data packets each consisting of a plurality of technical schemes in different fields. This overcomes the limitation of the traditional query approach, which relies solely on the keywords of a technical scheme as clues.
Drawings
In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings used in the embodiments are briefly described below. The drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a diagram of an application scenario of a digital asset registration platform;
FIG. 2 is an exemplary diagram of the structure and content of an authenticated digital asset package;
FIG. 3 is a flow chart of a first method for querying technical digital assets according to an embodiment of the present application;
FIG. 4 is an exemplary diagram of a first way, used in the process of FIG. 3, of calculating the similarity index between a target solution set A and a solution subset B;
FIG. 5 is an exemplary diagram of calculating the similarity between a target solution set A and a solution subset B as used in the example of FIG. 4;
FIG. 6 is a flow chart of a second method, used in the process of FIG. 3, for calculating the similarity index between a target solution set A and a solution subset B;
FIG. 7 is a flow chart of a second method for querying technical digital assets according to an embodiment of the present application;
FIG. 8 is a flow chart of a first method, used in the process of FIG. 7, for calculating the similarity index between a target solution set A and a solution subset B.
Detailed Description
As can be seen from FIG. 2, querying technical digital assets with the digital asset entity 22 as the operation object is almost infeasible: the data volume of the entity is uncertain, and it may be stored in multiple network databases spread across geographic locations.
A technical system is an organic collection of technical solutions of different levels and different content. These solutions may belong to different fields or disciplines and may or may not be related to one another. For example, the technical solutions of an engine system may involve mechanical, material, circuit, and software solutions that, viewed as technical solutions, have no direct relationship with each other. Furthermore, a technical solution may be applicable in different technical systems, so that a technical solution considered by itself may not reflect any particular technical system at all. We therefore cannot judge the overall nature of a technical system from a specific individual technical solution; it is common knowledge that a part cannot stand in for the whole. As a result, using the information of individual technical solutions to judge the degree of similarity or competitiveness of two technical systems faces great difficulties, both conceptually and operationally.
For many reasons, a technical system theoretically admits countless descriptions, which may even appear to describe different technical systems. Nevertheless, the degree of similarity or competitiveness of two technical systems can still be reflected by certain information. For example, the more similar two technical systems are as a whole, the more that similarity shows at higher levels of abstraction; the more similar they are locally, the more it shows at lower levels of abstraction. This gives us the opportunity to judge the degree of similarity or competitiveness of two technical systems through multiple generalized descriptions of a technical system at different levels of abstraction.
FIG. 3 is a flowchart of a first method for querying technical digital assets according to an embodiment of the present application.
According to FIG. 3, first, in step 31, a query request sent by a valid client is obtained. The query request includes a target digital asset technical description file containing one or more technical solutions, which together form a target technical solution set A. Each technical solution in the set corresponds to a target technical point and describes the technical characteristics of the digital asset data package to be queried. The file may take any form conducive to describing the package clearly, such as a WORD or PDF document, and may follow the format of a patent document. In addition, the query request may include query conditions to narrow the query scope.
At step 32, a set of digital asset data packets to be detected may be obtained on the system platform or in the blockchain network according to the query condition. For each digital asset data packet in the set, a technical scheme subset B corresponding to all technical points of the digital asset data packet can be obtained through the technical description part or the technical document of the digital asset data packet.
At step 33, a similarity index is calculated for the target solution set A and each solution subset B. The similarity index represents the overall degree of similarity between each digital asset data packet in the set to be detected and the technical solutions given in the target digital asset technical description file of the query request. Finally, in step 34, the data packets in the digital asset data packet set to be detected are reordered according to the similarity index and output, thereby completing the query of the technical digital assets.
The similarity index between the target solution set A and a solution subset B in step 33 may be calculated using the following substeps; refer to FIG. 4, an exemplary diagram of the first way of calculating the similarity index between the target solution set A and a solution subset B used in step 33.
According to FIG. 4, each technical solution 11, 12, 13 in the target technical solution set 41 is determined. Each digital asset data packet 421, 422, 423 in the digital asset data packet set 42 to be detected is then determined one by one, and the technical solutions in the subset corresponding to each packet are determined from the technical description part or document in that packet. Specifically, the technical solution subset of packet 421 contains solutions 211 and 212; that of packet 422 contains solutions 221, 222, 223, and 224; and that of packet 423 contains solutions 231, 232, and 233. Then the similarity between each solution 11, 12, 13 in the target solution set 41 and each solution in the solution subset of each data packet 421, 422, 423 is calculated.
Specifically, the following calculations are performed:
(I) Calculating the scheme similarities. The calculation can be performed in various orders, for example as follows.
1. Calculating the similarities a11-211 and a11-212 between technical scheme 11 and technical schemes 211 and 212 of data packet 421; the similarities a11-221, a11-222, a11-223, and a11-224 between technical scheme 11 and technical schemes 221, 222, 223, and 224; and the similarities a11-231, a11-232, and a11-233 between technical scheme 11 and technical schemes 231, 232, and 233. See data set 43 in FIG. 4.
2. Calculating the similarities a12-211 and a12-212 between technical scheme 12 and technical schemes 211 and 212 of data packet 421; the similarities a12-221, a12-222, a12-223, and a12-224 between technical scheme 12 and technical schemes 221, 222, 223, and 224; and the similarities a12-231, a12-232, and a12-233 between technical scheme 12 and technical schemes 231, 232, and 233. See data set 44 in FIG. 4.
3. Calculating the similarities a13-211 and a13-212 between technical scheme 13 and technical schemes 211 and 212 of data packet 421; the similarities a13-221, a13-222, a13-223, and a13-224 between technical scheme 13 and technical schemes 221, 222, 223, and 224; and the similarities a13-231, a13-232, and a13-233 between technical scheme 13 and technical schemes 231, 232, and 233. See data set 45 in FIG. 4.
(II) Calculating the maximum similarity of the schemes and the similarity index between the technical scheme sets.
1. Calculating a solution maximum similarity of each solution of the target solution set 41 to a solution subset of the data packet 421, and calculating a similarity index of the target solution set 41 to the solution subset of the data packet 421.
(1) Calculating the maximum similarities A11, A12, and A13 between technical schemes 11, 12, and 13 and the technical scheme subset of data packet 421, wherein:
A11 = a11-211 + a11-212; A12 = a12-211 + a12-212; A13 = a13-211 + a13-212;
(2) Calculating the similarity index X11 between the target solution set 41 and the solution subset of data packet 421, wherein:
X11 = A11 + A12 + A13.
2. Calculating the maximum similarity of each solution of the target solution set 41 to the solution subset of data packet 422, and calculating the similarity index of the target solution set 41 to the solution subset of data packet 422.
(1) Calculating the maximum similarities B11, B12, and B13 between technical schemes 11, 12, and 13 and the solution subset of data packet 422, wherein:
B11 = a11-221 + a11-222 + a11-223 + a11-224; B12 = a12-221 + a12-222 + a12-223 + a12-224; B13 = a13-221 + a13-222 + a13-223 + a13-224;
(2) Calculating the similarity index X12 between the target solution set 41 and the solution subset of data packet 422, wherein:
X12 = B11 + B12 + B13.
3. Calculating the maximum similarity of each solution of the target solution set 41 to the solution subset of data packet 423, and calculating the similarity index of the target solution set 41 to the solution subset of data packet 423.
(1) Calculating the maximum similarities C11, C12, and C13 between technical schemes 11, 12, and 13 and the technical solution subset of data packet 423, wherein:
C11 = a11-231 + a11-232 + a11-233; C12 = a12-231 + a12-232 + a12-233; C13 = a13-231 + a13-232 + a13-233;
(2) Calculating the similarity index X13 between the target technical solution set 41 and the technical solution subset of data packet 423, wherein:
X13 = C11 + C12 + C13.
It can be seen that X11, X12, and X13 are the basis for reordering the data packets in step 34.
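The aggregation in part (II) can be written compactly. The index notation below is an assumption introduced for this summary and does not appear in the source: $i$ ranges over the schemes of the target set 41, $p$ over the data packets to be detected, $B_p$ is the scheme subset of packet $p$, and $a_{i,j}$ is the scheme similarity written a*i*-*j* in the text.

```latex
S_{i,p} = \sum_{j \in B_p} a_{i,j},
\qquad
X_p = \sum_{i \in A} S_{i,p} = \sum_{i \in A} \sum_{j \in B_p} a_{i,j}
```

For packet 421, $S_{11,p}$, $S_{12,p}$, and $S_{13,p}$ are A11, A12, and A13, and $X_p$ is X11; packets 422 and 423 give B11 to B13 with X12 and C11 to C13 with X13 in the same way. Note that the quantity the text calls the "maximum similarity" is computed as a sum over the packet's schemes.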
The scheme similarities in the scheme similarity calculation step above may be calculated with a keyword-based method or with a semantic-based method, yielding the similarity between each technical solution in the target technical solution set A and each technical solution in a technical solution subset B. The keyword-based method is illustrated in FIG. 5, which also shows an example of using these similarities to calculate the maximum similarity of the solutions and the similarity index between the solution sets.
First, each technical scheme 11, 12, and 13 in the target technical scheme set 51 is determined, and all keywords of technical schemes 11, 12, and 13 are extracted and, together with their derivative words, form the target keyword sets H1, H2, and H3 respectively. The derivative words include synonyms, near-synonyms, hypernyms, hyponyms, and the like of the keywords, and repeated keywords are removed from each of H1, H2, and H3. Then each digital asset data packet 521, 522, 523 in the digital asset data packet set 52 to be detected is determined one by one, and the technical schemes in the subset corresponding to each packet are determined from the technical description part or document in that packet. Specifically, the technical scheme subset of packet 521 contains schemes 211 and 212; that of packet 522 contains schemes 221, 222, 223, and 224; and that of packet 523 contains schemes 231, 232, and 233.
Next, the number of occurrences of the keywords of each target keyword set H1, H2, H3 in each data packet 521, 522, 523 is calculated, that is, the number of times the keywords in H1, H2, and H3 appear in each solution in the solution subsets of packets 521, 522, and 523.
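The counting step can be sketched as below. This is only an illustration under assumptions: the function name `keyword_similarity`, the whole-word case-insensitive matching, and the sample text are hypothetical, and the source does not specify how occurrences are tokenized or how derivative words are obtained.

```python
import re
from typing import Iterable


def keyword_similarity(keyword_set: Iterable[str], solution_text: str) -> int:
    """Count how many times the keywords of one target set (e.g. H1) occur
    in the text of one technical solution; the total is the similarity value."""
    total = 0
    for keyword in set(keyword_set):  # repeated keywords are removed, as for H1..H3
        pattern = r"\b" + re.escape(keyword) + r"\b"
        total += len(re.findall(pattern, solution_text, flags=re.IGNORECASE))
    return total


h1 = ["engine", "piston", "engine"]  # hypothetical keywords; the repeat is ignored
text = "The engine drives the piston, and the engine is water-cooled."
print(keyword_similarity(h1, text))  # prints 3
```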
As shown in FIG. 5, the keywords in set H1 appear 10 times in technical solution 211 of data packet 521, so the similarity value is 10, and 15 times in technical solution 212, so the similarity value is 15. In data packet 522 they appear 20 times in technical solution 221 (similarity value 20), 15 times in technical solution 222 (15), 30 times in technical solution 223 (30), and 5 times in technical solution 224 (5). In data packet 523 they appear 0 times in technical solution 231 (similarity value 0), 5 times in technical solution 232 (5), and 2 times in technical solution 233 (2).
The keywords in set H2 appear 5 times in technical solution 211 of data packet 521 (similarity value 5) and 15 times in technical solution 212 (15). In data packet 522 they appear 5 times in technical solution 221 (5), 10 times in technical solution 222 (10), 20 times in technical solution 223 (20), and 10 times in technical solution 224 (10). In data packet 523 they appear 5 times in each of technical solutions 231, 232, and 233 (similarity value 5 each).
The keywords in set H3 appear 10 times in technical solution 211 of data packet 521 (similarity value 10) and 20 times in technical solution 212 (20). In data packet 522 they appear 25 times in technical solution 221 (25), 15 times in technical solution 222 (15), 5 times in technical solution 223 (5), and 5 times in technical solution 224 (5). In data packet 523 they appear 10 times in technical solution 231 (10), 5 times in technical solution 232 (5), and 5 times in technical solution 233 (5).
Adding the numbers of occurrences, 10 and 15, of the keywords of target keyword set H1 in technical solutions 211 and 212 of data packet 521 gives the maximum similarity between technical solution 11 in the target technical solution set 51 and digital asset data packet 521 in the set 52 to be detected. In this example that value is 25, i.e. A11 in FIG. 5 is 25.
Similarly, the maximum similarity between technical solution 12 in the target technical solution set 51 and digital asset data packet 521 is 20, i.e. A12 in FIG. 5 is 20, and the maximum similarity A13 between technical solution 13 and digital asset data packet 521 is 30.
Further, the similarity index of the target technical solution set 51 with digital asset data packet 521 in the set 52 to be detected is X11 = A11 + A12 + A13 = 25 + 20 + 30 = 75. The similarity index with digital asset data packet 522 is X12 = B11 + B12 + B13 = 70 + 45 + 50 = 165. The similarity index with digital asset data packet 523 is X13 = C11 + C12 + C13 = 7 + 15 + 20 = 42.
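The arithmetic of this example can be checked with a short script. The occurrence counts are the ones given for FIG. 5 above; the variable names and the dictionary layout are assumptions made for illustration.

```python
# Occurrence counts from the FIG. 5 example: counts[keyword set][packet] lists
# the similarity values for that packet's technical solutions, in order.
counts = {
    "H1": {521: [10, 15], 522: [20, 15, 30, 5], 523: [0, 5, 2]},
    "H2": {521: [5, 15], 522: [5, 10, 20, 10], 523: [5, 5, 5]},
    "H3": {521: [10, 20], 522: [25, 15, 5, 5], 523: [10, 5, 5]},
}
packets = [521, 522, 523]

# Per-scheme totals (A11..A13, B11..B13, C11..C13 in the text).
per_scheme = {h: {p: sum(values[p]) for p in packets} for h, values in counts.items()}

# Similarity index of each packet (X11, X12, X13): sum over the three keyword sets.
index = {p: sum(per_scheme[h][p] for h in counts) for p in packets}

# Packets sorted from most to least similar, as in step 34.
ranking = sorted(packets, key=lambda p: index[p], reverse=True)

print(per_scheme["H1"])  # {521: 25, 522: 70, 523: 7}
print(index)             # {521: 75, 522: 165, 523: 42}
print(ranking)           # [522, 521, 523]
```

On these numbers, data packet 522 would be output first, followed by 521 and 523.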
In other embodiments of the present application, the semantic similarity between technical solutions is calculated using a semantics-based method. Assume that the semantic similarity function is LAN(X1, X2), where X1 is the description document of the first technical file and X2 is the description document of the second technical file; the semantic similarity between technical solution 11 and technical solution 211 is then LAN(technical solution 11, technical solution 211). The similarity index between digital asset data packets can be obtained from the semantic similarity in the same way as above, so the details are not repeated here.
Fig. 6 is a flowchart of a second method employed by the process of fig. 3 for calculating a similarity index between a target solution set a and a solution subset B.
The flow illustrated in fig. 6 shows a general scheme. Its principle is that, to describe a technical system as a whole, the key technical solutions of the system are expressed by a general description at four abstraction levels. More or fewer levels may be used, but not fewer than two; too many levels reduce the efficiency of the method while improving the judgment accuracy only to a limited degree. The degree of similarity or competitiveness of two technical systems can then be judged quickly by counting and comparing the expressions at each level of the key technical solutions of the two systems. Refer to fig. 6.
In step 61, a technical classification rule with four progressive levels is determined or selected. The rule can be designed in advance; when it is used to query technical systems in a specific field, such as the chemical or semiconductor field, a targeted technical classification rule improves the accuracy of retrieval and judgment. In most cases, however, one of the commonly used technical classification rules can be selected with little difference in effect; the most common are the international patent classification rule, the European or U.S. patent classification rules, and the like. The progressive feature refers to the four abstraction levels, which the international patent classification rule and similar rules clearly have. If the rule is self-designed, the following table can serve as a reference. It gives the meaning of a technical classification rule with four abstraction levels, where a smaller level number means a higher level of abstraction:
Table 1: Technical classification rule design table
Level         One                   Two               Three                    Four
Name          Technical direction   Technical field   Professional direction   Professional field
Expression    A-G                   A-Z               A-Z + digits 0-9         A-Z + digits 0-9
Description   1 character           2 characters      3 characters             4 characters
For example, in the code BAFA01A105 of a technical point, B represents the technical direction, AF the technical field, A01 the professional direction, and A105 the professional field.
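Splitting such a code into its four levels can be sketched as follows. The function name and the fixed-width assumption (1, 2, 3 and 4 characters, as in Table 1) are illustrative choices, not part of the original disclosure.

```python
LEVEL_WIDTHS = (1, 2, 3, 4)  # characters per level, as listed in Table 1

def split_levels(code, widths=LEVEL_WIDTHS):
    """Split a classification code into its per-level segments."""
    if len(code) != sum(widths):
        raise ValueError(f"expected {sum(widths)} characters, got {len(code)}")
    parts, pos = [], 0
    for w in widths:
        parts.append(code[pos:pos + w])
        pos += w
    return tuple(parts)

print(split_levels("BAFA01A105"))  # ('B', 'AF', 'A01', 'A105')
```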
Since the design of technical classification rules and the definition of their content are publicly known techniques, they are not described in detail here.
Step 62: select technical points from each of the two technical systems. The technical points are selected according to three principles: comprehensiveness, generality and emphasis. Comprehensiveness means that the selected technical points should cover, or at least take into account, every branch of the structure of the technical system, so that omissions are avoided as far as possible. Generality means that the selected technical points and their descriptions should span multiple levels, so that the set of technical points reflects the overall characteristics of the system. Emphasis means that the key or distinctive innovative technical solutions of the system should be selected wherever possible, to make the system as identifiable as possible. Each technical point in the technical point set A, extracted from a summary of the first technical system, and in the technical point set B, extracted from a summary of the second technical system, is then classified using the technical classification rule, giving the corresponding classification number sets A and B. Each entry in a technical point set is a technical description file of the technical point, containing text, pictures or similar information; it may, for example, take the form of a patent application document. Each entry in a classification number set is the technical classification code corresponding to a technical point file.
In the following steps, the classification number set A, B will be the object of operation.
Step 63: according to the number of classification numbers in classification number set A, 80% of the numbers are selected as operation objects in an arbitrary manner, such as randomly or sequentially, to obtain a new classification number set A (100% is generally selected when the set is small; the number of selected numbers is discussed in detail later). Similarly, 100% of the numbers in classification number set B are selected as operation objects to obtain a new classification number set B.
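The selection in step 63 might look like the sketch below. The function name, the `mode` and `seed` parameters, and rounding the selected count up are all assumptions made for illustration; the description only says that a percentage is selected randomly or sequentially.

```python
import math
import random

def select_numbers(numbers, percent, mode="sequential", seed=None):
    """Pick `percent`% of the classification numbers (0 < percent <= 100)."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    items = list(numbers)
    k = math.ceil(len(items) * percent / 100)  # assumed rounding rule: round up
    if mode == "random":
        return random.Random(seed).sample(items, k)
    return items[:k]  # sequential: take the first k numbers

print(len(select_numbers(range(10), 80)))  # 8
```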
For each number in the new classification number set A, the code at each level indicated by the number is obtained; after duplicates are removed, this gives the level code sets X11, X12, X13 and X14 of all the numbers and the corresponding code counts Y11, Y12, Y13 and Y14. Likewise, for each number in the new classification number set B, the code at each level is obtained and duplicates are removed, giving the level code sets X21, X22, X23 and X24 and the corresponding code counts Y21, Y22, Y23 and Y24. Removing duplicates works as follows. Assume that the first-level codes of all the numbers in the new classification number set A, i.e. the code set X11 representing the technical direction, are:
X11 = {B, A, C, B, D, E, F, D, B}, in which B appears three times and D appears twice. After duplicates are removed, X11 = {B, A, C, D, E, F}, and the corresponding code count Y11 is 6.
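As a small illustration of the duplicate-removal step, the following sketch keeps the first occurrence of each code and counts the result; the function name is hypothetical.

```python
def remove_duplicates(codes):
    """Drop repeated codes, keeping the first occurrence of each."""
    return list(dict.fromkeys(codes))

x11 = remove_duplicates(["B", "A", "C", "B", "D", "E", "F", "D", "B"])
y11 = len(x11)
print(x11, y11)  # ['B', 'A', 'C', 'D', 'E', 'F'] 6
```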
Step 64: from the code sets X11, X12, X13 and X14 and X21, X22, X23 and X24, calculate the number E1 of codes that X11 and X21 have in common, the number E2 that X12 and X22 have in common, the number E3 that X13 and X23 have in common, and the number E4 that X14 and X24 have in common.
For example, if X11 = {B, A, C, D, E, F} and X21 = {B, A, G}, the number of coinciding codes E1 of X11 and X21 is 2.
Step 65: calculate the relative overlap ratios Ai and Bi of each level of the classification number sets A and B, where
for classification number set A, Ai = (Ei/Y1i) × 100%; for classification number set B, Bi = (Ei/Y2i) × 100%.
Steps 66 and 67: from the relative overlap ratios Ai and Bi, calculate the technical correlation index F_A of classification number set A and the technical correlation index F_B of classification number set B, where F_A = ΣCi*Ai and F_B = ΣCi*Bi, and Ci is an empirical constant;
from the correlation indices F_A and F_B, calculate the similarity probabilities G_A and G_B of the classification number sets A and B, where G_A = F_A/(ΣCi) and G_B = F_B/(ΣCi);
G_A is taken as the similarity index of the target technical solution set A with respect to the technical solution subset B, and G_B as the similarity index of the technical solution subset B with respect to the target technical solution set A;
in the above formulas, i = 1 to n, where n is the number of code levels of the technical classification rule; in this example, n = 4.
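Steps 64 to 67 can be sketched together as below. This is a hedged illustration rather than the patented implementation: the function names are hypothetical, results are returned as fractions instead of percentages, and the code at each level is taken to be the segment at that position (as in the BAFA01A105 example), which is an assumption; a variant that compares cumulative prefixes would follow the same structure.

```python
def level_code_sets(numbers, widths=(1, 2, 3, 4)):
    """For each level i, the set of distinct level-i codes across all numbers."""
    sets = [set() for _ in widths]
    for number in numbers:
        pos = 0
        for i, w in enumerate(widths):
            sets[i].add(number[pos:pos + w])
            pos += w
    return sets

def similarity_probabilities(numbers_a, numbers_b, coeffs, widths=(1, 2, 3, 4)):
    """Return (G_A, G_B) as fractions in [0, 1]."""
    if not numbers_a or not numbers_b:
        raise ValueError("both classification number sets must be non-empty")
    f_a = f_b = 0.0
    for c, x1, x2 in zip(coeffs, level_code_sets(numbers_a, widths),
                         level_code_sets(numbers_b, widths)):
        e = len(x1 & x2)        # Ei: codes shared at this level
        f_a += c * e / len(x1)  # Ci * Ai, with Ai = Ei / Y1i
        f_b += c * e / len(x2)  # Ci * Bi, with Bi = Ei / Y2i
    total = sum(coeffs)
    return f_a / total, f_b / total

g_a, g_b = similarity_probabilities(
    ["BAFA01A105", "CAFA01A105"], ["BAFA01A105"], coeffs=(1, 1, 1, 1))
print(g_a, g_b)  # 0.875 1.0
```

The coefficients (1, 1, 1, 1) are placeholders; the description states only that the Ci are empirical constants.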
In the method described in fig. 6, the correlation between two technical systems is characterized by a correlation index. The correlation index formula is of the form:
F=C1*A1+C2*A2+C3*A3+C4*A4。
In the formula, F is the correlation index; A1, A2, A3 and A4 are the overlap ratios of the first-, second-, third- and fourth-level codes of the technical classification codes; and C1, C2, C3 and C4 are the correlation coefficients between those four code levels and the overall properties of the system. Their empirical values are obtained by methods such as machine learning or statistics and indicate how strongly the codes at each level influence the overall properties of the technical system.
The degree of similarity or the degree of collision between the two technical systems is characterized by a probability of similarity or a probability of collision. The similarity probability or collision probability formula is in the form:
T=F/(C1+C2+C3+C4)×100%。
Fig. 7 is a flowchart of a second technical digital asset query method according to an embodiment of the present application.
According to fig. 7, first, in step 71, a query request sent by a legitimate client is obtained. The query request includes a target digital asset technical description file, which contains one or more technical solutions corresponding to all the technical points of the target digital asset; these technical solutions form a target technical solution set A.
At step 72, a set of digital asset data packets to be detected may be obtained on the system platform or in the blockchain network according to the query condition. For each digital asset data packet in the set, a technical scheme subset B corresponding to all technical points of the digital asset data packet can be obtained through the technical description part or the technical document of the digital asset data packet.
In step 73, the patent classification number set A corresponding to all the technical solutions in the target technical solution set A and the patent classification number set B corresponding to all the technical solutions in each technical solution subset B are determined. Since a technical solution may have several patent classification numbers, sets A and B should follow a single inclusion standard: either only the main classification number of each technical solution is included, or all of its classification numbers are. The former improves computational efficiency; the latter improves accuracy when the digital processor has sufficient computing resources.
At step 74, a similarity index between the target solution set a and each solution subset B is calculated according to the patent classification number set a and each patent classification number set B. The similarity index can represent the overall similarity degree between each digital asset data packet in the digital asset data packet set to be detected and the technical scheme given in the target digital asset technical description file in the query request. And finally, in step 75, reordering the data packets in the digital asset data packet set to be detected according to the similarity index and outputting the data packets, thereby realizing the query of the technical digital assets.
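The overall flow of steps 71 to 75 can be sketched as a ranking over candidate data packets. The function name, the mapping of packet identifiers to subsets, the pluggable `similarity_index` callable and the descending sort order are assumptions made for illustration; the description only states that the packets are reordered according to the similarity index.

```python
def query_assets(target_set, packets, similarity_index):
    """Rank candidate data packets by similarity to the target, highest first.

    packets: mapping of packet id -> technical solution subset B.
    similarity_index: callable(target_set, subset) -> number (hypothetical hook).
    """
    scored = [(similarity_index(target_set, subset), pid)
              for pid, subset in packets.items()]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(pid, score) for score, pid in scored]

# Stand-in scores taken from the earlier keyword example (75, 165, 42).
scores = {"521": 75, "522": 165, "523": 42}
packets = {pid: pid for pid in scores}  # placeholder subsets
ranking = query_assets("target", packets, lambda target, subset: scores[subset])
print(ranking)  # [('522', 165), ('521', 75), ('523', 42)]
```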
The method for determining the similarity index of two technical systems used in the embodiment of fig. 7 relies on patent classification rules. For example, the international patent classification numbers recorded in the patent application information of the two technical systems can be used to obtain the overlap in technical fields that those numbers indicate, and the degree of similarity of the two systems can thus be judged as a whole. In other embodiments, any technical classification rule may be used to classify the key or main technical points of the two technical systems; patent classification is only one form of technical classification. The method provided by the present application applies as long as both technical systems classify their key or main technical points according to the same technical classification rule. For example, for two technical systems with patent applications filed in the United States or in Europe, the U.S. or European patent classification numbers may be used to determine the degree of conflict between the two systems according to the method provided in the present application. The following describes specific implementations of other embodiments of the present application, using the International Patent Classification (IPC) as the technical classification rule for the key technical points of a technical system.
The International Patent Classification (IPC) combines function-oriented and application-oriented classification, with functionality as the primary principle and applicability as the secondary one. It is hierarchical: technical content is classified step by step into five levels, namely section, class, subclass, main group and subgroup, which together form a complete classification system. A complete IPC classification number is therefore a combination of symbols representing the section, class, subclass, main group and subgroup.
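The five levels can be read off an IPC symbol as in the sketch below. The function name and the regular expression are illustrative assumptions about the input shape (for example "E21D 15/14" or "E21D15/14"); the code is not taken from the original disclosure and does not validate symbols against the actual IPC scheme.

```python
import re

# Assumed input shape: section letter, two-digit class, subclass letter,
# then "main group/subgroup" digits, with optional whitespace in between.
_IPC = re.compile(r"^([A-H])(\d{2})([A-Z])\s*(\d{1,4})/(\d{2,6})$")

def parse_ipc(symbol):
    """Split a full IPC symbol into its five hierarchical levels."""
    m = _IPC.match(symbol.strip().upper())
    if m is None:
        raise ValueError(f"not a full IPC symbol: {symbol!r}")
    section, cls, sub, main, group = m.groups()
    subclass = f"{section}{cls}{sub}"
    return {
        "section": section,
        "class": f"{section}{cls}",
        "subclass": subclass,
        "main_group": f"{subclass}{main}/00",
        "subgroup": f"{subclass}{main}/{group}",
    }

print(parse_ipc("E21D 15/14")["main_group"])  # E21D15/00
```

For a symbol that is itself a main group, such as E21D15/00, the `main_group` and `subgroup` fields are identical.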
In one embodiment, all five pieces of information are used to determine the degree of similarity or conflict between two technical systems, or between two sets of technical systems. In another embodiment, four of the five pieces of information, namely the class, subclass, main group and subgroup, are used. Similarly, three of the five, namely the subclass, main group and subgroup, may be used. Alternatively, two of the five, namely the main group and subgroup, may be used. Alternatively, only one of the five, namely the subgroup, may be used.
Of these five pieces of information, the section has the broadest scope, and it is used so that no relevant information is omitted; the subgroup has the narrowest scope, and it is used to make the information more precise. Patent classification information can therefore be used in many ways; for example, only the section, subclass, main group and subgroup information may be used to determine the degree of similarity or conflict between two technical systems, or between two sets of technical systems, and so on. The fourth embodiment, which uses three of the five pieces of information, namely the subclass, main group and subgroup, to determine the similarity or degree of conflict of two technical systems, is described further below; the method of this embodiment may be implemented in software.
Specifically, the step of calculating the similarity index between the target solution set a and the solution subset B in step 74 of the flowchart illustrated in fig. 7 may adopt the following sub-steps. Refer to fig. 8. Fig. 8 is a flowchart of a first method employed by the process step 74 for calculating a similarity index between the target solution set a and the solution subset B.
In the process illustrated in fig. 8, each patent application of the two technical systems or technical solution sets is treated as one technical point, and the international patent classification is used as the technical classification rule. Specifically, the technical correlation or similarity between the two technical systems is analyzed on the basis of the subclass, main group and subgroup classification numbers of the IPC classification of the patent applications in set A and set B.
First, in step 81, the IPC numbers in all the patent application information of patent classification number set A and patent classification number set B are obtained to form two IPC number sets, corresponding to sets A and B respectively.
At step 82, the subclass codes, main group codes and subgroup codes indicated by all the international patent classification numbers of the first number set are obtained, and duplicates within each group of codes are removed. This gives a subclass code set B3 (the first column of Table 2, i.e. the IPC subclasses of set A) with b3 = 19 subclass codes (the last row of that column), a main group code set B2 (the first column of Table 3, i.e. the IPC main groups of set A) with b2 = 19 main group codes (the last row of that column), and a subgroup code set B1 (the first column of Table 4, i.e. the IPC subgroups of set A) with b1 = 13 subgroup codes (the last row of that column).
Then, the subclass codes, main group codes and subgroup codes indicated by all the international patent classification numbers of the second number set are obtained, and duplicates within each group of codes are removed. This gives a subclass code set D3 (the second column of Table 2, i.e. the IPC subclasses of set B) with d3 = 10 subclass codes (the last row of that column), a main group code set D2 (the second column of Table 3, i.e. the IPC main groups of set B) with d2 = 10 main group codes (the last row of that column), and a subgroup code set D1 (the second column of Table 4, i.e. the IPC subgroups of set B) with d1 = 5 subgroup codes (the last row of that column).
Table 2: IPC subclass comparison table of set A and set B
IPC subclasses of set A    IPC subclasses of set B    Overlapping IPC subclasses
A41D                       E21C                       B65G
A62D                       E21D                       C02F
B01D                       B23P                       E21C
B01F                       B25B                       E21D
B03B                       E02F                       E21F
B61G                       B65G
B61K                       E21F
B61L                       G06Q
B65G                       C02F
B66D                       E01H
C01B
C01F
C02F
C09K
C25C
E01B
E21C
E21D
E21F
Total: 19 items            Total: 10 items            Overlap: 5 items
Table 3: IPC main group comparison table of set A and set B
IPC main groups of set A   IPC main groups of set B   Overlapping IPC main groups
A41D13/00                  E21C35/00                  E21D15/00
A61F17/00                  E21C41/00
A61J9/00                   E21D15/00
A62D1/00                   B23P19/00
B61K7/00                   B25B27/00
B61L11/00                  E02F9/00
B61L23/00                  E21C33/00
B65G11/00                  E21D20/00
B65G21/00                  E21D23/00
B65G65/00                  E21F13/00
B66B15/00
B66C1/00
B66D1/00
C01B33/00
C02F1/00
C09K3/00
C25C3/00
E21D15/00
E21D19/00
Total: 19 items            Total: 10 items            Overlap: 1 item
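Table 4: IPC subgroup comparison table of set A and set B
[Table 4 is reproduced only as an image in the original publication. According to the description, set A has 13 subgroup codes, set B has 5, and no subgroup codes overlap.]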
It should be noted that in step 82, 100% of the patent classification numbers of set A and set B are selected as analysis objects; in other embodiments, only part of them may be selected. Doing so introduces some error into the result but does not affect the overall judgment, and it makes the method more practical: a judgment can still be made for any technical system even when some of its patent classification numbers are wrong. In addition, setting a selection range allows a better balance between effect and efficiency, which makes the method flexible in use.
In step 83, based on the subclass code sets B3 and D3, the main group code sets B2 and D2, and the subgroup code sets B1 and D1 of the two technical systems obtained in step 82, the number of coinciding subclass codes of the two technical systems is calculated as E3 = 5 (the last row of the third column of Table 2, i.e. the overlapping IPC subclass column), the number of coinciding main group codes as E2 = 1 (the last row of the third column of Table 3, i.e. the overlapping IPC main group column), and the number of coinciding subgroup codes as E1 = 0 (the last row of the third column of Table 4, i.e. the overlapping IPC subgroup column).
In step 84, the subclass, main group and subgroup code overlap ratios of each technical system are calculated from the subclass code counts b3 = 19 and d3 = 10, the main group code counts b2 = 19 and d2 = 10, the subgroup code counts b1 = 13 and d1 = 5, and the overlap counts E3 = 5 (subclass), E2 = 1 (main group) and E1 = 0 (subgroup). For the first technical system, A3 = E3/b3 = 5/19 ≈ 26%, A2 = E2/b2 = 1/19 ≈ 5%, and A1 = E1/b1 = 0/13 = 0%;
for the second technical system, B3 = E3/d3 = 5/10 = 50%, B2 = E2/d2 = 1/10 = 10%, and B1 = E1/d1 = 0/5 = 0%.
In step 85, the patent technology correlation index F of each technical system relative to the other is calculated from the overlap ratios: F_A = C3*A3 + C2*A2 + C1*A1 for the first technical system and F_B = C3*B3 + C2*B2 + C1*B1 for the second, where C3, C2 and C1 are empirical constants. In this example, C3, C2 and C1 are the correlation coefficients between the IPC subclass, main group and subgroup, respectively, and the conflict between the two systems; their empirical values are 1, 2 and 3, respectively.
For the first technical system, F_A = C3*A3 + C2*A2 + C1*A1 = 1*26% + 2*5% + 3*0 = 36%.
For the second technical system, F_B = C3*B3 + C2*B2 + C1*B1 = 1*50% + 2*10% + 3*0 = 70%.
In step 86, the patent conflict probability G of each technical system relative to the other is calculated from the correlation index F:
G_A = F_A/(C3+C2+C1) = 36%/(1+2+3) = 6%, which is the similarity of the first technical system to the second. G_B = F_B/(C3+C2+C1) = 70%/(1+2+3) ≈ 11.7%, which is the similarity of the second technical system to the first.
G_A is the similarity index of the target technical solution set A with respect to the technical solution subset B, and G_B is the similarity index of the technical solution subset B with respect to the target technical solution set A.
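The worked example of steps 82 to 86 can be checked with the short sketch below, which recomputes the overlaps from the codes listed in Tables 2 and 3. Variable names are illustrative. Because Table 4 is available only as an image, the subgroup level uses the stated counts (13, 5 and 0) instead of code sets. The sketch keeps unrounded ratios, so the first system comes out at about 36.8% and 6.1% rather than the rounded 36% and 6%; for the second system, evaluating 1*50% + 2*10% + 3*0 with the stated coefficients gives 70%, and 70%/6 is about 11.7%.

```python
a_sub = set("A41D A62D B01D B01F B03B B61G B61K B61L B65G B66D "
            "C01B C01F C02F C09K C25C E01B E21C E21D E21F".split())
b_sub = set("E21C E21D B23P B25B E02F B65G E21F G06Q C02F E01H".split())
a_main = set("A41D13/00 A61F17/00 A61J9/00 A62D1/00 B61K7/00 B61L11/00 "
             "B61L23/00 B65G11/00 B65G21/00 B65G65/00 B66B15/00 B66C1/00 "
             "B66D1/00 C01B33/00 C02F1/00 C09K3/00 C25C3/00 E21D15/00 "
             "E21D19/00".split())
b_main = set("E21C35/00 E21C41/00 E21D15/00 B23P19/00 B25B27/00 E02F9/00 "
             "E21C33/00 E21D20/00 E21D23/00 E21F13/00".split())

e3 = len(a_sub & b_sub)    # coinciding subclass codes
e2 = len(a_main & b_main)  # coinciding main group codes
e1, n_a1, n_b1 = 0, 13, 5  # subgroup level: counts as stated in the text
c3, c2, c1 = 1, 2, 3       # empirical coefficients from the example

f_a = c3 * e3 / len(a_sub) + c2 * e2 / len(a_main) + c1 * e1 / n_a1
f_b = c3 * e3 / len(b_sub) + c2 * e2 / len(b_main) + c1 * e1 / n_b1
g_a = f_a / (c3 + c2 + c1)
g_b = f_b / (c3 + c2 + c1)
print(e3, e2)                                           # 5 1
print(f"{f_a:.1%} {g_a:.1%} {f_b:.1%} {g_b:.1%}")  # 36.8% 6.1% 70.0% 11.7%
```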

Claims (10)

1. A technical digital asset query method, characterized by comprising the following steps:
acquiring a query request and a corresponding target digital asset technical description file, wherein the technical description file comprises technical schemes corresponding to all technical essential points of a target digital asset, and acquiring a target technical scheme set A;
acquiring a set of digital asset data packets to be detected, and determining a technical scheme subset B corresponding to all technical key points of each digital asset data packet in the set;
calculating a similarity index of the target technical scheme set A and each technical scheme subset B;
and sorting and outputting the data packets in the digital asset data packet set to be detected according to the similarity index.
2. The query method of claim 1, wherein the similarity index between the target solution set a and the solution subset B is calculated according to the following steps:
calculating the scheme similarity of each technical scheme in the target technical scheme set A and each technical scheme in the technical scheme subset B, and summing the scheme similarities to obtain the maximum scheme similarity;
and adding the maximum similarity of each scheme to obtain a similarity index of the target technical scheme set A and the technical scheme subset B.
3. The query method according to claim 2, wherein the solution similarity of each solution in the target solution set a and each solution in the solution subset B is calculated according to the following steps:
acquiring a target keyword set generated by all keywords of each technical scheme in the target technical scheme set A and corresponding derivative words; the target keyword set is a keyword set with repeated keywords removed;
calculating the occurrence frequency of each keyword in the target keyword set in each technical scheme of the technical scheme subset B;
and taking the sum of all the times as the similarity of the target technical scheme A and the technical scheme subset B.
4. The query method of claim 1, wherein the step of calculating the similarity index between the target solution set a and the solution subset B comprises the steps of:
determining or selecting a technical classification rule with at least two levels having progressive features;
taking the target technical scheme set A and the technical scheme subset B as corresponding technical point sets A, B, and carrying out technical classification on technical points in the technical point set A, B by using the technical classification rules to obtain corresponding classification number sets A, B;
selecting M% of classified numbers in the classified number set A, acquiring each level code indicated by each number, and obtaining a set X1i formed by each level code of all classified numbers in the M% of numbers and a corresponding number Y1 i; selecting N% of classified numbers in the classified number set B, acquiring each level code indicated by each number, and obtaining each level code set X2i and the corresponding number Y2i of all the classified numbers in the N% of numbers; wherein the information in the set is information after removing the repetition;
calculating the number Ei of code coincidence of each level in X1i and X2i according to the code sets X1i and X2 i;
calculating the relative code overlap ratios Ai and Bi of each level of the classification number sets A and B according to Y1i, Y2i and Ei; wherein, for the classification number set A, Ai = (Ei/Y1i) × 100%; for the classification number set B, Bi = (Ei/Y2i) × 100%;
calculating the technical correlation index F_A of the classification number set A and the technical correlation index F_B of the classification number set B according to the relative overlap ratios Ai and Bi; wherein,
F_A = ΣCi*Ai; F_B = ΣCi*Bi; wherein Ci is an empirical constant;
calculating the similarity probabilities G_A and G_B of the classification number sets A and B according to the correlation indices F_A and F_B; wherein,
G_A = F_A/(ΣCi); G_B = F_B/(ΣCi);
wherein G_A is the similarity index of the target technical solution set A with respect to the technical solution subset B, and G_B is the similarity index of the technical solution subset B with respect to the target technical solution set A;
in the above formulas, i = 1 to n, where n is the number of code levels of the technical classification rule.
5. A technical digital asset query method, characterized by comprising the following steps:
acquiring a query request and a corresponding target digital asset technical description file, wherein the technical description file comprises technical schemes corresponding to all technical essential points of a target digital asset, and acquiring a target technical scheme set A;
acquiring a set of digital asset data packets to be detected, and determining a technical scheme subset B corresponding to all technical key points of each digital asset data packet in the set;
determining a patent classification number set A corresponding to the target technical scheme set A and a patent classification number set B corresponding to each technical scheme subset B;
calculating the similarity index of the target technical scheme set A and each technical scheme subset B according to the patent classification number set A and each patent classification number set B;
and sorting and outputting the data packets in the digital asset data packet set according to the similarity index.
6. The query method of claim 5, wherein the similarity index between the target solution set A and the solution subset B is calculated according to the following steps:
acquiring, for M% of the international patent classification numbers in the patent classification number set A, the section code set B5 and its count b5, the class code set B4 and its count b4, the subclass code set B3 and its count b3, the main group code set B2 and its count b2, and the subgroup code set B1 and its count b1; and acquiring, for N% of the international patent classification numbers in the patent classification number set B, the section code set D5 and its count d5, the class code set D4 and its count d4, the subclass code set D3 and its count d3, the main group code set D2 and its count d2, and the subgroup code set D1 and its count d1; wherein 100 ≥ M > 0 and 100 ≥ N > 0, and the information in each set is the information remaining after duplicates are removed;
calculating, according to the section code sets B5 and D5, the class code sets B4 and D4, the subclass code sets B3 and D3, the main group code sets B2 and D2, and the subgroup code sets B1 and D1 of the patent classification number sets A and B, the number E5 of coinciding section codes, the number E4 of coinciding class codes, the number E3 of coinciding subclass codes, the number E2 of coinciding main group codes and the number E1 of coinciding subgroup codes of the patent classification number sets A and B;
calculating, according to the section code counts b5 and d5, the class code counts b4 and d4, the subclass code counts b3 and d3, the main group code counts b2 and d2, and the subgroup code counts b1 and d1 of the patent classification number sets A and B, together with the coincidence counts E5, E4, E3, E2 and E1, the section code overlap ratios A5 and B5, the class code overlap ratios A4 and B4, the subclass code overlap ratios A3 and B3, the main group code overlap ratios A2 and B2, and the subgroup code overlap ratios A1 and B1 of the two patent classification number sets A and B; wherein,
for patent classification set a, a5 ═ E5/b 5% >, a4 ═ E4/b 4% >, A3 ═ E3/b 3% >, a2 ═ E2/b 2% >, a1 ═ E1/b 1% >;
for patent classification set B, B5 ═ E5/d 5% >, B4 ═ E4/d 4% >, B3 ═ E3/d 3% >, B2 ═ E2/d 2% >, B1 ═ E1/d 1% >;
calculating the patent technology correlation index F_A or F_B of the target technical solution set A and the technical solution subset B from the overlap ratios A5, B5, A4, B4, A3, B3, A2, B2, A1 and B1; wherein:
for the target technical solution set A, F_A = C5*A5 + C4*A4 + C3*A3 + C2*A2 + C1*A1;
for the technical solution subset B, F_B = C5*B5 + C4*B4 + C3*B3 + C2*B2 + C1*B1;
wherein C5, C4, C3, C2 and C1 are empirical constants;
calculating the mutual similarity probability G of the target technical solution set A and the technical solution subset B from the correlation index F;
wherein:
G_A = F_A/(C5+C4+C3+C2+C1); G_B = F_B/(C5+C4+C3+C2+C1);
wherein G_A is the similarity index of the target technical solution set A with respect to the technical solution subset B, and G_B is the similarity index of the technical solution subset B with respect to the target technical solution set A.
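The five-level calculation in this claim can be sketched in Python as follows. This is only an illustration under stated assumptions, not the patented implementation: the level names, the function name `similarity_indices`, the equal weights, and the choice to treat an empty code set as a 0% overlap are all hypothetical, and the IPC symbols are sample values.

```python
from typing import Dict, Set, Tuple

# Hypothetical level names, ordered from level 5 (coarsest) to level 1 (finest).
LEVELS = ("section", "class", "subclass", "main_group", "subgroup")


def similarity_indices(
    codes_a: Dict[str, Set[str]],
    codes_b: Dict[str, Set[str]],
    weights: Dict[str, float],
) -> Tuple[float, float]:
    """Return (G_A, G_B) in percent for two sets of de-duplicated level codes."""
    f_a = 0.0
    f_b = 0.0
    for level in LEVELS:
        set_a, set_b = codes_a[level], codes_b[level]
        coincident = len(set_a & set_b)  # E_i: codes present in both sets
        # A_i = (E_i / b_i)% and B_i = (E_i / d_i)%.
        # Assumption: an empty set contributes an overlap ratio of 0.
        ratio_a = 100.0 * coincident / len(set_a) if set_a else 0.0
        ratio_b = 100.0 * coincident / len(set_b) if set_b else 0.0
        f_a += weights[level] * ratio_a  # accumulates F_A
        f_b += weights[level] * ratio_b  # accumulates F_B
    total = sum(weights[level] for level in LEVELS)  # C5 + C4 + C3 + C2 + C1
    return f_a / total, f_b / total


codes_a = {
    "section": {"G"},
    "class": {"G06"},
    "subclass": {"G06F", "G06Q"},
    "main_group": {"G06F 16", "G06Q 10"},
    "subgroup": {"G06F 16/245", "G06Q 10/06"},
}
codes_b = {
    "section": {"G", "H"},
    "class": {"G06", "H04"},
    "subclass": {"G06F", "H04L"},
    "main_group": {"G06F 16", "H04L 9"},
    "subgroup": {"G06F 16/28", "H04L 9/32"},
}
equal_weights = {level: 1.0 for level in LEVELS}  # assumed constants C5..C1
g_a, g_b = similarity_indices(codes_a, codes_b, equal_weights)
print(g_a, g_b)  # 60.0 40.0
```

With these sample sets the two indices differ (60% against 40%) because each overlap ratio is normalised by the size of its own set, so G_A and G_B are not symmetric in general.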
7. The query method of claim 5, wherein the similarity index between the target technical solution set A and the technical solution subset B is calculated according to the following steps:
respectively acquiring the class code set B4 and its count b4, the subclass code set B3 and its count b3, the main group code set B2 and its count b2, and the subgroup code set B1 and its count b1 indicated by M% of the international patent classification numbers in the patent classification number set A; acquiring the class code set D4 and its count d4, the subclass code set D3 and its count d3, the main group code set D2 and its count d2, and the subgroup code set D1 and its count d1 indicated by N% of the international patent classification numbers in the patent classification number set B; wherein 0 < M ≤ 100 and 0 < N ≤ 100, and each set holds its entries with duplicates removed;
calculating, from the class code sets B4 and D4, the subclass code sets B3 and D3, the main group code sets B2 and D2, and the subgroup code sets B1 and D1 of the patent classification number sets A and B, the number E4 of coinciding class codes, the number E3 of coinciding subclass codes, the number E2 of coinciding main group codes, and the number E1 of coinciding subgroup codes of the patent classification number sets A and B;
calculating the class code overlap ratios A4 and B4, the subclass code overlap ratios A3 and B3, the main group code overlap ratios A2 and B2, and the subgroup code overlap ratios A1 and B1 of the two patent classification number sets A and B from the class code counts b4 and d4, the subclass code counts b3 and d3, the main group code counts b2 and d2, and the subgroup code counts b1 and d1 of the patent classification number sets A and B, together with the coincidence counts E4, E3, E2 and E1; wherein:
for patent classification number set A: A4 = (E4/b4)%, A3 = (E3/b3)%, A2 = (E2/b2)%, A1 = (E1/b1)%;
for patent classification number set B: B4 = (E4/d4)%, B3 = (E3/d3)%, B2 = (E2/d2)%, B1 = (E1/d1)%;
calculating the patent technology correlation index F_A or F_B of the target technical solution set A and the technical solution subset B from the overlap ratios A4, B4, A3, B3, A2, B2, A1 and B1; wherein:
for the target technical solution set A, F_A = C4*A4 + C3*A3 + C2*A2 + C1*A1;
for the technical solution subset B, F_B = C4*B4 + C3*B3 + C2*B2 + C1*B1;
wherein C4, C3, C2 and C1 are empirical constants;
calculating the mutual similarity probability G of the target technical solution set A and the technical solution subset B from the correlation index F;
wherein:
G_A = F_A/(C4+C3+C2+C1); G_B = F_B/(C4+C3+C2+C1);
wherein G_A is the similarity index of the target technical solution set A with respect to the technical solution subset B, and G_B is the similarity index of the technical solution subset B with respect to the target technical solution set A.
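The acquisition step of this claim (taking M% of the classification numbers and collecting de-duplicated class, subclass, main group and subgroup codes) might look like the sketch below. The claim does not say how the M% is selected, so taking the first ceil(M% of n) symbols in list order is purely an assumption, as are the function name `level_sets`, the input format accepted by the regular expression, and the decision to skip symbols that do not match it.

```python
import math
import re
from typing import Dict, List, Set

# Assumed input format: IPC symbols such as "G06F 16/245" or "G06F16/245".
IPC_PATTERN = re.compile(r"^([A-H])(\d{2})([A-Z])\s*(\d{1,4})/(\d{2,6})$")


def level_sets(symbols: List[str], percent: float = 100.0) -> Dict[str, Set[str]]:
    """Collect de-duplicated codes at four IPC levels from a share of the symbols."""
    if not 0 < percent <= 100:
        raise ValueError("percent must satisfy 0 < percent <= 100")
    # Assumption: "M%" means the first ceil(M% of n) symbols in list order.
    keep = math.ceil(len(symbols) * percent / 100.0)
    sets: Dict[str, Set[str]] = {
        "class": set(),
        "subclass": set(),
        "main_group": set(),
        "subgroup": set(),
    }
    for symbol in symbols[:keep]:
        match = IPC_PATTERN.match(symbol.strip().upper())
        if match is None:
            continue  # assumption: silently skip symbols outside the assumed format
        section, cls, sub, main, subgroup = match.groups()
        sets["class"].add(f"{section}{cls}")                            # e.g. "G06"
        sets["subclass"].add(f"{section}{cls}{sub}")                    # e.g. "G06F"
        sets["main_group"].add(f"{section}{cls}{sub} {main}")           # e.g. "G06F 16"
        sets["subgroup"].add(f"{section}{cls}{sub} {main}/{subgroup}")  # e.g. "G06F 16/245"
    return sets


sample = ["G06F 16/245", "G06F 16/28", "G06Q 10/06", "H04L 9/32"]
full = level_sets(sample)            # M = 100
half = level_sets(sample, percent=50)  # M = 50, keeps the first two symbols
print({level: len(codes) for level, codes in full.items()})
```

Because the results are Python sets, duplicates are removed automatically, and the counts b4 to b1 are simply the set sizes.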
8. The query method of claim 5, wherein the similarity index between the target technical solution set A and the technical solution subset B is calculated according to the following steps:
respectively acquiring the subclass code set B3 and its count b3, the main group code set B2 and its count b2, and the subgroup code set B1 and its count b1 indicated by M% of the international patent classification numbers in the patent classification number set A; acquiring the subclass code set D3 and its count d3, the main group code set D2 and its count d2, and the subgroup code set D1 and its count d1 indicated by N% of the international patent classification numbers in the patent classification number set B; wherein 0 < M ≤ 100 and 0 < N ≤ 100, and each set holds its entries with duplicates removed;
calculating, from the subclass code sets B3 and D3, the main group code sets B2 and D2, and the subgroup code sets B1 and D1 of the patent classification number sets A and B, the number E3 of coinciding subclass codes, the number E2 of coinciding main group codes, and the number E1 of coinciding subgroup codes of the patent classification number sets A and B;
calculating the subclass code overlap ratios A3 and B3, the main group code overlap ratios A2 and B2, and the subgroup code overlap ratios A1 and B1 of the two patent classification number sets A and B from the subclass code counts b3 and d3, the main group code counts b2 and d2, and the subgroup code counts b1 and d1 of the patent classification number sets A and B, together with the coincidence counts E3, E2 and E1; wherein:
for patent classification number set A: A3 = (E3/b3)%, A2 = (E2/b2)%, A1 = (E1/b1)%;
for patent classification number set B: B3 = (E3/d3)%, B2 = (E2/d2)%, B1 = (E1/d1)%;
calculating the patent technology correlation index F_A or F_B of the target technical solution set A and the technical solution subset B from the overlap ratios A3, B3, A2, B2, A1 and B1; wherein:
for the target technical solution set A, F_A = C3*A3 + C2*A2 + C1*A1;
for the technical solution subset B, F_B = C3*B3 + C2*B2 + C1*B1;
wherein C3, C2 and C1 are empirical constants;
calculating the mutual similarity probability G of the target technical solution set A and the technical solution subset B from the correlation index F;
wherein:
G_A = F_A/(C3+C2+C1); G_B = F_B/(C3+C2+C1);
wherein G_A is the similarity index of the target technical solution set A with respect to the technical solution subset B, and G_B is the similarity index of the technical solution subset B with respect to the target technical solution set A.
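As a worked illustration of the three-level calculation, the numbers below are hypothetical: the counts are invented for the example, and the constants C3 = 1, C2 = 2, C1 = 3 are assumed, since the claim only states that they are empirical.

```latex
\begin{align*}
&b_3 = 2,\ d_3 = 4,\ E_3 = 2; \qquad b_2 = 4,\ d_2 = 5,\ E_2 = 2; \qquad b_1 = 5,\ d_1 = 10,\ E_1 = 1 \\
&A_3 = \tfrac{2}{2} = 100\%, \quad A_2 = \tfrac{2}{4} = 50\%, \quad A_1 = \tfrac{1}{5} = 20\% \\
&B_3 = \tfrac{2}{4} = 50\%, \quad B_2 = \tfrac{2}{5} = 40\%, \quad B_1 = \tfrac{1}{10} = 10\% \\
&F_A = 1 \cdot 100 + 2 \cdot 50 + 3 \cdot 20 = 260, \qquad G_A = \tfrac{260}{1 + 2 + 3} \approx 43.3\% \\
&F_B = 1 \cdot 50 + 2 \cdot 40 + 3 \cdot 10 = 160, \qquad G_B = \tfrac{160}{1 + 2 + 3} \approx 26.7\%
\end{align*}
```

Under these assumed values the smaller set A scores higher than B, because the same coincidence counts are divided by smaller set sizes.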
9. The query method of claim 5, wherein the similarity index between the target technical solution set A and the technical solution subset B is calculated according to the following steps:
respectively acquiring the main group code set B2 and its count b2, and the subgroup code set B1 and its count b1 indicated by M% of the international patent classification numbers in the patent classification number set A; acquiring the main group code set D2 and its count d2, and the subgroup code set D1 and its count d1 indicated by N% of the international patent classification numbers in the patent classification number set B; wherein 0 < M ≤ 100 and 0 < N ≤ 100, and each set holds its entries with duplicates removed;
calculating, from the main group code sets B2 and D2 and the subgroup code sets B1 and D1 of the patent classification number sets A and B, the number E2 of coinciding main group codes and the number E1 of coinciding subgroup codes of the patent classification number sets A and B;
calculating the main group code overlap ratios A2 and B2 and the subgroup code overlap ratios A1 and B1 of the two patent classification number sets A and B from the main group code counts b2 and d2 and the subgroup code counts b1 and d1 of the patent classification number sets A and B, together with the coincidence counts E2 and E1; wherein:
for patent classification number set A: A2 = (E2/b2)%, A1 = (E1/b1)%;
for patent classification number set B: B2 = (E2/d2)%, B1 = (E1/d1)%;
calculating the patent technology correlation index F_A or F_B of the target technical solution set A and the technical solution subset B from the overlap ratios A2, B2, A1 and B1; wherein:
for the target technical solution set A, F_A = C2*A2 + C1*A1;
for the technical solution subset B, F_B = C2*B2 + C1*B1;
wherein C2 and C1 are empirical constants;
calculating the mutual similarity probability G of the target technical solution set A and the technical solution subset B from the correlation index F;
wherein:
G_A = F_A/(C2+C1); G_B = F_B/(C2+C1);
wherein G_A is the similarity index of the target technical solution set A with respect to the technical solution subset B, and G_B is the similarity index of the technical solution subset B with respect to the target technical solution set A.
10. The query method of claim 5, wherein the similarity index between the target technical solution set A and the technical solution subset B is calculated according to the following steps:
respectively acquiring the subgroup code set B1 and its count b1 indicated by M% of the international patent classification numbers in the patent classification number set A; acquiring the subgroup code set D1 and its count d1 indicated by N% of the international patent classification numbers in the patent classification number set B; wherein 0 < M ≤ 100 and 0 < N ≤ 100, and each set holds its entries with duplicates removed;
calculating, from the subgroup code sets B1 and D1 of the patent classification number sets A and B, the number E1 of coinciding subgroup codes of the patent classification number sets A and B;
calculating the subgroup code overlap ratios A1 and B1 of the two patent classification number sets A and B from the subgroup code counts b1 and d1 of the patent classification number sets A and B, together with the coincidence count E1; wherein:
for patent classification number set A: A1 = (E1/b1)%;
for patent classification number set B: B1 = (E1/d1)%;
calculating the patent technology correlation index F_A or F_B of the target technical solution set A and the technical solution subset B from the overlap ratios A1 and B1; wherein:
for the target technical solution set A, F_A = C1*A1;
for the technical solution subset B, F_B = C1*B1;
wherein C1 is an empirical constant;
calculating the mutual similarity probability G of the target technical solution set A and the technical solution subset B from the correlation index F;
wherein:
G_A = F_A/C1; G_B = F_B/C1;
wherein G_A is the similarity index of the target technical solution set A with respect to the technical solution subset B, and G_B is the similarity index of the technical solution subset B with respect to the target technical solution set A.
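In this single-level case the constant cancels, which can be seen by substituting the definitions directly (assuming only that C1 is non-zero):

```latex
\begin{align*}
G_A &= \frac{F_A}{C_1} = \frac{C_1 A_1}{C_1} = A_1 = \left(\frac{E_1}{b_1}\right)\%, &
G_B &= \frac{F_B}{C_1} = \frac{C_1 B_1}{C_1} = B_1 = \left(\frac{E_1}{d_1}\right)\%
\end{align*}
```

So with one level, each similarity index is simply the share of that set's subgroup codes that also appear in the other set, whatever value is chosen for C1.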
CN201910684857.2A 2019-07-26 2019-07-26 Query method for technical digital assets Active CN112307009B (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN201910684857.2A CN112307009B (en) 2019-07-26 2019-07-26 Query method for technical digital assets
PCT/CN2020/094407 WO2021017640A1 (en) 2019-07-26 2020-06-04 Query method of technical digital assets
FR2007744A FR3099601A1 (en) 2019-07-26 2020-07-23 Technical digital asset query method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910684857.2A CN112307009B (en) 2019-07-26 2019-07-26 Query method for technical digital assets

Publications (2)

Publication Number Publication Date
CN112307009A (en) 2021-02-02
CN112307009B (en) 2024-07-09

Family

ID=74230051

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910684857.2A Active CN112307009B (en) 2019-07-26 2019-07-26 Query method for technical digital assets

Country Status (3)

Country Link
CN (1) CN112307009B (en)
FR (1) FR3099601A1 (en)
WO (1) WO2021017640A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113706060B (en) * 2021-10-29 2022-02-11 中国电力科学研究院有限公司 Power grid regulation and control data asset processing method, system, equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105320772A (en) * 2015-11-02 2016-02-10 武汉大学 Associated paper query method for patent duplicate checking
CN108549690A (en) * 2018-04-12 2018-09-18 石家庄铁道大学 Spatial key querying method and system based on space length constraint
US20180335899A1 (en) * 2017-05-18 2018-11-22 Adobe Systems Incorporated Digital Asset Association with Search Query Data
CN109284360A (en) * 2018-09-18 2019-01-29 江苏润桐数据服务有限公司 A kind of automatic denoising method of patent retrieval and device

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110246379A1 (en) * 2010-04-02 2011-10-06 Cpa Global Patent Research Limited Intellectual property scoring platform
US10891701B2 (en) * 2011-04-15 2021-01-12 Rowan TELS Corp. Method and system for evaluating intellectual property
KR20140078969A (en) * 2012-12-18 2014-06-26 (주)광개토연구소 Patent Information System and its Patent Information Providing Method Including Information on Patent Troll
CN103455609B (en) * 2013-09-05 2017-06-16 江苏大学 A kind of patent document similarity detection method based on kernel function Luke cores
CN104050235B (en) * 2014-03-27 2017-02-22 浙江大学 Distributed information retrieval method based on set selection
CN109726401B (en) * 2019-01-03 2022-09-23 中国联合网络通信集团有限公司 Patent combination generation method and system

Also Published As

Publication number Publication date
FR3099601A1 (en) 2021-02-05
CN112307009B (en) 2024-07-09
WO2021017640A1 (en) 2021-02-04

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 210000 A-002, building D4, No.15 Wanshou Road, Nanjing area, China (Jiangsu) pilot Free Trade Zone, Nanjing City, Jiangsu Province

Applicant after: Aowei Co.,Ltd.

Address before: 210000 A-002, building D4, No.15 Wanshou Road, Nanjing area, China (Jiangsu) pilot Free Trade Zone, Nanjing City, Jiangsu Province

Applicant before: Jiangsu Aowei Holding Co.,Ltd.

Address after: 210000 A-002, building D4, No.15 Wanshou Road, Nanjing area, China (Jiangsu) pilot Free Trade Zone, Nanjing City, Jiangsu Province

Applicant after: Jiangsu Aowei Holding Co.,Ltd.

Address before: Room 309, 3 / F, building B, No.9 Xinghuo Road, Jiangbei new district, Nanjing City, Jiangsu Province, 210000

Applicant before: Aowei information technology (Jiangsu) Co.,Ltd.

GR01 Patent grant