CN111368062A - Verification method and device for denoising patent retrieval database - Google Patents


Info

Publication number
CN111368062A
Authority
CN
China
Prior art keywords
database
patent document
keyword
obtaining
classification number
Legal status
Withdrawn
Application number
CN202010135180.XA
Other languages
Chinese (zh)
Inventor
邓梅
Current Assignee
Jiangsu Rainpat Data Service Co ltd
Original Assignee
Jiangsu Rainpat Data Service Co ltd
Application filed by Jiangsu Rainpat Data Service Co ltd
Priority to CN202010135180.XA
Publication of CN111368062A
Current legal status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G06F16/332 Query formulation
    • G06F16/3325 Reformulation based on results of preceding query
    • G06F16/38 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/383 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/18 Legal services
    • G06Q50/184 Intellectual property management

Abstract

The invention provides a verification method and device for denoising a patent retrieval database. A first patent database is obtained using a first keyword of a first patent document. A second patent document having first classification number information is obtained from the first patent database. Second classification number information is obtained from the first patent document, and it is judged whether the first and second classification number information satisfy a first relevance; if not, a second patent database is retrieved from the first patent database according to the first classification number information and deleted from the first patent database. A second patent document having a second keyword is obtained from the second patent database, and a second relevance is obtained from the first keyword and the second keyword. If the second relevance satisfies a first predetermined condition, a first recovery instruction is obtained and the second patent document is restored to the first patent database, yielding a third patent database that serves as the target database. This solves the prior-art problems that patent retrieval processing and analysis depend on professionals, the process is complex, and the retrieval results are inaccurate.

Description

Verification method and device for denoising patent retrieval database
Technical Field
The invention relates to the technical field of data processing, in particular to a verification method and a verification device for denoising a patent retrieval database.
Background
With the continuous development and improvement of social systems, the number of patent documents has increased rapidly, and enterprises in every country attach ever greater importance to protecting their patent rights. For an enterprise, accurately retrieving and analyzing the information it needs from a large body of patent documents is crucial to its overall development. In the knowledge economy, intellectual property is regarded as a strategic resource that underpins the core competitiveness of an enterprise, or even a country, and has gained unprecedented importance. Patents contain a large amount of technical information: by searching and analyzing related patents, a user can learn the development trends of a technical field, which guides later research and development and helps avoid infringement risks. Patent document retrieval is therefore foundational work that lets an enterprise understand the prior art comprehensively, raise the starting point of its research and development, and avoid intellectual property risks. However, because original patent data published on the internet is incomplete, obscurely worded, lengthy, and difficult to understand, enterprises that have not mastered professional search methods and skills find searching difficult.
However, the applicant of the present invention finds that the prior art has at least the following technical problems:
In the prior art, patent retrieval processing and analysis depend on professionals, the process is complex, and the retrieval results are incomplete or lengthy.
Disclosure of Invention
The embodiments of the invention provide a verification method and device for denoising a patent retrieval database, which solve the technical problems in the prior art that patent retrieval processing and analysis depend on professionals, the process is complex, and the retrieval results are incomplete or lengthy.
In view of the foregoing problems, embodiments of the present application provide a verification method and device for denoising a patent search database.
In a first aspect, the present invention provides a verification method for denoising a patent search database, the method including: obtaining a first patent document having a first keyword, and obtaining a first patent database from a patent retrieval database according to the first keyword; obtaining, from the first patent database, a second patent document having first classification number information; obtaining second classification number information according to the first patent document, judging whether the first classification number information and the second classification number information satisfy a first relevance, and, when the first relevance is not satisfied, retrieving from the first patent database according to the first classification number information to obtain a second patent database and deleting the second patent database from the first patent database; obtaining, from the second patent database, a second patent document having a second keyword; obtaining a second relevance according to the first keyword and the second keyword; and judging whether the second relevance satisfies a first predetermined condition and, when it does, obtaining a first recovery instruction, the first recovery instruction being used to restore the second patent document into the first patent database to obtain a third patent database, which is the target database.
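The first-aspect method can be sketched as set operations over document records. This is a minimal sketch under stated assumptions, not the applicant's implementation: `PatentDoc`, its single main-IPC field, the `class_relevant` predicate standing in for the first relevance, and the `keyword_relevance` score with a numeric `threshold` standing in for the second relevance and the first predetermined condition are all hypothetical, because the text does not define them.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class PatentDoc:
    doc_id: str
    keywords: frozenset[str]  # hypothetical: pre-extracted keywords
    ipc: str                  # hypothetical: one main IPC symbol per document


def verify_denoise(
    first_db: set[PatentDoc],
    first_doc: PatentDoc,
    class_relevant: Callable[[str, str], bool],  # assumed stand-in for the first relevance
    keyword_relevance: Callable[[frozenset[str], frozenset[str]], float],  # assumed second relevance
    threshold: float,  # assumed numeric form of the first predetermined condition
) -> set[PatentDoc]:
    """Return the target database after denoising (step 130) and verification (steps 140-160)."""
    # Step 130: documents whose classification fails the first relevance against
    # the first patent document form the second database and are deleted.
    second_db = {d for d in first_db if not class_relevant(d.ipc, first_doc.ipc)}
    denoised = first_db - second_db
    # Steps 140-160: restore deleted documents whose keywords meet the condition.
    restored = {
        d for d in second_db
        if keyword_relevance(first_doc.keywords, d.keywords) >= threshold
    }
    # Third database; equals the denoised first database when nothing is restored.
    return denoised | restored
```

Because each hypothetical record carries one main IPC symbol, retrieving "by the first classification number information" and filtering document by document select the same set here.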
Preferably, before the first patent database is obtained from the patent retrieval database according to the first keyword, the method includes: obtaining the first patent document; determining a first patentee from the first patent document; searching for the first patentee through a first query platform to obtain first competitive product information; determining a second patentee according to the first competitive product information; obtaining a third patent document of the second patentee; determining the first keyword according to the first patent document and the third patent document; and obtaining the first patent database from the patent retrieval database according to the first keyword.
Preferably, after judging whether the second relevance satisfies the first predetermined condition, the method includes: when the second relevance does not satisfy the first predetermined condition, taking the first patent database as the target database.
Preferably, the method further includes: retrieving from the second patent database according to the second keyword to obtain a fourth patent database; retrieving from the fourth patent database according to third classification number information to obtain a fifth patent database, where the third classification number information and the second classification number information satisfy the first relevance; obtaining a quantity ratio from the fifth patent database and the fourth patent database; judging whether the quantity ratio satisfies a second predetermined condition; when it does, obtaining a second recovery instruction, the second recovery instruction being used to add the fourth patent database to the third patent database to obtain a sixth patent database; and taking the sixth patent database as the target database.
Preferably, the quantity ratio satisfies the second predetermined condition when the number of documents in the fifth patent database is at least 50% of the number of documents in the fourth patent database.
In a second aspect, the present invention provides a verification apparatus for denoising a patent search database, where the apparatus includes:
a first obtaining unit configured to obtain a first patent document having a first keyword, and obtain a first patent database from a patent search database based on the first keyword;
a second obtaining unit configured to obtain a second patent document having first classification number information from the first patent database;
a first execution unit, configured to obtain second classification number information according to the first patent document, determine whether the first classification number information and the second classification number information satisfy a first relevance, retrieve from the first patent database according to the first classification number information to obtain a second patent database when the first relevance is not satisfied, and delete the second patent database from the first patent database;
a third obtaining unit configured to obtain a second patent document from the second patent database, the second patent document having a second keyword;
a fourth obtaining unit, configured to obtain a second relevance according to the first keyword and the second keyword;
a second execution unit, configured to judge whether the second relevance satisfies a first predetermined condition, and to obtain a first recovery instruction when the second relevance satisfies the first predetermined condition, where the first recovery instruction is used to restore the second patent document to the first patent database to obtain a third patent database, and the third patent database is the target database.
Preferably, the apparatus further comprises:
a fifth obtaining unit configured to obtain the first patent document;
a first determination unit configured to determine a first patentee from the first patent document;
a sixth obtaining unit configured to search for the first patentee through a first query platform to obtain first competitive product information;
a second determination unit configured to determine a second patentee according to the first competitive product information;
a seventh obtaining unit configured to obtain a third patent document according to the second patentee;
a third determining unit configured to determine a first keyword from the first patent document and the third patent document;
an eighth obtaining unit, configured to obtain the first patent database from the patent search database according to the first keyword.
Preferably, the apparatus further comprises:
a third execution unit configured to take the first patent database as the target database when the second relevance does not satisfy the first predetermined condition.
Preferably, the apparatus further comprises:
a ninth obtaining unit, configured to retrieve from the second patent database according to the second keyword, and obtain a fourth patent database;
a tenth obtaining unit configured to retrieve from the fourth patent database according to third classification number information to obtain a fifth patent database, where the third classification number information and the second classification number information satisfy the first relevance;
an eleventh obtaining unit, configured to obtain a quantity ratio according to the fifth patent database and the fourth patent database;
a first judging unit configured to judge whether the quantity ratio satisfies a second predetermined condition;
a twelfth obtaining unit configured to obtain a second recovery instruction when the quantity ratio satisfies the second predetermined condition, where the second recovery instruction is used to add the fourth patent database to the third patent database to obtain a sixth patent database;
a fourth execution unit configured to take the sixth patent database as the target database.
Preferably, the number of documents in the fifth patent database is at least 50% of the number of documents in the fourth patent database.
In a third aspect, the present invention provides a verification apparatus for denoising a patent search database, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of any one of the above methods when executing the computer program.
In a fourth aspect, the invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of any of the methods described above.
The technical solutions in the embodiments of the present application have at least the following technical effects:
With the verification method and device for denoising a patent retrieval database provided by the embodiments of the invention, a first patent document having a first keyword is obtained, and a first patent database is obtained from the patent retrieval database according to the first keyword; a second patent document having first classification number information is obtained from the first patent database; second classification number information is obtained according to the first patent document, and it is judged whether the first and second classification number information satisfy a first relevance; when the first relevance is not satisfied, a second patent database is retrieved from the first patent database according to the first classification number information and deleted from the first patent database; a second patent document having a second keyword is obtained from the second patent database; a second relevance is obtained according to the first keyword and the second keyword; and it is judged whether the second relevance satisfies a first predetermined condition and, when it does, a first recovery instruction is obtained to restore the second patent document into the first patent database, yielding a third patent database that serves as the target database. This achieves multiple analysis of keywords and classification numbers, effectively improves the accuracy and completeness of the target patent retrieval database through combined retrieval, denoising, and verification, offers a high degree of process automation that removes the reliance on professionals, and is suitable for enterprise patent analysis.
Therefore, this solves the technical problems in the prior art that patent retrieval processing and analysis depend on professionals, the process is complex, and the retrieval results are incomplete or lengthy.
The foregoing is only an overview of the technical solutions of the present invention. Specific embodiments are described below so that the technical means of the present invention can be understood more clearly and its above and other objects, features, and advantages become more readily apparent.
Drawings
FIG. 1 is a schematic flow chart illustrating a verification method for denoising a patent search database according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a verification apparatus for denoising a patent search database according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of another verification apparatus for denoising a patent search database according to an embodiment of the present invention.
Description of reference numerals: a first obtaining unit 11, a second obtaining unit 12, a first executing unit 13, a third obtaining unit 14, a fourth obtaining unit 15, a second executing unit 16, a bus 300, a receiver 301, a processor 302, a transmitter 303, a memory 304, and a bus interface 306.
Detailed Description
The embodiments of the invention provide a verification method and device for denoising a patent retrieval database, which solve the technical problems in the prior art that patent retrieval processing and analysis depend on professionals, the process is complex, and the retrieval results are incomplete or lengthy.
The technical scheme provided by the invention has the following general idea:
Obtain a first patent document having a first keyword, and obtain a first patent database from a patent retrieval database according to the first keyword; obtain, from the first patent database, a second patent document having first classification number information; obtain second classification number information according to the first patent document and judge whether the first and second classification number information satisfy a first relevance; when they do not, retrieve from the first patent database according to the first classification number information to obtain a second patent database, and delete it from the first patent database; obtain, from the second patent database, a second patent document having a second keyword; obtain a second relevance according to the first keyword and the second keyword; and judge whether the second relevance satisfies a first predetermined condition and, when it does, obtain a first recovery instruction that restores the second patent document into the first patent database to obtain a third patent database, which is the target database. This achieves multiple analysis of keywords and classification numbers, effectively improves the accuracy and completeness of the target patent retrieval database through combined retrieval, denoising, and verification, offers a high degree of process automation that removes the reliance on professionals, and suits enterprise patent analysis across a wide range of applications.
The technical solutions of the present invention are described in detail below with reference to the drawings and specific embodiments. It should be understood that the specific features in the embodiments and examples of the present invention are detailed descriptions of the technical solutions of the present application, not limitations on them, and that the technical features in the embodiments and examples may be combined with one another provided they do not conflict.
The term "and/or" herein merely describes an association between associated objects and indicates that three relationships may exist; for example, A and/or B may mean that A exists alone, that A and B exist simultaneously, or that B exists alone. In addition, the character "/" herein generally indicates an "or" relationship between the objects before and after it.
Example one
Fig. 1 is a schematic flow chart of a verification method for denoising a patent search database in an embodiment of the present invention. As shown in fig. 1, an embodiment of the present invention provides a verification method for denoising a patent search database, where the method includes:
Step 110: A first patent document having a first keyword is obtained, and a first patent database is obtained from a patent search database based on the first keyword.
Further, before the first patent database is obtained from the patent retrieval database according to the first keyword, the method includes: obtaining the first patent document; determining a first patentee from the first patent document; searching for the first patentee through a first query platform to obtain first competitive product information; determining a second patentee according to the first competitive product information; obtaining a third patent document of the second patentee; determining the first keyword according to the first patent document and the third patent document; and obtaining the first patent database from the patent retrieval database according to the first keyword.
Specifically, the first patent document is obtained through a patent retrieval platform, and its first patentee can be determined from its bibliographic information. A patentee is the party entitled to a patent right, that is, the entity or individual that may apply for and obtain the patent right; in other words, the subject of the patent right. Patentees include owners and holders: the former may be citizens, collectively owned entities, foreign-trade enterprises, or Sino-foreign joint ventures, while the latter are state-owned entities. Patentees further include original subjects, who acquired the patent right originally, and successor subjects, who acquired it subsequently. A patentee enjoys the rights granted by law and bears the obligations prescribed by law. Suppose, for example, that the first patentee is a network technology company. The first patentee is searched for through a first query platform to obtain first competitive product information, that is, information on products of the same type that compete with the first patentee's products. For instance, if the network technology company is entered in the search box of an enterprise-information lookup service, the search result page provides information on its competitors. The entity or enterprise behind the first competitive product is identified from the first competitive product information, which in turn determines the second patentee. Information on the second patentee is then entered into the patent retrieval platform to obtain a third patent document.
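The lookup chain from the first patentee to the third patent documents can be sketched with plain mappings. This is a hypothetical illustration: the four dictionaries stand in for the bibliographic data, the first query platform, and the patent retrieval platform, none of whose real interfaces the text specifies.

```python
from typing import Mapping, Sequence


def find_third_documents(
    first_doc_id: str,
    patentee_of: Mapping[str, str],                      # document ID -> patentee (bibliographic data)
    competing_products_of: Mapping[str, Sequence[str]],  # patentee -> competing products (query platform)
    owner_of_product: Mapping[str, str],                 # competing product -> company behind it
    patents_of: Mapping[str, Sequence[str]],             # patentee -> document IDs (retrieval platform)
) -> tuple[list[str], list[str]]:
    """Return (second patentees, third patent documents), de-duplicated in discovery order."""
    first_patentee = patentee_of[first_doc_id]
    second_patentees: list[str] = []
    for product in competing_products_of.get(first_patentee, ()):
        owner = owner_of_product.get(product)
        # Skip unknown owners, the first patentee itself, and duplicates.
        if owner and owner != first_patentee and owner not in second_patentees:
            second_patentees.append(owner)
    third_docs: list[str] = []
    for patentee in second_patentees:
        for doc_id in patents_of.get(patentee, ()):
            if doc_id != first_doc_id and doc_id not in third_docs:
                third_docs.append(doc_id)
    return second_patentees, third_docs
```

In practice each mapping would be replaced by a query to the relevant platform; the chain of lookups is the only part this sketch tries to capture.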
The contents of the first patent document and of the competitor's patent document obtained above (the third patent document) are then analyzed to obtain their core descriptive words: a first core word of the first patent document and a second core word of the third patent document. Taking both documents into account, a first relevance between the first core word and the second core word is determined by semantic analysis; this relevance indicates whether the two core words have the same or similar meanings. A first preset threshold is set for this semantic relevance, for example 0.8. It is judged whether the first relevance is greater than the first preset threshold, and when it is, the first keyword is determined from the first core word and the second core word. That is, second core words with low relevance to the first core word are filtered out and deleted, second core words with high relevance are retained, and the first keyword is determined from the first core word and the retained second core words. Finally, a patent search is performed in the search database with the determined first keyword, and the set of all patent documents related to the first keyword is taken as the first patent database.
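The core-word filtering with the example threshold of 0.8 can be sketched as follows. The text calls for semantic analysis but does not say how; here Python's `difflib.SequenceMatcher` surface similarity is swapped in purely as a placeholder, and the function names are hypothetical. Any semantic scorer returning a value in [0, 1] can be passed as `similarity`.

```python
from difflib import SequenceMatcher
from typing import Callable, Iterable


def surface_similarity(a: str, b: str) -> float:
    """Placeholder for semantic analysis: character-level similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def determine_first_keywords(
    first_core_words: Iterable[str],
    second_core_words: Iterable[str],
    similarity: Callable[[str, str], float] = surface_similarity,
    threshold: float = 0.8,  # example value of the first preset threshold
) -> list[str]:
    """Keep every first core word, plus second core words whose best relevance exceeds the threshold."""
    firsts = list(dict.fromkeys(first_core_words))  # de-duplicate, keep order
    keywords = list(firsts)
    for word in second_core_words:
        if word in keywords:
            continue
        best = max((similarity(f, word) for f in firsts), default=0.0)
        if best > threshold:  # strictly "greater than", as in the text
            keywords.append(word)
    return keywords
```

Low-relevance second core words are simply not appended, which corresponds to filtering them out before the first keyword is assembled.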
Step 120: A second patent document having first classification number information is obtained from the first patent database.
Step 130: Second classification number information is obtained according to the first patent document, and it is judged whether the first classification number information and the second classification number information satisfy a first relevance; when the first relevance is not satisfied, a second patent database is retrieved from the first patent database according to the first classification number information and deleted from the first patent database.
Specifically, a patent database obtained by keyword search usually contains some noise. To ensure the accuracy of the patent documents in the database and to provide reliable support for later evaluation and analysis of patents, the retrieved database must be denoised, that is, patent documents that differ from the target patent document must be removed. The embodiment of the invention performs this denoising using patent classification numbers. Chinese patent documents are classified with the International Patent Classification (IPC), which combines function-oriented and application-oriented classification, taking function as the primary principle and application as the auxiliary one. The IPC is hierarchical: technical content is classified level by level into a complete classification system, so patent information in a product's technical field can easily be retrieved from the product's international classification.
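The hierarchical IPC levels can be read directly off a symbol such as `G06F16/335` (section G, class G06, subclass G06F, main group G06F16/00, subgroup G06F16/335). The parser below is a minimal sketch that assumes the common printed form of IPC symbols; it does not model the dot-level hierarchy among subgroups.

```python
import re
from typing import NamedTuple

# Section letter, 2-digit class, subclass letter, main group, "/", subgroup.
IPC_PATTERN = re.compile(r"([A-H])(\d{2})([A-Z])\s*(\d{1,4})/(\d{2,6})")


class IpcLevels(NamedTuple):
    section: str
    ipc_class: str
    subclass: str
    main_group: str
    subgroup: str


def parse_ipc(symbol: str) -> IpcLevels:
    """Split an IPC symbol into its hierarchy levels, coarsest first."""
    m = IPC_PATTERN.fullmatch(symbol.strip().upper())
    if m is None:
        raise ValueError(f"not an IPC symbol: {symbol!r}")
    sec, cls, sub, group, subgroup = m.groups()
    subclass = f"{sec}{cls}{sub}"
    return IpcLevels(
        section=sec,
        ipc_class=f"{sec}{cls}",
        subclass=subclass,
        main_group=f"{subclass}{group}/00",
        subgroup=f"{subclass}{group}/{subgroup}",
    )
```

Comparing two documents at a chosen level (for example, `subclass`) is one simple way to express "same technical field", which the next step relies on.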
Although the first patent database was obtained by keyword search, this only guarantees that its documents relate to, or contain, the first keyword; it does not guarantee that their content is close to that of the first patent document and thus meets the search requirement. Analyzing the classification number information of the patents in the first patent database therefore denoises the database once, deleting patent documents that differ greatly from the first patent document and improving the accuracy of the database. Examples include documents that merely contain the first keyword while being unrelated to its intended meaning, and documents that contain the first keyword but whose content is entirely different from that of the first patent document. Classification-number denoising removes such patents, which do not belong to the required technical field.
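The classification-number denoising of step 130 can be sketched as a partition of the first patent database. The text does not define the first relevance; this sketch assumes, purely for illustration, that two classification numbers satisfy it when they share the same IPC subclass (first four characters). The mapping layout and function names are hypothetical.

```python
from typing import Mapping


def subclass_of(symbol: str) -> str:
    """'G06F16/335' -> 'G06F'; assumes a well-formed IPC symbol."""
    return symbol.strip().upper()[:4]


def denoise_by_classification(
    first_db: Mapping[str, str],  # document ID -> main IPC symbol (hypothetical layout)
    reference_ipcs: set[str],     # second classification number information of the first document
) -> tuple[dict[str, str], dict[str, str]]:
    """Return (denoised first database, second database of deleted documents)."""
    reference = {subclass_of(s) for s in reference_ipcs}
    # First classification numbers that fail the assumed first relevance.
    irrelevant = {subclass_of(ipc) for ipc in first_db.values()} - reference
    # Retrieve the second database from the first by those classification numbers.
    second_db = {d: ipc for d, ipc in first_db.items() if subclass_of(ipc) in irrelevant}
    denoised = {d: ipc for d, ipc in first_db.items() if d not in second_db}
    return denoised, second_db
```

The second database is returned rather than discarded because the verification steps that follow search it for documents that were deleted by mistake.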
Step 140: A second patent document having a second keyword is obtained from the second patent database.
Step 150: A second relevance is obtained according to the first keyword and the second keyword.
Step 160: It is judged whether the second relevance satisfies a first predetermined condition; when it does, a first recovery instruction is obtained, the first recovery instruction being used to restore the second patent document into the first patent database to obtain a third patent database, which is the target database.
Specifically, to avoid deleting qualifying patent documents by mistake during classification-number denoising, the embodiment of the invention verifies the denoising result by checking keywords. The second keyword must be highly associated with the target patent document: it should be a word describing the core of the first patent document, such as a main protected element, the subject matter, or an innovation point in the claims, so that it is highly correlated with, and can substitute for, the first keyword. The second patent database, that is, the set of patent documents deleted during denoising, is screened, and any patent document found to carry the second keyword is restored to the target database. This improves the accuracy of the target database and ensures the comprehensiveness of the retrieval results. The method thus achieves multiple analysis of keywords and classification numbers, effectively improves the accuracy and completeness of the target patent retrieval database through combined retrieval, denoising, and verification, offers a high degree of process automation that removes the reliance on professionals, and is suitable for enterprise patent analysis. Therefore, it solves the technical problems in the prior art that patent retrieval processing and analysis depend on professionals, the process is complex, and the retrieval results are incomplete or lengthy.
Further, after judging whether the second relevance satisfies the first predetermined condition, the method includes: when the second relevance does not satisfy the first predetermined condition, taking the first patent database as the target database.
Specifically, if the first keyword and the second keyword do not satisfy the first predetermined condition, that is, the required relationship between them does not hold, the patents in the second patent database are insufficiently correlated with the first patent document and do not meet the search requirement. They therefore remain deleted: the first patent database continues to serve as the target database, and the deleted documents in the second patent database do not need to be restored. Here, the first patent database refers to the denoised first patent database, that is, the first patent database from which the second patent database has already been deleted.
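The two possible outcomes of the check (restoring documents to form the third patent database, or keeping the denoised first patent database as the target) can be sketched as below. Documents are represented here as hypothetical publication-number strings, and `restored` is assumed to be the output of whatever keyword check is used; neither representation comes from the patent.

```python
def denoise(first_db: list[str], second_db: list[str]) -> list[str]:
    """Delete the second patent database from the first, preserving order."""
    deleted = set(second_db)
    return [doc for doc in first_db if doc not in deleted]


def select_target(denoised_db: list[str], restored: list[str]) -> list[str]:
    """Pick the target database after the keyword check.

    restored holds deleted documents that met the first predetermined condition.
    If it is empty, the denoised first patent database is the target; otherwise
    the restored documents are added back, giving the third patent database.
    """
    if not restored:
        return list(denoised_db)
    return denoised_db + [doc for doc in restored if doc not in denoised_db]


if __name__ == "__main__":
    denoised = denoise(["P1", "P2", "P3", "P4"], ["P3", "P4"])
    print(select_target(denoised, []))      # ['P1', 'P2']
    print(select_target(denoised, ["P4"]))  # ['P1', 'P2', 'P4']
```

Note that the fallback branch returns the first patent database *after* denoising, matching the clarification in the paragraph above, not the original keyword-retrieval result.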
Further, the method comprises: retrieving from the second patent database according to the second keyword to obtain a fourth patent database; retrieving from the fourth patent database according to third classification number information to obtain a fifth patent database, wherein the third classification number information and the second classification number information satisfy the first relevance; obtaining a quantity ratio from the fifth patent database and the fourth patent database; judging whether the quantity ratio satisfies a second predetermined condition; when the quantity ratio satisfies the second predetermined condition, obtaining a second recovery instruction, which is used to add the fourth patent database to the third patent database to obtain a sixth patent database; and taking the sixth patent database as the target database.
Further, the quantity ratio satisfies the second predetermined condition when the fifth patent database accounts for at least 50% of the fourth patent database.
Specifically, to further verify the operation of the first recovery instruction, a second, classification-number check may be applied to the documents retrieved with the second keyword, which guards against patent documents that match the second keyword but not the classification number information. First, the second keyword is used to search the second patent database, and all patent documents related to the second keyword are taken as the fourth patent database. Next, classification-number analysis is performed on the documents in the fourth patent database and checked against the third classification number information, i.e., classification number information that satisfies the first relevance with the second classification number information and is therefore closely related to it. The required proportion is set according to the retrieval requirement: for example, more than half of the documents in the fourth patent database, or a stricter proportion such as 80%. When the proportion requirement is met, the fourth patent database is restored and added to the third patent database to form a new target database. This secondary verification further improves the reliability of the retrieved target database.
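The secondary validation can be sketched as follows. As assumptions for illustration only: documents are dicts with `keywords` and `ipc` sets, two classification numbers are taken to satisfy the first relevance when they share the same four-character IPC subclass (e.g. `G06F`), and the second predetermined condition defaults to a ratio of at least 0.5, the lower bound given in the text. The patent itself fixes none of these matching rules, and `ipc_related` and `secondary_validation` are hypothetical names.

```python
def ipc_related(code: str, reference: str) -> bool:
    """Hypothetical first relevance: same IPC subclass, e.g. 'G06F16/33' ~ 'G06F17/30'."""
    return code[:4] == reference[:4]


def secondary_validation(second_db: list[dict], second_keywords: set[str],
                         second_ipc: set[str], third_db: list[dict],
                         min_ratio: float = 0.5) -> list[dict]:
    """Return the sixth patent database if the quantity ratio passes, else third_db."""
    # Fourth patent database: deleted documents that carry a second keyword.
    fourth_db = [d for d in second_db if d["keywords"] & second_keywords]
    if not fourth_db:
        return third_db
    # Fifth patent database: documents in the fourth whose classification numbers
    # (the third classification number information) relate to the second ones.
    fifth_db = [d for d in fourth_db
                if any(ipc_related(c, ref) for c in d["ipc"] for ref in second_ipc)]
    ratio = len(fifth_db) / len(fourth_db)
    if ratio >= min_ratio:
        # Second recovery instruction: add the fourth database to the third.
        return third_db + [d for d in fourth_db if d not in third_db]
    return third_db
```

Raising `min_ratio` to 0.8 corresponds to the stricter setting mentioned in the description. Note that, as the text describes, the whole fourth patent database is added once the ratio passes, including documents that individually failed the classification check.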
Example two
Based on the same inventive concept as the verification method for denoising the patent retrieval database in the foregoing embodiment, the present invention further provides a verification device for denoising the patent retrieval database. As shown in Fig. 2, the device includes:
a first obtaining unit 11, the first obtaining unit 11 being configured to obtain a first patent document, the first patent document having a first keyword, and obtain a first patent database from a patent search database according to the first keyword;
a second obtaining unit 12, wherein the second obtaining unit 12 is configured to obtain a second patent document according to the first patent database, and the second patent document has the first classification number information;
a first executing unit 13, where the first executing unit 13 is configured to obtain second classification number information according to the first patent document, determine whether the first classification number information and the second classification number information satisfy a first relevance, retrieve from the first patent database according to the first classification number information to obtain a second patent database when the first relevance is not satisfied, and delete the second patent database from the first patent database;
a third obtaining unit 14, wherein the third obtaining unit 14 is configured to obtain a second patent document from the second patent database, and the second patent document has a second keyword;
a fourth obtaining unit 15, where the fourth obtaining unit 15 is configured to obtain a second relevance according to the first keyword and the second keyword;
a second executing unit 16, where the second executing unit 16 is configured to determine whether the second relevance satisfies a first predetermined condition and, when the second relevance satisfies the first predetermined condition, obtain a first recovery instruction, where the first recovery instruction is used to restore the second patent document to the first patent database to obtain a third patent database, and the third patent database is the target database.
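The classification-number denoising performed by the first executing unit 13 can be sketched as below. This is a hypothetical illustration: documents are assumed to be dicts carrying a set of IPC codes, and two codes are assumed to satisfy the first relevance when they share an IPC subclass (first four characters). The function returns both the denoised first patent database and the second patent database of deleted documents.

```python
def denoise_by_classification(first_db: list[dict], reference_ipc: set[str]):
    """Return (denoised_first_db, second_db).

    reference_ipc plays the role of the second classification number information
    taken from the first patent document. As an illustrative assumption, two codes
    satisfy the first relevance when they share an IPC subclass (first 4 characters).
    """
    subclasses = {code[:4] for code in reference_ipc}
    # First classification number information that fails the first relevance.
    unrelated = {c for doc in first_db for c in doc["ipc"] if c[:4] not in subclasses}
    # Second patent database: every document retrieved by an unrelated code.
    second_db = [doc for doc in first_db if doc["ipc"] & unrelated]
    # Delete the second patent database from the first.
    denoised = [doc for doc in first_db if not (doc["ipc"] & unrelated)]
    return denoised, second_db
```

Because retrieval by an unrelated classification number also catches documents that carry a related code as well, such documents end up in the second patent database; this is exactly the kind of over-deletion that the keyword verification of the second executing unit 16 is meant to undo.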
Further, the apparatus comprises:
a fifth obtaining unit configured to obtain the first patent document;
a first determination unit configured to determine a first patentee from the first patent document;
a sixth obtaining unit, configured to search the first patentee through the first query platform to obtain first competitive product information;
a second determination unit configured to determine a second patentee according to the first competitive product information;
a seventh obtaining unit configured to obtain a third patent document according to the second patentee;
a third determining unit configured to determine a first keyword from the first patent document and the third patent document;
an eighth obtaining unit, configured to obtain the first patent database from the patent search database according to the first keyword.
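The keyword-determination units above can be sketched as follows. The "first query platform" is not identified in the patent, so it is modelled here as a plain mapping `competitor_map` from a patentee to competitor patentees; likewise, the merging rule (keep the first document's keywords and add any keyword that appears in at least `min_support` competitor documents) and the function names are hypothetical choices made for illustration.

```python
from collections import Counter


def determine_first_keywords(first_doc: dict,
                             competitor_map: dict[str, list[str]],
                             docs_by_patentee: dict[str, list[dict]],
                             min_support: int = 2) -> set[str]:
    """Combine the first document's keywords with keywords common in competitor patents.

    competitor_map stands in for the unspecified first query platform: it maps the
    first patentee to second patentees found via competitive product information.
    """
    competitors = competitor_map.get(first_doc["patentee"], [])
    # Third patent documents: patents held by the second patentees.
    third_docs = [doc for p in competitors for doc in docs_by_patentee.get(p, [])]
    # Count each keyword once per document.
    counts = Counter(kw for doc in third_docs for kw in set(doc["keywords"]))
    shared = {kw for kw, n in counts.items() if n >= min_support}
    return set(first_doc["keywords"]) | shared


def retrieve_first_db(search_db: list[dict], first_keywords: set[str]) -> list[dict]:
    """First patent database: documents that contain at least one first keyword."""
    return [doc for doc in search_db if set(doc["keywords"]) & first_keywords]
```

Requiring a keyword to recur across several competitor documents is one simple way to keep a single competitor's idiosyncratic terms from widening the first patent database; the threshold would need tuning for a real search.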
Further, the apparatus comprises:
a third execution unit configured to take the first patent database as the target database when the second relevance does not satisfy the first predetermined condition.
Further, the apparatus comprises:
a ninth obtaining unit, configured to retrieve from the second patent database according to the second keyword, and obtain a fourth patent database;
a tenth obtaining unit, configured to retrieve from the fourth patent database according to third classification number information to obtain a fifth patent database, where the third classification number information and the second classification number information satisfy the first relevance;
an eleventh obtaining unit, configured to obtain a quantity ratio according to the fifth patent database and the fourth patent database;
a first judging unit configured to judge whether the quantity ratio satisfies a second predetermined condition;
a twelfth obtaining unit, configured to obtain a second recovery instruction when the quantity ratio satisfies the second predetermined condition, where the second recovery instruction is used to add the fourth patent database to the third patent database to obtain a sixth patent database;
a fourth execution unit configured to take the sixth patent database as the target database.
Further, the fifth patent database accounts for at least 50% of the fourth patent database.
The variations and specific examples of the verification method for denoising the patent search database described in the first embodiment with reference to Fig. 1 also apply to the verification device of this embodiment. From the foregoing detailed description of the method, those skilled in the art can clearly understand how to implement the device, so, for brevity, the details are not repeated here.
Example three
Based on the same inventive concept as the verification method for denoising the patent search database in the foregoing embodiment, the present invention further provides a verification apparatus for denoising the patent search database. As shown in Fig. 3, it includes a memory 304, a processor 302, and a computer program stored in the memory 304 and executable on the processor 302; when executing the program, the processor 302 implements the steps of any of the verification methods for denoising the patent search database described above.
In Fig. 3, a bus architecture is represented by bus 300. Bus 300 may include any number of interconnected buses and bridges and links together various circuits, including one or more processors represented by processor 302 and memory represented by memory 304. Bus 300 may also link together various other circuits, such as peripherals, voltage regulators, and power management circuits, which are well known in the art and are therefore not described further here. A bus interface 306 provides an interface between bus 300 and the receiver 301 and transmitter 303. The receiver 301 and the transmitter 303 may be the same element, i.e., a transceiver, providing a means for communicating with various other apparatus over a transmission medium. The processor 302 is responsible for managing bus 300 and for general processing, and the memory 304 may be used to store data used by the processor 302 when performing operations.
Example four
Based on the same inventive concept as the verification method for denoising the patent retrieval database in the foregoing embodiment, the present invention further provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the following steps: obtaining a first patent document having a first keyword, and obtaining a first patent database from a patent retrieval database according to the first keyword; obtaining a second patent document from the first patent database, the second patent document having first classification number information; obtaining second classification number information from the first patent document, judging whether the first classification number information and the second classification number information satisfy a first relevance, and, when they do not, retrieving from the first patent database according to the first classification number information to obtain a second patent database and deleting the second patent database from the first patent database; obtaining a second patent document from the second patent database, the second patent document having a second keyword; obtaining a second relevance from the first keyword and the second keyword; and judging whether the second relevance satisfies a first predetermined condition and, when it does, obtaining a first recovery instruction, which is used to restore the second patent document to the first patent database to obtain a third patent database, the third patent database being the target database.
In a specific implementation, when the program is executed by a processor, any method step in the first embodiment may be further implemented.
One or more technical solutions in the embodiments of the present application have at least one of the following technical effects:
In the verification method and device for denoising the patent retrieval database provided by the embodiments of the invention, a first patent document having a first keyword is obtained, and a first patent database is obtained from the patent retrieval database according to the first keyword; a second patent document having first classification number information is obtained from the first patent database; second classification number information is obtained from the first patent document, whether the first and second classification number information satisfy a first relevance is judged, and, when they do not, a second patent database is retrieved from the first patent database according to the first classification number information and deleted from the first patent database; a second patent document having a second keyword is obtained from the second patent database; a second relevance is obtained from the first keyword and the second keyword; and whether the second relevance satisfies a first predetermined condition is judged, and, when it does, a first recovery instruction is obtained, which is used to restore the second patent document to the first patent database to obtain a third patent database, the third patent database being the target database. By analysing keywords and classification numbers in combination and covering retrieval, denoising, and verification end to end, the method effectively improves the accuracy and completeness of the patent retrieval target database, is highly automated, reduces reliance on specialists, and is suitable for patent analysis in enterprises.
This solves the prior-art problems that patent retrieval analysis depends on specialists, the process is complex, and retrieval results are incomplete or overly long.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (10)

1. A verification method for denoising a patent retrieval database is characterized by comprising the following steps:
obtaining a first patent document, wherein the first patent document is provided with a first keyword, and a first patent database is obtained from a patent retrieval database according to the first keyword;
obtaining a second patent document according to the first patent database, wherein the second patent document has first classification number information;
obtaining second classification number information according to the first patent document, judging whether the first classification number information and the second classification number information meet first relevance, retrieving from the first patent database according to the first classification number information to obtain a second patent database when the first relevance is not met, and deleting the second patent database from the first patent database;
obtaining a second patent document from the second patent database, the second patent document having a second keyword;
obtaining a second relevance according to the first keyword and the second keyword;
and judging whether the second relevance satisfies a first predetermined condition, and when the second relevance satisfies the first predetermined condition, obtaining a first recovery instruction, wherein the first recovery instruction is used for restoring the second patent document into the first patent database to obtain a third patent database, and the third patent database is a target database.
2. The method of claim 1, further comprising, before obtaining a first patent database from a patent search database based on the first keyword:
obtaining the first patent document;
determining a first patentee from the first patent document;
searching the first patentee through the first query platform to obtain first competitive product information;
determining a second patentee according to the first competitive product information;
obtaining a third patent document according to the second patentee;
determining a first keyword according to the first patent document and the third patent document;
and acquiring a first patent database from a patent retrieval database according to the first keyword.
3. The method of claim 1, wherein, after judging whether the second relevance satisfies the first predetermined condition, the method comprises:
when the second relevance does not satisfy the first predetermined condition, taking the first patent database as the target database.
4. The method of claim 1, wherein the method further comprises:
retrieving from the second patent database according to the second keyword to obtain a fourth patent database;
retrieving from the fourth patent database according to third classification number information to obtain a fifth patent database, wherein the third classification number information and the second classification number information satisfy the first relevance;
obtaining a quantity ratio according to the fifth patent database and the fourth patent database;
judging whether the quantity ratio satisfies a second predetermined condition;
when the quantity ratio satisfies the second predetermined condition, obtaining a second recovery instruction, wherein the second recovery instruction is used for adding the fourth patent database into the third patent database to obtain a sixth patent database;
the sixth patent database is taken as the target database.
5. The method of claim 4, wherein the quantity ratio satisfying the second predetermined condition comprises:
the fifth patent database accounting for at least 50% of the fourth patent database.
6. A verification apparatus for denoising a patent search database, the apparatus comprising:
a first obtaining unit configured to obtain a first patent document having a first keyword, and obtain a first patent database from a patent search database based on the first keyword;
a second obtaining unit configured to obtain a second patent document having first classification number information from the first patent database;
a first execution unit, configured to obtain second classification number information according to the first patent document, determine whether the first classification number information and the second classification number information satisfy a first relevance, retrieve from the first patent database according to the first classification number information to obtain a second patent database when the first relevance is not satisfied, and delete the second patent database from the first patent database;
a third obtaining unit configured to obtain a second patent document from the second patent database, the second patent document having a second keyword;
a fourth obtaining unit, configured to obtain a second relevance according to the first keyword and the second keyword;
a second execution unit, configured to determine whether the second relevance satisfies a first predetermined condition, and obtain a first recovery instruction when the second relevance satisfies the first predetermined condition, where the first recovery instruction is used to restore the second patent document to the first patent database to obtain a third patent database, and the third patent database is a target database.
7. The apparatus of claim 6, wherein the apparatus further comprises:
a fifth obtaining unit configured to obtain the first patent document;
a first determination unit configured to determine a first patentee from the first patent document;
a sixth obtaining unit, configured to search the first patentee through the first query platform to obtain first competitive product information;
a second determination unit configured to determine a second patentee according to the first competitive product information;
a seventh obtaining unit configured to obtain a third patent document according to the second patentee;
a third determining unit configured to determine a first keyword from the first patent document and the third patent document;
an eighth obtaining unit, configured to obtain the first patent database from the patent search database according to the first keyword.
8. The apparatus of claim 6, wherein the apparatus further comprises:
a third execution unit configured to take the first patent database as the target database when the second relevance does not satisfy the first predetermined condition.
9. A verification apparatus for denoising a patent search database, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the method according to any one of claims 1 to 5 when executing the program.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 5.
CN202010135180.XA 2020-03-02 2020-03-02 Verification method and device for denoising patent retrieval database Withdrawn CN111368062A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010135180.XA CN111368062A (en) 2020-03-02 2020-03-02 Verification method and device for denoising patent retrieval database

Publications (1)

Publication Number Publication Date
CN111368062A 2020-07-03

Family

ID=71206518

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010135180.XA Withdrawn CN111368062A (en) 2020-03-02 2020-03-02 Verification method and device for denoising patent retrieval database

Country Status (1)

Country Link
CN (1) CN111368062A (en)

Similar Documents

Publication Publication Date Title
Lazar et al. Generating duplicate bug datasets
RU2591175C1 (en) Method and system for global identification in collection of documents
CN110442847B (en) Code similarity detection method and device based on code warehouse process management
CN107102993B (en) User appeal analysis method and device
CN112199512B (en) Scientific and technological service-oriented case map construction method, device, equipment and storage medium
CN110321466A (en) A kind of security information duplicate checking method and system based on semantic analysis
CN112000929A (en) Cross-platform data analysis method, system, equipment and readable storage medium
Du et al. SemCluster: a semi-supervised clustering tool for crowdsourced test reports with deep image understanding
CN108009298B (en) Internet character search information integration analysis control method
CN113901169A (en) Information processing method, information processing device, electronic equipment and storage medium
CN110069455B (en) File merging method and device
CN110413307A (en) Correlating method, device and the electronic equipment of code function
US8903754B2 (en) Programmatically identifying branding within assets
KR20180077397A (en) System for constructing software project relationship and method thereof
CN111368062A (en) Verification method and device for denoising patent retrieval database
CN111274364A (en) Automatic denoising method and device based on keyword retrieval data
CN111291094A (en) Retrieval method and device based on keywords and multi-platform classification numbers
EP3547154B1 (en) Constraint satisfaction software tool for database tables
Li Feature and variability extraction from natural language software requirements specifications
CN111339243A (en) Method and device for denoising and checking retrieval data based on competitive product information
CN111274229A (en) Method and device for verifying denoising result of retrieved data
Rattan et al. Detecting high level similarities in source code and beyond
CN111353023A (en) Target database optimization method and device based on keyword retrieval
Alshara et al. Pi-link: A ground-truth dataset of links between pull-requests and issues in github
CN111324726A (en) Method and device for automatically drying patent database

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20200703
