CN111309895A - Automatic denoising method and device for retrieval data - Google Patents


Info

Publication number
CN111309895A
Authority
CN
China
Prior art keywords
classification number
number information
target document
denoising
document
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202010092938.6A
Other languages
Chinese (zh)
Inventor
邓梅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Rainpat Data Service Co ltd
Original Assignee
Jiangsu Rainpat Data Service Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Rainpat Data Service Co ltd
Priority to CN202010092938.6A
Publication of CN111309895A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/38 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2216/00 Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
    • G06F2216/11 Patent retrieval

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides an automatic denoising method and device for retrieval data. First classification number information of a first document is obtained; a first patent having second classification number information is retrieved from a first target document, and whether the first classification number information and the second classification number information satisfy a first correlation is judged; when the first correlation is not satisfied, a first denoising instruction is obtained, and the first target document is searched according to the second classification number information to obtain a second target document; a second denoising instruction is then obtained according to the first denoising instruction and the second target document, and the second target document is deleted from the first target document to obtain a third target document. This solves the technical problems that existing denoising relies on manual screening, so that the denoising process is time-consuming and labor-intensive, prone to omissions, and incomplete. By comparing classification number information, the method denoises retrieval data automatically, ensures the accuracy of retrieval results and the thoroughness of denoising, avoids the tedium of manual denoising, and improves retrieval efficiency.

Description

Automatic denoising method and device for retrieval data
Technical Field
The invention relates to the technical field of data processing, in particular to an automatic denoising method and device for retrieval data.
Background
Patent document retrieval is the search for patents and patent documents. The Chinese Patent Retrieval System (CPRS) is a patent retrieval and full-text browsing system that is used only within the local area network of the national intellectual property office. The system comprises: bibliographic data for the three types of Chinese patents, together with full-text descriptions of inventions and utility models, since 1985; bibliographic data and full-text descriptions of U.S. patents since 1975; and full-text descriptions of patents and utility models filed since 1993. Patent document retrieval is the basic work by which enterprises come to understand the prior art comprehensively, raise their research and development starting point, and avoid intellectual property risks. Because the original patent data published on the internet are incomplete, obscurely worded, lengthy, and hard to understand, enterprises find searching difficult unless they have mastered professional search methods and skills. As social systems continue to develop and improve, the number of patent documents is growing rapidly, and protecting the patent rights of enterprises is becoming increasingly important in every country. For an enterprise, accurately retrieving and analyzing the information it needs from a large body of patent documents is very important to its overall development.
However, the applicant of the present invention finds that the prior art has at least the following technical problems:
In the prior art, retrieval results are denoised by manual screening, so the denoising process is time-consuming and labor-intensive, prone to omissions, and incomplete, and the accuracy of the retrieval results cannot be guaranteed.
Disclosure of Invention
The embodiment of the invention provides an automatic denoising method and device for retrieval data, and solves the technical problems that denoising of retrieval results is performed by manual screening, the denoising process is time-consuming and labor-consuming, omission is easy to cause, denoising is incomplete, and accuracy of the retrieval results cannot be guaranteed in the prior art.
In view of the foregoing problems, the present application provides an automatic denoising method and apparatus for search data.
In a first aspect, the present invention provides an automatic denoising method for search data, the method comprising: according to a first document, obtaining first classification number information; obtaining a first target document, wherein the first target document is a retrieval document set determined according to the first document; retrieving from the first target document to obtain a first patent, the first patent having second classification number information; judging whether the second classification number information and the first classification number information meet a first correlation or not according to the first classification number information and the second classification number information; when the second classification number information and the first classification number information do not meet the first correlation, a first denoising instruction is obtained according to first document classification number information, and the first denoising instruction is used for retrieving in the first target document according to the second classification number information to obtain a second target document; and obtaining a second denoising instruction according to the first denoising instruction and a second target document, wherein the second denoising instruction is used for deleting the second target document from the first target document to obtain a third target document.
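The steps of the first aspect can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Patent` record, the function names, and in particular the rule used for the "first correlation" (two IPC numbers share the same four-character subclass, such as `G06F`) are assumptions, because the text does not define how the correlation is computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patent:
    """Hypothetical record for one retrieved patent."""
    number: str
    ipc: tuple[str, ...]  # classification numbers, main one first


def first_correlation(ipc_a: str, ipc_b: str) -> bool:
    """Assumed rule: related when both share the IPC subclass (first 4 chars)."""
    return ipc_a[:4] == ipc_b[:4]


def denoise(first_ipc: str, first_target: list[Patent]) -> list[Patent]:
    """Return the third target document (first target document minus noise)."""
    # Second classification numbers: main numbers in the result set that
    # differ from the first document's classification number.
    second_ipcs = {p.ipc[0] for p in first_target if p.ipc[0] != first_ipc}
    # Keep only the numbers that fail the (assumed) first correlation.
    noisy_ipcs = {c for c in second_ipcs if not first_correlation(first_ipc, c)}
    # First denoising instruction: retrieve the second target document.
    second_target = [p for p in first_target if p.ipc[0] in noisy_ipcs]
    # Second denoising instruction: delete it from the first target document.
    removed = {p.number for p in second_target}
    return [p for p in first_target if p.number not in removed]
```

Comparing only the main classification number is a simplification chosen for brevity; patents with several classification numbers are handled by the later optional steps.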
Preferably, before the obtaining the first target document, the method includes: obtaining a first keyword according to the first document; obtaining a second keyword according to the first document, wherein the first keyword is different from the second keyword; determining a second relevance according to the first keyword and the second keyword; judging whether the second relevance meets a first preset threshold value or not; when the second relevance meets the first preset threshold value, determining a first search term according to the first keyword and the second keyword; and obtaining a first retrieval instruction according to the first retrieval word, wherein the first retrieval instruction is used for retrieving according to the first retrieval word to obtain a first target document.
Preferably, after judging whether the second relevance satisfies the first predetermined threshold, the method includes: when the second relevance does not meet the first preset threshold value, obtaining a first attribute according to the first classification number information; obtaining a second attribute according to the first keyword; obtaining a third relevance according to the first attribute and the second attribute; judging whether the third relevance meets a second preset threshold value or not; and when the third relevance meets the second preset threshold, obtaining a second retrieval instruction, wherein the second retrieval instruction is used for retrieving according to the first keyword to obtain a first target document.
Preferably, after judging whether the third relevance satisfies the second predetermined threshold, the method includes: when the third relevance does not meet the second preset threshold value, obtaining a third attribute according to the second keyword; obtaining a fourth relevance according to the first attribute and the third attribute; judging whether the fourth relevance meets the second preset threshold value; and when the fourth relevance meets the second preset threshold value, obtaining a second retrieval instruction, wherein the second retrieval instruction is used for retrieving according to the second keyword to obtain a first target document.
Preferably, the method further comprises: obtaining a second patent having third classification number information other than the second classification number information according to the second target document; judging whether the first correlation is met or not according to the third classification number information and the first classification number information; when the third classification number information and the first classification number information meet the first correlation, obtaining a third denoising instruction, wherein the third denoising instruction is used for retrieving in the second target document according to the third classification number information to obtain a fourth target document; and obtaining a fifth denoising instruction according to the fourth target document and the third denoising instruction, wherein the fifth denoising instruction is used for adding the fourth target document into the third target document to obtain a fifth target document.
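The recovery step described above, in which wrongly removed patents are added back, can be sketched as follows. The record layout, the function names, and the subclass-prefix rule used for the first correlation are assumptions made for illustration; the sketch also assumes that the first entry of each classification list is the second classification number that caused the removal, so the remaining entries are candidates for the third classification number.

```python
def first_correlation(ipc_a: str, ipc_b: str) -> bool:
    """Assumed rule: related when both share the IPC subclass (first 4 chars)."""
    return ipc_a[:4] == ipc_b[:4]


def restore(first_ipc, second_target, third_target):
    """Return the fifth target document.

    second_target and third_target are lists of
    (patent_number, [classification_number, ...]) pairs.
    """
    # Fourth target document: removed patents that carry an additional
    # (third) classification number satisfying the first correlation.
    fourth_target = [
        (number, ipcs)
        for number, ipcs in second_target
        if any(first_correlation(first_ipc, c) for c in ipcs[1:])
    ]
    # Fifth denoising instruction: add them to the third target document.
    return third_target + fourth_target
```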
Preferably, the method further comprises: according to the third target document, obtaining a third patent, wherein the third patent is a patent with multiple classification numbers and has fourth classification number information; judging whether fifth correlation is met or not according to the fourth classification number information and the first classification number information; when the fourth classification number information and the first classification number information meet the fifth correlation, a sixth denoising instruction is obtained, and the sixth denoising instruction is used for retrieving in the third target document according to the fourth classification number information to obtain a sixth target document; and obtaining a seventh denoising instruction according to the sixth target document and the sixth denoising instruction, wherein the seventh denoising instruction is used for deleting the sixth target document from the third target document to obtain an eighth target document.
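The handling of patents with multiple classification numbers can be sketched as follows. The text does not say how the "fifth correlation" is evaluated, so the sketch takes it as a caller-supplied predicate; the function name, the record layout, and the example predicate shown afterwards are all hypothetical.

```python
from typing import Callable

Record = tuple[str, list[str]]  # (patent number, classification numbers)


def remove_multiclass_noise(
    first_ipc: str,
    third_target: list[Record],
    fifth_correlation: Callable[[str, str], bool],
) -> list[Record]:
    """Return the eighth target document."""
    # Fourth classification numbers: numbers of multi-classified patents
    # for which the caller-supplied fifth correlation holds.
    flagged = {
        c
        for _, ipcs in third_target
        if len(ipcs) > 1
        for c in ipcs
        if fifth_correlation(c, first_ipc)
    }
    # Sixth target document: patents retrieved by any flagged number.
    sixth = {number for number, ipcs in third_target if flagged & set(ipcs)}
    # Seventh denoising instruction: delete them from the third target document.
    return [record for record in third_target if record[0] not in sixth]
```

As a usage illustration only, a predicate such as `lambda c, first: c[0] != first[0]` would flag classification numbers that belong to a different IPC section than the first document.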
In a second aspect, the present invention provides an automatic denoising apparatus for retrieving data, the apparatus comprising:
a first obtaining unit configured to obtain first classification number information according to a first document;
a second obtaining unit, configured to obtain a first target document, where the first target document is a search document set determined according to the first document;
a third obtaining unit, configured to retrieve from the first target document, and obtain a first patent, where the first patent has second classification number information;
a first judging unit, configured to judge whether the second classification number information and the first classification number information satisfy a first correlation according to the first classification number information and the second classification number information;
a fourth obtaining unit, configured to obtain a first denoising instruction according to first document classification number information when the second classification number information and the first classification number information do not satisfy the first correlation, where the first denoising instruction is used to retrieve in the first target document according to the second classification number information to obtain a second target document;
a fifth obtaining unit, configured to obtain a second denoising instruction according to the first denoising instruction and a second target document, where the second denoising instruction is used to delete the second target document from the first target document, and obtain a third target document.
Preferably, the apparatus further comprises:
a sixth obtaining unit, configured to obtain a first keyword according to the first document;
a seventh obtaining unit, configured to obtain a second keyword according to the first document, where the first keyword is different from the second keyword;
a first determining unit, configured to determine a second relevance according to the first keyword and the second keyword;
a second judging unit, configured to judge whether the second relevance satisfies a first predetermined threshold;
a second determining unit, configured to determine a first search term according to the first keyword and a second keyword when the second relevance satisfies the first predetermined threshold;
an eighth obtaining unit, configured to obtain a first search instruction according to the first search term, where the first search instruction is used to perform a search according to the first search term to obtain a first target document.
Preferably, the apparatus further comprises:
a ninth obtaining unit configured to obtain a first attribute according to the first classification number information when the second relevance does not satisfy the first predetermined threshold;
a tenth obtaining unit, configured to obtain a second attribute according to the first keyword;
an eleventh obtaining unit, configured to obtain a third relevance according to the first attribute and the second attribute;
a third judging unit configured to judge whether the third correlation satisfies a second predetermined threshold;
a twelfth obtaining unit, configured to obtain a second retrieval instruction when the third relevance satisfies the second predetermined threshold, where the second retrieval instruction is used to perform retrieval according to the first keyword to obtain a first target document.
Preferably, the apparatus further comprises:
a thirteenth obtaining unit, configured to obtain a third attribute according to the second keyword when the third relevance does not satisfy the second predetermined threshold;
a fourteenth obtaining unit, configured to obtain a fourth relevance according to the first attribute and the third attribute;
a fourth judging unit, configured to judge whether the fourth relevance satisfies the second predetermined threshold;
a fifteenth obtaining unit, configured to obtain a second retrieval instruction when the fourth relevance satisfies the second predetermined threshold, where the second retrieval instruction is used to perform retrieval according to the second keyword to obtain a first target document.
Preferably, the apparatus further comprises:
a sixteenth obtaining unit configured to obtain, from the second target document, a second patent having third classification number information other than the second classification number information;
a fifth judging unit, configured to judge whether the first correlation is satisfied according to the third classification number information and the first classification number information;
a seventeenth obtaining unit, configured to obtain a third denoising instruction when the third classification number information and the first classification number information satisfy the first correlation, where the third denoising instruction is used to perform retrieval in the second target document according to the third classification number information to obtain a fourth target document;
an eighteenth obtaining unit, configured to obtain a fifth denoising instruction according to the fourth target document and the third denoising instruction, where the fifth denoising instruction is used to add the fourth target document to the third target document to obtain a fifth target document.
Preferably, the apparatus further comprises:
a nineteenth obtaining unit configured to obtain, according to the third target document, a third patent, which is a patent having a plurality of classification numbers, the third patent having fourth classification number information;
a sixth judging unit, configured to judge whether a fifth correlation is satisfied according to the fourth classification number information and the first classification number information;
a twentieth obtaining unit, configured to obtain a sixth denoising instruction when the fourth classification number information and the first classification number information satisfy the fifth correlation, where the sixth denoising instruction is used to retrieve from the third target document according to the fourth classification number information to obtain a sixth target document;
a twenty-first obtaining unit, configured to obtain a seventh denoising instruction according to the sixth target document and the sixth denoising instruction, where the seventh denoising instruction is used to delete the sixth target document from the third target document and obtain an eighth target document.
In a third aspect, the present invention provides an automatic denoising device for retrieving data, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of any one of the above methods when executing the program.
In a fourth aspect, the invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of any of the methods described above.
One or more technical solutions in the embodiments of the present application have at least one or more of the following technical effects:
according to the automatic denoising method and device for the retrieval data, provided by the embodiment of the invention, first classification number information is obtained according to a first document; obtaining a first target document, wherein the first target document is a retrieval document set determined according to the first document; retrieving from the first target document to obtain a first patent, the first patent having second classification number information; judging whether the second classification number information and the first classification number information meet a first correlation or not according to the first classification number information and the second classification number information; when the second classification number information and the first classification number information do not meet the first correlation, a first denoising instruction is obtained according to first document classification number information, and the first denoising instruction is used for retrieving in the first target document according to the second classification number information to obtain a second target document; and obtaining a second denoising instruction according to the first denoising instruction and a second target document, wherein the second denoising instruction is used for deleting the second target document from the first target document to obtain a third target document. 
This achieves automatic denoising of retrieval data: by comparing classification number information, patent documents whose classification numbers do not meet the retrieval requirements are removed from the retrieved patent documents, which ensures the accuracy of the retrieval results. Because the whole process runs automatically, the tedious process of manual denoising is avoided. This solves the technical problems of the prior art, in which retrieval results are denoised by manual screening, the denoising process is time-consuming and labor-intensive, omissions occur easily, denoising is incomplete, and the accuracy of the retrieval results cannot be guaranteed.
The foregoing is only an overview of the technical solutions of the present invention. Specific embodiments are described below so that the technical means of the invention can be understood more clearly, and so that the above and other objects, features, and advantages of the invention become more apparent.
Drawings
FIG. 1 is a schematic flow chart of an automatic denoising method for search data according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of an automatic denoising apparatus for retrieving data according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of another automatic denoising apparatus for retrieving data according to an embodiment of the present invention.
Description of reference numerals: a first obtaining unit 11, a second obtaining unit 12, a third obtaining unit 13, a first judging unit 14, a fourth obtaining unit 15, a fifth obtaining unit 16, a bus 300, a receiver 301, a processor 302, a transmitter 303, a memory 304, and a bus interface 306.
Detailed Description
The embodiment of the invention provides an automatic denoising method and device for retrieval data, which are used for solving the technical problems that denoising of retrieval results is performed by manual screening in the prior art, the denoising process is time-consuming and labor-consuming, omission is easily caused, denoising is incomplete, and the accuracy of the retrieval results cannot be guaranteed.
The technical scheme provided by the invention has the following general idea:
According to a first document, first classification number information is obtained; a first target document is obtained, wherein the first target document is a retrieval document set determined according to the first document; a first patent having second classification number information is retrieved from the first target document; whether the second classification number information and the first classification number information satisfy a first correlation is judged according to the first classification number information and the second classification number information; when they do not satisfy the first correlation, a first denoising instruction is obtained according to the first document classification number information, the first denoising instruction being used for retrieving in the first target document according to the second classification number information to obtain a second target document; and a second denoising instruction is obtained according to the first denoising instruction and the second target document, the second denoising instruction being used for deleting the second target document from the first target document to obtain a third target document. By comparing classification number information, the retrieval data are denoised automatically and patent documents whose classification numbers do not meet the retrieval requirements are removed from the retrieval results, which ensures the accuracy of the retrieval results and the thoroughness of denoising. Because the whole process runs automatically, the tedium of manual denoising is avoided and retrieval efficiency is improved.
The technical solutions of the present invention are described in detail below with reference to the drawings and specific embodiments. It should be understood that the specific features in the embodiments and examples are detailed descriptions of the technical solutions of the present application rather than limitations on them, and that the technical features in the embodiments and examples may be combined with each other where no conflict arises.
The term "and/or" herein merely describes an association between associated objects and indicates that three relationships may exist. For example, "A and/or B" may mean: A exists alone, A and B both exist, or B exists alone. In addition, the character "/" herein generally indicates an "or" relationship between the objects before and after it.
Example one
FIG. 1 is a schematic flow chart of an automatic denoising method for search data according to an embodiment of the present invention. As shown in FIG. 1, an embodiment of the present invention provides an automatic denoising method for search data, where the method includes:
Step 110: according to the first document, first classification number information is obtained.
Specifically, the first document is the patent document to be searched, and the first classification number information is determined from the classification number of the first document, which is identified using the International Patent Classification (IPC) table. A single patent may have several classification numbers, the first of which is called the main classification number. If an invention patent application or utility model patent application involves technical subjects of different types, and those technical subjects constitute invention information, multiple classification should be performed according to the technical subjects involved and several classification numbers should be given, with the classification number that most fully represents the invention information placed first. The classification table is a tool for classifying the patent documents of all countries uniformly. Its primary purpose is to serve as an effective search tool for the patent literature searches that patent offices and other users carry out when determining the novelty and inventiveness of patent applications, including the evaluation of technical advancement and practical value.
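To make the structure of a classification number concrete, the sketch below splits an IPC-style symbol such as `G06F16/335` into its levels and picks the main classification number as the first one listed. The function names and the deliberately lenient regular expression are assumptions for illustration; they are not part of the described method.

```python
import re

# Section letter, two-digit class, subclass letter, main group "/" subgroup.
# The digit ranges are intentionally lenient.
IPC_PATTERN = re.compile(r"^([A-H])(\d{2})([A-Z])\s*(\d{1,4})/(\d{2,6})$")


def parse_ipc(symbol: str) -> dict[str, str]:
    """Split a full IPC-style symbol into its hierarchical levels."""
    match = IPC_PATTERN.match(symbol.strip())
    if match is None:
        raise ValueError(f"not a full IPC symbol: {symbol!r}")
    section, cls, subclass, group, subgroup = match.groups()
    return {
        "section": section,
        "class": section + cls,
        "subclass": section + cls + subclass,
        "main_group": group,
        "subgroup": subgroup,
    }


def main_classification(ipcs: list[str]) -> str:
    """The first listed classification number is the main one."""
    return ipcs[0]
```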
Step 120: a first target document is obtained, wherein the first target document is a search document set determined according to the first document.
Further, before the obtaining the first target document, the method includes: obtaining a first keyword according to the first document; obtaining a second keyword according to the first document, wherein the first keyword is different from the second keyword; determining a second relevance according to the first keyword and the second keyword; judging whether the second relevance meets a first preset threshold value or not; when the second relevance meets the first preset threshold value, determining a first search term according to the first keyword and the second keyword; and obtaining a first retrieval instruction according to the first retrieval word, wherein the first retrieval instruction is used for retrieving according to the first retrieval word to obtain a first target document.
Further, after judging whether the second relevance satisfies the first predetermined threshold, the method includes: when the second relevance does not meet the first preset threshold value, obtaining a first attribute according to the first classification number information; obtaining a second attribute according to the first keyword; obtaining a third relevance according to the first attribute and the second attribute; judging whether the third relevance meets a second preset threshold value or not; and when the third relevance meets the second preset threshold, obtaining a second retrieval instruction, wherein the second retrieval instruction is used for retrieving according to the first keyword to obtain a first target document.
Further, after judging whether the third relevance satisfies the second predetermined threshold, the method includes: when the third relevance does not meet the second preset threshold value, obtaining a third attribute according to the second keyword; obtaining a fourth relevance according to the first attribute and the third attribute; judging whether the fourth relevance meets the second preset threshold value; and when the fourth relevance meets the second preset threshold value, obtaining a second retrieval instruction, wherein the second retrieval instruction is used for retrieving according to the second keyword to obtain a first target document.
Specifically, when novelty and creativity of the first document are judged, or when a company evaluates the competitiveness of the patent content of the company, the first document to be searched is used for searching, so that the set of all related patents and patents threatening the related patents is the first target document, namely the sum of all searched patent documents. Generally, a search process utilizes keywords and a plurality of classification numbers, a first target document is obtained by utilizing the keywords to perform automatic search, manual search is avoided, the keywords are obtained according to a first document, namely a patent or a file to be searched, the keywords can be multiple, the relevance between the keywords is judged, if the preset threshold is met, the relevance between the two keywords is larger, the requirement is met, if the preset threshold is not met, the relevance between the two keywords is smaller, the requirement is not met, if the relevance meets the requirement, the main core word of the document is the keyword or is related to the keyword, analysis is performed according to the obtained keywords, the relation among the keywords is determined according to the part of speech, the attribute, the action and the like, the search word is determined according to the relevance between the keywords, the attribute can be determined according to the relation between the two keywords, so as to determine the related multiple keywords, or directly determining the first keyword and the second keyword as search terms, and finally searching by using the search terms, thereby obtaining all patent document sets related to the search terms, which are the first target documents. 
When the relevance between the first keyword and the second keyword does not meet the set threshold, the two keywords differ greatly. In that case the classification number information of the document to be retrieved is used to determine the attribute of the document, and attribute analysis is performed on the first keyword. If the attribute of the first keyword is close to the attribute of the classification number of the first document and passes the relevance judgment, the first keyword is used as the search term for the patent document search, and the first target document is obtained. If the relevance between the attribute of the first keyword and the attribute of the first classification number information does not meet the threshold, attribute analysis is performed on the second keyword to judge whether its attribute and the attribute of the first classification number satisfy the relevance requirement. If they do, the second keyword is closer to the content classification of the first document and meets the retrieval requirement; the second keyword is then taken as the search term, and the patent document search yields the retrieval data corresponding to that keyword, which is the first target document.
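The keyword fallback described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `choose_search_terms`, the three scoring callbacks, and the reading of "meets the threshold" as `>=` are hypothetical, since the description does not specify how relevance or attributes are computed.

```python
from typing import Callable, List, Optional


def choose_search_terms(
    kw1: str,
    kw2: str,
    class_attr: str,  # first attribute, derived from the first classification number
    keyword_relevance: Callable[[str, str], float],    # hypothetical scorer
    attribute_of: Callable[[str], str],                # hypothetical attribute lookup
    attribute_relevance: Callable[[str, str], float],  # hypothetical scorer
    first_threshold: float,
    second_threshold: float,
) -> Optional[List[str]]:
    # Second relevance: are the two keywords related to each other?
    if keyword_relevance(kw1, kw2) >= first_threshold:
        return [kw1, kw2]
    # Third relevance: first keyword's attribute against the classification attribute.
    if attribute_relevance(class_attr, attribute_of(kw1)) >= second_threshold:
        return [kw1]
    # Fourth relevance: second keyword's attribute against the classification attribute.
    if attribute_relevance(class_attr, attribute_of(kw2)) >= second_threshold:
        return [kw2]
    # The description does not say what happens when every check fails.
    return None


# Toy scorers, for illustration only.
def toy_keyword_relevance(a: str, b: str) -> float:
    return 0.9 if {a, b} == {"gear", "shaft"} else 0.1


def toy_attribute_of(keyword: str) -> str:
    return {"gear": "mechanical", "shaft": "mechanical", "enzyme": "chemical"}[keyword]


def toy_attribute_relevance(a: str, b: str) -> float:
    return 1.0 if a == b else 0.0
```

Returning `None` when no check passes is a design choice of this sketch, not something stated in the text.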
Step 130: searching within the first target document to obtain a first patent, wherein the first patent has second classification number information.
Specifically, the first target document obtained by the search is itself searched to determine the IPC classification numbers of the patents it contains. The classification number of each patent is compared with the classification number of the first document, and classification number information that differs from that of the first document is screened out as the second classification number information.
Step 140: judging, according to the first classification number information and the second classification number information, whether the second classification number information and the first classification number information satisfy a first correlation.
Specifically, the first classification number information of the first document is compared with the second classification number information, that is, the other classification number information found in the search, to determine whether the two satisfy the correlation requirement. The first correlation means that the classification number of a retrieved patent is close to that of the first document, for example in the same broad class or in a similar or neighbouring class.
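One simple way to picture the first correlation is a prefix comparison of IPC symbols. The sketch below assumes that "same broad class" means "same IPC subclass" (section letter, two-digit class, subclass letter, e.g. `G06F`); that threshold and the function names are assumptions made for illustration, and the description also allows "similar or close" classes, which this sketch does not model.

```python
def ipc_subclass(symbol: str) -> str:
    # An IPC symbol such as "G06F 16/335" starts with a section letter,
    # a two-digit class and a subclass letter; keep those four characters.
    return symbol.replace(" ", "").upper()[:4]


def satisfies_first_correlation(first_class: str, second_class: str) -> bool:
    # Assumed criterion: both symbols fall in the same IPC subclass.
    return ipc_subclass(first_class) == ipc_subclass(second_class)
```

For example, `G06F 16/335` and `G06F16/33` would be treated as correlated, while `G06F16/33` and `A61K 31/00` would not.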
Step 150: when the second classification number information and the first classification number information do not satisfy the first correlation, obtaining a first denoising instruction according to the classification number information of the first document, wherein the first denoising instruction is used for retrieving in the first target document according to the second classification number information to obtain a second target document.
Specifically, when the second classification number information and the first classification number information do not satisfy the first correlation, the second classification number differs greatly from the first classification number and fails the set correlation requirement. This indicates that the patents corresponding to the second classification number information differ greatly from the first document and do not meet the requirement of the search, so denoising should be performed. The corresponding first denoising instruction is obtained and a search is performed in the first target document according to the second classification number information, which yields the patent documents in the first target document whose classification number is the second classification number information, that is, the second target document. These are the documents to be deleted.
Step 160: obtaining a second denoising instruction according to the first denoising instruction and the second target document, wherein the second denoising instruction is used for deleting the second target document from the first target document to obtain a third target document.
Specifically, a corresponding second denoising instruction is obtained according to the first denoising instruction and the second target document determined by the search, and the second target document is deleted from the first target document; that is, the patent documents that do not meet the classification number requirement are removed from the retrieval data, leaving the third target document. The retrieval data is thereby denoised automatically: classification number information is compared, and the retrieved patent documents whose classification numbers do not meet the retrieval requirement are removed, which ensures the accuracy of the retrieval result. Because the whole process is automatic, the tedious process of manual denoising is avoided. This solves the technical problem in the prior art that retrieval results are denoised by manual screening, which is time-consuming and laborious, prone to omissions and incomplete denoising, and cannot guarantee the accuracy of the retrieval result.
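Steps 130 to 160 can be sketched as a small filter over the result set. The `Patent` record, the `related` placeholder for the first correlation (same first four characters of the IPC symbol) and the sample data are all hypothetical; the description does not fix a data model or a concrete correlation test.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Patent:
    number: str
    ipc: FrozenSet[str]  # all IPC classification numbers of the patent


def related(first_class: str, other: str) -> bool:
    # Placeholder for the first correlation: same IPC subclass prefix.
    return first_class[:4] == other[:4]


def denoise(first_class: str, first_target: List[Patent]) -> Tuple[List[Patent], List[Patent]]:
    # Step 130: collect the classification numbers present in the result set.
    found = {c for p in first_target for c in p.ipc}
    # Step 140: keep the numbers that fail the first correlation.
    noisy = {c for c in found if not related(first_class, c)}
    # Step 150: second target document = results carrying a noisy number.
    second_target = [p for p in first_target if p.ipc & noisy]
    # Step 160: third target document = first target minus second target.
    removed = {p.number for p in second_target}
    third_target = [p for p in first_target if p.number not in removed]
    return second_target, third_target


results = [
    Patent("P1", frozenset({"G06F16/33"})),
    Patent("P2", frozenset({"A61K31/00"})),
    Patent("P3", frozenset({"G06F16/35", "A61K31/00"})),
]
second, third = denoise("G06F16/335", results)
```

In this toy run `P3` is removed together with `P2` even though it also carries a related classification number, which is the kind of over-deletion a later check on multi-classified patents has to catch.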
Further, the method further comprises: obtaining a second patent having third classification number information other than the second classification number information according to the second target document; judging whether the first correlation is met or not according to the third classification number information and the first classification number information; when the third classification number information and the first classification number information meet the first correlation, obtaining a third denoising instruction, wherein the third denoising instruction is used for retrieving in the second target document according to the third classification number information to obtain a fourth target document; and obtaining a fifth denoising instruction according to the fourth target document and the third denoising instruction, wherein the fifth denoising instruction is used for adding the fourth target document into the third target document to obtain a fifth target document.
Specifically, one patent document may carry more than one classification number, so a patent document that meets the requirement may be removed by mistake during denoising. To avoid this, the patents in the second target document are analyzed further. The second target document is deleted according to the second classification number information, but a patent in it may also carry other classification numbers that satisfy the correlation requirement with the first classification number. Deleting such a patent is erroneous denoising and impairs the accuracy of the retrieval data. The patents in the second target document are therefore screened to find those that carry not only the second classification number information but also other classification numbers, referred to as third classification number information. The third classification number information is compared with the first classification number information of the first document. If the two satisfy the correlation requirement, that is, they are relatively close (same broad class, similar or related classes, cross-disciplinary subjects, an upstream-downstream relation and the like), a denoising instruction is obtained accordingly and a search is performed in the second target document according to the third classification number information to determine a fourth target document. The fourth target document consists of the patent documents in the second target document that have the third classification number information, that is, the documents deleted by mistake. These documents are screened out and added back into the third target document, which ensures the accuracy of the denoising process and makes the retrieval result more comprehensive and effective.
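The restore step can be sketched as follows, under the same kind of assumptions as before: `Patent`, `related` (a stand-in for the first correlation, here a four-character IPC prefix match) and `restore_wrongly_deleted` are hypothetical names introduced only for illustration.

```python
from typing import FrozenSet, List, NamedTuple, Tuple


class Patent(NamedTuple):
    number: str
    ipc: FrozenSet[str]


def related(first_class: str, other: str) -> bool:
    # Placeholder for the first correlation: same IPC subclass prefix.
    return first_class[:4] == other[:4]


def restore_wrongly_deleted(
    first_class: str,
    second_target: List[Patent],  # documents removed in the first pass
    third_target: List[Patent],   # documents kept in the first pass
) -> Tuple[List[Patent], List[Patent]]:
    # Fourth target document: removed patents that also carry a classification
    # number (the "third classification number") related to the first one.
    fourth_target = [
        p for p in second_target
        if any(related(first_class, c) for c in p.ipc)
    ]
    # Fifth target document: kept documents plus the restored ones.
    fifth_target = list(third_target) + fourth_target
    return fourth_target, fifth_target


deleted = [
    Patent("P2", frozenset({"A61K31/00"})),
    Patent("P3", frozenset({"G06F16/35", "A61K31/00"})),
]
kept = [Patent("P1", frozenset({"G06F16/33"}))]
fourth, fifth = restore_wrongly_deleted("G06F16/335", deleted, kept)
```

Here `P3` is restored because one of its classification numbers is related to the first classification number, while `P2` stays deleted.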
Further, the method further comprises: according to the third target document, obtaining a third patent, wherein the third patent is a patent with multiple classification numbers and has fourth classification number information; judging whether fifth correlation is met or not according to the fourth classification number information and the first classification number information; when the fourth classification number information and the first classification number information meet the fifth correlation, a sixth denoising instruction is obtained, and the sixth denoising instruction is used for retrieving in the third target document according to the fourth classification number information to obtain a sixth target document; and obtaining a seventh denoising instruction according to the sixth target document and the sixth denoising instruction, wherein the seventh denoising instruction is used for deleting the sixth target document from the third target document to obtain an eighth target document.
Specifically, the third target document, which has been denoised by the second classification number, may still be incompletely denoised and may still contain patent documents that do not serve the purpose of the search. To denoise the retrieval result further, the patent documents in the third target document are screened to find those with a plurality of classification numbers, and the classification numbers of each such document are compared. For example, a patent may include several classification numbers, one of which satisfies the correlation requirement with the first classification number, so that the patent was kept as a retrieval result, while another is an interfering classification number that should be removed. Such a classification number satisfies the fifth correlation with the first classification number. The fifth correlation is set according to the classification number information that needs to be denoised and the first classification number; in other words, a patent whose classification number satisfies the fifth correlation with the first classification number information should be denoised. For instance, the patent and the first document may be in the same broad class, which satisfies the requirement of the initial search, but belong to different classes under that broad class, one of which the search requirement explicitly excludes. Because the patent includes several classification numbers, it was not deleted in the initial denoising; when its classification numbers are screened further and one of them is found to be a classification number that must be denoised, the patent should be deleted. Screening and denoising are then performed according to that classification number, and the patent documents in the third target document that include it are deleted. This ensures that denoising is comprehensive, avoids the tedium and omissions of manual denoising, and preserves the accuracy of the search result. Because the whole process is automatic, it is efficient and convenient, allows searchers to retrieve information quickly, and improves working efficiency.
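The second denoising pass can be sketched as below. The fifth correlation is not defined precisely in the description, so this sketch assumes one possible reading: the candidate number shares the IPC subclass of the first classification number but falls under a group that the searcher has explicitly excluded. The `excluded_groups` parameter and all function names are hypothetical.

```python
from typing import FrozenSet, List, NamedTuple, Sequence, Tuple


class Patent(NamedTuple):
    number: str
    ipc: FrozenSet[str]


def satisfies_fifth_correlation(
    first_class: str, candidate: str, excluded_groups: Sequence[str]
) -> bool:
    # Assumed reading: same subclass as the first classification number,
    # but inside a group the search requirement explicitly excludes.
    same_subclass = first_class[:4] == candidate[:4]
    return same_subclass and any(candidate.startswith(g) for g in excluded_groups)


def second_pass(
    first_class: str, third_target: List[Patent], excluded_groups: Sequence[str]
) -> Tuple[List[Patent], List[Patent]]:
    # Third patents: documents carrying more than one classification number.
    multi = [p for p in third_target if len(p.ipc) > 1]
    # Sixth target document: multi-classified patents with an excluded number.
    sixth_target = [
        p for p in multi
        if any(satisfies_fifth_correlation(first_class, c, excluded_groups) for c in p.ipc)
    ]
    # Eighth target document: third target minus sixth target.
    removed = {p.number for p in sixth_target}
    eighth_target = [p for p in third_target if p.number not in removed]
    return sixth_target, eighth_target


kept = [
    Patent("P1", frozenset({"G06F16/33"})),
    Patent("P4", frozenset({"G06F16/33", "G06F16/90"})),
    Patent("P5", frozenset({"G06F16/35", "G06F17/10"})),
]
sixth, eighth = second_pass("G06F16/335", kept, ["G06F16/90"])
```

Only multi-classified patents are inspected, following the text; a single-classified patent whose only number is excluded would pass through this sketch untouched.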
Example two
Based on the same inventive concept as the automatic denoising method for retrieval data in the foregoing embodiment, the present invention further provides an automatic denoising device for retrieval data. As shown in fig. 2, the device includes:
a first obtaining unit 11, wherein the first obtaining unit 11 is used for obtaining first classification number information according to a first document;
a second obtaining unit 12, where the second obtaining unit 12 is configured to obtain a first target document, where the first target document is a search document set determined according to the first document;
a third obtaining unit 13, wherein the third obtaining unit 13 is configured to retrieve from the first target document to obtain a first patent, and the first patent has second classification number information;
a first judging unit 14, where the first judging unit 14 is configured to judge whether the second classification number information and the first classification number information satisfy a first correlation according to the first classification number information and the second classification number information;
a fourth obtaining unit 15, where the fourth obtaining unit 15 is configured to obtain a first denoising instruction according to first document classification number information when the second classification number information and the first classification number information do not satisfy the first correlation, where the first denoising instruction is used to perform a search in the first target document according to the second classification number information to obtain a second target document;
a fifth obtaining unit 16, where the fifth obtaining unit 16 is configured to obtain a second denoising instruction according to the first denoising instruction and a second target document, and the second denoising instruction is configured to delete the second target document from the first target document to obtain a third target document.
Further, the apparatus further comprises:
a sixth obtaining unit, configured to obtain a first keyword according to the first document;
a seventh obtaining unit, configured to obtain a second keyword according to the first document, where the first keyword is different from the second keyword;
a first determining unit, configured to determine a second relevance according to the first keyword and the second keyword;
a second judging unit, configured to judge whether the second relevance satisfies a first predetermined threshold;
a second determining unit, configured to determine a first search term according to the first keyword and a second keyword when the second relevance satisfies the first predetermined threshold;
an eighth obtaining unit, configured to obtain a first search instruction according to the first search term, where the first search instruction is used to perform a search according to the first search term to obtain a first target document.
Further, the apparatus further comprises:
a ninth obtaining unit configured to obtain a first attribute according to the first classification number information when the second relevance does not satisfy the first predetermined threshold;
a tenth obtaining unit, configured to obtain a second attribute according to the first keyword;
an eleventh obtaining unit, configured to obtain a third relevance according to the first attribute and the second attribute;
a third judging unit, configured to judge whether the third relevance satisfies a second predetermined threshold;
a twelfth obtaining unit, configured to obtain a second retrieval instruction when the third relevance satisfies the second predetermined threshold, where the second retrieval instruction is used to perform retrieval according to the first keyword to obtain a first target document.
Further, the apparatus further comprises:
a thirteenth obtaining unit, configured to obtain a third attribute according to the second keyword when the third relevance does not satisfy the second predetermined threshold;
a fourteenth obtaining unit, configured to obtain a fourth relevance according to the first attribute and the third attribute;
a fourth judging unit, configured to judge whether the fourth relevance satisfies the second predetermined threshold;
a fifteenth obtaining unit, configured to obtain a second retrieval instruction when the fourth relevance satisfies the second predetermined threshold, where the second retrieval instruction is used to perform retrieval according to the second keyword to obtain a first target document.
Further, the apparatus further comprises:
a sixteenth obtaining unit configured to obtain, from the second target document, a second patent having third classification number information other than the second classification number information;
a fifth judging unit, configured to judge whether the first correlation is satisfied according to the third classification number information and the first classification number information;
a seventeenth obtaining unit, configured to obtain a third denoising instruction when the third classification number information and the first classification number information satisfy the first correlation, where the third denoising instruction is used to perform retrieval in the second target document according to the third classification number information to obtain a fourth target document;
an eighteenth obtaining unit, configured to obtain a fifth denoising instruction according to the fourth target document and the third denoising instruction, where the fifth denoising instruction is used to add the fourth target document to the third target document to obtain a fifth target document.
Further, the apparatus further comprises:
a nineteenth obtaining unit configured to obtain, according to the third target document, a third patent, which is a patent having a plurality of classification numbers, the third patent having fourth classification number information;
a sixth judging unit, configured to judge whether a fifth correlation is satisfied according to the fourth classification number information and the first classification number information;
a twentieth obtaining unit, configured to obtain a sixth denoising instruction when the fourth classification number information and the first classification number information satisfy the fifth correlation, where the sixth denoising instruction is used to retrieve from the third target document according to the fourth classification number information to obtain a sixth target document;
a twenty-first obtaining unit, configured to obtain a seventh denoising instruction according to the sixth target document and the sixth denoising instruction, where the seventh denoising instruction is used to delete the sixth target document from the third target document and obtain an eighth target document.
Various changes and specific examples of the automatic denoising method for the search data in the first embodiment of fig. 1 are also applicable to the automatic denoising device for the search data in the present embodiment, and through the foregoing detailed description of the automatic denoising method for the search data, those skilled in the art can clearly know the implementation method of the automatic denoising device for the search data in the present embodiment, so for the brevity of the description, detailed descriptions are omitted here.
Example three
Based on the same inventive concept as the automatic denoising method for the retrieved data in the foregoing embodiment, the present invention further provides an automatic denoising device for the retrieved data, as shown in fig. 3, including a memory 304, a processor 302, and a computer program stored on the memory 304 and operable on the processor 302, wherein the processor 302, when executing the program, implements the steps of any one of the foregoing automatic denoising methods for the retrieved data.
In fig. 3, the bus architecture is represented by bus 300. Bus 300 may include any number of interconnected buses and bridges and links together various circuits, including one or more processors, represented by processor 302, and memory, represented by memory 304. The bus 300 may also link together various other circuits such as peripherals, voltage regulators, power management circuits, and the like, which are well known in the art and therefore are not described further here. A bus interface 306 provides an interface between the bus 300 and the receiver 301 and transmitter 303. The receiver 301 and the transmitter 303 may be the same element, i.e., a transceiver, providing a means for communicating with various other apparatus over a transmission medium. The processor 302 is responsible for managing the bus 300 and general processing, and the memory 304 may be used for storing data used by the processor 302 in performing operations.
Example four
Based on the same inventive concept as the automatic denoising method for the search data in the foregoing embodiments, the present invention also provides a computer-readable storage medium having a computer program stored thereon, which when executed by a processor, implements the steps of: according to a first document, obtaining first classification number information; obtaining a first target document, wherein the first target document is a retrieval document set determined according to the first document; retrieving from the first target document to obtain a first patent, the first patent having second classification number information; judging whether the second classification number information and the first classification number information meet a first correlation or not according to the first classification number information and the second classification number information; when the second classification number information and the first classification number information do not meet the first correlation, a first denoising instruction is obtained according to first document classification number information, and the first denoising instruction is used for retrieving in the first target document according to the second classification number information to obtain a second target document; and obtaining a second denoising instruction according to the first denoising instruction and a second target document, wherein the second denoising instruction is used for deleting the second target document from the first target document to obtain a third target document.
In a specific implementation, when the program is executed by a processor, any method step in the first embodiment may be further implemented.
One or more technical solutions in the embodiments of the present application have at least one or more of the following technical effects:
according to the automatic denoising method and device for the retrieval data, provided by the embodiment of the invention, first classification number information is obtained according to a first document; obtaining a first target document, wherein the first target document is a retrieval document set determined according to the first document; retrieving from the first target document to obtain a first patent, the first patent having second classification number information; judging whether the second classification number information and the first classification number information meet a first correlation or not according to the first classification number information and the second classification number information; when the second classification number information and the first classification number information do not meet the first correlation, a first denoising instruction is obtained according to first document classification number information, and the first denoising instruction is used for retrieving in the first target document according to the second classification number information to obtain a second target document; and obtaining a second denoising instruction according to the first denoising instruction and a second target document, wherein the second denoising instruction is used for deleting the second target document from the first target document to obtain a third target document. 
The retrieval data is thereby denoised automatically: classification number information is compared, and the retrieved patent documents whose classification numbers do not meet the retrieval requirement are removed, which ensures the accuracy of the retrieval result. Because the whole process is automatic, the tedious process of manual denoising is avoided. This solves the technical problem in the prior art that retrieval results are denoised by manual screening, which is time-consuming and laborious, prone to omissions and incomplete denoising, and cannot guarantee the accuracy of the retrieval result.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (9)

1. An automatic denoising method for retrieval data, the method comprising:
according to a first document, obtaining first classification number information;
obtaining a first target document, wherein the first target document is a retrieval document set determined according to the first document;
retrieving from the first target document to obtain a first patent, the first patent having second classification number information;
judging whether the second classification number information and the first classification number information meet a first correlation or not according to the first classification number information and the second classification number information;
when the second classification number information and the first classification number information do not meet the first correlation, a first denoising instruction is obtained according to first document classification number information, and the first denoising instruction is used for retrieving in the first target document according to the second classification number information to obtain a second target document;
and obtaining a second denoising instruction according to the first denoising instruction and a second target document, wherein the second denoising instruction is used for deleting the second target document from the first target document to obtain a third target document.
2. The method of claim 1, wherein, before obtaining the first target document, the method comprises:
obtaining a first keyword according to the first document;
obtaining a second keyword according to the first document, wherein the first keyword is different from the second keyword;
determining a second relevance according to the first keyword and the second keyword;
judging whether the second relevance meets a first preset threshold value or not;
when the second relevance meets the first preset threshold value, determining a first search term according to the first keyword and the second keyword;
and obtaining a first retrieval instruction according to the first retrieval word, wherein the first retrieval instruction is used for retrieving according to the first retrieval word to obtain a first target document.
3. The method of claim 2, wherein, after judging whether the second relevance meets the first preset threshold value, the method comprises:
when the second relevance does not meet the first preset threshold value, obtaining a first attribute according to the first classification number information;
obtaining a second attribute according to the first keyword;
obtaining a third relevance according to the first attribute and the second attribute;
judging whether the third relevance meets a second preset threshold value or not;
and when the third relevance meets the second preset threshold, obtaining a second retrieval instruction, wherein the second retrieval instruction is used for retrieving according to the first keyword to obtain a first target document.
4. The method of claim 3, wherein, after judging whether the third relevance meets the second preset threshold value, the method comprises:
when the third relevance does not meet the second preset threshold value, obtaining a third attribute according to the second keyword;
obtaining a fourth relevance according to the first attribute and the third attribute;
judging whether the fourth relevance meets the second preset threshold value;
and when the fourth relevance meets the second preset threshold value, obtaining a second retrieval instruction, wherein the second retrieval instruction is used for retrieving according to the second keyword to obtain a first target document.
5. The method of claim 1, wherein the method further comprises:
obtaining a second patent having third classification number information other than the second classification number information according to the second target document;
judging whether the first correlation is met or not according to the third classification number information and the first classification number information;
when the third classification number information and the first classification number information meet the first correlation, obtaining a third denoising instruction, wherein the third denoising instruction is used for retrieving in the second target document according to the third classification number information to obtain a fourth target document;
and obtaining a fifth denoising instruction according to the fourth target document and the third denoising instruction, wherein the fifth denoising instruction is used for adding the fourth target document into the third target document to obtain a fifth target document.
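The restoration step of claim 5 can be illustrated with a small sketch. It assumes, purely for illustration, that each document is identified by a string, that `classes` maps a document to its set of classification numbers, and that the "first correlation" means sharing the same 4-character IPC subclass. None of these choices is stated in the application, and `same_subclass` and `restore_related` are hypothetical names.

```python
from typing import Dict, Set


def same_subclass(a: str, b: str) -> bool:
    """Hypothetical 'first correlation': same 4-character IPC subclass."""
    return a[:4] == b[:4]


def restore_related(
    third_target: Set[str],
    second_target: Set[str],
    classes: Dict[str, Set[str]],
    first_class: str,
    second_class: str,
) -> Set[str]:
    """Add back removed documents that also carry a correlated class.

    second_target -- documents removed by the first denoising pass
    second_class  -- the classification number that caused their removal
    Returns the fifth target document set.
    """
    fourth_target = {
        doc
        for doc in second_target
        if any(
            code != second_class and same_subclass(code, first_class)
            for code in classes[doc]
        )
    }
    return third_target | fourth_target
```

A document that was removed only because it carries the unrelated class, but that also carries a class correlated with the first document, is returned to the result set; a document with no correlated class stays removed.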
6. The method of claim 1, wherein the method further comprises:
according to the third target document, obtaining a third patent, wherein the third patent is a patent with multiple classification numbers and has fourth classification number information;
judging, according to the fourth classification number information and the first classification number information, whether a fifth correlation is met;
when the fourth classification number information and the first classification number information meet the fifth correlation, obtaining a sixth denoising instruction, wherein the sixth denoising instruction is used for retrieving in the third target document according to the fourth classification number information to obtain a sixth target document;
and obtaining a seventh denoising instruction according to the sixth target document and the sixth denoising instruction, wherein the seventh denoising instruction is used for deleting the sixth target document from the third target document to obtain an eighth target document.
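A minimal sketch of the second-pass deletion in claim 6 follows. The claim does not define the "fifth correlation", so it is passed in as a caller-supplied predicate rather than guessed. The data layout (`classes` mapping a document ID to a set of classification numbers) and the name `drop_by_secondary_class` are assumptions made for this example.

```python
from typing import Callable, Dict, Set


def drop_by_secondary_class(
    third_target: Set[str],
    classes: Dict[str, Set[str]],
    first_class: str,
    fifth_correlation: Callable[[str, str], bool],
) -> Set[str]:
    """Delete documents matched through classes of multi-class patents.

    Classes found on patents with more than one classification number are
    tested with the caller-supplied fifth_correlation predicate; documents
    carrying any class that meets it form the sixth target document set,
    which is removed from third_target.
    """
    flagged: Set[str] = set()
    for doc in third_target:
        codes = classes[doc]
        if len(codes) > 1:  # only multi-class patents contribute classes
            flagged |= {c for c in codes if fifth_correlation(c, first_class)}
    sixth_target = {doc for doc in third_target if classes[doc] & flagged}
    return third_target - sixth_target
```

Keeping the predicate as a parameter avoids committing to a particular reading of the fifth correlation, which the claim leaves open.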
7. An apparatus for automatically denoising retrieved data, the apparatus comprising:
a first obtaining unit configured to obtain first classification number information according to a first document;
a second obtaining unit, configured to obtain a first target document, where the first target document is a search document set determined according to the first document;
a third obtaining unit, configured to retrieve from the first target document, and obtain a first patent, where the first patent has second classification number information;
a first judging unit, configured to judge, according to the first classification number information and the second classification number information, whether the second classification number information and the first classification number information meet a first correlation;
a fourth obtaining unit, configured to obtain a first denoising instruction according to first document classification number information when the second classification number information and the first classification number information do not satisfy the first correlation, where the first denoising instruction is used to retrieve in the first target document according to the second classification number information to obtain a second target document;
a fifth obtaining unit, configured to obtain a second denoising instruction according to the first denoising instruction and the second target document, where the second denoising instruction is used to delete the second target document from the first target document to obtain a third target document.
8. An apparatus for automatically denoising retrieved data, comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the method of any one of claims 1-6 when executing the program.
9. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 6.
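The data flow through the five units of claim 7 can be sketched in a few lines. This is not the applicant's implementation: it assumes documents are string IDs, `classes` maps each ID to its classification numbers, and the "first correlation" is approximated by a shared 4-character IPC subclass. It also generalizes the single "first patent" of the claim to every patent in the result set. The names `same_subclass` and `denoise` are hypothetical.

```python
from typing import Callable, Dict, Set, Tuple


def same_subclass(a: str, b: str) -> bool:
    """Hypothetical 'first correlation': same 4-character IPC subclass."""
    return a[:4] == b[:4]


def denoise(
    first_class: str,
    first_target: Set[str],
    classes: Dict[str, Set[str]],
    correlated: Callable[[str, str], bool] = same_subclass,
) -> Tuple[Set[str], Set[str]]:
    """Return (third_target, second_target) for a search result set.

    first_class  -- classification number of the first document
    first_target -- document IDs returned by the initial search
    """
    # Classes in the result set that fail the correlation test count as noise.
    noise_classes = {
        code
        for doc in first_target
        for code in classes[doc]
        if not correlated(code, first_class)
    }
    # First denoising instruction: retrieve by the noisy classes.
    second_target = {doc for doc in first_target if classes[doc] & noise_classes}
    # Second denoising instruction: delete the retrieved documents.
    third_target = first_target - second_target
    return third_target, second_target
```

Returning the removed set alongside the kept set leaves room for later refinement steps that inspect what was deleted.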

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010092938.6A CN111309895A (en) 2020-02-14 2020-02-14 Automatic denoising method and device for retrieval data

Publications (1)

Publication Number Publication Date
CN111309895A (en) 2020-06-19

Family

ID=71158327

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010092938.6A Withdrawn CN111309895A (en) 2020-02-14 2020-02-14 Automatic denoising method and device for retrieval data

Country Status (1)

Country Link
CN (1) CN111309895A (en)

Similar Documents

Publication Publication Date Title
CN108829858B (en) Data query method and device and computer readable storage medium
CN109240901B (en) Performance analysis method, performance analysis device, storage medium, and electronic apparatus
US11449564B2 (en) System and method for searching based on text blocks and associated search operators
CN101819578A (en) Retrieval method, method and device for establishing index and retrieval system
CN107102993B (en) User appeal analysis method and device
CN111752955A (en) Data processing method, device, equipment and computer readable storage medium
US20120317125A1 (en) Method and apparatus for identifier retrieval
CN113190687A (en) Knowledge graph determining method and device, computer equipment and storage medium
CN113901169A (en) Information processing method, information processing device, electronic equipment and storage medium
CN111460137B (en) Method, equipment and medium for identifying micro-service focus based on topic model
JP2013174988A (en) Similar document retrieval support apparatus and similar document retrieval support program
CN112434009A (en) End-to-end data probing method and device, computer equipment and storage medium
CN111062832A (en) Auxiliary analysis method and device for intelligently providing patent answer and debate opinions
CN111309895A (en) Automatic denoising method and device for retrieval data
CN111274364A (en) Automatic denoising method and device based on keyword retrieval data
CN113407678B (en) Knowledge graph construction method, device and equipment
US8214336B2 (en) Preservation of digital content
CN113761213B (en) Knowledge graph-based data query system, method and terminal equipment
CN113722421B (en) Contract auditing method and system and computer readable storage medium
CN113468339A (en) Label extraction method, system, electronic device and medium based on knowledge graph
CN111324726A (en) Method and device for automatically drying patent database
Mashina Application of statistical methods to solve the problem of enriching ontologies of developing subject areas
CN111353023A (en) Target database optimization method and device based on keyword retrieval
CN111274229A (en) Method and device for verifying denoising result of retrieved data
CN111339123A (en) Double-retrieval patent database establishing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20200619