CN111274364A - Automatic denoising method and device based on keyword retrieval data - Google Patents

Automatic denoising method and device based on keyword retrieval data

Info

Publication number
CN111274364A
Authority
CN
China
Prior art keywords
keyword
denoising
target document
obtaining
instruction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202010092898.5A
Other languages
Chinese (zh)
Inventor
邓梅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Rainpat Data Service Co ltd
Original Assignee
Jiangsu Rainpat Data Service Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Rainpat Data Service Co ltd
Priority to CN202010092898.5A
Publication of CN111274364A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3349 Reuse of stored results of previous queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/18 Legal services
    • G06Q50/184 Intellectual property management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Tourism & Hospitality (AREA)
  • Technology Law (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Primary Health Care (AREA)
  • General Business, Economics & Management (AREA)
  • Operations Research (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • General Health & Medical Sciences (AREA)
  • Economics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides an automatic denoising method and device based on keyword retrieval data, which are used for obtaining a first keyword according to a first document; obtaining a first target document according to the first keyword; obtaining a first patent according to the first target document, wherein the first patent has a second keyword; obtaining a first denoising instruction, wherein the first denoising instruction is used for deleting a first patent from a first target document; obtaining a second denoising instruction according to the first denoising instruction and a second keyword, wherein the second denoising instruction is used for retrieving in the first target document according to the second keyword to obtain a second target document; and deleting the second target document from the first target document to obtain a third target document. The method solves the technical problems that in the prior art, the search result is denoised through manually screening keywords, the denoising process is time-consuming and labor-consuming, and the accuracy of the search result cannot be guaranteed. The technical effects of automatic denoising processing and improvement of the accuracy of the retrieval result are achieved.

Description

Automatic denoising method and device based on keyword retrieval data
Technical Field
The invention relates to the technical field of data processing, in particular to an automatic denoising method and device based on keyword retrieval data.
Background
Patent document retrieval is the search for patents and patent literature. The Chinese Patent Retrieval System (CPRS) is a patent retrieval and full-text browsing system used only within the local area network of the National Intellectual Property Office. The system contains: bibliographic data for the three types of Chinese patents, together with the full text of inventions and utility models, since 1985; bibliographic data and full-text descriptions of U.S. patents since 1975; and the full descriptions of patents and utility models filed since 1993. Patent literature retrieval is the basic work by which enterprises gain a comprehensive understanding of the prior art, raise the starting point of their research and development, and avoid intellectual property risks. Because the original patent data disclosed on the internet is incomplete, obscure in language, lengthy, and difficult to understand, enterprises find retrieval difficult unless they have mastered professional search methods and skills. A retrieval result generally contains a large amount of data and must be denoised further. At present this denoising is carried out manually by search personnel, who screen the retrieval result again. The process is complex and consumes a large amount of manpower and material resources, and because it is manual, the denoising may be incomplete, which affects the retrieval result.
However, the applicant of the present invention has found that the prior art has at least the following technical problem:
In the prior art, search results are denoised by manually screening keywords; the denoising process is time-consuming and labor-intensive, omissions occur easily, denoising is incomplete, and the accuracy of the search results cannot be guaranteed.
Disclosure of Invention
The embodiments of the invention provide an automatic denoising method and device based on keyword retrieval data, which solve the technical problems that, in the prior art, retrieval results are denoised by manually screening keywords, the denoising process is time-consuming and labor-intensive, omissions occur easily, denoising is incomplete, and the accuracy of the retrieval results cannot be guaranteed.
In view of the foregoing problems, the embodiments of the present application provide an automatic denoising method and device based on keyword retrieval data.
In a first aspect, the present invention provides an automatic denoising method based on keyword retrieval data, the method comprising: obtaining a first keyword according to a first document; obtaining a first target document according to the first keyword; obtaining a first patent according to the first target document, wherein the first patent has a second keyword, and the second keyword is different from the first keyword; obtaining a first denoising instruction, wherein the first denoising instruction is used for deleting the first patent from the first target document; obtaining a second denoising instruction according to the first denoising instruction and a second keyword, wherein the second denoising instruction is used for retrieving in the first target document according to the second keyword to obtain a second target document; and deleting the second target document from the first target document to obtain a third target document.
Preferably, before obtaining the first denoising instruction, the method includes: obtaining a denoising keyword; obtaining a first relevance according to the denoising keyword and the second keyword; judging whether the first relevance meets a first predetermined threshold; and obtaining the first denoising instruction when the first relevance meets the first predetermined threshold.
Preferably, after judging whether the first relevance meets the first predetermined threshold, the method further includes: when the first relevance does not meet the first predetermined threshold, obtaining a second relevance according to the first predetermined threshold; obtaining a third keyword according to the denoising keyword and the second relevance, wherein the third keyword is different from the second keyword; obtaining a third denoising instruction, wherein the third denoising instruction is used for searching in the first target document according to the third keyword to obtain a fourth target document; and obtaining a fourth denoising instruction according to the fourth target document, wherein the fourth denoising instruction is used for deleting the fourth target document from the first target document to obtain a fifth target document.
Preferably, the obtaining a first relevance according to the denoising keyword and the second keyword includes: obtaining a first attribute according to the denoising keyword; obtaining a second attribute according to the second keyword; and obtaining the first relevance according to the first attribute and the second attribute.
Preferably, the method further comprises: obtaining a fourth keyword according to the first attribute and the first predetermined threshold; judging whether the fourth keyword is the same as the second keyword and the third keyword; when the fourth keyword is different from the second keyword and the third keyword, obtaining a fifth denoising instruction, wherein the fifth denoising instruction is used for retrieving in the third target document according to the fourth keyword to obtain a sixth target document; and obtaining a sixth denoising instruction according to the fifth denoising instruction and the fourth keyword, wherein the sixth denoising instruction is used for deleting the sixth target document from the third target document to obtain a seventh target document.
In a second aspect, the present invention provides an automatic denoising device based on keyword retrieval data, the device comprising:
a first obtaining unit configured to obtain a first keyword from a first document;
a second obtaining unit, configured to obtain a first target document according to the first keyword;
a third obtaining unit, configured to obtain a first patent according to the first target document, where the first patent has a second keyword, where the second keyword is different from the first keyword;
a fourth obtaining unit, configured to obtain a first denoising instruction, where the first denoising instruction is used to delete the first patent from the first target document;
a fifth obtaining unit, configured to obtain a second denoising instruction according to the first denoising instruction and a second keyword, where the second denoising instruction is used to perform retrieval in the first target document according to the second keyword to obtain a second target document;
a first executing unit, configured to delete the second target document from the first target document, and obtain a third target document.
Preferably, the apparatus further comprises:
a sixth obtaining unit, configured to obtain a denoising keyword;
a seventh obtaining unit, configured to obtain a first relevance according to the denoising keyword and the second keyword;
a first judging unit, configured to judge whether the first relevance satisfies a first predetermined threshold;
an eighth obtaining unit, configured to obtain the first denoising instruction when the first relevance satisfies the first predetermined threshold.
Preferably, the apparatus further comprises:
a ninth obtaining unit, configured to obtain a second relevance according to the first predetermined threshold when the first relevance does not satisfy the first predetermined threshold;
a tenth obtaining unit, configured to obtain a third keyword according to the denoising keyword and the second relevance, where the third keyword is different from the second keyword;
an eleventh obtaining unit, configured to obtain a third denoising instruction, where the third denoising instruction is used to perform a search in the first target document according to the third keyword to obtain a fourth target document;
a second execution unit, configured to obtain a fourth denoising instruction according to the fourth target document, where the fourth denoising instruction is used to delete the fourth target document from the first target document and obtain a fifth target document.
Preferably, the apparatus further comprises:
a twelfth obtaining unit, configured to obtain a first attribute according to the denoising keyword;
a thirteenth obtaining unit, configured to obtain a second attribute according to the second keyword;
a fourteenth obtaining unit, configured to obtain the first relevance according to the first attribute and the second attribute.
Preferably, the apparatus further comprises:
a fifteenth obtaining unit, configured to obtain a fourth keyword according to the first attribute and the first predetermined threshold;
a second judging unit, configured to judge whether the fourth keyword is the same as the second keyword and the third keyword;
a sixteenth obtaining unit, configured to obtain a fifth denoising instruction when the fourth keyword is different from the second keyword and the third keyword, where the fifth denoising instruction is used to perform a search in the third target document according to the fourth keyword to obtain a sixth target document;
a third execution unit, configured to obtain a sixth denoising instruction according to the fifth denoising instruction and the fourth keyword, where the sixth denoising instruction is used to delete the sixth target document from the third target document, and obtain a seventh target document.
In a third aspect, the present invention provides an automatic denoising device based on keyword retrieval data, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of any one of the above methods when executing the program.
In a fourth aspect, the invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of any of the methods described above.
One or more technical solutions in the embodiments of the present application have at least one or more of the following technical effects:
According to the automatic denoising method and device based on keyword retrieval data provided by the embodiments of the invention, a first keyword is obtained according to a first document; a first target document is obtained according to the first keyword; a first patent is obtained according to the first target document, wherein the first patent has a second keyword, and the second keyword is different from the first keyword; a first denoising instruction is obtained, wherein the first denoising instruction is used for deleting the first patent from the first target document; a second denoising instruction is obtained according to the first denoising instruction and the second keyword, wherein the second denoising instruction is used for retrieving in the first target document according to the second keyword to obtain a second target document; and the second target document is deleted from the first target document to obtain a third target document. By analyzing the keywords, the keywords used for denoising are determined effectively and the retrieval data is denoised automatically, which improves the accuracy of the retrieval results and avoids the time and labor cost of manual denoising. This solves the technical problems that, in the prior art, search results are denoised by manually screening keywords, the denoising process is time-consuming and labor-intensive, omissions occur easily, denoising is incomplete, and the accuracy of the search results cannot be guaranteed.
The foregoing is only an overview of the technical solutions of the present invention. Specific embodiments are described below so that the technical means of the invention can be understood more clearly and its above and other objects, features, and advantages become more apparent.
Drawings
FIG. 1 is a schematic flow chart of an automatic denoising method based on keyword retrieval data according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of an automatic denoising apparatus based on keyword search data according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of another automatic denoising device based on keyword retrieval data according to an embodiment of the present invention.
Description of reference numerals: a first obtaining unit 11, a second obtaining unit 12, a third obtaining unit 13, a fourth obtaining unit 14, a fifth obtaining unit 15, a first executing unit 16, a bus 300, a receiver 301, a processor 302, a transmitter 303, a memory 304, and a bus interface 306.
Detailed Description
The embodiments of the invention provide an automatic denoising method and device based on keyword retrieval data, which are used to solve the technical problems that, in the prior art, retrieval results are denoised by manually screening keywords, the denoising process is time-consuming and labor-intensive, omissions occur easily, denoising is incomplete, and the accuracy of the retrieval results cannot be guaranteed.
The technical scheme provided by the invention has the following general idea:
Obtaining a first keyword according to a first document; obtaining a first target document according to the first keyword; obtaining a first patent according to the first target document, wherein the first patent has a second keyword, and the second keyword is different from the first keyword; obtaining a first denoising instruction, wherein the first denoising instruction is used for deleting the first patent from the first target document; obtaining a second denoising instruction according to the first denoising instruction and the second keyword, wherein the second denoising instruction is used for retrieving in the first target document according to the second keyword to obtain a second target document; and deleting the second target document from the first target document to obtain a third target document. By analyzing the keywords, the keywords used for denoising are determined effectively and the retrieval data is denoised automatically, which improves the accuracy of the retrieval results and avoids the time and labor cost of manual denoising.
The technical solutions of the present invention are described in detail below with reference to the drawings and specific embodiments. It should be understood that the specific features in the embodiments and examples of the present invention are a detailed description of the technical solutions of the present application rather than a limitation on them, and that the technical features in the embodiments and examples may be combined with each other where no conflict arises.
The term "and/or" herein merely describes an association between associated objects and means that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" herein generally indicates an "or" relationship between the objects before and after it.
Example one
Fig. 1 is a schematic flow chart of an automatic denoising method based on keyword search data according to an embodiment of the present invention. As shown in fig. 1, an embodiment of the present invention provides an automatic denoising method based on keyword search data, where the method includes:
Step 110: obtaining a first keyword according to a first document.
Specifically, the first document is the document data to be searched. Its search keyword, which is the first keyword, is obtained by analyzing the title and content of the first document. The first keyword may be confirmed by conventional manual input or identified by data analysis with a neural network model; it may be determined from the title alone, or from the content according to occurrence frequency, part of speech, and the like. For example, if the first document is a patent document about a toothbrush, analyzing the title and content yields "toothbrush" as the first keyword.
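As an illustration of the frequency-based option mentioned for step 110, the following is a minimal sketch only. The stop-word list, the title weighting, the sample text, and the function names are all assumptions made for the example; the embodiment does not prescribe them, and a neural network model could be used instead.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "the", "of", "and", "for", "with", "is", "are", "to", "in"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def first_keyword(title, content, title_weight=3):
    """Return the highest-scoring non-stop-word; title hits count title_weight times."""
    counts = Counter()
    for token in tokenize(title):
        if token not in STOP_WORDS:
            counts[token] += title_weight
    for token in tokenize(content):
        if token not in STOP_WORDS:
            counts[token] += 1
    if not counts:
        return None
    return counts.most_common(1)[0][0]

# Assumed sample document about a toothbrush.
doc_title = "A toothbrush with replaceable head"
doc_content = ("The toothbrush comprises a handle and a head. "
               "The bristles of the toothbrush are soft.")
keyword = first_keyword(doc_title, doc_content)
print(keyword)
```

Weighting title words more heavily is one simple way to combine the title analysis and the content analysis into a single score.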
Step 120: obtaining a first target document according to the first keyword.
Specifically, the first keyword determined in step 110 is used to perform a keyword search in the document database, and the set of document data related to the first keyword is obtained as the first target document. That is, the first target document includes all document data related to the first keyword: both documents that meet the requirements and documents in which the first keyword merely appears but whose specific content does not meet the requirements.
Step 130: according to the first target document, a first patent is obtained, wherein the first patent has a second keyword, and the second keyword is different from the first keyword.
Specifically, a patent document in the first target document is examined, and a second keyword is obtained from it; the second keyword is different from the first keyword and does not match the target of the search. For example, the first keyword is "toothbrush", and the first target document consists of all toothbrush-related documents retrieved with that keyword, including documents whose title contains "toothbrush" and documents in which the word appears anywhere in the full text, covering manual toothbrushes, electric toothbrushes, and so on. If the document being searched concerns a manual toothbrush while the first target document contains many electric toothbrushes, an electric toothbrush patent containing the keyword "motor" is found, and "motor" is taken as the second keyword.
Step 140: obtaining a first denoising instruction, wherein the first denoising instruction is used for deleting the first patent from the first target document.
Step 150: obtaining a second denoising instruction according to the first denoising instruction and the second keyword, wherein the second denoising instruction is used for retrieving in the first target document according to the second keyword to obtain a second target document.
Specifically, based on the first patent and the second keyword, the first patent is determined to be a non-target document within the first target document, the corresponding denoising instruction is obtained, and the first patent is deleted from the first target document to realize automatic denoising. At the same time, to ensure that all document data that do not meet the search requirement are found, the first target document is searched with the second keyword to find all patent documents containing the second keyword.
Step 160: deleting the second target document from the first target document to obtain a third target document.
Specifically, the patent documents found by the second-keyword search are finally deleted in order to denoise the retrieved document data; that is, all retrieved patent documents containing the second keyword are deleted from the first target document, so that the target documents obtained through the first-keyword search are denoised automatically. This solves the technical problems that, in the prior art, retrieval results are denoised by manually screening keywords, the denoising process is time-consuming and labor-intensive, omissions occur easily, denoising is incomplete, and the accuracy of the retrieval results cannot be guaranteed. By analyzing the keywords, the keywords used for denoising are determined effectively and the retrieval data is denoised automatically, which improves the accuracy of the retrieval results and avoids the time and labor cost of manual denoising.
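Steps 120 to 160 can be sketched as set operations over a small in-memory corpus. This is an illustrative sketch under stated assumptions, not the implementation of the embodiment: the corpus, the document identifiers, the substring-matching search, and the function names are hypothetical.

```python
def search(documents, keyword):
    """Return the ids of documents whose text contains the keyword (case-insensitive)."""
    needle = keyword.lower()
    return {doc_id for doc_id, text in documents.items() if needle in text.lower()}

def denoise(documents, first_keyword, second_keyword):
    """Run the search-then-delete flow and return the three document sets."""
    # Step 120: everything that mentions the first keyword is the first target document.
    first_target = search(documents, first_keyword)
    # Step 150: search only inside the first target document with the second keyword.
    in_scope = {doc_id: documents[doc_id] for doc_id in first_target}
    second_target = search(in_scope, second_keyword)
    # Step 160: delete the second target document from the first target document.
    third_target = first_target - second_target
    return first_target, second_target, third_target

# Hypothetical corpus; P2 plays the role of the "first patent" (steps 130 and 140),
# and it is removed together with the other documents that contain "motor".
corpus = {
    "P1": "Manual toothbrush with soft bristles",
    "P2": "Electric toothbrush driven by a motor",
    "P3": "Toothbrush holder for bathrooms",
    "P4": "Sonic toothbrush with a vibration motor",
    "P5": "Motor controller for fans",
}
first, second, third = denoise(corpus, "toothbrush", "motor")
print(sorted(third))
```

Note that P5 contains "motor" but never enters the result, because the second search is restricted to the first target document.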
Further, before obtaining the first denoising instruction, the method includes: obtaining a denoising keyword; obtaining a first relevance according to the denoising keyword and the second keyword; judging whether the first relevance meets a first predetermined threshold; and obtaining the first denoising instruction when the first relevance meets the first predetermined threshold.
Specifically, before the first denoising instruction is determined, a denoising keyword is determined according to the first target document. The denoising keyword is a keyword associated with the first keyword, obtained by analyzing the attributes of the first keyword; it may be entered manually or produced by data analysis with a neural network model. In the latter case, data is collected using the category and conventional classification of the first keyword together with the keywords of the first target document and their occurrence frequencies. The associated data so determined serves as analysis data and may be divided into training data and correction data. A model is established from the data characteristics and the denoising calculation method, trained with the training data, and verified with the correction data; the model then outputs the denoising keyword, which is the main denoising object in the first target document. A relevance analysis is then performed on the denoising keyword and the second keyword: the data is processed according to the meaning and attributes of the keywords to obtain the corresponding relevance data, which is compared with a preset threshold. For example, with the threshold set to 80%, meeting it indicates that the two keywords are highly related, such as when the second keyword is a synonym, near-synonym, or alternative word of the denoising keyword. If the relevance requirement is satisfied, the second keyword is confirmed as a keyword for denoising, and the first target document is denoised automatically according to the second keyword. The systematic analysis of the document keywords ensures the accuracy of denoising, and because the denoising process is carried out automatically, the problems of manual omission and incomplete denoising are solved.
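The threshold check can be sketched as a gate in front of the first denoising instruction. The relevance scores below are invented for illustration, the comparison "meets the threshold" is assumed to mean greater than or equal to, and the dictionary used to represent an instruction is a hypothetical format.

```python
FIRST_THRESHOLD = 0.8  # the 80% example value from the description

# Hypothetical relevance scores between keyword pairs (not taken from the source).
RELEVANCE = {
    frozenset({"electric", "motor"}): 0.9,
    frozenset({"electric", "handle"}): 0.2,
}

def relevance(keyword_a, keyword_b):
    """Look up the relevance of two keywords; identical keywords score 1.0."""
    if keyword_a == keyword_b:
        return 1.0
    return RELEVANCE.get(frozenset({keyword_a, keyword_b}), 0.0)

def first_denoising_instruction(denoising_keyword, second_keyword, first_patent_id):
    """Return a delete instruction only when the first relevance meets the threshold."""
    score = relevance(denoising_keyword, second_keyword)
    if score >= FIRST_THRESHOLD:  # assumption: "meets" is read as >=
        return {"action": "delete", "patent": first_patent_id, "keyword": second_keyword}
    return None

accepted = first_denoising_instruction("electric", "motor", "P2")
rejected = first_denoising_instruction("electric", "handle", "P3")
print(accepted, rejected)
```

When the function returns None, the flow would fall through to the handling for a first relevance that does not meet the threshold.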
Further, after judging whether the first relevance meets the first predetermined threshold, the method further includes: when the first relevance does not meet the first predetermined threshold, obtaining a second relevance according to the first predetermined threshold; obtaining a third keyword according to the denoising keyword and the second relevance, wherein the third keyword is different from the second keyword; obtaining a third denoising instruction, wherein the third denoising instruction is used for searching in the first target document according to the third keyword to obtain a fourth target document; and obtaining a fourth denoising instruction according to the fourth target document, wherein the fourth denoising instruction is used for deleting the fourth target document from the first target document to obtain a fifth target document.
Specifically, if calculation and judgment show that the relevance between the second keyword and the denoising keyword does not meet the set threshold, that is, the two are not sufficiently related, automatic denoising cannot be performed according to the second keyword. In this case a second relevance is obtained according to the first predetermined threshold; the second relevance may be a range or a plurality of values. A third keyword is then determined according to the denoising keyword and the second relevance. The third keyword is a keyword whose relevance to the denoising keyword satisfies the second relevance, that is, meets the first predetermined threshold, so it can be used for denoising. Unlike the second keyword, it is not obtained by analyzing the keywords of a patent in the first target document; it is derived from the denoising keyword and the relevance requirement, which guards against incomplete denoising. The third keyword is searched for in the first target document and the documents that contain it are deleted. Determining denoising keywords from several directions in this way ensures comprehensive denoising and accurate data retrieval.
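The fallback path can be sketched as follows. The second relevance is modeled here as the range from the first predetermined threshold up to 1.0, which is only one reading of "a range or a plurality of values"; the candidate scores, the corpus, and the rule of picking the single most relevant candidate are assumptions for the example.

```python
FIRST_THRESHOLD = 0.8

# Hypothetical relevance of candidate keywords to the denoising keyword "electric".
CANDIDATE_RELEVANCE = {"motor": 0.9, "battery": 0.85, "rechargeable": 0.82, "handle": 0.2}

def pick_third_keyword(candidates, second_keyword, threshold):
    """Return the most relevant candidate that meets the threshold and differs
    from the second keyword, or None if there is no such candidate."""
    eligible = {k: s for k, s in candidates.items()
                if s >= threshold and k != second_keyword}
    if not eligible:
        return None
    return max(eligible, key=eligible.get)

def apply_third_keyword(first_target, third_keyword):
    """Search the first target document for the third keyword (fourth target
    document) and delete the hits (fifth target document)."""
    fourth_target = {d for d, text in first_target.items() if third_keyword in text.lower()}
    fifth_target = {d: t for d, t in first_target.items() if d not in fourth_target}
    return fourth_target, fifth_target

# Assumed scenario: the second keyword "handle" failed the threshold check.
first_target = {
    "P1": "Manual toothbrush with a soft handle",
    "P2": "Electric toothbrush driven by a motor",
    "P3": "Toothbrush holder",
}
third_keyword = pick_third_keyword(CANDIDATE_RELEVANCE, "handle", FIRST_THRESHOLD)
fourth_target, fifth_target = apply_third_keyword(first_target, third_keyword)
print(third_keyword, sorted(fifth_target))
```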
Further, the obtaining a first relevance according to the denoising keyword and the second keyword includes: obtaining a first attribute according to the denoising keyword; obtaining a second attribute according to the second keyword; and obtaining the first relevance according to the first attribute and the second attribute.
Specifically, when the relevance between keywords is calculated, the attribute of the denoising keyword, i.e., the first attribute, is obtained according to the denoising keyword, and the attribute of the second keyword, i.e., the second attribute, is obtained according to the second keyword. The first attribute and the second attribute include the category, characteristics, classification number, and the like of the keyword. A weight is assigned to each item of data according to its importance, and the value and weight of each item in the attribute data are combined in a weighted calculation to obtain the corresponding attribute value. The relevance between the two keywords is then calculated from their respective attribute values. For example, the closer the attribute values calculated for the denoising keyword and the second keyword, the higher the relevance between them.
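One possible form of this weighted calculation is sketched below. The description states only that closer attribute values mean higher relevance; the weights, the numeric attribute scores in the range 0 to 1, and the formula of one minus the absolute difference are assumptions chosen for illustration.

```python
# Hypothetical weights reflecting the importance of each attribute item.
WEIGHTS = {"category": 0.5, "characteristic": 0.3, "classification": 0.2}

def attribute_value(attributes, weights=WEIGHTS):
    """Weighted sum of the attribute items; each item is assumed to lie in [0, 1]."""
    return sum(weights[name] * attributes[name] for name in weights)

def first_relevance(first_attribute, second_attribute):
    """Assumed relevance formula: 1 minus the absolute difference of the two
    attribute values, so closer values give a higher relevance."""
    return 1.0 - abs(attribute_value(first_attribute) - attribute_value(second_attribute))

# Invented attribute scores for a denoising keyword and a second keyword.
denoising_attrs = {"category": 0.8, "characteristic": 0.6, "classification": 0.9}
second_attrs = {"category": 0.8, "characteristic": 0.5, "classification": 0.9}

score = first_relevance(denoising_attrs, second_attrs)
print(round(score, 2))
```

With these sample values the two attribute values are 0.76 and 0.73, so the relevance is about 0.97 and would meet an 80% threshold.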
Further, the method further comprises: obtaining a fourth keyword according to the first attribute and the first preset threshold; judging whether the fourth keyword is the same as the second keyword and the third keyword; when the fourth keyword is different from the second keyword and the third keyword, obtaining a fifth denoising instruction, wherein the fifth denoising instruction is used for retrieving in the third target document according to the fourth keyword to obtain a sixth target document; and obtaining a sixth denoising instruction according to the fifth denoising instruction and the fourth keyword, wherein the sixth denoising instruction is used for deleting the sixth target document from the third target document to obtain a seventh target document.
Specifically, to make denoising comprehensive, this embodiment further obtains a fourth keyword according to the first attribute and the first predetermined threshold. The fourth keyword shares the first attribute with the denoising keyword, that is, the two have a certain similarity, and the relevance between the fourth keyword and the denoising keyword satisfies the first predetermined threshold. A fourth keyword determined in this way is therefore also a target denoising keyword. It is then judged whether the fourth keyword is the same as the second keyword and the third keyword. If it is the same, denoising does not need to be repeated. If it is different from the second keyword and the third keyword, omissions may remain and denoising may be incomplete; in that case the fourth keyword is used to search the third target document, the patent documents containing the fourth keyword are deleted, and the retrieval data are thereby denoised again. Automatically denoising the retrieval data several times with different keywords ensures the completeness of denoising, avoids the omissions of manual denoising, and improves the accuracy of the retrieval result. This effectively solves the technical problems in the prior art that denoising search results by manually screening keywords is time-consuming and labor-intensive, is prone to omissions, leaves denoising incomplete, and cannot guarantee the accuracy of the search results.
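A minimal sketch of this additional pass is given below. It assumes a hypothetical in-memory model in which the third target document is a mapping from document identifier to the set of keywords in that document, and it reads "the same as the second keyword and the third keyword" as "equal to either of them"; neither assumption comes from the source.

```python
from typing import Dict, Optional, Set

Corpus = Dict[str, Set[str]]  # hypothetical model: document id -> its keywords


def fourth_keyword_pass(
    third_target: Corpus,
    fourth_keyword: str,
    second_keyword: str,
    third_keyword: Optional[str] = None,
) -> Corpus:
    """Return the seventh target document set."""
    # Skip the pass when this keyword has already been used for denoising.
    if fourth_keyword in (second_keyword, third_keyword):
        return dict(third_target)
    # Fifth instruction: retrieve the documents containing the fourth keyword.
    sixth_target = {
        doc_id for doc_id, keywords in third_target.items()
        if fourth_keyword in keywords
    }
    # Sixth instruction: delete those documents from the third target set.
    return {
        doc_id: keywords for doc_id, keywords in third_target.items()
        if doc_id not in sixth_target
    }
```

Returning a new mapping rather than mutating the input keeps each intermediate result (third, sixth and seventh target document) available for inspection.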
Example two
Based on the same inventive concept as the automatic denoising method based on keyword retrieval data in the foregoing embodiment, the present invention further provides an automatic denoising device based on keyword retrieval data. As shown in fig. 2, the device includes:
a first obtaining unit 11, wherein the first obtaining unit 11 is used for obtaining a first keyword according to a first document;
a second obtaining unit 12, where the second obtaining unit 12 is configured to obtain a first target document according to the first keyword;
a third obtaining unit 13, configured to obtain, according to the first target document, a first patent, where the first patent has a second keyword, where the second keyword is different from the first keyword;
a fourth obtaining unit 14, wherein the fourth obtaining unit 14 is configured to obtain a first denoising instruction, and the first denoising instruction is used for deleting the first patent from the first target document;
a fifth obtaining unit 15, where the fifth obtaining unit 15 is configured to obtain a second denoising instruction according to the first denoising instruction and a second keyword, and the second denoising instruction is configured to perform a search in the first target document according to the second keyword to obtain a second target document;
a first executing unit 16, where the first executing unit 16 is configured to delete the second target document from the first target document, and obtain a third target document.
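The cooperation of units 11 to 16 can be sketched as a search followed by a set difference. The sketch assumes a hypothetical corpus modelled as a mapping from patent identifier to keyword set, and assumes the first and second keywords are already known; the function names `search` and `denoise` are illustrative and do not appear in the source.

```python
from typing import Dict, Set

Corpus = Dict[str, Set[str]]  # hypothetical model: patent id -> its keywords


def search(corpus: Corpus, keyword: str) -> Corpus:
    """Return the documents in the corpus that contain the keyword."""
    return {doc_id: kws for doc_id, kws in corpus.items() if keyword in kws}


def denoise(corpus: Corpus, first_keyword: str, second_keyword: str) -> Corpus:
    """Return the third target document set."""
    # Second obtaining unit: retrieve the first target document.
    first_target = search(corpus, first_keyword)
    # Fifth obtaining unit: search inside the first target document
    # for every document that carries the noise (second) keyword.
    second_target = search(first_target, second_keyword)
    # First executing unit: delete the second target document.
    return {
        doc_id: kws for doc_id, kws in first_target.items()
        if doc_id not in second_target
    }
```

For example, searching a corpus for "mouse" and denoising with "rodent" would keep computer-mouse patents and drop those that also mention rodents.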
Preferably, the apparatus further comprises:
a sixth obtaining unit, configured to obtain a denoising keyword;
a seventh obtaining unit, configured to obtain a first relevance according to the denoising keyword and the second keyword;
a first judging unit, configured to judge whether the first relevance satisfies a first predetermined threshold;
an eighth obtaining unit, configured to obtain the first denoising instruction when the first relevance satisfies the first predetermined threshold.
Preferably, the apparatus further comprises:
a ninth obtaining unit, configured to obtain a second relevance according to the first predetermined threshold when the first relevance does not satisfy the first predetermined threshold;
a tenth obtaining unit, configured to obtain a third keyword according to the denoising keyword and the second relevance, where the third keyword is different from the second keyword;
an eleventh obtaining unit, configured to obtain a third denoising instruction, where the third denoising instruction is used to perform a search in the first target document according to the third keyword to obtain a fourth target document;
a second execution unit, configured to obtain a fourth denoising instruction according to the fourth target document, where the fourth denoising instruction is used to delete the fourth target document from the first target document and obtain a fifth target document.
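The threshold judgment and the fallback to a third keyword can be sketched as a single selection step. The sketch assumes that relevance scores to the denoising keyword are already available in a dictionary, that "satisfies the threshold" means greater than or equal to it, and that the best-scoring alternative is chosen as the third keyword; the source specifies none of these details.

```python
from typing import Dict, Optional


def choose_noise_keyword(
    second_keyword: str,
    relevance_to_denoising: Dict[str, float],
    threshold: float,
) -> Optional[str]:
    """Pick the keyword used to search for noise documents."""
    # First relevance: relation between second keyword and denoising keyword.
    if relevance_to_denoising.get(second_keyword, 0.0) >= threshold:
        return second_keyword
    # Otherwise look for a different (third) keyword that meets the threshold.
    candidates = {
        keyword: score
        for keyword, score in relevance_to_denoising.items()
        if keyword != second_keyword and score >= threshold
    }
    if not candidates:
        return None  # no keyword qualifies, so no denoising instruction
    return max(candidates, key=candidates.get)
```

The returned keyword would then drive the search and deletion in the first target document, yielding either the third or the fifth target document.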
Preferably, the apparatus further comprises:
a twelfth obtaining unit, configured to obtain a first attribute according to the denoising keyword;
a thirteenth obtaining unit, configured to obtain a second attribute according to the second keyword;
a fourteenth obtaining unit, configured to obtain the first relevance according to the first attribute and the second attribute.
Preferably, the apparatus further comprises:
a fifteenth obtaining unit, configured to obtain a fourth keyword according to the first attribute and the first predetermined threshold;
a second judging unit, configured to judge whether the fourth keyword is the same as the second keyword and the third keyword;
a sixteenth obtaining unit, configured to obtain a fifth denoising instruction when the fourth keyword is different from the second keyword and the third keyword, where the fifth denoising instruction is used to perform a search in the third target document according to the fourth keyword to obtain a sixth target document;
a third execution unit, configured to obtain a sixth denoising instruction according to the fifth denoising instruction and the fourth keyword, where the sixth denoising instruction is used to delete the sixth target document from the third target document, and obtain a seventh target document.
The variations and specific examples of the automatic denoising method based on keyword retrieval data in the first embodiment (fig. 1) also apply to the automatic denoising device based on keyword retrieval data in this embodiment. From the foregoing detailed description of the method, those skilled in the art can clearly understand how the device of this embodiment is implemented, so for brevity the details are not repeated here.
Example three
Based on the same inventive concept as the automatic denoising method based on the keyword search data in the foregoing embodiment, the present invention further provides an automatic denoising device based on the keyword search data, as shown in fig. 3, including a memory 304, a processor 302, and a computer program stored on the memory 304 and operable on the processor 302, wherein the processor 302 implements the steps of any one of the foregoing automatic denoising methods based on the keyword search data when executing the program.
In fig. 3, a bus architecture is represented by bus 300. Bus 300 may include any number of interconnected buses and bridges, and links together various circuits including one or more processors, represented by processor 302, and memory, represented by memory 304. The bus 300 may also link together various other circuits such as peripherals, voltage regulators and power management circuits, which are well known in the art and are therefore not described further here. A bus interface 306 provides an interface between the bus 300 and the receiver 301 and transmitter 303. The receiver 301 and the transmitter 303 may be the same element, i.e., a transceiver, providing a means for communicating with various other apparatus over a transmission medium. The processor 302 is responsible for managing the bus 300 and general processing, and the memory 304 may be used for storing data used by the processor 302 in performing operations.
Example four
Based on the same inventive concept as the automatic denoising method based on keyword retrieval data in the foregoing embodiments, the present invention also provides a computer-readable storage medium having a computer program stored thereon, which when executed by a processor, implements the following steps: obtaining a first keyword according to a first document; obtaining a first target document according to the first keyword; obtaining a first patent according to the first target document, wherein the first patent has a second keyword, and the second keyword is different from the first keyword; obtaining a first denoising instruction, wherein the first denoising instruction is used for deleting the first patent from the first target document; obtaining a second denoising instruction according to the first denoising instruction and a second keyword, wherein the second denoising instruction is used for retrieving in the first target document according to the second keyword to obtain a second target document; and deleting the second target document from the first target document to obtain a third target document.
In a specific implementation, when the program is executed by a processor, any method step in the first embodiment may be further implemented.
One or more technical solutions in the embodiments of the present application have at least one or more of the following technical effects:
according to the automatic denoising method and device based on the keyword retrieval data, provided by the embodiment of the invention, a first keyword is obtained according to a first document; obtaining a first target document according to the first keyword; obtaining a first patent according to the first target document, wherein the first patent has a second keyword, and the second keyword is different from the first keyword; obtaining a first denoising instruction, wherein the first denoising instruction is used for deleting the first patent from the first target document; obtaining a second denoising instruction according to the first denoising instruction and a second keyword, wherein the second denoising instruction is used for retrieving in the first target document according to the second keyword to obtain a second target document; and deleting the second target document from the first target document to obtain a third target document. The keyword is effectively determined through analysis processing of the keyword, automatic denoising processing is carried out on the retrieval data, accuracy of retrieval results is improved, and time and labor waste caused by manual denoising is effectively avoided. Therefore, the technical problems that in the prior art, the search result is denoised through manually screening keywords, the denoising process is time-consuming and labor-consuming, omission is easily caused, denoising is incomplete, and the accuracy of the search result cannot be guaranteed are solved.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (9)

1. An automatic denoising method based on keyword retrieval data, characterized in that the method comprises:
obtaining a first keyword according to a first document;
obtaining a first target document according to the first keyword;
obtaining a first patent according to the first target document, wherein the first patent has a second keyword, and the second keyword is different from the first keyword;
obtaining a first denoising instruction, wherein the first denoising instruction is used for deleting the first patent from the first target document;
obtaining a second denoising instruction according to the first denoising instruction and a second keyword, wherein the second denoising instruction is used for retrieving in the first target document according to the second keyword to obtain a second target document;
and deleting the second target document from the first target document to obtain a third target document.
2. The method of claim 1, wherein obtaining the first denoising instruction comprises:
obtaining a denoising keyword;
obtaining a first relevance according to the denoising keyword and the second keyword;
judging whether the first relevance meets a first preset threshold value or not;
obtaining the first denoising instruction when the first relevance meets the first predetermined threshold.
3. The method of claim 2, wherein judging whether the first relevance meets the first preset threshold further comprises:
when the first relevance does not meet the first preset threshold, obtaining a second relevance according to the first preset threshold;
obtaining a third keyword according to the denoising keyword and the second relevance, wherein the third keyword is different from the second keyword;
obtaining a third denoising instruction, wherein the third denoising instruction is used for searching in the first target document according to the third keyword to obtain a fourth target document;
and obtaining a fourth denoising instruction according to the fourth target document, wherein the fourth denoising instruction is used for deleting the fourth target document from the first target document to obtain a fifth target document.
4. The method of claim 2, wherein obtaining the first relevance according to the denoising keyword and the second keyword comprises:
obtaining a first attribute according to the denoising keyword;
obtaining a second attribute according to the second keyword;
and obtaining the first relevance according to the first attribute and the second attribute.
5. The method of claim 4, wherein the method further comprises:
obtaining a fourth keyword according to the first attribute and the first preset threshold;
judging whether the fourth keyword is the same as the second keyword and the third keyword;
when the fourth keyword is different from the second keyword and the third keyword, obtaining a fifth denoising instruction, wherein the fifth denoising instruction is used for retrieving in the third target document according to the fourth keyword to obtain a sixth target document;
and obtaining a sixth denoising instruction according to the fifth denoising instruction and the fourth keyword, wherein the sixth denoising instruction is used for deleting the sixth target document from the third target document to obtain a seventh target document.
6. An automatic denoising apparatus based on keyword retrieval data, the apparatus comprising:
a first obtaining unit configured to obtain a first keyword from a first document;
a second obtaining unit, configured to obtain a first target document according to the first keyword;
a third obtaining unit, configured to obtain a first patent according to the first target document, where the first patent has a second keyword, where the second keyword is different from the first keyword;
a fourth obtaining unit, configured to obtain a first denoising instruction, where the first denoising instruction is used to delete the first patent from the first target document;
a fifth obtaining unit, configured to obtain a second denoising instruction according to the first denoising instruction and a second keyword, where the second denoising instruction is used to perform retrieval in the first target document according to the second keyword to obtain a second target document;
a first executing unit, configured to delete the second target document from the first target document, and obtain a third target document.
7. The apparatus of claim 6, wherein the apparatus further comprises:
a sixth obtaining unit, configured to obtain a denoising keyword;
a seventh obtaining unit, configured to obtain a first relevance according to the denoising keyword and the second keyword;
a first judging unit, configured to judge whether the first relevance satisfies a first predetermined threshold;
an eighth obtaining unit, configured to obtain the first denoising instruction when the first relevance satisfies the first predetermined threshold.
8. An automatic denoising apparatus based on keyword retrieval data, comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the method according to any one of claims 1 to 5 when executing the program.
9. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 5.
CN202010092898.5A 2020-02-14 2020-02-14 Automatic denoising method and device based on keyword retrieval data Withdrawn CN111274364A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010092898.5A CN111274364A (en) 2020-02-14 2020-02-14 Automatic denoising method and device based on keyword retrieval data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010092898.5A CN111274364A (en) 2020-02-14 2020-02-14 Automatic denoising method and device based on keyword retrieval data

Publications (1)

Publication Number Publication Date
CN111274364A true CN111274364A (en) 2020-06-12

Family

ID=70999537

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010092898.5A Withdrawn CN111274364A (en) 2020-02-14 2020-02-14 Automatic denoising method and device based on keyword retrieval data

Country Status (1)

Country Link
CN (1) CN111274364A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112115236A (en) * 2020-10-09 2020-12-22 湖北中烟工业有限责任公司 Method and device for constructing tobacco scientific and technical literature data deduplication model
CN112115236B (en) * 2020-10-09 2024-02-02 湖北中烟工业有限责任公司 Construction method and device of tobacco science and technology literature data deduplication model

Similar Documents

Publication Publication Date Title
Bakar et al. Feature extraction approaches from natural language requirements for reuse in software product lines: A systematic literature review
CN106649260B (en) Product characteristic structure tree construction method based on comment text mining
US20030172357A1 (en) Knowledge management using text classification
US20100268703A1 (en) Method of search strategy visualization and interaction
CN111949855A (en) Knowledge map-based engineering technology knowledge retrieval platform and method thereof
CN112000790B (en) Legal text accurate retrieval method, terminal system and readable storage medium
CN113190687A (en) Knowledge graph determining method and device, computer equipment and storage medium
CN115757831A (en) Method and device for semi-automatically constructing domain knowledge graph
CN111274364A (en) Automatic denoising method and device based on keyword retrieval data
CN116049376B (en) Method, device and system for retrieving and replying information and creating knowledge
CN111444312A (en) Method and device for multi-platform combined patent retrieval
CN111062832A (en) Auxiliary analysis method and device for intelligently providing patent answer and debate opinions
CN115409541A (en) Cigarette brand data processing method based on data blood relationship
CN113987204A (en) Method and system for constructing field encyclopedia map
Mashina Application of statistical methods to solve the problem of enriching ontologies of developing subject areas
CN111353023A (en) Target database optimization method and device based on keyword retrieval
CN111339123A (en) Double-retrieval patent database establishing method and device
CN111309895A (en) Automatic denoising method and device for retrieval data
CN111274229A (en) Method and device for verifying denoising result of retrieved data
CN111324726A (en) Method and device for automatically drying patent database
CN111368062A (en) Verification method and device for denoising patent retrieval database
CN111339239B (en) Knowledge retrieval method and device, storage medium and server
US11960549B2 (en) Guided source collection for a machine learning model
CN111339243A (en) Method and device for denoising and checking retrieval data based on competitive product information
CN111324640A (en) Method and device for automatically expanding database

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20200612
