CN110895556B - Text retrieval method and device, storage medium and electronic device - Google Patents

Info

Publication number
CN110895556B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811069929.4A
Other languages
Chinese (zh)
Other versions
CN110895556A (en
Inventor
詹焯扬
张晓泉
程昊
蔡健
袁子斌
李文文
邬龙
江涛
乔宝琛
杨妤卿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Blue Lantern Fish Intelligent Technology Co ltd
Original Assignee
Beijing Blue Lantern Fish Intelligent Technology Co ltd
Application filed by Beijing Blue Lantern Fish Intelligent Technology Co ltd
Priority to CN201811069929.4A
Publication of CN110895556A
Application granted
Publication of CN110895556B

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a text retrieval method and device, a storage medium and an electronic device. The method comprises: acquiring a first patent text uploaded by a client; acquiring a key text extracted from the first patent text and a patent classification number matched with the first patent text; sending a first search request generated by using the key text and the patent classification number to a server, wherein the first search request is used to request a search for the first patent text; and acquiring a first patent text list which is returned by the server and matched with the first patent text, wherein the text similarity between each object patent text contained in the first patent text list and the first patent text is larger than a first threshold value. The invention solves the technical problem of low retrieval efficiency in the related art.

Description

Text retrieval method and device, storage medium and electronic device
Technical Field
The present invention relates to the field of computers, and in particular, to a text retrieval method and apparatus, a storage medium, and an electronic apparatus.
Background
To assess the prospects of a future patent application, many applicants search published patent application documents for related patent text.
However, in the process of performing the above-mentioned search by using the patent text search platform at present, a user is often required to perform preprocessing on the patent text to be searched, such as manually extracting keywords in the patent text in advance, writing a boolean search formula corresponding to the patent text in advance, and the like, and then performing the search by using the processed content. That is, for retrieval of patent text, the method provided by the related art has high operation complexity, resulting in a problem of low retrieval efficiency.
In view of the above problems, no effective solution has been proposed at present.
Disclosure of Invention
The embodiment of the invention provides a text retrieval method and device, a storage medium and an electronic device, which are used for at least solving the technical problem of low retrieval efficiency in the related technology.
According to an aspect of an embodiment of the present invention, there is provided a text retrieval method including: acquiring a first patent text uploaded by a client; acquiring a key text extracted from the first patent text and a patent classification number matched with the first patent text, wherein the key text is a text extracted from the first patent text through a text extraction model, and the text extraction model is a model for extracting the key text in the patent text, which is obtained after machine training is carried out by using the published patent text; the patent classification number is obtained through recognition of a text classification model, and the text classification model is a model which is obtained after machine training is carried out on the disclosed patent text and is used for recognizing the belonged classification of the patent text; sending a first search request generated by using the key text and the patent classification number to a server, wherein the first search request is used for requesting to search the first patent text; and acquiring a first patent text list which is returned by the server and matched with the first patent text, wherein the text similarity between the object patent text contained in the first patent text list and the first patent text is larger than a first threshold value.
According to another aspect of the embodiment of the present invention, there is also provided a text retrieval method, including: receiving a first patent text sent by a client; acquiring a key text extracted from the first patent text and a patent classification number matched with the first patent text, wherein the key text is a text extracted from the first patent text through a text extraction model, and the text extraction model is a model for extracting the key text in the patent text, which is obtained after machine training is carried out by using the published patent text; the patent classification number is obtained through recognition of a text classification model, and the text classification model is a model which is obtained after machine training is carried out on the disclosed patent text and is used for recognizing the belonged classification of the patent text; transmitting the key text and the patent classification number to the client; receiving a first search request generated by the client by using the key text and the patent classification number, wherein the first search request is used for requesting to search the first patent text; and returning a first patent text list matched with the first patent text to the client, wherein the text similarity between the object patent text contained in the first patent text list and the first patent text is larger than a first threshold value.
According to still another aspect of the embodiment of the present invention, there is also provided a text retrieval apparatus including: the first acquisition unit is used for acquiring a first patent text uploaded by the client; a second obtaining unit, configured to obtain a key text extracted from the first patent text and a patent classification number matched with the first patent text, where the key text is a text extracted from the first patent text by using a text extraction model, and the text extraction model is a model obtained by performing machine training using a published patent text and used for extracting a key text in the patent text; the patent classification number is obtained through recognition of a text classification model, and the text classification model is a model which is obtained after machine training is carried out on the disclosed patent text and is used for recognizing the belonged classification of the patent text; a first sending unit, configured to send a first search request generated by using the key text and the patent classification number to a server, where the first search request is used to request to search for the first patent text; and a third obtaining unit, configured to obtain a first patent text list that is returned by the server and matches the first patent text, where a text similarity between an object patent text included in the first patent text list and the first patent text is greater than a first threshold.
According to still another aspect of the embodiment of the present invention, there is also provided a text retrieval apparatus including: the first receiving unit is used for receiving a first patent text sent by the client; an obtaining unit, configured to obtain a key text extracted from the first patent text and a patent classification number matched with the first patent text, where the key text is a text extracted from the first patent text by using a text extraction model, and the text extraction model is a model for extracting a key text in the patent text obtained after machine training using a published patent text; the patent classification number is obtained through recognition of a text classification model, and the text classification model is a model which is obtained after machine training is carried out on the disclosed patent text and is used for recognizing the belonged classification of the patent text; a transmitting unit, configured to transmit the key text and the patent classification number to the client; a second receiving unit, configured to receive a first search request generated by the client using the key text and the patent classification number, where the first search request is used to request to search for the first patent text; and the return unit is used for returning a first patent text list matched with the first patent text to the client, wherein the text similarity between the object patent text contained in the first patent text list and the first patent text is larger than a first threshold value.
According to a further aspect of embodiments of the present invention, there is also provided a storage medium having stored therein a computer program, wherein the computer program is arranged to perform the above text retrieval method at run-time.
According to still another aspect of the embodiments of the present invention, there is further provided an electronic device including a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor executes the text retrieval method described above through the computer program.
In the embodiment of the invention, after the first patent text uploaded by the client is acquired, a text extraction model is used to extract the key text from the first patent text, and a text classification model is used to identify the patent classification number of the first patent text. A first search request is then generated from the key text and the patent classification number and sent to the server, so that the server performs the search according to the first search request and returns the first patent text list as the search result. Because the key text and the patent classification number are obtained through the models, the first patent text can be searched accurately and efficiently, which solves the technical problem of low retrieval efficiency in the related art.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiments of the invention and together with the description serve to explain the invention and do not constitute a limitation on the invention. In the drawings:
FIG. 1 is a flow diagram of an alternative text retrieval method according to an embodiment of the invention;
FIG. 2 is a schematic diagram of an alternative text retrieval method according to an embodiment of the invention;
FIG. 3 is a schematic diagram of another alternative text retrieval method according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of yet another alternative text retrieval method according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of yet another alternative text retrieval method according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of yet another alternative text retrieval method according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of yet another alternative text retrieval method according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of yet another alternative text retrieval method according to an embodiment of the present invention;
FIG. 9 is a flow diagram of another alternative text retrieval method according to an embodiment of the present invention;
FIG. 10 is a schematic diagram of an alternative text retrieval device according to an embodiment of the present invention;
FIG. 11 is a schematic diagram of an alternative text retrieval device according to an embodiment of the present invention;
FIG. 12 is a schematic structural view of an alternative electronic device according to an embodiment of the present invention;
FIG. 13 is a schematic structural view of another alternative electronic device according to an embodiment of the present invention.
Detailed Description
In order that those skilled in the art will better understand the present invention, a technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in which it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
According to an aspect of the embodiment of the present invention, a text retrieval method is provided. Optionally, as an optional implementation, as shown in FIG. 1, the text retrieval method includes:
S102, acquiring a first patent text uploaded by a client;
S104, acquiring a key text extracted from the first patent text and a patent classification number matched with the first patent text;
S106, sending a first search request generated by utilizing the key text and the patent classification number to a server, wherein the first search request is used for requesting to search the first patent text;
S108, acquiring a first patent text list which is returned by the server and matched with the first patent text, wherein the text similarity between the object patent text contained in the first patent text list and the first patent text is larger than a first threshold value.
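The client-side flow of steps S102 to S108 can be sketched as follows. This is only an illustrative sketch, not the patented implementation: `StubServer`, `SearchRequest`, `retrieve`, the word-set Jaccard similarity, the fixed classification number "G06F" and the threshold 0.3 are all hypothetical stand-ins, since the text does not specify the similarity measure or the server interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchRequest:
    """First search request built from the key text and classification number."""
    key_text: str
    classification_number: str


def jaccard(a: str, b: str) -> float:
    """Word-set Jaccard similarity; a stand-in for the unspecified measure."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


class StubServer:
    """Hypothetical in-memory server holding (classification, text) pairs."""

    def __init__(self, corpus, first_threshold=0.3):
        self.corpus = corpus
        self.first_threshold = first_threshold

    def preprocess(self, patent_text):
        # Placeholder for the two trained models: the first sentence is
        # used as the key text and a fixed classification number is returned.
        key_text = patent_text.split(".")[0].strip()
        return key_text, "G06F"

    def search(self, request):
        # Keep texts of the same class whose similarity exceeds the threshold.
        return [
            text
            for cls, text in self.corpus
            if cls == request.classification_number
            and jaccard(request.key_text, text) > self.first_threshold
        ]


def retrieve(server, first_patent_text):
    # S102: the uploaded first patent text is passed in as an argument.
    # S104: obtain the key text and the patent classification number.
    key_text, cls = server.preprocess(first_patent_text)
    # S106: generate and send the first search request.
    request = SearchRequest(key_text, cls)
    # S108: obtain the first patent text list returned by the server.
    return server.search(request)
```

In this sketch the threshold comparison is strict (`>`), matching the wording "larger than a first threshold value".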
Alternatively, the text retrieval method described above may be applied, but not limited to, in retrieving similar patent text. In the related art, in the process of searching similar patent texts, the key texts are usually manually extracted, and the search formulas corresponding to the patent texts are manually written, so that the search efficiency of the patent is low. In the scheme, after the first patent text is obtained, the key text in the first patent text is extracted by using the text extraction model trained by a machine, and the patent classification number of the first patent text is identified by using the text classification model trained by the machine, so that a first search request can be generated according to the key text and the patent classification number, and the first patent text is searched, thereby simplifying the search step of searching the first patent text, and improving the search efficiency of the first patent text.
Alternatively, the file format of the first patent text may be .txt, .doc, .docx, .wps, etc.
Optionally, as an optional example, the acquiring the first patent text uploaded by the client includes at least one of:
(1) Acquiring a selection instruction, wherein the selection instruction is used for indicating to select a first patent text stored under a target path; responding to the selection instruction, and uploading a first patent text;
Optionally, the selection instruction may be, but is not limited to, a click instruction or a long-press instruction. For example, as shown in FIG. 2, the selection instruction is a click instruction. After the account is logged in, a prompt for adding the first patent text is displayed on the display interface of the client; after the button is pressed, a plurality of patent texts are displayed, namely text W-1, text W-2, text W-3 and text W-4. Text W-1 is then selected by a click instruction and used as the first patent text.
(2) Acquiring a dragging instruction, wherein the dragging instruction is used for indicating to drag the first patent text to a target area of an interface displayed by a client; and responding to the dragging instruction, and uploading the first patent text.
Optionally, as an alternative manner, as shown in FIG. 3, a prompt for adding the first patent text is displayed on the display interface of the client together with a target area, which is represented by shading. Text W-1, text W-2, text W-3 and text W-4 are located outside the target area. If text W-1 is to be used as the first patent text, text W-1 is dragged into the target area, and the client can then acquire the uploaded first patent text.
It should be noted that, as an alternative manner, after the text W-1 is dragged to the target area according to the drag instruction, the appearance of the text W-1 may be changed or a prompt tone may be generated, so as to prompt the user that the current text W-1 may be uploaded to the client as the first patent text after the user terminates the drag instruction.
In this way, the first patent text can be selected flexibly through a selection instruction or a drag instruction, which improves the flexibility of acquiring the first patent text and further improves the efficiency of searching for it.
Optionally, as an optional implementation manner, before sending the first search request generated by using the key text and the patent classification number to the server, the method further includes:
S1, displaying the obtained key text and the patent classification number in a client, wherein the key text comprises a first abstract text and a first keyword set, the first abstract text is used for representing a first patent text, and object keywords contained in the first keyword set are keywords extracted from the first abstract text;
S2, generating a first search request by utilizing the key text and the patent classification number.
Optionally, before the text extraction model is used to extract the key text, the text extraction model and the text classification model may be trained using, but not limited to, published patent texts. For example, after a published patent text is obtained, the key text and the patent classification number of the published patent text are labeled. The published patent text is input into the text extraction model, which extracts words, sentences and paragraphs from the patent text to generate a key text; the published patent text is also input into the text classification model, which generates a patent classification number. The parameters of the text extraction model are adjusted according to the degree of matching between the generated key text and the labeled key text, and the parameters of the text classification model are adjusted according to the degree of matching between the generated patent classification number and the labeled patent classification number. Training continues until the degree of matching between the key text generated by the text extraction model and the labeled key text is greater than a preset threshold and the accuracy of the patent classification number generated by the text classification model is greater than another preset threshold, at which point the two models are considered trained.
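The "adjust parameters until accuracy exceeds a preset threshold" loop described above can be illustrated for the classification model with a toy example. The text does not say which kind of model is used; the multi-class perceptron over bags of words below is a swapped-in technique chosen only for brevity, and the function names, the threshold 0.9 and the epoch limit are hypothetical.

```python
from collections import defaultdict


def predict(weights, classes, words):
    # Score each class by summing the weights of the words present;
    # ties are resolved in favour of the first class in the list.
    return max(classes, key=lambda c: sum(weights[(c, w)] for w in words))


def accuracy(weights, classes, samples):
    hits = sum(predict(weights, classes, words) == label for words, label in samples)
    return hits / len(samples)


def train_classifier(samples, threshold=0.9, max_epochs=50):
    """Adjust weights until accuracy on the labeled texts exceeds threshold."""
    classes = sorted({label for _, label in samples})
    weights = defaultdict(float)
    for _ in range(max_epochs):
        if accuracy(weights, classes, samples) > threshold:
            break  # the preset threshold has been exceeded; stop training
        for words, label in samples:
            guess = predict(weights, classes, words)
            if guess != label:
                # Perceptron update: reward the labeled class and
                # penalise the wrongly predicted class.
                for w in words:
                    weights[(label, w)] += 1.0
                    weights[(guess, w)] -= 1.0
    return weights, classes
```

Here accuracy is measured on the training samples themselves to keep the sketch short; a real system would normally evaluate on held-out labeled patent texts.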
Optionally, after the key text is obtained through the text extraction model and the patent classification number is obtained through the text classification model, the key text and the patent classification number may be, but are not limited to being, displayed.
Take texts W-1, W-2, W-3 and W-4 as the acquired first patent texts as an example. After they are acquired, they are input into the text extraction model and the text classification model, and the key text and the patent classification number output by the two models are obtained. For example, as shown in FIG. 4, after the key texts and patent classification numbers of texts W-1, W-2, W-3 and W-4 are obtained, they are displayed. In FIG. 4, "machine", "computer", etc. form the first keyword set, and "a … …" etc. is the first abstract text.
It should be noted that the patent classification numbers and key texts in FIG. 4 are only used to explain the display process, and the specific content of the text does not limit the present application.
Optionally, after displaying the above-mentioned key text and patent classification number, before generating the first search request by using the key text and patent classification number, the method further includes:
S1, acquiring a first adjustment instruction generated by editing operation executed in a client;
S2, executing at least one of the following adjustment operations according to the first adjustment instruction: adjusting the first abstract text into a second abstract text, adjusting the first keyword set into a second keyword set, and adjusting the patent classification number into an adjusted patent classification number.
Optionally, the adjustment operation indicated by the first adjustment instruction may be, but is not limited to, adding, deleting, or modifying the key text and/or the patent classification number.
For example, take a modification operation performed on the key text and the patent classification number after they are acquired, described with reference to FIG. 5 and FIG. 6. As shown in FIG. 5, the first keyword set, the patent classification number, and the first abstract text of text W-1 are displayed. The contents of the first abstract text and of the patent classification number are then changed to the contents enclosed by the dashed box in FIG. 6. A first search request is generated according to the changed key text and patent classification number, so that the server returns a first patent text list according to the first search request.
In this way, adjusting the key text and the patent classification number improves the accuracy of the patent search and further improves the search efficiency.
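One way to picture the adjustment operations (adding, deleting or modifying the abstract text, keyword set and classification number) is as a function that returns an adjusted copy of the search input. This is a minimal sketch under assumptions: the `SearchInput` type, the `apply_adjustment` function and its parameters are hypothetical names, and the classification strings are illustrative values only.

```python
from dataclasses import dataclass, replace
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class SearchInput:
    """Key text (abstract text plus keyword set) and classification number."""
    abstract_text: str
    keywords: FrozenSet[str]
    classification_number: str


def apply_adjustment(
    current: SearchInput,
    abstract_text: Optional[str] = None,
    add_keywords: Iterable[str] = (),
    remove_keywords: Iterable[str] = (),
    classification_number: Optional[str] = None,
) -> SearchInput:
    """Return an adjusted copy; fields left as None are kept unchanged."""
    keywords = (current.keywords | frozenset(add_keywords)) - frozenset(remove_keywords)
    return replace(
        current,
        abstract_text=current.abstract_text if abstract_text is None else abstract_text,
        keywords=keywords,
        classification_number=(
            current.classification_number
            if classification_number is None
            else classification_number
        ),
    )
```

Returning a new object rather than mutating the original keeps the model output available, so the user's edits can be compared with it or undone.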
Optionally, after obtaining the first patent text list that matches the first patent text returned by the server, the method further includes:
S1, acquiring a second adjustment instruction generated by editing operation executed in a client;
S2, executing at least one of the following adjustment operations according to the second adjustment instruction: adjusting the first abstract text into a third abstract text, adjusting the first keyword set into a third keyword set, and adjusting the patent classification number into an adjusted patent classification number;
S3, obtaining an adjustment result obtained according to the second adjustment instruction;
S4, sending a second search request generated by utilizing the adjustment result to the server, wherein the second search request is used for requesting to search the first patent text;
S5, acquiring a second patent text list which is returned by the server and matched with the first patent text, wherein the text similarity between the object patent text contained in the second patent text list and the first patent text is larger than a second threshold, and the second threshold is larger than the first threshold.
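The relationship between the first and second searches (adjusted input, stricter threshold) can be sketched as follows. The Jaccard similarity over keyword sets, the `search` function and the example thresholds are assumptions made for illustration; the text does not specify how text similarity is computed.

```python
def jaccard(a, b):
    """Set-overlap similarity between two keyword collections."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def search(corpus, keywords, threshold):
    """Return ids of patents whose similarity to the keywords exceeds threshold."""
    return [pid for pid, words in corpus.items() if jaccard(keywords, words) > threshold]


def refine(corpus, first_keywords, adjusted_keywords, first_threshold, second_threshold):
    # The second threshold is required to be larger than the first one.
    if second_threshold <= first_threshold:
        raise ValueError("second threshold must be larger than the first threshold")
    first_list = search(corpus, first_keywords, first_threshold)
    second_list = search(corpus, adjusted_keywords, second_threshold)
    return first_list, second_list
```

Because the second request uses adjusted keywords, the second list is not necessarily a subset of the first; it only has to satisfy the stricter threshold.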
Optionally, the adjustment operation indicated by the second adjustment instruction may be, but is not limited to, adding, deleting, or modifying the key text and/or the patent classification number.
For example, take an adding operation performed on the key text after the first patent text list returned by the server is obtained, described with reference to FIG. 7 and FIG. 8. As shown in FIG. 7, after text W-1 is searched, a first patent text list is obtained, in which patent 1 and patent 2 are displayed. As shown in FIG. 8, after the adding operation is performed on the key text and the patent classification number, a second patent text list is obtained, which includes patent 3 and patent 4. The accuracy of the patent search is thereby improved, and the search efficiency is further improved.
Optionally, the acquiring the key text extracted from the first patent text and the patent classification number matched with the first patent text includes:
S1, sending the first patent text to the server so that the server performs text preprocessing on the first patent text;
S2, acquiring the key text and the patent classification number returned by the server.
Optionally, acquiring the key text and the patent classification number returned by the server includes:
S1, the server performs segmentation processing on the first patent text to obtain a text segment set corresponding to the first patent text;
S2, the server extracts first text features of the text segment set through the text extraction model, and performs word sense analysis and text recombination on the first patent text according to the first text features to obtain a first abstract text;
S3, the server extracts a first keyword set from the first abstract text;
S4, the server extracts second text features of the text segment set through the text classification model, and identifies the patent classification number of the first patent text according to the second text features.
Alternatively, the segmenting of the first patent text may be, but is not limited to, dividing the first patent text into different paragraphs according to the line-feed character, or dividing the first patent text into a plurality of paragraphs with the same number of words according to the number of words.
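The two segmentation options mentioned above can be sketched as two small functions. The function names are hypothetical, and splitting on whitespace to count words is an assumption that suits space-delimited languages; Chinese text would need a word segmenter first.

```python
def split_by_line_feed(text):
    """Split text into paragraphs at line-feed characters, dropping empty ones."""
    return [seg.strip() for seg in text.split("\n") if seg.strip()]


def split_by_word_count(text, words_per_segment):
    """Split text into segments of equal word count (the last may be shorter)."""
    if words_per_segment <= 0:
        raise ValueError("words_per_segment must be positive")
    words = text.split()
    return [
        " ".join(words[i:i + words_per_segment])
        for i in range(0, len(words), words_per_segment)
    ]
```

For example, `split_by_word_count("one two three four five", 2)` yields three segments, the last holding the single leftover word.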
Alternatively, the text extraction model and the text classification model may be, but are not limited to being, applied in a server.
Take as an example segmenting the first patent text according to the line-feed character and acquiring the first abstract text according to the occurrence frequency of words. After the client acquires the first patent text, it sends the first patent text to the server. The text extraction model in the server segments the first patent text according to the line-feed character, acquires the first text features of the text in each segment, and extracts the first abstract text from the first patent text according to the first text features. After the first abstract text is extracted, the first keyword set is determined according to the occurrence frequency and occurrence position of words in the first abstract text. Meanwhile, the server extracts the second text features of the text segment set by using the text classification model and outputs a patent classification number according to the second text features. The server then sends the first abstract text, the first keyword set and the patent classification number to the client. After receiving the first search request sent by the client, the server searches according to the first abstract text, the first keyword set and the patent classification number carried in the first search request, obtains the first patent text list, and returns it to the client. In this way, once the search request of the client is obtained, the search is performed according to the key text and the patent classification number, which improves the efficiency of searching patent texts.
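Determining a keyword set from the occurrence frequency and occurrence position of words can be sketched as a simple scoring function. The scoring formula (count plus a bonus that decays with the position of the first occurrence), the small stop-word list and the name `extract_keywords` are assumptions for illustration; the text does not give the actual weighting.

```python
from collections import Counter

# A tiny illustrative stop-word list; a real system would use a fuller one.
STOPWORDS = frozenset({"a", "an", "the", "of", "and", "to", "in", "is", "for"})


def extract_keywords(abstract_text, top_k=3):
    """Rank words by frequency plus an early-position bonus; return the top_k."""
    words = [w.strip(".,;:").lower() for w in abstract_text.split()]
    words = [w for w in words if w and w not in STOPWORDS]
    counts = Counter(words)
    first_pos = {}
    for i, w in enumerate(words):
        first_pos.setdefault(w, i)  # remember only the first occurrence

    def score(w):
        # Frequency dominates; earlier first occurrence breaks near-ties.
        return counts[w] + 1.0 / (1 + first_pos[w])

    ranked = sorted(counts, key=lambda w: (-score(w), w))
    return ranked[:top_k]
```

Sorting on `(-score, word)` makes the ranking deterministic when two words have the same score.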
It should be noted that, for simplicity of description, the foregoing method embodiments are all described as a series of acts, but it should be understood by those skilled in the art that the present invention is not limited by the order of acts described, as some steps may be performed in other orders or concurrently in accordance with the present invention. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily required for the present invention.
According to another aspect of the embodiment of the invention, a text retrieval method is also provided. Optionally, as shown in fig. 9, the text retrieval method includes:
S902, receiving a first patent text sent by a client;
S904, acquiring a key text extracted from the first patent text and a patent classification number matched with the first patent text;
S906, sending the key text and the patent classification number to the client;
S908, receiving a first search request generated by the client by utilizing the key text and the patent classification number, wherein the first search request is used for requesting to search the first patent text;
S910, returning a first patent text list matched with the first patent text to the client, wherein the text similarity between the object patent text contained in the first patent text list and the first patent text is larger than a first threshold value.
Alternatively, the text retrieval method described above may be applied, but not limited to, in retrieving similar patent text. In the related art, in the process of searching similar patent texts, the key texts are usually manually extracted, and the search formulas corresponding to the patent texts are manually written, so that the search efficiency of the patent is low. In the scheme, after the first patent text is obtained, the key text in the first patent text is extracted by using the text extraction model trained by a machine, and the patent classification number of the first patent text is identified by using the text classification model trained by the machine, so that a first search request can be generated according to the key text and the patent classification number, and the first patent text is searched, thereby simplifying the search step of searching the first patent text, and improving the search efficiency of the first patent text.
Optionally, acquiring the key text extracted from the first patent text and the patent classification number matched with the first patent text includes:
S1, segmenting the first patent text to obtain a text segment set corresponding to the first patent text;
S2, extracting first text features of the text segment set through a text extraction model, and performing word sense analysis and text recombination on the first patent text according to the first text features to obtain a first abstract text;
S3, extracting a first keyword set from the first abstract text;
S4, extracting second text features of the text segment set through the text classification model, and identifying the patent classification number of the first patent text according to the second text features.
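As a rough illustration of step S2, the sketch below builds an abstract by picking the segments whose words occur most often in the whole text. This frequency-based extractive summary is a swapped-in simplification: it does not perform the word sense analysis and text recombination of the trained text extraction model, and the function name `summarize` and the parameter `max_segments` are hypothetical.

```python
import re
from collections import Counter
from typing import List


def summarize(segments: List[str], max_segments: int = 2) -> str:
    """Pick the segments whose words are most frequent across the whole text."""
    tokenized = [re.findall(r"\w+", seg.lower()) for seg in segments]
    freq = Counter(word for words in tokenized for word in words)

    def score(words: List[str]) -> float:
        # Average frequency, so that long segments are not favoured automatically.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # Rank segment indices by score (ties keep their original order).
    ranked = sorted(range(len(segments)), key=lambda i: score(tokenized[i]), reverse=True)
    chosen = sorted(ranked[:max_segments])  # restore document order
    return " ".join(segments[i] for i in chosen)
```

Splitting on `\w+` assumes space-delimited text; Chinese patent text would need a word segmenter in front of this step.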
Optionally, the first patent text may be segmented by, but is not limited to, dividing it into different paragraphs at line-feed characters, or dividing it by word count into a plurality of paragraphs each containing the same number of words.
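The two segmentation options just described can be sketched as follows. The function names are hypothetical, and whitespace tokenization is an assumption made for the sketch; text without spaces between words would need a tokenizer or a character count instead.

```python
from typing import List


def split_by_line_feed(text: str) -> List[str]:
    """Split on line-feed characters, dropping empty segments."""
    return [seg.strip() for seg in text.split("\n") if seg.strip()]


def split_by_word_count(text: str, words_per_segment: int) -> List[str]:
    """Split into segments of equal word count (the last one may be shorter)."""
    if words_per_segment <= 0:
        raise ValueError("words_per_segment must be positive")
    words = text.split()
    return [
        " ".join(words[i:i + words_per_segment])
        for i in range(0, len(words), words_per_segment)
    ]
```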
Take as an example segmenting the first patent text at line-feed characters and obtaining the first abstract text according to the occurrence frequency of words. After the client acquires the first patent text, it sends the first patent text to the server. The text extraction model in the server segments the first patent text at line-feed characters, acquires the first text features of the text in each segment, and extracts the first abstract text from the first patent text according to the first text features. After the first abstract text is extracted, the first keyword set is determined according to the occurrence frequency and occurrence position of words in the first abstract text. Meanwhile, the server also extracts second text features of the text segment set by using the text classification model, and outputs a patent classification number according to the second text features. The server then sends the first abstract text, the first keyword set and the patent classification number to the client. After receiving a first search instruction sent by the client, the server performs a search using the first abstract text, the first keyword set and the patent classification number according to the first search instruction, obtains a first patent text list, and returns the first patent text list to the client. Because the search is performed with the key text and the patent classification number once the search instruction of the client is received, the efficiency of searching patent texts is improved.
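One simple way to determine a keyword set "according to the occurrence frequency and the occurrence position of words" is sketched below. The ranking rule (higher frequency first, earlier first occurrence as a tie-breaker) and the names `extract_keywords`, `top_k` and `min_length` are assumptions made for illustration; the source does not specify how the two signals are combined.

```python
import re
from collections import Counter
from typing import Dict, List


def extract_keywords(abstract_text: str, top_k: int = 3, min_length: int = 3) -> List[str]:
    """Rank words by frequency, breaking ties by earliest position."""
    words = [w for w in re.findall(r"\w+", abstract_text.lower()) if len(w) >= min_length]
    freq = Counter(words)
    first_pos: Dict[str, int] = {}
    for pos, word in enumerate(words):
        first_pos.setdefault(word, pos)  # remember only the first occurrence
    ranked = sorted(freq, key=lambda w: (-freq[w], first_pos[w]))
    return ranked[:top_k]
```

The `min_length` filter is a crude substitute for a stop-word list and only makes sense for space-delimited languages.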
Optionally, before returning the first patent text list matched with the first patent text to the client, the method further comprises:
S1, in response to the first search request, retrieving the first patent text list matched with the first patent text from a database through a text retrieval model, wherein the text retrieval model is a model for performing text retrieval according to text similarity, obtained by machine training using published patent texts.
Optionally, the text retrieval model may be, but is not limited to being, obtained through training as follows. A patent sample is obtained, where the patent sample includes a patent to be searched and a target patent. The patent sample is input into the text retrieval model for training, and the parameters of the text retrieval model are adjusted until a mature text retrieval model is obtained. The first patent text is then searched by using the mature text retrieval model, and the similarity between each patent in the resulting first patent text list and the first patent text is greater than a preset threshold.
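The threshold-based retrieval can be illustrated with a plain bag-of-words cosine similarity. This is a stand-in chosen for the sketch, not the trained text retrieval model described here; the function names and the example threshold are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between the word-count vectors of two texts."""
    va = Counter(re.findall(r"\w+", a.lower()))
    vb = Counter(re.findall(r"\w+", b.lower()))
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, database: List[str], threshold: float) -> List[Tuple[str, float]]:
    """Return (text, similarity) pairs above the threshold, most similar first."""
    scored = [(text, cosine_similarity(query, text)) for text in database]
    hits = [pair for pair in scored if pair[1] > threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)
```

A trained model would replace `cosine_similarity` with a learned scoring function, while the thresholding and ranking around it would stay the same.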
According to still another aspect of the embodiment of the invention, a text retrieval device is also provided. Optionally, as shown in fig. 10, the above text retrieval device includes:
(1) A first obtaining unit 1002, configured to obtain a first patent text uploaded by a client;
(2) A second obtaining unit 1004, configured to obtain a key text extracted from a first patent text and a patent classification number matched with the first patent text, where the key text is a text extracted from the first patent text by a text extraction model, and the text extraction model is a model for extracting a key text in the patent text obtained after machine training using a published patent text; the patent classification number is obtained through text classification model identification, and the text classification model is a model which is obtained after machine training is carried out on the disclosed patent text and is used for identifying the belonged classification of the patent text;
(3) A first sending unit 1006, configured to send, to a server, a first search request generated by using the key text and the patent classification number, where the first search request is used to request to search for the first patent text;
(4) A third obtaining unit 1008, configured to obtain a first patent text list that is returned by the server and matches the first patent text, where the text similarity between each object patent text included in the first patent text list and the first patent text is greater than a first threshold.
Optionally, the text retrieval device may be applied to, but is not limited to, retrieving similar patent texts. In the related art, when similar patent texts are searched, the key text is usually extracted manually and the search formula corresponding to the patent text is written manually, so that patent search efficiency is low. In this scheme, after the first patent text is obtained, the key text in the first patent text is extracted by using the machine-trained text extraction model, and the patent classification number of the first patent text is identified by using the machine-trained text classification model, so that a first search request can be generated according to the key text and the patent classification number and the first patent text can be searched. This simplifies the steps of searching the first patent text and improves the search efficiency for the first patent text.
Optionally, the file format of the first patent text may be .txt, .doc, .docx, .wps, etc.
Optionally, the first acquiring unit includes at least one of:
(1) The second acquisition module is used for acquiring a selection instruction, wherein the selection instruction is used for indicating to select the first patent text stored under the target path; responding to the selection instruction, and uploading a first patent text;
(2) The third acquisition module is used for acquiring a dragging instruction, wherein the dragging instruction is used for indicating to drag the first patent text to a target area of an interface displayed by the client; and responding to the dragging instruction, and uploading the first patent text.
Optionally, the apparatus further includes:
(1) The display unit is used for displaying the acquired key text and the acquired patent classification number in the client before sending a first search request generated by utilizing the key text and the patent classification number to the server, wherein the key text comprises a first abstract text and a first keyword set, the first abstract text is used for representing the first patent text, and the object keywords contained in the first keyword set are keywords extracted from the first abstract text;
(2) And the generation unit is used for generating a first search request by utilizing the key text and the patent classification number.
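What the generation unit produces can be sketched as a serialized request body. The JSON layout and the field names `abstract`, `keywords` and `classification_number` are assumptions made for illustration; the source does not define a wire format.

```python
import json
from typing import List


def build_search_request(abstract_text: str, keywords: List[str], classification_number: str) -> str:
    """Serialize the key text and the classification number into a request body."""
    payload = {
        "abstract": abstract_text,
        "keywords": keywords,
        "classification_number": classification_number,
    }
    # ensure_ascii=False keeps non-ASCII text (e.g. Chinese) readable in the body.
    return json.dumps(payload, ensure_ascii=False)
```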
Optionally, before the text extraction model is used to extract the key text from the acquired first patent text, the text extraction model and the text classification model may be trained by using, but not limited to, published patent texts. For example, after published patent texts are obtained, the key text and the patent classification number of each published patent text are labeled. The published patent text is input into the text extraction model, which extracts words, sentences and segments from the patent text to generate a key text; the published patent text is also input into the text classification model, which generates a patent classification number. The parameters of the text extraction model are adjusted according to the degree of matching between the generated key text and the labeled key text, and the parameters of the text classification model are adjusted according to the degree of matching between the generated patent classification number and the labeled patent classification number. Training continues until the degree of matching between the key text generated by the text extraction model and the labeled key text is greater than a preset threshold and the accuracy of the patent classification number generated by the text classification model is greater than another preset threshold, at which point the text extraction model and the text classification model are considered mature.
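The stopping criterion of this training procedure can be sketched as two metrics and a check. Measuring the matching degree as Jaccard overlap of term sets is an assumption made for the sketch, as are the function names and the default thresholds; the source only states that each metric must exceed its own preset threshold.

```python
from typing import List, Set


def keyword_match_degree(generated: Set[str], labeled: Set[str]) -> float:
    """Jaccard overlap between generated and labeled key text terms."""
    union = generated | labeled
    return len(generated & labeled) / len(union) if union else 1.0


def classification_accuracy(predicted: List[str], labeled: List[str]) -> float:
    """Fraction of samples whose predicted classification number matches the label."""
    if len(predicted) != len(labeled):
        raise ValueError("predicted and labeled must have the same length")
    if not labeled:
        return 0.0
    return sum(p == l for p, l in zip(predicted, labeled)) / len(labeled)


def is_mature(match_degree: float, accuracy: float,
              match_threshold: float = 0.8, accuracy_threshold: float = 0.9) -> bool:
    """Training stops once both metrics exceed their own thresholds."""
    return match_degree > match_threshold and accuracy > accuracy_threshold
```

A training loop would recompute both metrics on labeled published patent texts after each parameter update and stop when `is_mature` returns `True`.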
Optionally, after the key text is obtained through the text extraction model and the patent classification number is obtained through the text classification model, the key text and the patent classification number may be, but are not limited to being, displayed.
Take as an example the case where the acquired first patent texts are text W-1, text W-2, text W-3 and text W-4. After they are acquired, text W-1, text W-2, text W-3 and text W-4 are input into the text extraction model and the text classification model, and the key text and the patent classification number output by the two models are obtained. For example, as shown in fig. 4, after the key texts and patent classification numbers of text W-1, text W-2, text W-3 and text W-4 are obtained, they are displayed. In fig. 4, "machine", "computer", etc. form the first keyword set, and "a … …" etc. is the first abstract text.
It should be noted that the patent classification numbers and the key texts in fig. 4 only serve to explain the display process, and the specific meaning of the text does not limit the present application.
Optionally, the apparatus further includes:
(1) A fourth obtaining unit configured to obtain a first adjustment instruction generated by an editing operation performed in the client before generating the first search request using the key text and the patent classification number;
(2) The first adjusting unit is used for executing at least one of the following adjusting operations according to the first adjusting instruction: and adjusting the first abstract text into a second abstract text, adjusting the first keyword set into a second keyword set, and adjusting the patent classification number into an adjusted patent classification number.
Optionally, the first adjustment instruction may be, but is not limited to, an instruction to add, delete, or modify the key text and/or the patent classification number.
For example, consider a modification operation performed on the key text and the patent classification number after they are acquired, described with reference to fig. 5 and fig. 6. As shown in fig. 5, the first keyword set, the patent classification number and the first abstract text of text W-1 are displayed. The contents of the first abstract text and of the patent classification number are then changed to the contents enclosed by the dashed-line box in fig. 6. A first search request is generated according to the changed key text and patent classification number, so that the server returns a first patent text list according to the first search request.
In this way, adjusting the key text and the patent classification number improves the accuracy of the patent search and further improves the search efficiency.
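Applying an adjustment instruction to the displayed values can be sketched with an immutable record, so that the first abstract text, the first keyword set and the original classification number remain available after editing. The type `SearchInput` and the function `apply_adjustment` with its parameters are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, replace
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class SearchInput:
    abstract_text: str
    keywords: FrozenSet[str]
    classification_number: str


def apply_adjustment(current: SearchInput,
                     new_abstract: Optional[str] = None,
                     add_keywords: FrozenSet[str] = frozenset(),
                     remove_keywords: FrozenSet[str] = frozenset(),
                     new_classification_number: Optional[str] = None) -> SearchInput:
    """Return an adjusted copy; fields left unspecified keep their current values."""
    return replace(
        current,
        abstract_text=current.abstract_text if new_abstract is None else new_abstract,
        keywords=(current.keywords | add_keywords) - remove_keywords,
        classification_number=(current.classification_number
                               if new_classification_number is None
                               else new_classification_number),
    )
```

Because at least one of the three fields may change, every argument is optional; the search request is then generated from the returned copy.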
Optionally, the apparatus further includes:
(1) A fifth obtaining unit, configured to obtain a second adjustment instruction generated by an editing operation performed in the client after obtaining a first patent text list that matches the first patent text returned by the server;
(2) The second adjusting unit is used for executing at least one of the following adjusting operations according to the second adjusting instruction: adjusting the first abstract text into a third abstract text, adjusting the first keyword set into a third keyword set, and adjusting the patent classification number into an adjusted patent classification number;
(3) A sixth obtaining unit, configured to obtain an adjustment result obtained according to the second adjustment instruction;
(4) A second sending unit, configured to send a second search request generated by using the adjustment result to the server, where the second search request is used to request to search the first patent text;
(5) A seventh obtaining unit, configured to obtain a second patent text list that is returned by the server and matches the first patent text, where the text similarity between each object patent text included in the second patent text list and the first patent text is greater than a second threshold, and the second threshold is greater than the first threshold.
Optionally, the second adjustment instruction may be, but is not limited to, adding or deleting or modifying the key text and/or the patent classification number.
For example, consider an adding operation performed on the key text after the first patent text list returned by the server is obtained, described with reference to fig. 7 and fig. 8. As shown in fig. 7, after text W-1 is searched, a first patent text list is obtained, in which patent 1 and patent 2 are displayed. As shown in fig. 8, after the adding operation is performed on the key text and the patent classification number, a second patent text list is obtained, which includes patent 3 and patent 4. In this way, the accuracy of the patent search is improved, and the efficiency of the patent search is further improved.
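The relationship between the two rounds (a second list filtered with a second threshold that is greater than the first) can be sketched as below. The similarity scores are assumed to be computed elsewhere after the adjustment; the function names and the numbers in the usage note are hypothetical.

```python
from typing import Dict, List


def patents_above(similarities: Dict[str, float], threshold: float) -> List[str]:
    """Names of patents whose similarity exceeds the threshold, most similar first."""
    hits = [name for name, sim in similarities.items() if sim > threshold]
    return sorted(hits, key=lambda name: similarities[name], reverse=True)


def second_search(similarities_after_adjustment: Dict[str, float],
                  first_threshold: float, second_threshold: float) -> List[str]:
    """Second-round list; the second threshold must exceed the first."""
    if second_threshold <= first_threshold:
        raise ValueError("second threshold must be greater than the first threshold")
    return patents_above(similarities_after_adjustment, second_threshold)
```

For instance, with a first threshold of 0.5 and a second threshold of 0.7, a patent scoring 0.65 after the adjustment would have qualified for the first list but is excluded from the second.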
Optionally, the second obtaining unit includes:
(1) The sending module is used for sending the first patent text to the server so as to perform text preprocessing on the first patent text;
(2) The first acquisition module is used for acquiring the key text and the patent classification number returned by the server.
Optionally, acquiring the key text and the patent classification number returned by the server includes: the server performs segmentation processing on the first patent text to obtain a text segment set corresponding to the first patent text; the server extracts first text features of the text segment set through the text extraction model, and performs word sense analysis and text recombination on the first patent text according to the first text features to obtain a first abstract text; the server extracts a first keyword set from the first abstract text; and the server extracts second text features of the text segment set through the text classification model, and identifies the patent classification number of the first patent text according to the second text features.
Optionally, the first patent text may be segmented by, but is not limited to, dividing it into different paragraphs at line-feed characters, or dividing it by word count into a plurality of paragraphs each containing the same number of words.
Optionally, the text extraction model and the text classification model may be, but are not limited to being, deployed in the server.
Take as an example segmenting the first patent text at line-feed characters and obtaining the first abstract text according to the occurrence frequency of words. After the client acquires the first patent text, it sends the first patent text to the server. The text extraction model in the server segments the first patent text at line-feed characters, acquires the first text features of the text in each segment, and extracts the first abstract text from the first patent text according to the first text features. After the first abstract text is extracted, the first keyword set is determined according to the occurrence frequency and occurrence position of words in the first abstract text. Meanwhile, the server also extracts second text features of the text segment set by using the text classification model, and outputs a patent classification number according to the second text features. The server then sends the first abstract text, the first keyword set and the patent classification number to the client. After receiving a first search instruction sent by the client, the server performs a search using the first abstract text, the first keyword set and the patent classification number according to the first search instruction, obtains a first patent text list, and returns the first patent text list to the client. Because the search is performed with the key text and the patent classification number once the search instruction of the client is received, the efficiency of searching patent texts is improved.
According to still another aspect of the embodiment of the present invention, there is further provided a text retrieval device, optionally, as shown in fig. 11, the text retrieval device includes:
(1) A first receiving unit 1102, configured to receive a first patent text sent by a client;
(2) An obtaining unit 1104, configured to obtain a key text extracted from a first patent text and a patent classification number matched with the first patent text, where the key text is a text extracted from the first patent text by a text extraction model, and the text extraction model is a model for extracting a key text in the patent text obtained after machine training using a published patent text; the patent classification number is obtained through text classification model identification, and the text classification model is a model which is obtained after machine training is carried out on the disclosed patent text and is used for identifying the belonged classification of the patent text;
(3) A sending unit 1106, configured to send the key text and the patent classification number to the client;
(4) A second receiving unit 1108, configured to receive a first search request generated by the client using the key text and the patent classification number, where the first search request is used to request to search for the first patent text;
(5) A returning unit 1110, configured to return, to the client, a first patent text list that matches the first patent text, where the text similarity between each object patent text included in the first patent text list and the first patent text is greater than a first threshold.
Optionally, the text retrieval device described above may be applied to, but is not limited to, retrieving similar patent texts. In the related art, when similar patent texts are searched, the key text is usually extracted manually and the search formula corresponding to the patent text is written manually, so that patent search efficiency is low. In this scheme, after the first patent text is obtained, the key text in the first patent text is extracted by using the machine-trained text extraction model, and the patent classification number of the first patent text is identified by using the machine-trained text classification model, so that a first search request can be generated according to the key text and the patent classification number and the first patent text can be searched. This simplifies the steps of searching the first patent text and improves the search efficiency for the first patent text.
Optionally, the acquiring unit includes:
(1) The processing module is used for carrying out segmentation processing on the first patent text to obtain a text segment set corresponding to the first patent text;
(2) The first extraction module is used for extracting first text features of the text segment set through the text extraction model, and performing word sense analysis and text recombination on the first patent text according to the first text features so as to obtain a first abstract text;
(3) The second extraction module is used for extracting a first keyword set from the first abstract text;
(4) And the third extraction module is used for extracting second text features of the text segment set through the text classification model and identifying the patent classification number of the first patent text according to the second text features.
Optionally, the first patent text may be segmented by, but is not limited to, dividing it into different paragraphs at line-feed characters, or dividing it by word count into a plurality of paragraphs each containing the same number of words.
Take as an example segmenting the first patent text at line-feed characters and obtaining the first abstract text according to the occurrence frequency of words. After the client acquires the first patent text, it sends the first patent text to the server. The text extraction model in the server segments the first patent text at line-feed characters, acquires the first text features of the text in each segment, and extracts the first abstract text from the first patent text according to the first text features. After the first abstract text is extracted, the first keyword set is determined according to the occurrence frequency and occurrence position of words in the first abstract text. Meanwhile, the server also extracts second text features of the text segment set by using the text classification model, and outputs a patent classification number according to the second text features. The server then sends the first abstract text, the first keyword set and the patent classification number to the client. After receiving a first search instruction sent by the client, the server performs a search using the first abstract text, the first keyword set and the patent classification number according to the first search instruction, obtains a first patent text list, and returns the first patent text list to the client. Because the search is performed with the key text and the patent classification number once the search instruction of the client is received, the efficiency of searching patent texts is improved.
Optionally, the apparatus further includes:
(1) And the retrieval unit is used for responding to the first retrieval request and retrieving the first patent text list matched with the first patent text from the database through a text retrieval model before returning the first patent text list matched with the first patent text to the client, wherein the text retrieval model is a model for performing text retrieval according to the text similarity, which is obtained after machine training is performed by using the disclosed patent text.
Optionally, the text retrieval model may be, but is not limited to being, obtained through training as follows. A patent sample is obtained, where the patent sample includes a patent to be searched and a target patent. The patent sample is input into the text retrieval model for training, and the parameters of the text retrieval model are adjusted until a mature text retrieval model is obtained. The first patent text is then searched by using the mature text retrieval model, and the similarity between each patent in the resulting first patent text list and the first patent text is greater than a preset threshold.
According to a further aspect of embodiments of the present invention there is also provided an electronic device for implementing the above text retrieval method, as shown in fig. 12, the electronic device comprising a memory having stored therein a computer program and a processor arranged to perform the steps of any of the method embodiments described above by means of the computer program.
Alternatively, in this embodiment, the electronic apparatus may be located in at least one network device of a plurality of network devices of the computer network.
Alternatively, in the present embodiment, the above-described processor may be configured to execute the following steps by a computer program:
S1, acquiring a first patent text uploaded by a client;
S2, acquiring a key text extracted from the first patent text and a patent classification number matched with the first patent text, wherein the key text is extracted from the first patent text through a text extraction model, and the text extraction model is a model for extracting key text from patent texts, obtained by machine training using published patent texts; the patent classification number is identified through a text classification model, and the text classification model is a model for identifying the classification to which a patent text belongs, obtained by machine training using published patent texts;
S3, sending a first search request generated by using the key text and the patent classification number to a server, wherein the first search request is used for requesting a search for the first patent text;
S4, acquiring a first patent text list which is returned by the server and matched with the first patent text, wherein the text similarity between each object patent text contained in the first patent text list and the first patent text is greater than a first threshold.
Optionally, it will be understood by those skilled in the art that the structure shown in fig. 12 is only schematic. The electronic device may also be a terminal device such as a smart phone (e.g. an Android phone, an iOS phone, etc.), a tablet computer, a palm computer, a mobile internet device (MID), or a PAD. Fig. 12 does not limit the structure of the electronic device. For example, the electronic device may also include more or fewer components (e.g., network interfaces, display devices, etc.) than shown in fig. 12, or have a different configuration than shown in fig. 12.
The memory 1202 may be used to store software programs and modules, such as program instructions/modules corresponding to the text retrieval method and apparatus in the embodiments of the present invention, and the processor 1204 executes the software programs and modules stored in the memory 1202 to perform various functional applications and data processing, i.e., implement the text retrieval method described above. Memory 1202 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 1202 may further include memory located remotely from the processor 1204, which may be connected to the terminal via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof. The memory 1202 may be used for storing information such as, but not limited to, a first patent text, a key text, a patent classification number, etc. As an example, as shown in fig. 12, the memory 1202 may include, but is not limited to, a first obtaining unit 1002, a second obtaining unit 1004, a first transmitting unit 1006, and a third obtaining unit 1008 in the text retrieving device. In addition, other module units in the text retrieval device may be included, but are not limited to, and are not described in detail in this example.
Optionally, the transmission device 1206 is configured to receive or transmit data via a network. Specific examples of the network described above may include wired networks and wireless networks. In one example, the transmission means 1206 comprises a network adapter (Network Interface Controller, NIC) that can be connected to other network devices and routers via a network cable to communicate with the internet or a local area network. In one example, the transmission device 1206 is a Radio Frequency (RF) module for communicating wirelessly with the internet.
In addition, the electronic device further includes: a connection bus 1208 for connecting the respective module components in the above-described electronic apparatus.
According to a further aspect of embodiments of the present invention there is also provided an electronic device, as shown in fig. 13, comprising a memory in which a computer program is stored and a processor arranged to perform the steps of any of the method embodiments described above by means of the computer program.
Alternatively, in this embodiment, the electronic apparatus may be located in at least one network device of a plurality of network devices of the computer network.
Alternatively, in the present embodiment, the above-described processor may be configured to execute the following steps by a computer program:
S1, receiving a first patent text sent by a client;
S2, acquiring a key text extracted from the first patent text and a patent classification number matched with the first patent text, wherein the key text is extracted from the first patent text through a text extraction model, and the text extraction model is a model for extracting key text from patent texts, obtained by machine training using published patent texts; the patent classification number is identified through a text classification model, and the text classification model is a model for identifying the classification to which a patent text belongs, obtained by machine training using published patent texts;
S3, sending the key text and the patent classification number to the client;
S4, receiving a first search request generated by the client by using the key text and the patent classification number, wherein the first search request is used for requesting a search for the first patent text;
S5, returning a first patent text list matched with the first patent text to the client, wherein the text similarity between each object patent text contained in the first patent text list and the first patent text is greater than a first threshold.
Optionally, it will be understood by those skilled in the art that the structure shown in fig. 13 is only schematic. The electronic device may also be a terminal device such as a smart phone (e.g. an Android phone, an iOS phone, etc.), a tablet computer, a palm computer, a mobile internet device (MID), or a PAD. Fig. 13 does not limit the structure of the electronic device. For example, the electronic device may also include more or fewer components (e.g., network interfaces, display devices, etc.) than shown in fig. 13, or have a different configuration than shown in fig. 13.
The memory 1302 may be used to store software programs and modules, such as program instructions/modules corresponding to the text retrieval method and apparatus in the embodiments of the present invention, and the processor 1304 executes the software programs and modules stored in the memory 1302, thereby performing various functional applications and data processing, that is, implementing the text retrieval method described above. Memory 1302 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, memory 1302 may further include memory located remotely from processor 1304, which may be connected to the terminal via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof. The memory 1302 may be used to store, but is not limited to, information such as a first patent text, a key text, a patent classification number, etc. As an example, as shown in fig. 13, the memory 1302 may include, but is not limited to, a first receiving unit 1102, an obtaining unit 1104, a transmitting unit 1106, a second receiving unit 1108, and a returning unit 1110 in the text retrieving device. In addition, other module units in the text retrieval device may be included, but are not limited to, and are not described in detail in this example.
Optionally, the transmission device 1306 is configured to receive or transmit data via a network. Specific examples of the network may include wired networks and wireless networks. In one example, the transmission device 1306 comprises a network adapter (Network Interface Controller, NIC), which can be connected to other network devices and routers via a network cable so as to communicate with the internet or a local area network. In another example, the transmission device 1306 is a radio frequency (RF) module used to communicate with the internet wirelessly.
In addition, the electronic device further includes: a connection bus 1308 for connecting the various modular components of the electronic device described above.
According to a further aspect of embodiments of the present invention there is also provided a storage medium having stored therein a computer program, wherein the computer program is arranged to perform the steps of any of the method embodiments described above when run.
Optionally, in this embodiment, the storage medium may be configured to store a computer program for performing the following steps:
S1, acquiring a first patent text uploaded by a client;
S2, acquiring a key text extracted from the first patent text and a patent classification number matching the first patent text, wherein the key text is extracted from the first patent text by a text extraction model, the text extraction model being a model, obtained by machine training on published patent texts, for extracting key text from a patent text; the patent classification number is identified by a text classification model, the text classification model being a model, obtained by machine training on published patent texts, for identifying the classification to which a patent text belongs;
S3, sending a first search request generated by using the key text and the patent classification number to a server, wherein the first search request is used to request retrieval for the first patent text;
S4, acquiring a first patent text list that is returned by the server and matches the first patent text, wherein the text similarity between each object patent text contained in the first patent text list and the first patent text is greater than a first threshold.
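As an illustration only, the client-side steps S1 to S4 can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the implementation of the embodiments: the `KeyText` container, the request field names, and the `analyze` and `search` callables (which stand in for the model-backed server calls) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class KeyText:
    """Key text extracted from the first patent text (hypothetical container)."""
    abstract: str
    keywords: List[str] = field(default_factory=list)


def build_search_request(key_text: KeyText, classification_number: str) -> Dict:
    """S3: generate the first search request from the key text and the
    patent classification number. The field names are assumptions."""
    return {
        "abstract": key_text.abstract,
        "keywords": list(key_text.keywords),
        "classification_number": classification_number,
    }


def retrieve(
    patent_text: str,
    analyze: Callable[[str], Tuple[KeyText, str]],
    search: Callable[[Dict], List[Dict]],
) -> List[Dict]:
    """Run S1 to S4 for one uploaded patent text.

    `analyze` returns (key text, classification number) and `search` returns
    the first patent text list; both are stand-ins for server calls.
    """
    key_text, classification_number = analyze(patent_text)            # S2
    request = build_search_request(key_text, classification_number)   # S3
    return search(request)                                            # S4
```

In this sketch the client does no similarity filtering of its own, because the list returned in S4 is already limited by the server to object patent texts whose similarity exceeds the first threshold.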
Optionally, in this embodiment, the storage medium may be configured to store a computer program for performing the following steps:
S1, receiving a first patent text sent by a client;
S2, acquiring a key text extracted from the first patent text and a patent classification number matching the first patent text, wherein the key text is extracted from the first patent text by a text extraction model, the text extraction model being a model, obtained by machine training on published patent texts, for extracting key text from a patent text; the patent classification number is identified by a text classification model, the text classification model being a model, obtained by machine training on published patent texts, for identifying the classification to which a patent text belongs;
S3, sending the key text and the patent classification number to the client;
S4, receiving a first search request generated by the client by using the key text and the patent classification number, wherein the first search request is used to request retrieval for the first patent text;
S5, returning a first patent text list matching the first patent text to the client, wherein the text similarity between each object patent text contained in the first patent text list and the first patent text is greater than a first threshold.
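The server-side processing behind S2 and S5 can be illustrated in the same spirit. The sketch below assumes, hypothetically, that segmentation splits on line feeds or on a fixed word count, that keywords are ranked by occurrence frequency plus a bonus for early position, and that a plain Jaccard word overlap stands in for the trained text retrieval model. The weighting, the stop-word list, and all function names are illustrative assumptions rather than details taken from the embodiments.

```python
import re
from collections import Counter
from typing import Dict, List

# Tiny illustrative stop-word list (an assumption, not from the source).
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for"}


def split_segments(text: str, max_words: int = 0) -> List[str]:
    """Segment by line feed, or into chunks of `max_words` words if max_words > 0."""
    if max_words > 0:
        words = text.split()
        return [" ".join(words[i:i + max_words])
                for i in range(0, len(words), max_words)]
    return [line.strip() for line in text.split("\n") if line.strip()]


def extract_keywords(abstract: str, top_k: int = 3) -> List[str]:
    """Rank words by frequency plus an early-position bonus (assumed weighting)."""
    words = [w for w in re.findall(r"[a-z]+", abstract.lower())
             if w not in STOPWORDS]
    if not words:
        return []
    counts = Counter(words)
    first_pos: Dict[str, int] = {}
    for index, word in enumerate(words):
        first_pos.setdefault(word, index)  # remember the first occurrence only
    total = len(words)

    def score(word: str) -> float:
        # Frequency dominates; the position term lies in (0, 1].
        return counts[word] + (1.0 - first_pos[word] / total)

    return sorted(counts, key=lambda w: (-score(w), w))[:top_k]


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity, used here in place of a trained model."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def filter_by_threshold(query: str, candidates: Dict[str, str],
                        threshold: float) -> List[str]:
    """S5: keep only candidates whose similarity to the query exceeds the threshold."""
    return [pid for pid, text in candidates.items()
            if jaccard(query, text) > threshold]
```

Raising `threshold` narrows the returned list, which is the effect a stricter threshold would have on a refined search.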
Optionally, in this embodiment, those skilled in the art will understand that all or part of the steps of the methods in the above embodiments may be carried out by a program that instructs a terminal device to execute them, and the program may be stored in a computer-readable storage medium. The storage medium may include a flash disk, read-only memory (ROM), random-access memory (RAM), a magnetic disk, an optical disk, and the like.
The serial numbers of the foregoing embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.
If the integrated units in the above embodiments are implemented in the form of software functional units and sold or used as independent products, they may be stored in the above computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing one or more computer devices (which may be personal computers, servers, network devices, etc.) to perform all or part of the steps of the methods described in the embodiments of the present invention.
In the foregoing embodiments of the present invention, each embodiment is described with its own emphasis. For parts not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed client may be implemented in other manners. The apparatus embodiments described above are merely exemplary. For example, the division of the units is merely a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the coupling, direct coupling, or communication connection shown or discussed between components may be made through some interfaces, units, or modules, and may be electrical or take other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The foregoing is merely a preferred embodiment of the present invention. It should be noted that those skilled in the art may make modifications and adaptations without departing from the principles of the present invention, and such modifications and adaptations shall also fall within the scope of protection of the present invention.

Claims (12)

1. A text retrieval method, comprising:
acquiring a first patent text uploaded by a client;
acquiring a key text extracted from the first patent text and a patent classification number matched with the first patent text, wherein the key text is a text extracted from the first patent text through a text extraction model, and the text extraction model is a model for extracting the key text in the patent text, which is obtained after machine training is carried out by using the published patent text; the patent classification number is obtained through text classification model identification, and the text classification model is a model which is obtained after machine training is carried out on the disclosed patent text and is used for identifying the belonged classification of the patent text;
sending a first retrieval request generated by utilizing the key text and the patent classification number to a server, wherein the first retrieval request is used for requesting to retrieve the first patent text;
acquiring a first patent text list which is returned by the server and matched with the first patent text, wherein the text similarity between an object patent text contained in the first patent text list and the first patent text is larger than a first threshold;
before the first search request generated by using the key text and the patent classification number is sent to the server, the method further comprises: displaying the obtained key text and the patent classification number in the client, wherein the key text comprises a first abstract text and a first keyword set, the first abstract text is used for representing the first patent text, and the object keywords contained in the first keyword set are keywords extracted from the first abstract text; generating the first search request by utilizing the key text and the patent classification number;
before the generating the first search request by using the key text and the patent classification number, the method further comprises: acquiring a first adjustment instruction generated by an editing operation executed in the client; executing at least one of the following adjustment operations according to the first adjustment instruction: adjusting the first abstract text into a second abstract text, adjusting the first keyword set into a second keyword set, and adjusting the patent classification number into the adjusted patent classification number;
The step of obtaining the key text extracted from the first patent text and the patent classification number matched with the first patent text comprises the following steps: sending the first patent text to the server to perform text preprocessing on the first patent text; acquiring the key text returned by the server and the patent classification number;
the obtaining the key text and the patent classification number returned by the server comprises: the server performs segmentation processing on the first patent text according to a line feed character or according to the number of words to obtain a text segment set corresponding to the first patent text; the server extracts first text features of the text segment set through the text extraction model, and performs word sense analysis and text recombination on the first patent text according to the first text features to obtain the first abstract text; the server extracts the first keyword set from the first abstract text; the server extracts second text features of the text segment set through the text classification model, and identifies the patent classification number of the first patent text according to the second text features;
the server extracting the first keyword set from the first abstract text comprises: determining the first keyword set based on the occurrence frequency and the occurrence position of the words in the first abstract text.
2. The method of claim 1, further comprising, after the obtaining the first list of patent texts returned by the server that matches the first patent text:
acquiring a second adjustment instruction generated by an editing operation executed in the client;
executing at least one of the following adjustment operations according to the second adjustment instruction: adjusting the first abstract text into a third abstract text, adjusting the first keyword set into a third keyword set, and adjusting the patent classification number into the adjusted patent classification number;
acquiring an adjustment result obtained according to the second adjustment instruction;
sending a second search request generated by using the adjustment result to the server, wherein the second search request is used for requesting to search the first patent text;
and acquiring a second patent text list which is returned by the server and matched with the first patent text, wherein the text similarity between the object patent text contained in the second patent text list and the first patent text is larger than a second threshold value, and the second threshold value is larger than the first threshold value.
3. The method according to any one of claims 1 to 2, wherein the obtaining the first patent text uploaded by the client comprises at least one of:
acquiring a selection instruction, wherein the selection instruction is used for indicating to select the first patent text stored under a target path; responding to the selection instruction, and uploading the first patent text;
acquiring a dragging instruction, wherein the dragging instruction is used for indicating to drag the first patent text to a target area of an interface displayed by the client; and responding to the dragging instruction, and uploading the first patent text.
4. A text retrieval method, comprising:
receiving a first patent text sent by a client;
acquiring a key text extracted from the first patent text and a patent classification number matched with the first patent text, wherein the key text is a text extracted from the first patent text through a text extraction model, and the text extraction model is a model for extracting the key text in the patent text, which is obtained after machine training is carried out by using the published patent text; the patent classification number is obtained through text classification model identification, and the text classification model is a model which is obtained after machine training is carried out on the disclosed patent text and is used for identifying the belonged classification of the patent text;
sending the key text and the patent classification number to the client;
receiving a first retrieval request generated by the client by utilizing the key text and the patent classification number, wherein the first retrieval request is used for requesting to retrieve the first patent text;
returning a first patent text list matched with the first patent text to the client, wherein the text similarity between the object patent text contained in the first patent text list and the first patent text is larger than a first threshold;
wherein the first search request is generated by: displaying the obtained key text and the patent classification number in the client, wherein the key text comprises a first abstract text and a first keyword set, the first abstract text is used for representing the first patent text, and the object keywords contained in the first keyword set are keywords extracted from the first abstract text; generating the first search request by utilizing the key text and the patent classification number;
the patent classification number is determined by: acquiring a first adjustment instruction generated by an editing operation executed in the client; executing at least one of the following adjustment operations according to the first adjustment instruction: adjusting the first abstract text into a second abstract text, adjusting the first keyword set into a second keyword set, and adjusting the patent classification number into the adjusted patent classification number;
the acquiring of the key text extracted from the first patent text and the patent classification number matched with the first patent text comprises: segmenting the first patent text according to line feed characters or according to the number of words to obtain a text segment set corresponding to the first patent text; extracting first text features of the text segment set through the text extraction model, and performing word sense analysis and text recombination on the first patent text according to the first text features to obtain the first abstract text; extracting the first keyword set from the first abstract text; extracting second text features of the text segment set through the text classification model, and identifying the patent classification number of the first patent text according to the second text features;
extracting the first keyword set from the first abstract text comprises: the first keyword set is determined based on the occurrence frequency and the occurrence position of the words in the first abstract text.
5. The method of claim 4, further comprising, prior to said returning to said client a first list of patent text that matches said first patent text:
in response to the first retrieval request, retrieving the first patent text list matched with the first patent text from a database through a text retrieval model, wherein the text retrieval model is a model, obtained by machine training on published patent texts, for performing text retrieval according to text similarity.
6. A text retrieval apparatus, comprising:
the first acquisition unit is used for acquiring a first patent text uploaded by the client;
a second obtaining unit, configured to obtain a key text extracted from the first patent text and a patent classification number matched with the first patent text, where the key text is a text extracted from the first patent text by using a text extraction model, and the text extraction model is a model obtained by performing machine training using a published patent text and used for extracting a key text in the patent text; the patent classification number is obtained through text classification model identification, and the text classification model is a model which is obtained after machine training is carried out on the disclosed patent text and is used for identifying the belonged classification of the patent text;
the first sending unit is used for sending a first search request generated by utilizing the key text and the patent classification number to a server, wherein the first search request is used for requesting to search the first patent text;
a third obtaining unit, configured to obtain a first patent text list that is returned by the server and matches with the first patent text, where a text similarity between an object patent text included in the first patent text list and the first patent text is greater than a first threshold;
the apparatus further comprises:
a display unit, configured to display, in the client, the obtained key text and the patent classification number before the first search request generated by using the key text and the patent classification number is sent to the server, where the key text includes a first abstract text and a first keyword set, where the first abstract text is used to represent the first patent text, and an object keyword included in the first keyword set is a keyword extracted from the first abstract text; the generation unit is used for generating the first search request by utilizing the key text and the patent classification number;
The apparatus further comprises: a fourth obtaining unit configured to obtain a first adjustment instruction generated by an editing operation performed in the client before the first search request is generated using the key text and the patent classification number; a first adjusting unit, configured to perform at least one of the following adjusting operations according to the first adjusting instruction: adjusting the first abstract text into a second abstract text, adjusting the first keyword set into a second keyword set, and adjusting the patent classification number into the adjusted patent classification number;
the second acquisition unit includes: the sending module is used for sending the first patent text to the server so as to perform text preprocessing on the first patent text; the first acquisition module is used for acquiring the key text and the patent classification number returned by the server; the obtaining the key text and the patent classification number returned by the server comprises: the server performs segmentation processing on the first patent text according to a line feed character or according to the number of words to obtain a text segment set corresponding to the first patent text; the server extracts first text features of the text segment set through the text extraction model, and performs word sense analysis and text recombination on the first patent text according to the first text features to obtain the first abstract text; the server extracts the first keyword set from the first abstract text; the server extracts second text features of the text segment set through the text classification model, and identifies the patent classification number of the first patent text according to the second text features;
the server extracts the first keyword set from the first abstract text by the following method: the first keyword set is determined based on the occurrence frequency and the occurrence position of the words in the first abstract text.
7. The apparatus of claim 6, wherein the apparatus further comprises:
a fifth obtaining unit, configured to obtain a second adjustment instruction generated by an editing operation performed in the client after the obtaining of the first patent text list that matches the first patent text and is returned by the server;
a second adjusting unit, configured to perform at least one of the following adjusting operations according to the second adjusting instruction: adjusting the first abstract text into a third abstract text, adjusting the first keyword set into a third keyword set, and adjusting the patent classification number into the adjusted patent classification number;
a sixth obtaining unit, configured to obtain an adjustment result obtained according to the second adjustment instruction;
a second sending unit, configured to send a second search request generated by using the adjustment result to the server, where the second search request is used to request to search the first patent text;
A seventh obtaining unit, configured to obtain a second patent text list that is returned by the server and matches the first patent text, where a text similarity between an object patent text included in the second patent text list and the first patent text is greater than a second threshold, and the second threshold is greater than the first threshold.
8. The apparatus according to any one of claims 6 to 7, wherein the first acquisition unit comprises at least one of:
the second acquisition module is used for acquiring a selection instruction, wherein the selection instruction is used for indicating to select the first patent text stored under the target path; responding to the selection instruction, and uploading the first patent text;
a third obtaining module, configured to obtain a drag instruction, where the drag instruction is used to instruct to drag the first patent text to a target area of an interface displayed by the client; and responding to the dragging instruction, and uploading the first patent text.
9. A text retrieval apparatus, comprising:
the first receiving unit is used for receiving a first patent text sent by the client;
an obtaining unit, configured to obtain a key text extracted from the first patent text and a patent classification number matched with the first patent text, where the key text is a text extracted from the first patent text by using a text extraction model, and the text extraction model is a model for extracting a key text in the patent text obtained after machine training is performed by using a published patent text; the patent classification number is obtained through text classification model identification, and the text classification model is a model which is obtained after machine training is carried out on the disclosed patent text and is used for identifying the belonged classification of the patent text;
the sending unit is used for sending the key text and the patent classification number to the client;
the second receiving unit is used for receiving a first search request generated by the client side by utilizing the key text and the patent classification number, wherein the first search request is used for requesting to search the first patent text;
a return unit, configured to return, to the client, a first patent text list that matches the first patent text, where a text similarity between an object patent text included in the first patent text list and the first patent text is greater than a first threshold;
wherein the first search request is generated by: displaying the obtained key text and the patent classification number in the client, wherein the key text comprises a first abstract text and a first keyword set, the first abstract text is used for representing the first patent text, and the object keywords contained in the first keyword set are keywords extracted from the first abstract text; generating the first search request by utilizing the key text and the patent classification number;
The patent classification number is determined by: acquiring a first adjustment instruction generated by an editing operation executed in the client; executing at least one of the following adjustment operations according to the first adjustment instruction: adjusting the first abstract text into a second abstract text, adjusting the first keyword set into a second keyword set, and adjusting the patent classification number into the adjusted patent classification number;
the acquisition unit includes: the processing module is used for carrying out segmentation processing on the first patent text according to the line feed character or the word number to obtain a text segment set corresponding to the first patent text; the first extraction module is used for extracting first text features of the text segment set through the text extraction model, and performing word sense analysis and text recombination on the first patent text according to the first text features so as to obtain a first abstract text; the second extraction module is used for extracting a first keyword set from the first abstract text; the third extraction module is used for extracting second text features of the text segment set through the text classification model and identifying the patent classification number of the first patent text according to the second text features;
the second extraction module is used for extracting the first keyword set from the first abstract text by the following steps: the first keyword set is determined based on the occurrence frequency and the occurrence position of the words in the first abstract text.
10. The apparatus of claim 9, wherein the apparatus further comprises:
and the searching unit is used for responding to the first searching request before returning the first patent text list matched with the first patent text to the client, and searching the first patent text list matched with the first patent text from a database through a text searching model, wherein the text searching model is a model for searching texts according to text similarity, which is obtained after machine training is carried out on the disclosed patent text.
11. A storage medium comprising a stored program, wherein the program when run performs the method of any one of the preceding claims 1 to 3 or 4 to 5.
12. An electronic device comprising a memory and a processor, characterized in that the memory has stored therein a computer program, the processor being arranged to execute the method according to any of the claims 1-3 or 4-5 by means of the computer program.
CN201811069929.4A 2018-09-13 2018-09-13 Text retrieval method and device, storage medium and electronic device Active CN110895556B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811069929.4A CN110895556B (en) 2018-09-13 2018-09-13 Text retrieval method and device, storage medium and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811069929.4A CN110895556B (en) 2018-09-13 2018-09-13 Text retrieval method and device, storage medium and electronic device

Publications (2)

Publication Number Publication Date
CN110895556A CN110895556A (en) 2020-03-20
CN110895556B true CN110895556B (en) 2023-07-28

Family

ID=69785761

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811069929.4A Active CN110895556B (en) 2018-09-13 2018-09-13 Text retrieval method and device, storage medium and electronic device

Country Status (1)

Country Link
CN (1) CN110895556B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2008071198A (en) * 2006-09-14 2008-03-27 Ricoh Co Ltd Document retrieval device, document retrieval method, document retrieval program and storage medium
CN107247780A (en) * 2017-06-12 2017-10-13 北京理工大学 A kind of patent document method for measuring similarity of knowledge based body

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8200695B2 (en) * 2006-04-13 2012-06-12 Lg Electronics Inc. Database for uploading, storing, and retrieving similar documents
CN101276340A (en) * 2007-03-29 2008-10-01 上海汉光知识产权数据科技有限公司 Patent data retrieval system
CN106156111B (en) * 2015-04-03 2021-10-19 北京中知智慧科技有限公司 Patent document retrieval method, device and system
US20170075877A1 (en) * 2015-09-16 2017-03-16 Marie-Therese LEPELTIER Methods and systems of handling patent claims
CN106372226B (en) * 2016-09-07 2020-08-25 知识产权出版社有限责任公司 Information retrieval device and method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
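Patent information retrieval in a network environment (网络环境下的专利信息检索); Zhang Xiaoyun (张晓云); Library Work and Study (图书馆工作与研究), (01); pp. 43-46 *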

Also Published As

Publication number Publication date
CN110895556A (en) 2020-03-20

Legal Events

Date Code Title Description
PB01 Publication
TA01 Transfer of patent application right
Effective date of registration: 20210107
Address after: 17c, 14 / F, unit 3, building 3, No.48, Zhichun Road, Haidian District, Beijing 100098
Applicant after: Beijing Blue lantern fish Intelligent Technology Co.,Ltd.
Address before: 1411 Junyue Pavilion, 9 Yannan Road, Fuqiang community, Huaqiangbei street, Futian District, Shenzhen, Guangdong 518031
Applicant before: Shenzhen Blue Lantern Fish Intelligent Technology Co.,Ltd.
SE01 Entry into force of request for substantive examination
GR01 Patent grant