CN110895556A - Text retrieval method and device, storage medium and electronic device - Google Patents

Text retrieval method and device, storage medium and electronic device

Info

Publication number
CN110895556A
Authority
CN
China
Prior art keywords
text
key
classification number
model
retrieval
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811069929.4A
Other languages
Chinese (zh)
Other versions
CN110895556B (en)
Inventor
詹焯扬
张晓泉
程昊
蔡健
袁子斌
李文文
邬龙
江涛
乔宝琛
杨妤卿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Blue Lantern Fish Intelligent Technology Co ltd
Original Assignee
Shenzhen Blue Lantern Fish Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Blue Lantern Fish Intelligent Technology Co ltd
Priority to CN201811069929.4A
Publication of CN110895556A
Application granted
Publication of CN110895556B
Legal status: Active
Anticipated expiration

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a text retrieval method and device, a storage medium and an electronic device. The method comprises the following steps: acquiring a first patent text uploaded by a client; acquiring a key text extracted from the first patent text and a patent classification number matched with the first patent text; sending a first retrieval request generated by using the key text and the patent classification number to a server, wherein the first retrieval request is used for requesting retrieval of the first patent text; and acquiring a first patent text list which is returned by the server and matches the first patent text, wherein the text similarity between the object patent text contained in the first patent text list and the first patent text is greater than a first threshold value. The invention solves the technical problem of low retrieval efficiency in the related art.

Description

Text retrieval method and device, storage medium and electronic device
Technical Field
The invention relates to the field of computers, in particular to a text retrieval method and device, a storage medium and an electronic device.
Background
In order to predict the prospects of grant for a patent document that is about to be filed, many applicants run a novelty search for that document against published patent application documents.
However, in the process of performing the above search by using the patent text search platform, the user is often required to perform preprocessing on the patent text to be searched, such as manually extracting keywords in the patent text in advance, writing a boolean search formula corresponding to the patent text in advance, and the like, and then use the processed content to realize the search. That is, for the retrieval of patent text, the operation complexity of the method provided by the related art is high, resulting in a problem of low retrieval efficiency.
In view of the above problems, no effective solution has been proposed.
Disclosure of Invention
The embodiment of the invention provides a text retrieval method and device, a storage medium and an electronic device, which are used for at least solving the technical problem of low retrieval efficiency in the related technology.
According to an aspect of an embodiment of the present invention, there is provided a text retrieval method, including: acquiring a first patent text uploaded by a client; acquiring a key text extracted from the first patent text and a patent classification number matched with the first patent text, wherein the key text is extracted from the first patent text through a text extraction model, and the text extraction model is a model for extracting the key text in the patent text obtained after machine training is carried out on a published patent text; the patent classification number is obtained by identifying a text classification model, and the text classification model is a model which is obtained by using a published patent text to perform machine training and is used for identifying the belonging classification of the patent text; sending a first retrieval request generated by the key text and the patent classification number to a server, wherein the first retrieval request is used for requesting to retrieve the first patent text; and acquiring a first patent text list which is returned by the server and is matched with the first patent text, wherein the text similarity between the object patent text contained in the first patent text list and the first patent text is greater than a first threshold value.
According to another aspect of the embodiments of the present invention, there is also provided a text retrieval method, including: receiving a first patent text sent by a client; acquiring a key text extracted from the first patent text and a patent classification number matched with the first patent text, wherein the key text is extracted from the first patent text through a text extraction model, and the text extraction model is a model for extracting the key text in the patent text obtained after machine training is carried out on a published patent text; the patent classification number is obtained by identifying a text classification model, and the text classification model is a model which is obtained by using a published patent text to perform machine training and is used for identifying the belonging classification of the patent text; sending the key text and the patent classification number to the client; receiving a first retrieval request generated by the client by using the key text and the patent classification number, wherein the first retrieval request is used for requesting to retrieve the first patent text; and returning a first patent text list matched with the first patent text to the client, wherein the text similarity between the object patent text contained in the first patent text list and the first patent text is greater than a first threshold value.
According to another aspect of the embodiments of the present invention, there is also provided a text retrieval apparatus, including: the first acquiring unit is used for acquiring a first patent text uploaded by a client; a second obtaining unit, configured to obtain a key text extracted from the first patent text and a patent classification number matched with the first patent text, where the key text is extracted from the first patent text by using a text extraction model, and the text extraction model is a model obtained by performing machine training using a published patent text and used for extracting the key text in the patent text; the patent classification number is obtained by identifying a text classification model, and the text classification model is a model which is obtained by using a published patent text to perform machine training and is used for identifying the belonging classification of the patent text; a first sending unit, configured to send a first search request generated by using the key text and the patent classification number to a server, where the first search request is used to request to search the first patent text; a third obtaining unit, configured to obtain a first patent text list that is returned by the server and matches the first patent text, where a text similarity between an object patent text included in the first patent text list and the first patent text is greater than a first threshold.
According to another aspect of the embodiments of the present invention, there is also provided a text retrieval apparatus, including: the first receiving unit is used for receiving a first patent text sent by a client; an obtaining unit, configured to obtain a key text extracted from the first patent text and a patent classification number matched with the first patent text, where the key text is extracted from the first patent text by using a text extraction model, and the text extraction model is a model obtained by performing machine training using a published patent text and used for extracting the key text in the patent text; the patent classification number is obtained by identifying a text classification model, and the text classification model is a model which is obtained by using a published patent text to perform machine training and is used for identifying the belonging classification of the patent text; a sending unit, configured to send the key text and the patent classification number to the client; a second receiving unit, configured to receive a first search request generated by the client using the key text and the patent classification number, where the first search request is used to request to search the first patent text; and a returning unit, configured to return a first patent text list matched with the first patent text to the client, where a text similarity between an object patent text included in the first patent text list and the first patent text is greater than a first threshold.
According to still another aspect of the embodiments of the present invention, there is also provided a storage medium having a computer program stored therein, wherein the computer program is configured to execute the above text retrieval method when running.
According to another aspect of the embodiments of the present invention, there is also provided an electronic apparatus, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the text retrieval method through the computer program.
In the embodiment of the invention, after the first patent text uploaded by the client is obtained, the text extraction model is used for extracting the key text from the first patent text, and the text classification model is used for identifying the patent classification number of the first patent text. A first retrieval request is then generated according to the key text and the patent classification number and sent to the server, so that the server retrieves the first patent text list according to the first retrieval request to obtain a retrieval result. In this process, the key text and the patent classification number are accurately obtained through the models, so that the first patent text can be accurately and efficiently retrieved, and the technical problem of low retrieval efficiency in the related art is solved.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a flow diagram illustrating an alternative text retrieval method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of an alternative text retrieval method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of an alternative text retrieval method according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of yet another alternative text retrieval method according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of yet another alternative text retrieval method according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of yet another alternative text retrieval method according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of yet another alternative text retrieval method according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of yet another alternative text retrieval method according to an embodiment of the present invention;
FIG. 9 is a flow diagram illustrating an alternative text retrieval method according to an embodiment of the present invention;
FIG. 10 is a schematic structural diagram of an alternative text retrieval apparatus according to an embodiment of the present invention;
FIG. 11 is a schematic structural diagram of an alternative text retrieval device according to an embodiment of the present invention;
FIG. 12 is a schematic structural diagram of an alternative electronic device according to an embodiment of the invention;
FIG. 13 is a schematic structural diagram of another alternative electronic device according to an embodiment of the invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
According to an aspect of the embodiments of the present invention, there is provided a text retrieval method. As an optional implementation manner, as shown in FIG. 1, the text retrieval method includes:
S102, acquiring a first patent text uploaded by a client;
S104, acquiring a key text extracted from the first patent text and a patent classification number matched with the first patent text;
S106, sending a first retrieval request generated by using the key text and the patent classification number to a server, wherein the first retrieval request is used for requesting retrieval of the first patent text;
S108, acquiring a first patent text list which is returned by the server and matches the first patent text, wherein the text similarity between the object patent text contained in the first patent text list and the first patent text is greater than a first threshold value.
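Steps S102 to S108 can be pictured with a minimal Python sketch. It is an illustration only, not the patented implementation: `RetrievalRequest`, `retrieve_similar` and the three callables are hypothetical names, the toy lambdas stand in for the trained models and the server, and filtering by the first threshold is shown on the caller's side purely for brevity (the description has the server return a list that already satisfies the threshold).

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class RetrievalRequest:
    """Hypothetical container for the first retrieval request."""
    key_text: str
    classification_number: str


def retrieve_similar(
    first_patent_text: str,
    extract_key_text: Callable[[str], str],
    classify: Callable[[str], str],
    search: Callable[[RetrievalRequest], List[Tuple[str, float]]],
    first_threshold: float,
) -> List[str]:
    # S104: obtain the key text and the patent classification number.
    key_text = extract_key_text(first_patent_text)
    classification_number = classify(first_patent_text)
    # S106: build the first retrieval request and hand it to the server stand-in.
    request = RetrievalRequest(key_text, classification_number)
    scored = search(request)
    # S108: keep only object patent texts whose similarity exceeds the threshold.
    return [patent_id for patent_id, sim in scored if sim > first_threshold]


# Toy stand-ins (assumptions): a truncating "extractor", a constant
# "classifier", and a "server" that returns fixed similarity scores.
result = retrieve_similar(
    "A gear assembly comprising a shaft and a wheel.",
    extract_key_text=lambda text: text[:20],
    classify=lambda text: "F16H",
    search=lambda req: [("patent-1", 0.9), ("patent-2", 0.4)],
    first_threshold=0.5,
)
print(result)
```

Passing the models and the server as callables keeps the sketch independent of any particular model or transport.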
Optionally, the text retrieval method can be applied to, but is not limited to, the process of retrieving similar patent texts. In the related art, retrieving similar patent texts usually requires manually extracting the key text and manually writing a retrieval formula corresponding to the patent text, so the efficiency of retrieving patents is low. In this scheme, after the first patent text is obtained, the machine-trained text extraction model is used for extracting the key text from the first patent text, and the machine-trained text classification model is used for identifying the patent classification number of the first patent text. A first retrieval request can then be generated according to the key text and the patent classification number to retrieve the first patent text, which simplifies the retrieval steps and improves the efficiency of retrieving the first patent text.
Optionally, the file format of the first patent text may be .txt, .doc, .docx, .wps, etc.
Optionally, as an optional example, the obtaining of the first patent text uploaded by the client includes at least one of:
(1) acquiring a selection instruction, wherein the selection instruction is used for indicating selection of a first patent text stored in a target path; responding to the selection instruction, and uploading a first patent text;
Optionally, the selection instruction may be, but is not limited to, a single-click instruction or a long-press instruction. For example, the selection instruction is a single-click instruction, as shown in FIG. 2. After the account is logged in, a prompt for adding a first patent text is displayed on the display interface of the client; after a button is pressed, a plurality of patent texts are displayed, namely text W-1, text W-2, text W-3 and text W-4. The text W-1 is then selected by a single-click instruction and is taken as the first patent text.
(2) Acquiring a dragging instruction, wherein the dragging instruction is used for indicating that a first patent text is dragged to a target area of an interface displayed by a client; and responding to the dragging instruction, and uploading the first patent text.
Optionally, as shown in FIG. 3, a prompt for adding the first patent text is displayed on the display interface of the client, together with a target area, which is indicated by shading. Text W-1, text W-2, text W-3 and text W-4 are located outside the target area. If the text W-1 is to be used as the first patent text, the text W-1 is dragged into the target area, and the client can then obtain the uploaded first patent text.
It should be noted that, optionally, while the text W-1 is being dragged according to the dragging instruction, once W-1 has been dragged into the target area, the appearance of the text W-1 may be changed or a prompt sound may be emitted (without being limited thereto), to indicate to the user that, once the dragging instruction ends, the text W-1 will be uploaded to the client as the first patent text.
By this method, the first patent text can be flexibly selected by acquiring the selection instruction or the dragging instruction, which improves the flexibility of acquiring the first patent text and further improves the efficiency of retrieving the first patent text.
Optionally, as an optional implementation manner, before sending the first retrieval request generated by using the key text and the patent classification number to the server, the method further includes:
S1, displaying the obtained key text and patent classification number in the client, wherein the key text comprises a first abstract text and a first keyword set, the first abstract text is used for representing the first patent text, and the object keywords contained in the first keyword set are keywords extracted from the first abstract text;
S2, generating a first retrieval request by using the key text and the patent classification number.
Optionally, after the first patent text is acquired and before the key text is extracted by using the text extraction model, the text extraction model and the text classification model may be, but are not limited to being, trained on published patent texts. For example, after a published patent text is obtained, the key text and the patent classification number in the published patent text are labeled. The published patent text is input into the text extraction model, which extracts characters, words, sentences and paragraphs from the patent text to generate a key text; the published patent text is also input into the text classification model, which generates a patent classification number. The parameters of the text extraction model are adjusted according to the matching degree between the generated key text and the labeled key text, and the text classification model is adjusted according to the matching degree between the generated patent classification number and the labeled patent classification number. Training is complete when the matching degree between the key text generated by the text extraction model and the labeled key text is greater than a preset threshold and the accuracy of the patent classification number generated by the text classification model is greater than another preset threshold.
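The stopping rule described above (keep adjusting a model until its matching degree exceeds a preset threshold) can be sketched as a small loop. This is a hedged illustration: `train_until`, `update` and `match_degree` are hypothetical names, the `max_rounds` safety cap is an addition not found in the description, and the toy "model" is just a number that grows by a fixed step. The same loop would be applied once to the text extraction model and once to the text classification model, each with its own threshold.

```python
from typing import Callable


def train_until(
    update: Callable[[], None],
    match_degree: Callable[[], float],
    threshold: float,
    max_rounds: int = 1000,
) -> int:
    """Adjust a model until its matching degree exceeds the threshold.

    Returns the number of adjustment rounds performed. max_rounds is a
    safety cap added for this sketch only.
    """
    rounds = 0
    while match_degree() <= threshold and rounds < max_rounds:
        update()  # one parameter adjustment against the labeled data
        rounds += 1
    return rounds


# Toy stand-in (assumption): each adjustment raises the matching degree by 0.25.
state = {"match": 0.0}


def toy_update() -> None:
    state["match"] += 0.25


rounds = train_until(toy_update, lambda: state["match"], threshold=0.8)
print(rounds, state["match"])
```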
Optionally, after the key text and the patent classification number are obtained by the text extraction model and the text classification model, the key text and the patent classification number may be, but are not limited to being, displayed.
Take the texts W-1, W-2, W-3 and W-4 as examples of the first patent text. After the first patent text is obtained, W-1, W-2, W-3 and W-4 are input into the text extraction model and the text classification model, and the key texts and patent classification numbers output by the two models are acquired. For example, as shown in FIG. 4, after the key texts and patent classification numbers of W-1, W-2, W-3 and W-4 are obtained, they are displayed. In FIG. 4, "mechanical", "computer", etc. are the first keyword sets, and "one … …", etc. are the first abstract texts.
It should be noted that the patent classification number and the key text in fig. 4 are only used for explaining the display process, and the specific meaning of the words does not constitute a limitation of the present application.
Optionally, after displaying the key text and the patent classification number, before generating the first search request by using the key text and the patent classification number, the method further includes:
S1, acquiring a first adjusting instruction generated by an editing operation executed in the client;
S2, executing, according to the first adjusting instruction, at least one of the following adjusting operations: adjusting the first abstract text into a second abstract text, adjusting the first keyword set into a second keyword set, and adjusting the patent classification number into an adjusted patent classification number.
Optionally, the first adjusting instruction may be, but is not limited to, an instruction to add, delete, or change the key text and/or the patent classification number.
For example, after the key text and the patent classification number are acquired, a modification operation is performed on them, as described with reference to FIG. 5 and FIG. 6. FIG. 5 shows the first keyword set, the patent classification number and the first abstract text of the text W-1. The content of the first abstract text and the content of the patent classification number are changed to the content enclosed by the dashed-line box in FIG. 6. A first retrieval request is then generated according to the changed key text and patent classification number, so that the server returns a first patent text list according to the first retrieval request.
By the method, the retrieval accuracy of retrieving the patent is improved and the retrieval efficiency is further improved by adjusting the key text and the patent classification number.
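One way to picture an adjusting instruction is as a small record of edits applied to the displayed abstract text, keyword set and classification number. The sketch below is an assumption-laden illustration: `KeyInfo`, `AdjustInstruction` and `apply_adjustment` are hypothetical names, and the classification numbers are placeholder values chosen for the example.

```python
from dataclasses import dataclass, replace
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class KeyInfo:
    """Hypothetical record of what the client displays before retrieval."""
    abstract_text: str
    keywords: FrozenSet[str]
    classification_number: str


@dataclass(frozen=True)
class AdjustInstruction:
    """Hypothetical adjusting instruction; unset fields leave values unchanged."""
    new_abstract: Optional[str] = None
    add_keywords: FrozenSet[str] = frozenset()
    remove_keywords: FrozenSet[str] = frozenset()
    new_classification_number: Optional[str] = None


def apply_adjustment(info: KeyInfo, instr: AdjustInstruction) -> KeyInfo:
    # Add first, then delete, so a keyword listed in both ends up removed.
    keywords = (info.keywords | instr.add_keywords) - instr.remove_keywords
    return replace(
        info,
        abstract_text=(
            info.abstract_text if instr.new_abstract is None else instr.new_abstract
        ),
        keywords=keywords,
        classification_number=(
            info.classification_number
            if instr.new_classification_number is None
            else instr.new_classification_number
        ),
    )


original = KeyInfo("A gear assembly.", frozenset({"mechanical", "gear"}), "F16H")
instruction = AdjustInstruction(
    add_keywords=frozenset({"shaft"}),
    remove_keywords=frozenset({"mechanical"}),
    new_classification_number="F16D",
)
adjusted = apply_adjustment(original, instruction)
print(sorted(adjusted.keywords), adjusted.classification_number)
```

Returning a new record instead of mutating the original keeps the model-generated values available, which is convenient if the user later wants to revert an edit.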
Optionally, after obtaining the first patent text list matched with the first patent text returned by the server, the method further includes:
S1, acquiring a second adjusting instruction generated by an editing operation executed in the client;
S2, executing, according to the second adjusting instruction, at least one of the following adjusting operations: adjusting the first abstract text into a third abstract text, adjusting the first keyword set into a third keyword set, and adjusting the patent classification number into an adjusted patent classification number;
S3, obtaining an adjusting result obtained according to the second adjusting instruction;
S4, sending a second retrieval request generated by using the adjusting result to the server, wherein the second retrieval request is used for requesting retrieval of the first patent text;
S5, acquiring a second patent text list which is returned by the server and matches the first patent text, wherein the text similarity between the object patent text contained in the second patent text list and the first patent text is greater than a second threshold, and the second threshold is greater than the first threshold.
Optionally, the second adjusting instruction may be, but is not limited to, an instruction to add, delete, or change the key text and/or the patent classification number.
For example, taking the operation of adding to the key text after acquiring the first patent text list returned by the server as an example, the description is given with reference to FIG. 7 and FIG. 8. As shown in FIG. 7, after the text W-1 is searched, a first patent text list is obtained, and patent-1 and patent-2 are displayed in the first patent text list. As shown in FIG. 8, after the addition operation is performed on the key text and the patent classification number, a second patent text list is obtained, which includes patent-3 and patent-4. The accuracy of patent retrieval is thereby improved, and the efficiency of patent retrieval is further improved.
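The relationship between the two rounds (the second threshold must exceed the first) can be sketched as follows. The function name `second_round` and the similarity scores are hypothetical; how the server actually computes text similarity is not specified in the description.

```python
from typing import Dict, List


def second_round(
    scored: Dict[str, float],
    first_threshold: float,
    second_threshold: float,
) -> List[str]:
    """Build the second patent text list from similarity scores.

    Enforces the stated constraint that the second threshold is greater
    than the first threshold.
    """
    if second_threshold <= first_threshold:
        raise ValueError("second threshold must be greater than the first threshold")
    return sorted(pid for pid, sim in scored.items() if sim > second_threshold)


# Hypothetical scores returned after the key text has been adjusted.
scores_after_adjustment = {"patent-1": 0.55, "patent-3": 0.82, "patent-4": 0.91}
second_list = second_round(scores_after_adjustment, 0.5, 0.8)
print(second_list)
```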
Optionally, the obtaining of the key text extracted from the first patent text and the patent classification number matched with the first patent text includes:
S1, sending the first patent text to a server, so that the server performs text preprocessing on the first patent text;
S2, acquiring the key text and the patent classification number returned by the server.
Optionally, acquiring the key text and the patent classification number returned by the server includes:
S1, the server carries out segmentation processing on the first patent text to obtain a text segment set corresponding to the first patent text;
S2, the server extracts a first text feature of the text segment set through the text extraction model, and performs word meaning analysis and text recombination on the first patent text according to the first text feature to obtain a first abstract text;
S3, the server extracts a first keyword set from the first abstract text;
S4, the server extracts a second text feature of the text segment set through the text classification model, and identifies the patent classification number of the first patent text according to the second text feature.
Optionally, the segmentation processing of the first patent text may be, but is not limited to, dividing the first patent text into different paragraphs according to the line break, or dividing the first patent text into a plurality of paragraphs with the same number of words.
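The two segmentation options just mentioned can be sketched in a few lines of Python. The function names are hypothetical, and splitting on whitespace is an assumption that suits space-delimited text; Chinese patent text would need a word segmenter instead. With a fixed word count, the final segment may be shorter than the others.

```python
from typing import List


def split_by_line_break(text: str) -> List[str]:
    """Split the text into segments at line breaks, dropping empty lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def split_by_word_count(text: str, words_per_segment: int) -> List[str]:
    """Split the text into segments of a fixed number of whitespace-separated words."""
    if words_per_segment <= 0:
        raise ValueError("words_per_segment must be positive")
    words = text.split()
    return [
        " ".join(words[i:i + words_per_segment])
        for i in range(0, len(words), words_per_segment)
    ]


print(split_by_line_break("Claim one.\n\nClaim two.\n"))
print(split_by_word_count("a b c d e", 2))
```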
Optionally, the text extraction model and the text classification model may be, but are not limited to being, deployed in the server.
The following takes as an example segmenting the first patent text according to the line break and acquiring the first abstract text according to the occurrence frequency of the words. After the client obtains the first patent text, it sends the first patent text to the server. The text extraction model in the server segments the first patent text according to the line break, obtains the first text features of the text in each segment, and extracts a first abstract text from the first patent text according to the first text features. After the first abstract text is extracted, a first keyword set is determined according to the occurrence frequency and the occurrence positions of the words in the first abstract text. Meanwhile, the server also extracts second text features of the text segment set by using the text classification model and outputs a patent classification number according to the second text features. The first abstract text, the first keyword set and the patent classification number are then sent to the client. After receiving the first retrieval request sent by the client, the server performs retrieval using the first abstract text, the first keyword set and the patent classification number to obtain a first patent text list, and returns the first patent text list to the client. In this way, once the retrieval request of the client is obtained, the first patent text list is retrieved according to the key text and the patent classification number, which improves the efficiency of retrieving patent texts.
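Determining keywords from "occurrence frequency and occurrence positions" could look like the sketch below. The concrete ranking rule (higher frequency first, ties broken by earlier first occurrence), the tiny stop-word list, and the names `extract_keywords` and `STOPWORDS` are all assumptions made for illustration; the description does not fix a formula.

```python
import re
from collections import Counter
from typing import Dict, List

# Minimal stop-word list, assumed for this example only.
STOPWORDS = frozenset({"a", "an", "the", "of", "and", "to", "in", "is", "for"})


def extract_keywords(abstract_text: str, top_k: int = 3) -> List[str]:
    """Rank words by frequency, breaking ties by earliest first occurrence."""
    words = [
        w for w in re.findall(r"[a-z]+", abstract_text.lower())
        if w not in STOPWORDS
    ]
    counts = Counter(words)
    first_position: Dict[str, int] = {}
    for index, word in enumerate(words):
        first_position.setdefault(word, index)
    ranked = sorted(counts, key=lambda w: (-counts[w], first_position[w]))
    return ranked[:top_k]


keywords = extract_keywords(
    "A gear drives the shaft. The gear and the shaft turn a wheel."
)
print(keywords)
```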
It should be noted that the foregoing method embodiments are described as a series of acts or combinations for simplicity in explanation, but it should be understood by those skilled in the art that the present invention is not limited by the order of acts or acts described, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention.
According to another aspect of the embodiments of the present invention, a text retrieval method is further provided. Optionally, as shown in FIG. 9, the text retrieval method includes:
S902, receiving a first patent text sent by a client;
S904, acquiring the key text extracted from the first patent text and the patent classification number matched with the first patent text;
S906, sending the key text and the patent classification number to the client;
S908, receiving a first retrieval request generated by the client by using the key text and the patent classification number, wherein the first retrieval request is used for requesting retrieval of the first patent text;
S910, returning a first patent text list matched with the first patent text to the client, wherein the text similarity between the object patent text contained in the first patent text list and the first patent text is greater than a first threshold value.
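The server-side steps S902 to S910 can be sketched as two handlers over an in-memory corpus. Everything concrete here is an assumption for illustration: the class and method names are hypothetical, the lambdas stand in for the trained models, Jaccard word overlap is a stand-in similarity measure (the description does not name one), and similarity is computed against the key text rather than the full first patent text to keep the example short.

```python
from typing import Callable, Dict, List, Tuple


def jaccard(a: str, b: str) -> float:
    """Word-set overlap, used here only as a stand-in similarity measure."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


class RetrievalServer:
    def __init__(
        self,
        corpus: Dict[str, Tuple[str, str]],  # id -> (classification number, text)
        extract: Callable[[str], str],
        classify: Callable[[str], str],
        first_threshold: float,
    ) -> None:
        self.corpus = corpus
        self.extract = extract
        self.classify = classify
        self.first_threshold = first_threshold

    def handle_upload(self, first_patent_text: str) -> Tuple[str, str]:
        # S902-S906: receive the text, derive key text and classification
        # number, and return both to the client.
        return self.extract(first_patent_text), self.classify(first_patent_text)

    def handle_search(self, key_text: str, classification_number: str) -> List[str]:
        # S908-S910: answer the first retrieval request with the matching list.
        return [
            patent_id
            for patent_id, (cls, text) in self.corpus.items()
            if cls == classification_number
            and jaccard(key_text, text) > self.first_threshold
        ]


server = RetrievalServer(
    corpus={
        "patent-1": ("F16H", "gear shaft wheel"),
        "patent-2": ("F16H", "battery cell anode"),
        "patent-3": ("G06F", "gear shaft wheel"),
    },
    extract=lambda text: "gear shaft",
    classify=lambda text: "F16H",
    first_threshold=0.5,
)
key_text, cls_number = server.handle_upload("A gear drives a shaft.")
matches = server.handle_search(key_text, cls_number)
print(matches)
```

Splitting the exchange into two handlers mirrors the round trip in the description: the client sees (and may edit) the key text and classification number before it issues the retrieval request.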
Optionally, the text retrieval method can be applied to, but is not limited to, the process of retrieving similar patent texts. In the related art, retrieving similar patent texts usually requires manually extracting the key text and manually writing a retrieval formula corresponding to the patent text, so the efficiency of retrieving patents is low. In this scheme, after the first patent text is obtained, the machine-trained text extraction model is used for extracting the key text from the first patent text, and the machine-trained text classification model is used for identifying the patent classification number of the first patent text. A first retrieval request can then be generated according to the key text and the patent classification number to retrieve the first patent text, which simplifies the retrieval steps and improves the efficiency of retrieving the first patent text.
Optionally, the obtaining of the key text extracted from the first patent text, and the patent classification number matched with the first patent text includes:
S1, carrying out segmentation processing on the first patent text to obtain a text segment set corresponding to the first patent text;
S2, extracting first text features of the text segment set through a text extraction model, and performing word meaning analysis and text recombination on the first patent text according to the first text features to obtain a first abstract text;
S3, extracting a first keyword set from the first abstract text;
S4, extracting second text features of the text segment set through the text classification model, and identifying the patent classification number of the first patent text according to the second text features.
Alternatively, the segmenting process of the first patent text may be, but is not limited to, dividing the first patent text into different paragraphs according to the line break, or dividing the first patent text into a plurality of paragraphs with the same number of words according to the number of words.
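The two segmentation options named above can be sketched as follows. The function names are hypothetical, and splitting words on whitespace is an assumption made for illustration; text without whitespace-delimited words would need a proper tokenizer.

```python
from typing import List


def split_by_line_break(text: str) -> List[str]:
    """Split on line breaks and drop empty segments."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def split_by_word_count(text: str, words_per_segment: int) -> List[str]:
    """Split into consecutive segments of `words_per_segment` words.

    The last segment may be shorter when the total word count is not an
    exact multiple of `words_per_segment`.
    """
    if words_per_segment <= 0:
        raise ValueError("words_per_segment must be positive")
    words = text.split()
    return [
        " ".join(words[i:i + words_per_segment])
        for i in range(0, len(words), words_per_segment)
    ]
```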
Take as an example segmenting the first patent text according to line breaks and acquiring the first abstract text according to the occurrence frequency of words. After the client obtains the first patent text, it sends the first patent text to the server. The text extraction model in the server segments the first patent text according to line breaks, obtains the first text features of the text in each segment, and extracts a first abstract text from the first patent text according to the first text features. After the first abstract text is extracted, a first keyword set is determined according to the occurrence frequency and occurrence positions of words in the first abstract text. Meanwhile, the server extracts second text features of the text segment set by using the text classification model and outputs a patent classification number according to the second text features. The server then sends the first abstract text, the first keyword set and the patent classification number to the client. After receiving a first retrieval instruction sent by the client, the server performs retrieval using the first abstract text, the first keyword set and the patent classification number according to the first retrieval instruction to obtain a first patent text list, and returns the first patent text list to the client. In this way, once the retrieval instruction of the client is obtained, the first patent text list is retrieved according to the key text and the patent classification number, which improves the efficiency of retrieving patent texts.
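One way to determine a keyword set from occurrence frequency and occurrence position is sketched below. The scoring formula (occurrence count plus a bonus for an early first occurrence), the `position_weight` parameter and the lowercase-ASCII tokenization are assumptions made for illustration; the text does not specify how frequency and position are combined, and no stop-word filtering is attempted.

```python
import re
from collections import Counter
from typing import Dict, List


def extract_keywords(abstract: str, top_k: int = 3,
                     position_weight: float = 1.0) -> List[str]:
    """Rank words by occurrence count plus a bonus for appearing early."""
    words = re.findall(r"[a-z]+", abstract.lower())
    if not words:
        return []
    counts = Counter(words)
    first_pos: Dict[str, int] = {}
    for idx, word in enumerate(words):
        first_pos.setdefault(word, idx)  # remember the first occurrence only
    total = len(words)

    def score(word: str) -> float:
        # An earlier first occurrence gives a bonus closer to position_weight.
        return counts[word] + position_weight * (1 - first_pos[word] / total)

    ranked = sorted(counts, key=lambda w: (-score(w), first_pos[w]))
    return ranked[:top_k]
```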
Optionally, before returning the first patent text list matching with the first patent text to the client, the method further includes:
S1, in response to the first retrieval request, retrieving a first patent text list matched with the first patent text from the database through a text retrieval model, wherein the text retrieval model is a model obtained by machine training on published patent texts and used for text retrieval according to text similarity.
Optionally, the text retrieval model may be, but is not limited to being, obtained by training as follows. A patent sample is acquired, where the patent sample includes a patent to be retrieved and a target patent. The patent sample is input into the text retrieval model for training, and the parameters of the text retrieval model are adjusted until a mature text retrieval model is obtained. The first patent text is then retrieved by using the mature text retrieval model, where the similarity between each patent in the first patent text list and the first patent text is greater than a preset threshold value.
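The threshold condition on the returned list can be illustrated with a simple stand-in for the trained retrieval model. Bag-of-words cosine similarity is used here purely as an assumption; the text does not say how the trained model computes text similarity, and the function names and sample database are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between whitespace-tokenized bag-of-words vectors."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


def retrieve(query: str, database: Dict[str, str], threshold: float) -> List[str]:
    """Return ids of documents whose similarity to the query exceeds the threshold."""
    scored = [(cosine_similarity(query, text), doc_id)
              for doc_id, text in database.items()]
    hits = [(s, d) for s, d in scored if s > threshold]
    hits.sort(key=lambda item: (-item[0], item[1]))  # most similar first
    return [doc_id for _, doc_id in hits]
```

For example, with a database of `{"p1": "gear shaft motor", "p2": "neural network training", "p3": "gear shaft"}`, the query `"gear shaft"` with threshold 0.5 keeps `p3` and `p1` and drops `p2`.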
According to another aspect of the embodiment of the invention, a text retrieval device is also provided. Alternatively, as shown in fig. 10, the text retrieval device includes:
(1) a first obtaining unit 1002, configured to obtain a first patent text uploaded by a client;
(2) a second obtaining unit 1004, configured to obtain a key text extracted from the first patent text and a patent classification number matched with the first patent text, where the key text is a text extracted from the first patent text by a text extraction model, and the text extraction model is a model obtained after machine training is performed on a published patent text and used for extracting the key text in the patent text; the patent classification number is obtained by identifying a text classification model, and the text classification model is a model which is obtained by using a published patent text to perform machine training and is used for identifying the belonging classification of the patent text;
(3) a first sending unit 1006, configured to send a first retrieval request generated by using the key text and the patent classification number to the server, where the first retrieval request is used to request to retrieve the first patent text;
(4) a third obtaining unit 1008, configured to obtain a first patent text list that is returned by the server and matches the first patent text, where a text similarity between an object patent text included in the first patent text list and the first patent text is greater than a first threshold.
Optionally, the text retrieval device can be applied to, but is not limited to, the process of retrieving similar patent texts. In the related art, retrieving similar patent texts usually requires manually extracting the key text and manually writing the corresponding retrieval formula, so retrieval efficiency is low. In this scheme, after the first patent text is obtained, a machine-trained text extraction model extracts the key text from the first patent text and a machine-trained text classification model identifies the patent classification number of the first patent text, so that a first retrieval request can be generated according to the key text and the patent classification number and the first patent text can be retrieved. This simplifies the steps of retrieving the first patent text and improves the efficiency of retrieving it.
Optionally, the file format of the above-mentioned first patent text may be .txt, .doc, .docx, .wps, etc.
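A client could check the file format before uploading, as in the sketch below. The function name is hypothetical, and because the text ends the list with "etc.", the set of suffixes here is an assumption that covers only the formats explicitly named.

```python
from pathlib import Path

# Formats named in the text; the list there is open-ended, so this set is
# not exhaustive.
SUPPORTED_SUFFIXES = {".txt", ".doc", ".docx", ".wps"}


def is_supported_patent_file(filename: str) -> bool:
    """Return True when the file suffix is one of the named formats."""
    return Path(filename).suffix.lower() in SUPPORTED_SUFFIXES
```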
Optionally, the first obtaining unit includes at least one of:
(1) the second acquisition module is used for acquiring a selection instruction, wherein the selection instruction is used for indicating to select the first patent text stored in the target path; responding to the selection instruction, and uploading a first patent text;
(2) the third acquisition module is used for acquiring a dragging instruction, wherein the dragging instruction is used for indicating that the first patent text is dragged to a target area of an interface displayed by the client; and responding to the dragging instruction, and uploading the first patent text.
Optionally, the apparatus further comprises:
(1) the display unit is used for displaying the acquired key texts and patent classification numbers in the client before sending a first retrieval request generated by the key texts and the patent classification numbers to the server, wherein the key texts comprise first abstract texts and first keyword sets, the first abstract texts are used for representing the first patent texts, and object keywords contained in the first keyword sets are keywords extracted from the first abstract texts;
(2) the generating unit is used for generating a first retrieval request by using the key text and the patent classification number.
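Generating a retrieval request from the key text and the patent classification number might look like the sketch below, which replaces the manually written retrieval formula with an automatically assembled one. The payload fields and the query syntax (`OR`, `AND`, `CLASS=`) are hypothetical and do not correspond to any particular search engine.

```python
from typing import Dict, List


def build_retrieval_request(abstract_text: str, keywords: List[str],
                            classification_number: str) -> Dict[str, str]:
    """Assemble a request payload; the query syntax is illustrative only."""
    class_clause = f"CLASS={classification_number}"
    if keywords:
        keyword_clause = " OR ".join(f'"{k}"' for k in keywords)
        query = f"({keyword_clause}) AND {class_clause}"
    else:
        query = class_clause
    return {"abstract": abstract_text, "query": query}
```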
Optionally, after the first patent text is acquired and before the key text is extracted by using the text extraction model, the text extraction model and the text classification model may be trained, but are not limited to being trained, using published patent texts. For example, after a published patent text is obtained, the key text and the patent classification number of the published patent text are labeled. The published patent text is input into the text extraction model, which extracts characters, words, sentences and paragraphs from the patent text to generate a key text; the published patent text is also input into the text classification model, which generates a patent classification number. The parameters of the text extraction model are adjusted according to the matching degree between the generated key text and the labeled key text, and the parameters of the text classification model are adjusted according to the matching degree between the generated patent classification number and the labeled patent classification number. Training is complete when the matching degree between the key text generated by the text extraction model and the labeled key text is greater than a preset threshold value and the accuracy of the patent classification number generated by the text classification model is greater than another preset threshold value.
Optionally, after the key text is obtained by the text extraction model and the patent classification number is obtained by the text classification model, the key text and the patent classification number may be, but are not limited to being, displayed.
Take first patent texts W-1, W-2, W-3 and W-4 as an example. After the first patent texts are obtained, W-1, W-2, W-3 and W-4 are input into the text extraction model and the text classification model, and the key texts and the patent classification numbers output by the two models are acquired. For example, as shown in FIG. 4, after the key texts and patent classification numbers of W-1, W-2, W-3 and W-4 are obtained, they are displayed. In fig. 4, "mechanical", "computer", etc. form the first keyword set, and "one … …", etc. is the first abstract text.
It should be noted that the patent classification number and the key text in fig. 4 are only used for explaining the display process, and the specific meaning of the words does not constitute a limitation of the present application.
Optionally, the apparatus further comprises:
(1) a fourth obtaining unit, configured to obtain a first adjustment instruction generated by an editing operation performed in the client before generating the first retrieval request using the key text and the patent classification number;
(2) a first adjusting unit, configured to perform at least one of the following adjusting operations according to the first adjusting instruction: adjusting the first abstract text into a second abstract text, adjusting the first keyword set into a second keyword set, and adjusting the patent classification number into an adjusted patent classification number.
Optionally, the first adjustment operation may be, but not limited to, adding, deleting, or changing the key text and/or the patent classification number.
For example, after the key text and the patent classification number are acquired, a modification operation is performed on them, as described with reference to fig. 5 and fig. 6. FIG. 5 displays the first keyword set, the patent classification number and the first abstract text of the text W-1. The content of the first abstract text and the content of the patent classification number are changed to the content enclosed by the dashed-line box in fig. 6. A first retrieval request is then generated according to the changed key text and patent classification number, so that the server returns a first patent text list according to the first retrieval request.
By the method, the retrieval accuracy of retrieving the patent is improved and the retrieval efficiency is further improved by adjusting the key text and the patent classification number.
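The adding, deleting and changing operations on the key text and the patent classification number can be sketched as below. The `EditableKeyInfo` record and the `apply_adjustment` signature are assumptions introduced for illustration; the sample values are invented and are not taken from the figures.

```python
from dataclasses import dataclass, replace
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class EditableKeyInfo:
    abstract_text: str
    keywords: FrozenSet[str]
    classification_number: str


def apply_adjustment(
    info: EditableKeyInfo,
    new_abstract: Optional[str] = None,
    add_keywords: Iterable[str] = (),
    remove_keywords: Iterable[str] = (),
    new_classification_number: Optional[str] = None,
) -> EditableKeyInfo:
    """Return an adjusted copy; fields without an edit keep their old value."""
    keywords = (info.keywords | frozenset(add_keywords)) - frozenset(remove_keywords)
    return replace(
        info,
        abstract_text=info.abstract_text if new_abstract is None else new_abstract,
        keywords=keywords,
        classification_number=(
            info.classification_number
            if new_classification_number is None
            else new_classification_number
        ),
    )
```

Returning a new record instead of mutating the old one keeps the originally extracted values available, which is convenient if the user wants to undo an edit before the retrieval request is generated.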
Optionally, the apparatus further comprises:
(1) a fifth obtaining unit, configured to obtain a second adjustment instruction generated by an editing operation performed in the client after obtaining the first patent text list matched with the first patent text returned by the server;
(2) a second adjusting unit, configured to perform at least one of the following adjusting operations according to a second adjusting instruction: adjusting the first abstract text into a third abstract text, adjusting the first keyword set into a third keyword set, and adjusting the patent classification number into an adjusted patent classification number;
(3) a sixth obtaining unit, configured to obtain an adjustment result according to the second adjustment instruction;
(4) a second sending unit, configured to send a second retrieval request generated by using the adjustment result to the server, where the second retrieval request is used to request to retrieve the first patent text;
(5) a seventh acquiring unit, configured to acquire a second patent text list that is returned by the server and matches the first patent text, wherein the text similarity between each object patent text contained in the second patent text list and the first patent text is greater than a second threshold value, and the second threshold value is greater than the first threshold value.
Optionally, the second adjustment instruction may be, but is not limited to, adding or deleting or changing the key text and/or the patent classification number.
For example, the operation of adding key text after the first patent text list returned by the server is acquired is described with reference to fig. 7 and fig. 8. As shown in fig. 7, after the text W-1 is searched, a first patent text list is obtained, in which patent-1 and patent-2 are displayed. As shown in fig. 8, after the addition operation is performed on the key text and the patent classification number, a second patent text list is obtained, which includes patent 3 and patent 4. In this way, the accuracy of patent retrieval is improved, and the efficiency of patent retrieval is further improved.
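The relationship between the two thresholds in the second retrieval can be sketched as follows. The function names are hypothetical, and the similarity scores used in the usage note are invented for illustration only; they are not values from the figures.

```python
from typing import Dict, List


def filter_by_threshold(scored: Dict[str, float], threshold: float) -> List[str]:
    """Keep document ids whose similarity is strictly greater than the threshold."""
    return sorted(doc_id for doc_id, sim in scored.items() if sim > threshold)


def second_round(scored_after_adjustment: Dict[str, float],
                 first_threshold: float,
                 second_threshold: float) -> List[str]:
    """Apply the stricter second threshold to scores from the adjusted request."""
    if second_threshold <= first_threshold:
        raise ValueError("second threshold must be greater than the first threshold")
    return filter_by_threshold(scored_after_adjustment, second_threshold)
```

With made-up scores `{"patent-1": 0.65, "patent-3": 0.85, "patent-4": 0.9}` after the adjustment, a first threshold of 0.6 and a second threshold of 0.8, only `patent-3` and `patent-4` remain in the second list.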
Optionally, the second obtaining unit includes:
(1) the sending module is used for sending the first patent text to the server so as to perform text preprocessing on the first patent text;
(2) the first acquisition module is used for acquiring the key text and the patent classification number returned by the server.
Optionally, acquiring the key text and the patent classification number returned by the server includes: the server carries out segmentation processing on the first patent text to obtain a text segment set corresponding to the first patent text; the server extracts first text features of the text segment set through a text extraction model, and performs word meaning analysis and text recombination on the first patent text according to the first text features to obtain a first abstract text; the server extracts a first keyword set from the first abstract text; and the server extracts second text features of the text segment set through the text classification model and identifies the patent classification number of the first patent text according to the second text features.
Alternatively, the segmenting process of the first patent text may be, but is not limited to, dividing the first patent text into different paragraphs according to the line break, or dividing the first patent text into a plurality of paragraphs with the same number of words according to the number of words.
Alternatively, the text extraction model and the text classification model may be applied, but not limited to, in a server.
Take as an example segmenting the first patent text according to line breaks and acquiring the first abstract text according to the occurrence frequency of words. After the client obtains the first patent text, it sends the first patent text to the server. The text extraction model in the server segments the first patent text according to line breaks, obtains the first text features of the text in each segment, and extracts a first abstract text from the first patent text according to the first text features. After the first abstract text is extracted, a first keyword set is determined according to the occurrence frequency and occurrence positions of words in the first abstract text. Meanwhile, the server extracts second text features of the text segment set by using the text classification model and outputs a patent classification number according to the second text features. The server then sends the first abstract text, the first keyword set and the patent classification number to the client. After receiving a first retrieval instruction sent by the client, the server performs retrieval using the first abstract text, the first keyword set and the patent classification number according to the first retrieval instruction to obtain a first patent text list, and returns the first patent text list to the client. In this way, once the retrieval instruction of the client is obtained, the first patent text list is retrieved according to the key text and the patent classification number, which improves the efficiency of retrieving patent texts.
According to another aspect of the embodiments of the present invention, there is also provided a text retrieval apparatus. Optionally, as shown in fig. 11, the text retrieval apparatus includes:
(1) a first receiving unit 1102, configured to receive a first patent text sent by a client;
(2) an obtaining unit 1104, configured to obtain a key text extracted from a first patent text and a patent classification number matched with the first patent text, where the key text is a text extracted from the first patent text by a text extraction model, and the text extraction model is a model obtained after machine training is performed using published patent texts and is used for extracting the key text in the patent text; the patent classification number is obtained by identifying a text classification model, and the text classification model is a model which is obtained by using a published patent text to perform machine training and is used for identifying the belonging classification of the patent text;
(3) a sending unit 1106, configured to send the key text and the patent classification number to the client;
(4) a second receiving unit 1108, configured to receive a first retrieval request generated by the client using the key text and the patent classification number, where the first retrieval request is used to request to retrieve the first patent text;
(5) a returning unit 1110, configured to return a first patent text list matched with the first patent text to the client, where a text similarity between an object patent text included in the first patent text list and the first patent text is greater than a first threshold.
Optionally, the text retrieval apparatus can be applied to, but is not limited to, the process of retrieving similar patent texts. In the related art, retrieving similar patent texts usually requires manually extracting the key text and manually writing the corresponding retrieval formula, so retrieval efficiency is low. In this scheme, after the first patent text is obtained, a machine-trained text extraction model extracts the key text from the first patent text and a machine-trained text classification model identifies the patent classification number of the first patent text, so that a first retrieval request can be generated according to the key text and the patent classification number and the first patent text can be retrieved. This simplifies the steps of retrieving the first patent text and improves the efficiency of retrieving it.
Optionally, the obtaining unit includes:
(1) the processing module is used for carrying out segmentation processing on the first patent text to obtain a text segment set corresponding to the first patent text;
(2) the first extraction module is used for extracting first text characteristics of the text segment set through the text extraction model, and performing word meaning analysis and text recombination on the first patent text according to the first text characteristics to obtain a first abstract text;
(3) the second extraction module is used for extracting a first keyword set from the first abstract text;
(4) the third extraction module is used for extracting second text features of the text segment set through the text classification model and identifying the patent classification number of the first patent text according to the second text features.
Alternatively, the segmenting process of the first patent text may be, but is not limited to, dividing the first patent text into different paragraphs according to the line break, or dividing the first patent text into a plurality of paragraphs with the same number of words according to the number of words.
Take as an example segmenting the first patent text according to line breaks and acquiring the first abstract text according to the occurrence frequency of words. After the client obtains the first patent text, it sends the first patent text to the server. The text extraction model in the server segments the first patent text according to line breaks, obtains the first text features of the text in each segment, and extracts a first abstract text from the first patent text according to the first text features. After the first abstract text is extracted, a first keyword set is determined according to the occurrence frequency and occurrence positions of words in the first abstract text. Meanwhile, the server extracts second text features of the text segment set by using the text classification model and outputs a patent classification number according to the second text features. The server then sends the first abstract text, the first keyword set and the patent classification number to the client. After receiving a first retrieval instruction sent by the client, the server performs retrieval using the first abstract text, the first keyword set and the patent classification number according to the first retrieval instruction to obtain a first patent text list, and returns the first patent text list to the client. In this way, once the retrieval instruction of the client is obtained, the first patent text list is retrieved according to the key text and the patent classification number, which improves the efficiency of retrieving patent texts.
Optionally, the apparatus further comprises:
(1) a retrieval unit, configured to, before the first patent text list matched with the first patent text is returned to the client, respond to the first retrieval request by retrieving the first patent text list matched with the first patent text from the database through a text retrieval model, wherein the text retrieval model is a model obtained by machine training on published patent texts and used for text retrieval according to text similarity.
Optionally, the text retrieval model may be, but is not limited to being, obtained by training as follows. A patent sample is acquired, where the patent sample includes a patent to be retrieved and a target patent. The patent sample is input into the text retrieval model for training, and the parameters of the text retrieval model are adjusted until a mature text retrieval model is obtained. The first patent text is then retrieved by using the mature text retrieval model, where the similarity between each patent in the first patent text list and the first patent text is greater than a preset threshold value.
According to yet another aspect of the embodiments of the present invention, there is also provided an electronic device for implementing the text retrieval method, as shown in fig. 12, the electronic device includes a memory and a processor, the memory stores a computer program, and the processor is configured to execute the steps in any one of the method embodiments through the computer program.
Optionally, in this embodiment, the electronic apparatus may be located in at least one network device of a plurality of network devices of a computer network.
Optionally, in this embodiment, the processor may be configured to execute the following steps by a computer program:
S1, acquiring a first patent text uploaded by a client;
S2, acquiring a key text extracted from the first patent text and a patent classification number matched with the first patent text, wherein the key text is extracted from the first patent text through a text extraction model, and the text extraction model is a model obtained by machine training on published patent texts and used for extracting the key text in a patent text; the patent classification number is obtained through identification by a text classification model, and the text classification model is a model obtained by machine training on published patent texts and used for identifying the classification to which a patent text belongs;
S3, sending a first retrieval request generated by using the key text and the patent classification number to the server, wherein the first retrieval request is used for requesting retrieval of the first patent text;
S4, acquiring a first patent text list that is returned by the server and matches the first patent text, wherein the text similarity between each object patent text contained in the first patent text list and the first patent text is greater than a first threshold value.
Optionally, it can be understood by those skilled in the art that the structure shown in fig. 12 is only an illustration, and the electronic device may also be a terminal device such as a smart phone (e.g., an Android phone, an iOS phone, etc.), a tablet computer, a palmtop computer, a Mobile Internet Device (MID), a PAD, and the like. Fig. 12 does not limit the structure of the electronic device. For example, the electronic device may also include more or fewer components (e.g., network interfaces, display devices, etc.) than shown in FIG. 12, or have a different configuration than shown in FIG. 12.
The memory 1202 may be used to store software programs and modules, such as program instructions/modules corresponding to the text retrieval method and apparatus in the embodiments of the present invention, and the processor 1204 executes various functional applications and data processing by running the software programs and modules stored in the memory 1202, that is, implementing the text retrieval method described above. The memory 1202 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 1202 can further include memory located remotely from the processor 1204, which can be connected to a terminal over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof. The memory 1202 may be, but not limited to, specifically configured to store information such as a first patent text, a key text, a patent classification number, and the like. As an example, as shown in fig. 12, the memory 1202 may include, but is not limited to, a first obtaining unit 1002, a second obtaining unit 1004, a first sending unit 1006, and a third obtaining unit 1008 in the text retrieval device. In addition, other module units in the text retrieval device may also be included, but are not limited to these, and are not described in detail in this example.
Optionally, the transmitting device 1206 is configured to receive or transmit data via a network. Examples of the network may include a wired network and a wireless network. In one example, the transmitting device 1206 includes a Network Interface Controller (NIC) that can be connected to a router via a network cable to communicate with the internet or a local area network. In one example, the transmitting device 1206 is a Radio Frequency (RF) module, which is used to communicate with the internet in a wireless manner.
In addition, the electronic device further includes: a connection bus 1208 for connecting the various module components in the electronic device.
According to a further aspect of the embodiments of the present invention, there is also provided an electronic apparatus, as shown in fig. 13, including a memory in which a computer program is stored and a processor configured to execute the steps in any one of the above method embodiments by the computer program.
Optionally, in this embodiment, the electronic apparatus may be located in at least one network device of a plurality of network devices of a computer network.
Optionally, in this embodiment, the processor may be configured to execute the following steps by a computer program:
S1, receiving a first patent text sent by a client;
S2, acquiring a key text extracted from the first patent text and a patent classification number matched with the first patent text, wherein the key text is extracted from the first patent text through a text extraction model, and the text extraction model is a model obtained by machine training on published patent texts and used for extracting the key text in a patent text; the patent classification number is obtained through identification by a text classification model, and the text classification model is a model obtained by machine training on published patent texts and used for identifying the classification to which a patent text belongs;
S3, sending the key text and the patent classification number to the client;
S4, receiving a first retrieval request generated by the client by using the key text and the patent classification number, wherein the first retrieval request is used for requesting to retrieve the first patent text;
S5, returning a first patent text list matched with the first patent text to the client, wherein the text similarity between each object patent text contained in the first patent text list and the first patent text is greater than a first threshold value.
Optionally, it can be understood by those skilled in the art that the structure shown in fig. 13 is only an illustration, and the electronic device may also be a terminal device such as a smart phone (e.g., an Android phone, an iOS phone, etc.), a tablet computer, a palmtop computer, a Mobile Internet Device (MID), a PAD, and the like. Fig. 13 does not limit the structure of the electronic device. For example, the electronic device may also include more or fewer components (e.g., network interfaces, display devices, etc.) than shown in FIG. 13, or have a different configuration than shown in FIG. 13.
The memory 1302 may be used to store software programs and modules, such as program instructions/modules corresponding to the text retrieval method and apparatus in the embodiments of the present invention, and the processor 1304 executes various functional applications and data processing by running the software programs and modules stored in the memory 1302, that is, implementing the text retrieval method described above. The memory 1302 may include high speed random access memory and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 1302 may further include memory located remotely from the processor 1304, which may be connected to the terminal over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof. The memory 1302 may be specifically, but not limited to, used for storing information such as a first patent text, a key text, a patent classification number, and the like. As an example, as shown in fig. 13, the memory 1302 may include, but is not limited to, a first receiving unit 1102, an obtaining unit 1104, a sending unit 1106, a second receiving unit 1108, and a returning unit 1110 in the text retrieval device. In addition, other module units in the text retrieval device may also be included, but are not limited to these, and are not described in detail in this example.
Optionally, the transmission device 1306 is used for receiving or sending data via a network. Examples of the network may include a wired network and a wireless network. In one example, the transmission device 1306 includes a Network Interface Controller (NIC) that can be connected to a router and other network devices via a network cable so as to communicate with the Internet or a local area network. In another example, the transmission device 1306 is a Radio Frequency (RF) module, which is used to communicate with the Internet wirelessly.
In addition, the electronic device further includes: a connection bus 1308 for connecting the respective module components in the electronic apparatus.
According to a further aspect of embodiments of the present invention, there is also provided a storage medium having a computer program stored therein, wherein the computer program is arranged to perform the steps of any of the above-mentioned method embodiments when executed.
Alternatively, in the present embodiment, the storage medium may be configured to store a computer program for executing the steps of:
S1, acquiring a first patent text uploaded by a client;
S2, acquiring a key text extracted from the first patent text and a patent classification number matched with the first patent text, wherein the key text is extracted from the first patent text through a text extraction model, and the text extraction model is a model, obtained by machine training on published patent texts, for extracting the key text from a patent text; the patent classification number is identified by a text classification model, and the text classification model is a model, obtained by machine training on published patent texts, for identifying the classification to which a patent text belongs;
S3, sending a first retrieval request generated using the key text and the patent classification number to the server, wherein the first retrieval request is used for requesting retrieval of the first patent text;
S4, acquiring a first patent text list that is returned by the server and matches the first patent text, wherein the text similarity between each object patent text contained in the first patent text list and the first patent text is greater than a first threshold.
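The client-side steps S1 to S4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the `KeyInfo` and `RetrievalRequest` types, the `StubServer` class, and its `analyze` and `search` methods are hypothetical names introduced here, and the stub returns canned placeholder values in place of the trained text extraction and text classification models.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class KeyInfo:
    """Key text and classification number obtained for a patent text (step S2)."""
    abstract: str
    keywords: List[str]
    classification_number: str


@dataclass
class RetrievalRequest:
    """Retrieval request built from the key text and classification number (step S3)."""
    abstract: str
    keywords: List[str]
    classification_number: str


class StubServer:
    """Hypothetical stand-in for the server; returns canned data, no real models."""

    def analyze(self, patent_text: str) -> KeyInfo:
        # A real system would run the trained extraction and classification models.
        return KeyInfo(abstract=patent_text[:40], keywords=["retrieval"],
                       classification_number="G06F16/33")

    def search(self, request: RetrievalRequest) -> List[str]:
        # A real system would return texts whose similarity exceeds the threshold.
        return ["CN-EXAMPLE-1", "CN-EXAMPLE-2"]


def client_retrieve(patent_text: str, server: StubServer) -> List[str]:
    """Run steps S2 to S4 for a patent text already acquired in step S1."""
    key_info = server.analyze(patent_text)                      # S2
    request = RetrievalRequest(key_info.abstract, key_info.keywords,
                               key_info.classification_number)  # S3
    return server.search(request)                               # S4
```

Keeping the key information and the request as separate types leaves room for the user to edit the abstract, keywords, or classification number before the request is sent.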
Alternatively, in the present embodiment, the storage medium may be configured to store a computer program for executing the steps of:
S1, receiving a first patent text sent by a client;
S2, acquiring a key text extracted from the first patent text and a patent classification number matched with the first patent text, wherein the key text is extracted from the first patent text through a text extraction model, and the text extraction model is a model, obtained by machine training on published patent texts, for extracting the key text from a patent text; the patent classification number is identified by a text classification model, and the text classification model is a model, obtained by machine training on published patent texts, for identifying the classification to which a patent text belongs;
S3, sending the key text and the patent classification number to the client;
S4, receiving a first retrieval request generated by the client using the key text and the patent classification number, wherein the first retrieval request is used for requesting retrieval of the first patent text;
S5, returning a first patent text list matched with the first patent text to the client, wherein the text similarity between each object patent text contained in the first patent text list and the first patent text is greater than a first threshold.
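The threshold condition in step S5 can be illustrated with a short sketch. It is an assumption-laden stand-in: the embodiment does not specify the similarity measure here, so Jaccard similarity over whitespace tokens is substituted for it, and the function names `tokens`, `jaccard`, and `retrieve` as well as the in-memory `database` dictionary are hypothetical.

```python
from typing import Dict, List, Set


def tokens(text: str) -> Set[str]:
    """Lowercased whitespace tokens; a crude stand-in for learned text features."""
    return set(text.lower().split())


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the two texts' token sets, in the range [0, 1]."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def retrieve(query_text: str, database: Dict[str, str], threshold: float) -> List[str]:
    """Return ids of stored texts whose similarity to the query is strictly
    greater than the threshold, most similar first."""
    scored = [(jaccard(query_text, text), doc_id) for doc_id, text in database.items()]
    hits = [(score, doc_id) for score, doc_id in scored if score > threshold]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in hits]
```

With this kind of filter, raising the threshold can only shrink the returned list, so a later request with a higher threshold returns a subset of the earlier results.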
Optionally, in this embodiment, those skilled in the art will understand that all or part of the steps in the methods of the foregoing embodiments may be implemented by a program instructing hardware associated with the terminal device. The program may be stored in a computer-readable storage medium, which may include a flash disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and the like.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
The integrated unit in the above embodiments, if implemented in the form of a software functional unit and sold or used as a separate product, may be stored in the above computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing one or more computer devices (which may be personal computers, servers, network devices, etc.) to execute all or part of the steps of the method according to the embodiments of the present invention.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed client may be implemented in other manners. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various improvements and refinements without departing from the principle of the present invention, and these improvements and refinements shall also be regarded as falling within the protection scope of the present invention.

Claims (22)

1. A text retrieval method, comprising:
acquiring a first patent text uploaded by a client;
acquiring a key text extracted from the first patent text and a patent classification number matched with the first patent text, wherein the key text is extracted from the first patent text through a text extraction model, and the text extraction model is a model for extracting the key text in the patent text, which is obtained after machine training is performed on a published patent text; the patent classification number is obtained by identifying a text classification model, and the text classification model is a model which is obtained by using a published patent text to perform machine training and is used for identifying the belonging classification of the patent text;
sending a first retrieval request generated by the key text and the patent classification number to a server, wherein the first retrieval request is used for requesting retrieval of the first patent text;
and acquiring a first patent text list which is returned by the server and matched with the first patent text, wherein the text similarity between the object patent text contained in the first patent text list and the first patent text is greater than a first threshold value.
2. The method of claim 1, further comprising, prior to said sending to a server a first search request generated using said key text and said patent classification number:
displaying the obtained key texts and the patent classification numbers in the client, wherein the key texts comprise first abstract texts and first keyword sets, the first abstract texts are used for representing the first patent texts, and object keywords contained in the first keyword sets are keywords extracted from the first abstract texts;
and generating the first retrieval request by using the key text and the patent classification number.
3. The method of claim 2, further comprising, prior to said generating the first search request using the key text and the patent classification number:
acquiring a first adjusting instruction generated by editing operation executed in the client;
according to the first adjusting instruction, at least one of the following adjusting operations is executed: adjusting the first abstract text into a second abstract text, adjusting the first keyword set into a second keyword set, and adjusting the patent classification number into an adjusted patent classification number.
4. The method according to claim 2, wherein after the obtaining of the first patent text list matched with the first patent text returned by the server, further comprising:
acquiring a second adjusting instruction generated by editing operation executed in the client;
according to the second adjusting instruction, at least one of the following adjusting operations is executed: adjusting the first abstract text into a third abstract text, adjusting the first keyword set into a third keyword set, and adjusting the patent classification number into the adjusted patent classification number;
obtaining an adjustment result obtained according to the second adjustment instruction;
sending a second retrieval request generated by using the adjustment result to the server, wherein the second retrieval request is used for requesting retrieval of the first patent text;
and acquiring a second patent text list which is returned by the server and matched with the first patent text, wherein the text similarity between the object patent text contained in the second patent text list and the first patent text is greater than a second threshold, and the second threshold is greater than the first threshold.
5. The method according to claim 2, wherein the obtaining the key text extracted from the first patent text and the patent classification number matched with the first patent text comprises:
sending the first patent text to the server to perform text preprocessing on the first patent text;
and acquiring the key text and the patent classification number returned by the server.
6. The method of claim 5, wherein the obtaining the key text and the patent classification number returned by the server comprises:
the server carries out segmentation processing on the first patent text to obtain a text segment set corresponding to the first patent text;
the server extracts first text features of the text segment set through the text extraction model, and performs word meaning analysis and text recombination on the first patent text according to the first text features to obtain a first abstract text;
the server extracts the first keyword set from the first abstract text;
and the server extracts a second text feature of the text segment set through the text classification model and identifies the patent classification number of the first patent text according to the second text feature.
7. The method according to any one of claims 1 to 5, wherein the obtaining of the first patent text uploaded by the client comprises at least one of:
acquiring a selection instruction, wherein the selection instruction is used for indicating to select the first patent text stored in a target path; responding to the selection instruction, and uploading the first patent text;
acquiring a dragging instruction, wherein the dragging instruction is used for indicating that the first patent text is dragged to a target area of an interface displayed by the client; and responding to the dragging instruction, and uploading the first patent text.
8. A text retrieval method, comprising:
receiving a first patent text sent by a client;
acquiring a key text extracted from the first patent text and a patent classification number matched with the first patent text, wherein the key text is extracted from the first patent text through a text extraction model, and the text extraction model is a model for extracting the key text in the patent text, which is obtained after machine training is performed on a published patent text; the patent classification number is obtained by identifying a text classification model, and the text classification model is a model which is obtained by using a published patent text to perform machine training and is used for identifying the belonging classification of the patent text;
sending the key text and the patent classification number to the client;
receiving a first retrieval request generated by the client by using the key text and the patent classification number, wherein the first retrieval request is used for requesting retrieval of the first patent text;
and returning a first patent text list matched with the first patent text to the client, wherein the text similarity between the object patent text contained in the first patent text list and the first patent text is greater than a first threshold value.
9. The method of claim 8, wherein the obtaining the key text extracted from the first patent text and the patent classification number matched with the first patent text comprises:
segmenting the first patent text to obtain a text segment set corresponding to the first patent text;
extracting first text features of the text segment set through the text extraction model, and performing word meaning analysis and text recombination on the first patent text according to the first text features to obtain a first abstract text;
extracting a first keyword set from the first abstract text;
and extracting second text features of the text segment set through the text classification model, and identifying the patent classification number of the first patent text according to the second text features.
10. The method of claim 8, further comprising, prior to said returning to said client a list of first patent texts matching said first patent texts:
and in response to the first retrieval request, retrieving the first patent text list matched with the first patent text from a database through a text retrieval model, wherein the text retrieval model is a model which is obtained after machine training is carried out by using published patent texts and is used for text retrieval according to text similarity.
11. A text retrieval device, comprising:
the first acquiring unit is used for acquiring a first patent text uploaded by a client;
a second obtaining unit, configured to obtain a key text extracted from the first patent text and a patent classification number matched with the first patent text, where the key text is extracted from the first patent text by using a text extraction model, and the text extraction model is a model obtained after performing machine training using a published patent text and used for extracting the key text in the patent text; the patent classification number is obtained by identifying a text classification model, and the text classification model is a model which is obtained by using a published patent text to perform machine training and is used for identifying the belonging classification of the patent text;
a first sending unit, configured to send a first retrieval request generated by using the key text and the patent classification number to a server, where the first retrieval request is used to request to retrieve the first patent text;
a third obtaining unit, configured to obtain a first patent text list that is returned by the server and matches the first patent text, where a text similarity between an object patent text included in the first patent text list and the first patent text is greater than a first threshold.
12. The apparatus of claim 11, further comprising:
a display unit, configured to display, in the client, the obtained key text and the patent classification number before sending a first retrieval request generated by using the key text and the patent classification number to a server, where the key text includes a first abstract text and a first keyword set, the first abstract text is used to represent the first patent text, and object keywords included in the first keyword set are keywords extracted from the first abstract text;
and the generating unit is used for generating the first retrieval request by utilizing the key text and the patent classification number.
13. The apparatus of claim 12, further comprising:
a fourth obtaining unit, configured to obtain a first adjustment instruction generated by an editing operation performed in the client before the first retrieval request is generated by using the key text and the patent classification number;
a first adjusting unit, configured to perform at least one of the following adjusting operations according to the first adjusting instruction: adjusting the first abstract text into a second abstract text, adjusting the first keyword set into a second keyword set, and adjusting the patent classification number into an adjusted patent classification number.
14. The apparatus of claim 12, further comprising:
a fifth obtaining unit, configured to obtain, after obtaining the first patent text list that matches the first patent text and is returned by the server, a second adjustment instruction generated by an editing operation performed in the client;
a second adjusting unit, configured to perform at least one of the following adjusting operations according to the second adjusting instruction: adjusting the first abstract text into a third abstract text, adjusting the first keyword set into a third keyword set, and adjusting the patent classification number into the adjusted patent classification number;
a sixth obtaining unit, configured to obtain an adjustment result according to the second adjustment instruction;
a second sending unit, configured to send, to the server, a second retrieval request generated using the adjustment result, where the second retrieval request is used to request to retrieve the first patent text;
a seventh obtaining unit, configured to obtain a second patent text list that is returned by the server and matches the first patent text, where a text similarity between an object patent text included in the second patent text list and the first patent text is greater than a second threshold, and the second threshold is greater than the first threshold.
15. The apparatus of claim 12, wherein the second obtaining unit comprises:
the sending module is used for sending the first patent text to the server so as to perform text preprocessing on the first patent text;
the first obtaining module is used for obtaining the key text and the patent classification number returned by the server.
16. The apparatus of claim 15, wherein the obtaining the key text and the patent classification number returned by the server comprises:
the server carries out segmentation processing on the first patent text to obtain a text segment set corresponding to the first patent text;
the server extracts first text features of the text segment set through the text extraction model, and performs word meaning analysis and text recombination on the first patent text according to the first text features to obtain a first abstract text;
the server extracts the first keyword set from the first abstract text;
and the server extracts a second text feature of the text segment set through the text classification model and identifies the patent classification number of the first patent text according to the second text feature.
17. The apparatus according to any one of claims 11 to 15, wherein the first obtaining unit comprises at least one of:
the second acquisition module is used for acquiring a selection instruction, wherein the selection instruction is used for indicating to select the first patent text stored in the target path; responding to the selection instruction, and uploading the first patent text;
the third acquisition module is used for acquiring a dragging instruction, wherein the dragging instruction is used for indicating that the first patent text is dragged to a target area of an interface displayed by the client; and responding to the dragging instruction, and uploading the first patent text.
18. A text retrieval device, comprising:
the first receiving unit is used for receiving a first patent text sent by a client;
the acquiring unit is used for acquiring a key text extracted from the first patent text and a patent classification number matched with the first patent text, wherein the key text is extracted from the first patent text through a text extraction model, and the text extraction model is a model for extracting the key text in the patent text, which is obtained after machine training is carried out on the published patent text; the patent classification number is obtained by identifying a text classification model, and the text classification model is a model which is obtained by using a published patent text to perform machine training and is used for identifying the belonging classification of the patent text;
a sending unit, configured to send the key text and the patent classification number to the client;
a second receiving unit, configured to receive a first retrieval request generated by the client using the key text and the patent classification number, where the first retrieval request is used to request to retrieve the first patent text;
and a returning unit, configured to return a first patent text list matched with the first patent text to the client, where text similarity between an object patent text included in the first patent text list and the first patent text is greater than a first threshold.
19. The apparatus of claim 18, wherein the obtaining unit comprises:
the processing module is used for carrying out segmentation processing on the first patent text to obtain a text segment set corresponding to the first patent text;
the first extraction module is used for extracting first text features of the text segment set through the text extraction model, and performing word meaning analysis and text recombination on the first patent text according to the first text features to obtain a first abstract text;
the second extraction module is used for extracting a first keyword set from the first abstract text;
and the third extraction module is used for extracting a second text feature of the text segment set through the text classification model and identifying the patent classification number of the first patent text according to the second text feature.
20. The apparatus of claim 18, further comprising:
and a retrieving unit, configured to, before the first patent text list matched with the first patent text is returned to the client, retrieve, in response to the first retrieval request, the first patent text list matched with the first patent text from a database through a text retrieval model, where the text retrieval model is a model obtained by machine training on published patent texts and used for performing text retrieval according to text similarity.
21. A storage medium comprising a stored program, wherein the program when executed performs the method of any of claims 1 to 7 or 8 to 10.
22. An electronic device comprising a memory and a processor, characterized in that the memory has stored therein a computer program, the processor being arranged to execute the method of any of claims 1 to 7 or 8 to 10 by means of the computer program.
CN201811069929.4A 2018-09-13 2018-09-13 Text retrieval method and device, storage medium and electronic device Active CN110895556B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811069929.4A CN110895556B (en) 2018-09-13 2018-09-13 Text retrieval method and device, storage medium and electronic device

Publications (2)

Publication Number Publication Date
CN110895556A true CN110895556A (en) 2020-03-20
CN110895556B CN110895556B (en) 2023-07-28

Family

ID=69785761

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811069929.4A Active CN110895556B (en) 2018-09-13 2018-09-13 Text retrieval method and device, storage medium and electronic device

Country Status (1)

Country Link
CN (1) CN110895556B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070244881A1 (en) * 2006-04-13 2007-10-18 Lg Electronics Inc. System, method and user interface for retrieving documents
JP2008071198A (en) * 2006-09-14 2008-03-27 Ricoh Co Ltd Document retrieval device, document retrieval method, document retrieval program and storage medium
CN101276340A (en) * 2007-03-29 2008-10-01 上海汉光知识产权数据科技有限公司 Patent data retrieval system
CN106156111A (en) * 2015-04-03 2016-11-23 北京中知智慧科技有限公司 Patent document search method, device and system
CN106372226A (en) * 2016-09-07 2017-02-01 知识产权出版社有限责任公司 Information retrieval device and method
US20170075877A1 (en) * 2015-09-16 2017-03-16 Marie-Therese LEPELTIER Methods and systems of handling patent claims
CN107247780A (en) * 2017-06-12 2017-10-13 北京理工大学 A kind of patent document method for measuring similarity of knowledge based body

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
张晓云: "网络环境下的专利信息检索", 图书馆工作与研究 *
马双刚: "基于深度学习理论与方法的中文专利文本自动分类研究", 《中国优秀博硕士学位论文全文数据库(硕士)信息科技辑》 *

Also Published As

Publication number Publication date
CN110895556B (en) 2023-07-28

Legal Events

Date Code Title Description
PB01 Publication
TA01 Transfer of patent application right
Effective date of registration: 20210107
Address after: 17c, 14 / F, unit 3, building 3, No.48, Zhichun Road, Haidian District, Beijing 100098
Applicant after: Beijing Blue lantern fish Intelligent Technology Co.,Ltd.
Address before: 1411 Junyue Pavilion, 9 Yannan Road, Fuqiang community, Huaqiangbei street, Futian District, Shenzhen, Guangdong 518031
Applicant before: Shenzhen Blue Lantern Fish Intelligent Technology Co.,Ltd.
SE01 Entry into force of request for substantive examination
GR01 Patent grant