CN112507109A - Retrieval method and device based on semantic analysis and keyword recognition

Info

Publication number: CN112507109A
Application number: CN202011442031.4A
Authority: CN
Inventors: 刘伟; 刘灿; 吴永杰; 钟延珍; 陈善雄; 李莉; 李磊; 王雪春; 王仲煜
Original assignee: Chongqing Intellectual Property Big Data Research Institute Co ltd
Current assignee: Chongqing Intellectual Property Big Data Research Institute Co ltd
Priority date: 2020-12-11
Filing date: 2020-12-11
Publication date: 2021-03-16

Abstract

The invention provides a retrieval method and device based on semantic analysis and keyword recognition. The method comprises the following steps: extracting patent keywords from patent texts with the TextRank algorithm to obtain a patent keyword data set, and converting the data set into vectors with the ELMo dynamic word vector conversion algorithm to obtain a patent keyword vector set; determining weights for the title, abstract, first claim and technical effect sentences of each patent text through the analytic hierarchy process; matching the keywords to be retrieved against the keywords in the index information, from high weight to low weight, to obtain a matching keyword vector set; inputting the matching keyword vector set into a weight model and calculating the weight value of each corresponding patent text; and performing TOP-K sorting on the weight values to form retrieval results that are presented to the user side. The invention can expand the coverage of related patents and, by performing semantic analysis and keyword recognition on the content of the patent text, improve the relevance of the retrieval results.

Description

Retrieval method and device based on semantic analysis and keyword recognition

Technical Field

The invention relates to the technical field of patent information, in particular to a retrieval method and a retrieval device based on semantic analysis and keyword recognition.

Background

The task of patent retrieval is to find the patent information that best meets a user's requirements according to the conditions the user provides. With the advent of the big data age, patent retrieval has become an important research topic in the field of information retrieval. Patent retrieval differs from traditional information retrieval tasks in that the retrieval object is a patent text, which has its own particular characteristics. A patent text has many attributes, such as IPC classification numbers, number of claims, technical efficacy, legal status and invention type, and making good use of these data often requires specialized personnel. Patent texts also integrate many kinds of information, are technically sensitive, and cover a wide range of subjects and regions. A patent retrieval model that fully considers these characteristics can improve retrieval efficiency, provide important help for scientific research and for social and economic activities, and promote the development of disciplines and the progress of science and technology.

The main existing patent retrieval approaches are as follows:

(1) a patent retrieval method based on a topic model and a language model, which first constructs a candidate set containing the relevant patents returned by an initial query, and then ranks the screened candidate patents using the language model and the topic model (LDA and DMR) according to a proposed weight evaluation standard;

(2) a patent retrieval method based on citation relationships, in which the associations between objects, the development path of a technical route and the like can be seen by making reasonable use of citations. For example, Fujii uses citation information between patents to calculate the correlation between patent documents and thereby expand the patent retrieval result. Starting from patent citation relationships, Mahdabi and Crestani construct a patent citation network and propose a time-aware PageRank algorithm on the basis of that network;

(3) a patent retrieval method based on query expansion, which is the most commonly used method in the patent retrieval field and mainly addresses the low recall caused by ambiguity or vagueness in the initial query. For example, the query expansion method based on positional nearest neighbors uses IPC descriptions as an expansion dictionary to expand the query words; its main idea is to measure the closeness between a candidate word and a query word by the distance between them in the text;

(4) a patent retrieval method based on ontologies, which applies ontology modeling to the description of patent information. By establishing a patent retrieval information association ontology, it resolves the problem of one concept having multiple descriptions in the patent information database; the patent information ontology, its instances and the data in the patent database are then associated, and the self-organizing optimization of patent retrieval information and the ranking of patent resources are realized by combining the self-organizing evolution process and method of the ontology.

However, approaches (1) and (2) can hardly cover all related patents, because: [1] different applicants, and even experts, may use different terms to describe the same technology; [2] applicants sometimes wish to keep a low profile and, to avoid drawing too much attention to their own patents, deliberately choose rare words to describe their technology; [3] an immature technology is not yet standardized during its development and has no uniform name; [4] there is no uniform standard for translating patents between different countries;

approach (3) processes only the text of patent documents, yet patent documents contain much more than text and often include non-text information such as drawings and diagrams. Because the fields involved are so broad, this information cannot currently be processed and can only be ignored, even though it is important for understanding the content of a patent document. In addition, because the IPC classification scheme is very complex, some patent documents are not classified thoroughly, documents that cannot be classified or that are cross-classified are occasionally encountered, and the demands on training sets and classification algorithms are high, all of which greatly increases the difficulty of correcting these problems;

the method for rapidly constructing a domain ontology proposed in approach (4) depends on an existing, well-developed relational corpus. However, patent data covers every industry in today's society, the fields differ greatly from one another, and finding a corpus for each field is almost impossible. Finding a suitable ontology construction method that integrates all fields therefore remains a difficult task.

Disclosure of Invention

In view of the foregoing, it is desirable to provide a search method and apparatus based on semantic analysis and keyword recognition.

A retrieval method based on semantic analysis and keyword recognition comprises the following steps:

acquiring search information, wherein the search information comprises keywords to be retrieved; acquiring a patent text from a patent database, extracting patent keywords from the patent text according to a Textrank algorithm, and acquiring a patent keyword data set; performing vector conversion on the patent keyword data set through an Elmo dynamic word vector conversion algorithm to obtain a patent keyword vector set; determining relevant weights for each index information of a patent text through an analytic hierarchy process, matching keywords in the index information from high weights to low weights according to the keywords to be retrieved, and acquiring a matched keyword vector set, wherein the index information comprises: title, abstract, first claim and technical effect clauses; inputting the matched keyword vector set into a weight model, and calculating the weight value of the patent text corresponding to the matched keyword vector set; and carrying out TOP-K sorting according to the weight value of the patent text to form and present a retrieval result.

In one embodiment, extracting patent keywords from the patent text according to the TextRank algorithm to obtain a patent keyword data set specifically includes: segmenting the patent text into words and filtering out stop words to obtain candidate keywords; constructing a candidate keyword graph G = (V, E), wherein V is a node set consisting of the candidate keywords; constructing an edge between any two nodes by utilizing a co-occurrence relation; iteratively updating the weight value of each node according to a weight update formula until the weight value of each node converges to within a preset range, at which point the weight value obtained by the last update is taken as the weight value of the node; sorting the weight values of the nodes in descending order, wherein the keywords corresponding to the nodes ranked within a preset order are important keywords; and marking the important keywords in the corresponding patent texts, and constructing a patent keyword data set from the important keywords.

In one embodiment, the weight update formula is:

$$WS(V_i) = (1 - d) + d \sum_{V_j \in In(V_i)} \frac{\omega_{ji}}{\sum_{V_k \in Out(V_j)} \omega_{jk}} \, WS(V_j)$$

wherein V_i and V_j denote nodes, WS(V_i) denotes the weight value of node V_i, d is a damping coefficient, generally set to 0.85, In(V_i) denotes the set of sentences in which keyword i is present, Out(V_j) denotes the set of sentences in which keyword j is present, and the weight term ω_ji is the weight of the edge, i.e., the similarity between sentences.

In one embodiment, the vector conversion of the patent keyword data set by the Elmo dynamic word vector conversion algorithm to obtain a patent keyword vector set specifically includes: constructing a word vector conversion model through a double-layer biLSTM structure; training the word vector transformation model in a dataset; and inputting the patent keyword data set into the word vector conversion model, and outputting the word vectors of the corresponding patent keywords according to the patent keywords to obtain a patent keyword vector set.

In one embodiment, inputting the patent keyword data set into the word vector conversion model and outputting corresponding patent keyword vectors according to the patent keywords to obtain a patent keyword vector set specifically includes: giving a sentence that contains a corresponding keyword data set; looking up the word vectors E(1), ..., E(N) corresponding to the keywords in a static keyword vector table according to the keyword data set, and inputting them into the word vector conversion model, wherein the word vector conversion model comprises a first-layer forward LSTM, a first-layer backward LSTM, a second-layer forward LSTM and a second-layer backward LSTM; inputting the keyword vectors E(1), ..., E(N) into the first-layer forward LSTM and the first-layer backward LSTM respectively, to obtain forward outputs h(1,1,→), ..., h(N,1,→) and backward outputs h(1,1,←), ..., h(N,1,←); passing the forward outputs h(1,1,→), ..., h(N,1,→) into the second-layer forward LSTM to obtain second-layer forward outputs h(1,2,→), ..., h(N,2,→); passing the backward outputs h(1,1,←), ..., h(N,1,←) into the second-layer backward LSTM to obtain second-layer backward outputs h(1,2,←), ..., h(N,2,←); the word vectors finally obtained for keyword i include E(i), h(i,1,→), h(i,1,←), h(i,2,→) and h(i,2,←).

In one embodiment, the inputting the matching keyword vector set into a weight model, and calculating a weight value of the patent text corresponding to the matching keyword vector set specifically includes: presetting a keyword similarity threshold U; recording the times of keyword retrieval as n, and using x, y, z and h to count the number of words with keyword similarity larger than U; calculating the weight value of the corresponding keyword according to a weight calculation formula; the weight calculation formula is as follows:

wherein w₁, w₂, w₃ and w₄ each represent a corresponding ranking weight vector for the keyword.

A retrieval device based on semantic analysis and keyword recognition comprises:

the information acquisition module is used for acquiring search information, and the search information comprises keywords to be retrieved; the keyword extraction module is used for acquiring patent texts from a patent database, extracting patent keywords from the patent texts according to a Textrank algorithm and acquiring a patent keyword data set; the vector conversion module is used for carrying out vector conversion on the patent keyword data set through an Elmo dynamic word vector conversion algorithm to obtain a patent keyword vector set; the keyword matching module is used for determining relevant weights for each index information of the patent text through an analytic hierarchy process, matching keywords in the index information from high weights to low weights according to the keywords to be retrieved, and acquiring a matching keyword vector set, wherein the index information comprises: title, abstract, first claim and technical effect clauses; the weight calculation module is used for inputting the matching keyword vector set into a weight model and calculating the weight value of the patent text corresponding to the matching keyword vector set; and the text sorting module is used for carrying out TOP-K sorting according to the weight value of the patent text to form a retrieval result and presenting the retrieval result to the user side.

Compared with the prior art, the invention has the following advantages and beneficial effects. Search information of a user, which includes the keywords to be retrieved, is first obtained. Patent texts are obtained from a patent database, and patent keywords are extracted from them according to the TextRank algorithm to obtain a patent keyword data set, which is converted by the ELMo dynamic word vector conversion algorithm into a patent keyword vector set. Relevant weights are determined for each index information of the patent texts through the analytic hierarchy process, that is, the weights corresponding to the title, abstract, first claim and technical effect sentences are determined, and keywords are matched from high weight to low weight to obtain a matching keyword vector set. The matching keyword vector set is input into a weight model, which calculates the weight value of each corresponding patent text relative to the keywords to be retrieved. TOP-K sorting is then performed on the weight values from high to low, and the retrieval result is formed and presented to the user side. In this way the coverage of related patents can be expanded, and semantic analysis and keyword recognition are performed on the content of the patent text, thereby improving the relevance of the retrieval result.

Drawings

FIG. 1 is a schematic flow chart illustrating a search method based on semantic analysis and keyword recognition according to an embodiment;

FIG. 2 is a diagram illustrating a structure of a word vector transformation model in one embodiment;

FIG. 3 is a schematic structural diagram of a retrieval device based on semantic analysis and keyword recognition according to an embodiment.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings by way of specific embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

In one embodiment, as shown in fig. 1, there is provided a search method based on semantic analysis and keyword recognition, including the following steps:

Step S101, obtaining search information, wherein the search information comprises keywords to be retrieved.

Specifically, the user may input search information for retrieving a patent in a web page or an application, the search information including a keyword to be retrieved.

Step S102, obtaining patent texts from a patent database, extracting patent keywords from the patent texts according to a Textrank algorithm, and obtaining a patent keyword data set.

Specifically, patent texts are obtained from a patent database, which can be the patent database of an existing website or a curated high-value patent database, and patent keywords are extracted from the patent texts according to the TextRank algorithm to obtain a patent keyword data set.

The Textrank algorithm, i.e. the text sorting algorithm, is used to generate keywords and summaries for the text.

Step S103, performing vector conversion on the patent keyword data set through an ELMo dynamic word vector conversion algorithm to obtain a patent keyword vector set.

Specifically, the ELMo (Embeddings from Language Models, i.e., word vector representations derived from language models) dynamic word vector conversion algorithm performs vector conversion on the obtained patent keyword data set to obtain a corresponding patent keyword vector set.

Step S104, determining relevant weights for each index information of the patent text through an analytic hierarchy process, matching keywords in the index information from high weights to low weights according to the keywords to be retrieved, and acquiring a matching keyword vector set, wherein the index information comprises: the title, abstract, first claim and technical effect sentences.

Specifically, the related weight of the title, the abstract, the first claim and the technical effect sentence in the patent text to the patent text is determined through an analytic hierarchy process; and matching keywords in the titles, abstracts, first claims and technical effect sentences of the patent texts from high weight to low weight according to the keywords to be retrieved to obtain a matching keyword vector set.
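As an illustration of how an analytic hierarchy process could yield such weights, the following minimal sketch derives one weight per index field from a pairwise comparison matrix by power iteration. The comparison values in `PAIRWISE`, the field names in `FIELDS` and the function name `ahp_weights` are hypothetical; the source does not state which judgments are used or which field receives the highest weight.

```python
def ahp_weights(pairwise, iterations=100):
    """Approximate the principal eigenvector of a pairwise comparison
    matrix by power iteration and normalize it so the weights sum to 1."""
    n = len(pairwise)
    w = [1.0 / n] * n
    for _ in range(iterations):
        nxt = [sum(pairwise[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(nxt)
        w = [v / total for v in nxt]
    return w


# Hypothetical index fields and hypothetical expert judgments:
# PAIRWISE[i][j] states how much more important field i is than field j.
FIELDS = ["title", "abstract", "first_claim", "technical_effect"]
PAIRWISE = [
    [1.0, 2.0, 3.0, 4.0],
    [1 / 2, 1.0, 2.0, 3.0],
    [1 / 3, 1 / 2, 1.0, 2.0],
    [1 / 4, 1 / 3, 1 / 2, 1.0],
]

weights = dict(zip(FIELDS, ahp_weights(PAIRWISE)))

# Fields are visited from high weight to low weight when matching keywords.
match_order = sorted(FIELDS, key=weights.get, reverse=True)
```

With these assumed judgments the title is matched first and the technical effect sentences last; a different comparison matrix would give a different order.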

Step S105, inputting the matched keyword vector set into the weight model, and calculating the weight value of the patent text corresponding to the matched keyword vector set.

Specifically, a weight model may be constructed according to an analytic hierarchy process, the matching keyword vector set is input to the weight model, and a weight value of the patent text corresponding to the matching keyword vector set is calculated.

Step S106, performing TOP-K sorting according to the weight value of the patent text to form a retrieval result and presenting the retrieval result to the user side.

Specifically, TOP-K sorting is performed according to the weighted value of the patent text, wherein the K value can be set according to the needs of the user, and the sorted patent text is taken as a retrieval result and presented to a retrieval page of the user.
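The TOP-K step can be sketched as follows, assuming the weight values have already been computed and are held in a mapping from patent identifier to weight value. The function name `top_k` and the identifiers in the usage lines are hypothetical.

```python
import heapq


def top_k(patent_weights, k):
    """Return the k (patent_id, weight) pairs with the largest weights,
    ordered from high to low."""
    return heapq.nlargest(k, patent_weights.items(), key=lambda item: item[1])


# Hypothetical weight values for three patent texts.
example = {"patent_A": 0.4, "patent_B": 0.9, "patent_C": 0.7}
result = top_k(example, 2)
```

`heapq.nlargest` avoids sorting the whole collection when K is much smaller than the number of patent texts.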

In this embodiment, search information of a user, which includes the keywords to be retrieved, is first obtained. Patent texts are obtained from a patent database, and patent keywords are extracted from them according to the TextRank algorithm to obtain a patent keyword data set, which is converted by the ELMo dynamic word vector conversion algorithm into a patent keyword vector set. Relevant weights are determined for each index information of the patent texts through the analytic hierarchy process, that is, the weights corresponding to the title, abstract, first claim and technical effect sentences are determined, and keywords are matched from high weight to low weight to obtain a matching keyword vector set. The matching keyword vector set is input into a weight model, which calculates the weight value of each corresponding patent text relative to the keywords to be retrieved. TOP-K sorting is then performed on the weight values from high to low, and the sorted result is presented as the retrieval result. In this way the coverage of related patents can be improved, and semantic analysis and keyword recognition are performed on the content of the patent text, thereby improving the relevance of the retrieval result.

In one embodiment, the relationship between patents can be obtained by constructing a patent knowledge graph, and query expansion and the like can be performed.

Wherein, step S102 specifically includes: segmenting the patent text into words and filtering out stop words to obtain candidate keywords; constructing a candidate keyword graph G = (V, E), wherein V is a node set consisting of the candidate keywords; constructing an edge between any two nodes by utilizing a co-occurrence relation; iteratively updating the weight value of each node according to a weight update formula until the weight value of each node converges to within a preset range, at which point the weight value obtained by the last update is taken as the weight value of the node; sorting the weight values of the nodes in descending order, wherein the keywords corresponding to the nodes ranked within a preset order are important keywords; and marking the important keywords in the corresponding patent texts, and constructing a patent keyword data set from the important keywords.

Specifically, when the weight value of a node keeps fluctuating within a certain range, that range can be taken as the preset range; the weight update of the node is then stopped, and the most recently updated weight value is used as the weight value of the node.

Descending order means that the weight values are sorted from large to small, and the keywords corresponding to the nodes ranked within the preset order are the important keywords. The preset order can be set according to actual needs; for example, it can be set to the top five, in which case the keywords whose node weight values rank in the top five are taken as the important keywords.

Specifically, once the important keywords are marked in the patent text, the user can check the patent text conveniently when the retrieval result is presented, which speeds up reading of the patent document.

Wherein, the weight updating formula is as follows:

$$WS(V_i) = (1 - d) + d \sum_{V_j \in In(V_i)} \frac{\omega_{ji}}{\sum_{V_k \in Out(V_j)} \omega_{jk}} \, WS(V_j)$$

wherein V_i and V_j denote nodes, WS(V_i) denotes the weight value of node V_i, d is a damping coefficient, generally set to 0.85, In(V_i) denotes the set of sentences in which keyword i is present, Out(V_j) denotes the set of sentences in which keyword j is present, and the weight term ω_ji is the weight of the edge, i.e., the similarity between sentences.
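The keyword extraction of step S102 can be sketched as below. This is a minimal illustration under stated assumptions, not the patented implementation: the input is assumed to be already segmented and stop-word filtered, edges are built from co-occurrence within a sliding window, the edge weight is the co-occurrence count, and the update is the standard TextRank rule with damping coefficient d = 0.85. The names `textrank_keywords`, `window`, `tol` and `top_n` are hypothetical.

```python
from collections import defaultdict


def textrank_keywords(tokens, window=2, d=0.85, tol=1e-6, max_iter=100, top_n=5):
    """Rank candidate keywords with a TextRank-style weight update."""
    # Undirected co-occurrence graph: tokens at most `window` positions
    # apart share an edge whose weight counts their co-occurrences.
    weights = defaultdict(lambda: defaultdict(float))
    for i, u in enumerate(tokens):
        for v in tokens[i + 1 : i + 1 + window]:
            if u != v:
                weights[u][v] += 1.0
                weights[v][u] += 1.0
    nodes = sorted(weights)
    if not nodes:
        return [], {}
    score = {n: 1.0 for n in nodes}
    out_sum = {n: sum(weights[n].values()) for n in nodes}
    for _ in range(max_iter):
        new = {}
        for i in nodes:
            # Each neighbor j passes on a share of its score that is
            # proportional to the weight of the edge between j and i.
            rank = sum(weights[j][i] / out_sum[j] * score[j] for j in weights[i])
            new[i] = (1 - d) + d * rank
        delta = max(abs(new[n] - score[n]) for n in nodes)
        score = new
        if delta < tol:  # weight values have settled within the preset range
            break
    ranked = sorted(nodes, key=lambda n: score[n], reverse=True)
    return ranked[:top_n], score


# Hypothetical candidate keywords after segmentation and stop-word filtering.
tokens = ["patent", "retrieval", "keyword", "patent",
          "semantic", "keyword", "patent", "vector"]
top, scores = textrank_keywords(tokens, top_n=2)
```

Here `top_n` plays the role of the preset order: setting it to five would return the keywords whose node weight values rank in the top five.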

Wherein, step S103 specifically includes: constructing a word vector conversion model through a double-layer biLSTM structure; training the word vector conversion model on a data set; and inputting the patent keyword data set into the word vector conversion model, outputting word vectors of the corresponding patent keywords according to the patent keywords, and acquiring a patent keyword vector set.

FIG. 2 is a schematic structural diagram of the word vector conversion model, wherein 10 is a forward language model (LSTM), 20 is a backward language model (LSTM), and T₁, T₂, ..., T_N represent the remaining keyword data that has not been vector converted.

Specifically, the biLSTM (bidirectional LSTM) has both a forward LSTM and a backward LSTM, and can therefore learn word vectors that store both the preceding context and the following context. The word vectors obtained at different layers of the biLSTM emphasize different information: the CNN-BIG-LSTM word vector used at the input layer better encodes part-of-speech information, the first LSTM layer better encodes syntactic information, and the second LSTM layer better encodes word-sense information. The final word vector is obtained by fusing the word vectors of the multiple layers, and thus takes into account the information of the different layers.

Inputting the patent keyword data set into the word vector conversion model, outputting corresponding patent keyword vectors according to the patent keywords, and obtaining a patent keyword vector set specifically includes: giving a sentence that contains a corresponding keyword data set; looking up the word vectors E(1), ..., E(N) corresponding to the keywords in a static keyword vector table according to the keyword data set, and inputting them into the word vector conversion model, wherein the word vector conversion model comprises a first-layer forward LSTM, a first-layer backward LSTM, a second-layer forward LSTM and a second-layer backward LSTM; inputting the keyword vectors E(1), ..., E(N) into the first-layer forward LSTM and the first-layer backward LSTM respectively, to obtain forward outputs h(1,1,→), ..., h(N,1,→) and backward outputs h(1,1,←), ..., h(N,1,←); passing the forward outputs h(1,1,→), ..., h(N,1,→) into the second-layer forward LSTM to obtain second-layer forward outputs h(1,2,→), ..., h(N,2,→); passing the backward outputs h(1,1,←), ..., h(N,1,←) into the second-layer backward LSTM to obtain second-layer backward outputs h(1,2,←), ..., h(N,2,←); the word vectors finally obtained for keyword i include E(i), h(i,1,→), h(i,1,←), h(i,2,→) and h(i,2,←).

Specifically, the static keyword vector table may be obtained by a static word vector algorithm, such as the Word2vec (word to vector) algorithm or the GloVe (Global Vectors for Word Representation) algorithm.

Specifically, if a biLSTM of L layers is employed, 2L +1 word vectors can be finally obtained.
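The count of 2L + 1 vectors per keyword can be illustrated with a toy data-flow sketch. The recurrent pass below is a running mean, used only as a stand-in for an LSTM so that the example stays self-contained; it shows how forward and backward outputs are stacked layer by layer, not how a trained model behaves. The names `run_layer` and `collect_representations` are hypothetical.

```python
def run_layer(inputs, reverse=False):
    """Toy recurrent pass: each output is the running mean of the inputs
    seen so far in the chosen direction. A stand-in for an LSTM layer."""
    seq = list(reversed(inputs)) if reverse else list(inputs)
    outputs, state = [], None
    for t, x in enumerate(seq, start=1):
        if state is None:
            state = list(x)
        else:
            state = [(s * (t - 1) + xi) / t for s, xi in zip(state, x)]
        outputs.append(state)
    # Restore the original word order for backward passes.
    return list(reversed(outputs)) if reverse else outputs


def collect_representations(static_vectors, num_layers=2):
    """For each word i return [E(i), h(i,1,->), h(i,1,<-), h(i,2,->), ...]."""
    fwd = bwd = static_vectors
    per_word = [[e] for e in static_vectors]
    for _ in range(num_layers):
        fwd = run_layer(fwd)                # layer k forward reads layer k-1 forward
        bwd = run_layer(bwd, reverse=True)  # layer k backward reads layer k-1 backward
        for i, reps in enumerate(per_word):
            reps.append(fwd[i])
            reps.append(bwd[i])
    return per_word


# Hypothetical static vectors E(1), E(2), E(3) for a three-keyword sentence.
E = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
reps = collect_representations(E, num_layers=2)
```

With `num_layers=2` every keyword ends up with five vectors: the static vector plus one forward and one backward output per layer.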

Wherein, step S105 specifically includes: presetting a keyword similarity threshold U; and recording the number of times of searching the keywords as n, using x, y, z and h to count the number of words with the keyword similarity larger than U, and calculating the weight value of the corresponding keyword according to a weight calculation formula.
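The counting part of step S105 can be sketched as follows. The sketch assumes that keyword similarity is cosine similarity between word vectors, which the text does not specify, and it stops at the counts x, y, z and h without combining them. The field names, the vectors and the function names `cosine` and `count_matches` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def count_matches(query_vec, field_vectors, threshold):
    """Count, per index field, the keyword vectors whose similarity to the
    query keyword vector is larger than the threshold U."""
    return {
        field: sum(1 for v in vecs if cosine(query_vec, v) > threshold)
        for field, vecs in field_vectors.items()
    }


# Hypothetical keyword vectors grouped by index field.
fields = {
    "title": [[1.0, 0.0], [0.0, 1.0]],
    "abstract": [[0.9, 0.1]],
    "first_claim": [],
    "technical_effect": [[0.0, 1.0]],
}
counts = count_matches([1.0, 0.0], fields, threshold=0.8)
x, y, z, h = (counts[f] for f in ("title", "abstract", "first_claim", "technical_effect"))
```

Raising the threshold U makes matching stricter and lowers the counts, so U trades recall against precision.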

Specifically, the weight calculation formula is:

In one embodiment, as shown in fig. 3, there is provided a retrieval apparatus 30 based on semantic analysis and keyword recognition, including: an information obtaining module 31, a keyword extracting module 32, a vector converting module 33, a keyword matching module 34, a weight calculating module 35, and a text sorting module 36, wherein:

the information acquisition module 31 is configured to acquire search information, where the search information includes a keyword to be retrieved;

the keyword extraction module 32 is configured to obtain a patent text from a patent database, extract patent keywords from the patent text according to a Textrank algorithm, and obtain a patent keyword dataset;

the vector conversion module 33 is configured to perform vector conversion on the patent keyword data set through an Elmo dynamic word vector conversion algorithm to obtain a patent keyword vector set;

the keyword matching module 34 is configured to determine a relevant weight for each index information of the patent text by an analytic hierarchy process, and match keywords in the index information from a high weight to a low weight according to the keyword to be retrieved to obtain a matching keyword vector set, where the index information includes: title, abstract, first claim and technical effect clauses;

the weight calculation module 35 is configured to input the matching keyword vector set into a weight model, and calculate a weight value of the patent text corresponding to the matching keyword vector set;

and the text sorting module 36 is configured to perform TOP-K sorting according to the weight values of the patent texts, form a retrieval result, and present the retrieval result to the user side.

In one embodiment, the keyword extraction module 32 is further configured to: segment the patent text into words and filter out stop words to obtain candidate keywords; construct a candidate keyword graph G = (V, E), wherein V is a node set consisting of the candidate keywords; construct an edge between any two nodes by utilizing a co-occurrence relation; iteratively update the weight value of each node according to a weight update formula until the weight value of each node converges to within a preset range, at which point the weight value obtained by the last update is taken as the weight value of the node; sort the weight values of the nodes in descending order, wherein the keywords corresponding to the nodes ranked within a preset order are important keywords; and mark the important keywords in the corresponding patent texts, and construct a patent keyword data set from the important keywords.

In one embodiment, the vector conversion module 33 is further configured to: constructing a word vector conversion model through a double-layer biLSTM structure; training the word vector transformation model in a dataset; and inputting the patent keyword data set into the word vector conversion model, and outputting the word vectors of the corresponding patent keywords according to the patent keywords to obtain a patent keyword vector set.

The foregoing is a more detailed description of the present invention that is presented in conjunction with specific embodiments, and the practice of the invention is not to be considered limited to those descriptions. For those skilled in the art to which the invention pertains, several simple deductions or substitutions can be made without departing from the spirit of the invention, and all shall be considered as belonging to the protection scope of the invention.

Claims

1. A retrieval method based on semantic analysis and keyword recognition is characterized by comprising the following steps:

acquiring search information, wherein the search information comprises keywords to be retrieved;

acquiring a patent text from a patent database, extracting patent keywords from the patent text according to a Textrank algorithm, and acquiring a patent keyword data set;

performing vector conversion on the patent keyword data set through an Elmo dynamic word vector conversion algorithm to obtain a patent keyword vector set;

determining relevant weights for each index information of a patent text through an analytic hierarchy process, matching keywords in the index information from high weights to low weights according to the keywords to be retrieved, and acquiring a matched keyword vector set, wherein the index information comprises: title, abstract, first claim and technical effect clauses;

inputting the matched keyword vector set into a weight model, and calculating the weight value of the patent text corresponding to the matched keyword vector set;

and carrying out TOP-K sorting according to the weight value of the patent text to form a retrieval result and presenting the retrieval result to the user side.

2. The retrieval method based on semantic analysis and keyword recognition according to claim 1, wherein the extracting patent keywords from the patent text according to Textrank algorithm to obtain a patent keyword dataset specifically comprises:

segmenting the patent text and filtering out stop words to obtain candidate keywords;

constructing a candidate keyword graph G = (V, E), wherein V is a node set consisting of the candidate keywords;

constructing an edge between any two nodes by utilizing a co-occurrence relation;

iteratively updating the weight value of each node according to a weight update formula until the weight value of each node converges to within a given range, the weight value obtained by the last update being taken as the weight value of the node;

sorting the weight values of the nodes in descending order, wherein the keywords corresponding to the nodes ranked within a preset number of top positions are the important keywords;

and marking the important keywords in the corresponding patent texts, and constructing the patent keyword data set from the important keywords.
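
The extraction steps above can be sketched as follows. This is a minimal illustration under stated assumptions: it uses the standard unweighted TextRank update, score(v) = (1 - d) + d * sum over neighbours u of score(u) / degree(u), which may differ from the weight update formula of this method; the function name `textrank_keywords`, the damping factor 0.85, the window size and the tolerance are all hypothetical choices; and the input is assumed to be already segmented with stop words removed.

```python
from collections import defaultdict


def textrank_keywords(tokens, window=2, damping=0.85, tol=1e-6, max_iter=200, top_n=3):
    """Rank candidate keywords with unweighted TextRank on a co-occurrence graph."""
    # Connect each token to the `window` tokens that follow it (undirected edges).
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)
    if not neighbors:
        # No co-occurrence edges: fall back to first-seen order.
        return list(dict.fromkeys(tokens))[:top_n]

    scores = {word: 1.0 for word in neighbors}
    for _ in range(max_iter):
        new_scores = {}
        for word in neighbors:
            rank = sum(scores[nb] / len(neighbors[nb]) for nb in neighbors[word])
            new_scores[word] = (1 - damping) + damping * rank
        # Stop once no node changed by more than the tolerance.
        converged = max(abs(new_scores[w] - scores[w]) for w in scores) < tol
        scores = new_scores
        if converged:
            break
    # Descending order of weight; the first top_n are the important keywords.
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


tokens = ["patent", "retrieval", "keyword", "patent", "vector",
          "keyword", "patent", "semantic"]
important = textrank_keywords(tokens, window=1, top_n=2)
```

In this toy input, "patent" co-occurs with the most distinct neighbours, so it receives the highest score.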

3. The retrieval method based on semantic analysis and keyword recognition according to claim 2, wherein the weight update formula is:

4. The retrieval method based on semantic analysis and keyword recognition according to claim 1, wherein the performing vector conversion on the patent keyword data set through an Elmo dynamic word vector conversion algorithm to obtain a patent keyword vector set specifically comprises:

constructing a word vector conversion model with a two-layer biLSTM structure;

training the word vector conversion model on a dataset;

and inputting the patent keyword data set into the word vector conversion model, and outputting the word vector corresponding to each patent keyword to obtain the patent keyword vector set.

5. The method according to claim 4, wherein the step of inputting the patent keyword data set into the word vector conversion model and outputting a corresponding patent keyword vector according to the patent keyword to obtain a patent keyword vector set comprises:

giving a sentence, wherein the sentence comprises a corresponding keyword data set;

looking up the word vectors E(1), ..., E(N) corresponding to the keywords in a static keyword vector table according to the keyword data set, and inputting them into the word vector conversion model, wherein the word vector conversion model comprises a first-layer forward LSTM, a first-layer backward LSTM, a second-layer forward LSTM and a second-layer backward LSTM;

inputting the keyword vectors E(1), ..., E(N) into the first-layer forward LSTM and the first-layer backward LSTM respectively, to obtain forward outputs h(1,1,→), ..., h(N,1,→) and backward outputs h(1,1,←), ..., h(N,1,←);

passing the forward outputs h(1,1,→), ..., h(N,1,→) into the second-layer forward LSTM to obtain second-layer forward outputs h(1,2,→), ..., h(N,2,→); and passing the backward outputs h(1,1,←), ..., h(N,1,←) into the second-layer backward LSTM to obtain second-layer backward outputs h(1,2,←), ..., h(N,2,←);

wherein the word vectors finally obtained for keyword i include E(i), h(N,1,→), h(N,2,→) and h(N,2,←).
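
The data flow through the four LSTMs described above can be sketched as follows. This is a minimal, dependency-free illustration and not the trained model: the weights are random and untrained, the hidden size and the helper names `make_lstm`, `run_lstm` and `two_layer_bilstm` are hypothetical, and the sketch keeps every per-position state so that any of the vectors listed above can be selected from its output.

```python
import math
import random


def make_lstm(input_size, hidden_size, rng):
    """Randomly initialised (untrained) LSTM parameters: a matrix and bias per gate."""
    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    return {g: (mat(hidden_size, input_size + hidden_size), [0.0] * hidden_size)
            for g in ("i", "f", "o", "g")}


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def run_lstm(params, inputs, hidden_size, reverse=False):
    """Run one LSTM direction over `inputs`; return one hidden state per position."""
    order = range(len(inputs) - 1, -1, -1) if reverse else range(len(inputs))
    h, c = [0.0] * hidden_size, [0.0] * hidden_size
    outputs = [None] * len(inputs)
    for t in order:
        xh = list(inputs[t]) + h  # current input concatenated with previous state

        def gate(name, act):
            w, b = params[name]
            return [act(sum(wi * v for wi, v in zip(row, xh)) + bj)
                    for row, bj in zip(w, b)]

        i, f, o = gate("i", sigmoid), gate("f", sigmoid), gate("o", sigmoid)
        g = gate("g", math.tanh)
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        outputs[t] = h
    return outputs


def two_layer_bilstm(embeddings, hidden_size=4, seed=0):
    """Return, per position, the tuple (E, h1_fwd, h1_bwd, h2_fwd, h2_bwd)."""
    rng = random.Random(seed)
    dim = len(embeddings[0])
    f1, b1 = make_lstm(dim, hidden_size, rng), make_lstm(dim, hidden_size, rng)
    f2 = make_lstm(hidden_size, hidden_size, rng)
    b2 = make_lstm(hidden_size, hidden_size, rng)
    h1_fwd = run_lstm(f1, embeddings, hidden_size)                 # h(.,1,→)
    h1_bwd = run_lstm(b1, embeddings, hidden_size, reverse=True)   # h(.,1,←)
    h2_fwd = run_lstm(f2, h1_fwd, hidden_size)                     # h(.,2,→)
    h2_bwd = run_lstm(b2, h1_bwd, hidden_size, reverse=True)       # h(.,2,←)
    return list(zip(embeddings, h1_fwd, h1_bwd, h2_fwd, h2_bwd))


# Hypothetical static vectors E(1), E(2), E(3) for a three-keyword sentence.
embeddings = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9]]
states = two_layer_bilstm(embeddings)
```

Because the forward LSTMs read left to right and the backward LSTMs read right to left, the states at a given position depend on the surrounding keywords, which is what makes the resulting vectors context-dependent rather than static.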

6. The method as claimed in claim 1, wherein the step of inputting the matching keyword vector set into a weight model to calculate a weight value of the patent text corresponding to the matching keyword vector set includes:

presetting a keyword similarity threshold U;

recording the number of keyword retrievals as n, and using x, y, z and h to count the numbers of words whose keyword similarity is greater than U;

calculating the weight value of the corresponding keyword according to a weight calculation formula;

the weight calculation formula is as follows:
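
Only the counting step of this claim is sketched here; the weight calculation formula itself is not reproduced, and no attempt is made to guess it. The sketch rests on assumptions: cosine similarity stands in for the keyword similarity (the claim does not name a measure), the four counters x, y, z and h are assumed to correspond to the title, abstract, first claim and technical effect sentences respectively, and the names `cosine` and `count_matches` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def count_matches(query_vec, field_vectors, threshold):
    """Per index field, count keyword vectors whose similarity to the query exceeds threshold."""
    return {
        field: sum(1 for vec in vecs if cosine(query_vec, vec) > threshold)
        for field, vecs in field_vectors.items()
    }


# Hypothetical two-dimensional keyword vectors for one patent text.
field_vectors = {
    "title": [[1.0, 0.0], [0.0, 1.0]],
    "abstract": [[0.9, 0.1]],
    "first_claim": [],
    "technical_effect": [[-1.0, 0.0]],
}
counts = count_matches([1.0, 0.0], field_vectors, threshold=0.8)
```

The resulting counts would then be the inputs to the weight calculation formula, together with n.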

7. A retrieval device based on semantic analysis and keyword recognition, characterized by comprising:

an information acquisition module, configured to acquire search information, wherein the search information comprises keywords to be retrieved;

a keyword extraction module, configured to acquire a patent text from a patent database and extract patent keywords from the patent text according to a Textrank algorithm to obtain a patent keyword data set;

a vector conversion module, configured to perform vector conversion on the patent keyword data set through an Elmo dynamic word vector conversion algorithm to obtain a patent keyword vector set;

a keyword matching module, configured to determine relevant weights for each index information of the patent text through an analytic hierarchy process, and to match keywords in the index information from high weights to low weights according to the keywords to be retrieved to obtain a matching keyword vector set, wherein the index information comprises: title, abstract, first claim and technical effect clauses;

a weight calculation module, configured to input the matching keyword vector set into a weight model and calculate the weight value of the patent text corresponding to the matching keyword vector set;

and a text sorting module, configured to carry out TOP-K sorting according to the weight value of the patent text to form a retrieval result and present the retrieval result to the user side.