WO2021093871A1 - Text query method, text query device, and computer storage medium

Text query method, text query device, and computer storage medium

Info

Publication number
WO2021093871A1
Authority
WO
WIPO (PCT)
Prior art keywords
sentence
query
document
word
vector
Prior art date
Application number
PCT/CN2020/128801
Other languages
English (en)
Chinese (zh)
Inventor
杨敏
姜青山
曲强
李成明
贺倩明
Original Assignee
中国科学院深圳先进技术研究院
Priority date
Filing date
Publication date
Application filed by 中国科学院深圳先进技术研究院
Publication of WO2021093871A1


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 - Querying
    • G06F16/3331 - Query processing
    • G06F16/334 - Query execution
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 - Querying
    • G06F16/3331 - Query processing
    • G06F16/3349 - Reuse of stored results of previous queries

Definitions

  • This application relates to the technical field of text query, in particular to a text query method, text query device and computer storage medium.
  • In literature search, the user poses a question related to a professional field, and the retrieval system needs to find the most relevant documents in the database and return them to the user, so that the user can quickly obtain the required literature and save a great deal of time.
  • This application provides a text query method, a text query device, and a computer storage medium, which can improve the accuracy and efficiency of text query.
  • A technical solution adopted in this application is to provide a text query method. The method includes: based on the first word-level relevance of the query sentence and the document sentence, introducing an attention mechanism to the query sentence and the document sentence, and obtaining the first query result according to the relevance of the query sentence and the document sentence after the attention mechanism is introduced; obtaining the phrase-level relevance of the query sentence and the document sentence according to the first word-level relevance, and obtaining the second query result according to the phrase-level relevance; based on the second word-level relevance between the professional domain vocabulary in the query sentence and the professional domain vocabulary in the document sentence, introducing the attention mechanism to the query sentence and the document sentence, and obtaining the third query result according to the relevance of the query sentence and the document sentence after the attention mechanism is introduced; and determining the final query result based on the query sentence according to the first query result, the second query result, and the third query result.
  • Among them, introducing the attention mechanism to the query sentence and the document sentence, and obtaining the first query result according to the relevance of the query sentence and the document sentence after the attention mechanism is introduced, includes: determining the vector expressions of the query sentence and the document sentence; calculating the word-level correlation matrix of the query sentence and the document sentence; introducing the attention mechanism to the vector expressions of the query sentence and the document sentence based on that matrix; and obtaining the first query result according to the correlation between the query sentence and the document sentence after the attention mechanism is introduced.
  • Determining the vector expressions of the query sentence and the document sentence includes: performing word segmentation and word embedding on the query sentence and the document sentence to obtain the vector expression Q (of size n*k) of the query sentence and the vector expression D (of size m*k) of the document sentence, where k represents the dimension of the word embedding vectors, n represents the number of segmented words in the query sentence, m represents the number of segmented words in the document sentence, q_i denotes the vector expression of the i-th word in the query sentence, and d_j denotes the vector expression of the j-th word in the document sentence.
  • Calculating the word-level correlation matrix of the query sentence and the document sentence includes: calculating the word-level correlation matrix M (of size n*m), where the element M_ij in the i-th row and j-th column of M is the inner product of q_i, the vector corresponding to the i-th word in the query sentence, and d_j, the vector corresponding to the j-th word in the document sentence.
  • Based on the word-level correlation matrix, the attention mechanism is introduced to the vector expressions of the query sentence and the document sentence, yielding, for the i-th word in the query sentence, its vector after the attention mechanism is introduced, and, for the j-th word in the document sentence, the corresponding attended vector.
  • Obtaining the first query result includes: calculating, for each word in the query sentence and the document sentence, the Hadamard product of its two vectors before and after the attention mechanism is introduced; concatenating, for each word, the two vectors and their Hadamard product to form a concatenated vector; calculating the correlation matrix between the concatenated vectors of the query sentence and the concatenated vectors of the document sentence; and pooling this correlation matrix to obtain the first query result.
  • Performing the pooling operation on the correlation matrix of the concatenated vectors to obtain the first query result includes: pooling the correlation matrix of the concatenated vectors of the query sentence and the document sentence to obtain a first intermediate vector, and then computing a weighted sum over this vector using inverse document frequency weights, where idf_i is the inverse document frequency value of the i-th word in the query sentence, computed from the total number of documents in the corpus and df_i, the number of documents in the corpus that contain the i-th word.
  • Obtaining the phrase-level relevance of the query sentence and the document sentence according to the first word-level relevance, and obtaining the second query result according to the phrase-level relevance, includes: performing an average pooling operation with a sliding window of size 2*2 on the first word-level correlation matrix to obtain a first matrix; performing a maximum pooling operation along the row direction of the first matrix to obtain a second intermediate vector; and computing a weighted sum over this vector using inverse document frequency weights, where idf_i is the inverse document frequency value of the i-th word in the query sentence, computed from the total number of documents in the corpus and df_i, the number of documents in the corpus that contain the i-th word.
  • Based on the second word-level relevance between the professional domain vocabulary in the query sentence and the professional domain vocabulary in the document sentence, introducing the attention mechanism to the query sentence and the document sentence, and obtaining the third query result according to the relevance of the query sentence and the document sentence after the attention mechanism is introduced, includes: determining the vector expressions of the professional domain vocabulary; extracting the professional domain vocabulary from the query sentence and the document sentence to form new vector expressions; calculating the word-level correlation matrix of the query sentence and the professional domain vocabulary; introducing the attention mechanism to these vector expressions based on the word-level correlation matrix; and obtaining the third query result according to the relevance of the query sentence and the document sentence after the attention mechanism is introduced.
  • a technical solution adopted in the present application is to provide a text query device, which includes a processor and a memory, the memory stores program data, and the processor is used to execute the program data to implement the above-mentioned method.
  • a technical solution adopted in this application is to provide a computer storage medium in which program data is stored, and the program data is used to implement the above-mentioned method when the program data is executed by a processor.
  • The text query method provided in this application includes: based on the first word-level relevance of the query sentence and the document sentence, introducing an attention mechanism to the query sentence and the document sentence, and obtaining the first query result according to the relevance of the query sentence and the document sentence after the attention mechanism is introduced; obtaining the phrase-level relevance of the query sentence and the document sentence according to the first word-level relevance, and obtaining the second query result according to the phrase-level relevance; based on the second word-level relevance between the professional domain vocabulary in the query sentence and the professional domain vocabulary in the document sentence, introducing the attention mechanism to the query sentence and the document sentence, and obtaining the third query result according to the relevance of the query sentence and the document sentence after the attention mechanism is introduced; and determining the final query result based on the query sentence according to the first query result, the second query result, and the third query result.
  • In the first aspect, words and phrases are compared at two levels, which provides better recognition ability for documents in professional fields.
  • In the second aspect, professional vocabulary is added to the recognition, which effectively solves the problem that existing retrieval networks lack a professional knowledge background.
  • FIG. 1 is a schematic flowchart of an embodiment of a text query method provided by this application
  • FIG. 2 is a schematic diagram of the flow of step 11 in FIG. 1;
  • FIG. 3 is a schematic flowchart of step 114 in FIG. 2;
  • FIG. 4 is a schematic diagram of the flow of step 12 in FIG. 1;
  • FIG. 5 is a schematic diagram of the flow of step 13 in FIG. 1;
  • FIG. 6 is a schematic structural diagram of an embodiment of a text query device provided by the present application.
  • FIG. 7 is a schematic structural diagram of an embodiment of a computer storage medium provided by the present application.
  • Fig. 1 is a schematic flowchart of an embodiment of a text query method provided by the present application, and the method includes:
  • Step 11 Based on the first word-level relevance of the query sentence and the document sentence, an attention mechanism is introduced to the query sentence and the document sentence, and the first query result is obtained according to the relevance of the query sentence and the document sentence after the attention mechanism is introduced.
  • the word-level correlation matrix is first obtained through the vector inner product, and the attention mechanism is used to obtain the vector expression of each word on the basis of the correlation matrix. Then, the vector expression of each word in the query sentence is obtained through the maximum pooling operation. Finally, the inverse text frequency index is used to perform a weighted sum to obtain the final score.
  • the use of the attention mechanism can make words more sensitive to related words, which is conducive to improving the results of document retrieval.
  • step 11 may specifically include the following steps:
  • Step 111 Determine the vector expressions of the query sentence and the document sentence. Word segmentation and word embedding are performed on the query sentence and the document sentence to obtain the vector expression Q (of size n*k) of the query sentence and the vector expression D (of size m*k) of the document sentence, where k represents the dimension of the word embedding vectors, n represents the number of segmented words in the query sentence, and m represents the number of segmented words in the document sentence.
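  • As an illustration only (not the patent's implementation), the following sketch shows how a query sentence and a document sentence could be segmented and mapped to the vector expressions Q and D using a toy embedding table; the vocabulary, tokenizer, and embedding values are placeholder assumptions.

```python
import numpy as np

# Toy embedding table; in practice this would be a pretrained word-embedding matrix.
EMBED_DIM = 4  # k, the dimension of the word embedding vectors
VOCAB = {"contract": 0, "breach": 1, "damages": 2, "court": 3, "liability": 4}
rng = np.random.default_rng(0)
EMBEDDINGS = rng.normal(size=(len(VOCAB), EMBED_DIM))

def embed_sentence(sentence: str) -> np.ndarray:
    """Segment a sentence into words and stack their embedding vectors.

    Returns an array of shape (number_of_words, k). Unknown words are skipped
    in this sketch; a real system would map them to an out-of-vocabulary vector.
    """
    words = [w for w in sentence.lower().split() if w in VOCAB]
    return np.stack([EMBEDDINGS[VOCAB[w]] for w in words])

Q = embed_sentence("breach of contract damages")                     # shape (n, k)
D = embed_sentence("court awarded damages for breach of contract")   # shape (m, k)
```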
  • Step 112 Calculate the word-level correlation matrix of the query sentence and the document sentence.
  • Step 113 Based on the word-level correlation matrix of the query sentence and the document sentence, an attention mechanism is introduced to the vector expression of the query sentence and the document sentence.
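  • A minimal sketch of steps 112 and 113 follows, continuing from Q and D above. The text states that the correlation matrix is obtained through the vector inner product but does not reproduce the attention formula here, so a standard softmax cross-attention over that matrix is assumed.

```python
def word_correlation(Q: np.ndarray, D: np.ndarray) -> np.ndarray:
    """Word-level correlation matrix M of size n x m, with M[i, j] = q_i . d_j."""
    return Q @ D.T

def softmax(x: np.ndarray, axis: int) -> np.ndarray:
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def introduce_attention(Q: np.ndarray, D: np.ndarray, M: np.ndarray):
    """Attended vectors: each query word attends over document words and vice versa.

    This is one common cross-attention formulation, used here as an assumption;
    the patent only states that an attention mechanism is applied on top of M.
    """
    Q_att = softmax(M, axis=1) @ D     # (n, k) attended query-word vectors
    D_att = softmax(M, axis=0).T @ Q   # (m, k) attended document-word vectors
    return Q_att, D_att

M = word_correlation(Q, D)
Q_att, D_att = introduce_attention(Q, D, M)
```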
  • Step 114 Obtain the first query result according to the correlation between the query sentence and the document sentence after the attention mechanism is introduced.
  • step 114 may specifically include the following steps:
  • Step 1141 Calculate the Hadamard product of the two vectors before and after each word in the query sentence and the document sentence is introduced into the attention mechanism.
  • Here, the Hadamard product means multiplying the two vectors element by element.
  • Step 1142 For each word in the query sentence and the document sentence, concatenate the two vectors before and after the attention mechanism is introduced together with their Hadamard product to form a concatenated vector.
  • Step 1143 Calculate the correlation matrix between the concatenated vectors of the query sentence and the concatenated vectors of the document sentence.
  • Step 1144 Perform a pooling operation on the correlation matrix of the concatenated vectors of the query sentence and the document sentence to obtain the first query result.
  • Specifically, a pooling operation is performed on the correlation matrix of the concatenated vectors of the query sentence and the document sentence to obtain a first intermediate vector, whose i-th element is the maximum value of the i-th row of that matrix. The first score is then computed as a weighted sum over this vector using inverse document frequency weights, where idf_i is the inverse document frequency value of the i-th word in the query sentence, computed from the total number of documents in the corpus and df_i, the number of documents in the corpus that contain the i-th word.
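  • Continuing the sketch, the first score of steps 1141 to 1144 could be computed as follows. The idf values are placeholders, and idf_i = log(N / df_i) is a standard definition assumed here because the exact formula is not reproduced in this text; normalising the idf weights is likewise an assumption.

```python
def first_score(Q, D, Q_att, D_att, idf):
    """First (word-level) score: Hadamard products, concatenation, correlation of
    the concatenated vectors, row-wise max pooling, and an idf-weighted sum."""
    Q_cat = np.concatenate([Q, Q_att, Q * Q_att], axis=1)   # (n, 3k)
    D_cat = np.concatenate([D, D_att, D * D_att], axis=1)   # (m, 3k)
    S = Q_cat @ D_cat.T                 # correlation matrix of concatenated vectors
    v1 = S.max(axis=1)                  # first intermediate vector, one value per query word
    w = idf / idf.sum()                 # normalised idf weights (assumption)
    return float(w @ v1)

# Placeholder idf values for the query words, e.g. log(total_docs / df_i).
idf = np.log(np.array([1000 / 30, 1000 / 120, 1000 / 60]))
score1 = first_score(Q, D, Q_att, D_att, idf)
```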
  • Step 12 According to the relevance of the first word level, obtain the phrase level relevance of the query sentence and the document sentence, and obtain the second query result according to the phrase level relevance.
  • In this step, the word-level correlation matrix obtained by the vector inner product is subjected to an average pooling operation with a sliding window of 2*2, and then a maximum pooling operation is performed to obtain a phrase-level vector expression; finally, the inverse document frequency is likewise used to perform a weighted sum to obtain the final score at the phrase level.
  • step 12 may specifically include:
  • Step 121 Perform an average pooling operation with a sliding window of size 2*2 on the first word-level correlation matrix (the matrix M calculated above) to obtain the first matrix. Each element of the first matrix is the average of the corresponding 2*2 window of M, and the ranges of its row and column indices follow from the size of M.
  • Step 122 Perform a maximum pooling operation along the row direction of the first matrix to obtain a second intermediate vector, whose i-th element is the maximum value of the i-th row of the first matrix.
  • Step 123 Calculate the second score as a weighted sum over the second intermediate vector using inverse document frequency weights, where idf_i is the inverse document frequency value of the i-th word in the query sentence, computed from the total number of documents in the corpus and df_i, the number of documents in the corpus that contain the i-th word.
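  • A sketch of steps 121 to 123 follows. A stride of 1 is assumed for the 2*2 sliding window, and the idf weighting mirrors the first score; both details are assumptions where the exact formulas are not reproduced in this text.

```python
def second_score(M: np.ndarray, idf: np.ndarray) -> float:
    """Phrase-level score: 2x2 average pooling over M, row-wise max pooling,
    then an idf-weighted sum (window stride of 1 assumed)."""
    n, m = M.shape
    pooled = np.zeros((n - 1, m - 1))                  # first matrix
    for i in range(n - 1):
        for j in range(m - 1):
            pooled[i, j] = M[i:i + 2, j:j + 2].mean()
    v2 = pooled.max(axis=1)                            # second intermediate vector
    w = idf[: n - 1] / idf[: n - 1].sum()              # idf weights (assumption)
    return float(w @ v2)

score2 = second_score(M, idf)
```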
  • Step 13 Based on the second word-level relevance between the professional domain vocabulary in the query sentence and the professional domain vocabulary in the document sentence, an attention mechanism is introduced to the query sentence and the document sentence, and the third query result is obtained according to the relevance of the query sentence and the document sentence after the attention mechanism is introduced.
  • In this step, the words in the knowledge dictionary are converted into vector representations using the TransE algorithm. The words of the knowledge dictionary contained in the query sentence and in the document to be retrieved are extracted to form vector expressions; the correlation matrix is then again obtained through the vector inner product, and the attention mechanism is used on the basis of this correlation matrix to obtain the corresponding vector expressions. Finally, the final score is obtained through average pooling and maximum pooling.
  • step 13 may specifically include:
  • Step 131 Determine the vector expression of the vocabulary in the professional field.
  • In this embodiment, legal professional vocabulary is taken as an example.
  • Step 132 Extract the professional domain vocabulary from the query sentence and the document sentence to form new vector expressions, where k represents the dimension of the vectors obtained by TransE embedding of the entries in the professional vocabulary, n represents the number of segmented words of the query sentence that appear in the professional domain vocabulary, m represents the number of segmented words of the document sentence that appear in the professional domain vocabulary, and each such entry in the query sentence and the document sentence is represented by its corresponding vector expression.
  • Step 133 Calculate the word-level correlation matrix of the query sentence and the vocabulary of the professional field.
  • Step 134 Based on the word-level correlation matrix of the query sentence and the document sentence, an attention mechanism is introduced to the vector expression of the query sentence and the document sentence.
  • Step 135 Obtain the third query result according to the relevance of the query sentence and the document sentence after the attention mechanism is introduced.
  • Steps 133 to 135 can be carried out in a similar way to step 11 above: starting from the correlation matrix of the professional-vocabulary vectors, the attention mechanism is introduced to obtain the attended vectors, the correlation between them is calculated again, and the third score is obtained.
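  • An illustrative sketch of step 13 follows, assuming TransE entity embeddings for the professional (here legal) vocabulary are already available in a lookup table; the vocabulary, the embedding values, and the pooling order (row-wise max followed by an average) are placeholder assumptions.

```python
# Placeholder TransE embeddings for legal-domain vocabulary entries.
LEGAL_VOCAB = {"breach": 0, "damages": 1, "liability": 2}
LEGAL_EMBED = rng.normal(size=(len(LEGAL_VOCAB), EMBED_DIM))

def domain_vectors(sentence: str) -> np.ndarray:
    """Extract the professional-domain entries present in a sentence
    and stack their TransE vectors."""
    words = [w for w in sentence.lower().split() if w in LEGAL_VOCAB]
    return np.stack([LEGAL_EMBED[LEGAL_VOCAB[w]] for w in words])

def third_score(query: str, document: str) -> float:
    """Third score: correlation and attention over domain-vocabulary vectors,
    followed by max pooling per row and averaging."""
    Qd = domain_vectors(query)
    Dd = domain_vectors(document)
    Md = Qd @ Dd.T
    Qd_att, Dd_att = introduce_attention(Qd, Dd, Md)
    S = Qd_att @ Dd_att.T
    return float(S.max(axis=1).mean())

score3 = third_score("breach of contract damages",
                     "court awarded damages for breach of contract")
```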
  • Step 14 Determine the final query result based on the query statement according to the first query result, the second query result, and the third query result.
  • For example, the first score, the second score, and the third score can be averaged to obtain the final score used to determine whether the query sentence is related to the document sentence; alternatively, the three scores can be summed with certain weights to obtain the final score, and no restriction is imposed here. A sketch of this combination is shown below.
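  • The weights in this sketch are arbitrary placeholders; only the averaging and weighted-sum combination described above is illustrated.

```python
def final_score(scores, weights=None) -> float:
    """Combine the three scores by simple averaging or by a weighted sum."""
    if weights is None:
        return float(np.mean(scores))
    weights = np.asarray(weights, dtype=float)
    return float(weights @ np.asarray(scores) / weights.sum())

relevance = final_score([score1, score2, score3])                            # simple average
relevance_weighted = final_score([score1, score2, score3], [0.5, 0.2, 0.3])  # weighted sum
```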
  • In summary, the text query method provided in this embodiment includes: based on the first word-level relevance of the query sentence and the document sentence, introducing an attention mechanism to the query sentence and the document sentence, and obtaining the first query result according to the relevance of the query sentence and the document sentence after the attention mechanism is introduced; obtaining the phrase-level relevance of the query sentence and the document sentence according to the first word-level relevance, and obtaining the second query result according to the phrase-level relevance; based on the second word-level relevance between the professional domain vocabulary in the query sentence and the professional domain vocabulary in the document sentence, introducing the attention mechanism to the query sentence and the document sentence, and obtaining the third query result according to the relevance of the query sentence and the document sentence after the attention mechanism is introduced; and determining the final query result based on the query sentence according to the first query result, the second query result, and the third query result.
  • In the first aspect, words and phrases are compared at two levels, which provides better recognition ability for documents in professional fields.
  • In the second aspect, professional vocabulary is added to the recognition, which effectively solves the problem that existing retrieval networks lack a professional knowledge background.
  • The text query device 60 includes a processor 61 and a memory 62, wherein the memory 62 stores program data, and the processor 61 is configured to execute the program data to implement the following method steps:
  • Based on the first word-level relevance of the query sentence and the document sentence, an attention mechanism is introduced to the query sentence and the document sentence, and the first query result is obtained according to the relevance of the query sentence and the document sentence after the attention mechanism is introduced; the phrase-level relevance of the query sentence and the document sentence is obtained according to the first word-level relevance, and the second query result is obtained according to the phrase-level relevance; based on the second word-level relevance between the professional domain vocabulary in the query sentence and the professional domain vocabulary in the document sentence, the attention mechanism is introduced to the query sentence and the document sentence, and the third query result is obtained according to the relevance of the query sentence and the document sentence after the attention mechanism is introduced; and the final query result based on the query sentence is determined according to the first query result, the second query result, and the third query result.
  • FIG. 7 is a schematic structural diagram of an embodiment of a computer storage medium provided by the present application.
  • The computer storage medium 70 stores program data 71, and the program data 71, when executed by a processor, is used to implement the following method steps:
  • Based on the first word-level relevance of the query sentence and the document sentence, an attention mechanism is introduced to the query sentence and the document sentence, and the first query result is obtained according to the relevance of the query sentence and the document sentence after the attention mechanism is introduced; the phrase-level relevance of the query sentence and the document sentence is obtained according to the first word-level relevance, and the second query result is obtained according to the phrase-level relevance; based on the second word-level relevance between the professional domain vocabulary in the query sentence and the professional domain vocabulary in the document sentence, the attention mechanism is introduced to the query sentence and the document sentence, and the third query result is obtained according to the relevance of the query sentence and the document sentence after the attention mechanism is introduced; and the final query result based on the query sentence is determined according to the first query result, the second query result, and the third query result.
  • When executed, the program data is also used to implement: determining the vector expressions of the query sentence and the document sentence; calculating the word-level correlation matrix of the query sentence and the document sentence; introducing an attention mechanism to the vector expressions of the query sentence and the document sentence based on the word-level correlation matrix; and obtaining the first query result according to the relevance of the query sentence and the document sentence after the attention mechanism is introduced.
  • Determining the vector expressions of the query sentence and the document sentence includes: performing word segmentation and word embedding on the query sentence and the document sentence to obtain the vector expression Q (of size n*k) of the query sentence and the vector expression D (of size m*k) of the document sentence, where k represents the dimension of the word embedding vectors, n represents the number of segmented words in the query sentence, m represents the number of segmented words in the document sentence, q_i denotes the vector expression of the i-th word in the query sentence, and d_j denotes the vector expression of the j-th word in the document sentence.
  • Calculating the word-level correlation matrix of the query sentence and the document sentence includes: calculating the word-level correlation matrix M (of size n*m), where the element M_ij in the i-th row and j-th column of M is the inner product of q_i, the vector corresponding to the i-th word in the query sentence, and d_j, the vector corresponding to the j-th word in the document sentence.
  • Based on the word-level correlation matrix, the attention mechanism is introduced to the vector expressions of the query sentence and the document sentence, yielding, for the i-th word in the query sentence, its vector after the attention mechanism is introduced, and, for the j-th word in the document sentence, the corresponding attended vector.
  • Obtaining the first query result includes: calculating, for each word in the query sentence and the document sentence, the Hadamard product of its two vectors before and after the attention mechanism is introduced; concatenating, for each word, the two vectors and their Hadamard product to form a concatenated vector; calculating the correlation matrix between the concatenated vectors of the query sentence and the concatenated vectors of the document sentence; and pooling this correlation matrix to obtain the first query result.
  • Performing the pooling operation on the correlation matrix of the concatenated vectors to obtain the first query result includes: pooling the correlation matrix of the concatenated vectors of the query sentence and the document sentence to obtain a first intermediate vector, and then computing a weighted sum over this vector using inverse document frequency weights, where idf_i is the inverse document frequency value of the i-th word in the query sentence, computed from the total number of documents in the corpus and df_i, the number of documents in the corpus that contain the i-th word.
  • When executed, the program data is also used to implement: performing an average pooling operation with a sliding window of size 2*2 on the first word-level correlation matrix to obtain a first matrix; performing a maximum pooling operation along the row direction of the first matrix to obtain a second intermediate vector; and computing a weighted sum over this vector using inverse document frequency weights, where idf_i is the inverse document frequency value of the i-th word in the query sentence, computed from the total number of documents in the corpus and df_i, the number of documents in the corpus that contain the i-th word.
  • When executed, the program data is also used to implement: determining the vector expressions of the professional domain vocabulary; extracting the professional domain vocabulary from the query sentence and the document sentence to form new vector expressions; calculating the word-level correlation matrix of the query sentence and the professional domain vocabulary; introducing an attention mechanism to the vector expressions of the query sentence and the document sentence based on the word-level correlation matrix; and obtaining the third query result according to the relevance of the query sentence and the document sentence after the attention mechanism is introduced.
  • the disclosed method and device may be implemented in other ways.
  • the device implementation described above is merely illustrative.
  • The division of the modules or units is only a logical function division; in actual implementation there may be other division manners. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units can be selected according to actual needs to achieve the objectives of the solutions of this embodiment.
  • the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a text query method, a text query device, and a computer storage medium. The text query method includes: based on the first word-level relevance of a query sentence and a document sentence, introducing an attention mechanism to the query sentence and the document sentence, and obtaining a first query result according to the relevance of the query sentence and the document sentence (11); obtaining the phrase-level relevance of the query sentence and the document sentence according to the first word-level relevance, and obtaining a second query result according to the phrase-level relevance (12); based on the second word-level relevance of a professional-field term in the query sentence and a professional-field term in the document sentence, introducing an attention mechanism to the query sentence and the document sentence, and obtaining a third query result according to the relevance of the query sentence and the document sentence (13); and determining a final query result based on the query sentence according to the first query result, the second query result, and the third query result (14). By means of the described method, the accuracy and efficiency of text query can be improved.
PCT/CN2020/128801 2019-11-14 2020-11-13 Text query method, text query device, and computer storage medium WO2021093871A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911114274.2 2019-11-14
CN201911114274.2A CN111159331B (zh) 2019-11-14 Text query method, text query device and computer storage medium

Publications (1)

Publication Number Publication Date
WO2021093871A1 (fr) 2021-05-20

Family

ID=70555994

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/128801 WO2021093871A1 (fr) 2019-11-14 2020-11-13 Text query method, text query device, and computer storage medium

Country Status (2)

Country Link
CN (1) CN111159331B (fr)
WO (1) WO2021093871A1 (fr)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111159331B (zh) * 2019-11-14 2021-11-23 中国科学院深圳先进技术研究院 Text query method, text query device and computer storage medium


Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6026388A (en) * 1995-08-16 2000-02-15 Textwise, Llc User interface and other enhancements for natural language information retrieval system and method
US10031913B2 (en) * 2014-03-29 2018-07-24 Camelot Uk Bidco Limited Method, system and software for searching, identifying, retrieving and presenting electronic documents
CN109472024B (zh) * 2018-10-25 2022-10-11 安徽工业大学 Text classification method based on bidirectional recurrent attention neural network
CN110347790B (zh) * 2019-06-18 2021-08-10 广州杰赛科技股份有限公司 Attention-mechanism-based text duplicate checking method, apparatus, device and storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160004784A1 (en) * 2014-07-04 2016-01-07 Samsung Electronics Co., Ltd. Method of providing relevant information and electronic device adapted to the same
CN107844469A (zh) * 2017-10-26 2018-03-27 北京大学 Text simplification method based on word vector query model
CN108491433A (zh) * 2018-02-09 2018-09-04 平安科技(深圳)有限公司 Chat response method, electronic device and storage medium
CN109063174A (zh) * 2018-08-21 2018-12-21 腾讯科技(深圳)有限公司 Query answer generation method and device, computer storage medium, and electronic device
CN111159331A (zh) * 2019-11-14 2020-05-15 中国科学院深圳先进技术研究院 Text query method, text query device and computer storage medium

Also Published As

Publication number Publication date
CN111159331B (zh) 2021-11-23
CN111159331A (zh) 2020-05-15


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20887870

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20887870

Country of ref document: EP

Kind code of ref document: A1