CN110598078B - Data retrieval method and device, computer-readable storage medium and electronic device - Google Patents


Info

Publication number: CN110598078B
Application number: CN201910860221.9A
Authority: CN (China)
Prior art keywords: sentence, vector, retrieved, retrieval, data
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Other versions: CN110598078A (en)
Inventors: 祁立, 刘梦宇, 李元, 龙涛
Current assignee: Jingdong Technology Holding Co Ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis)
Original assignee: Jingdong Technology Holding Co Ltd
Application filed by Jingdong Technology Holding Co Ltd
Priority to CN201910860221.9A
Publication of CN110598078A; application granted; publication of CN110598078B
Status: Active; anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/951: Indexing; Web crawling techniques
    • G06F16/953: Querying, e.g. by the use of web search engines

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the invention relate to a data retrieval method and apparatus, a computer-readable storage medium, and an electronic device in the technical field of information retrieval. The method comprises the following steps: inputting data to be retrieved into a sentence vector encoder to obtain a plurality of sentence vectors, and obtaining a vector matrix to be retrieved from the sentence vectors; respectively inputting the sentence vectors of each row of the vector matrix into a database with a vector index for retrieval to obtain a plurality of candidate retrieval results; calculating the Euclidean distance between each sentence vector and its corresponding candidate retrieval result, and the weight of each sentence vector in the vector matrix to be retrieved; and obtaining a plurality of target retrieval results according to the candidate retrieval results, the Euclidean distances and the weights, and displaying the target retrieval results according to the word mover's distance between each target retrieval result and the data to be retrieved. Embodiments of the invention improve the accuracy of retrieval results.

Description

Data retrieval method and device, computer-readable storage medium and electronic device
Technical Field
The embodiment of the invention relates to the technical field of information retrieval, in particular to a data retrieval method, a data retrieval device, a computer-readable storage medium and electronic equipment.
Background
With the rapid development of information technology, the amount of data on the internet is growing enormously. At the same time, redundant data on the network keeps increasing, which makes it harder for users to find the information they need. How to acquire the information a user wants simply, conveniently and effectively has therefore become a difficult problem.
Most existing information retrieval methods proceed as follows: first, a knowledge base of candidate answers is constructed; second, when the user inputs a question, the stored question closest to it is found based on similarity calculation, and the corresponding answer is returned.
However, this approach has the following drawback: because the retrieval scheme is based on a word-level inverted index, retrieval results that differ literally but are semantically similar cannot be recalled, so the accuracy of the retrieval results is low.
Therefore, it is desirable to provide a new data retrieval method and apparatus.
It is to be noted that the information disclosed in the above background section is only for enhancing the understanding of the background of the present invention, and therefore may include information that does not constitute prior art already known to those of ordinary skill in the art.
Disclosure of Invention
The present invention is directed to a data retrieval method, a data retrieval apparatus, a computer-readable storage medium, and an electronic device that overcome, at least to some extent, the problem of low retrieval-result accuracy caused by the limitations and disadvantages of the related art.
According to an aspect of the present disclosure, there is provided a data retrieval method including:
inputting data to be retrieved into a sentence vector encoder to obtain a plurality of sentence vectors, and obtaining a vector matrix to be retrieved according to each sentence vector;
respectively inputting the sentence vectors of each row in the vector matrix to be retrieved into a database with a vector index for retrieval to obtain a plurality of candidate retrieval results;
calculating the Euclidean distance between each sentence vector and its corresponding candidate retrieval result, and the weight of each sentence vector in the vector matrix to be retrieved;
and obtaining a plurality of target retrieval results according to the candidate retrieval results, the Euclidean distances and the weights, and displaying the target retrieval results according to the word mover's distance (WMD) between each target retrieval result and the data to be retrieved.
In an exemplary embodiment of the present disclosure, inputting data to be retrieved into a sentence vector encoder to obtain a plurality of sentence vectors includes:
performing word segmentation processing on the data to be retrieved to obtain a plurality of word groups, and inputting each word group into a sentence vector encoder to obtain a plurality of sentence vectors;
wherein, the sentence vector encoder is a supervised sentence embedding model.
In an exemplary embodiment of the present disclosure, the data retrieval method further includes:
calculating the length of each sentence vector;
and padding the sentence vector when it is determined that the length of the sentence vector does not reach a preset length.
In an exemplary embodiment of the present disclosure, calculating the weight of each sentence vector in the vector matrix to be retrieved includes:
calculating the occurrence frequency of each sentence vector in the vector matrix to be retrieved and the total number of sentence vectors in the vector matrix to be retrieved;
and calculating the weight of each sentence vector in the vector matrix to be retrieved according to the times of the sentence vectors appearing in the vector matrix to be retrieved and the total number of the sentence vectors in the vector matrix to be retrieved.
In an exemplary embodiment of the present disclosure, calculating the euclidean distance between the candidate retrieval result corresponding to each sentence vector and the sentence vector includes:
performing a difference operation on the candidate retrieval result corresponding to each sentence vector and the sentence vector to obtain a plurality of difference operation results;
and summing the squares of the difference operation results, and taking the square root of the sum, to obtain the Euclidean distance between the candidate retrieval result corresponding to each sentence vector and the sentence vector.
In an exemplary embodiment of the present disclosure, the data retrieval method further includes:
obtaining a plurality of weight vectors according to the weight of each sentence vector in the vector matrix to be retrieved and the weight of the candidate retrieval result corresponding to each sentence vector in the target retrieval result;
and obtaining the word mover's distance between the target retrieval result and the data to be retrieved according to the Euclidean distance between the candidate retrieval result corresponding to each sentence vector and the sentence vector, and the weight vector corresponding to each sentence vector.
In an exemplary embodiment of the present disclosure, obtaining the word mover's distance between the target retrieval result and the data to be retrieved according to the Euclidean distance between the candidate retrieval result corresponding to each sentence vector and the sentence vector, and the weight vector corresponding to each sentence vector, includes:
carrying out product operation on the Euclidean distance between the candidate retrieval result corresponding to each sentence vector and the weight vector corresponding to each sentence vector to obtain a plurality of product operation results;
and summing the product operation results to obtain a plurality of sum operation results, and performing a minimization operation over the sum operation results to obtain the word mover's distance between each target retrieval result and the data to be retrieved.
According to an aspect of the present disclosure, there is provided a data retrieval apparatus including:
the first processing module is used for inputting the data to be retrieved into the sentence vector encoder to obtain a plurality of sentence vectors and obtaining a vector matrix to be retrieved according to each sentence vector;
the second processing module is used for respectively inputting the sentence vectors of each row in the vector matrix to be retrieved into a database with vector indexes for retrieval to obtain a plurality of candidate retrieval results;
the first calculation module is used for calculating the Euclidean distance between each sentence vector and its corresponding candidate retrieval result, and the weight of each sentence vector in the vector matrix to be retrieved;
and the third processing module is used for obtaining a plurality of target retrieval results according to the candidate retrieval results, the Euclidean distances and the weights, and displaying the target retrieval results according to the word mover's distance between each target retrieval result and the data to be retrieved.
According to an aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a data retrieval method as recited in any one of the above.
According to an aspect of the present disclosure, there is provided an electronic device including:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform any of the data retrieval methods described above via execution of the executable instructions.
On one hand, data to be retrieved is input into a sentence vector encoder to obtain a plurality of sentence vectors, and a vector matrix to be retrieved is obtained from the sentence vectors; the sentence vectors of each row of the matrix are then respectively input into a database with a vector index for retrieval, yielding a plurality of candidate retrieval results; next, the Euclidean distance between each sentence vector and its corresponding candidate retrieval result is calculated, as well as the weight of each sentence vector in the matrix; finally, a plurality of target retrieval results are obtained from the candidate results, distances and weights, and the target results are displayed according to the word mover's distance between each target result and the data to be retrieved. This solves the prior-art problem that retrieval results which differ literally but are semantically similar cannot be recalled, which kept accuracy low, and thereby improves the accuracy of the retrieval results. On the other hand, obtaining the target results from the candidate results, distances and weights, and displaying them by word mover's distance, avoids the semantic deviation caused by overly broad recall, further improving the accuracy of the retrieval results. On yet another hand, respectively inputting the sentence vectors of each row into the vector-indexed database for retrieval improves retrieval efficiency.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention. It is obvious that the drawings in the following description are only some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.
Fig. 1 schematically shows a flow chart of a data retrieval method according to an exemplary embodiment of the present invention.
FIG. 2 schematically illustrates an example diagram of the structure of a supervised sentence embedding model in accordance with an example embodiment of the present invention.
Fig. 3 is a diagram schematically illustrating an example of a structure of a database with vector indexes according to an exemplary embodiment of the present invention.
Fig. 4 schematically shows a flow chart of another data retrieval method according to an exemplary embodiment of the present invention.
Fig. 5 schematically shows a block diagram of a data retrieval apparatus according to an exemplary embodiment of the present invention.
Fig. 6 schematically illustrates an electronic device for implementing the above-described data retrieval method according to an exemplary embodiment of the present invention.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the invention. One skilled in the relevant art will recognize, however, that the invention may be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known technical solutions have not been shown or described in detail to avoid obscuring aspects of the invention.
Furthermore, the drawings are merely schematic illustrations of the invention and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
Information retrieval is widely applied in the field of knowledge question answering. The general process of a retrieval-based question answering system may include: first, constructing a candidate-answer knowledge base; second, when the user inputs a question, finding the stored question closest to it based on similarity calculation and returning the corresponding answer. The method mainly comprises the following steps:
1) constructing a candidate-answer index set; 2) after receiving a query, preliminarily selecting some candidate answers; 3) matching the query against the answers and ranking them; 4) finally returning the top-k answers.
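The four steps above can be sketched end to end. The snippet below is a minimal illustration only (not the patented method): candidate answers are ranked by Euclidean distance between precomputed vectors, and the knowledge base, vectors and function name are all hypothetical.

```python
import math

def retrieve_top_k(query_vec, knowledge_base, k=3):
    """Rank (question_vec, answer) pairs by Euclidean distance to the
    query vector and return the top-k answers."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    # Step 2/3: select and rank candidates; step 4: return the top-k.
    ranked = sorted(knowledge_base, key=lambda qa: dist(query_vec, qa[0]))
    return [answer for _, answer in ranked[:k]]

# Step 1: a toy candidate-answer index set.
kb = [([0.0, 0.0], "answer A"), ([1.0, 1.0], "answer B"), ([5.0, 5.0], "answer C")]
```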
However, traditional retrieval is based on a word-level inverted index and cannot recall results that differ literally but are semantically similar; in addition, current semantic-vector-based retrieval methods can recall semantically similar results, but the recalled results are too broad and semantic deviation easily occurs.
In the present exemplary embodiment, a data retrieval method is first provided. The method may run on a server, a server cluster, or a cloud server, and may also run on a terminal device; of course, those skilled in the art may run the method of the present invention on other platforms as needed, which is not limited in this exemplary embodiment. Referring to fig. 1, the data retrieval method includes the following steps:
step S110, inputting data to be retrieved into a sentence vector encoder to obtain a plurality of sentence vectors, and obtaining a vector matrix to be retrieved according to the sentence vectors.
Step S120, respectively inputting the sentence vectors of each row in the vector matrix to be retrieved into a database with a vector index for retrieval to obtain a plurality of candidate retrieval results.
Step S130, calculating the Euclidean distance between each sentence vector and its corresponding candidate retrieval result, and the weight of each sentence vector in the vector matrix to be retrieved.
Step S140, obtaining a plurality of target retrieval results according to the candidate retrieval results, the Euclidean distances and the weights, and displaying each target retrieval result according to the word mover's distance between it and the data to be retrieved.
In this data retrieval method, on one hand, data to be retrieved is input into a sentence vector encoder to obtain a plurality of sentence vectors, and a vector matrix to be retrieved is obtained from the sentence vectors; the sentence vectors of each row of the matrix are then respectively input into a database with a vector index for retrieval, yielding a plurality of candidate retrieval results; next, the Euclidean distance between each sentence vector and its corresponding candidate retrieval result is calculated, as well as the weight of each sentence vector in the matrix; finally, a plurality of target retrieval results are obtained from the candidate results, distances and weights, and the target results are displayed according to the word mover's distance between each target result and the data to be retrieved. This solves the prior-art problem that retrieval results which differ literally but are semantically similar cannot be recalled, which kept accuracy low, and thereby improves the accuracy of the retrieval results. On the other hand, obtaining the target results from the candidate results, distances and weights, and displaying them by word mover's distance, avoids the semantic deviation caused by overly broad recall, further improving the accuracy of the retrieval results. On yet another hand, respectively inputting the sentence vectors of each row into the vector-indexed database for retrieval improves retrieval efficiency.
Hereinafter, each step involved in the data retrieval method of the exemplary embodiment of the present invention will be explained and explained in detail with reference to the drawings.
In step S110, the data to be retrieved is input to a sentence vector encoder to obtain a plurality of sentence vectors, and a vector matrix to be retrieved is obtained according to each sentence vector.
In this example embodiment, first, a word segmentation process is performed on the data to be retrieved to obtain a plurality of word groups, and then, each of the word groups is input to a sentence vector encoder to obtain a plurality of sentence vectors; wherein, the sentence vector encoder is a supervised sentence embedding model. In detail:
First, the sentence vector encoder is explained. Referring to fig. 2, the sentence vector encoder may be, for example, a supervised sentence embedding model (an InferSent model), whose main structure may include a BiLSTM and max-pooling. The model may include a first input (e.g., a sentence encoder with a premise input) 201, a second input (a sentence encoder with a hypothesis input) 202, an encoding layer 203, a fully connected layer 204, a pooling layer 205, and an output 206. Specifically, from the embedding encoding of each token in the BiLSTM layer, an encoding matrix of shape [max_seq_len, embedding_size] is obtained to represent each sentence. Sentences can thus be represented effectively, avoiding the semantic loss in retrieval that a single sentence embedding would cause. It should be noted that the first input and the second input are paired inputs; the second input may be a word similar to the first input or a word opposite to it.
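The max-pooling step described above can be sketched as follows; this is only an illustration of the pooling over the [max_seq_len, embedding_size] encoding matrix, with the BiLSTM itself omitted and a plain list-of-lists standing in for the token encodings.

```python
def max_pool_encode(token_embeddings):
    """Collapse a [seq_len][dim] token-encoding matrix (e.g., BiLSTM
    outputs) into one fixed-size sentence vector by taking the
    per-dimension maximum over all tokens."""
    dim = len(token_embeddings[0])
    return [max(token[d] for token in token_embeddings) for d in range(dim)]
```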
Further, the InferSent model can be trained with sentences composed from an existing synonym forest (antonym forest), a common synonym dictionary (antonym dictionary) obtained by sorting and converting encyclopedia entries, or other common Chinese synonym dictionaries (antonym dictionaries), so as to obtain the trained supervised sentence embedding model.
Next, step S110 is explained. First, a word segmentation tool (any word segmentation tool may be chosen; this example imposes no special limitation) can be used to segment the data to be retrieved into a plurality of word groups; each word group is then input into the sentence vector encoder to obtain a plurality of sentence vectors. It should further be added that, to prevent overly short sentence vectors from affecting subsequent retrieval results, the method may further include: calculating the length of each sentence vector, and padding a sentence vector when its length does not reach a preset length. For example, the length of a sentence vector may be set to a fixed length M, and a sentence vector that does not reach this length may be padded, either with 0 or with other characters, which is not particularly limited in this example. It should also be noted that, since the sentence vector encoder is trained with the existing synonym forest (antonym forest), the common synonym dictionary (antonym dictionary) obtained by sorting and converting encyclopedia entries, or other common Chinese synonym dictionaries (antonym dictionaries), the obtained sentence vectors may include vectors whose words are synonyms of the phrases in the data to be retrieved. This solves the prior-art problem that retrieval results which differ literally but are semantically similar cannot be recalled, which kept retrieval accuracy low.
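The padding step might look like the sketch below; the fixed target length and the zero pad value are just the example choices mentioned in the text.

```python
def pad_sentence_vector(vec, target_len, pad_value=0.0):
    """Pad a sentence vector with pad_value until it reaches the preset
    fixed length; vectors already at or beyond that length are returned
    unchanged."""
    if len(vec) >= target_len:
        return list(vec)
    return list(vec) + [pad_value] * (target_len - len(vec))
```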
Further, after sentence vectors are obtained, the sentence vectors can be combined to obtain a vector matrix to be retrieved.
In step S120, the sentence vectors in each row of the vector matrix to be retrieved are respectively input into the database with vector indexes for retrieval, so as to obtain a plurality of candidate retrieval results.
In the present exemplary embodiment, the database with a vector index is first explained. Referring to fig. 3, the database with a vector index may, for example, index the sentence vectors using Annoy. Annoy is a C++/Python tool open-sourced by Spotify for approximate nearest neighbor queries; it optimizes memory usage, can store or load indexes on disk, and guarantees query efficiency. Annoy is an open-source library for approximate nearest neighbors in high-dimensional space; it constructs binary trees (see fig. 3) whose leaf nodes may represent the set of vectors assigned to the node, and its query time is O(log n).
Further, the sentence vectors of each row in the vector matrix to be retrieved may be respectively input into the database with a vector index for retrieval to obtain a plurality of candidate retrieval results. It should be added that, since the leaf nodes of the binary tree may represent the vector set allocated to each node, matching for each input row vector can proceed from the root node until the leaf nodes are reached, and the combination of the candidate results of each row's sentence vector is then returned as the candidate retrieval results above. This improves query efficiency and avoids the problem that results which differ literally but are semantically similar cannot be recalled, which would lower retrieval accuracy.
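With the real Annoy library, the calls would be `AnnoyIndex(dim, 'euclidean')`, `add_item`, `build`, and `get_nns_by_vector`. The self-contained stand-in below mimics that interface with an exact brute-force search, purely for illustration (Annoy itself is approximate and tree-based, which is where its speed comes from).

```python
import heapq
import math

class BruteForceIndex:
    """Exact, in-memory stand-in for an Annoy-style vector index."""

    def __init__(self):
        self._items = []  # list of (item_id, vector)

    def add_item(self, item_id, vector):
        self._items.append((item_id, list(vector)))

    def get_nns_by_vector(self, query, n):
        """Return ids of the n stored vectors nearest to the query
        (Euclidean distance), closest first."""
        def dist(item):
            return math.sqrt(sum((q - v) ** 2 for q, v in zip(query, item[1])))
        return [item_id for item_id, _ in heapq.nsmallest(n, self._items, key=dist)]
```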
In step S130, an euclidean distance between the candidate search result corresponding to each sentence vector and the sentence vector and a weight of each sentence vector in the vector matrix to be searched are calculated.
In this exemplary embodiment, first, calculating the Euclidean distance between the candidate retrieval result corresponding to each sentence vector and the sentence vector may specifically include: first, performing a difference operation on the candidate retrieval result corresponding to each sentence vector and the sentence vector to obtain a plurality of difference operation results; second, summing the squares of the difference operation results and taking the square root of the sum to obtain the Euclidean distance between the candidate retrieval result corresponding to each sentence vector and the sentence vector.
For example:

$$\mathrm{dist}_i = \sqrt{\sum_{k=1}^{n} (y_k - x_k)^2}$$

where y_1, y_2, ..., y_n are the components of the vector included in the candidate retrieval result, x_1, x_2, ..., x_n are the components of the sentence vector, and dist_i is the Euclidean distance between the ith sentence vector and its corresponding candidate retrieval result.
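The difference, square, sum and square-root steps translate directly into code; a minimal sketch:

```python
import math

def euclidean_distance(sentence_vec, candidate_vec):
    """Euclidean distance between a sentence vector (components x_k) and
    the vector of its corresponding candidate retrieval result
    (components y_k): sqrt(sum((y_k - x_k)^2))."""
    return math.sqrt(sum((y - x) ** 2 for x, y in zip(sentence_vec, candidate_vec)))
```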
Secondly, calculating the weight of each sentence vector in the vector matrix to be retrieved, which specifically includes: calculating the occurrence frequency of each sentence vector in the vector matrix to be retrieved and the total number of sentence vectors in the vector matrix to be retrieved; and calculating the weight of each sentence vector in the vector matrix to be retrieved according to the times of the sentence vectors appearing in the vector matrix to be retrieved and the total number of the sentence vectors in the vector matrix to be retrieved.
For example:

$$d_i = \frac{c_i}{N}$$

where d_i is the weight of the ith sentence vector in the vector matrix to be retrieved; c_i is the number of times the ith sentence vector appears in the vector matrix to be retrieved; and N is the total number of sentence vectors in the vector matrix to be retrieved.
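The weight d_i = c_i / N can be computed in one pass over the matrix; in this sketch, rows are hashed as tuples so that repeated sentence vectors can be counted.

```python
from collections import Counter

def sentence_vector_weights(matrix):
    """Weight of each distinct sentence vector in the matrix: its
    occurrence count c_i divided by the total number of rows N."""
    counts = Counter(tuple(row) for row in matrix)
    total = len(matrix)
    return {vec: count / total for vec, count in counts.items()}
```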
In step S140, a plurality of target search results are obtained according to each candidate search result, each euclidean distance, and each weight, and each target search result is displayed according to a word shift distance between each target search result and the data to be searched.
In this exemplary embodiment, after the Euclidean distances and weights are obtained, the candidate retrieval results may be combined according to the order in which each sentence vector appears in the data to be retrieved, the candidate retrieval result corresponding to each sentence vector, the Euclidean distance between each sentence vector and its corresponding candidate retrieval result, and the weight of each sentence vector in the vector matrix to be retrieved, so as to obtain a plurality of target retrieval results. Each target retrieval result is then displayed according to its word mover's distance from the data to be retrieved; for example, the smaller the word mover's distance, the further forward the result is displayed.
It should be added that the target retrieval results may be sorted by word mover's distance and then displayed in sequence according to the set number of displayable retrieval results, for ease of viewing.
Fig. 4 schematically shows another data retrieval method according to an exemplary embodiment of the present invention. Referring to fig. 4, the data retrieving method may further include step S410 and step S420, which will be described in detail below.
In step S410, a plurality of weight vectors are obtained according to the weight of each sentence vector in the vector matrix to be retrieved and the weight of the candidate retrieval result corresponding to each sentence vector in the target retrieval result.
In this exemplary embodiment, weights of each sentence vector in a vector matrix to be retrieved and weights of candidate retrieval results corresponding to each sentence vector in a target retrieval result may be directly combined to obtain a plurality of weight vectors; other ways are also possible, such as weighted combination or concatenation, etc., which are not specifically limited by this example.
In step S420, a word shift distance between the target search result and the data to be searched is obtained according to the euclidean distance between the candidate search result corresponding to each sentence vector and the weight vector corresponding to each sentence vector.
In this exemplary embodiment, first, a product operation is performed on the Euclidean distance between the candidate retrieval result corresponding to each sentence vector and that sentence vector, and the weight vector corresponding to each sentence vector, to obtain a plurality of product operation results. Then, a summation operation is performed on the product operation results to obtain a plurality of sum operation results, and a minimization operation is performed over the sum operation results to obtain the word shift distance between each target retrieval result and the data to be retrieved. Specifically:

WMD = min over T ≥ 0 of Σ_{i,j} T_ij · c(i, j), subject to Σ_j T_ij = d_i for every i and Σ_i T_ij = d'_j for every j

where T_ij is the weight vector (the amount of weight transported from sentence i of the data to be retrieved to sentence j of the candidate retrieval result); c(i, j) is the Euclidean distance between sentence vector i and the corresponding sentence of the candidate retrieval result; and d_i and d'_j are, respectively, the weight of each sentence in the data to be retrieved and in the candidate retrieval result corresponding to the sentence vector. Since d'_j is computed in the same way as d_i, it is not described again here. Furthermore, the scores of all the results are normalized twice (along the query dimension and along the returned-result dimension) so that the constraint conditions of the formula are satisfied; the weights T are then combined with the scores of all returned results to compute the final WMD score, and the final results are returned sorted by that score. Because this calculation reuses the scores returned by retrieval, it avoids the optimization procedure of the conventional WMD solution and greatly improves the computational efficiency of the WMD algorithm.
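As a non-limiting sketch of the product–sum–minimize computation above: instead of solving the full transport problem for T, each query sentence may greedily send its entire weight d_i to its closest candidate sentence, yielding a relaxed lower bound on the true WMD. The function name and the greedy relaxation are illustrative assumptions, not the claimed method itself:

```python
# c[i][j]: Euclidean distance between query sentence vector i and
# candidate sentence j (i.e. the score returned by retrieval);
# d[i]: normalized weight of query sentence i.
# Each query sentence transports its weight to the nearest candidate
# sentence -- a greedy, one-sided relaxation of the WMD transport problem.

def approx_wmd(c, d):
    """c: distance matrix with len(d) rows; d: query-side weights."""
    return sum(w * min(row) for w, row in zip(d, c))

c = [[0.2, 0.9],
     [0.7, 0.1]]
d = [0.5, 0.5]
print(approx_wmd(c, d))  # 0.5*0.2 + 0.5*0.1 = 0.15
```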
The present disclosure also provides a data retrieval device. Referring to fig. 5, the data retrieval apparatus may include a first processing module 510, a second processing module 520, a first calculation module 530, and a third processing module 540. Wherein:
the first processing module 510 may be configured to input data to be retrieved into a sentence vector encoder to obtain a plurality of sentence vectors, and obtain a vector matrix to be retrieved according to each sentence vector.
The second processing module 520 may be configured to input the sentence vectors in each row of the vector matrix to be retrieved into a database with a vector index, and retrieve a plurality of candidate retrieval results.
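By way of illustration only, the vector-indexed database (an Annoy index in the embodiment described in the claims) may be stood in for by an exhaustive nearest-neighbour lookup that returns the same kind of (candidate id, Euclidean distance) output; all names below are illustrative assumptions, and in practice an Annoy index would replace the brute-force search:

```python
import math

# Brute-force stand-in for the vector-indexed database. The actual
# embodiment indexes sentence vectors with Annoy; this exhaustive
# search is only an assumption used to show the input/output shape.

def retrieve_candidates(index, query, k=2):
    """index: {candidate_id: vector}; returns k nearest by Euclidean distance."""
    scored = [(cid, math.dist(query, vec)) for cid, vec in index.items()]
    scored.sort(key=lambda item: item[1])
    return scored[:k]

db = {"s1": [0.0, 0.0], "s2": [1.0, 0.0], "s3": [0.0, 2.0]}
print(retrieve_candidates(db, [0.1, 0.0], k=2))
```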
The first calculating module 530 may be configured to calculate a Euclidean distance between a candidate search result corresponding to each sentence vector and the sentence vector, and a weight of each sentence vector in the vector matrix to be searched.
The third processing module 540 may be configured to obtain a plurality of target search results according to each candidate search result, each Euclidean distance, and each weight, and display each target search result according to a word shift distance between each target search result and the data to be searched.
In an exemplary embodiment of the present disclosure, inputting data to be retrieved into a sentence vector encoder to obtain a plurality of sentence vectors includes:
performing word segmentation on the data to be retrieved to obtain a plurality of word groups, and inputting each word group into the sentence vector encoder to obtain a plurality of sentence vectors; the sentence vector encoder is a supervised sentence embedding model.
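A non-limiting toy sketch of "segment the data, then encode each segment into a sentence vector" follows. The real embodiment uses a supervised sentence embedding model trained on sentence pairs; the whitespace segmentation and hash-based encoder below are placeholder assumptions so that only the pipeline shape is visible:

```python
# Toy end-to-end shape of step S110: split the data to be retrieved
# into sentences, encode each sentence into a fixed-length vector,
# and stack the vectors into the "vector matrix to be retrieved".
# The hash-count encoder is NOT the supervised model of the patent.

DIM = 8

def encode_sentence(sentence):
    vec = [0.0] * DIM
    for token in sentence.split():        # stand-in for real word segmentation
        vec[hash(token) % DIM] += 1.0     # stand-in for the learned encoder
    return vec

def build_query_matrix(data_to_retrieve):
    sentences = [s for s in data_to_retrieve.split(".") if s.strip()]
    return [encode_sentence(s) for s in sentences]

matrix = build_query_matrix("how to reset password. contact customer service.")
print(len(matrix), len(matrix[0]))  # rows = sentences, cols = DIM
```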
In an exemplary embodiment of the present disclosure, the data retrieval apparatus further includes:
and the second calculation module can be used for calculating the length of each sentence vector.
The filling module may be configured to fill the sentence vector when it is determined that the length of the sentence vector does not reach a preset length.
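The length check and filling performed by the second calculation module and the filling module might, purely as an illustrative assumption (the preset length and zero padding value are not specified in the disclosure), look like:

```python
# Zero-pad sentence vectors that do not reach the preset length, so
# every row of the vector matrix has the same dimensionality. The
# preset length of 6 and the 0.0 fill value are assumptions.

PRESET_LEN = 6

def pad_sentence_vector(vec, preset_len=PRESET_LEN, pad_value=0.0):
    if len(vec) >= preset_len:
        return vec  # already reaches the preset length; no filling needed
    return vec + [pad_value] * (preset_len - len(vec))

print(pad_sentence_vector([0.3, 0.1, 0.7]))
```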
In an exemplary embodiment of the present disclosure, calculating the weight of each sentence vector in the vector matrix to be retrieved includes:
calculating the occurrence frequency of each sentence vector in the vector matrix to be retrieved and the total number of sentence vectors in the vector matrix to be retrieved;
and calculating the weight of each sentence vector in the vector matrix to be retrieved according to the number of times each sentence vector appears in the vector matrix to be retrieved and the total number of sentence vectors in the vector matrix to be retrieved.
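The two steps above (count occurrences, then divide by the total) can be sketched as follows; the dictionary representation is an illustrative assumption, by analogy with the normalized bag-of-words weights used in WMD:

```python
# Weight of each distinct sentence vector = its occurrence count in
# the vector matrix divided by the total number of sentence vectors.

def sentence_weights(matrix):
    total = len(matrix)
    counts = {}
    for row in matrix:
        key = tuple(row)              # lists are unhashable; use tuples as keys
        counts[key] = counts.get(key, 0) + 1
    return {key: n / total for key, n in counts.items()}

m = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]
print(sentence_weights(m))  # the repeated vector gets weight 0.5
```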
In an exemplary embodiment of the present disclosure, calculating the Euclidean distance between the candidate retrieval result corresponding to each sentence vector and the sentence vector includes:
calculating differences between the candidate retrieval result corresponding to each sentence vector and that sentence vector to obtain a plurality of difference operation results;

and performing a summation operation on the squares of the difference operation results, and performing a square root operation on the result of the summation, to obtain the Euclidean distance between the candidate retrieval result corresponding to each sentence vector and that sentence vector.
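The difference–square–sum–root steps just listed transcribe directly into code (equivalent to `math.dist` in the Python standard library); this is a sketch of the computation, not the claimed apparatus:

```python
import math

# Euclidean distance exactly as described: element-wise differences,
# square them, sum the squares, then take the square root.

def euclidean(u, v):
    diffs = [a - b for a, b in zip(u, v)]        # difference operations
    return math.sqrt(sum(d * d for d in diffs))  # sum of squares, then root

print(euclidean([1.0, 2.0], [4.0, 6.0]))  # 3-4-5 triangle -> 5.0
```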
In an exemplary embodiment of the present disclosure, the data retrieval apparatus further includes:
the weight vector calculation module may be configured to obtain a plurality of weight vectors according to the weight of each sentence vector in the vector matrix to be retrieved and the weight of the candidate retrieval result corresponding to each sentence vector in the target retrieval result;
the word shift distance calculation module may be configured to obtain a word shift distance between the target search result and the data to be searched according to a Euclidean distance between the candidate search result corresponding to each sentence vector and the sentence vector and a weight vector corresponding to each sentence vector.
In an exemplary embodiment of the present disclosure, obtaining a word shift distance between the target search result and the data to be searched according to the document features of the target search result and the Euclidean distance between the candidate search result corresponding to each sentence vector and the sentence vector includes:
carrying out a product operation on the Euclidean distance between the candidate retrieval result corresponding to each sentence vector and that sentence vector, and the weight vector corresponding to each sentence vector, to obtain a plurality of product operation results;
and performing summation operation on each product operation result to obtain a plurality of sum operation results, and performing minimization operation on each sum operation result to obtain a word shift distance between each target retrieval result and the data to be retrieved.
The specific details of each module in the data retrieval device have been described in detail in the corresponding data retrieval method, and therefore are not described herein again.
It should be noted that although several modules or units of the device for action execution are mentioned in the above detailed description, such a division is not mandatory. Indeed, according to embodiments of the invention, the features and functionality of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided and embodied by a plurality of modules or units.
Moreover, although the steps of the methods of the present invention are depicted in the drawings in a particular order, this does not require or imply that the steps must be performed in this particular order, or that all of the depicted steps must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions, etc.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, and may also be implemented by software in combination with necessary hardware. Therefore, the technical solution according to the embodiment of the present invention can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a USB flash drive, a removable hard disk, etc.) or on a network, and includes several instructions to make a computing device (which can be a personal computer, a server, a mobile terminal, or a network device, etc.) execute the method according to the embodiment of the present invention.
In an exemplary embodiment of the present invention, there is also provided an electronic device capable of implementing the above method.
As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or program product. Thus, various aspects of the invention may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.) or an embodiment combining hardware and software aspects that may all generally be referred to herein as a "circuit," "module," or "system."
An electronic device 600 according to this embodiment of the invention is described below with reference to fig. 6. The electronic device 600 shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 6, the electronic device 600 is embodied in the form of a general purpose computing device. The components of the electronic device 600 may include, but are not limited to: the at least one processing unit 610, the at least one memory unit 620, and a bus 630 that couples the various system components including the memory unit 620 and the processing unit 610.
Wherein the storage unit stores program code that is executable by the processing unit 610 to cause the processing unit 610 to perform steps according to various exemplary embodiments of the present invention as described in the above section "exemplary methods" of the present specification. For example, the processing unit 610 may perform step S110 as shown in fig. 1: inputting data to be retrieved into a sentence vector encoder to obtain a plurality of sentence vectors, and obtaining a vector matrix to be retrieved according to each sentence vector; step S120: respectively inputting sentence vectors of each row in the vector matrix to be retrieved into a database with vector indexes for retrieval to obtain a plurality of candidate retrieval results; step S130: calculating Euclidean distances between candidate retrieval results corresponding to the sentence vectors and weights of the sentence vectors in the vector matrix to be retrieved; step S140: and obtaining a plurality of target retrieval results according to the candidate retrieval results, the Euclidean distances and the weights, and displaying the target retrieval results according to the word shift distance between the target retrieval results and the data to be retrieved.
The storage unit 620 may include readable media in the form of volatile memory units, such as a random access memory unit (RAM) 6201 and/or a cache memory unit 6202, and may further include a read-only memory unit (ROM) 6203.
The memory unit 620 may also include a program/utility 6204 having a set (at least one) of program modules 6205, such program modules 6205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 630 may represent one or more of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 600 may also communicate with one or more external devices 700 (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 600, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 600 to communicate with one or more other computing devices. Such communication may occur via input/output (I/O) interface 650. Also, the electronic device 600 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) via the network adapter 660. As shown, the network adapter 660 communicates with the other modules of the electronic device 600 over the bus 630. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 600, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiment of the present invention can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a USB flash drive, a removable hard disk, etc.) or on a network, and includes several instructions to make a computing device (which can be a personal computer, a server, a terminal device, or a network device, etc.) execute the method according to the embodiment of the present invention.
In an exemplary embodiment of the present invention, there is also provided a computer-readable storage medium having stored thereon a program product capable of implementing the above-described method of the present specification. In some possible embodiments, the various aspects of the invention may also be implemented in the form of a program product comprising program code means for causing a terminal device to carry out the steps according to various exemplary embodiments of the invention described in the above section "exemplary method" of this description, when said program product is run on said terminal device.
The program product for implementing the above method according to the embodiment of the present invention may employ a portable compact disc read only memory (CD-ROM) and include program codes, and may be run on a terminal device, such as a personal computer. However, the program product of the present invention is not limited in this regard and, in the present document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
Furthermore, the above-described figures are merely schematic illustrations of processes involved in methods according to exemplary embodiments of the invention, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, e.g., in multiple modules.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.

Claims (10)

1. A method of data retrieval, comprising:
inputting data to be retrieved into a sentence vector encoder to obtain a plurality of sentence vectors, and obtaining a vector matrix to be retrieved according to each sentence vector; the sentence vector encoder is a supervised sentence embedding model, and sentence pairs required by training the supervised sentence embedding model are formed by a synonym forest, an antonym forest, a synonym dictionary and an antonym dictionary;
respectively inputting sentence vectors of each row in the vector matrix to be retrieved into a database with vector indexes for retrieval to obtain a plurality of candidate retrieval results; wherein the database with vector indexes is used for indexing sentence vectors by using Annoy; the candidate retrieval results are obtained by matching, based on the input sentence vector of each row, from the root node of a binary tree constructed by Annoy until all leaf nodes are matched, and then returning the candidate retrieval results for the sentence vector of that row;
calculating Euclidean distances between candidate retrieval results corresponding to the sentence vectors and weights of the sentence vectors in the vector matrix to be retrieved;
and obtaining a plurality of target retrieval results according to the candidate retrieval results, the Euclidean distances and the weights, and displaying the target retrieval results according to the word shift distance between the target retrieval results and the data to be retrieved.
2. The data retrieval method of claim 1, wherein inputting the data to be retrieved into a sentence vector encoder to obtain a plurality of sentence vectors comprises:
and performing word segmentation on the data to be retrieved to obtain a plurality of word groups, and inputting each word group into a sentence vector encoder to obtain a plurality of sentence vectors.
3. The data retrieval method of claim 2, further comprising:
calculating the length of each sentence vector;
and filling the sentence vector when the length of the sentence vector is determined not to reach the preset length.
4. The data retrieval method of claim 1, wherein calculating the weight of each sentence vector in the vector matrix to be retrieved comprises:
calculating the occurrence frequency of each sentence vector in the vector matrix to be retrieved and the total number of sentence vectors in the vector matrix to be retrieved;
and calculating the weight of each sentence vector in the vector matrix to be retrieved according to the times of the sentence vectors appearing in the vector matrix to be retrieved and the total number of the sentence vectors in the vector matrix to be retrieved.
5. The data retrieval method of claim 1, wherein calculating the Euclidean distance between the candidate retrieval result corresponding to each sentence vector and the sentence vector comprises:
calculating differences between the candidate retrieval result corresponding to each sentence vector and the sentence vector to obtain a plurality of difference operation results;

and performing a summation operation on the square of each difference operation result, and performing a square root operation on the result of the summation operation to obtain the Euclidean distance between the candidate retrieval result corresponding to each sentence vector and the sentence vector.
6. The data retrieval method of claim 1, wherein the data retrieval method further comprises:
obtaining a plurality of weight vectors according to the weight of each sentence vector in the vector matrix to be retrieved and the weight of the candidate retrieval result corresponding to each sentence vector in the target retrieval result;
and obtaining a word shift distance between the target retrieval result and the data to be retrieved according to the Euclidean distance between the candidate retrieval result corresponding to each sentence vector and the weight vector corresponding to each sentence vector.
7. The data retrieval method of claim 6, wherein obtaining a word shift distance between the target retrieval result and the data to be retrieved according to the document features of the target retrieval result and the Euclidean distance between the candidate retrieval result corresponding to each sentence vector and the sentence vector comprises:
carrying out product operation on the Euclidean distance between the candidate retrieval result corresponding to each sentence vector and the weight vector corresponding to each sentence vector to obtain a plurality of product operation results;
and performing summation operation on each product operation result to obtain a plurality of sum operation results, and performing minimization operation on each sum operation result to obtain a word shift distance between each target retrieval result and the data to be retrieved.
8. A data retrieval device, comprising:
the first processing module is used for inputting data to be retrieved into a sentence vector encoder to obtain a plurality of sentence vectors and obtaining a vector matrix to be retrieved according to each sentence vector; the sentence vector encoder is a supervised sentence embedding model, and sentence pairs required by training the supervised sentence embedding model are formed by a synonym forest, an antonym forest, a synonym dictionary and an antonym dictionary;
the second processing module is used for respectively inputting the sentence vectors of each row in the vector matrix to be retrieved into a database with vector indexes for retrieval to obtain a plurality of candidate retrieval results; the database with vector indexes is used for indexing sentence vectors by using Annoy; the candidate retrieval results are obtained by matching, based on the input sentence vector of each row, from the root node of a binary tree constructed by Annoy until all leaf nodes are matched, and then returning the candidate retrieval results for the sentence vector of that row;
the first calculation module is used for calculating Euclidean distances between candidate retrieval results corresponding to the sentence vectors and weights of the sentence vectors in the vector matrix to be retrieved;
and the third processing module is used for obtaining a plurality of target retrieval results according to the candidate retrieval results, the Euclidean distances and the weights, and displaying the target retrieval results according to the word shift distance between the target retrieval results and the data to be retrieved.
9. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the data retrieval method of any one of claims 1 to 7.
10. An electronic device, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the data retrieval method of any one of claims 1-7 via execution of the executable instructions.
CN201910860221.9A 2019-09-11 2019-09-11 Data retrieval method and device, computer-readable storage medium and electronic device Active CN110598078B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910860221.9A CN110598078B (en) 2019-09-11 2019-09-11 Data retrieval method and device, computer-readable storage medium and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910860221.9A CN110598078B (en) 2019-09-11 2019-09-11 Data retrieval method and device, computer-readable storage medium and electronic device

Publications (2)

Publication Number Publication Date
CN110598078A CN110598078A (en) 2019-12-20
CN110598078B true CN110598078B (en) 2022-09-30

Family

ID=68858881

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910860221.9A Active CN110598078B (en) 2019-09-11 2019-09-11 Data retrieval method and device, computer-readable storage medium and electronic device

Country Status (1)

Country Link
CN (1) CN110598078B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111241242B (en) * 2020-01-09 2023-05-30 北京百度网讯科技有限公司 Method, device, equipment and computer readable storage medium for determining target content
CN111581453B (en) * 2020-03-31 2023-08-15 浪潮通用软件有限公司 Retrieval method, equipment and medium for thin-wall components
CN111639194B (en) * 2020-05-29 2023-08-08 天健厚德网络科技(大连)有限公司 Knowledge graph query method and system based on sentence vector
CN111767373A (en) * 2020-06-30 2020-10-13 平安国际智慧城市科技股份有限公司 Document retrieval method, document retrieval device, electronic equipment and storage medium
CN112560501B (en) * 2020-12-25 2022-02-25 北京百度网讯科技有限公司 Semantic feature generation method, model training method, device, equipment and medium
CN113742471B (en) * 2021-09-15 2023-09-12 重庆大学 Vector retrieval type dialogue method of Pu-Fa question-answering system
CN114840632A (en) * 2022-05-31 2022-08-02 浪潮电子信息产业股份有限公司 Knowledge extraction method, system, equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109271505A (en) * 2018-11-12 2019-01-25 深圳智能思创科技有限公司 A kind of question answering system implementation method based on problem answers pair
CN109460549A (en) * 2018-10-12 2019-03-12 北京奔影网络科技有限公司 The processing method and processing device of semantic vector
CN109522394A (en) * 2018-10-12 2019-03-26 北京奔影网络科技有限公司 Knowledge base question and answer system and method for building up
CN109710612A (en) * 2018-12-25 2019-05-03 百度在线网络技术(北京)有限公司 Vector index recalls method, apparatus, electronic equipment and storage medium
CN109766547A (en) * 2018-12-26 2019-05-17 重庆邮电大学 A kind of sentence similarity calculation method
CN110196901A (en) * 2019-06-28 2019-09-03 北京百度网讯科技有限公司 Construction method, device, computer equipment and the storage medium of conversational system

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9588963B2 (en) * 2009-03-18 2017-03-07 Iqintell, Inc. System and method of grouping and extracting information from data corpora
US10372739B2 (en) * 2014-03-17 2019-08-06 NLPCore LLC Corpus search systems and methods
CN107391614A (en) * 2017-07-04 2017-11-24 重庆智慧思特大数据有限公司 A kind of Chinese question and answer matching process based on WMD
CN108304437B (en) * 2017-09-25 2020-01-31 腾讯科技(深圳)有限公司 automatic question answering method, device and storage medium
US10394959B2 (en) * 2017-12-21 2019-08-27 International Business Machines Corporation Unsupervised neural based hybrid model for sentiment analysis of web/mobile application using public data sources
CN109522561B (en) * 2018-11-29 2023-06-16 苏州大学 Question and sentence repeated recognition method, device and equipment and readable storage medium
CN109657212B (en) * 2018-12-13 2022-04-15 武汉大学 Music pattern generation method based on word movement distance and word vector
CN110008465B (en) * 2019-01-25 2023-05-12 网经科技(苏州)有限公司 Method for measuring semantic distance of sentence
CN110083809A (en) * 2019-03-16 2019-08-02 平安城市建设科技(深圳)有限公司 Contract terms similarity calculating method, device, equipment and readable storage medium storing program for executing

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109460549A (en) * 2018-10-12 2019-03-12 北京奔影网络科技有限公司 The processing method and processing device of semantic vector
CN109522394A (en) * 2018-10-12 2019-03-26 北京奔影网络科技有限公司 Knowledge base question and answer system and method for building up
CN109271505A (en) * 2018-11-12 2019-01-25 深圳智能思创科技有限公司 A kind of question answering system implementation method based on problem answers pair
CN109710612A (en) * 2018-12-25 2019-05-03 百度在线网络技术(北京)有限公司 Vector index recalls method, apparatus, electronic equipment and storage medium
CN109766547A (en) * 2018-12-26 2019-05-17 重庆邮电大学 A kind of sentence similarity calculation method
CN110196901A (en) * 2019-06-28 2019-09-03 北京百度网讯科技有限公司 Construction method, device, computer equipment and the storage medium of conversational system

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Magnitude: A Fast, Efficient Universal Vector Embedding Utility Package; Ajay Patel et al.; arXiv:1810.11190v1; 2018-10-26; 1-7 *
Supervised Learning of Universal Sentence Representations from Natural Language Inference Data; Alexis Conneau et al.; arXiv:1705.02364v5; 2018-07-08; 1-12 *
Research and Implementation of an Intelligent Question-Answering System for Chinese Discourse Questions Based on Deep Learning; Wang Yingtao; China Master's Theses Full-text Database, Information Science and Technology; 2018-10-15 (No. 10); I138-976 *
Similarity Search over Massive Data, Part 2: the Annoy Algorithm; Fan Tao; https://blog.csdn.net/hero_fantao/article/details/70245387; 2017-04-19; 1-10 *
Research on Intelligent Recommendation of Legal Provisions Combining Collaborative Filtering and Text Relevance; Ye Jingjing; China Master's Theses Full-text Database, Information Science and Technology; 2019-07-15 (No. 07); I138-1491 *

Also Published As

Publication number Publication date
CN110598078A (en) 2019-12-20

Similar Documents

Publication Publication Date Title
CN110598078B (en) Data retrieval method and device, computer-readable storage medium and electronic device
CN108363790B (en) Method, device, equipment and storage medium for evaluating comments
CN107491547B (en) Search method and device based on artificial intelligence
CN108170749B (en) Dialog method, device and computer readable medium based on artificial intelligence
CN109032375B (en) Candidate text sorting method, device, equipment and storage medium
CN110019732B (en) Intelligent question answering method and related device
CN112256860B (en) Semantic retrieval method, system, equipment and storage medium for customer service dialogue content
CN112100354B (en) Man-machine conversation method, device, equipment and storage medium
CN111611452B (en) Method, system, equipment and storage medium for identifying ambiguity of search text
CN111966810B (en) Question-answer pair ordering method for question-answer system
US11461613B2 (en) Method and apparatus for multi-document question answering
CN112818091A (en) Object query method, device, medium and equipment based on keyword extraction
JP2022169743A (en) Information extraction method and device, electronic equipment, and storage medium
CN112434134A (en) Search model training method and device, terminal equipment and storage medium
CN113128431A (en) Video clip retrieval method, device, medium and electronic equipment
CN117114063A (en) Method for training a generative large language model and for processing image tasks
CN115062134A (en) Knowledge question-answering model training and knowledge question-answering method, device and computer equipment
CN111753029A (en) Entity relationship extraction method and device
CN112632255B (en) Method and device for obtaining question and answer results
CN112599211A (en) Medical entity relationship extraction method and device
CN113139558A (en) Method and apparatus for determining a multi-level classification label for an article
CN115203378B (en) Retrieval enhancement method, system and storage medium based on pre-training language model
CN113468311B (en) Knowledge graph-based complex question and answer method, device and storage medium
CN112100335B (en) Problem generation method, model training method, device, equipment and storage medium
CN114742062A (en) Text keyword extraction processing method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: Room 221, 2 / F, block C, 18 Kechuang 11th Street, Daxing District, Beijing, 100176

Applicant after: Jingdong Digital Technology Holding Co.,Ltd.

Address before: Room 221, 2 / F, Block C, 18 Kechuang 11th Street, Daxing Economic and Technological Development Zone, Beijing, 100176

Applicant before: JINGDONG DIGITAL TECHNOLOGY HOLDINGS Co.,Ltd.

Address after: Room 221, 2 / F, block C, 18 Kechuang 11th Street, Daxing District, Beijing, 100176

Applicant after: Jingdong Technology Holding Co.,Ltd.

Address before: Room 221, 2 / F, block C, 18 Kechuang 11th Street, Daxing District, Beijing, 100176

Applicant before: Jingdong Digital Technology Holding Co.,Ltd.

GR01 Patent grant