CN109948044A - Document query based on vector nearest neighbor search - Google Patents

Publication number
CN109948044A
Authority
CN
China
Prior art keywords
vector
document
query
retrieval
library
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711343103.8A
Other languages
Chinese (zh)
Inventor
李明琴
陈琪
任刚
王井东
韩殿飞
华杰锋
张东擎
罗威
李增中
谭锋
张十
朱素艳
沈徽
张霖涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Microsoft Technology Licensing LLC
Original Assignee
Microsoft Technology Licensing LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Microsoft Technology Licensing LLC
Priority to CN201711343103.8A
Priority to PCT/US2018/064146 (WO2019118253A1)
Publication of CN109948044A
Current legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/903: Querying
    • G06F16/90335: Query processing

Abstract

The document query technique disclosed herein, based on vector nearest neighbor search, applies vector approximate matching retrieval to a search engine. The query content and the web documents are each converted into semantic vectors, and web documents close to the query content are then obtained by vector approximate matching retrieval. This overcomes the limitations of symbol-matching retrieval and provides a retrieval service that better captures user intent.

Description

Document query based on vector nearest neighbor search
Background
With the development of network technology, search engines have become increasingly powerful and the content they search increasingly rich. Search engines also supply information to many applications and are an essential service for them. In an era of rapid information growth there is a massive number of web documents, and that number keeps growing at high speed. At the same time, users' demand for information keeps increasing. How to provide retrieval that is faster, more efficient, and more accurately reflects user intent remains a standing challenge for search engine technology.
Summary
This Summary is provided to introduce, in simplified form, a selection of concepts that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to limit the scope of the claimed subject matter.
The disclosed document query technique based on vector nearest neighbor search applies vector approximate matching retrieval to a search engine. The query content and the web documents are each converted into semantic vectors, and web documents close to the query content are obtained by vector approximate matching retrieval. This overcomes the limitations of symbol-matching retrieval and provides a retrieval service that better captures user intent.
The above is only an overview of the disclosed technical solution. So that the technical means of the disclosure can be better understood and implemented in accordance with the specification, and so that the above and other objects, features, and advantages of the disclosure are clearer, specific embodiments of the disclosure are set out below.
Brief description of the drawings
Fig. 1 is an example block diagram of the search engine system disclosed in an embodiment of the present invention;
Fig. 2 is a schematic diagram of a first query processing flow for web documents according to an embodiment of the present invention;
Fig. 3 is a structural block diagram of a first web document query processing device according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of a second query processing flow for web documents according to an embodiment of the present invention;
Fig. 5 is a structural block diagram of a second web document query processing device according to an embodiment of the present invention;
Fig. 6 is a schematic diagram of a third query processing flow for web documents according to an embodiment of the present invention;
Fig. 7 is a block diagram of a first system architecture for query processing of web documents according to an embodiment of the present invention;
Fig. 8 is a schematic diagram of a fourth query processing flow for web documents according to an embodiment of the present invention;
Fig. 9 is a schematic diagram of a fifth query processing flow for web documents according to an embodiment of the present invention;
Fig. 10 is a block diagram of a second query processing architecture for web documents according to an embodiment of the present invention;
Fig. 11 is a block diagram of a third query processing architecture for web documents according to an embodiment of the present invention;
Fig. 12 is a schematic diagram of an application example of vector nearest neighbor search based on the CDSSM model according to an embodiment of the present invention;
Fig. 13 is a block diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure will be understood more thoroughly and its scope fully conveyed to those skilled in the art.
As used herein, the term "technology" may refer to, for example, system(s), method(s), computer-readable instructions, module(s), algorithms, hardware logic (for example, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs)), and/or other technology(ies) as permitted by the context above and throughout this document.
Search engine technology is widely used across industries. Besides being accessed through ordinary web pages, search engines are also embedded in all kinds of apps (application programs) to provide users with information search services.
A user issues a query request to a search engine. The search engine searches its stored web documents according to the query content contained in the request, obtains the web documents matching the user's query content, and returns them to the user. Search engines are not limited to retrieving web documents; they can also be used to retrieve other types of documents (such as message documents and data files). This description mainly uses web documents as the example.
Most current document retrieval is based on symbol matching. In web search engines, the common approach is to obtain the documents relevant to a query by symbol matching against a keyword inverted index. Symbol-matching retrieval cannot understand user intent well. Some search engines rewrite the original query before retrieving in order to improve recall, but such rewriting is very limited, and recall cannot be guaranteed, especially when new concepts are encountered.
In the document query technique based on vector nearest neighbor search proposed herein, the documents in the search engine are converted in advance into document vectors in semantic vector form, and the query content entered by the user is likewise converted into a query vector in semantic vector form. Vector approximate matching retrieval is then performed in a document vector library (made up of the document vectors converted from multiple documents) to find document vectors close to the query vector. Finally, the corresponding documents are obtained from the document vectors found and returned to the user as the query result.
The vector approximate matching retrieval may specifically use approximate nearest neighbor (ANN) search. Because both documents and queries have been converted into semantic vectors, the documents to recall are determined by the similarity between the query vector and the document vectors. This overcomes the limitations of symbol-matching retrieval and allows the user's intent to be understood better.
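As an illustration of what "similarity between the query vector and a document vector" can mean, the following assumes cosine similarity, which the description names further on as one possible measure. The symbols q (query vector), d (document vector), and n (dimension of the semantic space) are notation introduced here for illustration only.

```latex
\cos(q, d) \;=\; \frac{q \cdot d}{\lVert q \rVert \,\lVert d \rVert}
\;=\; \frac{\sum_{i=1}^{n} q_i d_i}{\sqrt{\sum_{i=1}^{n} q_i^{2}}\;\sqrt{\sum_{i=1}^{n} d_i^{2}}}
```

For example, with q = (1, 0) and d = (1, 1) the value is 1/√2 ≈ 0.707, while d = (1, 0) gives 1. A larger value means the two vectors point in more similar directions.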
Fig. 1 is an example block diagram 100 of the search engine system disclosed in an embodiment of the present invention. Block diagram 100 includes a user 101, a server 103 hosting a search engine 102, and one or more databases 104 that store web documents; the user 101 and the server 103 are connected via the Internet 105. Here, the user 101 may be a person, a software client, a hardware client (such as a desktop computer, laptop, mobile phone, tablet, or other similar smart terminal), an app, or another application server.
On the one hand, the search engine 102 continuously identifies and crawls content from massive amounts of data, forms web documents, and stores them in the database 104. The content of a web document may include a title, links, anchors, click data, and so on. On the other hand, the search engine 102 receives query requests from the user 101, searches the database 104 according to the query content in the request, obtains the web documents matching the query content, and returns them to the user 101. A query request may be generated from text the user types into the search box of a web page, or generated by an app according to the user's query needs. The user may enter the query content as text, or by voice that is then recognized as text. However the user enters it, the query content is ultimately converted into natural-language form so that it can be processed further with the technique described here.
The embodiments of the present invention improve how a search engine retrieves web documents. The improvements mainly concern the following aspects:
1) Web document query processing based on vector approximate matching retrieval
The embodiments introduce vector approximate matching retrieval: the query content and the documents to be queried are converted into semantic vectors, and matching retrieval is then performed on those vectors. This overcomes the limitations of symbol-matching retrieval and captures the user's intent better.
2) Partitioning of web document data into blocks
Because the volume of web document data is enormous, the embodiments first partition the web document data (made up of multiple web documents) into blocks and then perform semantic vector conversion and index building. During retrieval, each web document data block is searched in parallel, and the web documents found are then merged to form the output result.
3) Building a vector index and applying it in query processing
To further improve the efficiency of vector approximate matching retrieval, a vector index is built. Its main purpose is to quickly locate the region in which matching web document vectors may exist.
4) Combined use of vector approximate matching retrieval and inverted index retrieval
To further optimize the search results, the embodiments retrieve web documents using both inverted index retrieval and vector approximate matching retrieval, and make full use of the web documents obtained by both methods to generate the final search result.
These aspects are described in detail below.
Query processing of web documents based on vector approximate matching retrieval
In the embodiments of the present invention, the search engine 102 converts crawled web documents into web document vectors in advance and stores them in the database 104 as the data basis for subsequent document queries. On this basis, when the search engine 102 receives a user's query request, it extracts the query content and then executes the first query processing flow for web documents shown in schematic diagram 200 of Fig. 2. The flow includes:
S201: Generate a query vector from the query content. The query content is in natural-language form, and the query vector is formed by extracting features from its semantics. When generating the query vector, semantic features can be extracted based on the context of the query content, so that user intent is captured better. In addition, the query vector and the web document vectors are semantic vectors generated in the same semantic space, which makes the subsequent vector approximate matching between the query vector and the web document vectors straightforward.
S202: Perform vector approximate matching retrieval to obtain the web document vectors that match the query vector. Because both the query content and the web documents are represented as semantic vectors, the web document vectors nearest to the query vector can be found as the query result by computing the similarity between the web document vectors and the query vector. The algorithmic basis of the vector approximate matching retrieval described here is computing the similarity between semantic vectors: within the same semantic space, the distance between any two semantic vectors reflects how close they are. There are many ways to compute this distance. One example is cosine similarity, that is, the cosine of the angle between the two semantic vectors; the larger the cosine value, the more similar the two semantic vectors. The vector approximate matching retrieval may specifically be carried out with approximate nearest neighbor (ANN) search. Typical ANN algorithms include the KD-tree algorithm, the k-nearest-neighbor graph (KNN graph) algorithm, and the locality-sensitive hashing (LSH) algorithm.
S203: Obtain the corresponding web documents from the web document vectors. The mapping between web document vectors and web documents can be stored in the database, and the web document corresponding to a web document vector can be found through this mapping.
Steps S201 to S203 above describe how matching web documents are retrieved from the query content. As noted earlier, the web documents must first be converted into web document vectors. Therefore, before step S202, the flow may also include:
S204: Generate one or more web document vectors from the document content of a document. The document content may include a title, links, anchors, and click data. A web document vector can be generated from any one of these items or from a combination of several, so one document may correspond to multiple web document vectors. When a document has multiple web document vectors, in step S202 above the document is considered to match the query content if the query vector matches any one of them, and the document can be returned as a query result. Furthermore, because the search engine 102 continuously crawls content from the network and forms web documents, the conversion of web documents into web document vectors is also ongoing: whenever a new web document appears, the search engine 102 converts it into web document vectors and adds them to the database.
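Steps S201 to S204 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `embed` is a toy bag-of-words stand-in for a real semantic model, the scan in `query` is exhaustive rather than a true ANN search, and all function names and sample documents are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a semantic model: L2-normalized bag-of-words counts."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {word: c / norm for word, c in counts.items()} if norm else {}


def cosine(u, v):
    """Cosine similarity of two normalized sparse vectors (larger = closer)."""
    return sum(weight * v.get(word, 0.0) for word, weight in u.items())


def build_vector_library(documents):
    """S204: one vector per content field; each vector maps back to its document id."""
    library = []
    for doc_id, fields in documents.items():
        for field_text in fields.values():
            library.append((embed(field_text), doc_id))
    return library


def query(library, query_text, top_k=2):
    query_vec = embed(query_text)  # S201: query content -> query vector
    # S202: score every document vector (exhaustive scan standing in for ANN)
    scored = sorted(
        ((cosine(query_vec, vec), doc_id) for vec, doc_id in library),
        key=lambda pair: pair[0],
        reverse=True,
    )
    results = []
    for score, doc_id in scored:  # S203: map matched vectors back to documents
        if score > 0 and doc_id not in results:
            results.append(doc_id)
        if len(results) == top_k:
            break
    return results
```

Because a document matches when any one of its vectors matches, `query` keeps a document the first time one of its vectors appears in the ranked list and skips its remaining vectors.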
Fig. 3 is a structural block diagram 300 of the first web document query processing device according to an embodiment of the present invention. The web document query processing described above can be carried out by the device shown in Fig. 3, which may be provided in the search engine 102 and includes:
A query vector generation module 301, configured to generate a query vector from the query content;
A web document vector acquisition module 302, configured to perform vector approximate matching retrieval and obtain the web document vectors that match the query vector;
A document acquisition module 303, configured to obtain the corresponding documents from the web document vectors.
In addition, the device may include a web document vector generation module 304, configured to generate one or more web document vectors from the document content of a document.
In the web document query processing described above, web documents and query content are converted into semantic vectors, and vector approximate matching retrieval is performed on those vectors, so that lookup is based on the similarity of semantic vectors. Web document vectors that are close in the vector space can be obtained, overcoming the limitations that symbol matching imposes on retrieval. Moreover, the features used in semantic-vector retrieval are not limited to the query terms themselves (a single term or the terms in a sentence) but can include richer features, so the user's query intent is understood better and recall improves.
Partitioning of web document data into blocks
The preceding section described the basic query processing flow based on vector approximate matching retrieval. In practice, the search engine 102 must handle a massive number of web documents; herein, a collection of multiple web documents is referred to as web document data. Web document data made up of a massive number of web documents is very large, and both storing it and building an index over it are major undertakings. Matching retrieval by query content over such large and constantly growing web document data is also very time-consuming. To address this, the embodiments of the present invention propose a system architecture in which the web document data is partitioned into blocks and each block is indexed separately. With this architecture, the same query content is processed in parallel in each web document data block, and the web documents obtained from the blocks are then combined to form the final query result.
Fig. 4 is a schematic diagram 400 of the second query processing flow for web documents according to an embodiment of the present invention. Based on the above architecture that partitions web document data into blocks, the query processing flow includes:
S401: Generate a query vector from the query content.
S402: According to the query vector, perform vector approximate matching retrieval in multiple web document vector libraries to obtain the web document vectors that match the query vector, and, according to those web document vectors, obtain the corresponding web documents from the web document data block associated with each web document vector library. The vector approximate matching retrieval may specifically be carried out with the approximate nearest neighbor (ANN) search described above.
S403: Merge the web documents obtained from the individual web document data blocks to generate the final query result. The web document data blocks are independent of one another, so the web documents retrieved from different blocks do not overlap, and the query result for some blocks may be empty. The web documents obtained from each block as intermediate query results can be merged directly and output as the final search result. Preferably, during merging, the web documents obtained from the blocks can also be filtered or jointly ranked, and the one or several web documents closest to the query content selected as the final query result.
As noted above, in preparation for query processing the large body of web document data must be partitioned in advance and each web document converted into web document vectors. Therefore, before step S401, the flow may also include:
S404: Partition the web document data into multiple web document data blocks. In practice, because the search engine 102 continuously crawls web page information and forms web documents, partitioning can be performed once web document data of a certain size has accumulated.
S405: Process the documents in each web document data block to generate multiple web document vector libraries, one for each web document data block. Each web document vector library contains the web document vectors corresponding to the documents in its web document data block.
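The block-wise flow of steps S401 to S405 can be sketched as below. This is an illustrative sketch under assumptions: documents are already represented as small dense vectors, each block is scanned exhaustively in place of a real ANN search, the parallelism uses a thread pool inside one process rather than separate machines, and all names (`split_into_blocks`, `search_block`, `search_all_blocks`) are hypothetical.

```python
import heapq
import math
from concurrent.futures import ThreadPoolExecutor


def cosine(u, v):
    """Cosine similarity of two dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def split_into_blocks(doc_vectors, block_size):
    """S404/S405: partition {doc_id: vector} into fixed-size vector libraries."""
    items = list(doc_vectors.items())
    return [dict(items[i:i + block_size]) for i in range(0, len(items), block_size)]


def search_block(block, query_vec, top_k):
    """S402 for one block: return the top_k (score, doc_id) pairs in that block."""
    scored = ((cosine(query_vec, vec), doc_id) for doc_id, vec in block.items())
    return heapq.nlargest(top_k, scored)


def search_all_blocks(blocks, query_vec, top_k):
    """Search every block in parallel, then merge the partial results (S403)."""
    with ThreadPoolExecutor(max_workers=max(1, len(blocks))) as pool:
        partials = list(pool.map(lambda b: search_block(b, query_vec, top_k), blocks))
    merged = heapq.nlargest(top_k, (hit for part in partials for hit in part))
    return [doc_id for _, doc_id in merged]
```

Each block returns at most `top_k` candidates, so the merge step only has to rank `top_k` times the number of blocks, regardless of how many documents each block holds.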
Fig. 5 is a structural block diagram 500 of the second web document query processing device according to an embodiment of the present invention. The web document query processing described above can be carried out by the device shown in Fig. 5, which may be provided in the search engine 102 and includes:
A query vector generation module 501, configured to generate a query vector from the query content;
A vector approximate matching retrieval module 502, configured to perform, according to the query vector, vector approximate matching retrieval in multiple web document vector libraries, obtain the web document vectors that match the query vector, and, according to those vectors, obtain the corresponding web documents from the web document data block associated with each web document vector library;
A query result generation module 503, configured to merge the web documents obtained from the individual web document data blocks and generate the final query result.
In addition, the device may include processing modules that partition the web document data and convert web documents into vectors, specifically:
A partitioning module 504, configured to partition the web document data into multiple web document data blocks;
A document vector library generation module 505, configured to process the web documents in each web document data block and generate multiple web document vector libraries, one for each web document data block. Each web document vector library contains the web document vectors corresponding to the web documents in its data block, and each document corresponds to one or more document vectors.
By partitioning the web document data into blocks, the embodiments of the present invention narrow the scope of vector approximate matching retrieval to a reasonable range, so that the retrieval can be performed more quickly.
Building a vector index and applying it in query processing
To make vector approximate matching retrieval faster still, the embodiments of the present invention, in addition to partitioning the web document data into blocks, build a vector index over the web document vector library formed from each web document data block. The main purpose of the vector index is to divide the web document vectors in a web document vector library into regions, so that during query processing the region in which vectors matching the query vector may exist can be located quickly. In the embodiments, the vector index is built only after the web document data has been partitioned, so each vector index is relatively small, which further speeds up vector matching retrieval.
With the vector index in place, Fig. 6 shows a schematic diagram 600 of the third query processing flow for web documents according to an embodiment of the present invention. In step S402 above, performing vector approximate matching retrieval in multiple web document vector libraries according to the query vector to obtain the matching web document vectors may specifically include:
S601: According to the query vector and the vector index of each document vector library, determine in each document vector library the region in which vector approximate matching retrieval is to be performed;
S602: According to the query vector, perform vector approximate matching retrieval within the determined region to obtain the document vectors that match the query vector.
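Steps S601 and S602 can be illustrated with a simple region index. The description does not specify how regions are formed, so this sketch assumes each region is represented by a centroid vector and that a document vector belongs to the region of its most similar centroid; the class `RegionIndex`, its methods, and the `n_probe` parameter are hypothetical names chosen for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


class RegionIndex:
    """Assumed vector index: one centroid per region of a document vector library."""

    def __init__(self, centroids):
        self.centroids = centroids
        self.regions = [[] for _ in centroids]

    def _closest_regions(self, vec, count):
        # Region numbers ordered by similarity between vec and each centroid.
        order = sorted(
            range(len(self.centroids)),
            key=lambda i: cosine(vec, self.centroids[i]),
            reverse=True,
        )
        return order[:count]

    def add(self, doc_id, vec):
        region = self._closest_regions(vec, 1)[0]
        self.regions[region].append((doc_id, vec))

    def search(self, query_vec, top_k=1, n_probe=1):
        # S601: choose the region(s) in which to run the matching retrieval.
        probed = self._closest_regions(query_vec, n_probe)
        candidates = [entry for r in probed for entry in self.regions[r]]
        # S602: compare the query only with vectors inside those regions.
        candidates.sort(key=lambda e: cosine(query_vec, e[1]), reverse=True)
        return [doc_id for doc_id, _ in candidates[:top_k]]
```

The search is approximate: a true nearest neighbor that falls in an unprobed region is missed, and raising `n_probe` trades speed for recall.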
Fig. 7 is a block diagram 700 of the first system architecture for query processing of web documents according to an embodiment of the present invention. Block diagram 700 includes a query processor (Query Worker) 701, multiple retrieval processors (Search Workers) 702, an aggregator 703, and a database 704 corresponding to each retrieval processor 702.
With the web document data partitioned into blocks and a vector index built for each web document vector library, the query processor 701 converts the query content into a semantic vector, replicates it, and distributes it to each retrieval processor 702. The retrieval processors 702 search their respective web document data blocks for web document vectors in parallel and then output the web documents retrieved to the aggregator 703. The aggregator 703 ranks the web documents provided by the retrieval processors 702, selects the one or more web documents closest to the query content, and provides them to the user as the final query result.
Each retrieval processor 702 corresponds to one database 704, which stores the web document data block and web document vector library assigned to that retrieval processor 702. The retrieval processor 702 holds the vector index of the web document vector library.
By building a vector index, the embodiments of the present invention can quickly narrow the scope of vector approximate matching retrieval to a specific region of the web document vector library, which reduces the number of similarity computations between vectors and improves the efficiency of vector approximate matching retrieval.
Combined use of vector approximate matching retrieval and inverted index retrieval
To optimize the query result further, the embodiments of the present invention combine inverted index retrieval with vector approximate matching retrieval, making full use of the strengths of both retrieval methods to improve the accuracy of the query result.
In the embodiments of the present invention, the inverted index, like the vector index, is built only after the web document data has been partitioned into blocks. The inverted index is built over the web documents in each web document data block, while the vector index is built over the web document vectors in each web document vector library.
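For contrast with the vector index, a per-block inverted index can be sketched as follows. This is a minimal illustration assuming whitespace tokenization and exact keyword matching; the function names and sample documents are hypothetical and the sketch is not taken from the patent.

```python
from collections import defaultdict


def build_inverted_index(block):
    """Map each keyword to the ids of the documents in this block that contain it."""
    index = defaultdict(set)
    for doc_id, text in block.items():
        for word in set(text.lower().split()):
            index[word].add(doc_id)
    return index


def keyword_search(index, keywords):
    """Symbol matching: return the documents that contain every keyword."""
    postings = [index.get(keyword.lower(), set()) for keyword in keywords]
    return set.intersection(*postings) if postings else set()
```

A query term that never appears in the block, such as a synonym of an indexed word, returns nothing here, which is the limitation of symbol matching that the vector retrieval path is meant to cover.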
Fig. 8 is a schematic diagram 800 of the fourth query processing flow for web documents according to an embodiment of the present invention, and Fig. 9 is a schematic diagram 900 of the fifth query processing flow. In the embodiments, inverted index retrieval and vector approximate matching retrieval are executed in parallel. With the web document data partitioned into blocks, the query content is first analyzed 801; inverted index retrieval and vector approximate matching retrieval are then executed in parallel for each web document data block, yielding the web documents retrieved through the inverted index and the web documents retrieved through vector approximate matching. Specifically, as shown in Figs. 8 and 9, keyword extraction 802 and query vector generation 803 are performed on the query content, followed by distributed inverted index retrieval 804 and distributed ANN vector index retrieval 805, respectively. Finally, the web documents obtained from the individual web document data blocks are merged. During merging, the web documents obtained can be ranked to determine the search result that is finally output to the user. Ranking can be done in either of the following two ways:
Mode one: As shown in Fig. 8, ranking 806 is applied to the web documents retrieved through the inverted index and ranking 807 to the web documents obtained through vector approximate matching retrieval. The web documents output by ranking 806 and ranking 807 are then ranked again in ranking 808, the output of ranking 808 is merged in merge processing 809 to generate the final query result, and query result output 810 is performed.
Mode two: As shown in Fig. 9, the web documents obtained through vector approximate matching retrieval and those obtained through inverted index retrieval are ranked together in hybrid ranking 901. The web documents output by hybrid ranking 901 are merged in merge processing 902 to generate the final query result, and query result output 810 is performed.
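The difference between the two ranking modes can be sketched as follows. The sketch assumes each retrieval path already yields (document id, score) pairs whose scores are comparable across paths; how such scores are produced is outside the sketch. The `per_source` cut-off in mode one and the duplicate handling in `merge` are assumptions added for illustration, and all function names are hypothetical.

```python
def rank(hits):
    """Order (doc_id, score) pairs by descending score, then doc_id for determinism."""
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


def merge(ranked_hits, limit):
    """Keep the best-ranked occurrence of each document and truncate to limit."""
    seen, merged = set(), []
    for doc_id, _ in ranked_hits:
        if doc_id not in seen:
            seen.add(doc_id)
            merged.append(doc_id)
    return merged[:limit]


def mode_one(inverted_hits, vector_hits, per_source, limit):
    """Fig. 8 style: rank each source (806, 807), re-rank the survivors (808), merge (809)."""
    top_inverted = rank(inverted_hits)[:per_source]
    top_vector = rank(vector_hits)[:per_source]
    return merge(rank(top_inverted + top_vector), limit)


def mode_two(inverted_hits, vector_hits, limit):
    """Fig. 9 style: hybrid ranking of all hits together (901), then merge (902)."""
    return merge(rank(inverted_hits + vector_hits), limit)
```

With a per-source cut-off, mode one guarantees that each retrieval path contributes candidates to the final re-ranking, whereas mode two lets a single path dominate if its scores are higher.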
Figure 10 shows the second block diagram 1000 of a query processing architecture for web documents according to an embodiment of the present invention; the web document query process described above can be carried out on the architecture shown in Figure 10. In block diagram 1000, the query processor (Query Worker) 1001 converts the query content into a semantic vector and extracts keywords from the query content, then replicates the extracted keywords and the resulting query vector and distributes them to each search processor (Search Worker). The search processors fall into two classes: search processors 1002, which perform vector approximate matching retrieval, and search processors 1003, which perform inverted index retrieval. Ranking processor 1004 ranks the web documents obtained through vector approximate matching retrieval, ranking processor 1005 ranks the web documents obtained through inverted index retrieval, and ranking processor 1006 re-ranks the web documents output by ranking processor 1004 and ranking processor 1005. Finally, the aggregator (Aggregator) 1007 merges the web documents output by ranking processor 1005 and generates the query result that is ultimately provided to the user.
Figure 11 shows the third block diagram 1100 of a query processing architecture for web documents according to an embodiment of the present invention. The web document query process described above can also be carried out on the architecture shown in Figure 11. In block diagram 1100, the query processor 1101 converts the query content into a semantic vector and extracts keywords from the query content, then replicates the extracted keywords and the resulting query vector and distributes them to each search processor 1102. Each search processor 1102 performs inverted index retrieval in addition to vector approximate matching retrieval, and outputs both the web documents obtained through vector approximate matching retrieval and the web documents obtained through inverted index retrieval to the hybrid ranking processor 1103. The hybrid ranking processor 1103 performs hybrid ranking on the two sets of web documents. The aggregator 1104 then merges the web documents output by the hybrid ranking processor 1103 and generates the query result that is ultimately provided to the user.
For the ranking of web documents, a model such as LambdaRank (a learning-to-rank algorithm) or LambdaMART (a learning-to-rank algorithm) may be used.
By combining inverted index retrieval with vector approximate matching retrieval, the advantages of both retrieval modes can be fully exploited, yielding query results that are more accurate and that better reflect the user's intent.
Application scenario embodiment
The processing flow and overall architecture of the document query technique based on vector nearest neighbor search according to embodiments of the present invention have been described above. The technical solution of the embodiments is further illustrated below through a concrete application example.
Figure 12 shows a schematic diagram 1200 of an application example of vector nearest neighbor search based on the CDSSM (Convolutional Deep Structured Semantic Models) model according to an embodiment of the present invention. In this example, the original query content is "coffee and tea south melbourne" 1201, and three web documents are assumed to exist: the URL (Uniform Resource Locator) of web document 1 1202 is "www truelocal com au find coffee tea vic Melbourn city south melbourne", the title of web document 2 1203 is "coffee tea Suppliers in south melbourne Melbourne city vic", and the click record (click) of web document 3 1204 is "coffee beans supplier south melbourne". A click record here is query content that led to a click on the web page link of the web document: a user enters some query content, the search engine returns a web document, the user clicks the link of that web document and visits the corresponding page, and the search engine records the query content as a click record of that web document.
The figure uses the CDSSM model to vectorize the query content and the web documents and to match them by similarity. As shown in the figure, both the original query content and the web documents are converted into semantic vectors through word embedding and a deep neural network (Deep Neural Network). In the model shown, tri-letter based word embedding 1208 is applied first, and the convolutional deep structured semantic model (CDSSM) 1209 then generates a semantic vector of dimension 100 (d in the figure denotes the dimension of the generated vector).
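The source names tri-letter word embedding without giving details. As a rough illustration only, the sketch below shows the letter-trigram decomposition commonly described for models of this family: each word is wrapped in boundary markers and split into overlapping three-character units. The `#` boundary marker and the function names are assumptions, and the later steps (mapping the trigram counts into an embedding and applying the convolutional layers) are not shown.

```python
from collections import Counter
from typing import List


def tri_letters(word: str, boundary: str = "#") -> List[str]:
    """Split one word into overlapping letter trigrams, e.g. '#te', 'tea', 'ea#'."""
    padded = f"{boundary}{word.lower()}{boundary}"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def tri_letter_counts(text: str) -> Counter:
    """Count the trigrams of every whitespace-separated word in the text."""
    counts: Counter = Counter()
    for word in text.split():
        counts.update(tri_letters(word))
    return counts
```

Because the trigram vocabulary is far smaller than a word vocabulary, misspelled or unseen words such as "Melbourn" still share most of their trigrams with the correctly spelled form.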
As shown in the figure, once the query vector 1205 and the web document vectors 1206 have been generated, cosine similarity calculation 1207 is performed between the query vector and each web document vector, and the web document with the highest similarity is selected as the query result.
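The selection step can be sketched as follows. The vectors are illustrative two-dimensional values rather than the 100-dimensional CDSSM outputs, and the function names are hypothetical.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(query_vec: Sequence[float], doc_vecs: Dict[str, Sequence[float]]) -> str:
    """Return the id of the document vector most similar to the query vector."""
    return max(doc_vecs, key=lambda doc_id: cosine_similarity(query_vec, doc_vecs[doc_id]))
```

Cosine similarity depends only on direction, so a long document vector is not favoured over a short one with the same orientation.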
Implementation examples
In some examples, the one or more modules, steps, or processes involved in Figures 1 to 12 above may be implemented by a software program, by a hardware circuit, or by a combination of the two. For example, each of the above components or modules and one or more of the steps may be implemented in a system on chip (SoC). The SoC may include an integrated circuit chip that includes one or more of the following: a processing unit (such as a central processing unit (CPU), microcontroller, microprocessing unit, or digital signal processing unit (DSP)), a memory, one or more communication interfaces, and/or further circuitry for performing its functions, together with optional embedded firmware.
Figure 13 is a structural block diagram of an electronic device 1300 according to an embodiment of the invention. The electronic device 1300 includes a memory 1301 and a processor 1302.
The memory 1301 is used to store a program. In addition to the program, the memory 1301 may be configured to store various other data to support operation of the electronic device 1300. Examples of such data include instructions for any application or method operating on the electronic device 1300, contact data, phonebook data, messages, pictures, videos, and so on.
The memory 1301 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disc.
The memory 1301 is coupled to the processor 1302 and contains instructions stored thereon which, when executed by the processor 1302, cause the electronic device to perform actions. In one embodiment of the electronic device, the actions may include:
generating a query vector from the query content;
performing vector approximate matching retrieval to obtain a document vector matching the query vector;
obtaining the corresponding document according to the document vector.
Here, performing vector approximate matching retrieval to obtain a document vector matching the query vector may include: obtaining the document vector matching the query vector based on approximate nearest neighbor search.
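The three actions above can be sketched end to end. This is a minimal illustration, not the implementation described in the source: `embed` is a hypothetical toy embedding (letter counts) standing in for the semantic model, and an exact linear scan stands in for approximate nearest neighbor search, which would normally use an index to avoid comparing against every vector.

```python
import math
from typing import Dict, List


def embed(text: str) -> List[float]:
    """Hypothetical toy embedding: counts of the letters a-z (not a semantic model)."""
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def query_documents(query: str, doc_vectors: Dict[str, List[float]],
                    documents: Dict[str, str], top_k: int = 1) -> List[str]:
    query_vec = embed(query)                                   # action 1: query vector
    ranked = sorted(doc_vectors,                               # action 2: nearest vectors
                    key=lambda d: cosine(query_vec, doc_vectors[d]),
                    reverse=True)                              # (exact scan as a stand-in)
    return [documents[d] for d in ranked[:top_k]]              # action 3: fetch documents
```

The document vectors are computed once, ahead of query time; only the query is embedded when a request arrives.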
In another embodiment of the electronic device, the actions may include:
generating a query vector from the query content;
performing, according to the query vector, vector approximate matching retrieval in a plurality of document vector libraries to obtain document vectors matching the query vector, and obtaining, according to those document vectors, the corresponding documents from the document data block that corresponds to each document vector library;
merging the documents obtained from each document data block to generate the final query result.
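A sketch of the multi-library variant follows. Each document vector library is paired with its own document data block, each library is searched independently, and the per-block results are merged into one list. The data layout, the use of squared Euclidean distance, and the merge-by-distance rule are assumptions made for illustration; an exhaustive scan again stands in for approximate matching.

```python
import heapq
from typing import Dict, List, Sequence, Tuple

Library = List[Tuple[str, Sequence[float]]]  # (doc_id, document vector) pairs


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance; smaller means more similar."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def search_library(query_vec: Sequence[float], library: Library,
                   top_k: int) -> List[Tuple[float, str]]:
    """Return the top_k closest (distance, doc_id) pairs within one library."""
    scored = ((sq_dist(query_vec, vec), doc_id) for doc_id, vec in library)
    return heapq.nsmallest(top_k, scored)


def query_all_blocks(query_vec: Sequence[float], libraries: List[Library],
                     blocks: List[Dict[str, str]], top_k: int = 2) -> List[str]:
    """Search every library, fetch documents from the matching block, then merge."""
    candidates = []
    for library, block in zip(libraries, blocks):
        for dist, doc_id in search_library(query_vec, library, top_k):
            candidates.append((dist, block[doc_id]))
    candidates.sort(key=lambda c: c[0])
    return [doc for _, doc in candidates[:top_k]]
```

Because each library is searched independently, the per-library searches can run on separate workers and only the small candidate lists need to be sent to the merge step.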
Here, performing vector approximate matching retrieval in the plurality of document vector libraries according to the query vector to obtain document vectors matching the query vector may include:
determining, according to the query vector and the vector index corresponding to each document vector library, the region in each document vector library in which vector approximate matching retrieval is to be performed;
performing, according to the query vector, vector approximate matching retrieval within the determined region to obtain the document vectors matching the query vector.
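One way to picture the two steps is a partition-based index: the document vectors of a library are grouped into regions, the query is first compared with one representative per region, and only the closest region (or regions) is scanned. The sketch below assumes centroid-based regions and a hypothetical `n_probe` parameter; the source does not state how the vector index divides a library.

```python
from typing import Dict, List, Sequence


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_index(vectors: Dict[str, Sequence[float]],
                centroids: List[Sequence[float]]) -> Dict[int, List[str]]:
    """Assign every document vector to the region of its nearest centroid."""
    regions: Dict[int, List[str]] = {i: [] for i in range(len(centroids))}
    for doc_id, vec in vectors.items():
        nearest = min(range(len(centroids)), key=lambda i: sq_dist(vec, centroids[i]))
        regions[nearest].append(doc_id)
    return regions


def search(query: Sequence[float], vectors: Dict[str, Sequence[float]],
           centroids: List[Sequence[float]], regions: Dict[int, List[str]],
           n_probe: int = 1, top_k: int = 1) -> List[str]:
    # Step 1: use the index to pick the region(s) to search.
    order = sorted(range(len(centroids)), key=lambda i: sq_dist(query, centroids[i]))
    candidates = [doc_id for i in order[:n_probe] for doc_id in regions[i]]
    # Step 2: match only against the vectors inside the selected region(s).
    return sorted(candidates, key=lambda d: sq_dist(query, vectors[d]))[:top_k]
```

The result is approximate because a true nearest neighbor lying just across a region boundary is missed unless `n_probe` is raised, which trades speed for recall.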
In addition, before the documents obtained from each document data block are merged, the actions may further include: performing inverted index retrieval in the plurality of document data blocks according to the query content, and obtaining documents corresponding to the query content;
Correspondingly, merging the documents obtained from each document data block to generate the final query result may include: performing hybrid ranking on the documents obtained through vector approximate matching retrieval and the documents obtained through inverted index retrieval, merging the documents according to the ranking result, and generating the final query result.
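The inverted index side and its combination with the vector results can be sketched as below. The whitespace tokenizer, the all-keywords-must-match rule, and the additive `keyword_bonus` used for hybrid ranking are all hypothetical simplifications; in practice the ranking would be produced by a learned model.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_inverted_index(block: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each lower-cased word to the ids of the documents that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in block.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def inverted_retrieve(index: Dict[str, Set[str]], keywords: List[str]) -> Set[str]:
    """Return the documents that contain every keyword."""
    if not keywords:
        return set()
    hits = [index.get(word.lower(), set()) for word in keywords]
    return set.intersection(*hits)


def hybrid_rank(vector_hits: Dict[str, float], keyword_hits: Set[str],
                keyword_bonus: float = 0.5) -> List[str]:
    """Rank the union of both result sets; ties are broken by document id."""
    scores = dict(vector_hits)  # doc_id -> similarity from vector retrieval
    for doc_id in keyword_hits:
        scores[doc_id] = scores.get(doc_id, 0.0) + keyword_bonus
    return sorted(scores, key=lambda d: (-scores[d], d))
```

A document found by both retrieval paths appears once in the output, so the ranking step also serves as the point where duplicates are merged.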
The above processing operations have been described in detail in the preceding method and apparatus embodiments, and those details apply equally to the electronic device 1300; that is, the specific processing operations mentioned in the preceding embodiments may be written into the memory 1301 as a program and executed by the processor 1302.
Further, as shown in Figure 13, the electronic device 1300 may also include other components such as a communication component 1303, a power supply component 1304, an audio component 1305, a display 1306, and a chipset 1307. Figure 13 shows only some components schematically, which does not mean that the electronic device 1300 includes only the components shown in Figure 13.
The communication component 1303 is configured to facilitate wired or wireless communication between the electronic device 1300 and other devices. The electronic device may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In one exemplary embodiment, the communication component 1303 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 1303 further includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
The power supply component 1304 supplies power to the various components of the electronic device. The power supply component 1304 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the electronic device.
The audio component 1305 is configured to output and/or input audio signals. For example, the audio component 1305 includes a microphone (MIC); when the electronic device is in an operating mode, such as a call mode, a recording mode, or a voice recognition mode, the microphone is configured to receive external audio signals. The received audio signals may be further stored in the memory 1301 or sent via the communication component 1303. In some embodiments, the audio component 1305 also includes a speaker for outputting audio signals.
The display 1306 includes a screen, which may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. A touch sensor may sense not only the boundary of a touch or swipe action but also the duration and pressure associated with the touch or swipe operation.
The memory 1301, processor 1302, communication component 1303, power supply component 1304, audio component 1305, and display 1306 may be connected to the chipset 1307. The chipset 1307 may provide an interface between the processor 1302 and the remaining components of the electronic device 1300. The chipset 1307 may also provide an interface through which the components of the electronic device 1300 access the memory 1301, as well as a communication interface through which the components access one another.
Example clauses
A: A method, comprising:
generating a query vector from query content;
performing vector approximate matching retrieval to obtain a document vector matching the query vector;
obtaining a corresponding document according to the document vector.
B: The method of paragraph A, wherein performing vector approximate matching retrieval to obtain a document vector matching the query vector comprises:
obtaining the document vector matching the query vector based on approximate nearest neighbor search.
C: The method of paragraph A, wherein the query vector and the document vector are semantic vectors generated in the same semantic space.
D: The method of paragraph A, wherein generating a query vector from query content comprises:
generating the query vector according to the context of the query content.
E: The method of paragraph A, further comprising:
generating one or more document vectors according to the document content of a document, the document content including any one or a combination of: title, link, anchor, and click data.
F: A method, comprising:
generating a query vector from query content;
performing, according to the query vector, vector approximate matching retrieval in a plurality of document vector libraries to obtain document vectors matching the query vector, and obtaining, according to the document vectors, documents corresponding to the document vectors from the document data blocks corresponding to the document vector libraries;
merging the documents obtained from each document data block to generate a final query result.
G: The method of paragraph F, wherein performing, according to the query vector, vector approximate matching retrieval in a plurality of document vector libraries to obtain document vectors matching the query vector comprises:
determining, according to the query vector and the vector index corresponding to each document vector library, the region in each document vector library in which vector approximate matching retrieval is to be performed;
performing, according to the query vector, the vector approximate matching retrieval within the determined region to obtain the document vectors matching the query vector.
H: The method of paragraph G, further comprising, before merging the documents obtained from each document data block:
performing inverted index retrieval in a plurality of the document data blocks according to the query content, and obtaining documents corresponding to the query content;
wherein merging the documents obtained from each document data block to generate a final query result comprises:
for each document data block, performing hybrid ranking on the documents obtained through vector approximate matching retrieval and the documents obtained through inverted index retrieval, merging the documents according to the ranking result, and generating the final query result.
I: The method of paragraph F, further comprising:
partitioning document data into blocks to generate a plurality of document data blocks;
processing the plurality of documents in each document data block to generate a plurality of document vector libraries corresponding to the document data blocks, each document vector library including a plurality of document vectors respectively corresponding to the plurality of documents in the document data block, and each document corresponding to one or more document vectors.
J: The method of paragraph I, further comprising:
establishing, for each document vector library, the vector index used to partition the document vectors in that document vector library.
K: An electronic device, comprising:
a processing unit; and
a memory coupled to the processing unit and containing instructions stored thereon which, when executed by the processing unit, cause the device to perform actions, the actions comprising:
generating a query vector from query content;
performing vector approximate matching retrieval to obtain a document vector matching the query vector;
obtaining a corresponding document according to the document vector.
L: The electronic device of paragraph K, wherein performing vector approximate matching retrieval to obtain a document vector matching the query vector comprises:
obtaining the document vector matching the query vector based on approximate nearest neighbor search.
M: An electronic device, comprising:
a processing unit; and
a memory coupled to the processing unit and containing instructions stored thereon which, when executed by the processing unit, cause the device to perform actions, the actions comprising:
generating a query vector from query content;
performing, according to the query vector, vector approximate matching retrieval in a plurality of document vector libraries to obtain document vectors matching the query vector, and obtaining, according to the document vectors, documents corresponding to the document vectors from the document data blocks corresponding to the document vector libraries;
merging the documents obtained from each document data block to generate a final query result.
N: The electronic device of paragraph M, wherein
performing, according to the query vector, vector approximate matching retrieval in a plurality of document vector libraries to obtain document vectors matching the query vector comprises:
determining, according to the query vector and the vector index corresponding to each document vector library, the region in each document vector library in which vector approximate matching retrieval is to be performed;
performing, according to the query vector, the vector approximate matching retrieval within the determined region to obtain the document vectors matching the query vector.
O: The electronic device of paragraph N, wherein the actions further comprise, before merging the documents obtained from each document data block:
performing inverted index retrieval in a plurality of the document data blocks according to the query content, and obtaining documents corresponding to the query content;
for each document data block, performing hybrid ranking on the documents obtained through vector approximate matching retrieval and the documents obtained through inverted index retrieval, merging the documents according to the ranking result, and generating the final query result.
P: An apparatus, comprising:
a query vector generation module configured to generate a query vector from query content;
a document vector acquisition module configured to perform vector approximate matching retrieval to obtain a document vector matching the query vector;
a document acquisition module configured to obtain a corresponding document according to the document vector.
Q: The apparatus of paragraph P, wherein performing vector approximate matching retrieval to obtain a document vector matching the query vector comprises: obtaining the document vector matching the query vector based on approximate nearest neighbor search.
R: An apparatus, comprising:
a query vector generation module configured to generate a query vector from query content;
a vector approximate matching retrieval module configured to perform, according to the query vector, vector approximate matching retrieval in a plurality of document vector libraries to obtain document vectors matching the query vector, and to obtain, according to the document vectors, documents corresponding to the document vectors from the document data blocks corresponding to the document vector libraries;
a query result generation module configured to merge the documents obtained from each document data block to generate a final query result.
S: The apparatus of paragraph R, wherein performing, according to the query vector, vector approximate matching retrieval in a plurality of document vector libraries to obtain document vectors matching the query vector comprises:
determining, according to the query vector and the vector index corresponding to each document vector library, the region in each document vector library in which vector approximate matching retrieval is to be performed;
performing, according to the query vector, the vector approximate matching retrieval within the determined region to obtain the document vectors matching the query vector.
T: The apparatus of paragraph S, further comprising a plurality of inverted index retrieval modules configured to perform inverted index retrieval in a plurality of the document data blocks according to the query content and to obtain documents corresponding to the query content;
wherein, in the query result generation module, merging the documents obtained from each document data block to generate a final query result comprises:
performing hybrid ranking on the documents obtained through vector approximate matching retrieval and the documents obtained through inverted index retrieval, merging the documents according to the ranking result, and generating the final query result.
Conclusion
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described. Rather, the specific features and acts are disclosed as illustrative forms of implementing the claims.
Unless specifically stated otherwise, conditional language (such as "can", "could", "might" or "may") is understood within the context to indicate that certain examples include, while other examples do not include, certain features, elements and/or steps. Thus, such conditional language is not generally intended to imply that features, elements and/or steps are in any way required for one or more examples, or that one or more examples necessarily include logic for deciding, with or without user input or prompting, whether these features, elements and/or steps are included or are to be performed in any particular example.
Unless specifically stated otherwise, conjunctive language (such as the phrase "at least one of X, Y or Z") is to be understood to indicate that an item, term, etc. may be any one of X, Y or Z, or a combination thereof.
Any routine descriptions, elements or blocks in the flow diagrams described herein and/or depicted in the attached figures should be understood as potentially representing modules, segments, or portions of code that include one or more executable instructions for implementing specific logical functions or elements in the routine. Alternative examples are included within the scope of the examples described herein, in which elements or functions may be deleted, or executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those skilled in the art.
It should be emphasized that many variations and modifications may be made to the above-described examples, the elements of which are to be understood as being among other acceptable examples. All such modifications and variations are intended to be included herein within the scope of this disclosure and protected by the following claims.
Those of ordinary skill in the art will understand that all or part of the steps of the above method embodiments may be carried out by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs the steps of the above method embodiments. The storage medium includes various media capable of storing program code, such as ROM, RAM, magnetic disks, or optical discs.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (20)

1. A method, comprising:
generating a query vector from query content;
performing vector approximate matching retrieval to obtain a document vector matching the query vector;
obtaining a corresponding document according to the document vector.
2. The method according to claim 1, wherein performing vector approximate matching retrieval to obtain a document vector matching the query vector comprises:
obtaining the document vector matching the query vector based on approximate nearest neighbor search.
3. The method according to claim 1, wherein the query vector and the document vector are semantic vectors generated in the same semantic space.
4. The method according to claim 1, wherein generating a query vector from query content comprises:
generating the query vector according to the context of the query content.
5. The method according to claim 1, further comprising:
generating one or more document vectors according to the document content of a document, the document content including any one or a combination of: title, link, anchor, and click data.
6. A method, comprising:
generating a query vector from query content;
performing, according to the query vector, vector approximate matching retrieval in a plurality of document vector libraries to obtain document vectors matching the query vector, and obtaining, according to the document vectors, documents corresponding to the document vectors from the document data blocks corresponding to the document vector libraries;
merging the documents obtained from each document data block to generate a final query result.
7. The method according to claim 6, wherein performing, according to the query vector, vector approximate matching retrieval in a plurality of document vector libraries to obtain document vectors matching the query vector comprises:
determining, according to the query vector and the vector index corresponding to each document vector library, the region in each document vector library in which vector approximate matching retrieval is to be performed;
performing, according to the query vector, the vector approximate matching retrieval within the determined region to obtain the document vectors matching the query vector.
8. The method according to claim 7, further comprising, before merging the documents obtained from each document data block:
performing inverted index retrieval in a plurality of the document data blocks according to the query content, and obtaining documents corresponding to the query content;
wherein merging the documents obtained from each document data block to generate a final query result comprises:
performing hybrid ranking on the documents obtained through vector approximate matching retrieval and the documents obtained through inverted index retrieval, merging the documents according to the ranking result, and generating the final query result.
9. The method according to claim 6, further comprising:
partitioning document data into blocks to generate a plurality of document data blocks;
processing the plurality of documents in each document data block to generate a plurality of document vector libraries corresponding to the document data blocks, each document vector library including a plurality of document vectors respectively corresponding to the plurality of documents in the document data block, and each document corresponding to one or more document vectors.
10. The method according to claim 9, further comprising:
establishing, for each document vector library, the vector index used to partition the document vectors in that document vector library.
11. An electronic device, comprising:
a processing unit; and
a memory coupled to the processing unit and containing instructions stored thereon which, when executed by the processing unit, cause the device to perform actions, the actions comprising:
generating a query vector from query content;
performing vector approximate matching retrieval to obtain a document vector matching the query vector;
obtaining a corresponding document according to the document vector.
12. The electronic device according to claim 11, wherein performing vector approximate matching retrieval to obtain a document vector matching the query vector comprises:
obtaining the document vector matching the query vector based on approximate nearest neighbor search.
13. An electronic device, comprising:
a processing unit; and
a memory coupled to the processing unit and containing instructions stored thereon which, when executed by the processing unit, cause the device to perform actions, the actions comprising:
generating a query vector from query content;
performing, according to the query vector, vector approximate matching retrieval in a plurality of document vector libraries to obtain document vectors matching the query vector, and obtaining, according to the document vectors, documents corresponding to the document vectors from the document data blocks corresponding to the document vector libraries;
merging the documents obtained from each document data block to generate a final query result.
14. The electronic device according to claim 13, wherein
performing, according to the query vector, vector approximate matching retrieval in a plurality of document vector libraries to obtain document vectors matching the query vector comprises:
determining, according to the query vector and the vector index corresponding to each document vector library, the region in each document vector library in which vector approximate matching retrieval is to be performed;
performing, according to the query vector, the vector approximate matching retrieval within the determined region to obtain the document vectors matching the query vector.
15. The electronic device according to claim 14, wherein the actions further comprise, before merging the documents obtained from each document data block:
performing inverted index retrieval in a plurality of the document data blocks according to the query content, and obtaining documents corresponding to the query content;
wherein merging the documents obtained from each document data block to generate a final query result comprises: performing hybrid ranking on the documents obtained through vector approximate matching retrieval and the documents obtained through inverted index retrieval, merging the documents according to the ranking result, and generating the final query result.
16. An apparatus, comprising:
A query vector generation module for generating a query vector from query content;
A document vector acquisition module for performing vector approximate-matching retrieval to obtain a document vector matching the query vector;
A document acquisition module for obtaining the corresponding document according to the document vector.
17. The apparatus according to claim 16, wherein performing vector approximate-matching retrieval to obtain a document vector matching the query vector comprises: obtaining the document vector matching the query vector based on an approximate nearest neighbor search.
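Claim 17 names approximate nearest neighbor search without prescribing an algorithm. The sketch below uses random-hyperplane locality-sensitive hashing purely as an example of the family; the class name `HyperplaneLSH` and its parameters are hypothetical and not drawn from the patent.

```python
import math
import random

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

class HyperplaneLSH:
    """Random-hyperplane LSH: vectors with the same sign pattern share a bucket."""

    def __init__(self, dim, n_bits=4, seed=0):
        rng = random.Random(seed)
        self.planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]
        self.buckets = {}
        self.vectors = {}

    def _key(self, vec):
        # One bit per hyperplane: which side of the plane the vector falls on.
        return tuple(
            sum(p * x for p, x in zip(plane, vec)) >= 0 for plane in self.planes
        )

    def add(self, doc_id, vec):
        self.vectors[doc_id] = vec
        self.buckets.setdefault(self._key(vec), []).append(doc_id)

    def search(self, query_vec, k=1):
        # Only the query's own bucket is scanned, so results are approximate.
        candidates = self.buckets.get(self._key(query_vec), [])
        return sorted(candidates, key=lambda d: -cosine(self.vectors[d], query_vec))[:k]

lsh = HyperplaneLSH(dim=3)
lsh.add("a", [1.0, 0.0, 0.0])
lsh.add("b", [0.0, 1.0, 0.0])
lsh.add("c", [0.0, 0.0, 1.0])
hit = lsh.search([1.0, 0.0, 0.0], k=1)
```

A query identical to a stored vector always lands in that vector's bucket, but a merely similar query may hash elsewhere; practical systems offset this with several hash tables or by probing neighboring buckets.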
18. An apparatus, comprising:
A query vector generation module for generating a query vector from query content;
A vector approximate-matching retrieval module for performing, according to the query vector, vector approximate-matching retrieval in a plurality of document vector libraries to obtain document vectors matching the query vector, and for obtaining, according to the document vectors, the documents corresponding to the document vectors from the document data blocks corresponding to the document vector libraries;
A query result generation module for merging the documents obtained from the respective document data blocks to generate a final query result.
19. The apparatus according to claim 18, wherein performing vector approximate-matching retrieval in the plurality of document vector libraries according to the query vector to obtain document vectors matching the query vector comprises:
Determining, according to the query vector and the vector index corresponding to each document vector library, a region in each document vector library in which to perform the vector approximate-matching retrieval;
Performing, according to the query vector, the vector approximate-matching retrieval within the determined region to obtain document vectors matching the query vector.
20. The apparatus according to claim 19, further comprising a plurality of inverted-index retrieval modules for performing, according to the query content, inverted-index retrieval in the plurality of document data blocks to obtain documents corresponding to the query content;
Wherein, in the query result generation module, merging the documents obtained from the respective document data blocks to generate the final query result comprises:
Performing a mixed ranking of the documents obtained by vector approximate-matching retrieval and the documents obtained by inverted-index retrieval, merging the documents according to the ranking result, and generating the final query result.
CN201711343103.8A 2017-12-14 2017-12-14 Document query based on vector nearest neighbor search Pending CN109948044A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201711343103.8A CN109948044A (en) 2017-12-14 2017-12-14 Document query based on vector nearest neighbor search
PCT/US2018/064146 WO2019118253A1 (en) 2017-12-14 2018-12-06 Document recall based on vector nearest neighbor search

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711343103.8A CN109948044A (en) 2017-12-14 2017-12-14 Document query based on vector nearest neighbor search

Publications (1)

Publication Number Publication Date
CN109948044A true CN109948044A (en) 2019-06-28

Family

ID=65199569

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711343103.8A Pending CN109948044A (en) 2017-12-14 2017-12-14 Document query based on vector nearest neighbor search

Country Status (2)

Country Link
CN (1) CN109948044A (en)
WO (1) WO2019118253A1 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111339241A (en) * 2020-02-18 2020-06-26 北京百度网讯科技有限公司 Question duplicate checking method and device and electronic equipment
CN111339261A (en) * 2020-03-17 2020-06-26 北京香侬慧语科技有限责任公司 Document extraction method and system based on pre-training model
CN111930880A (en) * 2020-08-14 2020-11-13 易联众信息技术股份有限公司 Text code retrieval method, device and medium
US11354293B2 (en) 2020-01-28 2022-06-07 Here Global B.V. Method and apparatus for indexing multi-dimensional records based upon similarity of the records
CN115545853A (en) * 2022-12-02 2022-12-30 云筑信息科技(成都)有限公司 Searching method for searching suppliers

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7475071B1 (en) * 2005-11-12 2009-01-06 Google Inc. Performing a parallel nearest-neighbor matching operation using a parallel hybrid spill tree
CN101639831A (en) * 2008-07-29 2010-02-03 华为技术有限公司 Search method, search device and search system
CN103136352A (en) * 2013-02-27 2013-06-05 华中师范大学 Full-text retrieval system based on two-level semantic analysis
CN103838833A (en) * 2014-02-24 2014-06-04 华中师范大学 Full-text retrieval system based on semantic analysis of relevant words
CN103838735A (en) * 2012-11-21 2014-06-04 大连灵动科技发展有限公司 Data retrieval method for improving retrieval efficiency and quality
CN106909628A (en) * 2017-01-24 2017-06-30 南京大学 A kind of text similarity method based on interval

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7475071B1 (en) * 2005-11-12 2009-01-06 Google Inc. Performing a parallel nearest-neighbor matching operation using a parallel hybrid spill tree
CN101639831A (en) * 2008-07-29 2010-02-03 华为技术有限公司 Search method, search device and search system
CN103838735A (en) * 2012-11-21 2014-06-04 大连灵动科技发展有限公司 Data retrieval method for improving retrieval efficiency and quality
CN103136352A (en) * 2013-02-27 2013-06-05 华中师范大学 Full-text retrieval system based on two-level semantic analysis
CN103838833A (en) * 2014-02-24 2014-06-04 华中师范大学 Full-text retrieval system based on semantic analysis of relevant words
CN106909628A (en) * 2017-01-24 2017-06-30 南京大学 A kind of text similarity method based on interval

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
MUJA MARIUS ET AL: "Scalable Nearest Neighbor Algorithms for High Dimensional Data", 《IEEE COMPUTER SOCIETY》 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11354293B2 (en) 2020-01-28 2022-06-07 Here Global B.V. Method and apparatus for indexing multi-dimensional records based upon similarity of the records
CN111339241A (en) * 2020-02-18 2020-06-26 北京百度网讯科技有限公司 Question duplicate checking method and device and electronic equipment
CN111339241B (en) * 2020-02-18 2024-02-13 北京百度网讯科技有限公司 Problem duplicate checking method and device and electronic equipment
CN111339261A (en) * 2020-03-17 2020-06-26 北京香侬慧语科技有限责任公司 Document extraction method and system based on pre-training model
CN111930880A (en) * 2020-08-14 2020-11-13 易联众信息技术股份有限公司 Text code retrieval method, device and medium
CN115545853A (en) * 2022-12-02 2022-12-30 云筑信息科技(成都)有限公司 Searching method for searching suppliers

Also Published As

Publication number Publication date
WO2019118253A1 (en) 2019-06-20

Similar Documents

Publication Publication Date Title
US11030445B2 (en) Sorting and displaying digital notes on a digital whiteboard
CN109948044A (en) Document query based on vector nearest neighbor search
CN103339623B (en) Method and apparatus relating to Internet search
WO2018072071A1 (en) Knowledge map building system and method
US20160162476A1 (en) Methods and systems for modeling complex taxonomies with natural language understanding
US20010044800A1 (en) Internet organizer
CN107145496A (en) The method for being matched image with content item based on keyword
JP6346218B2 (en) Search method, apparatus and server for online trading platform
US8799257B1 (en) Searching based on audio and/or visual features of documents
CN107885873A (en) Method and apparatus for output information
CN103412903B (en) The Internet of Things real-time searching method and system predicted based on object of interest
CN112131295A (en) Data processing method and device based on Elasticissearch
CN107145497A (en) The method of the image of metadata selected and content matching based on image and content
CN109918594A (en) A kind of information display method and device
KR101446154B1 (en) System and method for searching semantic contents using user query expansion
Antunes et al. Context storage for m2m scenarios
US11314793B2 (en) Query processing
KR20240020166A (en) Method for learning machine-learning model with structured ESG data using ESG auxiliary tool and service server for generating automatically completed ESG documents with the machine-learning model
US20220027419A1 (en) Smart search and recommendation method for content, storage medium, and terminal
US9195940B2 (en) Jabba-type override for correcting or improving output of a model
KR101592670B1 (en) Apparatus for searching data using index and method for using the apparatus
CN110110199B (en) Information output method and device
CN107463570B (en) Document retrieval/analysis method and device
CN104657456B (en) A kind of multidimensional information searching system based on type
CN110110185A (en) A kind of method, equipment and storage medium extracting browser searches engine

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination