CN109948044A - Document query based on vector nearest neighbor search - Google Patents
- Publication number
- CN109948044A (application CN201711343103.8A)
- Authority
- CN
- China
- Prior art keywords
- vector
- document
- query
- retrieval
- library
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/903—Querying
- G06F16/90335—Query processing
Abstract
This disclosure describes a technical solution for document query based on vector nearest neighbor search, in which approximate vector matching retrieval is applied in a search engine. The query content and the web documents are each converted into semantic vectors, and web documents close to the query content are then obtained by approximate vector matching. This overcomes the limitations of symbol-matching retrieval and provides a retrieval service that better captures the user's intent.
Description
Background
With the development of network technology, search engines have become increasingly powerful and the content they cover increasingly rich. Search engines also supply information to many applications and are an essential service for them. In an era of rapid information growth, there is a massive number of web documents, and that number keeps growing quickly. At the same time, users' demand for information keeps increasing. Providing a retrieval service that captures user intent more quickly, efficiently, and accurately remains a standing challenge for search engine technology.
Summary of the invention
This summary is provided to introduce, in simplified form, a selection of concepts that are further described in the detailed description below. It is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to limit the scope of the claimed subject matter.
The disclosed technical solution for document query based on vector nearest neighbor search applies approximate vector matching retrieval in a search engine. The query content and the web documents are each converted into semantic vectors, and web documents close to the query content are then obtained by approximate vector matching, which overcomes the limitations of symbol-matching retrieval and provides a retrieval service that better captures the user's intent.
The above is only an overview of the disclosed technical solution. So that the technical means of the disclosure can be understood more clearly and implemented in accordance with the specification, and so that the above and other objects, features, and advantages of the disclosure become more apparent, specific embodiments of the disclosure are set out below.
Brief description of the drawings
Fig. 1 is an example block diagram of the search engine system disclosed in the embodiments of the present invention;
Fig. 2 is a first schematic diagram of the web document query processing flow of the embodiments of the present invention;
Fig. 3 is a first structural block diagram of the web document query processing device of the embodiments of the present invention;
Fig. 4 is a second schematic diagram of the web document query processing flow of the embodiments of the present invention;
Fig. 5 is a second structural block diagram of the web document query processing device of the embodiments of the present invention;
Fig. 6 is a third schematic diagram of the web document query processing flow of the embodiments of the present invention;
Fig. 7 is a first block diagram of the system architecture for web document query processing of the embodiments of the present invention;
Fig. 8 is a fourth schematic diagram of the web document query processing flow of the embodiments of the present invention;
Fig. 9 is a fifth schematic diagram of the web document query processing flow of the embodiments of the present invention;
Fig. 10 is a second block diagram of the query processing architecture for web documents of the embodiments of the present invention;
Fig. 11 is a third block diagram of the query processing architecture for web documents of the embodiments of the present invention;
Fig. 12 is a schematic diagram of an application example of vector nearest neighbor search based on the CDSSM model of the embodiments of the present invention;
Fig. 13 is a block diagram of the electronic device of the embodiments of the present invention.
Detailed description
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure can be understood more thoroughly and so that its scope can be fully conveyed to those skilled in the art.
Herein, the term "technology" may refer to, for example, (one or more) systems, (one or more) methods, computer-readable instructions, (one or more) modules, algorithms, hardware logic (for example, field-programmable gate arrays (FPGA), application-specific integrated circuits (ASIC), application-specific standard products (ASSP), systems on chip (SOC), complex programmable logic devices (CPLD)), and/or (one or more) other technologies as permitted by the context above and throughout this document.
Search engine technology is widely used across industries. Besides being accessed through ordinary web pages, search engines are also integrated into all kinds of apps (application programs) to provide users with various information search services.
A user sends a query request to a search engine. According to the query content contained in the request, the search engine searches the stored web documents, obtains the web documents that match the user's query content, and returns them to the user. A search engine is not limited to retrieving web documents; it can also be used in scenarios that retrieve other types of documents (such as message documents or data files). Herein, web documents are used as the main example.
Most current document retrieval is based on symbol matching. In web search engines, the more common approach is to obtain the documents relevant to a given query content by symbol matching against a keyword-based inverted index. Retrieval based on symbol matching cannot understand the user's intent well. Although some search engines apply variations to the original input query and then retrieve again in order to improve recall, such variations are still very limited, and recall cannot be guaranteed, especially when new concepts are encountered.
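The limitation of symbol matching can be illustrated with a minimal sketch of keyword retrieval over an inverted index. The function names, the whitespace tokenization, and the sample documents are assumptions made for illustration; they are not taken from the source.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each lowercase token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index

def keyword_search(index, query):
    """Return ids of documents that contain every token of the query."""
    postings = [index.get(token, set()) for token in query.lower().split()]
    if not postings:
        return set()
    return set.intersection(*postings)

# Hypothetical documents: both are about inexpensive air travel.
docs = {
    1: "cheap flights to tokyo",
    2: "low cost airfare for japan trips",
}
index = build_inverted_index(docs)
print(keyword_search(index, "cheap flights"))   # literal tokens are present in document 1
print(keyword_search(index, "budget airfare"))  # "budget" was never indexed, so nothing matches
```

The second query expresses the same intent as the documents, but because no document contains the literal token "budget", symbol matching recalls nothing.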
In the document query technique based on vector nearest neighbor search proposed herein, the documents in the search engine are converted in advance into document vectors in the form of semantic vectors, and the query content entered by the user is likewise converted into a query vector in the form of a semantic vector. Then, based on approximate vector matching retrieval, document vectors that approximate the query vector are sought in a document vector library (made up of the document vectors converted from multiple documents). Finally, the corresponding documents are obtained from the document vectors found and returned to the user as the query result.
The approximate vector matching retrieval above may specifically use approximate nearest neighbor (ANN) search. Because both the documents and the query have been converted into semantic vectors, the documents to recall are determined by the similarity between the query vector and the document vectors. This approach overcomes the limitations of symbol-matching retrieval and can better understand the user's intent.
Fig. 1 shows an example block diagram 100 of the search engine system disclosed in the embodiments of the present invention. Block diagram 100 includes a user 101, a server 103 with a search engine 102, and one or more databases 104 that store web documents; the user 101 and the server 103 are connected through the Internet 105. Herein, the user 101 may refer to a person, a client in software form, a client in hardware form (such as a desktop computer, laptop, mobile phone, tablet computer, or other similar smart terminal), an app, or another application server.
On the one hand, the search engine 102 continuously identifies and crawls content from massive amounts of data, forms web documents, and stores them in the database 104. The content of a web document may include a title, links, anchors, click data, and so on. On the other hand, the search engine 102 receives query requests from the user 101, searches the database 104 according to the query content in the request, obtains the web documents that match the query content, and returns them to the user 101. A query request may be generated from text that the user enters in the search box of a web page, or generated by an app according to the user's query needs. The user may enter the query content as text, or by voice that is then recognized as text. Herein, regardless of how the user enters the query content, it is ultimately converted into query content in natural language form so that it can be further processed with the techniques described herein.
The embodiments of the present invention improve the retrieval of web documents in a search engine, mainly in the following aspects:
1) Web document query processing based on approximate vector matching retrieval
The embodiments of the present invention introduce approximate vector matching retrieval: the query content and the documents to be queried are converted into semantic vectors, and matching retrieval is then performed. This overcomes the limitations of symbol-matching retrieval and captures the user's intent more fully.
2) Block processing of web document data
Because the volume of web document data is huge, the embodiments first divide the web document data (made up of multiple web documents) into blocks, and then perform the semantic vector conversion and index building. During retrieval, each web document data block is searched in parallel, and the retrieved web documents are then merged to form the output result.
3) Building a vector index and applying it in query processing
To further improve the efficiency of approximate vector matching retrieval, a vector index is built. Its main role is to quickly locate the region in which matching web document vectors may exist.
4) Combined use of approximate vector matching retrieval and inverted index retrieval
To further optimize the search results, the embodiments use both inverted index retrieval and approximate vector matching retrieval to retrieve web documents, and make full use of the web documents obtained by both retrieval modes to generate the final search result.
These aspects are described in detail below.
Web document query processing based on approximate vector matching retrieval
In the embodiments of the present invention, the search engine 102 converts the crawled web documents into web document vectors in advance and stores them in the database 104 as the data basis for subsequent document queries. On this basis, when the search engine 102 receives a user's query request, it extracts the query content and then executes the processing flow shown in Fig. 2, which is a first schematic diagram 200 of the web document query processing flow of the embodiments of the present invention. The processing flow includes:
S201: Generate a query vector from the query content. The query content exists in natural language form, and the query vector is formed by extracting features from the semantics of the query content. When generating the query vector, semantic features can be extracted based on the context of the query content, so that the user's intent is captured better. In addition, the query vector and the web document vectors are semantic vectors generated in the same semantic space, which facilitates the subsequent approximate vector matching between the query vector and the web document vectors.
S202: Execute approximate vector matching retrieval to obtain the web document vectors that match the query vector. Because both the query content and the web documents exist as semantic vectors, the web document vectors closest to the query vector can be found as the query result by computing the similarity between the web document vectors and the query vector. The algorithmic basis of the approximate vector matching retrieval mentioned here is computing the similarity between semantic vectors: within the same semantic space, the distance between any two semantic vectors reflects how close they are. There are many ways to compute this, for example cosine similarity, i.e. the cosine of the angle between two semantic vectors; the larger the cosine value (the smaller the angle), the more similar the two semantic vectors. The approximate vector matching retrieval may specifically be performed with approximate nearest neighbor (ANN) search. Typical ANN algorithms include the KD-tree algorithm, the k-nearest-neighbor graph (KNN graph) algorithm, and the locality-sensitive hashing (LSH) algorithm.
S203: Obtain the corresponding web documents from the web document vectors. The mapping between web document vectors and web documents can be stored in the database, and the web document corresponding to a web document vector can be found through this mapping.
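Steps S201 to S203 can be sketched roughly as follows. This is a minimal illustration under stated assumptions: the three-dimensional vectors are hand-written stand-ins for the output of a semantic model, the names `search`, `vector_library`, and `vector_to_doc` are hypothetical, and the search is exact (brute force) rather than a true ANN algorithm such as a KD-tree or LSH.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; a larger value means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def search(query_vector, vector_library, vector_to_doc, top_k=2):
    """S202 and S203: rank document vectors by similarity, then map them to documents."""
    scored = [
        (cosine_similarity(query_vector, vec), vec_id)
        for vec_id, vec in vector_library.items()
    ]
    scored.sort(reverse=True)
    return [(vector_to_doc[vec_id], score) for score, vec_id in scored[:top_k]]

# Hypothetical semantic vectors; a real system would compute them with a trained
# semantic model applied to both documents and queries (S201).
vector_library = {
    "v1": [0.9, 0.1, 0.0],
    "v2": [0.1, 0.9, 0.1],
    "v3": [0.0, 0.2, 0.9],
}
vector_to_doc = {"v1": "doc-travel", "v2": "doc-cooking", "v3": "doc-sports"}
query_vector = [0.8, 0.2, 0.1]

results = search(query_vector, vector_library, vector_to_doc, top_k=1)
print(results[0][0])
```

An ANN index would replace the full scan inside `search` while keeping the same similarity measure and the same vector-to-document mapping.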
Steps S201 to S203 above describe the process of retrieving matching web documents based on the query content. As noted earlier, web documents need to be converted into web document vectors beforehand. Therefore, the above processing flow may further include, before step S202:
S204: Generate one or more web document vectors from the document content of a document. The document content may include a title, links, anchors, and click data. A web document vector can be generated from any one of these items or from a combination of several of them, so one document may correspond to multiple web document vectors. When a document has multiple web document vectors, then in step S202 above the document is considered to match the query content if the query vector matches any one of its web document vectors, and the document can be returned as a query result. In addition, because the search engine 102 crawls content from the network around the clock and forms web documents, the conversion of web documents into web document vectors is also ongoing: whenever a new web document appears, the search engine 102 converts it into a web document vector and adds it to the database.
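The rule that a document matches when any one of its vectors matches can be sketched as below. The similarity threshold and the use of unit-length vectors (so that the dot product equals the cosine) are assumptions added for illustration; the source does not specify how a match is decided.

```python
def dot(a, b):
    """Dot product; equals cosine similarity when both vectors have unit length."""
    return sum(x * y for x, y in zip(a, b))

def matching_documents(query_vector, doc_vectors, threshold):
    """Return documents for which at least one of their vectors matches the query.

    doc_vectors maps a document id to a list of unit-length vectors, for example
    one built from the title and one built from the anchor text.
    """
    matches = {}
    for doc_id, vectors in doc_vectors.items():
        best = max(dot(query_vector, v) for v in vectors)
        if best >= threshold:  # hypothetical matching criterion
            matches[doc_id] = best
    return matches

doc_vectors = {
    "doc-a": [[1.0, 0.0], [0.0, 1.0]],  # two vectors for the same document
    "doc-b": [[-1.0, 0.0]],             # a single vector
}
matches = matching_documents([0.0, 1.0], doc_vectors, threshold=0.8)
print(matches)
```

Here "doc-a" is returned because its second vector matches the query, even though its first vector does not.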
Fig. 3 shows a first structural block diagram 300 of the web document query processing device of the embodiments of the present invention. The web document query processing described above can be carried out by the web document query processing device shown in Fig. 3. The processing device can be arranged in the search engine 102 described above and includes:
a query vector generation module 301, configured to generate a query vector from the query content;
a web document vector obtaining module 302, configured to execute approximate vector matching retrieval and obtain the web document vectors that match the query vector;
a document obtaining module 303, configured to obtain the corresponding documents from the web document vectors.
In addition, the device may further include a web document vector generation module 304, configured to generate one or more web document vectors from the document content of a document.
In the web document query processing described above, the web documents and the query content are converted into semantic vectors, and approximate vector matching retrieval is executed on those vectors. The lookup is thus based on the similarity of semantic vectors and yields web document vectors that are close in vector space, which overcomes the limitations that symbol matching places on retrieval. Moreover, the features used in semantic-vector-based retrieval are not limited to the query words themselves (a single word or the words in a sentence) but can include richer features, so the user's query intent can be understood more fully and recall is improved.
Block processing of web document data
The above describes the most basic query processing flow of the embodiments based on approximate vector matching retrieval. In practice, the search engine 102 needs to handle a massive number of web documents; herein, a collection formed by multiple web documents is referred to as web document data. Web document data made up of a massive number of web documents is extremely large, so both storing it and indexing it are major engineering tasks, and performing query-based matching retrieval over such a huge and constantly growing body of data is also very time-consuming. To address this, the embodiments of the present invention propose a system architecture in which the web document data is divided into blocks and an index is built for each block. On this architecture, the same query content is processed in parallel in each web document data block, and the web documents obtained from each block are then integrated to form the final query result.
Fig. 4 shows a second schematic diagram 400 of the web document query processing flow of the embodiments of the present invention. Based on the above system architecture in which the web document data is divided into blocks, the query processing flow includes:
S401: Generate a query vector from the query content.
S402: According to the query vector, execute approximate vector matching retrieval in multiple web document vector libraries to obtain the web document vectors that match the query vector, and, according to those web document vectors, obtain the corresponding web documents from the web document data block associated with each web document vector library. The approximate vector matching retrieval may specifically be performed with the approximate nearest neighbor (ANN) search described above.
S403: Merge the web documents obtained from each web document data block to generate the final query result. The web document data blocks are independent of one another, so the web documents retrieved from different blocks do not repeat, and the query result from some blocks may be empty. The web documents obtained from each block as intermediate query results can be merged directly and output as the final search result. Preferably, during merging, the web documents obtained from each web document data block may also be screened or sorted together, and the one or several web documents closest to the query content selected as the final query result.
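The merge in S403 can be sketched as follows, assuming each data block returns a list of `(score, document id)` pairs. The function name, the `min_score` screening parameter, and the sample scores are hypothetical and only illustrate the optional screening and joint sorting described above.

```python
import heapq

def merge_block_results(per_block_results, top_k, min_score=0.0):
    """Merge (score, doc_id) lists from independent data blocks into one top-k list."""
    hits = (
        hit
        for block_hits in per_block_results
        for hit in block_hits
        if hit[0] >= min_score  # optional screening step (an assumption)
    )
    # Joint sort across blocks: keep the top_k highest-scoring documents.
    return heapq.nlargest(top_k, hits)

per_block_results = [
    [(0.91, "block0/doc7"), (0.40, "block0/doc2")],
    [],  # a block may return an empty result
    [(0.87, "block2/doc5")],
]
final = merge_block_results(per_block_results, top_k=2, min_score=0.5)
print(final)
```

Because the blocks are disjoint, no de-duplication is needed before sorting.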
As noted earlier, in preparation for query processing the huge body of web document data needs to be divided into blocks in advance and each web document converted into web document vectors. Therefore, the above processing flow may further include, before step S401:
S404: Divide the web document data into blocks to generate multiple web document data blocks. In practice, because the search engine 102 continuously crawls web page information and forms web documents, the block processing can be performed after web document data of a certain size has accumulated.
S405: Process the documents in each web document data block to generate multiple web document vector libraries, one for each web document data block. Each web document vector library contains the web document vectors corresponding to the documents in its web document data block.
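Steps S404 and S405 can be sketched as below. Fixed-size blocks, the per-field vector keys, and `toy_embed` (which stands in for a semantic model and carries no semantic meaning) are all assumptions for illustration.

```python
def split_into_blocks(documents, block_size):
    """S404: cut the document collection into blocks of at most block_size documents."""
    items = list(documents.items())
    return [dict(items[i:i + block_size]) for i in range(0, len(items), block_size)]

def build_vector_library(block, embed):
    """S405: turn every document of one block into one or more vectors."""
    library = {}
    for doc_id, content in block.items():
        for field, text in content.items():
            # One vector per content field, keyed by (document id, field name).
            library[(doc_id, field)] = embed(text)
    return library

def toy_embed(text):
    """Stand-in for a semantic model: (character count, word count)."""
    return [float(len(text)), float(len(text.split()))]

documents = {
    "d1": {"title": "vector search", "anchor": "ann"},
    "d2": {"title": "inverted index"},
    "d3": {"title": "ranking"},
}
blocks = split_into_blocks(documents, block_size=2)
libraries = [build_vector_library(block, toy_embed) for block in blocks]
print(len(blocks), [len(lib) for lib in libraries])
```

Each entry of `libraries` corresponds to exactly one block, which is what lets each block be searched independently later.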
Fig. 5 shows a second structural block diagram 500 of the web document query processing device of the embodiments of the present invention. The web document query processing described above can be carried out by the web document query processing device shown in Fig. 5. The processing device can be arranged in the search engine 102 described above and includes:
a query vector generation module 501, configured to generate a query vector from the query content;
an approximate vector matching retrieval module 502, configured to execute, according to the query vector, approximate vector matching retrieval in multiple web document vector libraries to obtain the web document vectors that match the query vector, and, according to those web document vectors, to obtain the corresponding web documents from the web document data block associated with each web document vector library;
a query result generation module 503, configured to merge the web documents obtained from each web document data block and generate the final query result.
In addition, the device may further include processing modules that divide the web document data into blocks and convert web documents into vectors, specifically:
a block processing module 504, configured to divide the web document data into blocks and generate multiple web document data blocks;
a document vector library generation module 505, configured to process the web documents in each web document data block and generate multiple web document vector libraries, one for each web document data block, where each web document vector library contains the web document vectors corresponding to the web documents in its web document data block, and each document corresponds to one or more document vectors.
By dividing the web document data into blocks, the embodiments of the present invention narrow the scope of approximate vector matching retrieval to a reasonable range, so that the retrieval can be performed more quickly.
Building a vector index and applying it in query processing
To perform approximate vector matching retrieval even more quickly, the embodiments of the present invention, in addition to dividing the web document data into blocks, also build a vector index for the web document vector library formed from each web document data block. The main role of the vector index is to partition the web document vectors in a web document vector library into regions, so that during query processing the region in which web document vectors matching the query vector may exist can be located quickly. In the embodiments of the present invention, the vector index is built on each web document data block only after the data has been divided into blocks, so each vector index is relatively small, which further improves the speed of vector matching retrieval.
With the vector index in place, Fig. 6 shows a third schematic diagram 600 of the web document query processing flow of the embodiments of the present invention. In step S402 above, the processing of executing approximate vector matching retrieval in multiple web document vector libraries according to the query vector and obtaining the web document vectors that match the query vector may specifically include:
S601: According to the query vector and the vector index of each document vector library, determine in each document vector library the region in which approximate vector matching retrieval is to be performed;
S602: According to the query vector, execute approximate vector matching retrieval within the determined region to obtain the document vectors that match the query vector.
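Steps S601 and S602 can be sketched with one possible partitioning scheme: every document vector is assigned to the region of its most similar centroid, and a query only scans the region of its own most similar centroid. The source does not specify how the index partitions the vectors, so the centroid-based scheme, the function names, and the sample centroids are assumptions.

```python
import math

def cosine(a, b):
    """Cosine similarity, returning 0.0 for zero-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def build_vector_index(library, centroids):
    """Assign every document vector to the region of its most similar centroid."""
    regions = {i: [] for i in range(len(centroids))}
    for vec_id, vec in library.items():
        region = max(range(len(centroids)), key=lambda i: cosine(vec, centroids[i]))
        regions[region].append(vec_id)
    return regions

def indexed_search(query, library, centroids, regions, top_k=1):
    # S601: pick the region whose centroid is most similar to the query vector.
    region = max(range(len(centroids)), key=lambda i: cosine(query, centroids[i]))
    # S602: compare the query only against the vectors stored in that region.
    candidates = regions[region]
    ranked = sorted(candidates, key=lambda v: cosine(query, library[v]), reverse=True)
    return ranked[:top_k]

library = {"v1": [1.0, 0.1], "v2": [0.9, 0.3], "v3": [0.1, 1.0], "v4": [0.2, 0.8]}
centroids = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical, e.g. obtained by clustering
regions = build_vector_index(library, centroids)
print(indexed_search([0.8, 0.3], library, centroids, regions))
```

Only half of the vectors are scored for this query. The trade-off of any such partitioning is that a true nearest neighbor lying just across a region boundary can be missed, which is why the result is approximate.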
Fig. 7 shows a first block diagram 700 of the system architecture for web document query processing of the embodiments of the present invention. Block diagram 700 includes a query processor (Query Worker) 701, multiple retrieval processors (Search Worker) 702, an aggregator (Aggregator) 703, and a database 704 corresponding to each retrieval processor 702.
Once the web document data has been divided into blocks and a vector index has been built for each web document vector library, the query processor 701 converts the query content into a semantic vector, then replicates it and distributes it to each retrieval processor 702. The retrieval processors 702 execute the web document vector retrieval in parallel, each on its own web document data block, and output the retrieved web documents to the aggregator 703. The aggregator 703 sorts the web documents provided by the retrieval processors 702 and selects the one or more web documents closest to the query content to provide to the user as the final query result.
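The fan-out from the query processor to the retrieval processors and on to the aggregator can be sketched with a thread pool. This is a single-process illustration under stated assumptions: the three functions are hypothetical stand-ins for the components of Fig. 7, each block is a small in-memory dictionary, and scoring is a plain dot product with a full scan instead of an indexed ANN search.

```python
from concurrent.futures import ThreadPoolExecutor

def search_worker(block, query_vector, top_k):
    """Score every vector of one block against the query and keep the best hits."""
    hits = [
        (sum(q * x for q, x in zip(query_vector, vec)), doc_id)
        for doc_id, vec in block.items()
    ]
    return sorted(hits, reverse=True)[:top_k]

def aggregator(partial_results, top_k):
    """Sort the hits from all workers together and keep the best ones."""
    merged = [hit for hits in partial_results for hit in hits]
    return sorted(merged, reverse=True)[:top_k]

def query_worker(query_vector, blocks, top_k=2):
    """Send the same query vector to every search worker in parallel."""
    with ThreadPoolExecutor(max_workers=len(blocks)) as pool:
        futures = [pool.submit(search_worker, b, query_vector, top_k) for b in blocks]
        partial_results = [f.result() for f in futures]
    return aggregator(partial_results, top_k)

blocks = [
    {"doc1": [1.0, 0.0], "doc2": [0.0, 1.0]},
    {"doc3": [0.7, 0.7], "doc4": [-1.0, 0.0]},
]
top = query_worker([1.0, 0.0], blocks, top_k=2)
print([doc_id for _, doc_id in top])
```

In a deployed system each worker would run on its own machine next to its database, but the control flow (replicate the query, search each block, aggregate) stays the same.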
Each retrieval processor 702 corresponds to one database 704, which stores the web document data block and the web document vector library assigned to that retrieval processor 702. The vector index of the web document vector library is recorded in the retrieval processor 702.
By building a vector index, the embodiments of the present invention can quickly narrow the scope of approximate vector matching retrieval to a specific region of the web document vector library, which reduces the number of similarity computations between vectors and improves the efficiency of approximate vector matching retrieval.
Combined use of approximate vector matching retrieval and inverted index retrieval
To better optimize the query results, the embodiments of the present invention combine inverted index retrieval with approximate vector matching retrieval, making full use of the advantages of both retrieval modes to further improve the accuracy of the query results.
In the embodiments of the present invention, the inverted index, like the vector index, is built only after the web document data has been divided into blocks. The inverted index is built over the web documents in each web document database, whereas the vector index is built over the web document vectors in each web document vector library.
Fig. 8 shows a fourth schematic diagram 800 of the web document query processing flow of the embodiments of the present invention, and Fig. 9 shows a fifth schematic diagram 900 of that flow. In the embodiments of the present invention, inverted index retrieval and approximate vector matching retrieval are executed in parallel. With the web document data divided into blocks, the query content is first analyzed (801), and inverted index retrieval and approximate vector matching retrieval are then executed in parallel on each web document data block, yielding the web documents retrieved through the inverted index and those retrieved through approximate vector matching. Specifically, as shown in Fig. 8 and Fig. 9, keyword extraction 802 and query vector generation 803 are performed on the query content, followed by distributed inverted index retrieval 804 and distributed ANN vector index retrieval 805, respectively. Finally, the web documents obtained from each web document data block are merged. During merging, the obtained web documents can be sorted to determine the search results that are ultimately output to the user. The sorting can be done in either of the following two ways:
Mode one: As shown in Fig. 8, sorting 806 is applied to the web documents retrieved through the inverted index and sorting 807 to the web documents obtained through approximate vector matching retrieval. The web documents output by sorting 806 and sorting 807 are then sorted again (808), the web documents output by sorting 808 are merged (809) to generate the final query result, and query result output 810 is then executed.
Mode two: As shown in Fig. 9, hybrid sorting 901 is applied jointly to the web documents obtained through approximate vector matching retrieval and those obtained through inverted index retrieval. The web documents output by hybrid sorting 901 are then merged (902) to generate the final query result, and query result output 810 is then executed.
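The difference between the two sorting modes can be sketched as follows. The scoring functions are placeholders for a learned ranking model, and both the `keep` cut-off after the first-stage sorts and the removal of documents found by both retrieval modes are assumptions not stated in the source.

```python
def rank(docs, score):
    """Sort documents from best to worst under the given scoring function."""
    return sorted(docs, key=score, reverse=True)

def mode_one(inverted_hits, vector_hits, score_inv, score_vec, score_final, keep=2):
    """Sort each result list separately (806, 807), then re-sort the survivors (808)."""
    survivors = rank(inverted_hits, score_inv)[:keep] + rank(vector_hits, score_vec)[:keep]
    # dict.fromkeys drops duplicates while preserving order (an assumption).
    return rank(dict.fromkeys(survivors), score_final)

def mode_two(inverted_hits, vector_hits, score_final):
    """Pool both result lists and sort them once (hybrid sorting 901)."""
    return rank(dict.fromkeys(inverted_hits + vector_hits), score_final)

# Hypothetical relevance scores standing in for a ranking model's output.
relevance = {"a": 0.9, "b": 0.4, "c": 0.7, "d": 0.2}
score = relevance.get
inverted_hits = ["a", "b", "d"]
vector_hits = ["c", "a"]

print(mode_one(inverted_hits, vector_hits, score, score, score, keep=2))
print(mode_two(inverted_hits, vector_hits, score))
```

Mode one allows a different ranker for each retrieval mode before the final re-sort, at the cost of an extra stage; mode two uses a single ranker over the pooled candidates.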
Fig. 10 shows a second block diagram 1000 of the query processing architecture for web documents of the embodiments of the present invention. The web document query processing flow described above can be carried out on the architecture shown in Fig. 10. In block diagram 1000, the query processor (Query Worker) 1001 converts the query content into a semantic vector and extracts keywords from the query content, then replicates the extracted keywords and the query vector and distributes them to the retrieval processors (Search Worker). The retrieval processors fall into two classes: retrieval processors 1002, which execute approximate vector matching retrieval, and retrieval processors 1003, which execute inverted index retrieval. Sorting processor 1004 sorts the web documents obtained through approximate vector matching retrieval, sorting processor 1005 sorts the web documents obtained through inverted index retrieval, and sorting processor 1006 sorts again the web documents output by sorting processors 1004 and 1005. Finally, the aggregator (Aggregator) 1007 merges the web documents output by sorting processor 1006 and generates the query result that is ultimately provided to the user.
Fig. 11 is a third block diagram 1100 of the query processing architecture for web documents according to an embodiment of the present invention. The above query processing flow for web documents can also be completed based on the processing architecture shown in Fig. 11. In block diagram 1100, the query processor 1101 converts the query content into a semantic vector and extracts keywords from the query content, and then replicates the extracted keywords and the converted query vector and distributes them to each retrieval processor 1102. Each retrieval processor 1102 performs inverted index retrieval in addition to vector approximate matching retrieval. Each retrieval processor 1102 outputs the web documents obtained by vector approximate matching retrieval and the web documents obtained by inverted index retrieval to the hybrid ranking processor 1103, which performs hybrid ranking on both sets of web documents. The aggregator 1104 then merges the web documents output by the hybrid ranking processor 1103 and generates the query result that is ultimately provided to the user.
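The fan-out of Fig. 11 can be sketched as below. This is only an illustrative sketch under stated assumptions: the class and function names are hypothetical, inverted index retrieval is imitated by a linear keyword scan, vector approximate matching is replaced by an exact top-k dot-product scan, and the hybrid score is an arbitrary sum chosen for the example.

```python
from dataclasses import dataclass
from typing import List, Sequence, Set


@dataclass
class Doc:
    doc_id: str
    terms: Set[str]
    vector: Sequence[float]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class SearchWorker:
    """Hypothetical stand-in for a retrieval processor 1102 holding one shard."""

    def __init__(self, shard: List[Doc]):
        self.shard = shard

    def search(self, keywords, query_vector, k):
        # Stand-in for inverted index retrieval: linear keyword scan.
        by_keyword = [d for d in self.shard if keywords & d.terms]
        # Stand-in for vector approximate matching: exact top-k scan.
        by_vector = sorted(
            self.shard, key=lambda d: dot(query_vector, d.vector), reverse=True
        )[:k]
        return by_keyword + by_vector


def run_query(workers, keywords, query_vector, k=1):
    # The same keywords and query vector are replicated to every worker.
    candidates = []
    for worker in workers:
        candidates.extend(worker.search(keywords, query_vector, k))

    # Assumed hybrid score: keyword overlap plus vector similarity.
    def score(d):
        return len(keywords & d.terms) + dot(query_vector, d.vector)

    ranked = sorted(candidates, key=score, reverse=True)  # hybrid ranking
    seen, result = set(), []
    for d in ranked:                                      # merge (de-duplicate)
        if d.doc_id not in seen:
            seen.add(d.doc_id)
            result.append(d.doc_id)
    return result
```

Because every worker runs both retrieval paths over its own shard, the hybrid ranker sees keyword and vector candidates side by side, and a document found by both paths is kept only once.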
For the ranking of web documents, a model such as LambdaRank (a learning-to-rank algorithm) or LambdaMART (a learning-to-rank algorithm) may be used.
By combining inverted index retrieval with vector approximate matching retrieval, the advantages of both retrieval modes can be fully exploited, so that query results can be obtained that are more accurate and better reflect the user's intent.
Embodiment of an application scenario
The processing flow and overall architecture of the document query technique based on vector nearest neighbor search according to embodiments of the present invention have been described above. The technical solution of the embodiments is further illustrated below through a concrete application example.
Fig. 12 is a schematic diagram 1200 of an application example of vector nearest neighbor search based on the CDSSM (Convolutional Deep Structured Semantic Models) model according to an embodiment of the present invention. In this embodiment, the original query content "coffee and tea south melbourne" 1201 is taken as an example, and it is assumed that three web documents currently exist: the URL (Uniform Resource Locator) of web document one 1202 is "www truelocal com au find coffee tea vic Melbourn city south melbourne", the title of web document two 1203 is "coffee tea Suppliers in south melbourne Melbourne city vic", and the click record (click) of web document three 1204 is "coffee beans supplier south melbourne". A click record here refers to query content that led to a click on the web page link of the web document: a user enters some query content, the search engine returns a web document, the user clicks the link of that web document and visits the corresponding page, and the search engine records that query content as a click record of the web document.
In the figure, the CDSSM model is used to vectorize the query content and the web documents and to match them by similarity. As shown in the figure, the original query content and the web documents are all converted into semantic vectors through word embedding and a deep neural network (Deep Neural Network). In the model shown in the figure, word embedding 1208 is first performed based on letter trigrams (tri-letter), and the convolutional deep structured semantic model (CDSSM) 1209 then generates a semantic vector of dimension 100 (d in the figure denotes the dimension of the generated vector).
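The letter-trigram step can be illustrated with a short sketch. The text above only states that word embedding is based on tri-letters; the "#" word-boundary marker, the lowercasing, and the function names below are assumptions made for illustration, and the convolutional network that maps these counts to a 100-dimensional vector is not shown.

```python
from collections import Counter


def letter_trigrams(word):
    """Split one word into overlapping three-letter pieces.

    The word is lowercased and wrapped in '#' boundary markers
    (an assumption of this sketch) before it is split.
    """
    marked = f"#{word.lower()}#"
    return [marked[i:i + 3] for i in range(len(marked) - 2)]


def trigram_counts(text):
    """Count the letter trigrams of every whitespace-separated word in text."""
    counts = Counter()
    for word in text.split():
        counts.update(letter_trigrams(word))
    return counts
```

For the word "tea" this produces the pieces "#te", "tea", and "ea#". Since the set of possible letter trigrams is far smaller than a word vocabulary, misspelled or unseen words still share most of their pieces with known words.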
As shown in the figure, once the query vector 1205 and the web document vectors 1206 have been generated, cosine similarity calculation 1207 is performed between the query vector and each web document vector, and the web document with the highest similarity is selected as the query result.
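The selection step can be written down directly. The sketch below is a plain-Python illustration with hypothetical function names; it compares the query vector with every document vector exhaustively, which is feasible only for a handful of documents such as the three in this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def most_similar(query_vector, doc_vectors):
    """Return the key of the document vector closest to the query vector."""
    return max(
        doc_vectors,
        key=lambda name: cosine_similarity(query_vector, doc_vectors[name]),
    )
```

Cosine similarity depends only on the direction of the vectors, so a long and a short document vector pointing the same way score identically against the query.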
Implementation examples
In some examples, the one or more modules, one or more steps, or one or more processes described above with reference to Figs. 1 to 12 may be implemented by a software program, by a hardware circuit, or by a combination of a software program and a hardware circuit. For example, each of the above components or modules and the one or more steps may be implemented in a system on chip (SoC). The SoC may include an integrated circuit chip that includes one or more of the following: a processing unit (such as a central processing unit (CPU), a microcontroller, a microprocessing unit, or a digital signal processing unit (DSP)), a memory, one or more communication interfaces, and/or further circuitry for performing its functions, and optionally embedded firmware.
Fig. 13 is a structural block diagram of an electronic device 1300 according to an embodiment of the present invention. The electronic device 1300 includes a memory 1301 and a processor 1302.
The memory 1301 is used to store a program. In addition to the program, the memory 1301 may be configured to store various other data to support operations on the electronic device 1300. Examples of such data include instructions for any application or method operated on the electronic device 1300, contact data, phone book data, messages, pictures, videos, and so on.
The memory 1301 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disc.
The memory 1301 is coupled to the processor 1302 and contains instructions stored thereon that, when executed by the processor 1302, cause the electronic device to perform actions. In one embodiment of the electronic device, the actions may include:
generating a query vector according to query content;
performing vector approximate matching retrieval to obtain a document vector matching the query vector;
obtaining the corresponding document according to the document vector.
Performing vector approximate matching retrieval to obtain a document vector matching the query vector may include: obtaining the document vector matching the query vector based on approximate nearest neighbor search.
In another embodiment of the electronic device, the actions may include:
generating a query vector according to query content;
performing, according to the query vector, vector approximate matching retrieval in a plurality of document vector libraries to obtain document vectors matching the query vector, and obtaining, according to the document vectors, the documents corresponding to the document vectors from the document data blocks corresponding to the document vector libraries;
merging the documents obtained from the respective document data blocks to generate a final query result.
Performing, according to the query vector, vector approximate matching retrieval in the plurality of document vector libraries to obtain document vectors matching the query vector may include:
determining, according to the query vector and the vector index corresponding to each document vector library, a region in each document vector library in which vector approximate matching retrieval is to be performed;
performing, according to the query vector, vector approximate matching retrieval in the determined region to obtain the document vectors matching the query vector.
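One way to picture region selection is a centroid-based partition of each document vector library, sketched below. The text does not specify how the vector index partitions the vectors, so the use of centroids, the n_probe parameter, and all names in this sketch are assumptions for illustration only.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


class VectorLibrary:
    """Hypothetical document vector library partitioned around fixed centroids."""

    def __init__(self, centroids: List[Vector]):
        self.centroids = centroids
        self.regions = {i: [] for i in range(len(centroids))}

    def _nearest_regions(self, vector: Vector, n: int) -> List[int]:
        # The assumed "vector index": regions ordered by centroid distance.
        order = sorted(
            range(len(self.centroids)),
            key=lambda i: math.dist(vector, self.centroids[i]),
        )
        return order[:n]

    def add(self, doc_id: str, vector: Vector) -> None:
        region = self._nearest_regions(vector, 1)[0]
        self.regions[region].append((doc_id, vector))

    def search(self, query: Vector, k: int = 1, n_probe: int = 1) -> List[str]:
        # Step 1: use the index to pick the regions to search.
        probed = self._nearest_regions(query, n_probe)
        # Step 2: search only inside those regions.
        candidates = [item for r in probed for item in self.regions[r]]
        candidates.sort(key=lambda item: math.dist(query, item[1]))
        return [doc_id for doc_id, _ in candidates[:k]]
```

Searching only the probed regions is what makes the match approximate: a true nearest neighbor that falls in an unprobed region is missed, and raising n_probe trades speed for recall.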
In addition, before the documents obtained from the respective document data blocks are merged, the actions may further include: performing inverted index retrieval in the plurality of document data blocks according to the query content, and obtaining documents corresponding to the query content.
Correspondingly, merging the documents obtained from the respective document data blocks to generate the final query result may include: performing hybrid ranking on the documents obtained by vector approximate matching retrieval and the documents obtained by inverted index retrieval, and merging the documents according to the ranking result to generate the final query result.
The above processing operations have been described in detail in the foregoing method and apparatus embodiments, and those details apply equally to the electronic device 1300; that is, the specific processing operations mentioned in the foregoing embodiments can be written into the memory 1301 as a program and executed by the processor 1302.
Further, as shown in Fig. 13, the electronic device 1300 may also include other components such as a communication component 1303, a power supply component 1304, an audio component 1305, a display 1306, and a chipset 1307. Fig. 13 shows only some of the components schematically, which does not mean that the electronic device 1300 includes only the components shown in Fig. 13.
The communication component 1303 is configured to facilitate wired or wireless communication between the electronic device 1300 and other devices. The electronic device can access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In one exemplary embodiment, the communication component 1303 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 1303 further includes a near-field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
The power supply component 1304 supplies power to the various components of the electronic device. The power supply component 1304 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the electronic device.
The audio component 1305 is configured to output and/or input audio signals. For example, the audio component 1305 includes a microphone (MIC) that is configured to receive external audio signals when the electronic device is in an operating mode such as a call mode, a recording mode, or a voice recognition mode. The received audio signals may be further stored in the memory 1301 or sent via the communication component 1303. In some embodiments, the audio component 1305 further includes a speaker for outputting audio signals.
The display 1306 includes a screen, which may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may sense not only the boundary of a touch or swipe action but also the duration and pressure associated with the touch or swipe operation.
The above memory 1301, processor 1302, communication component 1303, power supply component 1304, audio component 1305, and display 1306 may be connected to the chipset 1307. The chipset 1307 may provide an interface between the processor 1302 and the remaining components of the electronic device 1300. In addition, the chipset 1307 may provide an access interface through which the components of the electronic device 1300 access the memory 1301, as well as a communication interface through which the components access one another.
Example clauses
A: A method, comprising:
generating a query vector according to query content;
performing vector approximate matching retrieval to obtain a document vector matching the query vector;
obtaining the corresponding document according to the document vector.
B: The method of paragraph A, wherein performing vector approximate matching retrieval to obtain a document vector matching the query vector comprises:
obtaining the document vector matching the query vector based on approximate nearest neighbor search.
C: The method of paragraph A, wherein the query vector and the document vector are semantic vectors generated based on the same semantic space.
D: The method of paragraph A, wherein generating a query vector according to query content comprises:
generating the query vector according to the context of the query content.
E: The method of paragraph A, further comprising:
generating one or more document vectors according to the document content of a document, the document content including any one or a combination of: a title, a link, an anchor, and click data.
F: A method, comprising:
generating a query vector according to query content;
performing, according to the query vector, vector approximate matching retrieval in a plurality of document vector libraries to obtain document vectors matching the query vector, and obtaining, according to the document vectors, the documents corresponding to the document vectors from the document data blocks corresponding to the document vector libraries;
merging the documents obtained from the respective document data blocks to generate a final query result.
G: The method of paragraph F, wherein performing, according to the query vector, vector approximate matching retrieval in the plurality of document vector libraries to obtain document vectors matching the query vector comprises:
determining, according to the query vector and the vector index corresponding to each document vector library, a region in each document vector library in which vector approximate matching retrieval is to be performed;
performing, according to the query vector, the vector approximate matching retrieval in the determined region to obtain the document vectors matching the query vector.
H: The method of paragraph G, further comprising, before the documents obtained from the respective document data blocks are merged:
performing inverted index retrieval in the plurality of document data blocks according to the query content, and obtaining documents corresponding to the query content;
wherein merging the documents obtained from the respective document data blocks to generate the final query result comprises:
for each document data block, performing hybrid ranking on the documents obtained by vector approximate matching retrieval and the documents obtained by inverted index retrieval, and merging the documents according to the ranking result to generate the final query result.
I: The method of paragraph F, further comprising:
partitioning document data into blocks to generate a plurality of document data blocks;
processing the plurality of documents in each document data block to generate a plurality of document vector libraries corresponding to the respective document data blocks, each document vector library including a plurality of document vectors corresponding respectively to the plurality of documents in the document data block, each document corresponding to one or more document vectors.
J: The method of paragraph I, further comprising:
establishing, for each document vector library, the vector index used to partition the document vectors in that document vector library.
K: An electronic device, comprising:
a processing unit; and
a memory coupled to the processing unit and containing instructions stored thereon that, when executed by the processing unit, cause the device to perform actions comprising:
generating a query vector according to query content;
performing vector approximate matching retrieval to obtain a document vector matching the query vector;
obtaining the corresponding document according to the document vector.
L: The electronic device of paragraph K, wherein performing vector approximate matching retrieval to obtain a document vector matching the query vector comprises:
obtaining the document vector matching the query vector based on approximate nearest neighbor search.
M: An electronic device, comprising:
a processing unit; and
a memory coupled to the processing unit and containing instructions stored thereon that, when executed by the processing unit, cause the device to perform actions comprising:
generating a query vector according to query content;
performing, according to the query vector, vector approximate matching retrieval in a plurality of document vector libraries to obtain document vectors matching the query vector, and obtaining, according to the document vectors, the documents corresponding to the document vectors from the document data blocks corresponding to the document vector libraries;
merging the documents obtained from the respective document data blocks to generate a final query result.
N: The electronic device of paragraph M, wherein
performing, according to the query vector, vector approximate matching retrieval in the plurality of document vector libraries to obtain document vectors matching the query vector comprises:
determining, according to the query vector and the vector index corresponding to each document vector library, a region in each document vector library in which vector approximate matching retrieval is to be performed;
performing, according to the query vector, the vector approximate matching retrieval in the determined region to obtain the document vectors matching the query vector.
O: The electronic device of paragraph N, wherein the actions further comprise, before the documents obtained from the respective document data blocks are merged:
performing inverted index retrieval in the plurality of document data blocks according to the query content, and obtaining documents corresponding to the query content;
and wherein merging the documents obtained from the respective document data blocks to generate the final query result comprises:
for each document data block, performing hybrid ranking on the documents obtained by vector approximate matching retrieval and the documents obtained by inverted index retrieval, and merging the documents according to the ranking result to generate the final query result.
P: An apparatus, comprising:
a query vector generation module for generating a query vector according to query content;
a document vector acquisition module for performing vector approximate matching retrieval to obtain a document vector matching the query vector;
a document acquisition module for obtaining the corresponding document according to the document vector.
Q: The apparatus of paragraph P, wherein performing vector approximate matching retrieval to obtain a document vector matching the query vector comprises: obtaining the document vector matching the query vector based on approximate nearest neighbor search.
R: An apparatus, comprising:
a query vector generation module for generating a query vector according to query content;
a vector approximate matching retrieval module for performing, according to the query vector, vector approximate matching retrieval in a plurality of document vector libraries to obtain document vectors matching the query vector, and for obtaining, according to the document vectors, the documents corresponding to the document vectors from the document data blocks corresponding to the document vector libraries;
a query result generation module for merging the documents obtained from the respective document data blocks to generate a final query result.
S: The apparatus of paragraph R, wherein performing, according to the query vector, vector approximate matching retrieval in the plurality of document vector libraries to obtain document vectors matching the query vector comprises:
determining, according to the query vector and the vector index corresponding to each document vector library, a region in each document vector library in which vector approximate matching retrieval is to be performed;
performing, according to the query vector, the vector approximate matching retrieval in the determined region to obtain the document vectors matching the query vector.
T: The apparatus of paragraph S, further comprising a plurality of inverted index retrieval modules for performing inverted index retrieval in the plurality of document data blocks according to the query content, and obtaining documents corresponding to the query content;
wherein, in the query result generation module, merging the documents obtained from the respective document data blocks to generate the final query result comprises:
performing hybrid ranking on the documents obtained by vector approximate matching retrieval and the documents obtained by inverted index retrieval, and merging the documents according to the ranking result to generate the final query result.
Conclusion
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described. Rather, the specific features and acts are disclosed as illustrative forms of implementing the claims.
Unless specifically stated otherwise, conditional language such as "can", "could", "might", or "may" is understood within the context as generally used to indicate that certain examples include, while other examples do not include, certain features, elements and/or steps. Thus, such conditional language is not generally intended to imply that features, elements and/or steps are in any way required for one or more examples, or that one or more examples necessarily include logic for deciding, with or without user input or prompting, whether these features, elements and/or steps are included or are to be performed in any particular example.
Unless specifically stated otherwise, conjunctive language such as the phrase "at least one of X, Y or Z" is to be understood to indicate that an item, term, etc. may be any one of X, Y, or Z, or a combination thereof.
Any routine descriptions, elements, or blocks in the flow diagrams described herein and/or depicted in the attached figures should be understood as potentially representing modules, segments, or portions of code that include one or more executable instructions for implementing specific logical functions or elements in the routine. Alternative examples are included within the scope of the examples described herein, in which elements or functions may be deleted or executed out of order from that shown or discussed, including substantially simultaneously or in reverse order, depending on the functionality involved, as would be understood by those skilled in the art.
It should be emphasized that many variations and modifications may be made to the above-described examples, the elements of which are to be understood as being among other acceptable examples. All such modifications and variations are intended to be included herein within the scope of this disclosure and protected by the following claims.
Those of ordinary skill in the art will appreciate that all or part of the steps of the above method embodiments may be completed by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs the steps of the above method embodiments. The storage medium includes various media that can store program code, such as ROM, RAM, magnetic disks, or optical discs.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.
Claims (20)
1. A method, comprising:
generating a query vector according to query content;
performing vector approximate matching retrieval to obtain a document vector matching the query vector;
obtaining the corresponding document according to the document vector.
2. The method according to claim 1, wherein performing vector approximate matching retrieval to obtain a document vector matching the query vector comprises:
obtaining the document vector matching the query vector based on approximate nearest neighbor search.
3. The method according to claim 1, wherein the query vector and the document vector are semantic vectors generated based on the same semantic space.
4. The method according to claim 1, wherein generating a query vector according to query content comprises:
generating the query vector according to the context of the query content.
5. The method according to claim 1, further comprising:
generating one or more document vectors according to the document content of a document, the document content including any one or a combination of: a title, a link, an anchor, and click data.
6. A method, comprising:
generating a query vector according to query content;
performing, according to the query vector, vector approximate matching retrieval in a plurality of document vector libraries to obtain document vectors matching the query vector, and obtaining, according to the document vectors, the documents corresponding to the document vectors from the document data blocks corresponding to the document vector libraries;
merging the documents obtained from the respective document data blocks to generate a final query result.
7. The method according to claim 6, wherein performing, according to the query vector, vector approximate matching retrieval in the plurality of document vector libraries to obtain document vectors matching the query vector comprises:
determining, according to the query vector and the vector index corresponding to each document vector library, a region in each document vector library in which vector approximate matching retrieval is to be performed;
performing, according to the query vector, the vector approximate matching retrieval in the determined region to obtain the document vectors matching the query vector.
8. The method according to claim 7, further comprising, before the documents obtained from the respective document data blocks are merged:
performing inverted index retrieval in the plurality of document data blocks according to the query content, and obtaining documents corresponding to the query content;
wherein merging the documents obtained from the respective document data blocks to generate the final query result comprises:
performing hybrid ranking on the documents obtained by vector approximate matching retrieval and the documents obtained by inverted index retrieval, and merging the documents according to the ranking result to generate the final query result.
9. The method according to claim 6, further comprising:
partitioning document data into blocks to generate a plurality of document data blocks;
processing the plurality of documents in each document data block to generate a plurality of document vector libraries corresponding to the respective document data blocks, each document vector library including a plurality of document vectors corresponding respectively to the plurality of documents in the document data block, each document corresponding to one or more document vectors.
10. The method according to claim 9, further comprising:
establishing, for each document vector library, the vector index used to partition the document vectors in that document vector library.
11. An electronic device, comprising:
a processing unit; and
a memory coupled to the processing unit and containing instructions stored thereon that, when executed by the processing unit, cause the device to perform actions comprising:
generating a query vector according to query content;
performing vector approximate matching retrieval to obtain a document vector matching the query vector;
obtaining the corresponding document according to the document vector.
12. The electronic device according to claim 11, wherein performing vector approximate matching retrieval to obtain a document vector matching the query vector comprises:
obtaining the document vector matching the query vector based on approximate nearest neighbor search.
13. An electronic device, comprising:
a processing unit; and
a memory coupled to the processing unit and containing instructions stored thereon that, when executed by the processing unit, cause the device to perform actions comprising:
generating a query vector according to query content;
performing, according to the query vector, vector approximate matching retrieval in a plurality of document vector libraries to obtain document vectors matching the query vector, and obtaining, according to the document vectors, the documents corresponding to the document vectors from the document data blocks corresponding to the document vector libraries;
merging the documents obtained from the respective document data blocks to generate a final query result.
14. The electronic device according to claim 13, wherein
performing, according to the query vector, vector approximate matching retrieval in the plurality of document vector libraries to obtain document vectors matching the query vector comprises:
determining, according to the query vector and the vector index corresponding to each document vector library, a region in each document vector library in which vector approximate matching retrieval is to be performed;
performing, according to the query vector, the vector approximate matching retrieval in the determined region to obtain the document vectors matching the query vector.
15. The electronic device according to claim 14, wherein the actions further comprise, before merging the documents obtained from the respective document data blocks:
performing, according to the query content, inverted index retrieval in the plurality of document data blocks to obtain documents corresponding to the query content;
and wherein merging the documents obtained from the respective document data blocks to generate the final query result comprises: performing hybrid sorting on the documents obtained by the approximate vector matching retrieval and the documents obtained by the inverted index retrieval, and merging the documents according to the sorting result to generate the final query result.
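The hybrid sorting of vector-retrieved and inverted-index-retrieved documents can be sketched as below. The claim does not specify a scoring rule, so reciprocal rank fusion is swapped in here purely as an assumption; the function names and the sample documents are likewise hypothetical.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def inverted_index_search(index, query):
    """Rank documents by how many distinct query terms they contain."""
    counts = defaultdict(int)
    for term in set(query.lower().split()):
        for doc_id in index.get(term, ()):
            counts[doc_id] += 1
    return sorted(counts, key=lambda d: (-counts[d], d))


def hybrid_merge(vector_ranked, keyword_ranked, k=60):
    """Fuse two ranked lists with reciprocal rank fusion (an assumed choice)."""
    scores = defaultdict(float)
    for ranked in (vector_ranked, keyword_ranked):
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Documents found by both retrieval paths accumulate two contributions.
    return sorted(scores, key=lambda d: (-scores[d], d))


docs = {
    "d1": "cheap flights to paris",
    "d2": "paris travel guide",
    "d3": "budget airfare deals",
}
index = build_inverted_index(docs)
keyword_ranked = inverted_index_search(index, "cheap paris flights")
vector_ranked = ["d3", "d1"]  # stand-in for the output of a vector retrieval step
final = hybrid_merge(vector_ranked, keyword_ranked)
```

In this example `d3` shares no terms with the query and is reachable only through the vector path, while `d1` is found by both paths and therefore rises to the top of the merged list.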
16. An apparatus, comprising:
a query vector generation module configured to generate a query vector according to query content;
a document vector obtaining module configured to perform approximate vector matching retrieval to obtain a document vector matching the query vector; and
a document obtaining module configured to obtain a corresponding document according to the document vector.
17. The apparatus according to claim 16, wherein performing the approximate vector matching retrieval to obtain the document vector matching the query vector comprises: obtaining the document vector matching the query vector based on an approximate nearest neighbor search.
18. An apparatus, comprising:
a query vector generation module configured to generate a query vector according to query content;
an approximate vector matching retrieval module configured to perform, according to the query vector, approximate vector matching retrieval in a plurality of document vector libraries to obtain document vectors matching the query vector, and to obtain, according to the document vectors, the documents corresponding to the document vectors from the document data block corresponding to each document vector library; and
a query result generation module configured to merge the documents obtained from the respective document data blocks to generate a final query result.
19. The apparatus according to claim 18, wherein performing, according to the query vector, the approximate vector matching retrieval in the plurality of document vector libraries to obtain the document vectors matching the query vector comprises:
determining, according to the query vector and the vector index corresponding to each document vector library, a region in each document vector library in which to perform the approximate vector matching retrieval; and
performing, according to the query vector, the approximate vector matching retrieval in the determined region to obtain the document vectors matching the query vector.
20. The apparatus according to claim 19, further comprising a plurality of inverted index retrieval modules configured to perform, according to the query content, inverted index retrieval in the plurality of document data blocks and to obtain documents corresponding to the query content;
wherein, in the query result generation module, merging the documents obtained from the respective document data blocks to generate the final query result comprises:
performing hybrid sorting on the documents obtained by the approximate vector matching retrieval and the documents obtained by the inverted index retrieval, and merging the documents according to the sorting result to generate the final query result.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711343103.8A CN109948044A (en) | 2017-12-14 | 2017-12-14 | Document query based on vector nearest neighbor search |
PCT/US2018/064146 WO2019118253A1 (en) | 2017-12-14 | 2018-12-06 | Document recall based on vector nearest neighbor search |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711343103.8A CN109948044A (en) | 2017-12-14 | 2017-12-14 | Document query based on vector nearest neighbor search |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109948044A true CN109948044A (en) | 2019-06-28 |
Family
ID=65199569
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711343103.8A Pending CN109948044A (en) | 2017-12-14 | 2017-12-14 | Document query based on vector nearest neighbor search |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN109948044A (en) |
WO (1) | WO2019118253A1 (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7475071B1 (en) * | 2005-11-12 | 2009-01-06 | Google Inc. | Performing a parallel nearest-neighbor matching operation using a parallel hybrid spill tree |
CN101639831A (en) * | 2008-07-29 | 2010-02-03 | 华为技术有限公司 | Search method, search device and search system |
CN103136352A (en) * | 2013-02-27 | 2013-06-05 | 华中师范大学 | Full-text retrieval system based on two-level semantic analysis |
CN103838833A (en) * | 2014-02-24 | 2014-06-04 | 华中师范大学 | Full-text retrieval system based on semantic analysis of relevant words |
CN103838735A (en) * | 2012-11-21 | 2014-06-04 | 大连灵动科技发展有限公司 | Data retrieval method for improving retrieval efficiency and quality |
CN106909628A (en) * | 2017-01-24 | 2017-06-30 | 南京大学 | Interval-based text similarity method |
- 2017-12-14: CN application CN201711343103.8A, published as CN109948044A (active, Pending)
- 2018-12-06: WO application PCT/US2018/064146, published as WO2019118253A1 (active, Application Filing)
Non-Patent Citations (1)
Title |
---|
MUJA MARIUS ET AL: "Scalable Nearest Neighbor Algorithms for High Dimensional Data", 《IEEE COMPUTER SOCIETY》 * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11354293B2 (en) | 2020-01-28 | 2022-06-07 | Here Global B.V. | Method and apparatus for indexing multi-dimensional records based upon similarity of the records |
CN111339241A (en) * | 2020-02-18 | 2020-06-26 | 北京百度网讯科技有限公司 | Question duplicate checking method and device and electronic equipment |
CN111339241B (en) * | 2020-02-18 | 2024-02-13 | 北京百度网讯科技有限公司 | Problem duplicate checking method and device and electronic equipment |
CN111339261A (en) * | 2020-03-17 | 2020-06-26 | 北京香侬慧语科技有限责任公司 | Document extraction method and system based on pre-training model |
CN111930880A (en) * | 2020-08-14 | 2020-11-13 | 易联众信息技术股份有限公司 | Text code retrieval method, device and medium |
CN115545853A (en) * | 2022-12-02 | 2022-12-30 | 云筑信息科技(成都)有限公司 | Searching method for searching suppliers |
Also Published As
Publication number | Publication date |
---|---|
WO2019118253A1 (en) | 2019-06-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11030445B2 (en) | | Sorting and displaying digital notes on a digital whiteboard |
CN109948044A (en) | | Document query based on vector nearest neighbor search |
CN103339623B (en) | | Method and apparatus relating to Internet search |
WO2018072071A1 (en) | | Knowledge map building system and method |
US20160162476A1 (en) | | Methods and systems for modeling complex taxonomies with natural language understanding |
US20010044800A1 (en) | | Internet organizer |
CN107145496A (en) | | Method for matching images with content items based on keywords |
JP6346218B2 (en) | | Search method, apparatus and server for online trading platform |
US8799257B1 (en) | | Searching based on audio and/or visual features of documents |
CN107885873A (en) | | Method and apparatus for outputting information |
CN103412903B (en) | | Real-time Internet of Things search method and system based on prediction of objects of interest |
CN112131295A (en) | | Data processing method and device based on Elasticsearch |
CN107145497A (en) | | Method for selecting images to match content based on metadata of the images and the content |
CN109918594A (en) | | Information display method and device |
KR101446154B1 (en) | | System and method for searching semantic contents using user query expansion |
Antunes et al. | | Context storage for m2m scenarios |
US11314793B2 (en) | | Query processing |
KR20240020166A (en) | | Method for learning machine-learning model with structured ESG data using ESG auxiliary tool and service server for generating automatically completed ESG documents with the machine-learning model |
US20220027419A1 (en) | | Smart search and recommendation method for content, storage medium, and terminal |
US9195940B2 (en) | | Jabba-type override for correcting or improving output of a model |
KR101592670B1 (en) | | Apparatus for searching data using index and method for using the apparatus |
CN110110199B (en) | | Information output method and device |
CN107463570B (en) | | Document retrieval/analysis method and device |
CN104657456B (en) | | Type-based multidimensional information search system |
CN110110185A (en) | | Method, device, and storage medium for extracting browser search engines |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||