CN116467494A - Vector data indexing method - Google Patents
- Publication number
- CN116467494A (application CN202310729325.2A)
- Authority
- CN
- China
- Prior art keywords
- vector
- database
- vectors
- data
- vector data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/901—Indexing; Data structures therefor; Storage structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/903—Querying
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/906—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides a method for indexing vector data, comprising: establishing a vector training model; inputting a query vector into the vector training model, which outputs a plurality of approximate result vectors; sorting the database vectors in a database by the size of their values in each dimension; selecting, from the sorted database vectors, a plurality of database vectors located around each approximate result vector as first vector data; extracting the ids of all first vector data, removing duplicate ids, and retrieving the database vectors corresponding to the remaining ids from the database as second vector data; and calculating the distance between the second vector data and the query vector, and selecting from the second vector data a plurality of vector data with the smallest distances. The indexing method optimizes the query process, improves the efficiency of the indexing algorithm, and also improves the accuracy of the indexing result.
Description
Technical Field
The invention relates to the technical field of data processing, in particular to an indexing method of vector data.
Background
With the explosive growth of unstructured data (such as images, video, and audio), unstructured data analysis is now widely used across real-world applications, and many database systems have begun to incorporate it to meet this need. Unstructured data is ultimately stored in databases in the form of feature vectors, so how to find the desired data in a huge volume of data has become a current research hotspot.
With a vector data indexing method, the desired data can be found in the database from an input query vector. Vector indexing algorithms obtain the desired data mainly through similarity comparisons, and these comparisons are mainly based on distance calculation. To improve efficiency, prior-art indexing algorithms almost always rely on data clustering, as in IVFPQ, IVFFlat, and similar algorithms. These methods first cluster the original data and then locate the cluster centers, which reduces the number of comparisons needed for a query vector.
However, the prior art does not improve the efficiency of the indexing algorithm by optimizing the query itself, so its query efficiency still has room for improvement.
Disclosure of Invention
The invention aims to provide an indexing method of vector data, which can optimize the query process, so that the efficiency of an indexing algorithm can be improved.
In order to achieve the above object, the present invention provides a method for indexing vector data, comprising:
establishing a vector training model;
inputting a query vector to the vector training model, wherein the vector training model outputs a plurality of approximate result vectors;
sorting the database vectors in a database according to the size of their values in each dimension;
selecting, from the sorted database vectors, a plurality of database vectors located around each approximate result vector as first vector data;
extracting the ids of all first vector data, removing duplicate ids, and retrieving the database vectors corresponding to the remaining ids from the database as second vector data;
and calculating the distance between the second vector data and the query vector, and selecting from the second vector data a plurality of vector data with smaller distances.
Optionally, in the indexing method of vector data, the method for building a vector training model includes:
forming a generator;
forming a discriminator;
extracting a number of data samples from the database vectors of a database, and training the discriminator using the data samples and noise samples;
extracting a number of data samples from the database vectors of the database, and training the generator using the data samples and noise samples.
Optionally, in the indexing method of vector data, the method for training the discriminator includes:
sampling m data samples from the database vectors;
sampling m noise samples, and feeding the m noise samples into the generator to generate m vectors;
and obtaining the distance between the m vectors and the distribution of the m data samples, the training of the discriminator being complete when the distance is at its maximum.
Optionally, in the indexing method of vector data, the method of training the generator includes:
sampling m data samples from the database vector;
sampling m noise samples, and placing the m noise samples into a generator to generate m vectors;
and obtaining the distance between the m vectors and the distribution of the m data samples, the training of the generator being complete when the distance is at its minimum.
Optionally, in the indexing method of vector data, a generator is trained once using a plurality of noise samples.
Optionally, in the indexing method of vector data, the discriminator is trained multiple times using database vector samples.
Optionally, in the indexing method of vector data, a query vector is input to the vector training model, and the vector training model outputs 10 approximate result vectors.
Optionally, in the method for indexing vector data, the database vectors are sorted by dimension value in descending order or in ascending order.
Optionally, in the method for indexing vector data, the method for sorting database vectors in the database according to dimension size includes: first sorting by the size of the data in the first dimension to form one table, and then sorting by the size of the data in the second dimension to form another table.
Optionally, in the method for indexing vector data, selecting, from the sorted database vectors, a plurality of database vectors located around each of the approximate result vectors as the first vector data includes: selecting, from the sorted database vectors, the 10 128-dimensional database vectors located above each approximate result vector and the 10 128-dimensional database vectors located below it.
Optionally, in the method for indexing vector data, 20 database vectors respectively located around each of the approximate result vectors are selected from the ordered database vectors as the first vector data.
Optionally, in the method for indexing vector data, the Euclidean distance between the second vector data and the query vector is calculated, and a plurality of vector data with smaller distances are selected from the second vector data.
The indexing method of vector data provided by the invention comprises the following steps: establishing a vector training model; inputting a query vector into the vector training model, which outputs a plurality of approximate result vectors; sorting the database vectors in a database according to the size of their values in each dimension; selecting, from the sorted database vectors, a plurality of database vectors located around each approximate result vector as first vector data; extracting the ids of all first vector data, removing duplicate ids, and retrieving the database vectors corresponding to the remaining ids from the database as second vector data; and calculating the distance between the second vector data and the query vector, and selecting from the second vector data a plurality of vector data with smaller distances. The indexing method optimizes the query process and improves the efficiency of the indexing algorithm.
Drawings
FIG. 1 is a flow chart of a method of indexing vector data according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of database vector storage;
FIG. 3 is a schematic diagram of a query vector;
FIG. 4 is a schematic diagram of ordered database vectors.
Detailed Description
Specific embodiments of the present invention will be described in more detail below with reference to the drawings. The advantages and features of the present invention will become more apparent from the following description. It should be noted that the drawings are in a very simplified form and are all to a non-precise scale, merely for convenience and clarity in aiding in the description of embodiments of the invention.
In the following, the terms "first," "second," and the like are used to distinguish between similar elements and are not necessarily used to describe a particular order or chronological order. It is to be understood that such terms are interchangeable under appropriate circumstances. Similarly, if a method described herein comprises a series of steps, the order of the steps presented herein is not necessarily the only order in which they may be performed, and some of the described steps may be omitted and/or other steps not described herein may be added to the method.
Referring to fig. 1, the present invention provides a vector data indexing method, which includes:
S11: establishing a vector training model;
S12: inputting a query vector into the vector training model, which outputs a plurality of approximate result vectors;
S13: sorting the database vectors in a database according to the size of their values in each dimension;
S14: selecting, from the sorted database vectors, a plurality of database vectors located around each approximate result vector as first vector data;
S15: extracting the ids of all first vector data, removing duplicate ids, and retrieving the database vectors corresponding to the remaining ids from the database as second vector data;
S16: calculating the distance between the second vector data and the query vector, and selecting from the second vector data a plurality of vector data with smaller distances.
Preferably, the method for establishing the vector training model comprises: forming a generator; forming a discriminator; extracting a number of data samples from the database vectors of a database and training the discriminator using the data samples and noise samples; and extracting a number of data samples from the database vectors of the database and training the generator using the data samples and noise samples. The generator is trained once using a number of noise samples, whereas the discriminator is trained multiple times using database vector samples: the generator does not always produce vectors that move in a more favorable direction, so updating it frequently can make the model unstable.
Specifically, the vector training model in the embodiment of the present invention comprises two networks, a generator G and a discriminator D. Both are trained on the database vectors in the database, and the resulting vector training model is a generative adversarial network (GAN). The generator G is a network that generates vectors: it receives random noise z and generates from it a vector denoted G(z). The discriminator D is a network that judges whether a vector is real. During training, the goal of the generator G is to generate vectors realistic enough to deceive the discriminator D, while the goal of the discriminator D is to distinguish the vectors generated by G from real vectors as well as possible. Ideally, the generator G produces vectors realistic enough to pass for real ones, so that the discriminator D has difficulty determining whether a vector generated by G is real.
The discriminator D may be trained using the database vectors in the database. First, m data samples {x_1, x_2, …, x_m} are sampled from the database vectors in the database. Then m noise samples {z_1, z_2, …, z_m} are sampled, using a prior-art sampling method, and fed into the generator G to generate m vectors {r_1, r_2, …, r_m}, where r_i = G(z_i). Training then updates the parameters θ_d by gradient descent so that the distance between the m vectors and the distribution of the m data samples is at its maximum, which completes the training of the discriminator. Here D(x_i) is the probability with which the discriminator D judges the i-th real sample to be real (because x_i is real, the closer this value is to 1, the better for D), and D(r_i) is the probability with which the discriminator D judges the i-th vector generated by G to be real.
Training the generator G comprises the following steps. First, m data samples {x_1, x_2, …, x_m} are sampled from the database vectors in the database. Next, m noise samples {z_1, z_2, …, z_m} are sampled and fed into the generator G to generate m vectors {r_1, r_2, …, r_m}, where r_i = G(z_i). Training then updates the parameters θ_g by gradient descent so that the distance between the m vectors and the distribution of the m data samples is at its minimum, which completes the training of the generator G. D(x_i) and D(r_i) have the same meaning as in the training of the discriminator.
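The expression for the "distance" that the discriminator maximizes and the generator minimizes is not written out in the text. The descriptions of D(x_i) and D(r_i) match the usual minibatch GAN value function, so the following is offered only as an assumption about what that expression may be, not as the patent's own formula:

```latex
\tilde{V}(\theta_d, \theta_g)
  = \frac{1}{m}\sum_{i=1}^{m} \log D(x_i)
  + \frac{1}{m}\sum_{i=1}^{m} \log\bigl(1 - D(r_i)\bigr),
\qquad r_i = G(z_i)
```

Under this assumption, the discriminator update moves θ_d to increase Ṽ (equivalently, gradient descent on −Ṽ), and the generator update moves θ_g to decrease Ṽ; only the second sum depends on θ_g.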
In the embodiment of the invention, the query vector is input to the vector training model, and the vector training model outputs a plurality of results for a single input query vector.
The invention is used to find the desired data among the database vectors of a database according to an input query vector. The desired data all reside in the database, which holds a massive amount of data, so a piece of data can be input to query for data similar to it; similar vectors must therefore be indexed from a database containing a large number of database vectors. The vector data in the embodiment of the invention may be vectors converted from data such as text data or picture data. The database vectors may be a plurality of rows of 128-dimensional (128-column) vector data, with the rows numbered by id. In FIG. 2, for example, each row represents a database vector and the id column gives the id corresponding to each database vector; the numbers are for illustration only and are not actual values of the database vectors. A query vector may be a single 128-dimensional (128-column) vector, as in FIG. 3, where the numbers are likewise schematic and are not values of the query vector. The database vectors are then sorted by dimension value in descending or ascending order. For example, one table is formed by sorting on the data in the first dimension, another table is formed by sorting on the data in the second dimension, and so on, for 128 tables in total. FIG. 4 shows a sorted database vector table.
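The per-dimension sorting described above can be sketched in a few lines. This is a minimal sketch under assumptions not stated in the text: the database is held as a NumPy array, the row index serves as the id, and the function name `build_sorted_tables` is hypothetical.

```python
import numpy as np

def build_sorted_tables(db: np.ndarray, descending: bool = False) -> np.ndarray:
    """Return `tables` of shape (dims, n): tables[d] holds the ids (row
    indices of `db`) ordered by the value stored in dimension d."""
    order = np.argsort(db, axis=0, kind="stable")  # shape (n, dims)
    if descending:
        order = order[::-1]                        # reverse every column
    return order.T                                 # one table per dimension

# Toy database: 3 vectors with 2 dimensions instead of 128.
db = np.array([[0.3, 7.0],
               [0.1, 9.0],
               [0.2, 8.0]])
tables = build_sorted_tables(db)
print(tables.tolist())   # [[1, 2, 0], [0, 2, 1]]
```

Storing only ids in each table keeps a single copy of the vectors; with 128 dimensions the same call would produce the 128 tables mentioned in the text.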
In the embodiment of the invention, 20 database vectors located around each approximate result vector are selected from the sorted database vectors as the first vector data. Because 10 approximate result vectors were obtained earlier, the database vectors around each approximate result vector are selected in turn. The database vectors are arranged in rows, so the database vectors around an approximate result are the 10 128-dimensional database vectors above it and the 10 128-dimensional database vectors below it, and the embodiment of the invention has 128 database vector tables like that of FIG. 4. In theory, therefore, one approximate result vector yields 1 × 128 × 20 = 2560 pieces of data, and the 10 approximate result vectors yield 2560 × 10 = 25600 pieces of data in total. Some data may be acquired more than once, so the number of first vector data is less than or equal to 25600 once duplicates are removed. Deduplication is therefore required: since each piece of vector data corresponds to one id, it suffices to check whether ids are repeated and to keep a single copy of each repeated id. The specific deduplication method is prior art and is not described in detail here.
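The neighbor selection and id deduplication can be illustrated as follows. This is a sketch under stated assumptions: tables are ascending lists of ids per dimension, the position of an approximate result within a table is found by binary search on its value in that dimension (the text does not say how the position is located), and the name `select_candidates` and the random stand-in for the model output are hypothetical.

```python
import numpy as np

def select_candidates(db, tables, approx_results, window=10):
    """Return the sorted, deduplicated ids of the database vectors lying up to
    `window` rows above and `window` rows below each approximate result in
    every per-dimension table (tables[d] = ids in ascending order of dim d)."""
    n, dims = db.shape
    # sorted_vals[d, j] is the dimension-d value of the j-th row of table d
    sorted_vals = np.take_along_axis(db.T, tables, axis=1)
    ids = set()                                   # a set drops repeated ids
    for r in approx_results:
        for d in range(dims):
            # Assumption: the result's place in table d is found by value
            pos = int(np.searchsorted(sorted_vals[d], r[d]))
            lo, hi = max(0, pos - window), min(n, pos + window)
            ids.update(tables[d, lo:hi].tolist())
    return np.array(sorted(ids), dtype=np.int64)

rng = np.random.default_rng(0)
db = rng.random((1000, 128))
tables = np.argsort(db, axis=0, kind="stable").T   # 128 ascending tables
approx = rng.random((10, 128))                     # stand-in for model output
first_ids = select_candidates(db, tables, approx)
upper_bound = 128 * 20 * 10                        # 25600 before deduplication
print(len(first_ids), "<=", upper_bound)
```

Near the ends of a table fewer than 20 neighbors exist, so the count per table can fall below 20 even before deduplication.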
In the embodiment of the invention, euclidean distance calculation is carried out on the second vector data and the query vector, and a plurality of vector data with smaller distance are selected from the second vector data. The embodiment of the invention adopts Euclidean distance calculation, and in other embodiments of the invention, other distance calculation methods can be adopted.
In the embodiment of the invention, the distance between the second vector data and the query vector is calculated, and the 10 vector data with the smallest distances are selected from the second vector data. There are multiple second vector data, so the distance from each of them to the query vector must be calculated, giving multiple distances. The distances can be sorted in ascending order and the first several vector data, those with the smallest distances, selected. In other embodiments of the present invention, another number of vector data with the smallest distances, such as 100, may be selected; the specific number may be set according to the required accuracy.
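The final distance calculation and selection can be sketched as below, assuming the database is a NumPy array indexed by id; the function name `rerank` is hypothetical.

```python
import numpy as np

def rerank(db, candidate_ids, query, k=10):
    """Return the ids and Euclidean distances of the k candidates closest to
    the query, in ascending order of distance."""
    candidate_ids = np.asarray(candidate_ids)
    second = db[candidate_ids]                       # second vector data
    dists = np.linalg.norm(second - query, axis=1)   # Euclidean distances
    nearest = np.argsort(dists, kind="stable")[:k]   # ascending, keep first k
    return candidate_ids[nearest], dists[nearest]

db = np.array([[0.0, 0.0], [3.0, 4.0], [1.0, 0.0], [10.0, 10.0]])
ids, dists = rerank(db, [0, 1, 2, 3], query=np.array([0.0, 0.0]), k=2)
print(ids.tolist(), dists.tolist())   # [0, 2] [0.0, 1.0]
```

Swapping `np.linalg.norm` for another metric would correspond to the other distance calculation methods the text allows.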
The indexing method of the embodiment of the invention was applied in practice. When querying similar results for 10 query vectors in a database of 1000000 database vectors, the latency of a single query vector was 4.673 ms and the recall rate was 85.96%. Under the same conditions, the prior-art methods performed as follows: the Flat method had a single-query latency of 144.146 ms and a recall rate of 100%; the IVFFlat method had a single-query latency of 80.821 ms and a recall rate of 68.73%; and the IVFPQ method had a single-query latency of 7.436 ms and a recall rate of 55.68%. The method of the embodiment therefore has the lowest single-query latency of the four methods, and a higher recall rate than IVFFlat and IVFPQ, so it improves the efficiency of indexing vector data.
In summary, the method for indexing vector data provided in the embodiment of the present invention includes: establishing a vector training model; inputting a query vector into the vector training model, which outputs a plurality of approximate result vectors; sorting the database vectors in a database according to the size of their values in each dimension; selecting, from the sorted database vectors, a plurality of database vectors located around each approximate result vector as first vector data; extracting the ids of all first vector data, removing duplicate ids, and retrieving the database vectors corresponding to the remaining ids from the database as second vector data; and calculating the distance between the second vector data and the query vector, and selecting from the second vector data a plurality of vector data with smaller distances. The indexing method optimizes the query process, improves the efficiency of the indexing algorithm, and also improves the accuracy of the indexing result.
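The text does not say how the reported recall rates were computed. A common convention, assumed here, is recall@k measured against an exhaustive search, which would be consistent with the Flat method scoring 100%. The helper names `exact_top_k` and `recall_at_k` are hypothetical.

```python
import numpy as np

def exact_top_k(db, query, k):
    """Ids of the k nearest database vectors found by exhaustive search."""
    dists = np.linalg.norm(db - query, axis=1)
    return np.argsort(dists, kind="stable")[:k]

def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-k ids that the approximate result contains."""
    exact = {int(i) for i in exact_ids}
    found = {int(i) for i in approx_ids}
    return len(exact & found) / len(exact)

db = np.array([[0.0], [1.0], [2.0], [3.0]])
truth = exact_top_k(db, np.array([0.0]), k=2)      # ids 0 and 1
print(recall_at_k([0, 3], truth))                  # 0.5
```

Averaging this value over many query vectors gives a single recall figure comparable in form to those quoted above.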
The foregoing is merely a preferred embodiment of the present invention and is not intended to limit the present invention in any way. Any equivalent substitution or modification made by a person skilled in the art to the technical solution and technical content disclosed herein, without departing from the scope of the technical solution of the invention, does not depart from that technical solution and still falls within the protection scope of the invention.
Claims (12)
1. A method of indexing vector data, comprising:
establishing a vector training model;
inputting a query vector to the vector training model, wherein the vector training model outputs a plurality of approximate result vectors;
sorting the database vectors in a database according to the size of their values in each dimension;
selecting, from the sorted database vectors, a plurality of database vectors located around each approximate result vector as first vector data;
extracting the ids of all first vector data, removing duplicate ids, and retrieving the database vectors corresponding to the remaining ids from the database as second vector data;
and calculating the distance between the second vector data and the query vector, and selecting from the second vector data a plurality of vector data with smaller distances.
2. The method of indexing vector data according to claim 1, wherein the method of building a vector training model comprises:
forming a generator;
forming a discriminator;
extracting a number of data samples from the database vectors of a database, and training the discriminator using the data samples and noise samples;
extracting a number of data samples from the database vectors of the database, and training the generator using the data samples and noise samples.
3. The indexing method of vector data according to claim 2, wherein the method of training the discriminator comprises:
sampling m data samples from the database vectors;
sampling m noise samples, and feeding the m noise samples into the generator to generate m vectors;
and obtaining the distance between the m vectors and the distribution of the m data samples, the training of the discriminator being complete when the distance is at its maximum.
4. The indexing method of vector data according to claim 2, wherein the method of training the generator comprises:
sampling m data samples from the database vectors;
sampling m noise samples, and feeding the m noise samples into the generator to generate m vectors;
and obtaining the distance between the m vectors and the distribution of the m data samples, the training of the generator being complete when the distance is at its minimum.
5. The method of indexing vector data of claim 2 wherein the generator is trained once using a number of noise samples.
6. The method of indexing vector data of claim 2 wherein the discriminator is trained multiple times using database vector samples.
7. The method of indexing vector data according to claim 1, wherein a query vector is input to the vector training model, and the vector training model outputs 10 approximate result vectors.
8. The method of indexing vector data according to claim 1, wherein the database vectors are sorted by dimension value in descending order or in ascending order.
9. The method of indexing vector data as claimed in claim 1, wherein the method of sorting database vectors in the database by dimension size comprises: first sorting by the size of the data in the first dimension to form one table, and then sorting by the size of the data in the second dimension to form another table.
10. The method of indexing vector data according to claim 9, wherein selecting, as the first vector data, a plurality of database vectors located around each of the approximate result vectors from the sorted database vectors comprises: selecting, from the sorted database vectors, the 10 128-dimensional database vectors located above each approximate result vector and the 10 128-dimensional database vectors located below it.
11. The method of indexing vector data according to claim 1, wherein 20 database vectors respectively located around each of the approximate result vectors are selected from the sorted database vectors as the first vector data.
12. The indexing method of vector data according to claim 1, wherein the Euclidean distance between the second vector data and the query vector is calculated, and a plurality of vector data with smaller distances are selected from the second vector data.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310729325.2A CN116467494B (en) | 2023-06-20 | 2023-06-20 | Vector data indexing method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116467494A | 2023-07-21 |
CN116467494B | 2023-08-29 |
Family
ID=87185185
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310729325.2A Active CN116467494B (en) | 2023-06-20 | 2023-06-20 | Vector data indexing method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116467494B (en) |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2000062251A1 (en) * | 1999-04-09 | 2000-10-19 | Merck & Co., Inc. | Chemical structure similarity ranking system and computer-implemented method for same |
CN103336795A (en) * | 2013-06-09 | 2013-10-02 | 华中科技大学 | Video indexing method based on multiple features |
CN109960737A (en) * | 2019-03-15 | 2019-07-02 | 西安电子科技大学 | Remote Sensing Images search method of the semi-supervised depth confrontation from coding Hash study |
CN110941754A (en) * | 2018-09-21 | 2020-03-31 | 微软技术许可有限责任公司 | Vector nearest neighbor search strategy based on reinforcement learning generation |
CN113377973A (en) * | 2021-06-10 | 2021-09-10 | 电子科技大学 | Article recommendation method based on countermeasures hash |
CN113761311A (en) * | 2021-01-28 | 2021-12-07 | 北京沃东天骏信息技术有限公司 | Information retrieval method, device, server and readable storage medium |
CN114691940A (en) * | 2022-04-18 | 2022-07-01 | 上海徐毓智能科技有限公司 | Index construction method and device, vector search method and retrieval system |
CN114860868A (en) * | 2022-03-08 | 2022-08-05 | 中国海洋大学 | Semantic similarity vector re-sparse coding indexing and retrieval method |
CN115630141A (en) * | 2022-11-11 | 2023-01-20 | 杭州电子科技大学 | Scientific and technological expert retrieval method based on community query and high-dimensional vector retrieval |
CN115934724A (en) * | 2022-12-19 | 2023-04-07 | 北京百度网讯科技有限公司 | Method for constructing database index, retrieval method, device, equipment and medium |
Non-Patent Citations (3)
Title |
---|
Y WANG: "a sematic indexing structure for image retrieval", ARXIV * |
毛轶事: "一种基于向量空间模型的信息检索算法研究", 通讯世界 * |
胡不归: "图像检索:向量索引", Retrieved from the Internet <URL:https://zhuanlan.zhihu.com/p/627360179?utm_id=0> * |
Also Published As
Publication number | Publication date |
---|---|
CN116467494B (en) | 2023-08-29 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||