CN116467494A - Vector data indexing method - Google Patents

Info

Publication number
CN116467494A
Authority
CN
China
Prior art keywords
vector
database
vectors
data
vector data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310729325.2A
Other languages
Chinese (zh)
Other versions
CN116467494B (en)
Inventor
王鑫炜
苏鹏
李剑楠
李恒
黄炎
阎虎青
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Aikesheng Information Technology Co ltd
Original Assignee
Shanghai Aikesheng Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Aikesheng Information Technology Co ltd
Priority to CN202310729325.2A
Publication of CN116467494A
Application granted
Publication of CN116467494B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/906 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method for indexing vector data, comprising: establishing a vector training model; inputting a query vector into the vector training model, which outputs a plurality of approximate result vectors; sorting the database vectors in a database by the values in each dimension; selecting, from the sorted database vectors, a plurality of database vectors located around each approximate result vector as first vector data; extracting the ids of all first vector data, removing repeated ids, and extracting the database vectors corresponding to the remaining ids from the database as second vector data; and calculating the distance between the second vector data and the query vector, and selecting from the second vector data a plurality of vector data with the smallest distances. The indexing method optimizes the query process, improves the efficiency of the indexing algorithm, and also improves the accuracy of the indexing result.

Description

Vector data indexing method
Technical Field
The invention relates to the technical field of data processing, in particular to an indexing method of vector data.
Background
With the explosive growth of unstructured data (such as images, video, and audio), unstructured data analysis is now widely used across real-world applications, and many database systems have begun to incorporate it to meet this need. Unstructured data is ultimately stored in databases as feature vectors, so finding the desired data within massive data sets has become a research hotspot.
With a vector data indexing method, the desired data can be found in the database according to an input query vector. Vector indexing algorithms mainly obtain the desired data through similarity comparison, which is usually implemented as a distance calculation. To improve efficiency, prior-art indexing algorithms almost always rely on data clustering, as in IVFPQ, IVFFlat, and similar algorithms: the original data is first clustered and the cluster centers are found, which reduces the number of comparisons needed for a query vector.
However, the prior art does not improve the efficiency of the indexing algorithm by optimizing the query itself, so its query efficiency still leaves room for improvement.
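To make the clustering idea in the prior art concrete, the following is a minimal sketch of an inverted-file (IVF-style) lookup in plain Python. It is not taken from the patent or from any particular library: the function names, the toy data, and the assumption that centroids already exist from an earlier clustering step are all illustrative.

```python
import math

def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_inverted_lists(vectors, centroids):
    """Assign each vector id to the inverted list of its nearest centroid."""
    lists = {c: [] for c in range(len(centroids))}
    for vid, vec in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: l2(vec, centroids[c]))
        lists[nearest].append(vid)
    return lists

def ivf_search(query, vectors, centroids, lists, nprobe=1, k=1):
    """Compare the query only against vectors in the nprobe closest clusters."""
    probe = sorted(range(len(centroids)), key=lambda c: l2(query, centroids[c]))[:nprobe]
    candidates = [vid for c in probe for vid in lists[c]]
    return sorted(candidates, key=lambda vid: l2(query, vectors[vid]))[:k]

# Toy data; the centroids are assumed to come from a prior clustering step.
vectors = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9)]
centroids = [(0.0, 0.0), (5.0, 5.0)]
lists = build_inverted_lists(vectors, centroids)
result = ivf_search((4.9, 5.1), vectors, centroids, lists, nprobe=1, k=1)
```

With `nprobe=1` the query is compared against two of the four vectors rather than all of them, which is the reduction in comparisons the paragraph describes.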
Disclosure of Invention
The invention aims to provide an indexing method of vector data, which can optimize the query process, so that the efficiency of an indexing algorithm can be improved.
In order to achieve the above object, the present invention provides a method for indexing vector data, comprising:
establishing a vector training model;
inputting a query vector to the vector training model, wherein the vector training model outputs a plurality of approximate result vectors;
sorting the database vectors in a database by the values in each dimension;
selecting, from the sorted database vectors, a plurality of database vectors located around each approximate result vector as first vector data;
extracting the ids of all the first vector data, removing repeated ids, and extracting the database vectors corresponding to the remaining ids from the database as second vector data;
and calculating the distance between the second vector data and the query vector, and selecting from the second vector data a plurality of vector data with the smallest distances.
Optionally, in the indexing method of vector data, the method for building a vector training model includes:
forming a generator;
forming a discriminator;
extracting a plurality of data samples from the database vectors of a database, and training the discriminator using the data samples and noise samples;
extracting a plurality of data samples from the database vectors of a database, and training the generator using the data samples and noise samples.
Optionally, in the indexing method of vector data, the method for training the discriminator includes:
sampling m data samples from the database vectors;
sampling m noise samples, and putting the m noise samples into the generator to generate m vectors;
and obtaining the distance between the distribution of the m vectors and that of the m data samples, and completing the training of the discriminator when the distance is at its maximum.
Optionally, in the indexing method of vector data, the method for training the generator includes:
sampling m data samples from the database vectors;
sampling m noise samples, and putting the m noise samples into the generator to generate m vectors;
and obtaining the distance between the distribution of the m vectors and that of the m data samples, and completing the training of the generator when the distance is at its minimum.
Optionally, in the indexing method of vector data, the generator is trained once using a plurality of noise samples.
Optionally, in the indexing method of vector data, the discriminator is trained multiple times using database vector samples.
Optionally, in the indexing method of vector data, a query vector is input to the vector training model, and the vector training model outputs 10 approximate result vectors.
Optionally, in the method for indexing vector data, the database vectors are sorted by dimension in descending order or in ascending order.
Optionally, in the method for indexing vector data, the method for sorting the database vectors in the database by dimension includes: first forming one table by sorting on the values in the first dimension, and then forming another table by sorting on the values in the second dimension.
Optionally, in the method for indexing vector data, selecting, from the sorted database vectors, a plurality of database vectors located around each approximate result vector as the first vector data includes: selecting, from the sorted database vectors, the 10 128-dimensional database vectors located above each approximate result vector and the 10 128-dimensional database vectors located below it.
Optionally, in the method for indexing vector data, 20 database vectors respectively located around each of the approximate result vectors are selected from the ordered database vectors as the first vector data.
Optionally, in the method for indexing vector data, the Euclidean distance between the second vector data and the query vector is calculated, and a plurality of vector data with the smallest distances are selected from the second vector data.
The indexing method of vector data provided by the invention comprises: establishing a vector training model; inputting a query vector into the vector training model, which outputs a plurality of approximate result vectors; sorting the database vectors in a database by the values in each dimension; selecting, from the sorted database vectors, a plurality of database vectors located around each approximate result vector as first vector data; extracting the ids of all the first vector data, removing repeated ids, and extracting the database vectors corresponding to the remaining ids from the database as second vector data; and calculating the distance between the second vector data and the query vector, and selecting from the second vector data a plurality of vector data with the smallest distances. The indexing method optimizes the query process and improves the efficiency of the indexing algorithm.
Drawings
FIG. 1 is a flow chart of a method of indexing vector data according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of database vector storage;
FIG. 3 is a schematic diagram of a query vector;
FIG. 4 is a schematic diagram of ordered database vectors.
Detailed Description
Specific embodiments of the present invention will be described in more detail below with reference to the drawings. The advantages and features of the present invention will become more apparent from the following description. It should be noted that the drawings are in a very simplified form and are all to a non-precise scale, merely for convenience and clarity in aiding in the description of embodiments of the invention.
In the following, the terms "first," "second," and the like are used to distinguish between similar elements and are not necessarily used to describe a particular order or chronological order. It is to be understood that such terms are interchangeable under appropriate circumstances. Similarly, if a method described herein comprises a series of steps, the order of the steps presented herein is not necessarily the only order in which they may be performed; some of the described steps may be omitted, and/or other steps not described herein may be added to the method.
Referring to fig. 1, the present invention provides a vector data indexing method, which includes:
S11: establishing a vector training model;
S12: inputting a query vector into the vector training model, which outputs a plurality of approximate result vectors;
S13: sorting the database vectors in a database by the values in each dimension;
S14: selecting, from the sorted database vectors, a plurality of database vectors located around each approximate result vector as first vector data;
S15: extracting the ids of all first vector data, removing repeated ids, and extracting the database vectors corresponding to the remaining ids from the database as second vector data;
S16: calculating the distance between the second vector data and the query vector, and selecting from the second vector data a plurality of vector data with the smallest distances.
Preferably, the vector training model is established as follows: a generator is formed; a discriminator is formed; a plurality of data samples are extracted from the database vectors of the database, and the discriminator is trained using the data samples and noise samples; a plurality of data samples are extracted from the database vectors of the database, and the generator is trained using the data samples and noise samples. The generator is trained once using a plurality of noise samples, whereas the discriminator is trained multiple times using database vector samples: the generator does not always produce vectors that move in a more favorable direction, so updating the generator frequently can make the model unstable.
Specifically, the vector training model in the embodiment of the present invention comprises two networks, a generator G and a discriminator D. Both are trained on the database vectors in the database, and the resulting vector training model is a generative adversarial network (GAN). The generator G is a network that generates vectors: it receives random noise z and generates a vector from that noise, denoted G(z). The discriminator D is a network that judges whether a vector is real. During training, the goal of the generator G is to generate realistic vectors that fool the discriminator D, while the goal of the discriminator D is to distinguish the vectors generated by G from real vectors as well as possible. Ideally, G generates vectors realistic enough to pass for real ones, and D finds it difficult to determine whether a vector generated by G is real.
The discriminator D is trained using database vectors in the database. First, m data samples {x_1, x_2, …, x_m} are sampled from the database vectors in the database. Then m noise samples {z_1, z_2, …, z_m} are sampled (the sampling method is prior art), and the m noise samples are put into the generator G to generate m vectors {r_1, r_2, …, r_m}, where r_i = G(z_i). Training then updates the parameter θ_d by gradient steps so that the distance between the distribution of the m vectors and that of the m data samples is at its maximum, which completes the training of the discriminator. Here D(x_i) is the probability, as judged by the discriminator network D, that the i-th real sample is real (because x_i is real, the closer this value is to 1 the better for D), and D(r_i) is the probability, as judged by D, that the i-th vector generated by G is real.
The generator G is trained as follows. First, m data samples {x_1, x_2, …, x_m} are sampled from the database vectors in the database. Next, m noise samples {z_1, z_2, …, z_m} are sampled and put into the generator G to generate m vectors {r_1, r_2, …, r_m}, where r_i = G(z_i). Training then updates the parameter θ_g by gradient descent so that the distance between the distribution of the m vectors and that of the m data samples is at its minimum, which completes the training of the generator G. D(x_i) and D(r_i) have the same meaning as in the training of the discriminator.
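The expression for the "distance" that the discriminator maximizes and the generator minimizes does not survive in the text above. As a hedged reconstruction, and assuming the embodiment follows the standard GAN minibatch objective (an assumption, though it is consistent with the roles given to D(x_i) and D(r_i)), the quantity can be written as:

```latex
\tilde{V}(\theta_d, \theta_g)
  = \frac{1}{m}\sum_{i=1}^{m} \log D(x_i)
  + \frac{1}{m}\sum_{i=1}^{m} \log\bigl(1 - D(r_i)\bigr),
  \qquad r_i = G(z_i)

% Discriminator step: adjust \theta_d to increase \tilde{V}.
% Generator step: adjust \theta_g to decrease \tilde{V}; only the second
% sum depends on \theta_g, since the first does not involve G.
```

Under this reading, D is rewarded when D(x_i) is close to 1 and D(r_i) is close to 0, while G is rewarded when D(r_i) is pushed toward 1.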
In the embodiment of the invention, a query vector is input to the vector training model, and for a single input query vector the model can output a plurality of results, namely the approximate result vectors.
The invention is used to find desired data among the database vectors of a database according to an input query vector. The desired data all reside in the database, which holds a massive amount of data, so a piece of data can be input in order to retrieve the data similar to it; similar vectors must therefore be indexed from a database containing a large number of database vectors. The vector data in the embodiment of the invention may be vectors converted from data such as text or pictures. The database vectors may be a plurality of rows of 128-dimensional (128-column) vector data, with the rows numbered by ids. In FIG. 2, for example, each row represents a database vector and the id column holds the id of each database vector; the numbers shown are illustrative only and are not actual values of database vectors. A query vector may be a single 128-dimensional (128-column) vector, as in FIG. 3, where the numbers are likewise illustrative only. The database vectors are then sorted by dimension, in descending or in ascending order. For example, one table is formed by sorting on the values in the first dimension, another table is formed by sorting on the values in the second dimension, and so on, giving 128 tables. FIG. 4 shows a sorted database vector table.
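The per-dimension sorting can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function name `build_sorted_tables` is hypothetical, the toy vectors have 3 dimensions instead of 128, and each table stores only `(value, id)` pairs instead of whole rows, which is an assumption made to keep the example small.

```python
def build_sorted_tables(db):
    """db maps id -> vector. Returns one table per dimension; each table is a
    list of (value_in_that_dimension, id) pairs in ascending order."""
    dims = len(next(iter(db.values())))
    return [sorted((vec[d], vid) for vid, vec in db.items()) for d in range(dims)]

# Toy database: three 3-dimensional vectors keyed by id.
db = {1: (0.9, 0.1, 0.5), 2: (0.2, 0.8, 0.4), 3: (0.5, 0.3, 0.7)}
tables = build_sorted_tables(db)

# The same ids appear in every table, but in a different order per dimension.
order_dim0 = [vid for _, vid in tables[0]]
order_dim1 = [vid for _, vid in tables[1]]
```

For 128-dimensional data the same call would yield 128 tables, one per dimension, matching the description above.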
In the embodiment of the invention, 20 database vectors located around each approximate result vector are selected from the sorted database vectors as the first vector data. Since the vector training model has output 10 approximate result vectors, the database vectors around each of them are selected in turn. The sorted database vectors are arranged in rows, so the database vectors around an approximate result vector are the 10 128-dimensional database vectors above it and the 10 128-dimensional database vectors below it, and the embodiment has 128 database vector tables like the one in FIG. 4. In theory, therefore, one approximate result vector yields 1 × 128 × 20 = 2560 pieces of data, and the 10 approximate result vectors yield 2560 × 10 = 25600 pieces of data in total. Some data may be obtained more than once, so the number of distinct first vector data is less than or equal to 25600 and deduplication is required. Because each piece of vector data corresponds to one id, it is sufficient to check whether ids repeat, remove the repeated ids, and keep a single copy of each. The specific deduplication method is prior art and is not described in detail here.
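The neighbour selection and id deduplication can be sketched as below. Several details are assumptions, since the text does not specify them: the position of an approximate result vector in each table is taken to be its insertion point in that dimension (found with `bisect_left`), tables hold `(value, id)` pairs, and the helper names are hypothetical. The toy run uses 2 dimensions and 2 neighbours on each side instead of 128 and 10.

```python
from bisect import bisect_left

def neighbours_in_table(table, value, above=10, below=10):
    """table: ascending list of (value, id). Returns the ids of the rows just
    before and just after the position where `value` would be inserted."""
    pos = bisect_left(table, (value,))
    lo, hi = max(0, pos - above), min(len(table), pos + below)
    return [vid for _, vid in table[lo:hi]]

def collect_candidate_ids(tables, approx_vectors, above=10, below=10):
    """Gather neighbour ids over every dimension table and every approximate
    result vector, then remove repeated ids while keeping first-seen order."""
    first = [vid
             for approx in approx_vectors
             for d, table in enumerate(tables)
             for vid in neighbours_in_table(table, approx[d], above, below)]
    return first, list(dict.fromkeys(first))

# Upper bound from the embodiment: 128 tables x 20 neighbours x 10 approximate vectors.
upper_bound = 128 * 20 * 10

# Toy run: ten 2-dimensional vectors, one approximate result vector.
db = {i: (i / 10, 1 - i / 10) for i in range(10)}
tables = [sorted((vec[d], vid) for vid, vec in db.items()) for d in range(2)]
first_ids, second_ids = collect_candidate_ids(tables, [(0.42, 0.58)], above=2, below=2)
```

In the toy run both tables return the same four ids, so eight collected ids shrink to four after deduplication; this is why the count of distinct first vector data can fall below the theoretical bound.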
In the embodiment of the invention, the Euclidean distance between each piece of second vector data and the query vector is calculated, and a plurality of vector data with the smallest distances are selected from the second vector data. Other embodiments of the invention may use other distance calculation methods.
In the embodiment of the invention, the 10 vector data with the smallest distances are selected from the second vector data. Because there are multiple pieces of second vector data, the distance from each of them to the query vector must be calculated, which yields multiple distances; these are arranged from small to large and the first several pieces of data are selected. Other embodiments of the present invention may select a different number of vector data, such as 100, and the specific number can be set according to the required accuracy.
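The final ranking step can be sketched with the standard library alone. The function name `top_k_by_euclidean` and the toy data are illustrative assumptions; `math.dist` computes the Euclidean distance and `heapq.nsmallest` keeps the k smallest entries without sorting every candidate.

```python
import heapq
import math

def top_k_by_euclidean(query, db, candidate_ids, k=10):
    """Return the k (distance, id) pairs with the smallest Euclidean distance
    between the query and the candidate vectors."""
    scored = ((math.dist(query, db[vid]), vid) for vid in candidate_ids)
    return heapq.nsmallest(k, scored)

# Toy data: four 2-dimensional candidates, keep the closest two.
db = {1: (0.0, 0.0), 2: (3.0, 4.0), 3: (1.0, 1.0), 4: (6.0, 8.0)}
nearest = top_k_by_euclidean((0.0, 0.0), db, [1, 2, 3, 4], k=2)
nearest_ids = [vid for _, vid in nearest]
```

Changing `k` to 100 corresponds to the alternative mentioned in the text; swapping `math.dist` for another metric would correspond to the other distance calculation methods.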
The indexing method of the embodiment of the invention was applied in practice. When querying similar results for 10 query vectors in a database of 1000000 database vectors, the latency of a single query vector is 4.673 ms and the recall rate is 85.96%. Under the same conditions, the prior-art Flat method has a single-query latency of 144.146 ms and a recall rate of 100%; the IVFFlat method has a single-query latency of 80.821 ms and a recall rate of 68.73%; and the IVFPQ method has a single-query latency of 7.436 ms and a recall rate of 55.68%. The method of the embodiment therefore has a lower single-query latency than all three prior-art methods, and a higher recall rate than IVFFlat and IVFPQ, so it improves the efficiency of indexing vector data.
In summary, the method for indexing vector data provided in the embodiment of the present invention comprises: establishing a vector training model; inputting a query vector into the vector training model, which outputs a plurality of approximate result vectors; sorting the database vectors in a database by the values in each dimension; selecting, from the sorted database vectors, a plurality of database vectors located around each approximate result vector as first vector data; extracting the ids of all first vector data, removing repeated ids, and extracting the database vectors corresponding to the remaining ids from the database as second vector data; and calculating the distance between the second vector data and the query vector, and selecting from the second vector data a plurality of vector data with the smallest distances. The indexing method optimizes the query process, improves the efficiency of the indexing algorithm, and also improves the accuracy of the indexing result.
The foregoing is merely a preferred embodiment of the present invention and is not intended to limit the present invention in any way. Any equivalent substitution or modification that a person skilled in the art makes to the technical solution and technical content disclosed herein, without departing from the scope of the technical solution of the invention, remains within the protection scope of the invention.
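The text does not say how the recall rate was measured. A common definition, assumed here, compares the ids returned by an approximate method with the ids returned by exhaustive (Flat) search for the same query. The sketch below shows that definition and also derives latency ratios from the figures reported above; the function name and the example id lists are hypothetical.

```python
def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-k ids that the approximate search returned."""
    exact = set(exact_ids)
    return len(exact & set(approx_ids)) / len(exact)

# Hypothetical id lists: three of the four exact neighbours were found.
example_recall = recall_at_k([1, 2, 3, 9], [1, 2, 3, 4])

# Single-query latencies in milliseconds, as reported in the text.
latency_ms = {"Flat": 144.146, "IVFFlat": 80.821, "IVFPQ": 7.436}
proposed_ms = 4.673
speedups = {name: ms / proposed_ms for name, ms in latency_ms.items()}
```

The `speedups` values are plain ratios of the reported latencies and say nothing about hardware or parameter settings, which the text does not give.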

Claims (12)

1. A method of indexing vector data, comprising:
establishing a vector training model;
inputting a query vector to the vector training model, wherein the vector training model outputs a plurality of approximate result vectors;
sorting the database vectors in a database by the values in each dimension;
selecting, from the sorted database vectors, a plurality of database vectors located around each approximate result vector as first vector data;
extracting the ids of all the first vector data, removing repeated ids, and extracting the database vectors corresponding to the remaining ids from the database as second vector data;
and calculating the distance between the second vector data and the query vector, and selecting from the second vector data a plurality of vector data with the smallest distances.
2. The method of indexing vector data according to claim 1, wherein the method of building a vector training model comprises:
forming a generator;
forming a discriminator;
extracting a plurality of data samples from the database vectors of a database, and training the discriminator using the data samples and noise samples;
extracting a plurality of data samples from the database vectors of a database, and training the generator using the data samples and noise samples.
3. The method of indexing vector data according to claim 2, wherein the method of training the discriminator comprises:
sampling m data samples from the database vectors;
sampling m noise samples, and putting the m noise samples into the generator to generate m vectors;
and obtaining the distance between the distribution of the m vectors and that of the m data samples, and completing the training of the discriminator when the distance is at its maximum.
4. The method of indexing vector data according to claim 2, wherein the method of training the generator comprises:
sampling m data samples from the database vectors;
sampling m noise samples, and putting the m noise samples into the generator to generate m vectors;
and obtaining the distance between the distribution of the m vectors and that of the m data samples, and completing the training of the generator when the distance is at its minimum.
5. The method of indexing vector data according to claim 2, wherein the generator is trained once using a plurality of noise samples.
6. The method of indexing vector data according to claim 2, wherein the discriminator is trained multiple times using database vector samples.
7. The method of indexing vector data according to claim 1, wherein a query vector is input to the vector training model, and the vector training model outputs 10 approximate result vectors.
8. The method of indexing vector data according to claim 1, wherein the database vectors are sorted by dimension in descending order or in ascending order.
9. The method of indexing vector data according to claim 1, wherein the method of sorting the database vectors in the database by dimension comprises: first forming one table by sorting on the values in the first dimension, and then forming another table by sorting on the values in the second dimension.
10. The method of indexing vector data according to claim 9, wherein selecting, from the sorted database vectors, a plurality of database vectors located around each approximate result vector as the first vector data comprises: selecting, from the sorted database vectors, the 10 128-dimensional database vectors located above each approximate result vector and the 10 128-dimensional database vectors located below it.
11. The method of indexing vector data according to claim 1, wherein 20 database vectors respectively located around each of the approximate result vectors are selected from the sorted database vectors as the first vector data.
12. The method of indexing vector data according to claim 1, wherein the Euclidean distance between the second vector data and the query vector is calculated, and a plurality of vector data with the smallest distances are selected from the second vector data.
CN202310729325.2A 2023-06-20 2023-06-20 Vector data indexing method Active CN116467494B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310729325.2A CN116467494B (en) 2023-06-20 2023-06-20 Vector data indexing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310729325.2A CN116467494B (en) 2023-06-20 2023-06-20 Vector data indexing method

Publications (2)

Publication Number Publication Date
CN116467494A true CN116467494A (en) 2023-07-21
CN116467494B CN116467494B (en) 2023-08-29

Family

ID=87185185

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310729325.2A Active CN116467494B (en) 2023-06-20 2023-06-20 Vector data indexing method

Country Status (1)

Country Link
CN (1) CN116467494B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2000062251A1 (en) * 1999-04-09 2000-10-19 Merck & Co., Inc. Chemical structure similarity ranking system and computer-implemented method for same
CN103336795A (en) * 2013-06-09 2013-10-02 华中科技大学 Video indexing method based on multiple features
CN109960737A (en) * 2019-03-15 2019-07-02 西安电子科技大学 Remote Sensing Images search method of the semi-supervised depth confrontation from coding Hash study
CN110941754A (en) * 2018-09-21 2020-03-31 微软技术许可有限责任公司 Vector nearest neighbor search strategy based on reinforcement learning generation
CN113377973A (en) * 2021-06-10 2021-09-10 电子科技大学 Article recommendation method based on countermeasures hash
CN113761311A (en) * 2021-01-28 2021-12-07 北京沃东天骏信息技术有限公司 Information retrieval method, device, server and readable storage medium
CN114691940A (en) * 2022-04-18 2022-07-01 上海徐毓智能科技有限公司 Index construction method and device, vector search method and retrieval system
CN114860868A (en) * 2022-03-08 2022-08-05 中国海洋大学 Semantic similarity vector re-sparse coding indexing and retrieval method
CN115630141A (en) * 2022-11-11 2023-01-20 杭州电子科技大学 Scientific and technological expert retrieval method based on community query and high-dimensional vector retrieval
CN115934724A (en) * 2022-12-19 2023-04-07 北京百度网讯科技有限公司 Method for constructing database index, retrieval method, device, equipment and medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Y WANG: "a sematic indexing structure for image retrieval", ARXIV *
毛轶事: "一种基于向量空间模型的信息检索算法研究", 通讯世界 *
胡不归: "图像检索:向量索引", Retrieved from the Internet <URL:https://zhuanlan.zhihu.com/p/627360179?utm_id=0> *

Also Published As

Publication number Publication date
CN116467494B (en) 2023-08-29

Similar Documents

Publication Publication Date Title
EP3752930B1 (en) Random draw forest index structure for searching large scale unstructured data
Chen K-nearest neighbor algorithm optimization in text categorization
CN112417381B (en) Method and device for rapidly positioning infringement image applied to image copyright protection
CN102693299A (en) System and method for parallel video copy detection
CN102129451A (en) Method for clustering data in image retrieval system
CN112884005B (en) Image retrieval method and device based on SPTAG and convolutional neural network
CN107291895B (en) Quick hierarchical document query method
CN112527948B (en) Sentence-level index-based real-time data deduplication method and system
CN108763295A (en) A kind of video approximate copy searching algorithm based on deep learning
CN111104398A (en) Detection method and elimination method for approximate repeated record of intelligent ship
CN113886587A (en) Data classification method based on deep learning and map building method
CN114297415A (en) Multi-source heterogeneous data storage method and retrieval method for full media data space
CN104317946A (en) Multi-key image-based image content retrieval method
CN103020321A (en) Neighbor searching method and neighbor searching system
CN111125396A (en) Image retrieval method of single-model multi-branch structure
CN116467494B (en) Vector data indexing method
Sahbudin et al. MongoDB clustering using K-means for real-time song recognition
JP4215386B2 (en) Similar object search method and similar object search device
CN111414863B (en) Enhanced integrated remote sensing image classification method
CN104978395A (en) Vision dictionary construction and application method and apparatus
CN114610941A (en) Cultural relic image retrieval system based on comparison learning
CN114880690A (en) Source data time sequence refinement method based on edge calculation
CN110704651A (en) Rapid fingerprint database retrieval method based on core detail node support system
McConville et al. Accelerating large scale centroid-based clustering with locality sensitive hashing
CN117708262B (en) Method and device for carrying out data association on multidimensional and multi-source data and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant