CN110019875A - Method and device for generating an index file - Google Patents
- Publication number
- CN110019875A (CN201711470608.0A)
- Authority
- CN
- China
- Prior art keywords
- index file
- training data
- class
- feature vector
- cpu
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/13—File access structures, e.g. distributed indices
- G06F16/134—Distributed indices
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/51—Indexing; Data structures therefor; Storage structures
Abstract
The present disclosure relates to a method and device for generating an index file. The method comprises: extracting the feature vector of each item of training data in a training data set; obtaining the class centers of the training data set according to the feature vector of each item of training data; generating an empty index file according to the class centers of the training data set; sending the empty index file to each node of a cluster; obtaining the CPU index file that each node returns based on the empty index file; and converting each CPU index file into a GPU index file. By building the index in this way, the disclosure allows the GPU to be used to assist and accelerate image retrieval.
Description
Technical field
The present disclosure relates to the field of computer technology, and in particular to a method and device for generating an index file.
Background
In recent years, with the rapid development of multimedia technology and computer networks, the number of digital images worldwide has been growing at an astonishing rate. For the information contained in this mass of images to be accessed and used efficiently, a technology that can find images quickly and accurately is needed, namely image retrieval. With the emergence of large-scale digital image libraries, traditional text-based image retrieval, which depends on manual annotation, can no longer meet users' growing demands, and content-based image retrieval (CBIR) emerged in response. The common practice in CBIR is to first extract image features and build a feature database, so that each image in the library is mapped to a point in feature space. Because image features are typically high-dimensional vectors, content-based similarity search over images becomes nearest-neighbor search over high-dimensional feature vectors. For a large-scale image database, the feature database is also large. Traditional sequential scanning therefore cannot meet users' retrieval requirements, and a suitable indexing mechanism is urgently needed to assist and accelerate image retrieval.
Summary of the invention
In view of this, the present disclosure proposes a method and device for generating an index file.
According to one aspect of the present disclosure, a method for generating an index file is provided, comprising:
extracting the feature vector of each item of training data in a training data set;
obtaining the class centers of the training data set according to the feature vector of each item of training data;
generating an empty index file according to the class centers of the training data set;
sending the empty index file to each node of a cluster;
obtaining the CPU index file that each node returns based on the empty index file; and
converting each CPU index file into a GPU index file.
In one possible implementation, after each CPU index file is converted into a GPU index file, the method further comprises:
merging all CPU index files and all GPU index files into a master index file.
In one possible implementation, converting each CPU index file into a GPU index file comprises:
converting each CPU index file into a GPU index structure using the Faiss tool, to obtain the GPU index file corresponding to each CPU index file.
In one possible implementation, obtaining the class centers of the training data set according to the feature vector of each item of training data comprises:
performing product quantization on the feature vector of each item of training data to obtain the class centers of the training data set.
In one possible implementation, performing product quantization on the feature vector of each item of training data to obtain the class centers of the training data set comprises:
dividing the components of the feature vector of each item of training data into M groups, where M is an integer greater than 1;
performing K-means clustering on each group of components to obtain the $K^{1/M}$ class centers corresponding to that group;
determining a class-center set for each group from the $K^{1/M}$ class centers corresponding to that group; and
determining the Cartesian product of the M class-center sets as the K class centers of the training data set.
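The relationship between the per-group centers and the K class centers can be written out as a short counting argument. This is a sketch that assumes every group contributes the same number of centers, $K^{1/M}$; the set symbols $\mathcal{C}_m$ and $\mathcal{C}$ are notation introduced here for illustration and do not appear in the original text.

```latex
\mathcal{C} \;=\; \mathcal{C}_1 \times \mathcal{C}_2 \times \cdots \times \mathcal{C}_M,
\qquad
|\mathcal{C}| \;=\; \prod_{m=1}^{M} |\mathcal{C}_m|
\;=\; \left(K^{1/M}\right)^{M} \;=\; K .
```

With the values K = 256 and M = 8 used as an example later in the description, each group would hold 2 centers, and $2^8 = 256$.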
According to another aspect of the present disclosure, a device for generating an index file is provided, comprising:
an extraction module, for extracting the feature vector of each item of training data in a training data set;
a determining module, for obtaining the class centers of the training data set according to the feature vector of each item of training data;
a generation module, for generating an empty index file according to the class centers of the training data set;
a sending module, for sending the empty index file to each node of a cluster;
an obtaining module, for obtaining the CPU index file that each node returns based on the empty index file; and
a conversion module, for converting each CPU index file into a GPU index file.
In one possible implementation, the device further comprises:
a merging module, for merging all CPU index files and all GPU index files into a master index file.
In one possible implementation, the conversion module is used for:
converting each CPU index file into a GPU index structure using the Faiss tool, to obtain the GPU index file corresponding to each CPU index file.
In one possible implementation, the determining module is used for:
performing product quantization on the feature vector of each item of training data to obtain the class centers of the training data set.
In one possible implementation, the determining module comprises:
a grouping submodule, for dividing the components of the feature vector of each item of training data into M groups, where M is an integer greater than 1;
a clustering submodule, for performing K-means clustering on each group of components to obtain the $K^{1/M}$ class centers corresponding to that group;
a first determining submodule, for determining a class-center set for each group from the $K^{1/M}$ class centers corresponding to that group; and
a second determining submodule, for determining the Cartesian product of the M class-center sets as the K class centers of the training data set.
According to another aspect of the present disclosure, a device for generating an index file is provided, comprising: a processor; and a memory for storing processor-executable instructions; wherein the processor is configured to execute the above method.
According to another aspect of the present disclosure, a non-volatile computer-readable storage medium is provided, on which computer program instructions are stored, wherein the computer program instructions implement the above method when executed by a processor.
The method and device for generating an index file according to the aspects of the present disclosure extract the feature vector of each item of training data in a training data set, obtain the class centers of the training data set according to those feature vectors, generate an empty index file according to the class centers, send the empty index file to each node of a cluster, obtain the CPU index file that each node returns based on the empty index file, and convert each CPU index file into a GPU index file, so that the GPU can be used to assist and accelerate image retrieval.
Other features and aspects of the present disclosure will become clear from the following detailed description of exemplary embodiments with reference to the accompanying drawings.
Brief description of the drawings
The accompanying drawings, which are included in and constitute a part of the specification, illustrate exemplary embodiments, features and aspects of the present disclosure together with the specification, and serve to explain the principles of the disclosure.
Fig. 1 shows a flowchart of a method for generating an index file according to an embodiment of the present disclosure.
Fig. 2 shows an exemplary flowchart of a method for generating an index file according to an embodiment of the present disclosure.
Fig. 3 shows an exemplary flowchart of performing product quantization on the feature vector of each item of training data to obtain the class centers of the training data set, in a method for generating an index file according to an embodiment of the present disclosure.
Fig. 4 shows a block diagram of a device for generating an index file according to an embodiment of the present disclosure.
Fig. 5 shows an exemplary block diagram of a device for generating an index file according to an embodiment of the present disclosure.
Fig. 6 is a block diagram of a device 1900 for generating an index file according to an exemplary embodiment.
Detailed description
Various exemplary embodiments, features and aspects of the present disclosure are described in detail below with reference to the accompanying drawings. Identical reference numerals in the drawings denote elements with identical or similar functions. Although various aspects of the embodiments are shown in the drawings, the drawings are not necessarily drawn to scale unless otherwise indicated.
The word "exemplary" as used herein means "serving as an example, embodiment or illustration". Any embodiment described herein as "exemplary" should not be construed as preferred over or superior to other embodiments.
In addition, numerous specific details are given in the following detailed description to better illustrate the present disclosure. Those skilled in the art will understand that the disclosure can be practiced without some of these details. In some instances, methods, means, elements and circuits well known to those skilled in the art are not described in detail, so as to highlight the gist of the disclosure.
Fig. 1 shows a flowchart of a method for generating an index file according to an embodiment of the present disclosure. The method can be applied in a server. As shown in Fig. 1, the method comprises steps S11 to S16.
In step S11, the feature vector of each item of training data in the training data set is extracted.
For example, each item of training data in the training data set can be an image.
In one possible implementation, for each item of training data in the training data set, a local feature vector and a deep feature vector of the item can be extracted and combined into one long vector, which serves as the feature vector of the item. Here, a deep feature vector refers to a feature vector extracted by a deep learning network.
In another possible implementation, for each item of training data in the training data set, the local feature vector of the item can be extracted and used as the feature vector of the item.
In another possible implementation, for each item of training data in the training data set, the deep feature vector of the item can be extracted and used as the feature vector of the item.
It should be noted that although the above implementations describe ways of extracting the feature vector of training data, those skilled in the art will understand that the disclosure is not limited to them. Those skilled in the art can flexibly choose how to extract the feature vector of training data according to the needs of the actual application scenario and/or personal preference.
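The first implementation above, combining a local feature vector and a deep feature vector into one long vector, can be sketched as plain concatenation. This is only an illustration: the function name `combine_features` and the toy values are hypothetical, and the text does not say how the two vectors are ordered or normalized.

```python
from typing import List, Sequence


def combine_features(local: Sequence[float], deep: Sequence[float]) -> List[float]:
    """Concatenate a local feature vector and a deep feature vector into one long vector."""
    return list(local) + list(deep)


# Hypothetical toy vectors; real descriptors would have many more components.
local_vec = [0.1, 0.4]
deep_vec = [0.7, 0.2, 0.9]
feature = combine_features(local_vec, deep_vec)
print(len(feature))  # 5
```

The dimension D of the combined vector is the sum of the two input dimensions, which matters later when the D components are divided into M groups.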
In step S12, the class centers of the training data set are obtained according to the feature vector of each item of training data.
In one possible implementation, obtaining the class centers of the training data set according to the feature vector of each item of training data may include: performing product quantization (PQ) on the feature vector of each item of training data to obtain the class centers of the training data set.
In this implementation, because the class centers of the training data set are obtained by product quantization, the generated empty index file is small, which saves space. In addition, product quantization splits the high-dimensional features into groups, which makes the approach suitable for high-dimensional feature retrieval.
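A rough count shows why storing per-group centers keeps the empty index small. The sketch below assumes each of the M groups stores the same number of sub-centers (`k_sub`, a name introduced here) and that D is divisible by M; M = 8 and K = 256 follow the example given later in the description, while D = 128 is a hypothetical dimension.

```python
from typing import Tuple


def center_storage(D: int, M: int, k_sub: int) -> Tuple[int, int, int]:
    """Return (K, values needed for K explicit centers, values needed for M sub-center sets)."""
    K = k_sub ** M                       # size of the Cartesian product of the M sets
    explicit = K * D                     # K centers with D components each
    factored = M * k_sub * (D // M)      # M sets of k_sub centers with D/M components each
    return K, explicit, factored


# M = 8 and K = 256 follow the document's example; D = 128 is a hypothetical dimension.
K, explicit, factored = center_storage(D=128, M=8, k_sub=2)
print(K, explicit, factored)  # 256 32768 256
```

Under these assumptions the factored form stores 256 values instead of 32768, while still representing all 256 class centers implicitly.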
In step S13, an empty index file is generated according to the class centers of the training data set.
In one possible implementation, the empty index file can be trained with Faiss, an open-source high-dimensional indexing tool.
In step S14, the empty index file is sent to each node of the cluster.
In this embodiment, sending the empty index file to each node of the cluster deploys the empty index file on every node.
In step S15, the CPU (Central Processing Unit) index file that each node returns based on the empty index file is obtained.
In step S16, each CPU index file is converted into a GPU (Graphics Processing Unit) index file.
In one possible implementation, converting each CPU index file into a GPU index file may include: converting each CPU index file into a GPU index structure using the Faiss tool, to obtain the GPU index file corresponding to each CPU index file. In this implementation, each CPU index file can be copied to the GPU and converted into a GPU index file by the Faiss tool.
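A minimal sketch of this conversion using the Faiss Python bindings is shown below. It assumes a GPU build of Faiss is installed and that the CPU index file was written with Faiss; the function name `cpu_index_file_to_gpu`, the path argument and the default device number are illustrative and not taken from the original text.

```python
def cpu_index_file_to_gpu(cpu_index_path: str, gpu_id: int = 0):
    """Load a CPU index file written by Faiss and return a GPU copy of it."""
    # Imported lazily so this module can still be loaded on machines without Faiss.
    import faiss

    cpu_index = faiss.read_index(cpu_index_path)    # deserialize the CPU index
    resources = faiss.StandardGpuResources()        # manages GPU memory and streams
    gpu_index = faiss.index_cpu_to_gpu(resources, gpu_id, cpu_index)
    # Return the resources object too, so the caller can keep it alive
    # for as long as the GPU index is in use.
    return resources, gpu_index
```

The returned GPU index lives in device memory and can be searched directly; how the original method serializes it as a "GPU index file" is not specified, so that step is left out here.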
This embodiment extracts the feature vector of each item of training data in the training data set, obtains the class centers of the training data set according to those feature vectors, generates an empty index file according to the class centers, sends the empty index file to each node of the cluster, obtains the CPU index file that each node returns based on the empty index file, and converts each CPU index file into a GPU index file, so that the GPU can be used to assist and accelerate image retrieval.
In one possible implementation, each node of the cluster, after obtaining the empty index file, can extract the feature vector of the data to be processed, divide the components of this feature vector into M groups, determine the class label of each group of components, determine from those class labels the class center in the empty index file that corresponds to each group, and, based on the class center corresponding to each group, add each group of components to the empty index file to obtain a CPU index file. Here, M is an integer greater than 1.
As an example of this implementation, the data to be processed can be images. For example, the data to be processed can be images extracted from a video.
As an example of this implementation, for each item of data to be processed, a local feature vector and a deep feature vector of the item can be extracted and combined into one long vector, which serves as the feature vector of the item. Here, a deep feature vector refers to a feature vector extracted by a deep learning network.
As another example of this implementation, for each item of data to be processed, the local feature vector of the item can be extracted and used as the feature vector of the item.
As another example of this implementation, for each item of data to be processed, the deep feature vector of the item can be extracted and used as the feature vector of the item.
It should be noted that although the above examples describe ways of extracting the feature vector of the data to be processed, those skilled in the art will understand that the disclosure is not limited to them. Those skilled in the art can flexibly choose how to extract the feature vector of the data to be processed according to the needs of the actual application scenario and/or personal preference.
As an example of this implementation, if the dimension of the feature vector of the data to be processed is D, the D components of the feature vector can be divided into M groups, where D is an integer greater than or equal to M.
As an example of this implementation, K-means clustering can be performed on each group of components to obtain the class label of each group. Here, K is a positive integer.
As an example of this implementation, for each group of components, the Euclidean distance between the class label of the group and each class center in the empty index file can be taken as the residual between the class label and that class center, and the class center with the smallest residual with respect to the class label of the group can be determined as the class center corresponding to the group in the empty index file.
As an example of this implementation, for any group of components, the class label of the group, together with the residual between that class label and the class center corresponding to the group, can be added to the empty index file based on the class center corresponding to the group, to obtain a CPU index file.
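The node-side step can be sketched in a simplified form: each group of components is assigned to the class center with the smallest Euclidean distance, and the resulting code is appended to the index. This is one reading of the passage above, not the exact procedure: it skips the intermediate class label and does not store residuals, the dictionary layout of the "empty index" is hypothetical, and D is assumed to be divisible by M.

```python
import math
from typing import List, Sequence


def split_groups(vec: Sequence[float], M: int) -> List[List[float]]:
    """Split a D-dimensional vector into M contiguous groups of D/M components."""
    size = len(vec) // M
    return [list(vec[i * size:(i + 1) * size]) for i in range(M)]


def nearest_center(sub: Sequence[float], centers: Sequence[Sequence[float]]) -> int:
    """Return the index of the center with the smallest Euclidean distance to sub."""
    return min(range(len(centers)), key=lambda j: math.dist(sub, centers[j]))


def add_to_index(index: dict, item_id: int, vec: Sequence[float]) -> None:
    """Encode vec with the per-group centers stored in the index and append the code."""
    groups = split_groups(vec, len(index["centers"]))
    code = [nearest_center(g, c) for g, c in zip(groups, index["centers"])]
    index["codes"].append((item_id, code))


# Hypothetical "empty index": M = 2 groups, 2 centers per group, no codes yet.
empty_index = {
    "centers": [
        [[0.0, 0.0], [1.0, 1.0]],   # centers for group 0
        [[0.0, 1.0], [1.0, 0.0]],   # centers for group 1
    ],
    "codes": [],
}
add_to_index(empty_index, 7, [0.9, 0.8, 0.1, 0.9])
print(empty_index["codes"])  # [(7, [1, 0])]
```

Because every node encodes against the same centers from the same empty index file, the codes produced on different nodes remain comparable when the per-node index files are later combined.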
Fig. 2 shows an exemplary flowchart of a method for generating an index file according to an embodiment of the present disclosure. As shown in Fig. 2, the method may include steps S11 to S17.
In step S11, the feature vector of each item of training data in the training data set is extracted.
In step S12, the class centers of the training data set are obtained according to the feature vector of each item of training data.
In step S13, an empty index file is generated according to the class centers of the training data set.
In step S14, the empty index file is sent to each node of the cluster.
In step S15, the CPU index file that each node returns based on the empty index file is obtained.
In step S16, each CPU index file is converted into a GPU index file.
In step S17, all CPU index files and all GPU index files are merged into a master index file.
In this embodiment, the CPU index files and the GPU index files can be merged to obtain a master index file, so that the master index file can be provided to each query node for use.
Fig. 3 shows an exemplary flowchart of performing product quantization on the feature vector of each item of training data to obtain the class centers of the training data set, in a method for generating an index file according to an embodiment of the present disclosure. As shown in Fig. 3, performing product quantization on the feature vector of each item of training data to obtain the class centers of the training data set may include steps S121 to S124.
In step S121, the components of the feature vector of each item of training data are divided into M groups, where M is an integer greater than 1.
For example, if the dimension of the feature vector of the training data is D, the D components of each feature vector can be divided into M groups, where D is an integer greater than or equal to M.
In step S122, K-means clustering is performed on each group of components to obtain the $K^{1/M}$ class centers corresponding to that group. Here, K is a positive integer.
In this embodiment, K-means clustering is performed on each group of components, M times in total, to obtain the class centers corresponding to each of the M groups. For example, K = 256 and M = 8.
In step S123, a class-center set is determined for each group from the $K^{1/M}$ class centers corresponding to that group.
In this embodiment, determining a class-center set from the $K^{1/M}$ class centers of each group yields M class-center sets.
In step S124, the Cartesian product of the M class-center sets is determined as the K class centers of the training data set.
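Steps S121 to S124 can be sketched in pure Python as follows. This is a toy illustration under stated assumptions, not the patented implementation: the plain Lloyd-style `kmeans` with random initialization, the helper names, and the requirement that D be divisible by M are all choices made here, and a production system would more likely rely on a library such as Faiss.

```python
import itertools
import math
import random
from typing import List, Sequence

Vector = List[float]


def kmeans(points: Sequence[Vector], k: int, iters: int = 20, seed: int = 0) -> List[Vector]:
    """Plain Lloyd-style k-means; returns k centers."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(list(points), k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: math.dist(p, centers[c]))
            buckets[j].append(p)
        for j, bucket in enumerate(buckets):
            if bucket:  # keep the old center if a cluster becomes empty
                centers[j] = [sum(col) / len(bucket) for col in zip(*bucket)]
    return centers


def train_pq(vectors: Sequence[Vector], M: int, k_sub: int) -> List[List[Vector]]:
    """Steps S121 to S123: split into M groups and cluster each group separately."""
    size = len(vectors[0]) // M
    return [
        kmeans([v[m * size:(m + 1) * size] for v in vectors], k_sub)
        for m in range(M)
    ]


def class_centers(center_sets: List[List[Vector]]) -> List[Vector]:
    """Step S124: Cartesian product of the M center sets, each combination concatenated."""
    return [sum(combo, []) for combo in itertools.product(*center_sets)]


# Toy data: 4-dimensional vectors, M = 2 groups, 2 centers per group.
data = [[0.0, 0.0, 1.0, 1.0], [0.1, 0.0, 0.9, 1.0],
        [1.0, 1.0, 0.0, 0.0], [0.9, 1.0, 0.0, 0.1]]
sets = train_pq(data, M=2, k_sub=2)
print(len(class_centers(sets)))  # 4 = 2 ** 2
```

In practice the Cartesian product would rarely be materialized; storing only the M center sets is enough to represent every class center, which is what keeps the empty index file compact.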
Fig. 4 shows a block diagram of a device for generating an index file according to an embodiment of the present disclosure. As shown in Fig. 4, the device includes: an extraction module 41, for extracting the feature vector of each item of training data in a training data set; a determining module 42, for obtaining the class centers of the training data set according to the feature vector of each item of training data; a generation module 43, for generating an empty index file according to the class centers of the training data set; a sending module 44, for sending the empty index file to each node of a cluster; an obtaining module 45, for obtaining the CPU index file that each node returns based on the empty index file; and a conversion module 46, for converting each CPU index file into a GPU index file.
Fig. 5 shows an exemplary block diagram of a device for generating an index file according to an embodiment of the present disclosure. As shown in Fig. 5:
In one possible implementation, the device further includes: a merging module 47, for merging all CPU index files and all GPU index files into a master index file.
In one possible implementation, the conversion module 46 is used for: converting each CPU index file into a GPU index structure using the Faiss tool, to obtain the GPU index file corresponding to each CPU index file.
In one possible implementation, the determining module 42 is used for: performing product quantization on the feature vector of each item of training data to obtain the class centers of the training data set.
In one possible implementation, the determining module 42 includes: a grouping submodule 421, for dividing the components of the feature vector of each item of training data into M groups, where M is an integer greater than 1; a clustering submodule 422, for performing K-means clustering on each group of components to obtain the $K^{1/M}$ class centers corresponding to that group; a first determining submodule 423, for determining a class-center set for each group from the $K^{1/M}$ class centers corresponding to that group; and a second determining submodule 424, for determining the Cartesian product of the M class-center sets as the K class centers of the training data set.
This embodiment extracts the feature vector of each item of training data in the training data set, obtains the class centers of the training data set according to those feature vectors, generates an empty index file according to the class centers, sends the empty index file to each node of the cluster, obtains the CPU index file that each node returns based on the empty index file, and converts each CPU index file into a GPU index file, so that the GPU can be used to assist and accelerate image retrieval.
Fig. 6 is a block diagram of a device 1900 for generating an index file according to an exemplary embodiment. For example, the device 1900 may be provided as a server. Referring to Fig. 6, the device 1900 includes a processing component 1922, which further includes one or more processors, and memory resources represented by a memory 1932 for storing instructions executable by the processing component 1922, such as application programs. The application programs stored in the memory 1932 may include one or more modules, each corresponding to a set of instructions. In addition, the processing component 1922 is configured to execute the instructions so as to perform the above method.
The device 1900 may also include a power supply component 1926 configured to perform power management of the device 1900, a wired or wireless network interface 1950 configured to connect the device 1900 to a network, and an input/output (I/O) interface 1958. The device 1900 can operate based on an operating system stored in the memory 1932, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™ or the like.
In an exemplary embodiment, a non-volatile computer-readable storage medium is also provided, for example the memory 1932 containing computer program instructions, which can be executed by the processing component 1922 of the device 1900 to complete the above method.
The present disclosure may be a system, a method and/or a computer program product. The computer program product may include a computer-readable storage medium carrying computer-readable program instructions for causing a processor to implement various aspects of the present disclosure.
The computer-readable storage medium can be a tangible device that can hold and store instructions used by an instruction execution device. The computer-readable storage medium may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of computer-readable storage media include: a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch cards or raised structures in a groove on which instructions are stored, and any suitable combination of the foregoing. A computer-readable storage medium as used herein is not to be interpreted as a transitory signal per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (for example, light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.
The computer-readable program instructions described herein can be downloaded from a computer-readable storage medium to respective computing/processing devices, or downloaded to an external computer or external storage device via a network such as the Internet, a local area network, a wide area network and/or a wireless network. The network may include copper transmission cables, optical fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards them for storage in a computer-readable storage medium within the respective computing/processing device.
The computer program instructions for carrying out the operations of the present disclosure may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, an electronic circuit, such as a programmable logic circuit, a field programmable gate array (FPGA) or a programmable logic array (PLA), is personalized by using state information of the computer-readable program instructions, and the electronic circuit can execute the computer-readable program instructions so as to implement various aspects of the present disclosure.
Referring herein to according to the flow chart of the method, apparatus (system) of the embodiment of the present disclosure and computer program product and/
Or block diagram describes various aspects of the disclosure.It should be appreciated that flowchart and or block diagram each box and flow chart and/
Or in block diagram each box combination, can be realized by computer-readable program instructions.
These computer-readable program instructions can be supplied to general purpose computer, special purpose computer or other programmable datas
The processor of processing unit, so that a kind of machine is produced, so that these instructions are passing through computer or other programmable datas
When the processor of processing unit executes, function specified in one or more boxes in implementation flow chart and/or block diagram is produced
The device of energy/movement.These computer-readable program instructions can also be stored in a computer-readable storage medium, these refer to
It enables so that computer, programmable data processing unit and/or other equipment work in a specific way, thus, it is stored with instruction
Computer-readable medium then includes a manufacture comprising in one or more boxes in implementation flow chart and/or block diagram
The instruction of the various aspects of defined function action.
The computer-readable program instructions may also be loaded onto a computer, another programmable data processing apparatus, or another device, so that a series of operational steps is performed on the computer, other programmable data processing apparatus, or other device to produce a computer-implemented process. The instructions executed on the computer, other programmable data processing apparatus, or other device thereby implement the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
The flowcharts and block diagrams in the drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to multiple embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of instructions that contains one or more executable instructions for implementing the specified logical function. In some alternative implementations, the functions noted in the blocks may occur in an order different from that shown in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functionality involved. It should also be noted that each block of the block diagrams and/or flowcharts, and any combination of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or acts, or by a combination of dedicated hardware and computer instructions.
The embodiments of the present disclosure have been described above. The foregoing description is exemplary rather than exhaustive, and it is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, their practical application, or their technical improvement over technologies in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.
Claims (12)
1. A method for generating an index file, characterized by comprising:
extracting a feature vector of each piece of training data in a training data set;
obtaining class centers of the training data set according to the feature vector of each piece of training data;
generating an empty index file according to the class centers of the training data set;
sending the empty index file to each node of a cluster;
obtaining a CPU index file returned by each node based on the empty index file; and
converting each CPU index file into a GPU index file.
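The coordinator/node flow of claim 1 can be sketched in plain Python as follows. This is a toy model under stated assumptions, not the claimed implementation: the dictionary layout of the "index", the helper names `make_empty_index` and `node_fill_index`, and the in-process "nodes" are all hypothetical, and the final CPU-to-GPU conversion step is left out of the sketch.

```python
import copy


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def make_empty_index(class_centers):
    """Hypothetical 'empty index': trained class centers plus one empty ID list per center."""
    return {
        "centers": [tuple(c) for c in class_centers],
        "lists": [[] for _ in class_centers],
    }


def node_fill_index(empty_index, shard):
    """What one cluster node might do: assign its local vectors to the shared centers."""
    index = copy.deepcopy(empty_index)  # the shared empty index is never mutated
    for vec_id, vec in shard:
        nearest = min(
            range(len(index["centers"])),
            key=lambda c: sq_dist(vec, index["centers"][c]),
        )
        index["lists"][nearest].append(vec_id)
    return index


# Coordinator side: class centers are assumed to have been trained already.
class_centers = [(0.0, 0.0), (1.0, 1.0)]
empty_index = make_empty_index(class_centers)

# Each shard stands in for the data held by one node: (id, feature vector) pairs.
shards = [
    [(0, (0.1, 0.0)), (1, (0.9, 1.1))],
    [(2, (0.2, 0.1)), (3, (1.0, 0.8))],
]
cpu_indexes = [node_fill_index(empty_index, shard) for shard in shards]
```

Because every node starts from the same empty index, all returned indexes share one set of class centers, which is what makes later conversion and merging straightforward.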
2. The method according to claim 1, wherein after each CPU index file is converted into a GPU index file, the method further comprises:
merging all CPU index files and all GPU index files into a master index file.
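One way to picture the merge in claim 2 is to concatenate, center by center, the ID lists of per-node indexes that were built from the same class centers. The sketch below assumes a hypothetical dictionary layout (`centers` plus per-center `lists`); the claim itself does not specify the file format or how CPU and GPU index files coexist inside the master index file.

```python
def merge_indexes(indexes):
    """Merge per-node indexes that share identical class centers into one master index."""
    if not indexes:
        raise ValueError("nothing to merge")
    centers = indexes[0]["centers"]
    master = {"centers": list(centers), "lists": [[] for _ in centers]}
    for index in indexes:
        if index["centers"] != centers:
            raise ValueError("indexes were built from different class centers")
        for master_list, part in zip(master["lists"], index["lists"]):
            master_list.extend(part)
    return master


# Two hypothetical per-node indexes built from the same two class centers.
node_a = {"centers": [(0.0, 0.0), (1.0, 1.0)], "lists": [[0], [1]]}
node_b = {"centers": [(0.0, 0.0), (1.0, 1.0)], "lists": [[2], [3]]}
master_index = merge_indexes([node_a, node_b])
```

The check on `centers` reflects why the same empty index is distributed to every node: lists can only be concatenated when they refer to the same class centers.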
3. The method according to claim 1, wherein converting each CPU index file into a GPU index file comprises:
converting each CPU index file into a GPU index structure by means of the Faiss tool, to obtain the GPU index file corresponding to each CPU index file.
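With the Faiss Python bindings, a conversion of this kind might look like the sketch below. It assumes a GPU build of Faiss is installed; the function names, the `device_id` default, and the file paths are illustrative, and the claim does not say which Faiss calls are actually used. Faiss serializes CPU index objects, so the sketch converts back to a CPU index before writing to disk; how the patent's "GPU index file" is stored is not stated in the source.

```python
def cpu_index_file_to_gpu(cpu_index_path, device_id=0):
    """Load a CPU index file and return a GPU-resident copy of the index.

    The import is local so this module can be loaded on machines
    without Faiss; calling the function requires a GPU build of Faiss.
    """
    import faiss  # assumption: faiss with GPU support is installed

    cpu_index = faiss.read_index(cpu_index_path)
    resources = faiss.StandardGpuResources()  # GPU memory/stream handle
    gpu_index = faiss.index_cpu_to_gpu(resources, device_id, cpu_index)
    # Return the resources object too, so the caller can keep it alive
    # for as long as the GPU index is in use.
    return resources, gpu_index


def save_gpu_index(gpu_index, path):
    """Persist a GPU index by copying it back to a CPU index first."""
    import faiss  # assumption: faiss with GPU support is installed

    faiss.write_index(faiss.index_gpu_to_cpu(gpu_index), path)
```

A caller would typically invoke `cpu_index_file_to_gpu` once per CPU index file returned by the nodes and then run searches against the resulting GPU index objects.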
4. The method according to claim 1, wherein obtaining the class centers of the training data set according to the feature vector of each piece of training data comprises:
performing product quantization on the feature vector of each piece of training data to obtain the class centers of the training data set.
5. The method according to claim 4, wherein performing product quantization on the feature vector of each piece of training data to obtain the class centers of the training data set comprises:
dividing the components of the feature vector of each piece of training data into M groups, where M is an integer greater than 1;
performing K-means clustering on each group of components to obtain the class centers corresponding to each group of components;
determining a class center set for each group from the class centers corresponding to that group of components; and
determining the Cartesian product of the M class center sets as the K class centers of the training data set.
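The steps of claim 5 can be illustrated with a small pure-Python sketch. Several details are assumptions made only for illustration: the groups are contiguous and of equal width, every group is clustered into the same number of centers (`k_sub`), and the K-means initialization and iteration count are arbitrary. Under those assumptions, the Cartesian product of M sets of `k_sub` centers contains `k_sub ** M` class centers. The names `kmeans` and `pq_class_centers` are hypothetical.

```python
import random
from itertools import product


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's K-means over a list of equal-length tuples; returns k centers."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
            buckets[nearest].append(p)
        # Recompute each center as the mean of its bucket; keep the old one if empty.
        centers = [
            tuple(sum(col) / len(b) for col in zip(*b)) if b else centers[i]
            for i, b in enumerate(buckets)
        ]
    return centers


def pq_class_centers(vectors, m, k_sub):
    """Split each vector into m groups, cluster each group, and combine the results."""
    dim = len(vectors[0])
    if dim % m != 0:
        raise ValueError("vector dimension must be divisible by m")
    width = dim // m
    center_sets = []
    for g in range(m):
        sub_vectors = [tuple(v[g * width:(g + 1) * width]) for v in vectors]
        center_sets.append(kmeans(sub_vectors, k_sub))
    # Cartesian product: one sub-center per group, concatenated into a full-length center.
    full_centers = [sum(combo, ()) for combo in product(*center_sets)]
    return center_sets, full_centers


rng = random.Random(42)
vectors = [[rng.random() for _ in range(4)] for _ in range(200)]
center_sets, centers = pq_class_centers(vectors, m=2, k_sub=3)
```

In practice the full Cartesian product is rarely materialized, because storing only the M small center sets already determines every combined center; the sketch builds it explicitly to mirror the wording of the claim.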
6. A device for generating an index file, characterized by comprising:
an extraction module, configured to extract a feature vector of each piece of training data in a training data set;
a determining module, configured to obtain class centers of the training data set according to the feature vector of each piece of training data;
a generation module, configured to generate an empty index file according to the class centers of the training data set;
a sending module, configured to send the empty index file to each node of a cluster;
an obtaining module, configured to obtain a CPU index file returned by each node based on the empty index file; and
a conversion module, configured to convert each CPU index file into a GPU index file.
7. The device according to claim 6, wherein the device further comprises:
a merging module, configured to merge all CPU index files and all GPU index files into a master index file.
8. The device according to claim 6, wherein the conversion module is configured to:
convert each CPU index file into a GPU index structure by means of the Faiss tool, to obtain the GPU index file corresponding to each CPU index file.
9. The device according to claim 6, wherein the determining module is configured to:
perform product quantization on the feature vector of each piece of training data to obtain the class centers of the training data set.
10. The device according to claim 9, wherein the determining module comprises:
a grouping submodule, configured to divide the components of the feature vector of each piece of training data into M groups, where M is an integer greater than 1;
a clustering submodule, configured to perform K-means clustering on each group of components to obtain the class centers corresponding to each group of components;
a first determining submodule, configured to determine a class center set for each group from the class centers corresponding to that group of components; and
a second determining submodule, configured to determine the Cartesian product of the M class center sets as the K class centers of the training data set.
11. A device for generating an index file, characterized by comprising:
a processor; and
a memory for storing instructions executable by the processor;
wherein the processor is configured to perform the method according to any one of claims 1 to 5.
12. A non-volatile computer-readable storage medium having computer program instructions stored thereon, wherein the computer program instructions, when executed by a processor, implement the method according to any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711470608.0A CN110019875A (en) | 2017-12-29 | 2017-12-29 | The generation method and device of index file |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110019875A true CN110019875A (en) | 2019-07-16 |
Family
ID=67187151
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711470608.0A Pending CN110019875A (en) | 2017-12-29 | 2017-12-29 | The generation method and device of index file |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110019875A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105160039A (en) * | 2015-10-13 | 2015-12-16 | 四川携创信息技术服务有限公司 | Query method based on big data |
CN105183845A (en) * | 2015-09-06 | 2015-12-23 | 华中科技大学 | ERVQ image indexing and retrieval method in combination with semantic features |
CN107085607A (en) * | 2017-04-19 | 2017-08-22 | 电子科技大学 | A kind of image characteristic point matching method |
Non-Patent Citations (2)
Title |
---|
人工智能学家: "重磅|Facebook发布AI搜索引擎Faiss:比最先进搜索算法快8.5倍", 《HTTPS://WWW.SOHU.COM/A/131550329_297710》 * |
杨国营: "基于MapReduce模型文本分类算法的研究", 《中国优秀硕士学位论文全文数据库 信息科技辑》 * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111582398A (en) * | 2020-05-14 | 2020-08-25 | 北京达佳互联信息技术有限公司 | Data clustering method, device, system, server and storage medium |
CN111582398B (en) * | 2020-05-14 | 2022-03-25 | 北京达佳互联信息技术有限公司 | Data clustering method, device, system, server and storage medium |
WO2023030184A1 (en) * | 2021-08-31 | 2023-03-09 | 华为技术有限公司 | Data retrieval method and related device |
CN117556068A (en) * | 2024-01-12 | 2024-02-13 | 中国科学技术大学 | Training method of target index model, information retrieval method and device |
CN117556068B (en) * | 2024-01-12 | 2024-05-17 | 中国科学技术大学 | Training method of target index model, information retrieval method and device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2021143800A1 (en) | System and method for semantic analysis of multimedia data using attention-based fusion network | |
JP7164729B2 (en) | CROSS-MODAL INFORMATION SEARCH METHOD AND DEVICE THEREOF, AND STORAGE MEDIUM | |
CN108629414B (en) | Deep hash learning method and device | |
CN104735468B (en) | A kind of method and system that image is synthesized to new video based on semantic analysis | |
US11551437B2 (en) | Collaborative information extraction | |
US11170270B2 (en) | Automatic generation of content using multimedia | |
KR20210124111A (en) | Method and apparatus for training model, device, medium and program product | |
US20170116521A1 (en) | Tag processing method and device | |
US10796203B2 (en) | Out-of-sample generating few-shot classification networks | |
CN109902672A (en) | Image labeling method and device, storage medium, computer equipment | |
CN111552766B (en) | Using machine learning to characterize reference relationships applied on reference graphs | |
CN114443899A (en) | Video classification method, device, equipment and medium | |
CN116303537A (en) | Data query method and device, electronic equipment and storage medium | |
WO2022042638A1 (en) | Deterministic learning video scene detection | |
CN110019875A (en) | The generation method and device of index file | |
CN110019910A (en) | Image search method and device | |
CN114282055A (en) | Video feature extraction method, device and equipment and computer storage medium | |
CN110019096A (en) | The generation method and device of index file | |
CN117312535A (en) | Method, device, equipment and medium for processing problem data based on artificial intelligence | |
WO2021059081A1 (en) | Systems and methods for training a model using a few-shot classification process | |
CN115203378B (en) | Retrieval enhancement method, system and storage medium based on pre-training language model | |
US11676410B1 (en) | Latent space encoding of text for named entity recognition | |
KR102621436B1 (en) | Voice synthesizing method, device, electronic equipment and storage medium | |
US11977842B2 (en) | Methods and systems for generating mobile enabled extraction models | |
US20210141819A1 (en) | Server and method for classifying entities of a query |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | TA01 | Transfer of patent application right | Effective date of registration: 20200509. Address after: 310052 room 508, floor 5, building 4, No. 699, Wangshang Road, Changhe street, Binjiang District, Hangzhou City, Zhejiang Province. Applicant after: Alibaba (China) Co.,Ltd. Address before: 200241 room 1162, building 555, Dongchuan Road, Shanghai, Minhang District. Applicant before: SHANGHAI QUAN TOODOU CULTURAL COMMUNICATION Co.,Ltd. |
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20190716 |