CN109829549A - Hash learning method based on an evolving tree, and unsupervised online hash learning method - Google Patents
- Publication number
- CN109829549A CN109829549A CN201910088472.XA CN201910088472A CN109829549A CN 109829549 A CN109829549 A CN 109829549A CN 201910088472 A CN201910088472 A CN 201910088472A CN 109829549 A CN109829549 A CN 109829549A
- Authority
- CN
- China
- Prior art keywords
- tree
- node
- develops
- data
- hash
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/24323—Tree-organised classifiers
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
- Image Analysis (AREA)
Abstract
The present invention relates to a hash learning method based on an evolving tree. The evolving tree is trained with the data points of a data set to obtain a trained evolving tree; Hamming codes are initialized for all nodes of the trained tree except the root node; the similarity-preserving loss function of the whole evolving tree is optimized with a greedy path-coding strategy; and the Hamming codes corresponding to the minimum of the similarity-preserving loss are taken as the hash codes of the leaf nodes of the evolving tree. To encode a data point, its best matching point in the evolving tree is computed, the split path from the root node down to the leaf node corresponding to that best matching point is found, and the hash codes of the leaf nodes along that split path are concatenated in order to form the hash code of the data point. An unsupervised online hash learning method is also disclosed. The method reduces coding complexity and has good query performance.
Description
Technical field
The present invention relates to the field of data processing, and in particular to a hash learning method based on an evolving tree and an unsupervised online hash learning method that uses it.
Background art
With the rapid development of the Internet and electronic devices of all kinds, data of every type, such as text, images, and video, has grown explosively. In many application scenarios, people need to retrieve related content from such large-scale data. In large-scale data, however, the computation time spent finding the exact nearest neighbors of a given query point is unacceptable. To solve this problem, a large body of recent research has turned to approximate nearest neighbor (ANN) search: on large-scale data, ANN retrieval can stand in for exact nearest-neighbor retrieval, and it is much faster. ANN retrieval based on hash learning is one of the better-known ANN retrieval techniques. It uses machine learning mechanisms to map data points into Hamming space, replacing the Euclidean distance on the original data with the Hamming distance, which greatly reduces retrieval time and storage cost while preserving accuracy. In recent years, many excellent hash learning algorithms have emerged; depending on whether the learning model uses the label information of the samples, they can be divided into unsupervised models and supervised models. Since obtaining label information requires a huge labor cost, unsupervised hash learning algorithms have found wider application.
In general, mainstream hashing algorithms currently fall into two classes: data-independent hashing and data-dependent hashing. In data-independent hashing, the hash function family is generated independently of the data set; the typical representative is locality-sensitive hashing (LSH), which builds hash tables from a family of random hash functions so that similar data points are mapped into similar hash buckets with high probability. Its drawback is that the index-building process is data-independent, so its performance on real large-scale data sets is poor. Data-dependent hashing, also known as hash learning, maps data to similarity-preserving binary codes through machine learning mechanisms and is a typical application of machine learning techniques in the field of data retrieval. The most important goal of hash learning is that the hash codes preserve similarity: concretely, two data points that are close in the original space should still have a small Hamming distance after being mapped into Hamming space, while data points that are far apart should keep a large Hamming distance after the mapping. In recent years, many hash learning algorithms have been proposed; depending on whether the learning model uses the label information of the samples, they can be divided into unsupervised and supervised hashing algorithms. Famous representatives of unsupervised hashing include principal component analysis hashing (PCAH), iterative quantization (ITQ), and K-means hashing (KMH). PCAH projects the input data space into a low-dimensional space with principal component analysis and then maps the low-dimensional data to hash codes; ITQ seeks a rotation of the original data that minimizes the quantization loss when the original data is mapped to binary codes; KMH designs hash codes from the perspective of clustering: its basic idea is to cluster the data into K classes, quantize the data in each class uniformly to the value of its cluster center with a vector quantization strategy, and in addition code each cluster center according to a similarity-preserving principle. At the query stage, the distance between data points x and y is approximated by the Hamming distance between the hash codes of their corresponding cluster centers. Supervised hashing algorithms mainly include RBM, BRE, MFH, IMH, and MLH. Although supervised hashing achieves higher retrieval accuracy than unsupervised hashing methods, their training requires label information; in the era of massive data, data is large in scale and updated quickly, and acquiring data labels usually requires a huge labor cost, so unsupervised hashing is more significant in practical applications. Most unsupervised hashing algorithms, however, need to load all the data at once, occupy a large amount of memory, and cannot be applied to streaming data, and related research is scarce.
Summary of the invention
The first technical problem to be solved by the invention, given the state of the prior art, is to provide a hash learning method based on an evolving tree that makes the evolving tree converge stably and reduces coding complexity.
The second technical problem to be solved by the invention, given the state of the prior art, is to provide an unsupervised online hash learning method using the above evolving-tree-based hash learning method, a method that has good query performance and can be applied to streaming data.
The technical solution adopted by the invention to solve the first technical problem is a hash learning method based on an evolving tree, for training the evolving tree with the data points x_i of a data set X to obtain a trained evolving tree, giving the trained evolving tree a similarity-preserving coding to obtain the hash code of each leaf node of the evolving tree, and computing the best matching point of any data point in the evolving tree to obtain the hash code of that data point, characterized in that it comprises the following steps:
Step 1. Create an evolving tree, where the initial evolving tree has only a root node, and assign the root node a weight vector;
Step 2. Train the root node: form the data points of the data set into a data stream at random, take the root node as the best matching point of the first data point in the data stream, record the number of times the root node becomes a best matching point, and go to step 4;
Step 3. Train the leaf nodes of the already-split evolving tree with the first data point in the data stream: compute the Euclidean distance between each node of the evolving tree and the data point, and find the node with the smallest Euclidean distance to the data point; judge whether that node is a leaf node; if so, take the current training node as the best matching point of the data point in the evolving tree, record the number of times each leaf node in the evolving tree has become a best matching point, and go to step 4; if not, go to step 6;
Step 4. For the root node and every leaf node of the evolving tree in turn: judge whether the number of times the current training node of the evolving tree (the root node or any leaf node) has become a best matching point is smaller than a first preset value; if so, update the weight vector of the current training node and go to step 6; if not, go to step 5; the update formula for the weight vector of the current training node is:
w_i(t+1) = x(t)
where w_i(t+1) is the weight vector of the current training node after the update, w_i(t) is the weight vector of the current training node before the update, and x(t) is the weight vector of the data point for which the current training node is the best matching point;
Step 5. Judge whether the current depth of the current training node is smaller than the maximum depth of the evolving tree, the maximum depth being a preset value; if so, split the current training node of the evolving tree into n leaf nodes, assign a different weight vector to each leaf node, record the split node as a trunk node, re-form the data stream, count anew the number of times the data stream has been formed, and go to step 3; if not, the evolving tree at this point is the trained evolving tree, and go to step 8; the weight vectors of the n leaf nodes are computed as:
w'(t) = (1 - β)w(t) + βr(t)
where w'(t) is the weight vector of a new leaf node, w(t) is the weight vector of the trunk node corresponding to the new leaf node, r(t) is a random unit vector of the same dimension as w(t), and β is a preset hyperparameter controlling the degree of random perturbation;
Step 6. Judge whether all the data points in the data stream have been used for training; if not, train the evolving tree with the next data point in the data stream, continue to record the number of times each node of the evolving tree has become a best matching point, and go to step 4; if so, go to step 7;
Step 7. Judge whether the number of times the data stream has been formed is smaller than a second preset value; if so, re-form the data stream, train the evolving tree again, accumulate the counts of the training nodes of the evolving tree becoming best matching points, and go to step 4; if not, the evolving tree at this point is the trained evolving tree, and go to step 8;
Step 8. Initialize Hamming codes for all the nodes of the trained evolving tree except the root node, optimize the similarity-preserving loss function of the whole evolving tree with a greedy path-coding strategy, and take the Hamming codes corresponding to the minimum of the similarity-preserving loss as the hash codes of the leaf nodes of the evolving tree;
the optimization objective is:
E = Σ_{W_k ∈ N} F(W_k)
where E is the similarity-preserving loss of the whole evolving tree; W_k = {w_1, w_2, ..., w_n} is the weight-vector set of trunk node k of the whole evolving tree, w_1, w_2, ..., w_n being the weight vectors of the n leaf nodes split from trunk node k; N = {W_1, W_2, ..., W_c} is the set over all trunk nodes of the whole evolving tree; and F(W_k) is the similarity-preserving loss of the codes of the child nodes of trunk node k,
F(W_k) = Σ_{i ≠ j} (d(w_i, w_j) - λ·d_h(b(w_i), b(w_j)))²
where w_i is the weight vector of the i-th leaf node under trunk node k and w_j is that of the j-th; d(w_i, w_j) denotes the Euclidean distance between leaf nodes w_i and w_j; λ is a preset hyperparameter; b(w_i) and b(w_j) denote the Hamming codes of leaf nodes w_i and w_j; and d_h(b(w_i), b(w_j)) denotes the Hamming distance between b(w_i) and b(w_j);
Step 9. Compute the best matching point of a data point in the evolving tree, find the split path from the root node down to the leaf node corresponding to that best matching point, and, using the hash codes of the leaf nodes of the evolving tree obtained in step 8, concatenate the hash codes of the leaf nodes along the split path of the data point's best matching point in order of increasing depth as the hash code of the data point; the hash code of the data point is expressed as y = u_1 u_2 ... u_{dep-1}, where u_1 is the hash code of the corresponding node of depth 2 on the data point's path in the evolving tree, u_2 is the hash code of the corresponding node of depth 3, dep is the maximum depth of the evolving tree, and u_{dep-1} is the hash code of the corresponding node at the maximum depth of the evolving tree.
The technical solution adopted by the invention to solve the second technical problem is an unsupervised online hash learning method, characterized in that several evolving trees are created and formed into a forest in order, and the forest is trained with the above evolving-tree-based hash learning method: form the data points of the data set into a data stream at random and train each evolving tree of the forest in turn with the first data point in the data stream; for each evolving tree in the forest, randomly sample a number, denoted K, from a Poisson distribution of intensity 1, and train that evolving tree K times with the data point; train the forest with the data points of the stream in turn. After training is completed, give the leaf nodes of each evolving tree in the forest a similarity-preserving coding to obtain the hash code of each evolving tree in the forest, compute the best matching point of data point x_i in each evolving tree of the forest, and concatenate the hash codes of the best matching points of x_i in the evolving trees in order to form the hash code of x_i, expressed as y_i = y_i^(1) y_i^(2) ... y_i^(T), where y_i^(k) denotes the code of data point x_i by the k-th evolving tree of the forest and T denotes the total number of evolving trees in the forest.
Compared with the prior art, the invention has the following advantages. By performing hash learning on an evolving tree, introducing a weight inheritance mechanism during training, and adjusting only the best matching point during updates, the training process is made simpler and the evolving tree is kept as balanced as possible and converges stably; the similarity-preserving loss function is optimized with a greedy path-coding strategy, realizing local similarity preservation among child nodes. Furthermore, a forest hash learning method is proposed on the basis of the evolving-tree hash learning method; the code length used is longer, the query performance is better, and the method can be applied to streaming data.
Brief description of the drawings
Fig. 1 is the training flowchart of the evolving tree in an embodiment of the invention;
Fig. 2 shows the original data-space distribution in this embodiment;
Fig. 3 shows the initial stage of the evolving tree in this embodiment;
Fig. 4 shows the second stage, after the first split of the evolving tree of Fig. 3;
Fig. 5 shows the third stage, after the second split of the evolving tree of Fig. 3;
Fig. 6 is the structure of the evolving tree of Fig. 5;
Fig. 7 shows the leaf-node distribution after the training of the evolving tree of Fig. 3 is complete.
Detailed description of the embodiments
The invention is described in further detail below with reference to the embodiments shown in the drawings.
As shown in Fig. 1, a hash learning method based on an evolving tree trains the evolving tree with the data points x_i of a data set X to obtain a trained evolving tree, gives the trained evolving tree a similarity-preserving coding to obtain the hash code of each leaf node of the evolving tree, and computes the best matching point of any data point in the evolving tree to obtain the hash code of that data point. It comprises the following steps.
Step 1. Create an evolving tree, where the initial evolving tree has only a root node, and assign the root node a weight vector.
Step 2. Train the root node: form the data points of the data set into a data stream at random, take the root node as the best matching point of the first data point in the data stream, record the number of times the root node becomes a best matching point, and go to step 4.
Step 3. Train the leaf nodes of the already-split evolving tree with the first data point in the data stream: compute the Euclidean distance between each node of the evolving tree and the data point, and find the node with the smallest Euclidean distance to the data point; judge whether that node is a leaf node; if so, take the current training node as the best matching point of the data point in the evolving tree, record the number of times each leaf node in the evolving tree has become a best matching point, and go to step 4; if not, go to step 6.
Step 4. For the root node and every leaf node of the evolving tree in turn: judge whether the number of times the current training node of the evolving tree (the root node or any leaf node) has become a best matching point is smaller than a first preset value; if so, update the weight vector of the current training node and go to step 6; if not, go to step 5. The update formula for the weight vector of the current training node is:
w_i(t+1) = x(t)
where w_i(t+1) is the weight vector of the current training node after the update, w_i(t) is the weight vector of the current training node before the update, and x(t) is the weight vector of the data point for which the current training node is the best matching point. In this embodiment, the first preset value is 60.
Step 5. Judge whether the current depth of the current training node is smaller than the maximum depth of the evolving tree, the maximum depth being a preset value; if so, split the current training node of the evolving tree into n leaf nodes, assign a different weight vector to each leaf node, record the split node as a trunk node, re-form the data stream, count anew the number of times the data stream has been formed, and go to step 3; if not, the evolving tree at this point is the trained evolving tree, and go to step 8. The weight vectors of the n leaf nodes are computed as:
w'(t) = (1 - β)w(t) + βr(t)
where w'(t) is the weight vector of a new leaf node, w(t) is the weight vector of the trunk node corresponding to the new leaf node, r(t) is a random unit vector of the same dimension as w(t), and β is a preset hyperparameter controlling the degree of random perturbation. In this embodiment, β = 0.05.
In this embodiment, a weight inheritance mechanism is introduced: when a trunk node splits out new leaf nodes, each new leaf node inherits most of the weight of its parent node, with a small random perturbation added, which guarantees the stable convergence of the evolving tree.
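The split rule above can be sketched directly from the formula w'(t) = (1 - β)w(t) + βr(t). This is an illustrative sketch, not the patent's implementation; the function name `split_node` and the use of a normalized Gaussian draw for the random unit vector r(t) are our assumptions.

```python
import numpy as np

def split_node(w, n_children=3, beta=0.05, rng=None):
    """Split a trunk node's weight w into n_children new leaf weights
    via weight inheritance: w' = (1 - beta) * w + beta * r."""
    rng = np.random.default_rng() if rng is None else rng
    children = []
    for _ in range(n_children):
        r = rng.standard_normal(w.shape)
        r /= np.linalg.norm(r)            # random unit vector r(t)
        children.append((1.0 - beta) * w + beta * r)
    return children
```

Because β is small, each new leaf starts close to its parent (here within β·||r − w|| ≤ 2β of it for a unit-norm parent), which is what keeps the tree's growth stable.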
Step 6. Judge whether all the data points in the data stream have been used for training; if not, train the evolving tree with the next data point in the data stream, continue to record the number of times each node of the evolving tree has become a best matching point, and go to step 4; if so, go to step 7.
Step 7. Judge whether the number of times the data stream has been formed is smaller than a second preset value; if so, re-form the data stream, train the evolving tree again, accumulate the counts of the training nodes of the evolving tree becoming best matching points, and go to step 4; if not, the evolving tree at this point is the trained evolving tree, and go to step 8. In this embodiment, the second preset value is 10.
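The training loop of steps 1 through 7 can be condensed into a sketch like the following. All names (`EvolvingTree`, `Node`, the constants) are ours; the sketch simplifies in two ways that should be read as assumptions: the best matching point is found by greedy descent from the root rather than a scan over all nodes, and the data-stream re-forming of step 7 is just repeated passes over the data.

```python
import numpy as np

FIRST_PRESET = 60   # split threshold on best-match counts (embodiment value)
MAX_DEPTH = 3       # preset maximum depth of the evolving tree
N_CHILDREN = 3      # leaves created per split
BETA = 0.05         # weight-inheritance perturbation

class Node:
    def __init__(self, w, depth):
        self.w = np.asarray(w, dtype=float)
        self.depth = depth
        self.children = []
        self.bmu_count = 0          # times this node was a best match

class EvolvingTree:
    def __init__(self, dim, rng=None):
        self.rng = np.random.default_rng() if rng is None else rng
        self.root = Node(self.rng.standard_normal(dim), depth=1)

    def best_match(self, x):
        """Greedy descent to the leaf closest to x (a simplification)."""
        node = self.root
        while node.children:
            node = min(node.children, key=lambda c: np.linalg.norm(c.w - x))
        return node

    def train_point(self, x):
        node = self.best_match(x)
        node.bmu_count += 1
        if node.bmu_count < FIRST_PRESET:
            node.w = np.asarray(x, dtype=float)       # w_i(t+1) = x(t)
        elif node.depth < MAX_DEPTH:
            # split: children inherit the weight with a small perturbation
            for _ in range(N_CHILDREN):
                r = self.rng.standard_normal(node.w.shape)
                r /= np.linalg.norm(r)
                node.children.append(
                    Node((1 - BETA) * node.w + BETA * r, node.depth + 1))
```

Feeding the stream through `train_point` repeatedly grows the tree until every branch reaches the maximum depth or stops attracting points.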
As shown in Figs. 2 to 7, the evolving tree is trained with the data set shown in Fig. 2. The training data consists of 890 two-dimensional coordinate points forming 5 clusters in the two-dimensional space. Figs. 3, 4, and 5 show the first three stages of the training process, where the arrowed dashed lines indicate the growth direction of the leaf nodes at the current stage. In the initial stage (Fig. 3) there is only a root node R in the two-dimensional space. As training proceeds (Fig. 4), in the second stage the root node R splits out three leaf nodes A, B, and C; in the third stage (Fig. 5), the leaf nodes A, B, and C split out their respective child nodes {A1, A2, A3}, {B1, B2, B3}, and {C1, C2, C3}; Fig. 6 is the tree structure of the third stage. From the positions of these leaf nodes it can be seen that at this point the evolving tree has roughly learned the topological structure of the data. Fig. 7 shows the spatial positions of all the leaf nodes after training is complete; the evolving tree has learned the topological structure of the training data.
Step 8. Initialize Hamming codes for all the nodes of the trained evolving tree except the root node, optimize the similarity-preserving loss function of the whole evolving tree with a greedy path-coding strategy, and take the Hamming codes corresponding to the minimum of the similarity-preserving loss as the hash codes of the leaf nodes of the evolving tree.
The optimization objective is:
E = Σ_{W_k ∈ N} F(W_k)
where E is the similarity-preserving loss of the whole evolving tree; W_k = {w_1, w_2, ..., w_n} is the weight-vector set of trunk node k of the whole evolving tree, w_1, w_2, ..., w_n being the weight vectors of the n leaf nodes split from trunk node k; N = {W_1, W_2, ..., W_c} is the set over all trunk nodes of the whole evolving tree; and F(W_k) is the similarity-preserving loss of the codes of the child nodes of trunk node k,
F(W_k) = Σ_{i ≠ j} (d(w_i, w_j) - λ·d_h(b(w_i), b(w_j)))²
where w_i is the weight vector of the i-th leaf node under trunk node k and w_j is that of the j-th; d(w_i, w_j) denotes the Euclidean distance between leaf nodes w_i and w_j; λ is a preset hyperparameter; b(w_i) and b(w_j) denote the Hamming codes of leaf nodes w_i and w_j; and d_h(b(w_i), b(w_j)) denotes the Hamming distance between b(w_i) and b(w_j). In this embodiment, the effect is best when λ = 0.6.
Optimizing each F(W_k) is independent of the others, and the operation is identical for all, so a local Hamming codebook M_local shared by all trunk nodes can be designed. To strictly guarantee local similarity preservation among the child nodes, M_local must satisfy two requirements: (1) the number of Hamming codes equals the number n of child nodes of a trunk node, and the codes are pairwise distinct; (2) the Hamming distances between codes lie in the range [1, n-1], and the distances from any one code to each of the other codes are distinct. In this embodiment, n = 3 and M_local = {00, 11, 01}. Since the number of leaf nodes does not significantly affect how well the evolving tree learns the topological structure of the original data, the number of leaf nodes is generally set to 3 or 4 in order to reduce the algorithmic complexity of the coding part; the number of full permutations I of M_local is then only 6 or 24. The optimal local coding is solved by traversing I, so the time complexity of the coding part is O(6n) or O(24n), which reduces coding complexity.
Step 9. Compute the best matching point of a data point in the evolving tree, find the split path from the root node down to the leaf node corresponding to that best matching point, and, using the hash codes of the leaf nodes of the evolving tree obtained in step 8, concatenate the hash codes of the leaf nodes along the split path of the data point's best matching point in order as the hash code of the data point. The hash code of the data point is expressed as y = u_1 u_2 ... u_{dep-1}, where u_1 is the hash code of the corresponding node of depth 2 on the data point's path in the evolving tree, u_2 is the hash code of the corresponding node of depth 3, dep is the maximum depth of the evolving tree, and u_{dep-1} is the hash code of the corresponding node at the maximum depth of the evolving tree.
Since the evolving tree is not a balanced tree, if the maximum depth of the best matching point of a data point x_i is smaller than the maximum depth of the evolving tree, then in order to keep the code length uniform, the missing codes are padded with the code at the maximum depth of the best matching point. The hash code of the data point is then expressed as y = u_1 u_2 ... u_{max-1} ... u_{dep-1}, where u_1 is the code of the corresponding node of depth 2 in the evolving tree, u_2 is the code of the corresponding node of depth 3, max is the maximum depth of the best matching point, u_max = u_{max+1} = ... = u_{dep-1} = u_{max-1}, and dep is the maximum depth of the evolving tree.
In the evolving-tree hashing above, the greedy path-coding strategy converts the leaf nodes obtained by training into similarity-preserving binary codes, and the complexity of the hash coding of the evolving tree is small. The hash code of any data point is obtained by computing its best matching point in the evolving tree and taking the corresponding code in the tree, so the similarity between data points is obtained by computing the Hamming distance between their hash codes, which reduces computational complexity.
The applicability of this evolving-tree hashing method, however, is limited to short codes, and short codes often have poorer query performance than long codes and can hardly handle practical tasks. Obtaining long codes with higher query performance through the evolving-tree hashing method is very difficult: analyzed from the viewpoint of space-time overhead, the code length of evolving-tree hashing is proportional to the depth of the tree, and simply extending the code by increasing the depth of the evolving tree is impractical. As the depth of the evolving tree grows, the number of nodes of the whole tree grows exponentially, and it is almost impossible to complete the training and path coding of the evolving tree within limited memory resources and an effective time range; moreover, the number of leaf nodes would far exceed the number of data points, making such a quantization meaningless. To solve the long-code problem, and for conciseness and efficiency of coding, codes are extended with Bagging from parallel ensemble learning; but since traditional Bagging needs to obtain all the sample data and is not suitable for sampling streaming data, an unsupervised online hash learning method is proposed on the basis of Online-Bagging.
An unsupervised online hash learning method creates several evolving trees and forms them into a forest in order, and trains the forest with the above evolving-tree-based hash learning method. The data points of the data set are formed into a data stream at random, and each evolving tree of the forest is trained in turn with the first data point in the data stream: for each evolving tree in the forest, randomly sample a number, denoted K, from a Poisson distribution of intensity 1, and train that evolving tree K times with the data point; train the forest with the data points of the stream in turn. After training is completed, give the leaf nodes of each evolving tree in the forest a similarity-preserving coding to obtain the hash code of each evolving tree in the forest, compute the best matching point of data point x_i in each evolving tree of the forest, and concatenate the hash codes of the best matching points of x_i in the evolving trees in order to form the hash code of x_i, expressed as y_i = y_i^(1) y_i^(2) ... y_i^(T), where y_i^(k) denotes the code of data point x_i by the k-th evolving tree of the forest and T denotes the total number of evolving trees in the forest.
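The Online-Bagging update and the per-tree code concatenation can be sketched as follows. The tree interface (`train_point`, `hash_code`) is assumed, standing in for the single-tree training and coding routines described above; function names are ours.

```python
import numpy as np

def online_bagging_update(forest, x, rng=None):
    """Train each tree in the forest K times on point x, with
    K ~ Poisson(1), as in Online-Bagging. Returns the sampled Ks."""
    rng = np.random.default_rng() if rng is None else rng
    ks = []
    for tree in forest:
        k = rng.poisson(1.0)
        ks.append(k)
        for _ in range(k):
            tree.train_point(x)
    return ks

def forest_hash(forest, x):
    """Concatenate per-tree codes y^(1) ... y^(T) in tree order."""
    return "".join(tree.hash_code(x) for tree in forest)
```

Because each tree is updated independently, the forest can absorb a stream one point at a time, and the concatenated code length scales with the number of trees T rather than with tree depth.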
Because the evolving tree is trained online, the different random sub-sample sets produced by Online-Bagging introduce a further source of randomness, namely randomness in the growth direction of each tree. Once the evolving tree is extended into an evolving forest, the spatial distribution of the data is captured from multiple random positions in the original space; the more evolving trees there are, the more thoroughly the spatial structure of the data is captured, which alleviates the weakness of path coding.
The unsupervised online hash learning method therefore further improves on the evolving-tree-based hash learning method: the code length can be adjusted flexibly through the depth and the number of evolving trees, and the training and encoding procedures are very simple yet achieve good query performance and are applicable to streaming data. Since the evolving trees in the forest are mutually independent, the method is also well suited to distributed platforms.
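The final code of a data point is simply the ordered concatenation of the per-tree codes, y_i = y_i^(1) y_i^(2) ... y_i^(T). A minimal sketch, assuming each tree exposes a hypothetical `encode(x)` method returning its bit string for x:

```python
def forest_hash_code(x, forest):
    """Concatenate per-tree codes into y = y^(1) y^(2) ... y^(T).

    Each tree is assumed to expose a hypothetical `encode(x)` method
    returning the bit string of x's best-matching leaf on that tree.
    """
    return "".join(tree.encode(x) for tree in forest)
```

Because each tree contributes its bits independently, trees can be trained and queried in parallel and their outputs joined at the end.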
Claims (2)
1. A hash learning method based on an evolving tree, in which an evolving tree is trained with the data points x_i of a data set X to obtain a trained evolving tree; similarity-preserving encoding is applied to the trained evolving tree to obtain the hash code of each leaf node of the evolving tree; and for any data point, the best-matching point on the evolving tree is computed to obtain the hash code of that data point; characterized by comprising the following steps:
Step 1: create an evolving tree, where the initialized evolving tree has only a root node, and assign a weight vector to the root node;
Step 2: train the root node: randomly arrange all data points of the data set into a data stream, take the root node as the best-matching point of the first data point in the stream, record the number of times the root node has become a best-matching point, and go to step 4;
Step 3: train the leaf nodes of the split evolving tree using the first data point in the data stream: compute the Euclidean distance between each node of the evolving tree and the data point, and find the node with the smallest Euclidean distance to the data point. If that node is a leaf node, take the currently trained node of the evolving tree as the best-matching point of the data point, record the number of times each leaf node of the evolving tree has become a best-matching point, and go to step 4; otherwise go to step 6;
Step 4: for the root node and every leaf node of the evolving tree, perform the following in turn: judge whether the number of times the currently trained node of the evolving tree (the root node or any leaf node) has become a best-matching point is smaller than a first preset value; if so, update the weight vector of the currently trained node and go to step 6; otherwise go to step 5. The update formula for the weight vector of the currently trained node of the evolving tree is:

w_i(t+1) = w_i(t) + α(t)·(x(t) − w_i(t))

where w_i(t+1) is the weight vector of the currently trained node after the update, w_i(t) is its weight vector before the update, x(t) is the data point for which the currently trained node is the best-matching point, and α(t) is the learning rate;
Step 5: judge whether the current depth of the currently trained node is smaller than the maximum depth of the evolving tree, the maximum depth being a preset value. If so, split the currently trained node of the evolving tree into n leaf nodes, assign a different weight vector to each new leaf node, record the split node as a trunk node, re-form the data stream, count this re-formation toward the number of times the data stream has been formed, and go to step 3. Otherwise, the evolving tree at this point is the trained evolving tree; go to step 8. The weight vectors of the n leaf nodes are computed as:

w′(t) = (1 − β)·w(t) + β·r(t)

where w′(t) is the weight vector of a new leaf node, w(t) is the weight vector of the trunk node from which the new leaf node was split, r(t) is a random unit vector of the same dimension as w(t), and β is a preset hyperparameter controlling the degree of random perturbation;
Step 6: judge whether all data points in the data stream have been used for training. If not, train the evolving tree with the next data point in the stream, continue to record the number of times each node of the evolving tree has become a best-matching point, and go to step 4; if so, go to step 7;

Step 7: judge whether the number of times the data stream has been formed is smaller than a second preset value. If so, re-form the data stream, train the evolving tree again, accumulate the counts of how many times the trained nodes of the evolving tree have become best-matching points, and go to step 4; otherwise, the evolving tree at this point is the trained evolving tree; go to step 8;
Step 8: initialize a Hamming code for every node of the trained evolving tree except the root node, optimize the similarity-preserving loss function of the whole evolving tree with a greedy path-coding strategy, and take the Hamming codes corresponding to the minimum of the similarity-preserving loss function as the hash codes of the leaf nodes of the evolving tree;
The optimization objective is:

E = Σ_{W_k ∈ N} F(W_k)

where E is the similarity-preserving loss of the whole evolving tree; W_k = {w_1, w_2, ..., w_n} is the set of weight vectors w_1, w_2, ..., w_n of the n leaf nodes split from trunk node k of the evolving tree; N = {W_1, W_2, ..., W_c} is the set over all trunk nodes of the evolving tree; and F(W_k) is the similarity-preserving loss of the leaf-node codes of trunk node k,

F(W_k) = Σ_{i≠j} ( d_h(b(w_i), b(w_j)) − λ·d(w_i, w_j) )²

where w_i is the weight vector of the i-th leaf node of trunk node k, w_j is the weight vector of the j-th leaf node of trunk node k, d(w_i, w_j) is the Euclidean distance between leaf nodes w_i and w_j, λ is a preset hyperparameter, b(w_i) is the Hamming code of leaf node w_i, b(w_j) is the Hamming code of leaf node w_j, and d_h(b(w_i), b(w_j)) is the Hamming distance between b(w_i) and b(w_j);
Step 9: compute the best-matching point of a given data point on the evolving tree, find the split path from the root node to the leaf node corresponding to that best-matching point, and concatenate in order the hash codes, obtained in step 8, of the leaf nodes along the split path of the best-matching point, forming the hash code of the data point, expressed as y = u_1 u_2 ... u_{dep−1}, where u_1 is the hash code of the data point at the corresponding node at depth 2 of the evolving tree, u_2 is the hash code of the data point at the corresponding node at depth 3 of the evolving tree, dep is the maximum depth of the evolving tree, and u_{dep−1} is the hash code of the data point at the corresponding node at the maximum depth of the evolving tree.
2. An unsupervised online hash learning method, characterized in that: several evolving trees are created and ordered to form a forest; the forest is trained with the method of claim 1; the data points of the data set are randomly arranged into a data stream, and each evolving tree in the forest is trained in turn, starting with the first data point in the stream: for each evolving tree in the forest, a number K is sampled from a Poisson distribution with intensity 1, and the current data point in the stream is used to train that tree K times; the forest is trained in this way with the successive data points of the stream; after training is complete, similarity-preserving encoding is applied to the leaf nodes of every evolving tree in the forest to obtain the hash code of each evolving tree in the forest; the best-matching point of a data point x_i on every evolving tree in the forest is computed, and the hash codes of the corresponding best-matching points on the evolving trees are concatenated in order to form the hash code of x_i, expressed as y_i = y_i^(1) y_i^(2) ... y_i^(T), where y_i^(k) denotes the code assigned to x_i by the k-th evolving tree in the forest and T is the total number of evolving trees in the forest.
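The training and path-encoding loop of claim 1 might be sketched as follows. This is a simplified, non-authoritative reading: the learning rate `alpha`, the fanout, and the fixed one-bit child labels are assumptions (the patent instead assigns leaf codes by greedily optimizing the similarity-preserving loss of step 8), and the best-matching search during encoding descends the tree greedily rather than scanning all leaves.

```python
import numpy as np

class Node:
    """A node of the evolving tree: weight vector, hit counter, children."""
    def __init__(self, w):
        self.w = np.asarray(w, dtype=float)
        self.hits = 0          # times this node was the best-matching point
        self.children = []     # empty list => leaf node
        self.code = ""         # bit label of the edge from the parent

def collect_leaves(node):
    if not node.children:
        return [node]
    return [leaf for c in node.children for leaf in collect_leaves(c)]

def depth_of(node, target, d=1):
    """Depth of `target` in the tree rooted at `node` (root has depth 1)."""
    if node is target:
        return d
    for c in node.children:
        found = depth_of(c, target, d + 1)
        if found:
            return found
    return 0

def train_evolving_tree(root, stream, *, alpha=0.1, beta=0.3,
                        split_after=10, max_depth=4, fanout=2, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    for x in stream:
        x = np.asarray(x, dtype=float)
        # steps 2-3: best-matching leaf by Euclidean distance
        bmu = min(collect_leaves(root), key=lambda n: np.linalg.norm(n.w - x))
        bmu.hits += 1
        if bmu.hits < split_after:
            # step 4: move the weight toward the input (alpha is assumed)
            bmu.w = bmu.w + alpha * (x - bmu.w)
        elif depth_of(root, bmu) < max_depth:
            # step 5: split into `fanout` children, w' = (1-beta)w + beta*r
            for i in range(fanout):
                r = rng.standard_normal(bmu.w.shape)
                r /= np.linalg.norm(r)
                child = Node((1 - beta) * bmu.w + beta * r)
                child.code = format(i, "b")   # placeholder one-bit label; the
                bmu.children.append(child)    # patent optimizes codes greedily
    return root

def encode(root, x):
    """Step 9 (simplified): concatenate edge labels along the matching path."""
    x = np.asarray(x, dtype=float)
    node, bits = root, []
    while node.children:
        node = min(node.children, key=lambda n: np.linalg.norm(n.w - x))
        bits.append(node.code)
    return "".join(bits)
```

With fanout 2 the code length equals the matched path length (at most dep − 1 bits), which is why the patent can tune code length through tree depth alone.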
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910088472.XA CN109829549A (en) | 2019-01-30 | 2019-01-30 | Hash learning method and its unsupervised online Hash learning method based on the tree that develops |
CN202010070802.5A CN111079949A (en) | 2019-01-30 | 2020-01-21 | Hash learning method, unsupervised online Hash learning method and application thereof |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109829549A true CN109829549A (en) | 2019-05-31 |
Legal Events

Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20190531 |