CN110069666A - Hash learning method and device based on neighbor-structure preservation - Google Patents
- Publication number
- CN110069666A CN201910264740.9A CN201910264740A CN 110069666 A
- Authority
- CN
- China
- Prior art keywords
- video
- training video
- training
- neighbour
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/75—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/78—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/783—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Multimedia (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Library & Information Science (AREA)
- Image Analysis (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
The present invention proposes a hash learning method and device based on neighbor-structure preservation. The method includes: obtaining a video training set and extracting M frame-level features from each training video; extracting the temporal-appearance feature of each training video, clustering the temporal-appearance features, and obtaining an anchor feature set; obtaining the temporal-appearance neighbor features of each training video from the anchor feature set; encoding each training video into a depth representation according to its temporal-appearance neighbor features, using an encoding network; converting the depth representation of each training video into a binary code; reconstructing the M frame-level features of each training video from its binary code; generating a reconstruction error function and a neighbor-similarity error function; and training the network so that both error functions are minimized. The method preserves the neighbor structure intact in Hamming space and improves retrieval precision on large-scale unsupervised video databases.
Description
Technical field
The present invention relates to the field of video processing, and in particular to a hash learning method and device based on neighbor-structure preservation.
Background technique
Large-scale video retrieval aims to find, in a huge database, the videos most similar to a query video. A video is usually represented by a series of sampled video frames, each of which is in turn represented by a feature vector. In video retrieval, relevant videos can then be determined from the feature sets of the videos.
Faced with high-dimensional features and massive data, hashing methods have achieved great success in large-scale visual retrieval tasks. Video hashing encodes a video into a compact binary code while preserving the similarity structure of the videos in Hamming space. Learning-based video hashing methods exploit the characteristics of the data and achieve better performance than hand-designed hashing methods. Because it avoids the cost of manual labeling, unsupervised hashing is more practical than supervised hashing for large-scale video retrieval.
At present, most unsupervised hashing methods focus on exploiting the representation and temporal information of videos but ignore the neighbor structure. As a result, the encoding network absorbs the content of the input video indiscriminately, without distinguishing whether that content is similar to its neighbors' content. This harms the preservation of neighbor similarity, so retrieval precision cannot be guaranteed when searching large-scale unsupervised video databases.
Summary of the invention
The present invention proposes a hash learning method and device based on neighbor-structure preservation, in order to keep the neighbor structure intact in Hamming space and improve retrieval precision on large-scale unsupervised video databases. It addresses the technical problem that unsupervised hashing in the prior art focuses on the representation and temporal information of videos but ignores the neighbor structure, so that the precision of video retrieval cannot be guaranteed.
An embodiment of one aspect of the present invention proposes a hash learning method based on neighbor-structure preservation, comprising:
S1: obtaining a video training set, and extracting, for each training video in the video training set, M frame-level features;
S2: extracting the temporal-appearance feature of each training video using an autoencoder, clustering the temporal-appearance features, and obtaining an anchor feature set;
S3: obtaining, for each training video, its temporal-appearance neighbor features from the anchor feature set;
S4: encoding each training video into a depth representation according to its temporal-appearance neighbor features, using an encoding network;
S5: converting the depth representation of each training video into a binary code through a fully connected layer with an activation function;
S6: reconstructing the M frame-level features of each training video from its binary code, using a decoding network;
S7: generating a reconstruction error function from the frame-level features and the reconstructed frame-level features of each training video, and generating a neighbor-similarity error function from the temporal-appearance features and the binary codes;
S8: training the network so that the reconstruction error function and the neighbor-similarity error function are minimized; wherein the network comprises the encoding network, the fully connected layer, and the decoding network.
With the hash learning method of this embodiment, a video training set is obtained and M frame-level features are extracted from each training video; an autoencoder extracts the temporal-appearance feature of each training video, the temporal-appearance features are clustered to obtain an anchor feature set, and the temporal-appearance neighbor features of each training video are obtained from that set; each training video is then encoded into a depth representation according to its neighbor features using an encoding network, the depth representation is converted into a binary code by a fully connected layer with an activation function, and a decoding network reconstructs the M frame-level features from the binary code; a reconstruction error function and a neighbor-similarity error function are generated, and the network, comprising the encoding network, the fully connected layer, and the decoding network, is trained to minimize both. By embedding the neighbors of a video into the encoding network, content similar to the video's neighbors receives more attention while its frame-level features are encoded, which improves retrieval precision on large-scale unsupervised video databases. Moreover, minimizing the reconstruction error and the neighbor-similarity error ensures that the neighbor structure is preserved intact in Hamming space, further improving retrieval precision on video databases.
An embodiment of another aspect of the present invention proposes a hash learning device based on neighbor-structure preservation, comprising:
an obtaining module, configured to obtain a video training set and to extract, for each training video in the video training set, M frame-level features;
an extraction module, configured to extract the temporal-appearance feature of each training video using an autoencoder, to cluster the temporal-appearance features, and to obtain an anchor feature set;
the obtaining module being further configured to obtain, for each training video, its temporal-appearance neighbor features from the anchor feature set;
an encoding module, configured to encode each training video into a depth representation according to its temporal-appearance neighbor features, using an encoding network;
a conversion module, configured to convert the depth representation of each training video into a binary code through a fully connected layer with an activation function;
a reconstruction module, configured to reconstruct the M frame-level features of each training video from its binary code, using a decoding network;
a generation module, configured to generate a reconstruction error function from the frame-level features and the reconstructed frame-level features of each training video, and to generate a neighbor-similarity error function from the temporal-appearance features and the binary codes;
a training module, configured to train the network so that the reconstruction error function and the neighbor-similarity error function are minimized; wherein the network comprises the encoding network, the fully connected layer, and the decoding network.
The hash learning device of this embodiment performs the same processing as the method described above and achieves the same effects: by embedding the neighbors of a video into the encoding network, content similar to the video's neighbors receives more attention while its frame-level features are encoded, which improves retrieval precision on large-scale unsupervised video databases; and minimizing the reconstruction error and the neighbor-similarity error ensures that the neighbor structure is preserved intact in Hamming space, further improving retrieval precision on video databases.
Additional aspects and advantages of the present invention will be set forth in part in the following description, and will in part become apparent from the description or be learned through practice of the invention.
Brief description of the drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the following description of the embodiments with reference to the accompanying drawings, in which:
Fig. 1 is a flow diagram of the hash learning method based on neighbor-structure preservation provided by Embodiment 1 of the present invention;
Fig. 2 is a first schematic diagram of the hash learning process in an embodiment of the present invention;
Fig. 3 is a second schematic diagram of the hash learning process in an embodiment of the present invention;
Fig. 4 is a structural diagram of the hash learning device based on neighbor-structure preservation provided by Embodiment 2 of the present invention.
Specific embodiment
Embodiments of the present invention are described in detail below, and examples of the embodiments are shown in the accompanying drawings, in which identical or similar reference numbers denote identical or similar elements, or elements with identical or similar functions, throughout. The embodiments described below with reference to the drawings are exemplary; they are intended to explain the present invention and are not to be construed as limiting it.
At present, some video hashing methods integrate the hash function into a deep neural network: features of the video frames are extracted by a deep convolutional network, processed by temporal pooling or a deep recurrent network, and further encoded into a binary code. Because it avoids the cost of manual labeling, unsupervised hashing is more practical than supervised hashing for large-scale video retrieval.
However, most unsupervised hashing methods focus on exploiting the representation and temporal information of videos while ignoring the neighbor structure. Although some hashing methods design a neighbor-similarity cost function to train the network, the neighbor structure is only used to guide the generation of the binary code and is not exploited during video feature encoding. Under this scheme, the designed encoding network absorbs the content of the input video indiscriminately, without distinguishing whether that content is similar to its neighbors' content; this harms the preservation of neighbor similarity, so retrieval precision cannot be guaranteed when performing video retrieval on large-scale unsupervised video databases.
Therefore, the present invention mainly addresses the technical problem that unsupervised hashing in the prior art focuses on the representation and temporal information of videos but ignores the neighbor structure, so that the precision of video retrieval cannot be guaranteed, and proposes a hash learning method based on neighbor-structure preservation.
In the hash learning method of the embodiments of the present invention, the neighbors of a video are embedded into the encoding network, so that while the frame-level features of the video are being encoded, content similar to the video's neighbors receives more attention, which improves retrieval precision on large-scale unsupervised video databases. In addition, when the network is trained, minimizing the reconstruction error and the neighbor-similarity error ensures that the neighbor structure is preserved intact in Hamming space, further improving retrieval precision on video databases.
The hash learning method and device based on neighbor-structure preservation of the embodiments of the present invention are described below with reference to the accompanying drawings.
Fig. 1 is a flow diagram of the hash learning method based on neighbor-structure preservation provided by Embodiment 1 of the present invention.
In the embodiments of the present invention, the hash learning method is described as configured in a hash learning device based on neighbor-structure preservation; the device can be applied to any computer equipment so that the computer equipment can perform the hash learning function based on neighbor-structure preservation.
The computer equipment may be a personal computer (PC), a cloud device, a mobile device, and so on; the mobile device may be, for example, a mobile phone, a tablet computer, a personal digital assistant, a wearable device, or an in-vehicle device, that is, a hardware device with an operating system, a touch screen, and/or a display screen.
As shown in Fig. 1, the hash learning method based on neighbor-structure preservation may comprise the following steps.
S1: obtain a video training set, and extract, for each training video in the video training set, M frame-level features.
In this embodiment, the video training set contains N training videos, which may be videos stored locally on the computer equipment or videos downloaded from the network; no restriction is placed on this. The sizes of N and M are both set in advance.
In this embodiment, for each training video in the video training set, M frames are uniformly sampled, and a deep convolutional network extracts an l-dimensional frame-level feature from each sampled frame, so that each training video is converted into a frame-level feature set Xi = {xi,1, ..., xi,M}.
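The sampling step in S1 can be sketched as follows. This is a minimal illustration rather than the patent's implementation: the deep convolutional network is replaced by a hypothetical `frame_feature` stand-in, and `uniform_sample_indices` simply spreads M indices evenly over the video's frames.

```python
# Sketch of S1: uniformly sample M frames per video and extract an
# l-dimensional feature per frame. The CNN is stubbed out; only the
# sampling logic reflects the step described above.
from typing import List

def uniform_sample_indices(num_frames: int, m: int) -> List[int]:
    """Pick m frame indices spread evenly across a video of num_frames frames."""
    return [round(i * (num_frames - 1) / (m - 1)) for i in range(m)]

def frame_feature(frame_index: int, l: int) -> List[float]:
    """Hypothetical stand-in for a CNN feature extractor (l values per frame)."""
    return [((frame_index * 31 + j * 17) % 97) / 97.0 for j in range(l)]

def video_to_frame_features(num_frames: int, m: int, l: int) -> List[List[float]]:
    """Frame-level feature set X_i = {x_{i,1}, ..., x_{i,M}} for one video."""
    return [frame_feature(idx, l) for idx in uniform_sample_indices(num_frames, m)]

feats = video_to_frame_features(num_frames=300, m=25, l=8)
```

In practice the feature extractor would be a pretrained CNN applied to the decoded frames; the sampling grid is the only part fixed by the method.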
S2: extract the temporal-appearance feature of each training video using an autoencoder, cluster the temporal-appearance features, and obtain an anchor feature set.
In this embodiment, a d-dimensional temporal-appearance feature is obtained for each training video through the autoencoder. In principle, the a temporal-appearance neighbor features of a training video could be determined by computing and ranking the distances (for example, using the two-norm) between that video and every other video in the library. However, this computation would also be required at test time, and a neighbor search over the entire video training set consumes a great deal of time, which is impractical. Therefore, in the present invention, K-means clustering can be applied to the temporal-appearance features of the training videos in the video training set, yielding n cluster centers. For each cluster center, the temporal-appearance feature nearest to it (that is, with the smallest distance) is determined, giving n temporal-appearance features. These n temporal-appearance features are taken as anchors and collected into the anchor feature set.
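The anchor construction in S2 can be sketched as follows. A small pure-Python K-means stands in for whatever clustering implementation is actually used; each anchor is then the real temporal-appearance feature closest to a cluster center, as described above. All names are illustrative.

```python
# Sketch of S2: K-means over temporal-appearance features, then snap each
# centre to the nearest real feature to form the anchor set.
import random

def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_centres(points, n_clusters, iters=20, seed=0):
    rng = random.Random(seed)
    centres = [list(p) for p in rng.sample(points, n_clusters)]
    for _ in range(iters):
        groups = [[] for _ in range(n_clusters)]
        for p in points:
            nearest = min(range(n_clusters), key=lambda c: dist2(p, centres[c]))
            groups[nearest].append(p)
        for c, members in enumerate(groups):
            if members:  # keep the old centre if a cluster went empty
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return centres

def anchor_set(features, n_clusters):
    """Anchors = real temporal-appearance features nearest the K-means centres."""
    centres = kmeans_centres(features, n_clusters)
    return [min(features, key=lambda f: dist2(f, c)) for c in centres]

# Toy temporal-appearance features forming two obvious clusters.
features = [[0.0, 0.1], [0.1, 0.0], [0.2, 0.1], [9.9, 10.0], [10.0, 9.8], [10.1, 10.2]]
anchors = anchor_set(features, 2)
```

Snapping centers to real features matters: it guarantees every anchor is an actual video's temporal-appearance feature, not a synthetic mean.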
S3: obtain, for each training video, its temporal-appearance neighbor features from the anchor feature set.
In this embodiment, for each training video, its a temporal-appearance neighbor features, denoted z<1>, ..., z<a>, are obtained from the anchor feature set as the a anchors closest to the video's temporal-appearance feature. Since a << n << N, obtaining the a temporal-appearance neighbor features consumes negligible time, which greatly improves the efficiency of video retrieval.
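The lookup in S3 is simply a ranking of the n anchors by distance, keeping the a closest; since a << n << N this costs negligible time per video. A minimal sketch (function names are illustrative, not from the patent):

```python
# Sketch of S3: find the positions <1>, ..., <a> of the a anchors nearest
# to a video's temporal-appearance feature.
def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_anchor_positions(x, anchors, a):
    """Indices of the a anchors closest to temporal-appearance feature x;
    these index the video's temporal-appearance neighbour features."""
    return sorted(range(len(anchors)), key=lambda j: dist2(x, anchors[j]))[:a]

anchors = [[0.0, 0.0], [5.0, 5.0], [10.0, 10.0], [0.0, 10.0]]
positions = nearest_anchor_positions([1.0, 1.0], anchors, a=2)
```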
S4: encode each training video into a depth representation according to its temporal-appearance neighbor features, using an encoding network.
In this embodiment, the encoding network learns the correspondence between the temporal-appearance neighbor features of a video and its depth representation. After the temporal-appearance neighbor features of each training video have been determined, they are input to the encoding network to obtain the depth representation of each training video.
As a possible implementation, the neighbor-attention learning mechanism first forms a neighbor-structure representation ni. Specifically, for each training video, its a temporal-appearance neighbor features are concatenated into a vector [z<1>; ...; z<a>], which is mapped to the b-dimensional neighbor-structure representation ni:
ni = FC([z<1>; ...; z<a>]); (1)
where FC denotes a fully connected mapping.
For each training video, at the first time step the first frame-level feature of the video is input to the encoding network, and the neighbor-structure representation ni is embedded into a b-dimensional memory state mi,1 according to formula (2); in formula (2), d is a fixed value, Wq, Wk and Wv are parameter matrices of the encoding network, [ ; ] denotes concatenation, xi,1 denotes the frame-level feature input at the first time step, and mi,1 denotes the memory state at the first time step.
Through formula (2), the neighbor-structure information is present in the memory state at every time step. For 1 < t <= M, whenever a new frame-level feature is input to the encoding network, the memory state is updated according to formula (3); in formula (3), xi,t denotes the frame-level feature input at time step t, mi,t denotes the memory state at time step t, and mi,t-1 denotes the memory state at time step t-1.
Through formulas (2) and (3), at each time step the memory state uses the neighbor-structure information it contains to select the useful information in the input feature and write it into the new memory state. Embedding this neighbor-attention learning mechanism into the encoding network yields the computation units of formula (4); there, MLP denotes a multi-layer mapping, BN denotes batch normalization, Wiv, Wih, Wfv, Wfh, Wov and Woh are parameters of the encoding network, the circled dot denotes the element-wise product, the sigmoid function is σ(x) = 1/(1 + e−x), and the hyperbolic tangent is tanh(x) = (ex − e−x)/(ex + e−x). The hidden-layer output at the last time step, hi,M, is the depth representation of the training video. Specifically, for each training video, the depth representation is hi,M, computed by the encoding network from the video's frame-level features Xi with parameters θ (formula (5)).
In this embodiment, the neighbors of a video are embedded into the encoding network, so that while the frame-level features of the video are being encoded, content similar to the video's neighbors receives more attention, which improves retrieval precision on large-scale unsupervised video databases.
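The patent's neighbor-attention units are given as formulas (2) to (4), which are not reproduced in this text. As a rough, generic stand-in in the spirit of the Wq/Wk/Wv parameters, the sketch below shows one scaled dot-product attention step: the current frame feature queries the neighbor features, and the attended mixture forms the memory content. The function names, the toy weights, and the exact update rule are assumptions for illustration, not the patent's equations.

```python
# Illustrative attention step: a frame feature attends over neighbour
# features; the weighted mixture plays the role of the memory content.
import math

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def neighbour_attention_step(frame, neighbours, Wq, Wk, Wv):
    """Scaled dot-product attention of one frame feature over neighbours."""
    q = matvec(Wq, frame)
    keys = [matvec(Wk, nb) for nb in neighbours]
    vals = [matvec(Wv, nb) for nb in neighbours]
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    w = softmax(scores)
    return [sum(wj * vj[t] for wj, vj in zip(w, vals)) for t in range(d)]

I2 = [[1.0, 0.0], [0.0, 1.0]]  # identity weights, purely for illustration
memory = neighbour_attention_step([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], I2, I2, I2)
```

The key property this illustrates is the one claimed above: the neighbor most similar to the current frame receives the largest attention weight, so neighbor-like content dominates the memory state.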
S5: convert the depth representation of each training video into a binary code through a fully connected layer with an activation function.
In this embodiment, for each training video, the depth representation is converted into a binary code through the fully connected layer with an activation function:
bi = sgn(ti); (6)
where ti = FC(hi,M, k); FC denotes a fully connected mapping; sgn denotes the sign function, which is 1 when ti is greater than 0 and −1 when ti is less than or equal to 0; and k denotes the length of the binary code.
S6: reconstruct the M frame-level features of each training video from its binary code, using a decoding network.
In this embodiment, a long short-term memory network (Long Short-Term Memory, LSTM) can be used as the decoding network. Specifically, the binary code of each training video is first mapped to an l-dimensional vector. At the first time step, this vector is input to the decoding network, which outputs the first reconstructed frame-level feature, denoted x̂i,1. At the second time step, x̂i,1 is input to the decoding network to obtain the second reconstructed frame-level feature x̂i,2; x̂i,2 is then input to obtain the third reconstructed frame-level feature x̂i,3; and so on, until the decoding network outputs the M-th reconstructed frame-level feature x̂i,M, at which point decoding is complete.
As an example, referring to Fig. 2, the first schematic diagram of the hash learning process in this embodiment: after the M frame-level features v1, v2, ..., vM of a training video are obtained, the encoding network outputs the corresponding depth representation, the fully connected layer with an activation function produces the corresponding binary code, and the decoding network then outputs the M reconstructed frame-level features.
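The autoregressive decoding loop of S6 can be sketched as follows. A single tanh linear recurrence stands in for the LSTM cell, and the weights are illustrative; only the feedback structure (each reconstruction becomes the next input) reflects the step described above.

```python
# Toy sketch of S6: lift the code to an l-dim vector, then reconstruct M
# frame-level features, feeding each reconstruction back as the next input.
import math

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def decode(code, M, W_in, W_rec):
    """Reconstruct M frame features from a binary code (LSTM replaced by a
    toy tanh recurrence for illustration)."""
    x = matvec(W_in, code)          # binary code mapped to an l-dim vector
    h = [0.0] * len(W_rec)
    frames = []
    for _ in range(M):
        h = [math.tanh(r + xi) for r, xi in zip(matvec(W_rec, h), x)]
        frames.append(h)
        x = h                       # autoregressive feedback
    return frames

W_in = [[0.5, 0.0], [0.0, 0.5]]     # k = 2 -> l = 2, example weights
W_rec = [[0.1, 0.0], [0.0, 0.1]]
recon = decode([1.0, -1.0], M=4, W_in=W_in, W_rec=W_rec)
```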
S7: generate a reconstruction error function from the frame-level features and the reconstructed frame-level features of each training video, and generate a neighbor-similarity error function from the temporal-appearance features and the binary codes.
In this embodiment, two loss functions are designed to train the network: the reconstruction error function Lr and the neighbor-similarity error function Ls.
The reconstruction error function Lr measures the difference between the input frame-level features of a training video and the reconstructed frame-level features obtained by decoding; the mean squared error can be used:
Lr = (1/(N·M)) Σi Σm ‖xi,m − x̂i,m‖²; (7)
where xi,m denotes the m-th frame-level feature of the i-th training video and x̂i,m denotes the m-th reconstructed frame-level feature of the i-th training video.
The neighbor-similarity error function expresses the difference between the similarity structures in the original video space and in Hamming space:
Ls = Σi Σj (sij − ŝij)²; (8)
where sij denotes the similarity between the temporal-appearance features of the i-th and the j-th training videos, and ŝij denotes the similarity between the corresponding binary codes bi and bj.
To compute the sij in formula (8), an approximate similarity matrix A can be established as follows. First, from the temporal-appearance feature xi of each training video and its a temporal-appearance neighbor features z<1>, ..., z<a>, a truncated similarity matrix Y is defined whose element Yij is the kernel value exp(−Dist(xi, zj)²/t) if zj is one of the a neighbor anchors of the i-th video and 0 otherwise, each row being normalized to sum to 1 (formula (9)); here <i> denotes the position of a temporal-appearance neighbor feature in the anchor feature set, Dist denotes a distance function (the two-norm can be used), and t denotes a bandwidth parameter.
The approximate similarity matrix A can then be calculated as
A = Y Λ−1 YT; (10)
where Λ is the diagonal matrix of the column sums of Y. The matrix A computed by formula (10) is a sparse non-negative matrix, and each of its rows sums to 1. When Aij > 0, sij can be set to 1; when Aij ≤ 0, sij can be set to 0.
The similarity ŝij between the binary codes bi and bj in formula (8) can be defined as ŝij = (1/k) biTbj. To avoid oscillation during network training, ŝij can be approximated by (1/k) tiTtj, where ti is the relaxed representation of the binary code bi. To reduce the approximation error between ti and bi, an auxiliary loss term on ti and bi can be introduced, converting formula (8) into formula (11).
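The anchor-graph construction and the two losses of S7 can be sketched end to end. This follows the structure described above (truncated kernel matrix Y with rows summing to 1; A = YΛ⁻¹Yᵀ with Λ the diagonal of Y's column sums; sij = 1 iff Aij > 0; ŝij ≈ tiᵀtj/k), but the Gaussian kernel and the exact weighting are standard choices assumed here rather than taken verbatim from the patent's formulas.

```python
# Sketch of S7: anchor-graph similarity A = Y * Lambda^{-1} * Y^T and the
# reconstruction / neighbour-similarity losses, plus the weighted total.
import math

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def truncated_Y(features, anchors, a, bandwidth):
    """Each row keeps Gaussian affinities to its a nearest anchors,
    normalised so the row sums to 1."""
    Y = []
    for x in features:
        near = sorted(range(len(anchors)), key=lambda j: dist2(x, anchors[j]))[:a]
        row = [0.0] * len(anchors)
        for j in near:
            row[j] = math.exp(-dist2(x, anchors[j]) / bandwidth)
        total = sum(row)
        Y.append([v / total for v in row])
    return Y

def approx_similarity(Y):
    """A = Y * Lambda^{-1} * Y^T; rows of A sum to 1 when rows of Y do."""
    cols = [sum(row[j] for row in Y) for j in range(len(Y[0]))]
    return [[sum(Yi[j] * Yj[j] / cols[j] for j in range(len(cols)) if cols[j] > 0)
             for Yj in Y] for Yi in Y]

def reconstruction_error(frames, recon):
    """L_r: mean squared error between input and reconstructed features."""
    return sum(dist2(f, r) for f, r in zip(frames, recon)) / len(frames)

def neighbour_similarity_error(A, relaxed, k):
    """L_s: squared gap between s_ij = [A_ij > 0] and (1/k) t_i . t_j."""
    err = 0.0
    for i, ti in enumerate(relaxed):
        for j, tj in enumerate(relaxed):
            s = 1.0 if A[i][j] > 0 else 0.0
            s_hat = sum(u * v for u, v in zip(ti, tj)) / k
            err += (s - s_hat) ** 2
    return err

features = [[0.0, 0.0], [0.2, 0.0], [10.0, 10.0]]
anchors = [[0.1, 0.0], [10.0, 10.0]]
A = approx_similarity(truncated_Y(features, anchors, a=1, bandwidth=1.0))
Lr = reconstruction_error([[1.0, 1.0]], [[1.0, 1.0]])
Ls = neighbour_similarity_error(A, [[1.0, 1.0], [1.0, 1.0], [-1.0, -1.0]], k=2)
alpha = 0.5
L = alpha * Ls + (1 - alpha) * Lr   # the weighted total of formula (12)
```

In the toy run, videos 0 and 1 share an anchor and so are neighbors (A[0][1] > 0), while video 2 is not; the relaxed codes that agree with this structure keep Ls small only on the matching pairs.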
S8 is trained network, so that reconstructed error function minimization, and keep neighbour's similitude error function minimum
Change;Wherein, network includes coding network, full linking layer and decoding network.
In the embodiment of the present invention, network can be divided into three parts: first part, which is one, has neighbour's attention learning machine
The coding network of system is expressed by the depth that the coding network can learn to obtain training video;The second part is a band
There is the full linking layer of nonlinear activation function, for depth expression to be converted to the two-value code of a K dimension;Third part is
One decoding network decodes the reconstruct frame level feature of each frame of training video from the two-value code that coding obtains.
In the embodiment of the present invention, network can be carried out according to reconstructed error function and neighbour's similitude error function
Training is led to by the way that the information for preferably being included using the training video of input may be implemented by reconstructed error function minimization
It crosses and minimizes neighbour's similitude error function, can maximize and save neighbour's similitude.When being trained to network, use
Training loss function can be reconstruct error function and the weighting of neighbour's similitude error function:
L=α Ls+(1-α)Lr;(12)
Wherein, α indicates the hyper parameter of balance reconstruct error function and neighbour's similitude error function.
In an embodiment of the present invention, the network can be trained end to end using backward gradient propagation to optimize the network parameters.
As an example, referring to Fig. 3, Fig. 3 is a second schematic diagram of the hash learning process of an embodiment of the present invention. When the network is trained on an input training video, the temporal appearance neighbor features can be embedded into the hash coding network for hash learning: hash codes are generated by the encoding network, and minimizing the reconstruction error and the neighbor similarity error ensures that the neighbor structure is fully preserved in Hamming space.
According to the hash learning method based on neighbor structure preservation of the embodiment of the present invention, a video training set is obtained, and M frame-level features are extracted for each training video in the video training set. Then, using an autoencoder, the temporal appearance feature of each training video is extracted, and the temporal appearance features are clustered to obtain an anchor feature set. Next, for each training video, the corresponding temporal appearance neighbor features are obtained from the anchor feature set, and each training video is encoded, using the encoding network, into a corresponding deep representation according to its temporal appearance neighbor features. Then, through the fully connected layer with activation function, the deep representation of each training video is converted into a binary code, and, using the decoding network, the M reconstructed frame-level features of each training video are reconstructed from the binary code. Subsequently, a reconstruction error function is generated from the frame-level features and the reconstructed frame-level features of each training video, and a neighbor similarity error function is generated from the temporal appearance features and the binary codes. Finally, the network is trained so that the reconstruction error function and the neighbor similarity error function are minimized, wherein the network includes the encoding network, the fully connected layer, and the decoding network. In the present invention, the neighbors of a video are embedded into the encoding network, so that, during the encoding of frame-level video features, content in the video that is similar to its neighbors receives more attention, which can improve retrieval accuracy on large-scale unsupervised video databases. Moreover, minimizing the reconstruction error and the neighbor similarity error ensures that the neighbor structure is fully preserved in Hamming space, further improving retrieval accuracy on video databases.
In order to implement the above embodiments, the present invention also proposes a hash learning device based on neighbor structure preservation.
Fig. 4 is a schematic structural diagram of the hash learning device based on neighbor structure preservation provided by the second embodiment of the present invention.
As shown in Fig. 4, the hash learning device based on neighbor structure preservation includes: an acquisition module 101, an extraction module 102, an encoding module 103, a conversion module 104, a reconstruction module 105, a generation module 106, and a training module 107.
The acquisition module 101 is configured to obtain a video training set and, for each training video in the video training set, extract M frame-level features of the training video.
The extraction module 102 is configured to extract, using an autoencoder, the temporal appearance feature of each training video, and to cluster the temporal appearance features to obtain an anchor feature set.
The acquisition module 101 is further configured to obtain, for each training video, the corresponding temporal appearance neighbor features from the anchor feature set.
The encoding module 103 is configured to encode, using the encoding network, each training video into a corresponding deep representation according to its temporal appearance neighbor features.
As a possible implementation, each training video has a temporal appearance neighbor features. The encoding module 103 is specifically configured to:
concatenate the a temporal appearance neighbor features of each training video column-wise to obtain a first vector;
map the first vector into a b-dimensional neighbor structure representation ni, wherein FC denotes the fully connected layer mapping;
for each training video, input the first frame-level feature of the training video into the encoding network at the first time step, and embed the neighbor structure representation ni into the b-dimensional memory state as follows:
wherein d is a fixed value; Wq, Wk and Wv are parameter values of the encoding network; the concatenation operator denotes column-wise merging; the input feature denotes the frame-level feature input at the first time step of the corresponding training video; and mi,1 denotes the memory state at the first time step;
when a new frame-level feature is input into the encoding network, update the memory state as follows:
wherein 1 < t ≤ M; the input feature denotes the frame-level feature input at time step t; mi,t denotes the memory state at time step t; and mi,t-1 denotes the memory state at time step t-1;
the encoding network is an LSTM network, and each computation unit in the encoding network is:
wherein MLP denotes a multilayer perceptron mapping, BN denotes batch normalization, Wiv, Wih, Wfv, Wfh, Wov and Woh denote parameter values of the encoding network, and ⊙ denotes the element-wise product; and
take the hidden-layer output hi,M at the last time step as the deep representation of the corresponding training video, wherein θ denotes the parameters of the encoding network.
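The embedding of the neighbor representation ni into the memory state uses the projections Wq, Wk, Wv and the scaling constant d; the exact update equations appear only as images in the original filing. Under a common scaled-dot-product-attention reading, a minimal sketch might look like the following (the two-token key/value set, all shapes, and the function name are assumptions, not the patent's formula):

```python
import numpy as np

def embed_neighbor_state(n_i, v_i1, Wq, Wk, Wv, d):
    # Hedged sketch: fuse the neighbor-structure representation n_i with
    # the first frame-level feature v_i1 via attention-style projections.
    q = Wq @ v_i1                              # query from the first frame feature
    keys = np.stack([Wk @ n_i, Wk @ v_i1])     # keys from neighbor rep and frame
    vals = np.stack([Wv @ n_i, Wv @ v_i1])     # values from the same two tokens
    scores = keys @ q / np.sqrt(d)             # scaled dot-product scores
    w = np.exp(scores - scores.max())
    w = w / w.sum()                            # softmax attention weights
    return w @ vals                            # b-dimensional memory state m_i,1

Wq = Wk = Wv = np.eye(3)                  # toy parameter values
n_i = np.array([1.0, 0.0, 0.0])           # neighbor-structure representation
v_i1 = np.array([0.0, 1.0, 0.0])          # first frame-level feature
m_i1 = embed_neighbor_state(n_i, v_i1, Wq, Wk, Wv, d=3)
```

With identity projections, m_i1 is a convex combination of n_i and v_i1, illustrating how the memory state blends neighbor and frame information.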
The conversion module 104 is configured to convert, through the fully connected layer with activation function, the deep representation of each training video into a binary code.
As a possible implementation, the deep representation of the corresponding training video is converted through the fully connected layer with activation function, and the resulting binary code is: bi = sgn(ti); wherein ti = FC(hi,M, k); FC denotes the fully connected layer mapping; sgn denotes the sign function, sgn(ti) being 1 when ti is greater than 0 and -1 when ti is less than or equal to 0; and k denotes the length of the binary code.
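The binarization bi = sgn(ti), with the stated convention that sgn(t) is -1 for t ≤ 0, can be sketched as follows (the example values are made up):

```python
import numpy as np

def to_binary_code(t):
    # b_i = sgn(t_i): +1 where t_i > 0, -1 where t_i <= 0.
    # (np.sign would map 0 to 0, so np.where enforces the t <= 0 -> -1 rule.)
    return np.where(t > 0, 1, -1)

t = np.array([0.3, -1.2, 0.0, 2.5])   # illustrative relaxed code (made up)
b = to_binary_code(t)                  # -> [ 1, -1, -1,  1]
```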
The reconstruction module 105 is configured to reconstruct, using the decoding network, the M reconstructed frame-level features of each training video from the binary code.
The generation module 106 is configured to generate the reconstruction error function from the frame-level features and the reconstructed frame-level features of each training video, and to generate the neighbor similarity error function from the temporal appearance features and the binary codes.
As a possible implementation, the video training set includes N training videos, and the reconstruction error function is:
wherein the two terms denote, respectively, the m-th frame-level feature and the m-th reconstructed frame-level feature of the i-th training video.
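The reconstruction error function is given only as an equation image in the filing; under the usual reading as a mean squared error between the frame-level features and their reconstructions over all N videos and M frames, a sketch might be (array layout and function name are assumptions):

```python
import numpy as np

def reconstruction_error(frames, recon_frames):
    # frames, recon_frames: (N, M, feat_dim) arrays of frame-level
    # features and their reconstructions. Averages the squared L2
    # distance per frame over all N videos and M frames.
    diff = frames - recon_frames
    return float(np.mean(np.sum(diff ** 2, axis=-1)))

frames = np.zeros((1, 2, 2))     # toy: 1 video, 2 frames, 2-dim features
recon = np.ones((1, 2, 2))
err = reconstruction_error(frames, recon)   # each frame differs by (1,1): error 2.0
```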
As a possible implementation, the neighbor similarity error function is:
wherein sij denotes the similarity between the temporal appearance feature of the i-th training video and the temporal appearance feature of the j-th training video, and ti is the relaxed representation of the binary code bi.
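The neighbor similarity error function is likewise given only as an equation image. One plausible reading, penalizing the gap between the relaxed-code similarity and the target sij, is sketched below (the normalization by the code length k and the squared-error form are assumptions):

```python
import numpy as np

def neighbor_similarity_error(T, S):
    # T: (N, k) matrix of relaxed codes t_i; S: (N, N) binary similarity s_ij.
    # Penalize the squared gap between the normalized relaxed-code inner
    # product t_i^T t_j / k and the target similarity s_ij.
    k = T.shape[1]
    sim = (T @ T.T) / k
    return float(np.mean((sim - S) ** 2))

T = np.array([[1.0,  1.0],
              [1.0, -1.0]])    # relaxed codes, here already binary
S = np.eye(2)                  # each video similar only to itself
loss = neighbor_similarity_error(T, S)   # orthogonal codes match S exactly -> 0.0
```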
The training module 107 is configured to train the network so that the reconstruction error function is minimized and the neighbor similarity error function is minimized; wherein the network includes the encoding network, the fully connected layer, and the decoding network.
It should be noted that the foregoing description of the embodiment of the hash learning method based on neighbor structure preservation also applies to the hash learning device based on neighbor structure preservation of this embodiment, and details are not repeated here.
According to the hash learning device based on neighbor structure preservation of the embodiment of the present invention, a video training set is obtained, and M frame-level features are extracted for each training video in the video training set. Then, using an autoencoder, the temporal appearance feature of each training video is extracted, and the temporal appearance features are clustered to obtain an anchor feature set. Next, for each training video, the corresponding temporal appearance neighbor features are obtained from the anchor feature set, and each training video is encoded, using the encoding network, into a corresponding deep representation according to its temporal appearance neighbor features. Then, through the fully connected layer with activation function, the deep representation of each training video is converted into a binary code, and, using the decoding network, the M reconstructed frame-level features of each training video are reconstructed from the binary code. Subsequently, a reconstruction error function is generated from the frame-level features and the reconstructed frame-level features of each training video, and a neighbor similarity error function is generated from the temporal appearance features and the binary codes. Finally, the network is trained so that the reconstruction error function and the neighbor similarity error function are minimized, wherein the network includes the encoding network, the fully connected layer, and the decoding network. In the present invention, the neighbors of a video are embedded into the encoding network, so that, during the encoding of frame-level video features, content in the video that is similar to its neighbors receives more attention, which can improve retrieval accuracy on large-scale unsupervised video databases. Moreover, minimizing the reconstruction error and the neighbor similarity error ensures that the neighbor structure is fully preserved in Hamming space, further improving retrieval accuracy on video databases.
In the description of this specification, reference to the terms "one embodiment", "some embodiments", "example", "specific example", or "some examples" means that a specific feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic expressions of the above terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, provided they do not contradict each other, those skilled in the art may combine different embodiments or examples described in this specification, as well as the features of different embodiments or examples.
In addition, the terms "first" and "second" are used for descriptive purposes only and shall not be understood as indicating or implying relative importance or implicitly indicating the number of the indicated technical features. Thus, a feature defined with "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "plurality" means at least two, for example two or three, unless otherwise specifically defined.
Any process or method description in a flowchart, or otherwise described herein, may be understood as representing a module, segment, or portion of code comprising one or more executable instructions for implementing custom logic functions or steps of the process, and the scope of the preferred embodiments of the present invention includes additional implementations in which functions may be executed out of the order shown or discussed, including substantially concurrently or in the reverse order according to the functions involved, as should be understood by those skilled in the art to which the embodiments of the present invention belong.
The logic and/or steps represented in the flowcharts or otherwise described herein may, for example, be considered an ordered list of executable instructions for implementing logic functions, and may be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus, or device (such as a computer-based system, a system including a processor, or another system that can fetch and execute instructions from an instruction execution system, apparatus, or device). For the purposes of this specification, a "computer-readable medium" may be any means that can contain, store, communicate, propagate, or transmit a program for use by, or in connection with, an instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium include: an electrical connection having one or more wirings (electronic device), a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program can be printed, since the program can be obtained electronically, for example by optical scanning of the paper or other medium followed by editing, interpretation, or other suitable processing if necessary, and then stored in a computer memory.
It should be understood that the various parts of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods may be implemented by software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one of the following technologies well known in the art, or a combination thereof, may be used: a discrete logic circuit having logic gate circuits for implementing logic functions on data signals, an application-specific integrated circuit having suitable combinational logic gate circuits, a programmable gate array (PGA), a field programmable gate array (FPGA), and the like.
Those of ordinary skill in the art will understand that all or part of the steps carried by the method of the above embodiments may be completed by instructing relevant hardware through a program; the program may be stored in a computer-readable storage medium and, when executed, includes one of or a combination of the steps of the method embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated in one processing module, or each unit may exist alone physically, or two or more units may be integrated in one module. The above integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.
The above-mentioned storage medium may be a read-only memory, a magnetic disk, an optical disc, or the like. Although the embodiments of the present invention have been shown and described above, it should be understood that the above embodiments are exemplary and shall not be construed as limiting the present invention; those of ordinary skill in the art can change, modify, replace, and vary the above embodiments within the scope of the present invention.
Claims (10)
1. A hash learning method based on neighbor structure preservation, characterized in that the method comprises:
S1: obtaining a video training set and, for each training video in the video training set, extracting M frame-level features of the training video;
S2: extracting, using an autoencoder, the temporal appearance feature of each training video, and clustering the temporal appearance features to obtain an anchor feature set;
S3: for each training video, obtaining the corresponding temporal appearance neighbor features from the anchor feature set;
S4: encoding, using an encoding network, each training video into a corresponding deep representation according to its temporal appearance neighbor features;
S5: converting, through a fully connected layer with activation function, the deep representation of each training video into a binary code;
S6: reconstructing, using a decoding network, the M reconstructed frame-level features of each training video from the binary code;
S7: generating a reconstruction error function from the frame-level features and the reconstructed frame-level features of each training video, and generating a neighbor similarity error function from the temporal appearance features and the binary codes;
S8: training a network so that the reconstruction error function is minimized and the neighbor similarity error function is minimized, wherein the network comprises the encoding network, the fully connected layer, and the decoding network.
2. The method according to claim 1, characterized in that each training video has a temporal appearance neighbor features, and step S4 specifically comprises:
S41: concatenating the a temporal appearance neighbor features of each training video column-wise to obtain a first vector;
S42: mapping the first vector into a b-dimensional neighbor structure representation ni, wherein FC denotes the fully connected layer mapping;
S43: for each training video, inputting the first frame-level feature of the training video into the encoding network at the first time step, and embedding the neighbor structure representation ni into the b-dimensional memory state as follows:
wherein d is a fixed value; Wq, Wk and Wv are parameter values of the encoding network; the concatenation operator denotes column-wise merging; the input feature denotes the frame-level feature input at the first time step of the corresponding training video; and mi,1 denotes the memory state at the first time step;
S44: when a new frame-level feature is input into the encoding network, updating the memory state as follows:
wherein 1 < t ≤ M; the input feature denotes the frame-level feature input at time step t; mi,t denotes the memory state at time step t; and mi,t-1 denotes the memory state at time step t-1;
the encoding network is an LSTM network, and each computation unit in the encoding network is:
wherein MLP denotes a multilayer perceptron mapping, BN denotes batch normalization, Wiv, Wih, Wfv, Wfh, Wov and Woh denote parameter values of the encoding network, and ⊙ denotes the element-wise product;
S45: taking the hidden-layer output hi,M at the last time step as the deep representation of the corresponding training video, wherein θ denotes the parameters of the encoding network.
3. The method according to claim 2, characterized in that the deep representation of the corresponding training video is converted through the fully connected layer with activation function, and the resulting binary code is:
bi = sgn(ti);
wherein ti = FC(hi,M, k); FC denotes the fully connected layer mapping; sgn denotes the sign function, sgn(ti) being 1 when ti is greater than 0 and -1 when ti is less than or equal to 0; and k denotes the length of the binary code.
4. The method according to claim 1, characterized in that the video training set includes N training videos, and the reconstruction error function is:
wherein the two terms denote, respectively, the m-th frame-level feature and the m-th reconstructed frame-level feature of the i-th training video.
5. The method according to claim 4, characterized in that the neighbor similarity error function is:
wherein sij denotes the similarity between the temporal appearance feature of the i-th training video and the temporal appearance feature of the j-th training video, and ti is the relaxed representation of the binary code bi.
6. A hash learning device based on neighbor structure preservation, characterized in that the device comprises:
an acquisition module, configured to obtain a video training set and, for each training video in the video training set, extract M frame-level features of the training video;
an extraction module, configured to extract, using an autoencoder, the temporal appearance feature of each training video, and to cluster the temporal appearance features to obtain an anchor feature set;
the acquisition module being further configured to obtain, for each training video, the corresponding temporal appearance neighbor features from the anchor feature set;
an encoding module, configured to encode, using an encoding network, each training video into a corresponding deep representation according to its temporal appearance neighbor features;
a conversion module, configured to convert, through a fully connected layer with activation function, the deep representation of each training video into a binary code;
a reconstruction module, configured to reconstruct, using a decoding network, the M reconstructed frame-level features of each training video from the binary code;
a generation module, configured to generate a reconstruction error function from the frame-level features and the reconstructed frame-level features of each training video, and to generate a neighbor similarity error function from the temporal appearance features and the binary codes;
a training module, configured to train a network so that the reconstruction error function is minimized and the neighbor similarity error function is minimized, wherein the network comprises the encoding network, the fully connected layer, and the decoding network.
7. The device according to claim 6, characterized in that each training video has a temporal appearance neighbor features, and the encoding module is specifically configured to:
concatenate the a temporal appearance neighbor features of each training video column-wise to obtain a first vector;
map the first vector into a b-dimensional neighbor structure representation ni, wherein FC denotes the fully connected layer mapping;
for each training video, input the first frame-level feature of the training video into the encoding network at the first time step, and embed the neighbor structure representation ni into the b-dimensional memory state as follows:
wherein d is a fixed value; Wq, Wk and Wv are parameter values of the encoding network; the concatenation operator denotes column-wise merging; the input feature denotes the frame-level feature input at the first time step of the corresponding training video; and mi,1 denotes the memory state at the first time step;
when a new frame-level feature is input into the encoding network, update the memory state as follows:
wherein 1 < t ≤ M; the input feature denotes the frame-level feature input at time step t; mi,t denotes the memory state at time step t; and mi,t-1 denotes the memory state at time step t-1;
the encoding network is an LSTM network, and each computation unit in the encoding network is:
wherein MLP denotes a multilayer perceptron mapping, BN denotes batch normalization, Wiv, Wih, Wfv, Wfh, Wov and Woh denote parameter values of the encoding network, and ⊙ denotes the element-wise product; and
take the hidden-layer output hi,M at the last time step as the deep representation of the corresponding training video, wherein θ denotes the parameters of the encoding network.
8. The device according to claim 7, characterized in that the deep representation of the corresponding training video is converted through the fully connected layer with activation function, and the resulting binary code is:
bi = sgn(ti);
wherein ti = FC(hi,M, k); FC denotes the fully connected layer mapping; sgn denotes the sign function, sgn(ti) being 1 when ti is greater than 0 and -1 when ti is less than or equal to 0; and k denotes the length of the binary code.
9. The device according to claim 6, characterized in that the video training set includes N training videos, and the reconstruction error function is:
wherein the two terms denote, respectively, the m-th frame-level feature and the m-th reconstructed frame-level feature of the i-th training video.
10. The device according to claim 8, characterized in that the neighbor similarity error function is:
wherein sij denotes the similarity between the temporal appearance feature of the i-th training video and the temporal appearance feature of the j-th training video, and ti is the relaxed representation of the binary code bi.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910264740.9A CN110069666B (en) | 2019-04-03 | 2019-04-03 | Hash learning method and device based on neighbor structure keeping |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110069666A true CN110069666A (en) | 2019-07-30 |
CN110069666B CN110069666B (en) | 2021-04-06 |
Family
ID=67366914
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2012077818A1 (en) * | 2010-12-10 | 2012-06-14 | 国立大学法人豊橋技術科学大学 | Method for determining conversion matrix for hash function, hash-type approximation nearest neighbour search method using said hash function, and device and computer program therefor |
CN103744973A (en) * | 2014-01-11 | 2014-04-23 | 西安电子科技大学 | Video copy detection method based on multi-feature Hash |
CN106777130A (en) * | 2016-12-16 | 2017-05-31 | 西安电子科技大学 | A kind of index generation method, data retrieval method and device |
CN107229757A (en) * | 2017-06-30 | 2017-10-03 | 中国科学院计算技术研究所 | The video retrieval method encoded based on deep learning and Hash |
CN108304808A (en) * | 2018-02-06 | 2018-07-20 | 广东顺德西安交通大学研究院 | A kind of monitor video method for checking object based on space time information Yu depth network |
CN108763481A (en) * | 2018-05-29 | 2018-11-06 | 清华大学深圳研究生院 | A kind of picture geographic positioning and system based on extensive streetscape data |
CN109151501A (en) * | 2018-10-09 | 2019-01-04 | 北京周同科技有限公司 | A kind of video key frame extracting method, device, terminal device and storage medium |
CN109299097A (en) * | 2018-09-27 | 2019-02-01 | 宁波大学 | A kind of online high dimensional data K-NN search method based on Hash study |
CN109409208A (en) * | 2018-09-10 | 2019-03-01 | 东南大学 | A kind of vehicle characteristics extraction and matching process based on video |
Non-Patent Citations (2)
Title |
---|
CHEN, ZHIXIANG; LU, JIWEN: "Nonlinear Structural Hashing for Scalable Video Search", IEEE Transactions on Circuits and Systems for Video Technology * |
LU, JIWEN: "Binary Representation Learning and Its Applications" ("二值表示学习及其应用"), Pattern Recognition and Artificial Intelligence (《模式识别与人工智能》) * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112199520A (en) * | 2020-09-19 | 2021-01-08 | 复旦大学 | Cross-modal Hash retrieval algorithm based on fine-grained similarity matrix |
CN112199520B (en) * | 2020-09-19 | 2022-07-22 | 复旦大学 | Cross-modal Hash retrieval algorithm based on fine-grained similarity matrix |
CN113111836A (en) * | 2021-04-25 | 2021-07-13 | 山东省人工智能研究院 | Video analysis method based on cross-modal Hash learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||
GR01 | Patent grant ||