CN108280233A - A kind of VideoGIS data retrieval method based on deep learning - Google Patents
A VideoGIS data retrieval method based on deep learning
- Publication number: CN108280233A (application CN201810162847.8A)
- Authority: CN (China)
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F16/735: Information retrieval of video data; querying; filtering based on additional data, e.g. user or group profiles
- G06F16/783: Information retrieval of video data; retrieval characterised by metadata automatically derived from the content
- G06N3/045: Neural network architectures; combinations of networks
- G06N3/08: Neural network learning methods
Abstract
The invention discloses a VideoGIS data retrieval method based on deep learning, comprising: first, spatially and temporally sampling the VideoGIS data, computing the Euclidean distance of the VideoGIS frame differences, and extracting key frames from each video shot; next, establishing a deep convolutional neural network model of alternating convolutional, activation, and pooling layers that maps an input VideoGIS frame image layer by layer, realizing a deep feature representation of the VideoGIS frame image; finally, performing a layered retrieval: the first layer is a coarse search using a hashing method and the Hamming distance, and the second layer filters the coarse-search results, realizing the top-m fine retrieval of VideoGIS frame images from the candidate pool. By extracting key frames with the frame-difference Euclidean distance, the invention greatly improves retrieval efficiency; by training a deep convolutional neural network model and extracting higher-level feature representations, it greatly reduces retrieval time and storage overhead.
Description
Technical field
The present invention relates to a deep-learning-based VideoGIS (Geographic Information System) data retrieval method, and belongs to the technical field of computer vision.
Background technology
VideoGIS is that geographical video merges a kind of new video generated with GIS, and the retrieval of the video is to governability and people
People's livelihood work brings huge facility.With the lasting enhancing of application breadth and depth, VideoGIS related industry has become newly
Industry growth point.Meanwhile the raising that the development and city security protection built with smart city require, it is how big from VideoGIS
The data needed for user are accurately found and obtained in data faces a series of bottleneck problems.On the one hand we have had accumulated flood tide
VideoGIS data, and also continuing to throw huge fund creation data, on the other hand, multitude of video GIS data is limited by huge body
Amount and the effective analysis of shortage, limit the breadth and depth of its application.Therefore, to these data be subject to analysis and utilization become for
How key quickly and effectively retrieves oneself required data from these VideoGIS data and becomes and study recently
Hot spot.
Traditional video retrieval approaches are text-keyword-based video retrieval and content-based video retrieval (CBVR). Because of its limited descriptive power, strong subjectivity, heavy annotation workload, and similar drawbacks, text-keyword-based retrieval is of little help in the typical applications above and cannot meet the demand for deep retrieval of VideoGIS data. Content-based video retrieval (CBVR) is the process of retrieving identical or similar video clips or key frames from a video database according to content supplied by the user (an image, etc.). In CBVR, the object of retrieval is often no longer the video data itself but data describing the video's "content", such as color features and texture features.
Video retrieval generally divides into two steps: video preprocessing and feature extraction. The most critical part of video preprocessing is key-frame extraction. A key frame captures the image characteristics of the key content of a video shot; low-level features such as color, texture, and shape can be extracted from key frames to serve as the data source for video summarization and database indexing. Extracting every frame of a video would yield a huge data volume containing repeated and redundant video frames, so key-frame extraction is very important for building a video index.
In feature extraction, traditional video-retrieval feature extraction algorithms (color features, texture features, shape features, etc.) require substantial domain knowledge to describe the features, whereas deep learning simulates the structure of the human brain: using the basic building blocks of a convolutional neural network (convolutional, pooling, and fully connected layers), the network can learn and extract the relevant features by itself. Features extracted with deep learning therefore describe VideoGIS images more accurately, substantially narrowing the retrieval range of VideoGIS data and achieving the goal of accurate, fast retrieval.
In the prior art, video feature data are represented efficiently either with real-valued features or with binary hash codes. The real-valued approach takes the real-valued feature vector extracted from a video frame image as its representation, but because such representations are expensive to compare at retrieval time and occupy considerable storage, they cannot meet large-scale VideoGIS retrieval demands. The binary-hash approach encodes each video frame image as a binary vector; at equal representation length, it requires far less storage than the real-valued approach. For example, if each video feature vector occupies 1024 bytes in the original space, 100 million video features require about 100 GB of storage, whereas if each video feature is represented by a 128-bit hash code, all the video hashes need only about 1.6 GB. Moreover, similar video frame images have similar binary codes, and measuring the similarity between binary codes with the Hamming distance is very fast.
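The storage comparison above can be checked with quick arithmetic, using the patent's own example figures (a 1024-byte real-valued vector versus a 128-bit hash code, over 100 million video features):

```python
# Real-valued features: 1024 bytes per frame, 100 million frames.
real_bytes = 1024 * 100_000_000
print(real_bytes / 1e9)   # 102.4 GB, roughly the "100G" cited above

# 128-bit hash codes: 16 bytes per frame.
hash_bytes = (128 // 8) * 100_000_000
print(hash_bytes / 1e9)   # 1.6 GB, matching the "1.6G" cited above
```

The factor-of-64 reduction (1024 bytes down to 16) is exactly the ratio between the two storage totals.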
Summary of the invention
Goal of the invention: to overcome the deficiencies in the prior art, the present invention provides a deep-learning-based VideoGIS data retrieval method that addresses the difficulty of obtaining accurate retrieval results from VideoGIS data, the large storage consumption, and the slow retrieval speed.
Realizing the foregoing requires solving several key problems: (1) for the repeated and redundant VideoGIS frames present in a VideoGIS library, designing an efficient key-frame extraction method; (2) for the weak expressive power of low-level VideoGIS image features in the prior art, implementing a feature extraction algorithm based on a deep convolutional neural network using deep learning methods; (3) for retrieval speed, designing a layered VideoGIS data retrieval method whose retrieval speed and precision meet the retrieval requirements of large-scale VideoGIS data.
Technical solution: to achieve the above object, the technical solution adopted by the present invention is:
A VideoGIS data retrieval method based on deep learning, characterized by comprising the following steps:
a. Key-frame extraction
To guarantee the validity of the key frames (i.e., that their number suffices to represent the video shot) and the efficiency of VideoGIS data retrieval, and to reflect the temporal characteristics of the video, the present invention spatially and temporally samples the video, computes the Euclidean distance of the VideoGIS frame differences, and extracts key frames from each video shot.
The frame difference between adjacent frames is computed with the Euclidean distance. Normally, the frame differences between VideoGIS frames within the same shot fluctuate around their mean with little variation. Suppose the differences between adjacent frames are (D1, D2, ..., Dn-1), where n is the total number of frames in the shot. Since VideoGIS frames are color images, they must first be converted to grayscale; denoting the converted frames (X[1], X[2], ..., X[n]), formula 1 gives the frame-difference calculation between all VideoGIS frames in the shot.
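Formula 1 itself is not reproduced in this text (it was presumably an image in the original patent). From the surrounding definitions (grayscale frames X[1..n], Euclidean distance between adjacent frames), the frame-difference computation would presumably be:

```latex
D_i = \left\lVert X[i+1] - X[i] \right\rVert_2
    = \sqrt{\sum_{p} \bigl( X[i+1](p) - X[i](p) \bigr)^2},
\qquad i = 1, \dots, n-1
```

where the sum runs over all pixels p of the two grayscale frames.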
It must be noted that because VideoGIS data are high-definition, the key frames have relatively high resolution; subsequent extraction then yields too many key points, feature matching becomes slow, and VideoGIS data retrieval efficiency suffers. Before saving the key frames, the present invention therefore downsamples the shot, reducing the key frames' resolution while preserving the key-frame information as completely as possible.
b. Deep feature extraction
A deep convolutional neural network model of alternating convolutional, activation, and pooling layers is established; the input VideoGIS frame image is mapped layer by layer through the network, each layer yielding a different representation of the VideoGIS frame image, realizing a deep feature representation of the VideoGIS frame image.
c. Layered retrieval
The retrieval process comprises a coarse search and a fine search: first, the high-dimensional feature vectors learned by the deep network model are converted to binary codes, and the similarity between binary codes is measured with the Hamming distance, obtaining a candidate pool of similar key frames; then the similarity between the VideoGIS frame image to be retrieved and the VideoGIS frame images in the candidate pool is measured with the Euclidean distance, finally obtaining the top m similar retrieval results.
Further, the key-frame extraction of step a specifically comprises:
Input: video shot V = {V1, V2, ..., Vn}, number of key frames to select: K;
Output: the key frames of the video;
a1. Compute the difference between adjacent frames using the Euclidean distance, with loop variable i running from 1 to n-2, where n is the total number of frames in the shot;
a2. When i = n-2, all VideoGIS frames of the shot have been traversed: output the Euclidean distances of the VideoGIS frame differences and end the loop; otherwise continue executing a1;
a3. Compute the extrema, maximum, minimum, and median of the frame-difference Euclidean distances;
a4. If an extremum > median, screen it in; otherwise delete the extreme points less than or equal to the median;
a5. If the chosen key-frame count K > the number of screened extreme points, take all screened extrema as key frames; otherwise take the first K frames among the screened extrema as key frames.
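Steps a1-a5 can be sketched as follows. This is a minimal illustration, not the patent's implementation: frames are simplified to flat lists of grayscale pixel values, the function names are hypothetical, and "extrema" is read as strict local maxima of the frame-difference curve.

```python
import math

def frame_diffs(frames):
    """Steps a1-a2: Euclidean distance between each pair of adjacent
    grayscale frames, each frame given as a flat list of pixel values."""
    return [math.dist(a, b) for a, b in zip(frames, frames[1:])]

def select_key_frames(frames, k):
    """Sketch of steps a3-a5: keep local maxima of the frame-difference
    curve that exceed the median, then take up to k of them."""
    d = frame_diffs(frames)
    median = sorted(d)[len(d) // 2]
    # Step a3: local maxima of the difference curve.
    extrema = [i for i in range(1, len(d) - 1)
               if d[i] > d[i - 1] and d[i] > d[i + 1]]
    # Step a4: discard extreme points at or below the median.
    extrema = [i for i in extrema if d[i] > median]
    # Step a5: if K exceeds the number of screened extrema, take them all;
    # otherwise take the first K.
    chosen = extrema if k > len(extrema) else extrema[:k]
    return [frames[i + 1] for i in chosen]  # frame after the i-th difference
```

For example, for a seven-frame shot whose one-pixel frames are [0], [7], [1], [2], [9], [3], [4], the only screened maximum of the difference curve is at the fifth frame ([9]), which is returned as the key frame.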
Further, the deep feature extraction of step b specifically comprises:
b1. Unify the image size before training: using the center-crop method (centerCrop), images are resized uniformly to 224*224; that is, the image is first scaled so that its shorter side becomes 224 (scaling the whole image by that factor), and the longer side is then cropped equally on both sides about the center, retaining a length of 224. This keeps the image undistorted while largely preserving its main subject;
b2. Establish the deep convolutional neural network model: it comprises 5 convolutional sections and 3 fully connected layers; each convolutional section contains 2-3 convolutional layers and ends with a max-pooling layer that reduces the picture size. Each convolutional layer has 3*3 filters followed by the rectified linear unit (ReLU) activation function, whose nonlinear transformation enhances the model's ability to learn features;
b3. Loss function and optimization method: after the model is constructed, it must be trained. The multiclass logarithmic loss (categorical_crossentropy) is chosen as the loss function, and parameters are optimized by stochastic gradient descent to minimize it, with learning rate 0.1, decay 1e-6, momentum 0.9, and the Nesterov momentum gradient optimization algorithm;
b4. Feature extraction based on the model: when extracting features, the image is scaled to the unified size of b1 and fed through the model above while the convolutional neural network is trained, finally yielding a high-dimensional feature vector. In an initial stage, feature extraction is first applied to the VideoGIS key-frame library, generating high-dimensional real values to construct a feature database; when performing VideoGIS data retrieval, feature extraction is applied to the VideoGIS frame image to be retrieved, generating the query feature.
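The scale-then-center-crop geometry of step b1 reduces to simple arithmetic. The sketch below (hypothetical function name, assuming rounding to the nearest pixel after scaling) computes the resize dimensions and the crop box without touching actual image data:

```python
def center_crop_params(width, height, target=224):
    """Sketch of step b1: scale so the shorter side equals `target`,
    then center-crop the longer side down to `target`.
    Returns ((new_w, new_h), (left, top, right, bottom))."""
    scale = target / min(width, height)          # shorter side -> target
    new_w, new_h = round(width * scale), round(height * scale)
    left = (new_w - target) // 2                 # symmetric crop on long side
    top = (new_h - target) // 2
    return (new_w, new_h), (left, top, left + target, top + target)
```

For a 1920x1080 HD VideoGIS frame this gives a resize to 398x224 followed by a crop box of (87, 0, 311, 224), i.e., 87 pixels trimmed from each side of the width.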
Further, the deep convolutional neural network specifically comprises:
Section 1: 2 convolutional layers and one pooling layer. The input is 224 × 224 × 3 image data, processed by convolutional layers with 64 filters of window size 3*3 followed by ReLU activation; the output feature is 224 × 224 × 64, and 2*2 max pooling with stride 2 yields 112 × 112 × 64 data;
Section 2: 2 convolutional layers and one pooling layer. The input is 112 × 112 × 64, processed by convolutional layers with 128 filters of window size 3*3 followed by ReLU activation; the output feature is 112 × 112 × 128, and 2*2 max pooling with stride 2 yields 56 × 56 × 128 data;
Section 3: 3 convolutional layers and one pooling layer. The input is 56 × 56 × 128, processed by convolutional layers with 256 filters of window size 3*3 followed by ReLU activation; the output feature is 56 × 56 × 256, and 2*2 max pooling with stride 2 yields 28 × 28 × 256 data;
Section 4: 3 convolutional layers and one pooling layer. The input is 28 × 28 × 256, processed by convolutional layers with 512 filters of window size 3*3 followed by ReLU activation; the output feature is 28 × 28 × 512, and 2*2 max pooling with stride 2 yields 14 × 14 × 512 data;
Section 5: 3 convolutional layers and one pooling layer. The input is 14 × 14 × 512, processed by convolutional layers with 512 filters of window size 3*3 followed by ReLU activation; the output feature is 14 × 14 × 512, and 2*2 max pooling with stride 2 yields 7 × 7 × 512 data;
Section 6: the input 7 × 7 × 512 data are fully connected to obtain 4096 features, followed by ReLU activation (output feature 4096) and Dropout processing (to prevent overfitting), finally yielding 4096 data;
Section 7: the input 4096 data are fully connected to obtain 4096 features, followed by ReLU activation (output feature 4096) and Dropout processing, finally yielding 4096 data;
Section 8: the input 4096 data are fully connected to obtain 1000 feature data.
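The shape progression of the five convolutional sections above can be traced with a few lines of arithmetic. This sketch assumes, as the stated sizes imply, that the 3*3 convolutions are zero-padded so only the 2*2/stride-2 max pooling changes the spatial size:

```python
def vgg_like_shapes(size=224, channels=(64, 128, 256, 512, 512)):
    """Trace feature-map shapes through the five conv sections described
    above: each entry is the section's convolution output, and pooling
    halves the spatial size; the last entry is the input to section 6."""
    shapes = []
    for c in channels:
        shapes.append((size, size, c))   # after the section's convolutions
        size //= 2                       # after its 2x2/stride-2 max pool
    shapes.append((size, size, channels[-1]))
    return shapes
```

Running it reproduces the sequence in the text: 224 × 224 × 64, 112 × 112 × 128, 56 × 56 × 256, 28 × 28 × 512, 14 × 14 × 512, ending at 7 × 7 × 512, whose 25088 flattened values feed the 4096-unit fully connected layer of section 6.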
Further, the first-layer coarse search specifically comprises:
To retrieve VideoGIS data efficiently, the high-dimensional feature vectors learned by the deep network model are converted to binary codes, and the similarity between binary codes is measured with the Hamming distance, obtaining a candidate pool of similar key frames.
To learn the feature representation and a set of hash functions simultaneously, a new fully connected layer is inserted between sections 7 and 8 of the pre-trained convolutional neural network; this layer uses the sigmoid activation function (the S-shaped growth curve) to convert the feature vector output by section 7 of the model into a binary code. The initial parameters of the deep convolutional neural network are obtained by training on the ImageNet dataset (an existing image database), and the initial parameters of the new fully connected layer build the hash values by way of random projection transforms.
For a VideoGIS frame to be retrieved, the feature output by the new fully connected layer is extracted first, and the binary code is obtained by thresholding (binarizing) the activations; finally, those VideoGIS frame images in the feature database whose binary codes lie within a given Hamming-distance threshold of the binary code of the VideoGIS frame to be retrieved are placed into the candidate pool.
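The binarization and Hamming-distance filtering just described can be sketched as follows. The function names are hypothetical, and the sigmoid activations are assumed to arrive as plain floats in [0, 1] (the 0.5 threshold is the one the embodiment section gives later):

```python
def binarize(activations, threshold=0.5):
    """Turn the sigmoid outputs of the inserted hash layer into a binary code."""
    return tuple(1 if a > threshold else 0 for a in activations)

def hamming(code_a, code_b):
    """Number of bit positions where the two binary codes differ."""
    return sum(x != y for x, y in zip(code_a, code_b))

def coarse_search(query_code, database_codes, max_dist):
    """First retrieval layer: every database frame whose binary code lies
    within max_dist Hamming distance of the query enters the candidate pool."""
    return [i for i, code in enumerate(database_codes)
            if hamming(query_code, code) <= max_dist]
```

For example, a query with activations [0.9, 0.2, 0.7, 0.4] binarizes to (1, 0, 1, 0); with a threshold of 1, database codes at distance 0 or 1 enter the pool while dissimilar codes are excluded.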
Further, the second-layer fine search specifically comprises:
In the coarse search, those VideoGIS frame images whose binary hash codes lie within a given Hamming-distance threshold are placed into the candidate pool; to obtain more accurate retrieval results, a precise retrieval method is further applied on top of the coarse search.
For the VideoGIS frame image to be retrieved and the candidate-pool images obtained in the coarse search, the similarity between them is computed with the Euclidean distance over the features extracted from section 7 of the convolutional neural network, so as to determine the top m retrieval results among the VideoGIS frame images in the candidate pool. The smaller the Euclidean distance, the higher the similarity of two images; the top m similar retrieval results are thereby determined.
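The fine-search ranking reduces to a nearest-neighbor sort over the candidate pool. A minimal sketch (hypothetical function name; features stand in for the 4096-dimensional section-7 vectors):

```python
import math

def fine_search(query_feat, pool_feats, m):
    """Second retrieval layer: rank candidate-pool frames by Euclidean
    distance from the query's section-7 feature and return the indices
    of the top m closest frames."""
    ranked = sorted(range(len(pool_feats)),
                    key=lambda i: math.dist(query_feat, pool_feats[i]))
    return ranked[:m]
```

With a query feature [0, 0] and a pool [[3, 4], [1, 1], [0, 1]], the distances are 5, about 1.41, and 1, so the top-2 result is the third and then the second pool entry.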
Advantageous effects: compared with the prior art, the VideoGIS data retrieval method based on deep learning provided by the invention has the following advantages: 1. because key-frame extraction uses the Euclidean distance of frame differences, the problem of repeated and redundant VideoGIS frames in the VideoGIS library is well solved, memory occupancy is reduced, and video indexing is accelerated; 2. because features are extracted with a deep convolutional neural network, the feature vectors describe VideoGIS frame images more accurately, show good experimental results, and realize feature extraction for every key frame in large-scale VideoGIS data; 3. in the layered retrieval, the binary-hash idea raises retrieval speed while guaranteeing precision, achieving both speed and accuracy and meeting the retrieval requirements of large-scale VideoGIS data.
Description of the drawings
Fig. 1 is a flow chart of a VideoGIS data retrieval method based on deep learning according to the present invention;
Fig. 2 is a flow chart of the key-frame extraction of step a in the present invention;
Fig. 3 is a structural diagram of the deep convolutional neural network model in the present invention.
Detailed description of embodiments
The present invention is further described below in conjunction with the accompanying drawings.
As shown in Fig. 1, a VideoGIS data retrieval method based on deep learning mainly comprises the following steps:
a. Key-frame extraction
VideoGIS data contain much repeated and redundant information. Without preprocessing, the VideoGIS data volume would be very large and retrieval efficiency would drop sharply. For example, VideoGIS data may contain static pictures, so extracting every frame of the video would produce repeated or redundant VideoGIS frames. We therefore first preprocess the VideoGIS data, segment it into shots, and select the valuable information representing the main content of each video shot, i.e., the key frames.
At the same time, because VideoGIS data are high-definition, the key frames have relatively high resolution; subsequent extraction then yields too many key points, feature matching is slow, and VideoGIS retrieval efficiency suffers. The present invention therefore first downsamples the shot before saving the key frames, reducing their resolution while preserving the key-frame information as completely as possible. During key-frame extraction, the color VideoGIS frame images are converted to grayscale images, and the Euclidean distances between adjacent frame differences are computed to obtain the key frames of the VideoGIS data and build the key-frame library.
Fig. 2 shows the flow chart of key-frame extraction, with the following steps:
Input: video shot V = {V1, V2, ..., Vn}, number of key frames to select: K = 5;
Output: the key frames of the video;
a1. Compute the difference between adjacent frames using the Euclidean distance, with loop variable i running from 1 to n-2, where n is the total number of frames in the shot;
a2. When i = n-2, all VideoGIS frames of the shot have been traversed: output the Euclidean distances of the VideoGIS frame differences and end the loop; otherwise continue executing a1;
a3. Compute the extrema, maximum, minimum, and median of the frame-difference Euclidean distances;
a4. If an extremum > median, screen it in; otherwise delete the extreme points less than or equal to the median;
a5. If the chosen key-frame count K > the number of screened extreme points, take all screened extrema as key frames; otherwise take the first K frames among the screened extrema as key frames.
b. Deep feature extraction
A deep network has strong feature-abstraction power and can extract feature representations of VideoGIS data that are rich in semantic information. Therefore, to make the obtained hash codes more discriminative, deep feature extraction is used to obtain the deep feature representation of the VideoGIS data.
The present invention describes the features of VideoGIS frame images with the VGGNet (deep convolutional neural network) architecture. The deep feature extraction method is designed as 5 convolutional sections, including the attached pooling and nonlinear activation layers, with a global pooling layer added after the last convolutional layer to quantize the features, as shown in Fig. 3. It specifically comprises:
b1. Unify the image size before training: using the centerCrop method, images are resized uniformly to 224*224; the image is first scaled so that its shorter side becomes 224 (scaling the whole image by that factor), and the longer side is then cropped equally on both sides about the center, retaining a size of 224;
b2. Establish the deep convolutional neural network model: 5 convolutional sections and 3 fully connected layers, each section containing 2-3 convolutional layers and ending with a max-pooling layer that reduces the picture size; each convolutional layer has 3*3 filters followed by the ReLU activation function, whose nonlinear transformation enhances the model's ability to learn features;
b3. Loss function and optimization method: after the model is constructed, it must be trained. The categorical_crossentropy loss function is chosen, and parameters are optimized by stochastic gradient descent to minimize it, with learning rate 0.1, decay 1e-6, momentum 0.9, and the Nesterov momentum gradient optimization algorithm;
b4. Feature extraction based on the model: when extracting features, the image is scaled to the unified size of b1 and fed through the model above while the convolutional neural network is trained, finally yielding a high-dimensional feature vector; in an initial stage, feature extraction is first applied to the VideoGIS key-frame library, generating high-dimensional real values to construct a feature database; when performing VideoGIS data retrieval, feature extraction is applied to the VideoGIS frame image to be retrieved, generating the query feature.
Wherein the deep convolutional neural network model specifically comprises:
Section 1: 2 convolutional layers and one pooling layer. The input is 224 × 224 × 3 image data, processed by convolutional layers with 64 filters of window size 3*3 followed by ReLU activation; the output feature is 224 × 224 × 64, and 2*2 max pooling with stride 2 yields 112 × 112 × 64 data;
Section 2: 2 convolutional layers and one pooling layer. The input is 112 × 112 × 64, processed by convolutional layers with 128 filters of window size 3*3 followed by ReLU activation; the output feature is 112 × 112 × 128, and 2*2 max pooling with stride 2 yields 56 × 56 × 128 data;
Section 3: 3 convolutional layers and one pooling layer. The input is 56 × 56 × 128, processed by convolutional layers with 256 filters of window size 3*3 followed by ReLU activation; the output feature is 56 × 56 × 256, and 2*2 max pooling with stride 2 yields 28 × 28 × 256 data;
Section 4: 3 convolutional layers and one pooling layer. The input is 28 × 28 × 256, processed by convolutional layers with 512 filters of window size 3*3 followed by ReLU activation; the output feature is 28 × 28 × 512, and 2*2 max pooling with stride 2 yields 14 × 14 × 512 data;
Section 5: 3 convolutional layers and one pooling layer. The input is 14 × 14 × 512, processed by convolutional layers with 512 filters of window size 3*3 followed by ReLU activation; the output feature is 14 × 14 × 512, and 2*2 max pooling with stride 2 yields 7 × 7 × 512 data;
Section 6: the input 7 × 7 × 512 data are fully connected to obtain 4096 features, followed by ReLU activation (output feature 4096) and Dropout processing, finally yielding 4096 data;
Section 7: the input 4096 data are fully connected to obtain 4096 features, followed by ReLU activation (output feature 4096) and Dropout processing, finally yielding 4096 data;
Section 8: the input 4096 data are fully connected to obtain 1000 feature data.
c. Layered retrieval
The retrieval process divides into two levels: a coarse search and a fine search. The first layer performs a coarse search using a hashing method and the Hamming distance; the second layer filters the results of the first-layer coarse search, realizing the top-m fine retrieval of VideoGIS frame images from the candidate pool.
1) Coarse retrieval with a hashing method and the Hamming distance
For efficient VideoGIS data retrieval, the high-dimensional feature vectors learned by this model are first converted into binary codes; the similarity between binary codes is then measured with the Hamming distance, yielding a candidate pool of similar key frames.
To learn the feature representation and a set of hash functions simultaneously, a new fully connected layer is inserted between the 7th and 8th sections of the pre-trained convolutional neural network. This layer uses a sigmoid activation function (S-shaped growth curve) to convert the feature vectors output by the 7th section of the model into binary codes. The initial parameters of the deep convolutional neural network are obtained by training on the ImageNet dataset, while the initial parameters of the new fully connected layer are constructed as hash values by means of random projection.
For a VideoGIS frame to be retrieved, the features output by the new fully connected layer are extracted first, and the binary code is obtained by thresholding the activations, with a threshold of 0.5. Finally, the VideoGIS frame images whose binary codes in the feature database lie within a given Hamming-distance threshold of the binary code of the frame to be retrieved are placed into the candidate pool.
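The thresholding and Hamming-distance filtering described above can be sketched in a few lines. This is a minimal illustration, not the patent's implementation; the function names (`binarize`, `hamming`, `coarse_search`) are chosen for clarity, and the sigmoid activations of the inserted hash layer are assumed to be already available as plain lists:

```python
def binarize(activations, threshold=0.5):
    """Threshold the sigmoid outputs of the inserted hash layer into a
    binary code; the patent fixes the threshold at 0.5."""
    return [1 if a > threshold else 0 for a in activations]

def hamming(a, b):
    """Hamming distance: the number of bit positions where two codes differ."""
    return sum(x != y for x, y in zip(a, b))

def coarse_search(query_code, db_codes, max_dist):
    """Coarse retrieval: keep the indices of database frames whose binary
    code lies within max_dist of the query's code (the candidate pool)."""
    return [i for i, code in enumerate(db_codes)
            if hamming(query_code, code) <= max_dist]
```

For example, a query activation vector `[0.9, 0.1, 0.8, 0.4]` binarizes to `[1, 0, 1, 0]`, and only database codes within the chosen Hamming radius enter the candidate pool.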
2) Fine retrieval of the top m VideoGIS frame images from the candidate pool
In the coarse retrieval, the VideoGIS frame images whose binary hash codes lie within the Hamming-distance threshold are placed into the candidate pool. To obtain more accurate retrieval results, a precise retrieval step is further applied on the basis of the coarse retrieval.
For the VideoGIS frame image to be retrieved and the candidate-pool images obtained by the coarse retrieval, the similarity between them is computed with the Euclidean distance over the features extracted from the 7th section of the convolutional neural network, so as to determine the top m retrieval results of VideoGIS frame images from the candidate pool. The smaller the Euclidean distance, the higher the similarity of the two images; the top m similar retrieval results are determined accordingly.
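The second-layer ranking can be sketched as follows, assuming the 7th-section feature vectors of the query and the candidate-pool frames are given as lists (the name `refine` and the return of candidate indices are illustrative choices, not from the patent):

```python
import math

def refine(query_feat, candidate_feats, m):
    """Fine retrieval: rank candidate-pool frames by Euclidean distance to
    the query's 7th-section feature vector and return the indices of the
    m nearest (smaller distance = higher similarity)."""
    def euclid(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    order = sorted(range(len(candidate_feats)),
                   key=lambda i: euclid(query_feat, candidate_feats[i]))
    return order[:m]
```

With a query at the origin and candidates at distances 5, 1, and 2, the top-2 result is the candidates at distances 1 and 2, in that order.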
Compared with the prior art, the deep-learning-based VideoGIS data retrieval method provided by the present invention extracts key frames from the frame differences of VideoGIS frames, greatly improving retrieval efficiency; it trains a deep convolutional neural network model to extract higher-level feature representations; and, using the idea of binary hashing, it raises retrieval speed while guaranteeing precision, greatly reducing retrieval time and storage overhead.
The above is only a preferred embodiment of the present invention. It should be pointed out that those of ordinary skill in the art may make various improvements and modifications without departing from the principle of the present invention, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention.
Claims (6)
1. A VideoGIS data retrieval method based on deep learning, characterized by comprising the following steps:
A. Key-frame extraction
After spatial and temporal sampling of the VideoGIS data, the Euclidean distances of the VideoGIS frame differences are computed, and key frames are extracted from the video shots;
B. Depth-feature extraction
A deep convolutional neural network model composed of alternating convolutional layers, activation layers, and pooling layers is established; the input VideoGIS frame images are mapped layer by layer, each layer yielding a different representation of the VideoGIS frame image, thereby realizing a deep feature representation of the VideoGIS frame images;
C. Layered retrieval
The retrieval process comprises coarse retrieval and fine retrieval: in the first layer, the high-dimensional feature vectors learned by the deep convolutional neural network model are converted into binary codes, and the similarity between binary codes is measured with the Hamming distance, obtaining a candidate pool of similar key frames; in the second layer, the similarity between the VideoGIS frame image to be retrieved and the VideoGIS frame images in the candidate pool is measured with the Euclidean distance, finally obtaining the top m similar retrieval results.
2. The VideoGIS data retrieval method based on deep learning according to claim 1, characterized in that the step A, key-frame extraction, specifically comprises:
Input: video shots V={V1, V2, ... Vn}; number of key frames to select: K;
Output: the key frames of the video;
a1. Compute the frame differences of adjacent frames using the Euclidean distance, with a loop variable i set from 1 to n-2, where n is the total number of frames in the shot;
a2. When i = n-2, all VideoGIS frames of the shot have been traversed; output the Euclidean distances of the VideoGIS frame differences and end the loop; otherwise continue executing a1;
a3. Compute the extrema, maximum, minimum, and median of the frame-difference Euclidean distances;
a4. If an extremum is greater than the median, retain it through the screening; otherwise delete the extreme points less than or equal to the median;
a5. If the chosen number of key frames K is greater than the number of screened extreme points, take the screened extrema as the key frames; otherwise take the first K frames among the screened extrema as the key frames.
3. The VideoGIS data retrieval method based on deep learning according to claim 1, characterized in that the step B, depth-feature extraction, specifically comprises:
b1. Unify the image size before training: use the centerCrop method to unify the image size to 224*224, i.e. first scale the whole image by the factor that brings the shorter side to 224, then crop the longer side equally on both sides about the center, retaining a size of 224;
b2. Establish the deep convolutional neural network model: it comprises 5 convolutional sections and 3 fully connected layers; each convolutional section contains 2-3 convolutional layers, and the tail of each section is connected to a max-pooling layer to reduce the image size; every convolutional layer has 3*3 filters and is followed by the ReLU activation function, which performs the nonlinear transformation and enhances the model's ability to learn features;
b3. Loss function and optimization method: after the above model is constructed, it must be trained; the categorical_crossentropy loss function is selected, and the parameters are optimized by stochastic gradient descent to minimize the loss function, with a learning rate of 0.1, a decay term of 1e-6, and momentum of 0.9, using the Nesterov accelerated-gradient optimization algorithm;
b4. Feature extraction based on the model: when extracting features, the images are scaled to the unified size of b1 and fed into the above model for computation while the convolutional neural network is trained, finally obtaining high-dimensional feature vectors. In the initialization stage, feature extraction is first applied to the VideoGIS key-frame library, generating high-dimensional real values and thereby constructing a feature database; when performing VideoGIS data retrieval, feature extraction is applied to the VideoGIS frame image to be retrieved, generating the feature to be retrieved.
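The centerCrop arithmetic of step b1 can be made concrete with a small helper that computes the scaled size and the symmetric crop box (the function name `center_crop_box` and the returned tuple layout are illustrative assumptions; any image library could apply the resulting box):

```python
def center_crop_box(width, height, target=224):
    """Step b1 arithmetic: scale so the shorter side becomes `target`,
    then crop the longer side symmetrically about the centre, leaving a
    `target` x `target` region. Returns the scaled (w, h) and the crop
    box as (left, top, right, bottom) in scaled-image coordinates."""
    scale = target / min(width, height)
    sw, sh = round(width * scale), round(height * scale)
    left = (sw - target) // 2
    top = (sh - target) // 2
    return (sw, sh), (left, top, left + target, top + target)
```

For a 448x224 input the shorter side is already 224, so only the width is cropped: 112 pixels are removed from each side.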
4. The VideoGIS data retrieval method based on deep learning according to claim 3, characterized in that the deep convolutional neural network model specifically comprises:
1st section: 2 convolutional layers and one pooling layer; the input is 224 × 224 × 3 image data, processed by convolutional layers with 64 filters and a 3*3 window, followed by ReLU activation processing; the output feature is 224 × 224 × 64, and max pooling with a 2*2 kernel and stride 2 through the pooling layer yields 112 × 112 × 64 data;
2nd section: 2 convolutional layers and one pooling layer; the input is 112 × 112 × 64, processed by convolutional layers with 128 filters and a 3*3 window, followed by ReLU activation processing; the output feature is 112 × 112 × 128, and max pooling with a 2*2 kernel and stride 2 yields 56 × 56 × 128 data;
3rd section: 3 convolutional layers and one pooling layer; the input is 56 × 56 × 128, processed by convolutional layers with 256 filters and a 3*3 window, followed by ReLU activation processing; the output feature is 56 × 56 × 256, and max pooling with a 2*2 kernel and stride 2 yields 28 × 28 × 256 data;
4th section: 3 convolutional layers and one pooling layer; the input is 28 × 28 × 256, processed by convolutional layers with 512 filters and a 3*3 window, followed by ReLU activation processing; the output feature is 28 × 28 × 512, and max pooling with a 2*2 kernel and stride 2 yields 14 × 14 × 512 data;
5th section: 3 convolutional layers and one pooling layer; the input is 14 × 14 × 512, processed by convolutional layers with 512 filters and a 3*3 window, followed by ReLU activation processing; the output feature is 14 × 14 × 512, and max pooling with a 2*2 kernel and stride 2 yields 7 × 7 × 512 data;
6th section: the input of 7 × 7 × 512 is fully connected to obtain 4096 features, followed by ReLU activation processing; the 4096-dimensional output passes through Dropout processing, finally yielding 4096 values;
7th section: the input of 4096 values is fully connected to obtain 4096 features, followed by ReLU activation processing; the 4096-dimensional output passes through Dropout processing, finally yielding 4096 values;
8th section: the input of 4096 values is fully connected, obtaining 1000 output features.
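The eight sections of claim 4 describe the well-known VGG-16 layout. A shape trace confirms that the stated intermediate sizes are mutually consistent; this sketch assumes 3x3 convolutions with "same" padding (so the convolutions preserve spatial size and only the 2x2 stride-2 pooling halves it), which is the only reading under which the claimed numbers follow:

```python
def vgg16_shapes(size=224):
    """Trace tensor shapes through the eight sections of claim 4:
    five conv sections (3x3 convs, 'same' padding, 2x2 max-pool, stride 2)
    followed by three fully connected sections."""
    shapes = []
    for filters in (64, 128, 256, 512, 512):   # sections 1-5
        size //= 2                             # only the pooling shrinks it
        shapes.append((size, size, filters))
    shapes += [(4096,), (4096,), (1000,)]      # sections 6-8: fc layers
    return shapes
```

Running the trace reproduces the claim's figures: 112 × 112 × 64 after section 1, 7 × 7 × 512 after section 5, and a final 1000-dimensional output.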
5. The VideoGIS data retrieval method based on deep learning according to claim 4, characterized in that the first-layer coarse retrieval specifically comprises:
Between the 7th and 8th sections of the pre-trained deep convolutional neural network model, a new fully connected layer is inserted; this layer uses a sigmoid activation function to convert the feature vectors output by the 7th section of the model into binary codes. The initial parameters of the deep convolutional neural network are obtained by training on the ImageNet dataset, while the initial parameters of the new fully connected layer are constructed as hash values by means of random projection;
For a VideoGIS frame to be retrieved, the features output by the new fully connected layer are extracted first, and the binary code is obtained by thresholding the activations; finally, the VideoGIS frame images whose binary codes in the feature database lie within a given Hamming-distance threshold of the binary code of the frame to be retrieved are placed into the candidate pool.
6. The VideoGIS data retrieval method based on deep learning according to claim 5, characterized in that the second-layer fine retrieval specifically comprises:
For the VideoGIS frame image to be retrieved and the candidate-pool images obtained by the coarse retrieval, the similarity between them is computed with the Euclidean distance over the features extracted from the 7th section of the convolutional neural network, so as to determine the top m retrieval results of VideoGIS frame images from the candidate pool.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810162847.8A CN108280233A (en) | 2018-02-26 | 2018-02-26 | A kind of VideoGIS data retrieval method based on deep learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810162847.8A CN108280233A (en) | 2018-02-26 | 2018-02-26 | A kind of VideoGIS data retrieval method based on deep learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108280233A true CN108280233A (en) | 2018-07-13 |
Family
ID=62808720
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810162847.8A Pending CN108280233A (en) | 2018-02-26 | 2018-02-26 | A kind of VideoGIS data retrieval method based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108280233A (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104156464A (en) * | 2014-08-20 | 2014-11-19 | 中国科学院重庆绿色智能技术研究院 | Micro-video retrieval method and device based on micro-video feature database |
CN105718890A (en) * | 2016-01-22 | 2016-06-29 | 北京大学 | Method for detecting specific videos based on convolution neural network |
CN106227851A (en) * | 2016-07-29 | 2016-12-14 | 汤平 | Based on the image search method searched for by depth of seam division that degree of depth convolutional neural networks is end-to-end |
Non-Patent Citations (1)
Title |
---|
HAIHONG DAI等: "VideoGIS Data Retrieval Based on Multi-feature Fusion", 《2017 12TH INTERNATIONAL CONFERENCE ON INTELLIGENT SYSTEMS AND KNOWLEDGE ENGINEERING (ISKE)》 * |
Cited By (25)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108920720A (en) * | 2018-07-30 | 2018-11-30 | 电子科技大学 | The large-scale image search method accelerated based on depth Hash and GPU |
CN108985457A (en) * | 2018-08-22 | 2018-12-11 | 北京大学 | A kind of deep neural network construction design method inspired by optimization algorithm |
CN108985457B (en) * | 2018-08-22 | 2021-11-19 | 北京大学 | Deep neural network structure design method inspired by optimization algorithm |
CN109492129B (en) * | 2018-10-26 | 2020-08-07 | 武汉理工大学 | Similar video searching method and system based on double-flow neural network |
CN109492129A (en) * | 2018-10-26 | 2019-03-19 | 武汉理工大学 | A kind of similar video searching method and system based on double-current neural network |
CN110163061B (en) * | 2018-11-14 | 2023-04-07 | 腾讯科技(深圳)有限公司 | Method, apparatus, device and computer readable medium for extracting video fingerprint |
CN110163061A (en) * | 2018-11-14 | 2019-08-23 | 腾讯科技(深圳)有限公司 | For extracting the method, apparatus, equipment and computer-readable medium of video finger print |
CN109753582A (en) * | 2018-12-27 | 2019-05-14 | 西北工业大学 | The method of magnanimity photoelectricity ship images quick-searching based on Web and database |
CN109783691A (en) * | 2018-12-29 | 2019-05-21 | 四川远鉴科技有限公司 | A kind of video retrieval method of deep learning and Hash coding |
CN111382287A (en) * | 2018-12-30 | 2020-07-07 | 浙江宇视科技有限公司 | Picture searching method and device, storage medium and electronic equipment |
WO2020147857A1 (en) * | 2019-01-18 | 2020-07-23 | 上海极链网络科技有限公司 | Method and system for extracting, storing and retrieving mass video features |
CN111767204A (en) * | 2019-04-02 | 2020-10-13 | 杭州海康威视数字技术股份有限公司 | Overflow risk detection method, device and equipment |
CN111767204B (en) * | 2019-04-02 | 2024-05-28 | 杭州海康威视数字技术股份有限公司 | Spill risk detection method, device and equipment |
CN110110113A (en) * | 2019-05-20 | 2019-08-09 | 重庆紫光华山智安科技有限公司 | Image search method, system and electronic device |
CN110221979A (en) * | 2019-06-04 | 2019-09-10 | 广州虎牙信息科技有限公司 | Performance test methods, device, equipment and the storage medium of application program |
CN110717068A (en) * | 2019-08-27 | 2020-01-21 | 中山大学 | Video retrieval method based on deep learning |
CN110717068B (en) * | 2019-08-27 | 2023-04-18 | 中山大学 | Video retrieval method based on deep learning |
CN111078993A (en) * | 2019-09-24 | 2020-04-28 | 上海依图网络科技有限公司 | Method and system for improving retrieval recall rate through extended query |
CN112528077A (en) * | 2020-11-10 | 2021-03-19 | 山东大学 | Video face retrieval method and system based on video embedding |
CN112528077B (en) * | 2020-11-10 | 2022-12-16 | 山东大学 | Video face retrieval method and system based on video embedding |
CN113297899A (en) * | 2021-03-23 | 2021-08-24 | 上海理工大学 | Video hash algorithm based on deep learning |
CN113032372B (en) * | 2021-05-24 | 2021-09-28 | 南京北斗创新应用科技研究院有限公司 | ClickHouse database-based space big data management method |
CN113032372A (en) * | 2021-05-24 | 2021-06-25 | 南京北斗创新应用科技研究院有限公司 | ClickHouse database-based space big data management method |
CN117011766A (en) * | 2023-07-26 | 2023-11-07 | 中国信息通信研究院 | Artificial intelligence detection method and system based on intra-frame differentiation |
CN117011766B (en) * | 2023-07-26 | 2024-02-13 | 中国信息通信研究院 | Artificial intelligence detection method and system based on intra-frame differentiation |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108280233A (en) | A kind of VideoGIS data retrieval method based on deep learning | |
CN107506740B (en) | Human body behavior identification method based on three-dimensional convolutional neural network and transfer learning model | |
CN113254648B (en) | Text emotion analysis method based on multilevel graph pooling | |
CN110378334B (en) | Natural scene text recognition method based on two-dimensional feature attention mechanism | |
CN107330364B (en) | A kind of people counting method and system based on cGAN network | |
CN106407352B (en) | Traffic image search method based on deep learning | |
CN108171701B (en) | Significance detection method based on U network and counterstudy | |
CN110134946B (en) | Machine reading understanding method for complex data | |
CN108615036A (en) | A kind of natural scene text recognition method based on convolution attention network | |
CN112949647B (en) | Three-dimensional scene description method and device, electronic equipment and storage medium | |
CN108122035B (en) | End-to-end modeling method and system | |
CN107527318A (en) | A kind of hair style replacing options based on generation confrontation type network model | |
CN107066973A (en) | A kind of video content description method of utilization spatio-temporal attention model | |
CN109543722A (en) | A kind of emotion trend forecasting method based on sentiment analysis model | |
CN106960206A (en) | Character identifying method and character recognition system | |
CN110263659A (en) | A kind of finger vein identification method and system based on triple loss and lightweight network | |
CN104951554B (en) | It is that landscape shines the method for mixing the verse for meeting its artistic conception | |
CN108549658A (en) | A kind of deep learning video answering method and system based on the upper attention mechanism of syntactic analysis tree | |
CN109840322A (en) | It is a kind of based on intensified learning cloze test type reading understand analysis model and method | |
CN104239420A (en) | Video fingerprinting-based video similarity matching method | |
CN107169106A (en) | Video retrieval method, device, storage medium and processor | |
Fu et al. | Machine learning techniques for ontology-based leaf classification | |
CN109886072A (en) | Face character categorizing system based on two-way Ladder structure | |
CN107767416A (en) | The recognition methods of pedestrian's direction in a kind of low-resolution image | |
CN107480723A (en) | Texture Recognition based on partial binary threshold learning network |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20180713 |