CN108280233A - A kind of VideoGIS data retrieval method based on deep learning - Google Patents


Info

Publication number
CN108280233A
CN108280233A (application CN201810162847.8A)
Authority
CN
China
Prior art keywords
videogis
frame
layer
data
retrieval
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810162847.8A
Other languages
Chinese (zh)
Inventor
邹志强
戴海宏
吴家皋
何旭
熊俊杰
索玉聪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Post and Telecommunication University
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing Post and Telecommunication University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications
Priority to CN201810162847.8A
Publication of CN108280233A
Legal status: Pending


Classifications

    • G: Physics
    • G06: Computing; calculating or counting
    • G06F: Electric digital data processing
    • G06F16/00: Information retrieval; database structures therefor; file system structures therefor
    • G06F16/70: Information retrieval of video data
    • G06F16/73: Querying
    • G06F16/735: Filtering based on additional data, e.g. user or group profiles
    • G06F16/78: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783: Retrieval using metadata automatically derived from the content
    • G06N: Computing arrangements based on specific computational models
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G06N3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Library & Information Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a deep-learning-based video GIS data retrieval method, comprising: first, after spatial and temporal sampling of the video GIS data, computing the Euclidean distance of inter-frame differences and extracting key frames from each video shot; then, establishing a deep convolutional neural network model composed of alternating convolutional, activation, and pooling layers, which maps the input video GIS frame images layer by layer to realize a deep feature representation of the frame images; finally, performing hierarchical retrieval: the first layer performs coarse retrieval with a hashing method and the Hamming distance, and the second layer filters the first-layer coarse results to realize top-m fine retrieval of video GIS frame images from the candidate pool. By extracting key frames with the frame-difference Euclidean distance, the present invention greatly improves retrieval efficiency; by training a deep convolutional neural network model to extract higher-level feature representations, it greatly reduces retrieval time and storage overhead.

Description

A video GIS data retrieval method based on deep learning
Technical field
The present invention relates to a deep-learning-based video GIS (Geographic Information System) data retrieval method, and belongs to the technical field of computer vision.
Background technology
Video GIS is a new form of video produced by fusing geographic video with GIS, and its retrieval brings great convenience to government administration and people's daily life. As the breadth and depth of its applications keep growing, video-GIS-related industries have become a new point of industrial growth. Meanwhile, with the development of smart-city construction and the rising requirements of urban security, accurately finding and obtaining the data a user needs from massive video GIS data faces a series of bottleneck problems. On the one hand, an enormous amount of video GIS data has already been accumulated, and huge sums continue to be invested in producing more; on the other hand, the sheer volume of these data and the lack of effective analysis limit the breadth and depth of their application. Analyzing and exploiting these data has therefore become key, and how to quickly and effectively retrieve the required data from video GIS data has become a recent research hotspot.
Traditional video retrieval approaches are text-keyword-based video retrieval and content-based video retrieval (CBVR). Because of limited descriptive power, strong subjectivity, heavy manual workload, and other reasons, text-keyword-based video retrieval is powerless for the typical applications above and cannot meet the demand for deep retrieval of video GIS data. CBVR retrieves identical or similar video clips or key frames from a video database according to content supplied by the user (an image, etc.). In content-based video retrieval, the object of retrieval is often no longer the video data itself, but data describing the "content" of the video, such as color features and texture features.
Video retrieval generally comprises two steps: video preprocessing and feature extraction. The most critical part of video preprocessing is key-frame extraction. A key frame is an image that captures the key content of a video shot; low-level features such as color, texture, and shape can be extracted from key frames and used as the data source for video summarization and database indexing. If every frame of a video were extracted, the data volume would be enormous and would contain repeated and redundant video frames, so key-frame extraction is very important for building a video index.
As for feature extraction, traditional hand-crafted features for video retrieval (color, texture, shape, etc.) require considerable domain knowledge to describe. Deep learning, by contrast, simulates the structure of the human brain: using the basic building blocks of convolutional neural networks (convolutional layers, pooling layers, and fully connected layers), the network can learn and extract the relevant features by itself. Features extracted with deep learning therefore describe video GIS images more accurately, which substantially narrows the retrieval range of video GIS data and achieves the goal of accurate, fast retrieval.
In the prior art, video features can be represented efficiently either as real-valued feature vectors or as binary hash codes. The real-valued method takes the real-valued feature vector extracted from a video frame image as its representation, but comparison at retrieval time is costly and the vectors occupy considerable storage, so it cannot meet the demands of large-scale video GIS data retrieval. The binary-hash method encodes each video frame image as a binary vector; at equal representation length, its storage requirement is far smaller than that of the real-valued method. For example, if one real-valued video feature vector occupies 1024 bytes, then 100 million video features require about 100 GB of storage, whereas if each video feature is represented by a 128-bit hash code, all the video hash codes together need only about 1.6 GB. Moreover, similar video frame images have similar binary codes, and measuring the similarity between binary codes with the Hamming distance is extremely fast.
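The storage figures quoted above, and the Hamming-distance comparison, can be checked with a few lines of stand-alone Python (a sketch for illustration only; the variable names are not from the patent):

```python
# Back-of-the-envelope check of the storage comparison, plus a
# Hamming distance between two binary codes held as Python ints.
n_features = 100_000_000

real_valued_gb = n_features * 1024 / 1e9        # 1024 bytes per real-valued vector
hash_gb = n_features * (128 // 8) / 1e9         # 128 bits = 16 bytes per hash code

print(f"real-valued: {real_valued_gb:.1f} GB")  # ~102.4 GB ("100 GB" in the text)
print(f"128-bit hash: {hash_gb:.1f} GB")        # 1.6 GB

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two codes."""
    return bin(a ^ b).count("1")

print(hamming(0b10110, 0b10011))  # 2 differing bits
```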
Summary of the invention
Object of the invention: to overcome the deficiencies of the prior art, the present invention provides a deep-learning-based video GIS data retrieval method, so as to solve the problems that accurate retrieval results are difficult to obtain in video GIS data retrieval, storage consumption is large, and retrieval speed is slow.
To realize the above, several key problems must be solved: (1) for the repeated and redundant video GIS frames present in the video GIS library, design an efficient key-frame extraction method; (2) for the weak expressive power of low-level video GIS image features in the prior art, use deep learning to realize a feature extraction algorithm based on deep convolutional neural networks; (3) for the problem of retrieval speed, design a hierarchical video GIS data retrieval method that meets the retrieval requirements of large-scale video GIS data in terms of retrieval speed, precision, and so on.
Technical solution: to achieve the above object, the present invention adopts the following technical solution:
A deep-learning-based video GIS data retrieval method, characterized by comprising the following steps:
a. Key-frame extraction
To guarantee the validity of the key frames (i.e., that the number of key frames suffices to represent the video shot) and the efficiency of video GIS data retrieval, and to reflect the temporal characteristics of the video, the present invention computes the Euclidean distance of video GIS inter-frame differences after spatial and temporal sampling of the video, and extracts key frames from each video shot;
The frame difference between consecutive frames is computed with the Euclidean distance. Under normal circumstances, the inter-frame differences of video GIS frames within one shot fluctuate around their mean and vary little. Suppose the inter-frame differences are (D1, D2, ..., Dn-1), where n is the total number of frames in the shot. Since video GIS frames are color images, they must first be converted to grayscale; let the converted frames be (X[1], X[2], ..., X[n]). Formula 1 then gives the frame-difference calculation between all video GIS frames in the shot.
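Formula 1 itself is not reproduced in this text (it appears as an image in the original patent). From the surrounding definitions, a plausible reconstruction is the pixel-wise Euclidean distance between consecutive grayscale frames:

```latex
D_i = \left\| X[i+1] - X[i] \right\|_2
    = \sqrt{\sum_{x,y} \bigl( X[i+1](x,y) - X[i](x,y) \bigr)^2},
\qquad i = 1, \dots, n-1
```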
It should be pointed out that, because video GIS data are high-definition, the key frames have relatively high resolution; this yields too many key points in subsequent extraction, slows feature matching, and hurts video GIS retrieval efficiency. The present invention therefore downsamples the shot before saving the key frames, reducing key-frame resolution while preserving the key-frame information as completely as possible.
b. Deep feature extraction
A deep convolutional neural network model composed of alternating convolutional, activation, and pooling layers is established; the input video GIS frame images are mapped layer by layer through the network, each layer yielding a different representation of the frame image, thereby realizing the deep feature representation of video GIS frame images;
c. Hierarchical retrieval
The retrieval process comprises coarse retrieval and fine retrieval: first, the high-dimensional feature vectors learned by the deep network model are converted to binary codes, and the Hamming distance is used to measure the similarity between binary codes, yielding a candidate pool of similar key frames; then the Euclidean distance is used to measure the similarity between the video GIS frame image to be retrieved and the frame images in the candidate pool, finally producing the top m similar retrieval results.
Further, the key-frame extraction of step a specifically comprises:
Input: video shot V = {V1, V2, ..., Vn}; number of key frames to select: K;
Output: the key frames of the video;
a1. Compute the frame difference between adjacent frames using the Euclidean distance; the loop variable i runs from 1 to n-2, where n is the total number of frames in the shot;
a2. When i = n-2, all video GIS frames of the shot have been traversed; output the Euclidean distances of the frame differences and end the loop; otherwise continue executing a1;
a3. Compute the extrema, maximum, minimum, and median of the frame-difference Euclidean distances;
a4. Retain the extrema greater than the median; the extremum points less than or equal to the median are deleted;
a5. If the chosen key-frame number K is greater than the number of retained extremum points, take all the retained extrema as key frames; otherwise take the first K frames among the retained extrema as key frames.
Further, the deep feature extraction of step b specifically comprises:
b1. Unify the image size before training: use the center-crop method to unify the size to 224*224, i.e., first scale the image proportionally so that its shorter side becomes 224, then crop the longer side symmetrically about the center, retaining a length of 224; this keeps the image undistorted while preserving its main subject;
b2. Establish the deep convolutional neural network model: it comprises 5 convolutional sections and 3 fully connected layers; each convolutional section contains 2-3 convolutional layers, and the tail of each section connects to a max-pooling layer to reduce the picture size; each convolutional layer uses 3*3 filters followed by the Rectified Linear Unit (ReLU) activation function, which performs the nonlinear transformation and enhances the model's ability to learn features;
b3. Loss function and optimization method: after the model is constructed it must be trained; the multi-class logarithmic loss (categorical_crossentropy) is chosen as the loss function, and the parameters are optimized by stochastic gradient descent to minimize it, with learning rate 0.1, decay term 1e-6, and momentum 0.9, using the Nesterov accelerated-gradient optimization algorithm;
b4. Extract features based on the model: when extracting features, images are scaled to the unified size per b1 and fed into the above model for computation while the convolutional neural network is trained, finally obtaining a high-dimensional feature vector. In the initial stage, feature extraction is first performed on the video GIS key-frame library to generate high-dimensional real values, thereby constructing a feature database; when retrieving video GIS data, feature extraction is performed on the video GIS frame image to be retrieved to generate the feature to be retrieved.
Further, the deep convolutional neural network specifically comprises:
Section 1: 2 convolutional layers and one pooling layer. The input is 224 × 224 × 3 image data, processed by convolutional layers with 64 filters of window size 3*3 followed by ReLU activation, giving an output feature of 224 × 224 × 64; max pooling with a 2*2 kernel and stride 2 then yields 112 × 112 × 64 data;
Section 2: 2 convolutional layers and one pooling layer. The input data 112 × 112 × 64 are processed by convolutional layers with 128 filters of window size 3*3 followed by ReLU activation, giving 112 × 112 × 128; max pooling with a 2*2 kernel and stride 2 yields 56 × 56 × 128 data;
Section 3: 3 convolutional layers and one pooling layer. The input data 56 × 56 × 128 are processed by convolutional layers with 256 filters of window size 3*3 followed by ReLU activation, giving 56 × 56 × 256; max pooling with a 2*2 kernel and stride 2 yields 28 × 28 × 256 data;
Section 4: 3 convolutional layers and one pooling layer. The input data 28 × 28 × 256 are processed by convolutional layers with 512 filters of window size 3*3 followed by ReLU activation, giving 28 × 28 × 512; max pooling with a 2*2 kernel and stride 2 yields 14 × 14 × 512 data;
Section 5: 3 convolutional layers and one pooling layer. The input data 14 × 14 × 512 are processed by convolutional layers with 512 filters of window size 3*3 followed by ReLU activation, giving 14 × 14 × 512; max pooling with a 2*2 kernel and stride 2 yields 7 × 7 × 512 data;
Section 6: the input data 7 × 7 × 512 are fully connected to 4096 features, followed by ReLU activation; after Dropout processing (to prevent overfitting of the model), 4096 data are finally obtained;
Section 7: the input data 4096 are fully connected to 4096 features, followed by ReLU activation; after Dropout processing, 4096 data are finally obtained;
Section 8: the input data 4096 are fully connected, obtaining 1000 feature data.
Further, the first-layer coarse retrieval specifically comprises:
To retrieve video GIS data efficiently, the high-dimensional feature vectors learned by the deep network model are converted into binary codes, and the Hamming distance is used to measure the similarity between binary codes, yielding a candidate pool of similar key frames.
To learn the feature representation and a set of hash functions simultaneously, a new fully connected layer is inserted between sections 7 and 8 of the pre-trained convolutional neural network; this layer uses the sigmoid activation function (the S-shaped growth curve) to convert the feature vector output by section 7 of the model into a binary code. The initial parameters of the deep convolutional neural network are obtained by training on the ImageNet dataset (an existing image database), while the initial parameters of the new fully connected layer are built as hash values by means of random projection transforms;
For a video GIS frame to be retrieved, what is extracted first is the output feature of the new fully connected layer, and its binary code is obtained by thresholding (binarizing) the activations; finally, those video GIS frame images in the feature database whose binary codes lie within a given Hamming-distance threshold of the binary code of the frame to be retrieved are placed into the candidate pool.
Further, the second-layer fine retrieval specifically comprises:
In coarse retrieval, the video GIS frame images whose binary hash codes lie within the given Hamming-distance threshold are placed into the candidate pool; to obtain more accurate retrieval results, a precise retrieval method is further applied on the basis of the coarse retrieval.
For the video GIS frame image to be retrieved and the candidate-pool images obtained in coarse retrieval, the similarity between them is computed with the Euclidean distance over the features extracted from section 7 of the convolutional neural network, so as to determine the top m retrieval results among the video GIS frame images in the candidate pool. The smaller the Euclidean distance, the higher the similarity of the two images; the top m similar retrieval results are determined accordingly.
Advantageous effects: compared with the prior art, the deep-learning-based video GIS data retrieval method provided by the invention has the following advantages: 1. because key frames are extracted with the Euclidean distance of frame differences, the problem of repeated and redundant frames in the video GIS library is well solved, memory occupancy is reduced, and video indexing is accelerated; 2. because features are extracted with a deep convolutional neural network, the feature vectors describe video GIS frame images more precisely, give good experimental results, and realize feature extraction for every key frame in large-scale video GIS data; 3. in hierarchical retrieval, the idea of binary hashing improves retrieval speed while guaranteeing precision, achieving retrieval that is both fast and accurate and meeting the retrieval requirements of large-scale video GIS data.
Description of the drawings
Fig. 1 is a flow chart of the deep-learning-based video GIS data retrieval method of the present invention;
Fig. 2 is a flow chart of the key-frame extraction of step a in the present invention;
Fig. 3 is a structure chart of the deep convolutional neural network model in the present invention.
Specific embodiments
The present invention is further described below in conjunction with the accompanying drawings.
As shown in Fig. 1, a deep-learning-based video GIS data retrieval method mainly comprises the following steps:
a. Key-frame extraction
Video GIS data contain much repeated and redundant information; without preprocessing, the data volume would be very large and retrieval efficiency would drop sharply. For example, video GIS data may contain static pictures; if every frame of the video were extracted, there would be repeated or redundant video GIS frames. Therefore the video GIS data must first be preprocessed: shots are segmented and the valuable information representing the main content of each video shot, i.e., the key frames, is selected.
Meanwhile, because video GIS data are high-definition, the key frames have relatively high resolution; this yields too many key points in subsequent extraction, slows feature matching, and hurts video GIS retrieval efficiency. The present invention therefore first downsamples the shot before saving the key frames, reducing key-frame resolution while preserving the key-frame information as completely as possible. In key-frame extraction, the color video GIS frame images are converted to grayscale, and the Euclidean distance between adjacent frame differences is computed, so as to obtain the key frames of the video GIS data and build the key-frame library.
Fig. 2 shows the flow chart of key-frame extraction; the specific steps are as follows:
Input: video shot V = {V1, V2, ..., Vn}; number of key frames to select: K = 5;
Output: the key frames of the video;
a1. Compute the frame difference between adjacent frames using the Euclidean distance; the loop variable i runs from 1 to n-2, where n is the total number of frames in the shot;
a2. When i = n-2, all video GIS frames of the shot have been traversed; output the Euclidean distances of the frame differences and end the loop; otherwise continue executing a1;
a3. Compute the extrema, maximum, minimum, and median of the frame-difference Euclidean distances;
a4. Retain the extrema greater than the median; the extremum points less than or equal to the median are deleted;
a5. If the chosen key-frame number K is greater than the number of retained extremum points, take all the retained extrema as key frames; otherwise take the first K frames among the retained extrema as key frames.
b. Deep feature extraction
Deep networks have strong feature-abstraction capability and can extract semantically rich feature representations from video GIS data. Therefore, to make the obtained hash codes more discriminative, deep feature extraction is used to obtain the deep feature representation of the video GIS data.
The present invention describes the features of video GIS frame images with the VGGNet (deep convolutional neural network) architecture. The deep feature extraction method is designed as 5 convolutional sections, each with its attached pooling layer and nonlinear activation layer, and a global pooling layer is added behind the last convolutional layer to quantize the features, as shown in Fig. 3. It specifically comprises:
b1. Unify the image size before training: use the center-crop method to unify the size to 224*224, i.e., first scale the image proportionally so that its shorter side becomes 224, then crop the longer side symmetrically about the center, retaining a size of 224;
b2. Establish the deep convolutional neural network model: it comprises 5 convolutional sections and 3 fully connected layers; each convolutional section contains 2-3 convolutional layers, and the tail of each section connects to a max-pooling layer to reduce the picture size; each convolutional layer uses 3*3 filters followed by the ReLU activation function, which performs the nonlinear transformation and enhances the model's ability to learn features;
b3. Loss function and optimization method: after the model is constructed it must be trained; the categorical_crossentropy loss function is chosen, and the parameters are optimized by stochastic gradient descent to minimize the loss, with learning rate 0.1, decay term 1e-6, and momentum 0.9, using the Nesterov accelerated-gradient optimization algorithm;
b4. Extract features based on the model: when extracting features, images are scaled to the unified size per b1 and fed into the above model for computation while the convolutional neural network is trained, finally obtaining a high-dimensional feature vector. In the initial stage, feature extraction is first performed on the video GIS key-frame library to generate high-dimensional real values, thereby constructing a feature database; when retrieving video GIS data, feature extraction is performed on the video GIS frame image to be retrieved to generate the feature to be retrieved.
The deep convolutional neural network model specifically comprises:
Section 1: 2 convolutional layers and one pooling layer. The input is 224 × 224 × 3 image data, processed by convolutional layers with 64 filters of window size 3*3 followed by ReLU activation, giving an output feature of 224 × 224 × 64; max pooling with a 2*2 kernel and stride 2 then yields 112 × 112 × 64 data;
Section 2: 2 convolutional layers and one pooling layer. The input data 112 × 112 × 64 are processed by convolutional layers with 128 filters of window size 3*3 followed by ReLU activation, giving 112 × 112 × 128; max pooling with a 2*2 kernel and stride 2 yields 56 × 56 × 128 data;
Section 3: 3 convolutional layers and one pooling layer. The input data 56 × 56 × 128 are processed by convolutional layers with 256 filters of window size 3*3 followed by ReLU activation, giving 56 × 56 × 256; max pooling with a 2*2 kernel and stride 2 yields 28 × 28 × 256 data;
Section 4: 3 convolutional layers and one pooling layer. The input data 28 × 28 × 256 are processed by convolutional layers with 512 filters of window size 3*3 followed by ReLU activation, giving 28 × 28 × 512; max pooling with a 2*2 kernel and stride 2 yields 14 × 14 × 512 data;
Section 5: 3 convolutional layers and one pooling layer. The input data 14 × 14 × 512 are processed by convolutional layers with 512 filters of window size 3*3 followed by ReLU activation, giving 14 × 14 × 512; max pooling with a 2*2 kernel and stride 2 yields 7 × 7 × 512 data;
Section 6: the input data 7 × 7 × 512 are fully connected to 4096 features, followed by ReLU activation; after Dropout processing, 4096 data are finally obtained;
Section 7: the input data 4096 are fully connected to 4096 features, followed by ReLU activation; after Dropout processing, 4096 data are finally obtained;
Section 8: the input data 4096 are fully connected, obtaining 1000 feature data.
C. Hierarchical retrieval
The retrieval process is divided into two levels: coarse retrieval and fine retrieval. The first level performs coarse retrieval with a hashing method and the Hamming distance; the second level filters the results of the first-level coarse retrieval, realizing fine retrieval of the top m VideoGIS frame images from the candidate pool.
1) Coarse retrieval with a hashing method and the Hamming distance
For efficient VideoGIS data retrieval, the high-dimensional feature vectors learned by this model are first converted into binary codes; the Hamming distance is then used to measure the similarity between binary codes, yielding a candidate pool of similar key frames.
To learn the feature representation and a set of hash functions simultaneously, a new fully connected layer is inserted between the 7th and 8th sections of the pre-trained convolutional neural network; this layer uses the sigmoid activation function (the S-shaped growth curve) to convert the feature vector output by the 7th section of the model into a binary code. The initial parameters of the deep convolutional neural network are obtained by training on the ImageNet dataset, while the initial parameters of the new fully connected layer are built as hash values by means of random projection transforms;
For a VideoGIS frame to be retrieved, the feature output by the new fully connected layer is extracted first, and the binary code is obtained by thresholding the activations, the threshold being 0.5; finally, those VideoGIS frame images in the feature database whose binary codes lie within a given Hamming-distance threshold of the binary code of the frame to be retrieved are placed into the candidate pool.
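As a sketch of the coarse-retrieval step (assuming NumPy arrays of sigmoid activations; the function names are illustrative, not from the patent), binarization at threshold 0.5 and Hamming-distance screening could look like:

```python
import numpy as np

def binarize(activations, threshold=0.5):
    # threshold the sigmoid outputs of the inserted hash layer into a binary code
    return (np.asarray(activations) > threshold).astype(np.uint8)

def hamming(a, b):
    # number of differing bits between two binary codes
    return int(np.count_nonzero(a != b))

def candidate_pool(query_code, db_codes, max_dist):
    # keep the indices of database frames within the Hamming-distance threshold
    return [i for i, code in enumerate(db_codes)
            if hamming(query_code, code) <= max_dist]
```

Because the codes are short bit vectors, this screening is far cheaper than comparing the high-dimensional real-valued features directly.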
2) Fine retrieval of the top m VideoGIS frame images from the candidate pool
In the coarse retrieval, the VideoGIS frame images whose binary hash codes lie within the Hamming-distance threshold are placed into the candidate pool; to obtain more accurate retrieval results, a fine retrieval method is further applied on the basis of the coarse retrieval.
For the VideoGIS frame image to be retrieved and the candidate-pool images obtained in the coarse retrieval, the similarity between them is computed with the Euclidean distance over the features extracted from the 7th section of the convolutional neural network, so as to determine the top m retrieval results for the VideoGIS frame image from the candidate pool. The smaller the Euclidean distance, the higher the similarity between two images; the top m similar retrieval results are determined accordingly.
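The fine-retrieval step admits an equally short sketch (NumPy assumed; the function name is illustrative): rank the candidate-pool features by Euclidean distance to the query feature and keep the m closest.

```python
import numpy as np

def top_m(query_feat, candidate_feats, m):
    # Euclidean distance between the query feature and every candidate feature;
    # a smaller distance means a higher similarity
    dists = np.linalg.norm(np.asarray(candidate_feats) - np.asarray(query_feat), axis=1)
    # indices of the m closest candidates, most similar first
    return np.argsort(dists)[:m].tolist()
```

Since the candidate pool is already small after the Hamming-distance screening, this exhaustive distance computation stays cheap.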
Compared with the prior art, the deep-learning-based VideoGIS data retrieval method provided in the present invention extracts key frames using the frame differences of VideoGIS frames, so that retrieval efficiency is greatly improved; a deep convolutional neural network model is trained to extract higher-level feature representations; meanwhile, the idea of binary hashing raises retrieval speed while preserving precision, so that retrieval time and storage overhead are greatly reduced.
The above is only a preferred embodiment of the present invention. It should be pointed out that, for those of ordinary skill in the art, various improvements and modifications may be made without departing from the principle of the present invention, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention.

Claims (6)

1. A VideoGIS data retrieval method based on deep learning, characterized by comprising the following steps:
A. Key-frame extraction
Spatial and temporal sampling is performed on the VideoGIS data, the Euclidean distances of the VideoGIS frame differences are computed, and key frames are extracted from the video shots;
B. Depth-feature extraction
A deep convolutional neural network model composed of alternating convolutional, activation and pooling layers is established; the input VideoGIS frame image is mapped layer by layer, each layer yielding a different representation of the VideoGIS frame image, thereby realizing a depth-feature representation of the VideoGIS frame image;
C. Hierarchical retrieval
The retrieval process comprises coarse retrieval and fine retrieval: the first level converts the high-dimensional feature vector learned by the deep convolutional neural network model into a binary code, then measures the similarity between binary codes with the Hamming distance, obtaining a candidate pool of similar key frames; the second level measures the similarity between the VideoGIS frame image to be retrieved and the VideoGIS frame images in the candidate pool with the Euclidean distance, finally obtaining the top m similar retrieval results.
2. The VideoGIS data retrieval method based on deep learning according to claim 1, characterized in that step A, key-frame extraction, specifically comprises:
Input: video shot V = {V1, V2, … Vn}, number of key frames to select: K;
Output: the key frames of the video;
a1. Compute the frame differences of adjacent frames using the Euclidean distance, with a loop variable i running from 1 to n-2, where n denotes the total number of frames in the shot;
a2. When i = n-2, all VideoGIS frames of the shot have been traversed: output the Euclidean distances of the VideoGIS frame differences and end the loop; otherwise continue executing a1;
a3. Compute the extrema, maximum, minimum and median of the frame-difference Euclidean distances;
a4. If an extremum is greater than the median, retain it in the screened set; extremum points less than or equal to the median are deleted;
a5. If the chosen number of key frames K is greater than the number of screened extremum points, take the screened extrema as key frames; otherwise, take the first K frames among the screened extrema as key frames.
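One plausible reading of steps a1-a5 can be sketched as follows (a sketch under assumptions, since the screening rule is stated tersely; NumPy assumed, function names illustrative): compute consecutive frame differences, keep local maxima of the difference curve that exceed the median, and cap the selection at K frames.

```python
import numpy as np

def frame_diffs(frames):
    # a1/a2: Euclidean distance between each pair of adjacent frames
    return [float(np.linalg.norm(frames[i + 1].astype(float) - frames[i].astype(float)))
            for i in range(len(frames) - 1)]

def select_keyframes(frames, k):
    diffs = frame_diffs(frames)
    median = float(np.median(diffs))
    # a3/a4: local maxima of the difference curve that exceed the median
    peaks = [i for i in range(1, len(diffs) - 1)
             if diffs[i] > diffs[i - 1] and diffs[i] > diffs[i + 1] and diffs[i] > median]
    # a5: keep at most k key frames, strongest differences first
    peaks.sort(key=lambda i: diffs[i], reverse=True)
    return sorted(peaks[:k])
```

The median screening discards the many near-zero differences between visually similar frames, so only frames at genuine content changes survive as key-frame candidates.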
3. The VideoGIS data retrieval method based on deep learning according to claim 1, characterized in that step B, depth-feature extraction, specifically comprises:
b1. Unify the image size before training: the image size is unified to 224 × 224 by a centerCrop method, i.e., the image is first scaled as a whole so that its shorter side reaches 224, and the longer side is then cropped equally on both sides about the center, retaining a size of 224;
b2. Establish the deep convolutional neural network model: it comprises 5 convolutional sections and 3 fully connected layers; each convolutional section contains 2-3 convolutional layers and ends in a max-pooling layer that reduces the image size; each convolutional layer has 3 × 3 filters and is followed by the ReLU activation function, whose nonlinear transformation enhances the model's ability to learn features;
b3. Loss function and optimization method: after the above model is constructed, it needs to be trained; the categorical_crossentropy loss function is selected, and the parameters are optimized by stochastic gradient descent to minimize the loss function, with a learning rate of 0.1, a decay term of 1e-6 and a momentum of 0.9, using the Nesterov accelerated-gradient optimization algorithm;
b4. Extract features with the model: when extracting features, the images are scaled to a unified size as in b1 and fed into the above model for computation while the convolutional neural network is trained, finally obtaining high-dimensional feature vectors; in the initialization stage, feature extraction is first applied to the VideoGIS key-frame library to generate high-dimensional real-valued vectors, thereby constructing a feature database; when performing VideoGIS data retrieval, feature extraction is applied to the VideoGIS frame image to be retrieved, generating the feature to be retrieved.
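The optimizer named in b3 can be illustrated with a single-parameter Nesterov-momentum update (a sketch under the stated hyperparameters, not the training loop; `grad_fn` is an assumed stand-in for the network gradient, and the 1e-6 learning-rate decay is omitted for brevity):

```python
def nesterov_step(w, v, grad_fn, lr=0.1, momentum=0.9):
    # Nesterov accelerated gradient: evaluate the gradient at the look-ahead point
    lookahead = w + momentum * v
    v_new = momentum * v - lr * grad_fn(lookahead)
    return w + v_new, v_new
```

For the loss w**2 (gradient 2w), one step from w = 1.0 with zero initial velocity moves the parameter to w = 0.8.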
4. The VideoGIS data retrieval method based on deep learning according to claim 3, characterized in that the deep convolutional neural network model specifically comprises:
1st section: comprising 2 convolutional layers and one pooling layer; the input is 224 × 224 × 3 image data, processed by convolutional layers with 64 filters of window size 3×3, followed by ReLU activation, producing a 224 × 224 × 64 output feature map; a pooling layer then applies a 2×2 max-pooling kernel with stride 2, yielding 112 × 112 × 64 data;
2nd section: comprising 2 convolutional layers and one pooling layer; the 112 × 112 × 64 input data are processed by convolutional layers with 128 filters of window size 3×3, followed by ReLU activation, producing a 112 × 112 × 128 output feature map; a pooling layer then applies a 2×2 max-pooling kernel with stride 2, yielding 56 × 56 × 128 data;
3rd section: comprising 3 convolutional layers and one pooling layer; the 56 × 56 × 128 input data are processed by convolutional layers with 256 filters of window size 3×3, followed by ReLU activation, producing a 56 × 56 × 256 output feature map; a pooling layer then applies a 2×2 max-pooling kernel with stride 2, yielding 28 × 28 × 256 data;
4th section: comprising 3 convolutional layers and one pooling layer; the 28 × 28 × 256 input data are processed by convolutional layers with 512 filters of window size 3×3, followed by ReLU activation, producing a 28 × 28 × 512 output feature map; a pooling layer then applies a 2×2 max-pooling kernel with stride 2, yielding 14 × 14 × 512 data;
5th section: comprising 3 convolutional layers and one pooling layer; the 14 × 14 × 512 input data are processed by convolutional layers with 512 filters of window size 3×3, followed by ReLU activation, producing a 14 × 14 × 512 output feature map; a pooling layer then applies a 2×2 max-pooling kernel with stride 2, yielding 7 × 7 × 512 data;
6th section: the 7 × 7 × 512 input data are fully connected to obtain 4096 features, followed by ReLU activation; the 4096-dimensional output then passes through Dropout, finally giving 4096 data;
7th section: the 4096-dimensional input is fully connected to obtain 4096 features, followed by ReLU activation; the 4096-dimensional output then passes through Dropout, finally giving 4096 data;
8th section: the 4096-dimensional input is fully connected to obtain 1000 feature values.
5. The VideoGIS data retrieval method based on deep learning according to claim 4, characterized in that the first-level coarse retrieval specifically comprises:
A new fully connected layer is inserted between the 7th and 8th sections of the pre-trained deep convolutional neural network model; this layer uses the sigmoid activation function to convert the feature vector output by the 7th section of the model into a binary code; the initial parameters of the deep convolutional neural network are obtained by training on the ImageNet dataset, while the initial parameters of the new fully connected layer are built as hash values by means of random projection transforms;
For a VideoGIS frame to be retrieved, the feature output by the new fully connected layer is extracted first, and the binary code is obtained by thresholding the activations; finally, those VideoGIS frame images in the feature database whose binary codes lie within a given Hamming-distance threshold of the binary code of the frame to be retrieved are placed into the candidate pool.
6. The VideoGIS data retrieval method based on deep learning according to claim 5, characterized in that the second-level fine retrieval specifically comprises:
For the VideoGIS frame image to be retrieved and the candidate-pool images obtained in the coarse retrieval, the similarity between them is computed with the Euclidean distance over the features extracted from the 7th section of the convolutional neural network, so as to determine the top m retrieval results for the VideoGIS frame image from the candidate pool.
CN201810162847.8A 2018-02-26 2018-02-26 A kind of VideoGIS data retrieval method based on deep learning Pending CN108280233A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810162847.8A CN108280233A (en) 2018-02-26 2018-02-26 A kind of VideoGIS data retrieval method based on deep learning


Publications (1)

Publication Number Publication Date
CN108280233A true CN108280233A (en) 2018-07-13

Family

ID=62808720

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810162847.8A Pending CN108280233A (en) 2018-02-26 2018-02-26 A kind of VideoGIS data retrieval method based on deep learning

Country Status (1)

Country Link
CN (1) CN108280233A (en)


Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104156464A (en) * 2014-08-20 2014-11-19 中国科学院重庆绿色智能技术研究院 Micro-video retrieval method and device based on micro-video feature database
CN105718890A (en) * 2016-01-22 2016-06-29 北京大学 Method for detecting specific videos based on convolution neural network
CN106227851A (en) * 2016-07-29 2016-12-14 汤平 Based on the image search method searched for by depth of seam division that degree of depth convolutional neural networks is end-to-end


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HAIHONG DAI et al.: "VideoGIS Data Retrieval Based on Multi-feature Fusion", 2017 12th International Conference on Intelligent Systems and Knowledge Engineering (ISKE) *

Cited By (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108920720A (en) * 2018-07-30 2018-11-30 电子科技大学 The large-scale image search method accelerated based on depth Hash and GPU
CN108985457A (en) * 2018-08-22 2018-12-11 北京大学 A kind of deep neural network construction design method inspired by optimization algorithm
CN108985457B (en) * 2018-08-22 2021-11-19 北京大学 Deep neural network structure design method inspired by optimization algorithm
CN109492129B (en) * 2018-10-26 2020-08-07 武汉理工大学 Similar video searching method and system based on double-flow neural network
CN109492129A (en) * 2018-10-26 2019-03-19 武汉理工大学 A kind of similar video searching method and system based on double-current neural network
CN110163061B (en) * 2018-11-14 2023-04-07 腾讯科技(深圳)有限公司 Method, apparatus, device and computer readable medium for extracting video fingerprint
CN110163061A (en) * 2018-11-14 2019-08-23 腾讯科技(深圳)有限公司 For extracting the method, apparatus, equipment and computer-readable medium of video finger print
CN109753582A (en) * 2018-12-27 2019-05-14 西北工业大学 The method of magnanimity photoelectricity ship images quick-searching based on Web and database
CN109783691A (en) * 2018-12-29 2019-05-21 四川远鉴科技有限公司 A kind of video retrieval method of deep learning and Hash coding
CN111382287A (en) * 2018-12-30 2020-07-07 浙江宇视科技有限公司 Picture searching method and device, storage medium and electronic equipment
WO2020147857A1 (en) * 2019-01-18 2020-07-23 上海极链网络科技有限公司 Method and system for extracting, storing and retrieving mass video features
CN111767204A (en) * 2019-04-02 2020-10-13 杭州海康威视数字技术股份有限公司 Overflow risk detection method, device and equipment
CN111767204B (en) * 2019-04-02 2024-05-28 杭州海康威视数字技术股份有限公司 Spill risk detection method, device and equipment
CN110110113A (en) * 2019-05-20 2019-08-09 重庆紫光华山智安科技有限公司 Image search method, system and electronic device
CN110221979A (en) * 2019-06-04 2019-09-10 广州虎牙信息科技有限公司 Performance test methods, device, equipment and the storage medium of application program
CN110717068A (en) * 2019-08-27 2020-01-21 中山大学 Video retrieval method based on deep learning
CN110717068B (en) * 2019-08-27 2023-04-18 中山大学 Video retrieval method based on deep learning
CN111078993A (en) * 2019-09-24 2020-04-28 上海依图网络科技有限公司 Method and system for improving retrieval recall rate through extended query
CN112528077A (en) * 2020-11-10 2021-03-19 山东大学 Video face retrieval method and system based on video embedding
CN112528077B (en) * 2020-11-10 2022-12-16 山东大学 Video face retrieval method and system based on video embedding
CN113297899A (en) * 2021-03-23 2021-08-24 上海理工大学 Video hash algorithm based on deep learning
CN113032372B (en) * 2021-05-24 2021-09-28 南京北斗创新应用科技研究院有限公司 ClickHouse database-based space big data management method
CN113032372A (en) * 2021-05-24 2021-06-25 南京北斗创新应用科技研究院有限公司 ClickHouse database-based space big data management method
CN117011766A (en) * 2023-07-26 2023-11-07 中国信息通信研究院 Artificial intelligence detection method and system based on intra-frame differentiation
CN117011766B (en) * 2023-07-26 2024-02-13 中国信息通信研究院 Artificial intelligence detection method and system based on intra-frame differentiation

Similar Documents

Publication Publication Date Title
CN108280233A (en) A kind of VideoGIS data retrieval method based on deep learning
CN107506740B (en) Human body behavior identification method based on three-dimensional convolutional neural network and transfer learning model
CN113254648B (en) Text emotion analysis method based on multilevel graph pooling
CN110378334B (en) Natural scene text recognition method based on two-dimensional feature attention mechanism
CN107330364B (en) A kind of people counting method and system based on cGAN network
CN106407352B (en) Traffic image search method based on deep learning
CN108171701B (en) Significance detection method based on U network and counterstudy
CN110134946B (en) Machine reading understanding method for complex data
CN108615036A (en) A kind of natural scene text recognition method based on convolution attention network
CN112949647B (en) Three-dimensional scene description method and device, electronic equipment and storage medium
CN108122035B (en) End-to-end modeling method and system
CN107527318A (en) A kind of hair style replacing options based on generation confrontation type network model
CN107066973A (en) A kind of video content description method of utilization spatio-temporal attention model
CN109543722A (en) A kind of emotion trend forecasting method based on sentiment analysis model
CN106960206A (en) Character identifying method and character recognition system
CN110263659A (en) A kind of finger vein identification method and system based on triple loss and lightweight network
CN104951554B (en) It is that landscape shines the method for mixing the verse for meeting its artistic conception
CN108549658A (en) A kind of deep learning video answering method and system based on the upper attention mechanism of syntactic analysis tree
CN109840322A (en) It is a kind of based on intensified learning cloze test type reading understand analysis model and method
CN104239420A (en) Video fingerprinting-based video similarity matching method
CN107169106A (en) Video retrieval method, device, storage medium and processor
Fu et al. Machine learning techniques for ontology-based leaf classification
CN109886072A (en) Face character categorizing system based on two-way Ladder structure
CN107767416A (en) The recognition methods of pedestrian's direction in a kind of low-resolution image
CN107480723A (en) Texture Recognition based on partial binary threshold learning network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180713