CN106354816A - Video image processing method and video image processing device - Google Patents
- Publication number
- CN106354816A CN106354816A CN201610765659.5A CN201610765659A CN106354816A CN 106354816 A CN106354816 A CN 106354816A CN 201610765659 A CN201610765659 A CN 201610765659A CN 106354816 A CN106354816 A CN 106354816A
- Authority
- CN
- China
- Prior art keywords
- video
- image
- destination object
- feature
- retrieved
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/73—Querying
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/78—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/783—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
- G06F16/7837—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using objects detected or recognised in the video content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/48—Matching video sequences
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/52—Surveillance or monitoring of activities, e.g. for recognising suspicious objects
Abstract
The invention provides a video image processing method and a video image processing device. The method includes: acquiring a video image sequence; identifying a target object in the video image frames of the video image sequence; tracking the target object and determining its motion trail; obtaining video structured information based on the target object and its motion trail; and performing target object retrieval and/or video condensation on the video image sequence based on the video structured information. The method and device enable targets to be investigated quickly, which in turn speeds up the solving of cases.
Description
Technical field
The present invention relates to the technical field of image processing, and more particularly to a video image processing method and device.
Background art
As video surveillance systems have matured, video image investigation has become the fourth major crime-solving technique used by public security organs, after forensic science, action-based techniques and network-based investigation. Current video image investigation, however, relies mainly on manpower: large numbers of investigators must search for targets in each video frame. This mode of investigation consumes a great deal of manpower and takes a long time; in other words, the existing mode is laborious and time-consuming, and its investigation results are poor.
Summary of the invention
In view of this, the present invention provides a video image processing method and device to solve the problem that the prior-art video image investigation mode is laborious and time-consuming and therefore slows the solving of cases. The technical solution is as follows:
A video image processing method, the method including:
acquiring a video image sequence;
identifying a target object in the image frames of the video image sequence;
tracking the target object and determining the motion trail of the target object;
obtaining video structured information based on the target object and the motion trail of the target object;
performing target object retrieval based on the video structured information, and/or performing video condensation on the video image sequence.
Wherein identifying the target object in each video frame of the video image sequence includes:
identifying the target object in each video frame image of the video image sequence based on a deep convolutional neural network.
Wherein tracking the target object includes:
tracking the target object with the Lucas-Kanade optical flow tracking algorithm, based on optical flow points extracted from the target object in the video frame images.
Wherein the video structured information includes text information and/or image feature information of the target object, the text information of the target object including attribute information and motion information of the target object.
Performing target object retrieval based on the video structured information then includes:
when a retrieval instruction for text information to be retrieved is received, searching the text information of the target objects based on the text information to be retrieved; or, when a retrieval instruction for an image to be retrieved is received, searching the image feature information of the target objects based on the image to be retrieved; or, when a retrieval instruction for event information to be retrieved is received, searching the text information based on the event to be retrieved and a pre-built event model; and obtaining a retrieval result;
outputting the target object information associated with the retrieval result.
Wherein the image feature information of the target object includes a deep convolution feature and a local feature.
Searching the image feature information of the target objects based on the image to be retrieved and obtaining a retrieval result then includes:
matching the deep convolution feature of the image to be retrieved against the image feature information of the target objects according to a first matching rule, to obtain a candidate feature set;
matching the deep convolution feature and the local feature of the image to be retrieved within the candidate feature set according to a second matching rule, to obtain target image features as the retrieval result.
Wherein matching the deep convolution feature of the image to be retrieved against the image feature information of the target objects according to the first matching rule to obtain the candidate feature set includes:
obtaining the deep convolution feature and the local feature of the image to be retrieved, and binary-coding the deep convolution feature of the image to be retrieved to obtain the binary code of the image to be retrieved;
matching the binary code of the image to be retrieved against the binary code corresponding to each deep convolution feature in the image feature information of the target objects, determining the binary codes whose matching degree with the binary code of the image to be retrieved exceeds a first preset value as target binary codes, and taking the target deep convolution features and target local features corresponding to the target binary codes as the candidate feature set.
Matching the deep convolution feature and the local feature of the image to be retrieved within the candidate feature set according to the second matching rule to obtain target image features as the retrieval result includes:
matching the deep convolution feature of the image to be retrieved against each deep convolution feature in the candidate feature set, matching the local feature of the image to be retrieved against each local feature in the candidate feature set, and taking the image features whose combined matching degree of deep convolution feature and corresponding local feature exceeds a second preset value as the retrieval result.
Outputting the target object information associated with the retrieval result is then specifically:
outputting the target object images associated with the image features whose combined matching degree of deep convolution feature and corresponding local feature exceeds the second preset value, a target object image being an image of the target object extracted in advance from the video frame image in which the target object appears.
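The two-stage matching described above — a coarse pass over binary codes of the deep convolution features, then a fine pass over the full deep and local features — can be sketched as follows. The sign-based binarization, the Hamming-style matching degree, the cosine similarity and the 50/50 weighting are all illustrative assumptions: the patent only specifies the two matching rules and the two preset thresholds, not these concrete choices.

```python
import numpy as np

def binarize(feature):
    """Binary-code a feature vector by sign (one simple coding scheme)."""
    return feature > 0

def coarse_match(query_code, gallery_codes, first_preset=0.75):
    """First matching rule: keep gallery entries whose binary codes agree with
    the query code on more than `first_preset` of the bits."""
    agreement = (gallery_codes == query_code).mean(axis=1)
    return np.nonzero(agreement > first_preset)[0]

def fine_match(query_deep, query_local, cand_deep, cand_local,
               second_preset=0.8, w=0.5):
    """Second matching rule: weighted combination of deep-feature and
    local-feature similarity; keep candidates above `second_preset`."""
    def cos(a, B):
        return (B @ a) / (np.linalg.norm(B, axis=1) * np.linalg.norm(a) + 1e-9)
    combined = w * cos(query_deep, cand_deep) + (1 - w) * cos(query_local, cand_local)
    return np.nonzero(combined > second_preset)[0]
```

In practice the coarse pass over compact binary codes is what makes retrieval fast: the expensive full-feature comparison only runs on the small candidate set.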
Wherein the video structured information includes the image density of each video frame image in the video image sequence, the image density characterizing the presence of target objects in the video frame image.
Obtaining the video structured information based on the target object and its motion trail then includes:
determining structure information of the monitored area in the video image sequence from the positions of the target objects in the video frame images and the motion trails of the target objects, the structure information of the monitored area including information on the areas of the monitored area in which the target objects appear;
determining the image density of each video frame image in the video image sequence based on the structure information of the monitored area.
Wherein performing video condensation on the video image sequence based on the video structured information includes:
segmenting the video image sequence based on the image density of each video frame image, and determining video segments to be condensed from the resulting segments;
performing video condensation on the video segments to be condensed, merging the condensed video segments with the video segments on which no condensation was performed, and obtaining a condensed video image sequence.
Wherein segmenting the video image sequence based on the image density and determining the video segments to be condensed includes:
dividing the video image sequence into a plurality of video segments by comparing the image density of each video frame image against a preset image density threshold, each video segment including a plurality of consecutive video frame images;
determining the video segments in which the image density of every video frame image exceeds the image density threshold as the video segments to be condensed.
Wherein performing video condensation on the video segments to be condensed includes:
determining, through a spatio-temporal condensation model, an optimal shift strategy for moving at least one target object in the video segment to be condensed along the time dimension and the spatial dimension;
performing image fusion based on the optimal shift strategy to obtain the condensed video segment.
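The density-based segmentation step can be sketched as follows. Grouping consecutive frames by whether their image density exceeds the preset threshold is an assumed reading of the segmentation criterion; the patent states the threshold comparison but not the exact boundary rule.

```python
def split_by_density(densities, threshold):
    """Split a list of per-frame image densities into segments of consecutive
    frames that are all above, or all at/below, the threshold. Returns
    (start, end_exclusive, to_condense) triples; segments marked True are the
    ones whose every frame exceeds the threshold, i.e. the ones to condense."""
    segments = []
    start = 0
    for i in range(1, len(densities) + 1):
        at_end = i == len(densities)
        if at_end or (densities[i] > threshold) != (densities[start] > threshold):
            segments.append((start, i, densities[start] > threshold))
            start = i
    return segments
```

Only the high-density segments are fed to the spatio-temporal condensation model; low-density segments pass through unchanged and are merged back afterwards.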
A video image processing device, the device including: a video acquisition module, a target recognition module, a target tracking module, a video structured information acquisition module and a processing module;
the video acquisition module being configured to acquire a video image sequence;
the target recognition module being configured to identify a target object in the video image frames of the video image sequence acquired by the video acquisition module;
the target tracking module being configured to track the target object identified by the target recognition module and to determine the motion trail of the target object;
the information acquisition module being configured to obtain video structured information based on the target object identified by the target recognition module and the motion trail determined by the target tracking module;
the processing module being configured to perform target object retrieval based on the video structured information obtained by the information acquisition module, and/or to perform video condensation on the video image sequence.
Wherein the target recognition module is specifically configured to identify the target object in each video frame image of the video image sequence based on a deep convolutional neural network.
Wherein the target tracking module is specifically configured to track the target object with the Lucas-Kanade optical flow tracking algorithm, based on optical flow points extracted from the target object in the video frame images.
Wherein the video structured information includes text information and/or image feature information of the target object, the text information of the target object including attribute information and motion information of the target object;
the processing module includes a retrieval module and an output module;
the retrieval module is configured to: when a retrieval instruction for text information to be retrieved is received, search the text information of the target objects based on the text information to be retrieved; or, when a retrieval instruction for an image to be retrieved is received, search the image feature information of the target objects based on the image to be retrieved; or, when a retrieval instruction for event information to be retrieved is received, search the text information based on the event to be retrieved and a pre-built event model; and obtain a retrieval result;
the output module is configured to output the target object information associated with the retrieval result of the retrieval module.
Wherein the image feature information of the target object includes a deep convolution feature and a local feature;
the retrieval module includes a coarse matching module and a fine matching module;
the coarse matching module is configured to match the deep convolution feature of the image to be retrieved against the image feature information of the target objects according to a first matching rule, to obtain a candidate feature set;
the fine matching module is configured to match the deep convolution feature and the local feature of the image to be retrieved within the candidate feature set according to a second matching rule, to obtain target image features as the retrieval result.
Wherein the coarse matching module includes a feature acquisition and processing submodule and a coarse matching submodule;
the feature acquisition and processing submodule is configured to obtain the deep convolution feature and the local feature of the image to be retrieved and to binary-code the deep convolution feature of the image to be retrieved to obtain the binary code of the image to be retrieved, and is further configured to binary-code each deep convolution feature in the image feature information of the target objects to obtain the binary code corresponding to each deep convolution feature in the image feature information of the target objects;
the coarse matching submodule is configured to match the binary code of the image to be retrieved against the binary code corresponding to each deep convolution feature in the image feature information of the target objects, determine the binary codes whose matching degree with the binary code of the image to be retrieved exceeds a first preset value as target binary codes, and take the target deep convolution features and target local features corresponding to the target binary codes as the candidate feature set.
The fine matching module is specifically configured to match the deep convolution feature of the image to be retrieved against each deep convolution feature in the candidate feature set, match the local feature of the image to be retrieved against each local feature in the candidate feature set, and take the image features whose combined matching degree of deep convolution feature and corresponding local feature exceeds a second preset value as the retrieval result.
The output module is then specifically configured to output the target object images associated with the image features whose combined matching degree exceeds the second preset value, a target object image being an image of the target object extracted in advance from the video frame image in which the target object appears.
Wherein the video structured information includes the image density of each video frame image in the video image sequence, the image density characterizing the presence of target objects in the video frame image;
the information acquisition module includes a monitored-area structure determination submodule and an image density determination submodule;
the monitored-area structure determination submodule is configured to determine structure information of the monitored area in the video image sequence from the positions of the target objects in the video frame images and the motion trails of the target objects, the structure information of the monitored area including information on the areas of the monitored area in which the target objects appear;
the image density determination submodule is configured to determine the image density of each video frame image in the video image sequence based on the structure information of the monitored area determined by the monitored-area structure determination submodule.
Wherein the processing module includes a video preprocessing module and a video condensation module;
the video preprocessing module is configured to segment the video image sequence based on the image density and to determine video segments to be condensed from the segments;
the video condensation module is configured to perform video condensation on the video segments to be condensed, merge the condensed video segments with the video segments on which no condensation was performed, and obtain a condensed video image sequence.
Wherein the video preprocessing module includes a video segmentation submodule and a to-be-condensed segment determination submodule;
the video segmentation submodule is configured to divide the video image sequence into a plurality of video segments by comparing the image density of each video frame image against a preset image density threshold;
the to-be-condensed segment determination submodule is configured to determine the video segments in which the image density of every video frame image exceeds the image density threshold as the video segments to be condensed.
Wherein the video condensation module includes an optimal condensation strategy determination submodule and an image fusion submodule;
the optimal condensation strategy determination submodule is configured to determine, through a spatio-temporal condensation model, an optimal shift strategy for moving at least one target object in the video segment to be condensed along the time dimension and the spatial dimension;
the image fusion submodule is configured to perform image fusion based on the optimal shift strategy to obtain the condensed video segment.
The above technical solutions have the following advantages:
With the video image processing method and device provided by the present invention, target objects in a video image sequence can be identified and tracked, and video structured information can then be obtained based on the target objects and their motion trails. Once the video structured information is obtained, retrieval can be performed based on it, so that targets are investigated quickly. In addition, video condensation can be performed based on the video structured information; since the condensed video contains all the target information of the original video in far fewer frames, targets can also be investigated quickly by browsing the condensed video. That is, if the user knows some information about the target object in advance, that information can be used directly for retrieval, so the target is found quickly; if the user knows nothing about the target object, the condensed video can be browsed directly, again finding the target quickly. The method and device provided by the present invention thus allow targets to be investigated quickly; in other words, the present invention increases the speed of target investigation and in turn the speed at which cases are solved.
Brief description of the drawings
To explain the embodiments of the present invention or the technical solutions of the prior art more clearly, the accompanying drawings required in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow diagram of the video image processing method provided by an embodiment of the present invention;
Fig. 2 is a schematic diagram of performing target detection and generating a series of target candidate boxes in a video frame image, in the video image processing method provided by an embodiment of the present invention;
Fig. 3 is a schematic diagram of the optical flow points extracted from a target object in a video frame image, in the video image processing method provided by an embodiment of the present invention;
Fig. 4 is a flow diagram of a specific implementation of searching the image feature information of target objects based on an image to be retrieved and obtaining a retrieval result, in the video image processing method provided by an embodiment of the present invention;
Fig. 5 is a flow diagram of obtaining video structured information based on a target object and its motion trail, in the video image processing method provided by an embodiment of the present invention;
Fig. 6 is a flow diagram of segmenting a video image sequence based on image density and determining video segments to be condensed from the segments, in the video image processing method provided by an embodiment of the present invention;
Fig. 7 is a flow diagram of performing video condensation on a video segment to be condensed, in the video image processing method provided by an embodiment of the present invention;
Fig. 8 is a structural diagram of the video image processing device provided by an embodiment of the present invention.
Specific embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention, without creative effort, fall within the scope of protection of the present invention.
An embodiment of the present invention provides a video image processing method. Referring to Fig. 1, which shows a flow diagram of the method, it may include:
Step s101: acquire a video image sequence.
Step s102: identify a target object in the video frame images of the video image sequence.
Step s103: track the target object and determine the motion trail of the target object.
Step s104: obtain video structured information based on the target object and its motion trail.
Step s105: perform target object retrieval based on the video structured information and/or perform video condensation on the video image sequence.
With the video image processing method provided by the present invention, the target objects in a video image sequence can be identified and tracked, and video structured information can then be obtained based on the target objects and their motion trails. Once the video structured information is obtained, retrieval can be performed based on it, so that targets are investigated quickly. In addition, video condensation can be performed based on the video structured information; since the condensed video contains the information of the original video in far fewer frames, suspicious objects can be investigated quickly by browsing the condensed video. That is, if the user knows specific information about the target object in advance, that information can be used directly for retrieval to find the target object quickly; if the user knows nothing about the target object, the condensed video can be browsed directly, again finding the target object quickly. The video image processing method provided by the embodiment of the present invention therefore allows target objects to be found quickly in video; in other words, the embodiment increases the speed of target investigation, speeds up the solving of cases and improves the user experience.
Most traditional target recognition methods use background modeling: a background model of the image is first built, each image is then compared with the background model, and the foreground target is determined from the comparison. This approach, however, adapts poorly to low-contrast and changing-light environments: it often produces many false recognitions of moving targets and often misses static targets. Given these problems of existing recognition methods, and in order to improve subsequent retrieval accuracy, the present invention provides a method for identifying the target object in video frame images based on a deep convolutional neural network. That is, in the above embodiment, the target object is identified in the video frame images of the video image sequence as follows:
target detection is first performed with a target detection model based on a deep convolutional neural network, generating a series of target candidate boxes in the video frame image, as shown in Fig. 2; target recognition is then performed with an object classification model based on a deep convolutional neural network, and the target candidate boxes are corrected.
It should be noted that the deep convolutional neural network must be trained before it is used for recognition. In one possible implementation, given the characteristics of public security surveillance environments and targets (vehicles/people), the AlexNet network structure can be chosen for training: pre-training is performed with the ImageNet 2012 data set, and the network is then fine-tuned with public security surveillance samples. Because different vehicle types differ greatly from one another, the training samples are divided into six broad classes: cars, buses, trucks, tricycles, non-motor vehicles (motorcycles/electric bicycles/bicycles) and pedestrians.
In addition, to improve subsequent recognition speed, the target detection network model and the target classification network model can share convolution features.
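The patent does not spell out how the target candidate boxes are corrected and pruned; one common way to reduce a series of overlapping candidate boxes from a detector to a final set is non-maximum suppression, sketched below as an illustrative assumption rather than the patented procedure.

```python
import numpy as np

def iou(box, boxes):
    """Intersection-over-union of one (x1, y1, x2, y2) box against an array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def nms(boxes, scores, iou_thresh=0.5):
    """Keep the highest-scoring candidate boxes, suppressing heavy overlaps."""
    order = np.argsort(scores)[::-1]   # indices, best score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        order = rest[iou(boxes[i], boxes[rest]) < iou_thresh]
    return keep
```

Here the classification model's scores would come from the shared-feature classification network mentioned above; the 0.5 overlap threshold is a conventional default, not a value from the patent.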
After the target object is identified, it is tracked. When existing optical flow tracking algorithms extract optical flow points, they generally extract them from the entire image; yet target tracking is only concerned with the target object, and optical flow points in other, irrelevant areas interfere with tracking it. To improve tracking speed and accuracy, the present invention tracks the target object with an improved Lucas-Kanade optical flow tracking algorithm: the target object is tracked with the Lucas-Kanade optical flow tracking algorithm based on the optical flow points extracted from the target object (the foreground image) in the video frame images.
Specifically, optical flow points are first extracted from the whole video frame image, optical flow points are then extracted from the foreground image (i.e. the target object in the video frame image), and finally the points of the whole frame that lie outside the foreground image are filtered out based on the points extracted from the foreground image, as shown in Fig. 3. The foreground image is obtained by differencing two adjacent video frames; in the second image of Fig. 3, the white part is the foreground area and the black part is the background area.
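The frame-differencing foreground mask and the filtering of optical flow points can be sketched as follows. The Lucas-Kanade point extraction itself is omitted (in practice it would come from a library such as OpenCV's pyramidal implementation); the per-pixel difference threshold is an illustrative assumption.

```python
import numpy as np

def foreground_mask(prev_frame, cur_frame, diff_thresh=25):
    """Boolean foreground mask from the absolute difference of two adjacent
    grayscale frames, as in the patent's adjacent-frame differencing."""
    diff = np.abs(cur_frame.astype(np.int16) - prev_frame.astype(np.int16))
    return diff > diff_thresh

def filter_flow_points(points, mask):
    """Keep only the optical flow points (x, y) that fall inside the foreground
    mask, discarding points from irrelevant background areas."""
    return [(x, y) for (x, y) in points if mask[y, x]]
```

The filtered point set is then what gets fed into the Lucas-Kanade tracker on the next frame, so background motion no longer pulls the track off the target.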
In the above embodiments, the video structured information obtained from the destination object and the movement locus of the destination object can include: the text information and/or the image feature information of the destination object. The text information of the destination object can include the attribute information and the motion information of the destination object.
As an example, if the destination object is a vehicle, the attribute information of the destination object can include the category, color, license plate number, brand and model of the vehicle, etc., and the motion information of the destination object can be the direction of motion of the vehicle, the position of the vehicle in the video frame image, etc.
After the above video structured information is obtained, retrieval can be performed based on it, so that the destination object can be investigated quickly. There are several implementations of destination object retrieval based on the video structured information.
In one possible implementation, retrieval can be performed based on text: when a retrieval instruction for text information to be retrieved is received, retrieval is performed in the text information of the destination objects based on the text information to be retrieved, a retrieval result is obtained, and the target object information associated with the retrieval result is output. It should be noted that, in the present embodiment, the text information of all destination objects can be composed into a text information library, and text retrieval is performed in this library.
Preferably, the target object information is a destination object image, i.e. an image of the destination object extracted in advance from the video frame image in which the destination object appears; the destination object image is associated with the text information and/or the image feature information in the video structured information.
Specifically, the text information to be retrieved input by the user is obtained, the target text information matching the text information to be retrieved is searched for in the text information of the destination objects, and the destination object image associated with the target text information is output.
As an example, the user inputs a vehicle license plate number in the search interface to perform retrieval; if this license plate number is contained in the text information of a destination object, the image of the vehicle associated with this license plate number can be displayed directly, so that the target is investigated quickly.
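As an illustrative sketch of such a text retrieval (the field names, values and file names below are hypothetical, not taken from the disclosure):

```python
# Hypothetical text-information library: each destination object's text
# information is linked to its extracted destination object image.
records = [
    {"category": "car", "color": "red",  "plate": "ABC-123", "image": "obj_001.jpg"},
    {"category": "bus", "color": "blue", "plate": "XYZ-789", "image": "obj_002.jpg"},
]

def retrieve_by_text(records, query):
    """Return the object images whose text information matches the query."""
    return [r["image"] for r in records
            if any(query == v for k, v in r.items() if k != "image")]

hits = retrieve_by_text(records, "ABC-123")   # -> ["obj_001.jpg"]
```

A real system would index the text information library rather than scan it linearly; the scan here is only for clarity.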
In another possible implementation, retrieval can be rule-based: when a retrieval instruction for event information to be retrieved is received, retrieval is performed in the text information of the destination objects based on the event to be retrieved and a pre-built event model, a retrieval result is obtained, and the target object information associated with the retrieval result is output.
Specifically, the event information to be retrieved input by the user is obtained; the text information related to the event information to be retrieved is looked up in the text information based on the event information to be retrieved; the target text information is determined from the text information related to the event information to be retrieved, where the event information corresponding to the target text information is the event information to be retrieved; and the destination object image associated with the target text information is output.
The event information can be region intrusion, tripwire crossing, loitering, etc. The text information related to the event information to be retrieved can be the position of the destination object in the video frame image, or the movement locus of the destination object; from the change of position or from the movement locus of a destination object it can be determined which event the destination object has undergone. If a destination object has undergone the same event as the event to be retrieved, the image of this destination object is output.
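A minimal sketch of how region-intrusion and tripwire events might be decided from a destination object's positions (the geometry, a vertical tripwire and an axis-aligned region, is an assumption for illustration):

```python
def region_intrusion(trajectory, region):
    """Region-intrusion event: does any trajectory point enter the box?
    trajectory: list of (x, y) positions; region: (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = region
    return any(x0 <= x <= x1 and y0 <= y <= y1 for x, y in trajectory)

def tripwire_crossed(trajectory, x_line):
    """Tripwire event (vertical line at x = x_line): did the object
    appear on both sides of the line during its movement?"""
    sides = {x > x_line for x, _ in trajectory}
    return len(sides) == 2

track = [(0, 5), (4, 5), (9, 5)]
assert region_intrusion(track, (3, 3, 6, 6))   # passes through the region
assert tripwire_crossed(track, 5)              # crosses the line x = 5
```

A pre-built event model would map each event name to such a predicate over positions and movement loci.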
In another possible implementation, retrieval can be performed based on an image: when a retrieval instruction for an image to be retrieved is received, retrieval is performed in the image feature information library of the destination objects based on the image to be retrieved, a retrieval result is obtained, and the target object information associated with the retrieval result is output. It should be noted that, in the present embodiment, the image feature information of all destination objects can be composed into an image feature information library, and image retrieval can be performed in this library based on the features of the image to be retrieved.
Specifically, the image to be retrieved input by the user is obtained first, and the image features of the image to be retrieved are extracted as the image features to be retrieved; based on the image features to be retrieved, the target image features matching them are then searched for in the image feature information, and the destination object image associated with the target image features is output. For the user, when retrieval is performed based on an image, the user only needs to input the image to be retrieved in the search interface to obtain the destination object image matching the clue image.
In a preferred implementation, the image features include deep convolution features and local features. In the present embodiment, the deep convolution features of the image to be retrieved can first be matched by a first matching rule in the image feature information of the destination objects to obtain a candidate feature set; the deep convolution features and local features of the image to be retrieved are then matched by a second matching rule in the candidate feature set to obtain the target image features as the retrieval result, i.e. the final output is the information of the destination object associated with the target image features.
Referring next to Fig. 4, which shows a schematic flowchart of a specific implementation of retrieving in the image feature information of the destination objects based on the image to be retrieved and obtaining a retrieval result, the process may include:
Step s401: obtaining the deep convolution features and the local features of the image to be retrieved, and binary-coding the deep convolution features of the image to be retrieved to obtain the binary-coded features of the image to be retrieved.
It should be noted that the deep convolution features are high-level features extracted by a deep convolutional neural network (CNN), whereas the local features attend to the local attributes of the image and can serve as an auxiliary supplement to the deep convolution features. Here, the local features are SURF features, which are fast and highly robust, and are denoted f_surf.
In a preferred implementation, the deep convolution features can be reduced in dimensionality by PCA to remove redundant features; the deep convolution features after redundancy removal are used as the final deep convolution features for subsequent matching and are denoted f_cnn+pca. The deep convolution features are binary-coded by LSH to generate the binary-coded features, denoted f_cnnh.
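A minimal sketch of the PCA reduction and a random-hyperplane variant of LSH binary coding (the dimensions and bit counts are arbitrary, and the disclosure does not fix which LSH construction is used):

```python
import numpy as np

rng = np.random.default_rng(0)

def pca_reduce(features, dim):
    """f_cnn+pca: project the deep convolution features (n x d) onto the
    top `dim` principal components to remove redundant dimensions."""
    centered = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:dim].T

def lsh_binary_code(features, n_bits):
    """f_cnnh: random-hyperplane LSH; the sign of each projection
    gives one bit of the binary code."""
    planes = rng.standard_normal((features.shape[1], n_bits))
    return (features @ planes > 0).astype(np.uint8)

feats = rng.standard_normal((10, 16))   # 10 toy deep convolution features
reduced = pca_reduce(feats, 4)          # 16 -> 4 dimensions
codes = lsh_binary_code(reduced, 8)     # 8-bit binary codes
```

Nearby features tend to receive nearby codes, which is what makes the coarse Hamming-distance matching below meaningful.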
Step s402: matching the binary-coded features of the image to be retrieved respectively against the binary-coded feature corresponding to each deep convolution feature in the image feature information of the destination objects; determining the binary-coded features whose matching degree with the binary-coded features of the image to be retrieved is greater than a first preset value as target binary-coded features; and taking the target deep convolution features and local features corresponding to the target binary-coded features as the candidate feature set.
When matching binary-coded features, the matching degree can be characterized by a similarity, and the similarity can be obtained by calculating the Hamming distance between the two binary-coded features.
It should be noted that, because the image features are associated with destination object images, determining the candidate feature set is equivalent to determining a candidate image set:
{o1, o2, ..., on} = {oi | si > θh}
where oi denotes the i-th image to be matched, si denotes the similarity between the image to be retrieved and the i-th image to be matched, and θh is the similarity threshold.
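The coarse matching by Hamming distance can be sketched as follows (the normalization of Hamming distance into a similarity, and the value of θh, are assumptions):

```python
import numpy as np

def hamming_similarity(a, b):
    """Similarity in [0, 1] derived from the Hamming distance."""
    return 1.0 - np.count_nonzero(a != b) / len(a)

def coarse_match(query_code, library_codes, theta_h):
    """Keep the indices i whose similarity s_i exceeds theta_h:
    the candidate image set {o_i | s_i > theta_h}."""
    return [i for i, c in enumerate(library_codes)
            if hamming_similarity(query_code, c) > theta_h]

query = np.array([1, 0, 1, 0, 1, 1, 0, 0], dtype=np.uint8)
library = [query.copy(),                                        # identical: s = 1.0
           np.array([1, 0, 1, 0, 1, 0, 0, 0], np.uint8),        # one bit off: s = 0.875
           1 - query]                                           # complement: s = 0.0
candidates = coarse_match(query, library, theta_h=0.8)          # -> [0, 1]
```

Hamming distance on binary codes is cheap (bit operations), which is why it is used for the coarse pass before the exact matching.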
The above process of matching binary-coded features is a coarse matching process; after the coarse matching is completed, exact matching is further performed based on the deep convolution features and the local features.
Step s403: matching the deep convolution features of the image to be retrieved against each deep convolution feature in the candidate feature set, and matching the local features of the image to be retrieved against each local feature in the candidate feature set; taking the image features whose comprehensive matching degree of deep convolution features and corresponding local features is greater than a second preset value as the retrieval result.
The exact matching process calculates similarities using the Euclidean distance; specifically, the similarity is calculated by the following formula:
s(k) = α × s_cnn+pca(k) + β × s_surf(k)
where α and β denote the weights of the similarities calculated from the deep convolution features and from the local features respectively, s_cnn+pca(k) denotes the similarity calculated from the deep convolution features, and s_surf(k) denotes the similarity calculated from the local features.
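A sketch of the comprehensive matching degree s(k); the mapping from Euclidean distance to a similarity in (0, 1] is an assumption, since the text only specifies that Euclidean distance is used, and the weight values are arbitrary:

```python
import numpy as np

def euclidean_similarity(a, b):
    """Map a Euclidean distance to a similarity in (0, 1].
    The 1/(1+d) mapping is an illustrative assumption."""
    return 1.0 / (1.0 + np.linalg.norm(np.asarray(a) - np.asarray(b)))

def fine_match_score(q_cnn, c_cnn, q_surf, c_surf, alpha=0.7, beta=0.3):
    """Comprehensive matching degree: s(k) = alpha*s_cnn+pca(k) + beta*s_surf(k)."""
    return (alpha * euclidean_similarity(q_cnn, c_cnn)
            + beta * euclidean_similarity(q_surf, c_surf))

score = fine_match_score([1.0, 2.0], [1.0, 2.0], [0.5], [0.5])  # identical -> 1.0
```

Candidates whose score exceeds the second preset value form the retrieval result.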
The target object information associated with the retrieval result is then output, namely: the destination object images associated with the image features whose comprehensive matching degree of deep convolution features and corresponding local features is greater than the second preset value.
When the destination object images are output, if there are multiple destination object images meeting the condition, the destination object images can be displayed in order of similarity from high to low.
The above process gives one implementation for increasing the speed of image investigation: the target is obtained by retrieval based on target information. The premise of obtaining the target in this way is that the keywords for retrieval are known in advance, i.e. the user knows part of the information of the target to be investigated beforehand. Sometimes, however, the investigator may know nothing about the target, i.e. there are no keywords available for retrieval; in this case the video image sequence can only be browsed frame by frame. Considering that a video image sequence generally comprises many video frame images, and many of these frames may not contain the information the user cares about, in order to increase the investigation speed, the embodiment of the present invention performs video concentration on the video image frame sequence based on the video structured information, so that the concentrated video frame images have a smaller frame count while still containing a large amount of information.
In the present embodiment, the video structured information may include: the image density of each video frame image in the video image sequence, where the image density is used to characterize the situation of the destination objects in a video frame image.
Referring to Fig. 5, which shows a schematic flowchart of obtaining the video structured information based on the destination object and the movement locus of the destination object in the above embodiments, the process may include:
Step s501: determining the structure information of the monitoring area in the video image sequence from the positions of the destination objects in the video frame images and the movement loci of the destination objects.
Here, the structure information of the monitoring area includes the information of the areas in the monitoring area where destination objects appear.
Step s502: determining the image density of each video frame image in the video image sequence based on the structure information of the monitoring area.
After the above video structured information is obtained, video concentration can be performed on the video image sequence based on this video structured information. To increase the speed of video concentration, the embodiment of the present invention first segments the target video image sequence based on the image density of each video frame image and determines the video segments to be concentrated from the segments; video concentration is then performed on the video segments to be concentrated, and the video segments after video concentration are merged with the other video segments on which video concentration was not performed, obtaining the concentrated video image sequence.
Further, referring to Fig. 6, which shows a schematic flowchart of the process of segmenting the video image sequence based on image density and determining the video segments to be concentrated from the segments, the process may include:
Step s601: dividing the video image sequence into multiple video segments by the image density of each video frame image using a preset image density threshold.
Step s602: determining the video segments in which the image density of every video frame image is less than the image density threshold as the video segments to be concentrated.
As an example, the video image sequence includes 100 video frame images: the image density of each video frame in the first 30 video frame images is less than the set image density threshold, the image density of each video frame in the 31st to 70th video frame images is greater than the set image density threshold, and the image density of each video frame in the 71st to 100th video frame images is less than the set image density threshold. The video image sequence can then be divided into 3 video segments: frames 1-30 are the 1st video segment, frames 31-70 are the 2nd video segment, and frames 71-100 are the 3rd video segment. Since the image densities in frames 1-30 and frames 71-100 are all less than the set image density threshold, these two video segments, frames 1-30 and frames 71-100, are determined as the video segments to be concentrated.
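The segmentation of the example above can be sketched as follows (the density values are made up; only the comparison against the threshold matters):

```python
def segment_by_density(densities, threshold):
    """Split per-frame image densities into maximal runs that are uniformly
    below (to be concentrated) or not below the threshold.
    Returns (start, end, to_concentrate) tuples, frame indices inclusive."""
    segments, start = [], 0
    for i in range(1, len(densities) + 1):
        if i == len(densities) or (densities[i] < threshold) != (densities[start] < threshold):
            segments.append((start, i - 1, densities[start] < threshold))
            start = i
    return segments

# The example from the text: 100 frames, low density in frames 1-30 and
# 71-100, high density in frames 31-70 (0-based indices below).
densities = [0.1] * 30 + [0.9] * 40 + [0.1] * 30
segments = segment_by_density(densities, threshold=0.5)
# -> [(0, 29, True), (30, 69, False), (70, 99, True)]
```
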
After the video segments to be concentrated are determined, video concentration is performed on them. Referring to Fig. 7, which shows a schematic flowchart of the process of performing video concentration on a video segment to be concentrated, the process may include:
Step s701: determining, by a spatio-temporal concentration model, the optimal shift strategy for moving at least one destination object in the video segment to be concentrated along the time dimension and the spatial dimension.
Step s702: performing image fusion based on the optimal shift strategy, obtaining the concentrated video segment.
The spatio-temporal concentration model provided by the embodiment of the present invention performs maximal video concentration in the two dimensions of time and space without losing any target and while preserving the original temporal order of the targets; the concentrated video is collision-free and flicker-free, with good visual effect.
Specifically, the energy function of the spatio-temporal concentration model is characterized as:
e(m) = min{ Σ e_a(b) + α Σ e_c(b, b′) + β Σ e_t(b, b′) }, (b, b′ ∈ B)
where b is the image sequence of a first destination object and b′ is the image sequence of a second destination object, both drawn from the set B of target image sequences. Σ e_a(b) is the activity energy penalty term: if a target of large area is not mapped into the concentrated video, the corresponding penalty value is large, and conversely small; it can be understood as meaning that targets of larger area should preferentially be retained in the concentrated video. e_c(b, b′) is the collision-conflict penalty term, the inner product over the period in which the two trajectories conflict. Concentrating a video means moving the original targets in their temporal and spatial distribution, which inevitably produces situations such as intersection collisions between trajectories and occlusions; for example, if two target sequences share a time period and their trajectories cross, the penalty term is the inner-product operation over the corresponding overlapping region. e_t(b, b′) is the temporal-order penalty term, whose purpose is to keep the chronological order of the events of the original video as far as possible; for example, if in the original video two persons walk one behind the other or talk while walking side by side, this relative relationship should also be reasonably preserved in the concentrated video. e_t(b, b′) = exp(−d(b, b′)/ω), where d(b, b′) denotes the Euclidean distance between the center pixels of the two trajectories in the shared period, and ω is a custom parameter adjusting the event order. It should be noted that the above optimal shift strategy is the strategy of moving the corresponding destination objects in time and space when the value of the energy function of the spatio-temporal concentration model is at its minimum.
Corresponding to the above method, the embodiment of the present invention additionally provides a video image processing device. Referring to Fig. 8, which shows a structural schematic diagram of this device, the device may include: a video acquiring module 801, a target recognition module 802, a target tracking module 803, an information obtaining module 804 and a processing module 805.
The video acquiring module 801 is for obtaining a video image sequence.
The target recognition module 802 is for identifying destination objects in the video frame images of the video image sequence obtained by the video acquiring module 801.
The target tracking module 803 is for tracking the destination objects identified by the target recognition module 802 and determining the movement loci of the destination objects.
The information obtaining module 804 is for obtaining the video structured information based on the destination objects identified by the target recognition module 802 and the movement loci of the destination objects determined by the target tracking module 803.
The processing module 805 is for performing destination object retrieval and/or performing video concentration on the video image sequence based on the video structured information obtained by the information obtaining module 804.
The video image processing device provided by the present invention can identify and track the destination objects in a video image sequence and then obtain the video structured information based on the destination objects and their movement loci. After the video structured information is obtained, retrieval can be performed based on it, by which the target can be investigated quickly; in addition, video concentration can also be performed based on the video structured information, and since the concentrated video contains the information of the original video images in fewer frames, the target can likewise be investigated quickly from the concentrated video. That is, if the user knows some information of the destination object in advance, retrieval can be performed directly using that information to investigate the destination object quickly; if the user does not know any information of the destination object, the concentrated video can be browsed directly to investigate the destination object quickly. With the video image processing device provided by the embodiment of the present invention, destination objects can be investigated quickly from a video; that is, the embodiment of the present invention increases the speed of target investigation and thereby the speed of case detection, giving a better user experience.
In the video image processing device provided by the above embodiment, the target recognition module 802 is specifically for identifying destination objects in each video frame image of the video image sequence based on a deep convolutional neural network.
In the video image processing device provided by the above embodiment, the target tracking module 803 is specifically for tracking the destination objects using the Lucas-Kanade optical flow tracking algorithm based on the optical flow points extracted on the destination objects in the video image frames.
In the video image processing device provided by the above embodiment, the video structured information obtained by the video structured information obtaining module 804 includes: the text information and/or the image feature information of the destination objects, where the text information of a destination object includes the attribute information and the motion information of the destination object.
The processing module 805 then includes: a retrieval module and an output module.
The retrieval module is for: when a retrieval instruction for text information to be retrieved is received, retrieving in the text information of said destination objects based on said text information to be retrieved; or, when a retrieval instruction for an image to be retrieved is received, retrieving in the image feature information of said destination objects based on said image to be retrieved; or, when a retrieval instruction for event information to be retrieved is received, retrieving in said text information based on said event to be retrieved and a pre-built event model; obtaining a retrieval result;
The output module is for outputting the target object information associated with said retrieval result of said retrieval module.
In the above embodiments, the image feature information of a destination object includes deep convolution features and local features associated with said deep convolution features.
The retrieval module may include: a coarse matching module and an exact matching module.
The coarse matching module is for matching, by a first matching rule, the deep convolution features of the image to be retrieved in the image feature information of the destination objects, obtaining a candidate feature set.
The exact matching module is for matching, by a second matching rule, the deep convolution features and the local features of the image to be retrieved in the candidate feature set, obtaining target image features as the retrieval result.
Further, the coarse matching module includes: a feature obtaining and processing submodule and a coarse matching submodule.
The feature obtaining and processing submodule is for obtaining the deep convolution features and the local features of the image to be retrieved and binary-coding the deep convolution features of the image to be retrieved, obtaining the binary-coded features of the image to be retrieved; it is additionally for binary-coding each deep convolution feature in the image feature information of the destination objects respectively, obtaining the binary-coded feature corresponding to each deep convolution feature in the image feature information of the destination objects.
The coarse matching submodule is for matching the binary-coded features of the image to be retrieved respectively against the binary-coded feature corresponding to each deep convolution feature in the image feature information of the destination objects, determining the binary-coded features whose matching degree with the binary-coded features of the image to be retrieved is greater than a first preset value as target binary-coded features, and taking the target deep convolution features and target local features corresponding to the target binary-coded features as the candidate feature set.
The exact matching module is specifically for matching the deep convolution features of the image to be retrieved against each deep convolution feature in the candidate feature set, and matching the local features of said image to be retrieved against each local feature in said candidate feature set, and taking the image features whose comprehensive matching degree of deep convolution features and corresponding local features is greater than a second preset value as the retrieval result.
The output module is then specifically for outputting the destination object images associated with the image features whose comprehensive matching degree of deep convolution features and corresponding local features is greater than the second preset value, where a destination object image is an image of the destination object extracted in advance from the video frame image in which said destination object appears.
In the video image processing device provided by the above embodiment, the video structured information obtained by the video structured information obtaining module 804 includes: the image density of each video frame image in the video image sequence, where the image density is used to characterize the situation of the destination objects in a video frame image.
The video structured information obtaining module then includes: a monitoring area structure determination submodule and an image density determination submodule.
The monitoring area structure determination submodule is for determining the structure information of the monitoring area in the video image sequence from the positions of the destination objects in the video frame images and the movement loci of said destination objects, where the structure information of the monitoring area includes the information of the areas in said monitoring area where said destination objects appear.
The image density determination submodule is for determining the image density of each video frame image in the video image sequence based on the structure information of said monitoring area determined by said monitoring area structure determination submodule.
In the video image processing device provided by the above embodiment, the processing module includes: a video preprocessing module and a video concentration module.
The video preprocessing module is for segmenting the video image sequence based on image density and determining the video segments to be concentrated from the segments.
The video concentration module is for performing video concentration on said video segments to be concentrated, and merging the video segments after video concentration with the other video segments on which video concentration was not performed, obtaining the concentrated video image sequence.
Further, the video preprocessing module includes: a video segmentation submodule and a to-be-concentrated video segment determination submodule.
The video segmentation submodule is for dividing the video image sequence into multiple video segments by the image density of each video frame image using a preset image density threshold;
The to-be-concentrated video segment determination submodule is for determining the video segments in which the image density of every video frame image is less than said image density threshold as the video segments to be concentrated.
Further, the video concentration module includes: an optimal concentration strategy determination submodule and an image fusion submodule.
The optimal concentration strategy determination submodule is for determining, by a spatio-temporal concentration model, the optimal shift strategy for moving at least one destination object in said video segment to be concentrated along the time dimension and the spatial dimension.
The image fusion submodule is for performing image fusion based on said optimal shift strategy, obtaining the concentrated video segment.
Each embodiment in this specification is described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the identical or similar parts the embodiments may be referred to one another.
It should be understood that, in the several embodiments provided herein, the disclosed methods, devices and equipment may be realized in other ways. For example, the device embodiments described above are only schematic; the described division into units is only one division by logical function, and other division modes are possible in actual realization: multiple units or assemblies may be combined or integrated into another system, or some features may be ignored or not executed. Further, the couplings or direct couplings or communication connections between the devices or units shown or discussed may be indirect couplings or communication connections through some communication interfaces, and may be electrical, mechanical or in other forms.
The units described as separate components may or may not be physically separate, and the parts shown as units may or may not be physical units; they may be located in one place, or they may be distributed over multiple network elements. Some or all of the units may be selected according to actual needs to achieve the purpose of the scheme of the embodiment. In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist physically on its own, or two or more units may be integrated into one unit.
If the described functions are realized in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on such understanding, the technical scheme of the present invention in essence, or the part contributing to the prior art, or part of the technical scheme, may be embodied in the form of a software product; this computer software product is stored in a storage medium and includes some instructions for making a computer equipment (which may be a personal computer, a server, a network equipment, etc.) execute all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media that can store program codes: a USB flash drive, a portable hard drive, a read-only memory (ROM), a random access memory (RAM), a magnetic disc, an optical disc, etc.
The above description of the disclosed embodiments enables those skilled in the art to realize or use the present invention. Multiple modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein can be realized in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not intended to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (20)
1. a kind of method of video image processing is it is characterised in that methods described includes:
Obtain sequence of video images;
Destination object is identified in picture frame from described sequence of video images;
Described destination object is tracked and determines with the movement locus of described destination object;
Movement locus based on described destination object and described destination object obtain video structural information;
Destination object retrieval is carried out based on described video structural information and/or to carry out video to described sequence of video images dense
Contracting.
2. method of video image processing according to claim 1 it is characterised in that described from described sequence of video images
Each frame video image in identify destination object, comprising:
Based on identification destination object in depth convolutional neural networks each video frame image from described sequence of video images.
3. method of video image processing according to claim 1 it is characterised in that described described destination object is carried out with
Track, comprising:
Based on the light flow point extracted on the described destination object from described video frame image, using lucas kanade optical flow method
Track algorithm is tracked to described destination object.
4. The video image processing method according to claim 1, characterized in that the video structured information comprises: text information and/or image feature information of the target object, the text information of the target object comprising attribute information and movement information of the target object;
performing target object retrieval based on the video structured information then comprises:
upon receiving a retrieval instruction for text information to be retrieved, retrieving in the text information of the target object based on the text information to be retrieved; or, upon receiving a retrieval instruction for an image to be retrieved, retrieving in the image feature information of the target object based on the image to be retrieved; or, upon receiving a retrieval instruction for event information to be retrieved, retrieving in the text information based on the event to be retrieved and a pre-established event model, to obtain a retrieval result;
outputting target object information associated with the retrieval result.
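The three retrieval paths of claim 4 amount to dispatching on the instruction type. The sketch below shows that dispatch only; the record layout, the equality test standing in for feature matching, and the one-entry "event model" mapping an event name to a text pattern are all hypothetical placeholders, not anything specified by the patent.

```python
def retrieve(instruction, structured):
    """Dispatch sketch for the three retrieval paths (text / image / event).
    `structured` maps object ids to {'text': ..., 'feature': ...} records;
    the matching predicates are deliberately simplified stand-ins."""
    kind, payload = instruction
    if kind == 'text':
        return [oid for oid, rec in structured.items()
                if payload in rec['text']]
    if kind == 'image':
        # stand-in for the feature matching of claims 5-6
        return [oid for oid, rec in structured.items()
                if rec['feature'] == payload]
    if kind == 'event':
        # a real event model would map the event to text patterns; this
        # single entry is a hypothetical illustration
        patterns = {'intrusion': 'entered'}
        pat = patterns.get(payload, '')
        return [oid for oid, rec in structured.items()
                if pat and pat in rec['text']]
    raise ValueError(f'unknown instruction type: {kind}')
```

The point of the design is that all three query kinds resolve against the same pre-computed structured store, so no video is re-decoded at query time.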
5. The video image processing method according to claim 4, characterized in that the image feature information of the target object comprises deep convolution features and local features;
retrieving in the image feature information of the target object based on the image to be retrieved to obtain a retrieval result comprises:
matching the deep convolution feature of the image to be retrieved against the image feature information of the target object according to a first matching rule, to obtain a candidate feature set;
matching the deep convolution feature and the local feature of the image to be retrieved within the candidate feature set according to a second matching rule, to obtain target image features as the retrieval result.
6. The video image processing method according to claim 5, characterized in that matching the deep convolution feature of the image to be retrieved against the image feature information of the target object according to the first matching rule to obtain the candidate feature set comprises:
obtaining the deep convolution feature and the local feature of the image to be retrieved, and binary-coding the deep convolution feature of the image to be retrieved to obtain a binary-coding feature of the image to be retrieved;
matching the binary-coding feature of the image to be retrieved against the binary-coding feature corresponding to each deep convolution feature in the image feature information of the target object, determining the binary-coding features whose matching degree with the binary-coding feature of the image to be retrieved exceeds a first preset value as target binary-coding features, and taking the target deep convolution features and target local features corresponding to the target binary-coding features as the candidate feature set;
matching the deep convolution feature and the local feature of the image to be retrieved within the candidate feature set according to the second matching rule to obtain target image features as the retrieval result comprises:
matching the deep convolution feature of the image to be retrieved against each deep convolution feature in the candidate feature set, matching the local feature of the image to be retrieved against each local feature in the candidate feature set, and taking the image features whose comprehensive matching degree of deep convolution feature and corresponding local feature exceeds a second preset value as the retrieval result;
outputting the target object information associated with the retrieval result is then specifically:
outputting the target object images associated with the image features whose comprehensive matching degree of deep convolution feature and corresponding local feature exceeds the second preset value, each target object image being an image of the target object extracted in advance from the video frame image in which the target object is located.
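Claim 6's coarse-to-fine retrieval can be sketched in a few lines of numpy under stated assumptions: binary coding is taken to be sign thresholding, the coarse "matching degree" to be the fraction of agreeing bits, and the fine stage to be cosine similarity on the full features (the patent fixes none of these choices; `t1`/`t2` stand in for the first and second preset values).

```python
import numpy as np

def binary_code(feat):
    """Binarize real-valued deep features by sign (one plausible coding)."""
    return (feat > 0).astype(np.uint8)

def coarse_to_fine_search(query, gallery, t1=0.8, t2=0.9):
    """Two-stage retrieval sketch: cheap bitwise matching on binary codes
    prunes the gallery to a candidate set, then cosine similarity on the
    full features selects the final hits."""
    q_code = binary_code(query)
    codes = binary_code(gallery)
    # coarse stage: fraction of matching bits >= t1 -> candidate set
    match = (codes == q_code).mean(axis=1)
    cand = np.flatnonzero(match >= t1)
    # fine stage: cosine similarity on the candidates only
    qn = query / np.linalg.norm(query)
    gn = gallery[cand] / np.linalg.norm(gallery[cand], axis=1, keepdims=True)
    sim = gn @ qn
    return cand[sim >= t2]
```

The design rationale mirrors the claim: bit comparison over the whole gallery is cheap, so the expensive full-feature comparison only runs on the few candidates that survive the first preset value.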
7. The video image processing method according to claim 1, characterized in that the video structured information comprises: an image density of each video frame image in the video image sequence, the image density characterizing the presence of target objects in the video frame image;
obtaining the structured information based on the target object and the movement trajectory of the target object then comprises:
determining structured information of a monitored area in the video image sequence from the position of the target object in the video frame images and the movement trajectory of the target object, the structured information of the monitored area comprising area information of where the target object appears in the monitored area;
determining the image density of each video frame image in the video image sequence based on the structured information of the monitored area.
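One plausible reading of "image density" in claim 7 is a per-frame count of how many target-object trajectories cover that frame. The sketch below assumes that reading and a toy trajectory representation (an inclusive frame span); the patent leaves both open.

```python
def frame_densities(trajectories, num_frames):
    """Toy image-density signal: for each frame, count the target-object
    trajectories that cover it.  Each trajectory is a hypothetical
    (first_frame, last_frame) inclusive span."""
    density = [0] * num_frames
    for first, last in trajectories:
        for f in range(first, last + 1):
            density[f] += 1
    return density
```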
8. The video image processing method according to claim 7, characterized in that performing video condensation on the video image sequence based on the video structured information comprises:
segmenting the video image sequence based on the image density of each video frame image, and determining video segments to be condensed from the segments;
performing video condensation on the video segments to be condensed, and merging the condensed video segments with the video segments not subjected to video condensation, to obtain a condensed video image sequence.
9. The video image processing method according to claim 8, characterized in that segmenting the video image sequence based on the image density and determining the video segments to be condensed from the segments comprises:
dividing the video image sequence into a plurality of video segments according to the image density of each video frame image using a preset image density threshold, each video segment comprising a plurality of consecutive video frame images;
determining the video segments in which the image density of every video frame image exceeds the image density threshold as the video segments to be condensed.
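The segmentation rule of claim 9 is a run-length split of the density signal at the threshold: maximal runs that are uniformly above the threshold become the segments to be condensed. A minimal sketch (function name and `(start, end)` index convention are mine):

```python
def density_segments(densities, threshold):
    """Split a per-frame image-density sequence into maximal runs that are
    uniformly above/below the threshold, and return the above-threshold
    runs as (start, end) inclusive index pairs -- the segments to condense."""
    segments, start = [], 0
    for i in range(1, len(densities) + 1):
        run_ends = i == len(densities) or \
            (densities[i] > threshold) != (densities[start] > threshold)
        if run_ends:
            if densities[start] > threshold:
                segments.append((start, i - 1))
            start = i
    return segments
```

Frames in the below-threshold runs carry little activity and can simply be kept (or dropped) as-is, which is why only the above-threshold runs are forwarded to the condensation step.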
10. The video image processing method according to claim 8, characterized in that performing video condensation on the video segments to be condensed comprises:
determining, with a spatio-temporal condensation model, an optimal shift strategy for moving at least one target object in the video segment to be condensed along the time dimension and the spatial dimension;
performing image fusion based on the optimal shift strategy, to obtain the condensed video segment.
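The patent does not specify its spatio-temporal condensation model; published video-synopsis work typically poses it as an energy minimization over temporal shifts of object "tubes". As a deliberately simplified stand-in, the greedy sketch below shifts each tube to the earliest start time at which it does not collide in space-time with tubes already placed. The tube representation (duration plus a fixed bounding box) is a hypothetical simplification.

```python
def greedy_shift(tubes):
    """Greedy stand-in for a spatio-temporal condensation model.  Each tube
    is (length, box), where box = (x0, y0, x1, y1) is the object's spatial
    bounding box, assumed constant over the tube for simplicity.  Returns
    the shifted start time chosen for each tube."""
    def boxes_overlap(a, b):
        return not (a[2] <= b[0] or b[2] <= a[0] or
                    a[3] <= b[1] or b[3] <= a[1])

    placed, starts = [], []          # placed: (start, length, box)
    for length, box in tubes:
        t = 0
        # advance until no space-time collision with any placed tube
        while any(t < s + l and s < t + length and boxes_overlap(box, b)
                  for s, l, b in placed):
            t += 1
        placed.append((t, length, box))
        starts.append(t)
    return starts
```

Spatially disjoint objects end up overlaid at the same time, which is exactly the compression the image-fusion step then renders into the condensed segment.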
11. A video image processing device, characterized in that the device comprises: a video acquisition module, a target recognition module, a target tracking module, a video structured information acquisition module and a processing module;
the video acquisition module is configured to obtain a video image sequence;
the target recognition module is configured to identify a target object in video image frames of the video image sequence obtained by the video acquisition module;
the target tracking module is configured to track the target object identified by the target recognition module and determine a movement trajectory of the target object;
the information acquisition module is configured to obtain video structured information based on the target object identified by the target recognition module and the movement trajectory of the target object determined by the target tracking module;
the processing module is configured to perform target object retrieval based on the video structured information obtained by the information acquisition module and/or perform video condensation on the video image sequence.
12. The video image processing device according to claim 11, characterized in that the target recognition module is specifically configured to identify the target object in each video frame image of the video image sequence based on a deep convolutional neural network.
13. The video image processing device according to claim 11, characterized in that the target tracking module is specifically configured to track the target object with a Lucas-Kanade optical flow tracking algorithm, based on optical flow points extracted on the target object from the video frame images.
14. The video image processing device according to claim 11, characterized in that the video structured information comprises: text information and/or image feature information of the target object, the text information of the target object comprising attribute information and movement information of the target object;
the processing module comprises: a retrieval module and an output module;
the retrieval module is configured to, upon receiving a retrieval instruction for text information to be retrieved, retrieve in the text information of the target object based on the text information to be retrieved; or, upon receiving a retrieval instruction for an image to be retrieved, retrieve in the image feature information of the target object based on the image to be retrieved; or, upon receiving a retrieval instruction for event information to be retrieved, retrieve in the text information based on the event to be retrieved and a pre-established event model, to obtain a retrieval result;
the output module is configured to output target object information associated with the retrieval result of the retrieval module.
15. The video image processing device according to claim 14, characterized in that the image feature information of the target object comprises deep convolution features and local features;
the retrieval module comprises: a coarse matching module and a precise matching module;
the coarse matching module is configured to match the deep convolution feature of the image to be retrieved against the image feature information of the target object according to a first matching rule, to obtain a candidate feature set;
the precise matching module is configured to match the deep convolution feature and the local feature of the image to be retrieved within the candidate feature set according to a second matching rule, to obtain target image features as the retrieval result.
16. The video image processing device according to claim 15, characterized in that the coarse matching module comprises: a feature acquisition and processing submodule and a coarse matching submodule;
the feature acquisition and processing submodule is configured to obtain the deep convolution feature and the local feature of the image to be retrieved and binary-code the deep convolution feature of the image to be retrieved to obtain a binary-coding feature of the image to be retrieved, and is further configured to binary-code each deep convolution feature in the image feature information of the target object to obtain the binary-coding feature corresponding to each deep convolution feature in the image feature information of the target object;
the coarse matching submodule is configured to match the binary-coding feature of the image to be retrieved against the binary-coding feature corresponding to each deep convolution feature in the image feature information of the target object, determine the binary-coding features whose matching degree with the binary-coding feature of the image to be retrieved exceeds a first preset value as target binary-coding features, and take the target deep convolution features and target local features corresponding to the target binary-coding features as the candidate feature set;
the precise matching module is specifically configured to match the deep convolution feature of the image to be retrieved against each deep convolution feature in the candidate feature set, match the local feature of the image to be retrieved against each local feature in the candidate feature set, and take the image features whose comprehensive matching degree of deep convolution feature and corresponding local feature exceeds a second preset value as the retrieval result;
the output module is then specifically configured to output the target object images associated with the image features whose comprehensive matching degree of deep convolution feature and corresponding local feature exceeds the second preset value, each target object image being an image of the target object extracted in advance from the video frame image in which the target object is located.
17. The video image processing device according to claim 11, characterized in that the video structured information comprises: an image density of each video frame image in the video image sequence, the image density characterizing the presence of target objects in the video frame image;
the information acquisition module comprises: a monitored area structure determination submodule and an image density determination submodule;
the monitored area structure determination submodule is configured to determine structured information of a monitored area in the video image sequence from the position of the target object in the video frame images and the movement trajectory of the target object, the structured information of the monitored area comprising area information of where the target object appears in the monitored area;
the image density determination submodule is configured to determine the image density of each video frame image in the video image sequence based on the structured information of the monitored area determined by the monitored area structure determination submodule.
18. The video image processing device according to claim 17, characterized in that the processing module comprises: a video preprocessing module and a video condensation module;
the video preprocessing module is configured to segment the video image sequence based on the image density and determine video segments to be condensed from the segments;
the video condensation module is configured to perform video condensation on the video segments to be condensed, and merge the condensed video segments with the video segments not subjected to video condensation, to obtain a condensed video image sequence.
19. The video image processing device according to claim 18, characterized in that the video preprocessing module comprises: a video segmentation submodule and a to-be-condensed video segment determination submodule;
the video segmentation submodule is configured to divide the video image sequence into a plurality of video segments according to the image density of each video frame image using a preset image density threshold;
the to-be-condensed video segment determination submodule is configured to determine the video segments in which the image density of every video frame image exceeds the image density threshold as the video segments to be condensed.
20. The video image processing device according to claim 18, characterized in that the video condensation module comprises: an optimal condensation strategy determination submodule and an image fusion submodule;
the optimal condensation strategy determination submodule is configured to determine, with a spatio-temporal condensation model, an optimal shift strategy for moving at least one target object in the video segment to be condensed along the time dimension and the spatial dimension;
the image fusion submodule is configured to perform image fusion based on the optimal shift strategy, to obtain the condensed video segment.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610765659.5A CN106354816B (en) | 2016-08-30 | 2016-08-30 | video image processing method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610765659.5A CN106354816B (en) | 2016-08-30 | 2016-08-30 | video image processing method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106354816A true CN106354816A (en) | 2017-01-25 |
CN106354816B CN106354816B (en) | 2019-12-13 |
Family
ID=57856028
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610765659.5A Active CN106354816B (en) | 2016-08-30 | 2016-08-30 | video image processing method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106354816B (en) |
Cited By (36)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107038713A (en) * | 2017-04-12 | 2017-08-11 | 南京航空航天大学 | Moving target capture method fusing an optical flow method and a neural network |
CN107346415A (en) * | 2017-06-08 | 2017-11-14 | 小草数语(北京)科技有限公司 | Method of video image processing, device and monitoring device |
CN107506370A (en) * | 2017-07-07 | 2017-12-22 | 大圣科技股份有限公司 | Multi-medium data depth method for digging, storage medium and electronic equipment |
CN107633480A (en) * | 2017-09-14 | 2018-01-26 | 光锐恒宇(北京)科技有限公司 | A kind of image processing method and device |
CN107705324A (en) * | 2017-10-20 | 2018-02-16 | 中山大学 | A kind of video object detection method based on machine learning |
CN107730560A (en) * | 2017-10-17 | 2018-02-23 | 张家港全智电子科技有限公司 | A kind of target trajectory extracting method based on sequence of video images |
CN108304808A (en) * | 2018-02-06 | 2018-07-20 | 广东顺德西安交通大学研究院 | Surveillance video object detection method based on spatio-temporal information and a deep network |
CN108664844A (en) * | 2017-03-28 | 2018-10-16 | 爱唯秀股份有限公司 | Semantic image object recognition and tracking with a deep convolutional neural network |
CN108875517A (en) * | 2017-12-15 | 2018-11-23 | 北京旷视科技有限公司 | Method for processing video frequency, device and system and storage medium |
CN109215055A (en) * | 2017-06-30 | 2019-01-15 | 杭州海康威视数字技术股份有限公司 | A kind of target's feature-extraction method, apparatus and application system |
CN109508408A (en) * | 2018-10-25 | 2019-03-22 | 北京陌上花科技有限公司 | A kind of video retrieval method and computer readable storage medium based on frame density |
CN109657546A (en) * | 2018-11-12 | 2019-04-19 | 平安科技(深圳)有限公司 | Neural-network-based video behavior recognition method and terminal device |
CN109803067A (en) * | 2017-11-16 | 2019-05-24 | 富士通株式会社 | Video concentration method, video enrichment facility and electronic equipment |
CN109977816A (en) * | 2019-03-13 | 2019-07-05 | 联想(北京)有限公司 | A kind of information processing method, device, terminal and storage medium |
CN109993032A (en) * | 2017-12-29 | 2019-07-09 | 杭州海康威视数字技术股份有限公司 | A kind of shared bicycle target identification method, device and camera |
CN110008859A (en) * | 2019-03-20 | 2019-07-12 | 北京迈格威科技有限公司 | Vision-based dog re-identification method and device |
CN110175263A (en) * | 2019-04-08 | 2019-08-27 | 浙江大华技术股份有限公司 | A kind of method of positioning video frame, the method and terminal device for saving video |
CN110188617A (en) * | 2019-05-05 | 2019-08-30 | 深圳供电局有限公司 | A kind of machine room intelligent monitoring method and system |
CN110225310A (en) * | 2019-06-24 | 2019-09-10 | 浙江大华技术股份有限公司 | Computer readable storage medium, the display methods of video and device |
CN110264496A (en) * | 2019-06-03 | 2019-09-20 | 深圳市恩钛控股有限公司 | Video structural processing system and method |
CN110363171A (en) * | 2019-07-22 | 2019-10-22 | 北京百度网讯科技有限公司 | The method of the training method and identification sky areas of sky areas prediction model |
CN110659384A (en) * | 2018-06-13 | 2020-01-07 | 杭州海康威视数字技术股份有限公司 | Video structured analysis method and device |
CN110751065A (en) * | 2019-09-30 | 2020-02-04 | 北京旷视科技有限公司 | Training data acquisition method and device |
WO2020038243A1 (en) * | 2018-08-21 | 2020-02-27 | 腾讯科技(深圳)有限公司 | Video abstract generating method and apparatus, computing device, and storage medium |
CN111435370A (en) * | 2019-01-11 | 2020-07-21 | 富士通株式会社 | Information processing apparatus, method, and machine-readable storage medium |
CN111696136A (en) * | 2020-06-09 | 2020-09-22 | 电子科技大学 | Target tracking method based on coding and decoding structure |
CN111898416A (en) * | 2020-06-17 | 2020-11-06 | 绍兴埃瓦科技有限公司 | Video stream processing method and device, computer equipment and storage medium |
CN112036306A (en) * | 2020-08-31 | 2020-12-04 | 公安部第三研究所 | System and method for realizing target tracking based on monitoring video analysis |
CN112422898A (en) * | 2020-10-27 | 2021-02-26 | 中电鸿信信息科技有限公司 | Video concentration method introducing deep behavior understanding |
CN112579811A (en) * | 2020-12-11 | 2021-03-30 | 公安部第三研究所 | Target image retrieval and identification system, method, device, processor and computer-readable storage medium for video detection |
CN113365104A (en) * | 2021-06-04 | 2021-09-07 | 中国建设银行股份有限公司 | Video concentration method and device |
CN113515649A (en) * | 2020-11-19 | 2021-10-19 | 阿里巴巴集团控股有限公司 | Data structuring method, system, device, equipment and storage medium |
CN113641852A (en) * | 2021-07-13 | 2021-11-12 | 彩虹无人机科技有限公司 | Unmanned aerial vehicle photoelectric video target retrieval method, electronic device and medium |
CN113792150A (en) * | 2021-11-15 | 2021-12-14 | 湖南科德信息咨询集团有限公司 | Man-machine cooperative intelligent demand identification method and system |
CN113949823A (en) * | 2021-09-30 | 2022-01-18 | 广西中科曙光云计算有限公司 | Video concentration method and device |
CN114998810A (en) * | 2022-07-11 | 2022-09-02 | 北京烽火万家科技有限公司 | AI video deep learning system based on neural network |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102930061A (en) * | 2012-11-28 | 2013-02-13 | 安徽水天信息科技有限公司 | Video abstraction method and system based on moving target detection |
US8582807B2 (en) * | 2010-03-15 | 2013-11-12 | Nec Laboratories America, Inc. | Systems and methods for determining personal characteristics |
CN105512684A (en) * | 2015-12-09 | 2016-04-20 | 江苏大为科技股份有限公司 | Vehicle logo automatic identification method based on principal component analysis convolutional neural network |
- 2016-08-30: CN CN201610765659.5A patent/CN106354816B/en active Active
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8582807B2 (en) * | 2010-03-15 | 2013-11-12 | Nec Laboratories America, Inc. | Systems and methods for determining personal characteristics |
CN102930061A (en) * | 2012-11-28 | 2013-02-13 | 安徽水天信息科技有限公司 | Video abstraction method and system based on moving target detection |
CN105512684A (en) * | 2015-12-09 | 2016-04-20 | 江苏大为科技股份有限公司 | Vehicle logo automatic identification method based on principal component analysis convolutional neural network |
Non-Patent Citations (1)
Title |
---|
王浩 (Wang Hao): *Computer Information Retrieval* (《计算机信息检索》), Northwest University Press (西北大学出版社), 30 November 2001 *
Cited By (44)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108664844A (en) * | 2017-03-28 | 2018-10-16 | 爱唯秀股份有限公司 | Semantic image object recognition and tracking with a deep convolutional neural network |
CN107038713A (en) * | 2017-04-12 | 2017-08-11 | 南京航空航天大学 | Moving target capture method fusing an optical flow method and a neural network |
CN107346415A (en) * | 2017-06-08 | 2017-11-14 | 小草数语(北京)科技有限公司 | Method of video image processing, device and monitoring device |
US11398084B2 (en) | 2017-06-30 | 2022-07-26 | Hangzhou Hikvision Digital Technology Co., Ltd. | Method, apparatus and application system for extracting a target feature |
CN109215055A (en) * | 2017-06-30 | 2019-01-15 | 杭州海康威视数字技术股份有限公司 | A kind of target's feature-extraction method, apparatus and application system |
CN107506370A (en) * | 2017-07-07 | 2017-12-22 | 大圣科技股份有限公司 | Multi-medium data depth method for digging, storage medium and electronic equipment |
CN107633480A (en) * | 2017-09-14 | 2018-01-26 | 光锐恒宇(北京)科技有限公司 | A kind of image processing method and device |
CN107730560A (en) * | 2017-10-17 | 2018-02-23 | 张家港全智电子科技有限公司 | A kind of target trajectory extracting method based on sequence of video images |
CN107705324A (en) * | 2017-10-20 | 2018-02-16 | 中山大学 | A kind of video object detection method based on machine learning |
CN109803067A (en) * | 2017-11-16 | 2019-05-24 | 富士通株式会社 | Video concentration method, video enrichment facility and electronic equipment |
CN108875517A (en) * | 2017-12-15 | 2018-11-23 | 北京旷视科技有限公司 | Method for processing video frequency, device and system and storage medium |
CN109993032A (en) * | 2017-12-29 | 2019-07-09 | 杭州海康威视数字技术股份有限公司 | A kind of shared bicycle target identification method, device and camera |
CN109993032B (en) * | 2017-12-29 | 2021-09-17 | 杭州海康威视数字技术股份有限公司 | Shared bicycle target identification method and device and camera |
CN108304808B (en) * | 2018-02-06 | 2021-08-17 | 广东顺德西安交通大学研究院 | Monitoring video object detection method based on temporal-spatial information and deep network |
CN108304808A (en) * | 2018-02-06 | 2018-07-20 | 广东顺德西安交通大学研究院 | A kind of monitor video method for checking object based on space time information Yu depth network |
CN110659384A (en) * | 2018-06-13 | 2020-01-07 | 杭州海康威视数字技术股份有限公司 | Video structured analysis method and device |
WO2020038243A1 (en) * | 2018-08-21 | 2020-02-27 | 腾讯科技(深圳)有限公司 | Video abstract generating method and apparatus, computing device, and storage medium |
US11347792B2 (en) | 2018-08-21 | 2022-05-31 | Tencent Technology (Shenzhen) Company Limited | Video abstract generating method, apparatus, and storage medium |
CN109508408A (en) * | 2018-10-25 | 2019-03-22 | 北京陌上花科技有限公司 | A kind of video retrieval method and computer readable storage medium based on frame density |
CN109657546A (en) * | 2018-11-12 | 2019-04-19 | 平安科技(深圳)有限公司 | Neural-network-based video behavior recognition method and terminal device |
CN111435370A (en) * | 2019-01-11 | 2020-07-21 | 富士通株式会社 | Information processing apparatus, method, and machine-readable storage medium |
CN109977816A (en) * | 2019-03-13 | 2019-07-05 | 联想(北京)有限公司 | A kind of information processing method, device, terminal and storage medium |
CN110008859A (en) * | 2019-03-20 | 2019-07-12 | 北京迈格威科技有限公司 | Vision-based dog re-identification method and device |
CN110175263A (en) * | 2019-04-08 | 2019-08-27 | 浙江大华技术股份有限公司 | A kind of method of positioning video frame, the method and terminal device for saving video |
CN110188617A (en) * | 2019-05-05 | 2019-08-30 | 深圳供电局有限公司 | A kind of machine room intelligent monitoring method and system |
CN110264496A (en) * | 2019-06-03 | 2019-09-20 | 深圳市恩钛控股有限公司 | Video structural processing system and method |
CN110225310A (en) * | 2019-06-24 | 2019-09-10 | 浙江大华技术股份有限公司 | Computer readable storage medium, the display methods of video and device |
CN110363171A (en) * | 2019-07-22 | 2019-10-22 | 北京百度网讯科技有限公司 | The method of the training method and identification sky areas of sky areas prediction model |
CN110751065A (en) * | 2019-09-30 | 2020-02-04 | 北京旷视科技有限公司 | Training data acquisition method and device |
CN111696136A (en) * | 2020-06-09 | 2020-09-22 | 电子科技大学 | Target tracking method based on coding and decoding structure |
CN111898416A (en) * | 2020-06-17 | 2020-11-06 | 绍兴埃瓦科技有限公司 | Video stream processing method and device, computer equipment and storage medium |
CN112036306A (en) * | 2020-08-31 | 2020-12-04 | 公安部第三研究所 | System and method for realizing target tracking based on monitoring video analysis |
CN112422898B (en) * | 2020-10-27 | 2022-06-17 | 中电鸿信信息科技有限公司 | Video concentration method introducing deep behavior understanding |
CN112422898A (en) * | 2020-10-27 | 2021-02-26 | 中电鸿信信息科技有限公司 | Video concentration method introducing deep behavior understanding |
CN113515649A (en) * | 2020-11-19 | 2021-10-19 | 阿里巴巴集团控股有限公司 | Data structuring method, system, device, equipment and storage medium |
CN113515649B (en) * | 2020-11-19 | 2024-03-01 | 阿里巴巴集团控股有限公司 | Data structuring method, system, device, equipment and storage medium |
CN112579811A (en) * | 2020-12-11 | 2021-03-30 | 公安部第三研究所 | Target image retrieval and identification system, method, device, processor and computer-readable storage medium for video detection |
CN113365104A (en) * | 2021-06-04 | 2021-09-07 | 中国建设银行股份有限公司 | Video concentration method and device |
CN113365104B (en) * | 2021-06-04 | 2022-09-09 | 中国建设银行股份有限公司 | Video concentration method and device |
CN113641852A (en) * | 2021-07-13 | 2021-11-12 | 彩虹无人机科技有限公司 | Unmanned aerial vehicle photoelectric video target retrieval method, electronic device and medium |
CN113949823A (en) * | 2021-09-30 | 2022-01-18 | 广西中科曙光云计算有限公司 | Video concentration method and device |
CN113792150A (en) * | 2021-11-15 | 2021-12-14 | 湖南科德信息咨询集团有限公司 | Man-machine cooperative intelligent demand identification method and system |
CN113792150B (en) * | 2021-11-15 | 2022-02-11 | 湖南科德信息咨询集团有限公司 | Man-machine cooperative intelligent demand identification method and system |
CN114998810A (en) * | 2022-07-11 | 2022-09-02 | 北京烽火万家科技有限公司 | AI video deep learning system based on neural network |
Also Published As
Publication number | Publication date |
---|---|
CN106354816B (en) | 2019-12-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106354816A (en) | Video image processing method and video image processing device | |
Wang et al. | Detection and localization of image forgeries using improved mask regional convolutional neural network | |
Zapletal et al. | Vehicle re-identification for automatic video traffic surveillance | |
CN108334881B (en) | License plate recognition method based on deep learning | |
Kuo et al. | Deep aggregation net for land cover classification | |
CN109063649B (en) | Pedestrian re-identification method based on twin pedestrian alignment residual error network | |
Nguyen et al. | Anomaly detection in traffic surveillance videos with gan-based future frame prediction | |
CN103136516A (en) | Face recognition method and system fusing visible light and near-infrared information | |
CN110751018A (en) | Group pedestrian re-identification method based on mixed attention mechanism | |
Passos et al. | A review of deep learning‐based approaches for deepfake content detection | |
Farag | A lightweight vehicle detection and tracking technique for advanced driving assistance systems | |
CN111833380B (en) | Multi-view image fusion space target tracking system and method | |
Zhu et al. | Towards automatic wild animal detection in low quality camera-trap images using two-channeled perceiving residual pyramid networks | |
Yu et al. | Manipulation classification for jpeg images using multi-domain features | |
CN107092935A (en) | A kind of assets alteration detection method | |
Hu et al. | Spatial-temporal fusion convolutional neural network for simulated driving behavior recognition | |
Shahbaz et al. | Deep atrous spatial features-based supervised foreground detection algorithm for industrial surveillance systems | |
Xia et al. | Abnormal event detection method in surveillance video based on temporal CNN and sparse optical flow | |
CN112395953A (en) | Road surface foreign matter detection system | |
CN105760885A (en) | Bloody image detection classifier implementing method, bloody image detection method and bloody image detection system | |
Zou et al. | Deep learning-based pavement cracks detection via wireless visible light camera-based network | |
Sun et al. | NTT_CQUPT@ TRECVID2019 ActEV: Activities in Extended Video. | |
Vijay Gopal et al. | A deep learning approach to image splicing using depth map | |
CN112184566A (en) | Image processing method and system for removing attached water mist droplets | |
Sahoo et al. | Depth estimated history image based appearance representation for human action recognition |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |