CN102932605A - Method for selecting camera combination in visual perception network - Google Patents
Method for selecting camera combination in visual perception network
- Publication number
- CN102932605A CN2012104884341A CN201210488434A
- Authority
- CN
- China
- Prior art keywords
- video camera
- histogram
- video
- image
- candidate
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
- 238000000034 method Methods 0.000 title claims abstract description 25
- 230000016776 visual perception Effects 0.000 title abstract 2
- 230000000007 visual effect Effects 0.000 claims abstract description 92
- 230000004438 eyesight Effects 0.000 claims abstract description 72
- 230000033001 locomotion Effects 0.000 claims abstract description 37
- 238000012549 training Methods 0.000 claims abstract description 29
- 238000001514 detection method Methods 0.000 claims abstract description 18
- 238000000605 extraction Methods 0.000 claims abstract description 9
- 239000013598 vector Substances 0.000 claims description 67
- 239000000284 extract Substances 0.000 claims description 36
- 239000004744 fabric Substances 0.000 claims description 15
- 238000012360 testing method Methods 0.000 claims description 14
- 238000005070 sampling Methods 0.000 claims description 13
- 238000004364 calculation method Methods 0.000 claims description 9
- 238000010606 normalization Methods 0.000 claims description 5
- 238000001914 filtration Methods 0.000 claims description 4
- 230000008859 change Effects 0.000 claims description 2
- 238000012544 monitoring process Methods 0.000 description 6
- 230000008569 process Effects 0.000 description 6
- 230000003993 interaction Effects 0.000 description 5
- 230000003044 adaptive effect Effects 0.000 description 4
- 238000005286 illumination Methods 0.000 description 4
- 230000000903 blocking effect Effects 0.000 description 3
- 230000008030 elimination Effects 0.000 description 3
- 238000003379 elimination reaction Methods 0.000 description 3
- 238000012552 review Methods 0.000 description 3
- 230000008901 benefit Effects 0.000 description 2
- 238000013461 design Methods 0.000 description 2
- 238000010586 diagram Methods 0.000 description 2
- 238000005516 engineering process Methods 0.000 description 2
- 230000008447 perception Effects 0.000 description 2
- 238000011160 research Methods 0.000 description 2
- 230000004308 accommodation Effects 0.000 description 1
- 230000009471 action Effects 0.000 description 1
- 238000004458 analytical method Methods 0.000 description 1
- 230000009286 beneficial effect Effects 0.000 description 1
- 230000005540 biological transmission Effects 0.000 description 1
- 230000007812 deficiency Effects 0.000 description 1
- 238000011156 evaluation Methods 0.000 description 1
- 238000005259 measurement Methods 0.000 description 1
- 238000012545 processing Methods 0.000 description 1
- 238000013138 pruning Methods 0.000 description 1
Images
Landscapes
- Image Analysis (AREA)
- Studio Devices (AREA)
Abstract
The invention discloses a method for selecting a camera combination in a visual perception network. The method comprises the following steps. Online generation of target-image visual histograms: where the fields of view of several cameras overlap, perform motion detection on the online video data of the multiple cameras observing the same object, determine the object's subregion in the video-frame image space from the detection result to obtain a target image region, extract local features from that region, and compute the region's visual histogram at each viewing angle against a visual dictionary generated by prior training. Sequential forward camera selection: from the set of unselected cameras choose the optimal viewing angle, i.e. the optimal camera; then repeatedly choose the next-best camera, add it to the set of selected cameras, and remove it from the set of candidate cameras, until the number of selected cameras reaches the number required.
Description
Technical field
The present invention relates to a camera selection method, belonging to the field of computer vision and video data processing; specifically, it is a method for selecting a camera combination in a visual perception network.
Background technology
In recent years, as cameras have come into wide use in fields such as security surveillance, human-computer interaction, navigation and positioning, and battlefield-environment perception, multi-camera systems have become a research hotspot in computer vision and its applications. In particular, in video-based surveillance and human-computer interaction, a visual sensor network (VSN) composed of multiple cameras can effectively solve problems such as self-occlusion that a single camera suffers when observing a target, but it also produces a large amount of redundant information, increasing the burden on system storage, visual computation, and network transmission. How to choose and push the most informative video streams from the multiple channels has therefore become a key problem for visual perception networks and their applications. A problem similar to video-based camera selection is viewpoint selection for observing three-dimensional models, which has been widely studied in the graphics field. For example, document 1 (Vazquez P, Sbert M. Fast adaptive selection of best views. Lecture Notes in Computer Science, 2003, 2669:295-305) computes the viewpoint entropy of a known geometric model under different viewing angles and selects the optimal angle by its size. Unlike camera selection, however, that approach requires an accurate model definition of the observed object in advance, and since models are mostly built in controlled graphics environments, the analysis need not consider factors such as background and illumination. On the other hand, general sensor-network node-selection methods, such as document 2 (Mo Y., Ambrosino R., Sinopoli B. Sensor Selection Strategies for State Estimation in Energy Constrained Wireless Sensor Networks. Automatica, 2011, 47(7):1330-1338) and document 3 (Huber M. F. Optimal pruning for multi-step sensor scheduling. IEEE Transactions on Automatic Control, 2012, 57(5):1338-1343), all select sensor nodes on the basis of the positions of the observed target and the sensors. A camera, however, perceives its environment directionally, so the optimal camera cannot be chosen simply from the positional relationship between the target and the camera nodes; in security surveillance applications, for instance, a frontal image of a person is preferred over a nearby back view.
Existing camera selection methods can be divided, according to the overlap of camera fields of view in the visual perception network, into wide-area methods with no field-of-view overlap and methods with partially or fully overlapping fields of view. Methods without overlap serve demands such as continuous target tracking over a large area, selecting nodes in a dispersedly deployed camera network according to predictions of target motion. The present invention, aimed at application demands such as security surveillance and human-computer interaction, mainly studies camera selection where the same target is observed within partially or fully overlapping fields of view. Such methods can be further divided, by the number of cameras chosen, into single-camera selection and camera-combination selection. Single-camera selection picks only one optimal viewing angle as output at a given selection time according to a proposed criterion; the design of that criterion, i.e. the evaluation measure of visual information, is then the key to the selection, and the similarity between the information captured by different cameras need not be considered. Selection criteria usually fall into two classes: those based on video image content and those based on the spatial relationship of the target in the objective world. For example, document 4 (Daniyal F., Taj M., Cavallaro A. Content and task-based view selection from multiple video streams. Multimedia Tools and Applications, 2010, 46:235-258) extracts video features such as the number, type, size, and position of moving objects and whether events occur in the video, and selects cameras from the contextual information of these features; such methods only extract features from and score the video content each camera captures, without measuring content similarity between the nodes of the camera network. Among methods based on spatial relationships, document 5 (Park J, Bhat C, Kak A C. A look-up table based approach for solving the camera selection problem in large camera networks. ACM Workshop on Distributed Smart Cameras, 2006) builds a look-up table mapping the space within each camera's field of view to a corresponding camera, and during selection chooses from the table the camera node nearest the target according to the spatial relationship of the target and each camera. These methods presuppose accurate calibration of the cameras in the scene, without which the target's exact spatial position cannot be recovered from the video images; moreover, they do not consider the orientation of the target in the scene, and so cannot always acquire a frontal image of the target in fields such as security surveillance. All of the above methods output only a single optimal camera; they do not consider compensating for the viewing-angle limits of individual cameras by combining several, and so cannot account for the degree of information similarity and redundancy between cameras.
When resources and processing conditions permit, selecting several cameras to form a camera combination, compared with single-camera selection, can effectively overcome the self-occlusions and blind areas of the latter by adding information sources at multiple viewing angles. Although a combination could be formed by repeatedly picking the single best angle, the video images obtained by cameras at different angles are similar in content and therefore redundant to varying degrees; in general, the set of individually optimal cameras is not the optimal combination. For example, two cameras both shooting the target's front each carry a large amount of information on their own, yet together they usually contain less global information than one frontal and one side view of the target. To date, camera-combination selection has been studied relatively little.
Summary of the invention
Object of the invention: the technical problem to be solved by the present invention is, in view of the deficiencies of the prior art, to propose a method for selecting a camera combination in a visual perception network.
Technical scheme: the method for selecting a camera combination in a visual perception network disclosed by the invention comprises the following steps:
Step 2, sequential forward camera selection: at each time point, select an optimal viewing angle, i.e. an optimal camera. Then, within the set of unselected cameras, use the visual histograms computed in step 1 to calculate the information gain of the target image in each candidate camera's video and the mutual information between the candidate and the set of selected cameras; select as the suboptimal camera the one whose information gain on the observed target is large and whose mutual information with, i.e. image-content similarity to, the selected cameras is small; add it to the selected camera set and remove it from the candidate camera set. Repeat these steps until the selected-camera count reaches the preset value.
Visual dictionary: because the scale-invariant feature transform (SIFT) can largely overcome the illumination and scaling differences produced by different cameras, the present invention uses it for the visual-dictionary words. Taking the frames of multi-channel video as training data, first extract the set of SIFT local feature descriptor vectors of every image; then apply k-means clustering to the set of SIFT feature descriptors extracted from all images; each cluster centre is regarded as a visual word, and the resulting set of visual words constitutes the offline-trained visual dictionary. This specifically comprises the following steps:
Extract image SIFT feature descriptor vectors: for each input video frame, filter the image with a Gaussian template and compute the gradient components I_x and I_y in the x and y directions, and from these the gradient magnitude of each pixel and its direction θ(x, y) = arctan(I_y / I_x). Starting from the upper-left corner of the image, at every 8 pixels in the x and y directions take a 16 × 16 window as the feature-extraction sampling window and divide it into a 4 × 4 grid of square cells. For the sample points in each cell, compute the gradient direction relative to the window centre; after Gaussian weighting by distance, accumulate the sample points' gradient magnitudes into the 8-direction gradient-orientation histogram within the cell. Each sampling window thus yields a 4 × 4 × 8 = 128-dimensional feature vector, which is normalised to form the window's local feature descriptor vector. The descriptor vectors computed for each image are added to the feature descriptor set F = {f^(1), f^(2), f^(3), ..., f^(t)}, f^(i) ∈ R^128, 1 ≤ i ≤ t, where f^(i) is the i-th descriptor vector in this image's descriptor set, R^128 indicates that the vector has 128 dimensions, and t is the total number of feature descriptors extracted from the image;
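The descriptor layout described above (a 4 × 4 cell grid with an 8-bin orientation histogram per cell, giving 128 dimensions) can be sketched in Python as follows. This is a simplified illustration, not the patent's implementation: the Gaussian distance weighting and the relative-orientation step are omitted, and the function name is hypothetical.

```python
import numpy as np

def sift_like_descriptor(window):
    """Build a 128-d descriptor for one 16x16 window: 4x4 cell grid,
    8-bin gradient-orientation histogram per cell (simplified sketch,
    no Gaussian weighting or relative-orientation normalisation)."""
    assert window.shape == (16, 16)
    iy, ix = np.gradient(window.astype(float))   # y- and x-gradient components
    mag = np.hypot(ix, iy)                       # per-pixel gradient magnitude
    theta = np.arctan2(iy, ix)                   # gradient direction in [-pi, pi]
    bins = ((theta + np.pi) / (2 * np.pi) * 8).astype(int) % 8
    desc = np.zeros((4, 4, 8))
    for r in range(16):
        for c in range(16):
            # accumulate magnitude into the 8-direction histogram of the cell
            desc[r // 4, c // 4, bins[r, c]] += mag[r, c]
    desc = desc.ravel()                          # 4*4*8 = 128 dimensions
    n = np.linalg.norm(desc)
    return desc / n if n > 0 else desc           # unit-normalised descriptor

# a window with a pure vertical intensity ramp
demo = sift_like_descriptor(np.outer(np.arange(16), np.ones(16)))
```

In the full method, one such descriptor would be produced for each 16 × 16 window sampled every 8 pixels across the image.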
Apply k-means clustering to the feature vectors: from the set F of SIFT feature descriptor vectors extracted from the frames, choose k vectors at random as the initial cluster centres; assign all feature vectors to their nearest cluster centre, then recompute the centres, iterating until the iteration limit is reached or the change in cluster-centre distance falls below a threshold. The present invention stops iterating when the iteration count reaches 50 to 200 or the cluster-centre distance changes by less than 0.02.
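The clustering step just described can be sketched as a plain k-means over the descriptor set, stopping when the centres move less than a tolerance or an iteration limit is hit. This is a generic sketch with assumed parameter names, shown on a toy 2-D set rather than 128-d SIFT descriptors:

```python
import numpy as np

def kmeans(F, k, max_iter=200, tol=0.02, seed=0):
    """Plain k-means over a descriptor set F (t x d): random initial
    centres, iterate until the limit is reached or the centres move
    less than `tol` (mirroring the stopping rule in the text)."""
    rng = np.random.default_rng(seed)
    centers = F[rng.choice(len(F), size=k, replace=False)]
    for _ in range(max_iter):
        # assign every descriptor to its nearest centre
        d = np.linalg.norm(F[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # recompute centres; keep an empty cluster's centre unchanged
        new = np.array([F[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.linalg.norm(new - centers) < tol:
            centers = new
            break
        centers = new
    return centers, labels

# two well-separated blobs -> one centre converges to each blob
F = np.vstack([np.zeros((20, 2)), np.ones((20, 2)) * 10.0])
centers, labels = kmeans(F, k=2)
```

Each resulting centre plays the role of one visual word in the dictionary.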
Visual dictionary construction: each cluster centre is regarded as a visual word; the set of visual words is obtained and stored, constituting the offline-trained visual dictionary.
The online generation of the target-image visual histogram in step 1 of the present invention specifically comprises the following steps:
Step 11, video motion detection: apply a Gaussian mixture model to the video data input by each camera to detect motion; for every frame of detection results, eliminate the shadows cast by the target in the scene with a texture-based method, and extract the region occupied by the moving target in image space.
Step 12, region-image local feature extraction: extract the set of SIFT feature descriptor vectors from the moving-target region image obtained in step 11;
Step 13, visual histogram generation: using the cluster centres of the pre-trained visual dictionary as histogram buckets, drop the SIFT feature descriptor vectors extracted from the moving-target region image in step 12 into the corresponding buckets, count the descriptor vectors in each bucket, and finally normalise the histogram, thereby generating the visual histograms of the moving target at the several viewing angles.
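Step 13 is a standard bag-of-visual-words quantisation, which can be sketched as follows (the function name and toy 2-D dictionary are illustrative assumptions; real descriptors would be 128-dimensional):

```python
import numpy as np

def vision_histogram(descriptors, dictionary):
    """Use each visual word (cluster centre) as a histogram bucket,
    drop every SIFT descriptor of the target region into the bucket of
    its nearest word, count, then normalise so the bins sum to 1."""
    d = np.linalg.norm(descriptors[:, None, :] - dictionary[None, :, :], axis=2)
    nearest = d.argmin(axis=1)                   # word index per descriptor
    hist = np.bincount(nearest, minlength=len(dictionary)).astype(float)
    return hist / hist.sum()

dictionary = np.array([[0.0, 0.0], [10.0, 10.0]])        # toy 2-word dictionary
descs = np.array([[0.1, 0.2], [9.8, 10.1], [10.2, 9.9], [0.0, 0.1]])
h = vision_histogram(descs, dictionary)                  # one histogram per view
```

One such normalised histogram is produced per camera, giving the per-view representations that the later information-gain and mutual-information steps compare.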
Step 2 of the present invention specifically comprises the following steps:
Step 21, initialisation: the set of cameras C = {c_1, c_2, ..., c_m} in the scene simultaneously observes the moving target, where m is the total number of cameras; the set of selected cameras C_s is empty and the candidate camera set C_u = C. Merge the SIFT feature descriptor vector sets of all candidate cameras and generate the merged visual histogram H_merge by step 13.
Step 22, optimal camera selection: from the candidate camera set C_u, select an optimal camera c* by criteria combining the face detection result, the moving-target region image information gain, and the image sharpness; add it to the selected camera set, i.e. C_s = {c*}, and at the same time remove it from the candidate camera set, i.e. C_u = C_u \ {c*}; initialise the selected-camera count to 1. The concrete steps are:
Step 221, face detection: use an AdaBoost (Adaptive Boosting) face detector to detect faces in the moving-target region image of candidate camera c'. (AdaBoost is an iterative algorithm whose core idea is to train different weak classifiers on the same training set and then assemble these weak classifiers into a stronger final classifier; it improves on the Boosting algorithm by adaptively adjusting for the errors of the weak classifiers obtained by weak learning.) The detection result is V_face ∈ {0, 1}, where 1 indicates a face is detected and 0 otherwise;
Step 222, moving-target region image information gain: from the visual histogram H_c' of camera c' and the merged visual histogram H_merge, compute the visual information gain V_IG between selecting camera c' and not selecting it, namely

V_IG = Σ_k [ p(b_k, c') log( p(b_k, c') / ( p(b_k) p(c') ) ) + p(b_k, ¬c') log( p(b_k, ¬c') / ( p(b_k) p(¬c') ) ) ]

where p(b_k, c') is the joint probability of selecting camera c' and the k-th bucket of the visual histogram H_merge, p(b_k, ¬c') is the joint probability of not selecting camera c' and the k-th histogram bucket of H_merge, p(b_k) is the probability of the k-th bucket, and p(c') and p(¬c') are respectively the probabilities of selecting and not selecting camera c', all computed from the histograms H_c' and H_merge;
Step 223, target-image sharpness: from the gradient of the target-region image, compute the average gradient magnitude (the mean of the per-pixel gradient magnitudes over the region) to characterise the clarity of the target in the image.
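The sharpness measure of step 223 reduces to a mean gradient magnitude, which can be sketched directly (the function name is illustrative):

```python
import numpy as np

def sharpness(img):
    """Characterise target clarity by the mean gradient magnitude
    of the target-region image (step 223 sketch)."""
    iy, ix = np.gradient(img.astype(float))      # y- and x-gradient components
    return float(np.hypot(ix, iy).mean())        # average |gradient| over pixels

flat = np.full((8, 8), 5.0)                      # uniform patch: no edges
ramp = np.tile(np.arange(8.0), (8, 1))           # unit horizontal intensity ramp
```

A blurred or textureless view scores low, while a sharply focused target image scores high, which is why this term rewards "clearer" cameras in step 224.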
Step 224, optimal camera selection: set weight coefficients α_1, α_2, α_3 with α_1 + α_2 + α_3 = 1, and select as the optimal camera the camera c* that maximises, over the candidate set, the weighted sum of the face detection result, the information gain, and the sharpness; set the selected-camera count to 1;
Step 23, suboptimal camera selection: while cameras remain unselected, in each iteration compute each candidate camera's information gain and its visual-histogram mutual information with the selected cameras, select the suboptimal camera c*, add it to the selected camera set, remove it from the candidate camera set C_u, and increment the selected-camera count, i.e. count = count + 1;

Step 231, compute the target-region image information gain IG_c' of candidate camera c' by the method of step 222;
Step 232, mutual information between a candidate and the selected cameras: compute the mutual information MI(c', c_j) between the visual histogram H_c' of candidate camera c''s target-region image and the visual histogram H_cj of each camera c_j ∈ C_s in the selected set, expressing the similarity of target-region image content between the two cameras:

MI(c', c_j) = Σ_x Σ_y p(b_x, b_y) log( p(b_x, b_y) / ( p(b_x) p(b_y) ) )

where b_x is the x-th bucket of histogram H_c', b_y is the y-th bucket of histogram H_cj, n_c' is the number of buckets of the visual histogram H_c' of candidate camera c', and n_cj is the number of buckets of the visual histogram H_cj of selected camera c_j, the sums running over 1 ≤ x ≤ n_c' and 1 ≤ y ≤ n_cj;
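Given a joint bucket distribution, the mutual information of step 232 is the standard quantity below. How the patent estimates the joint p(b_x, b_y) from the two cameras' histograms is part of the formula above, so this sketch takes the joint table as given (an assumption for illustration):

```python
import numpy as np

def mutual_information(joint):
    """Generic MI of a joint bucket distribution p(b_x, b_y) between
    the histograms of a candidate camera and an already-selected one."""
    joint = np.asarray(joint, dtype=float)
    joint /= joint.sum()              # normalise to a probability table
    px = joint.sum(axis=1)            # marginal over the candidate's buckets
    py = joint.sum(axis=0)            # marginal over the selected camera's buckets
    mi = 0.0
    for x in range(joint.shape[0]):
        for y in range(joint.shape[1]):
            if joint[x, y] > 0:
                mi += joint[x, y] * np.log2(joint[x, y] / (px[x] * py[y]))
    return mi

independent = mutual_information([[1, 1], [1, 1]])   # unrelated views -> 0 bits
identical = mutual_information([[1, 0], [0, 1]])     # fully redundant views -> 1 bit
```

High MI flags a candidate whose view largely duplicates an already-selected camera, which is exactly what step 233 penalises.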
Step 233, suboptimal camera selection: set a weight coefficient β, 0 ≤ β ≤ 1, and select as the suboptimal camera the camera c* that maximises the β-weighted information gain penalised by the (1 − β)-weighted mutual information with the already selected cameras;

Step 234, add the chosen suboptimal camera c* to the selected camera set, C_s = C_s ∪ {c*}, at the same time remove it from the candidate camera set, C_u = C_u \ {c*}, and increment the selected-camera count, i.e. count = count + 1.
Step 24, repeat step 23 until the selected-camera count reaches the preset camera count n, where n is a natural number set according to concrete needs.
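Steps 21-24 as a whole can be sketched as the following sequential forward loop. The score weights, the use of a sum to aggregate mutual information over the selected set, and all names here are illustrative assumptions; the per-camera scores are supplied as precomputed toy values:

```python
def select_cameras(cameras, n, face, ig, clarity, mi,
                   alphas=(1/3, 1/3, 1/3), beta=0.5):
    """Sequential forward camera selection (steps 21-24 sketch).
    face/ig/clarity map a camera id to its score; mi(a, b) returns the
    histogram mutual information of two cameras. The weight values and
    the sum-aggregation of MI over C_s are assumptions."""
    a1, a2, a3 = alphas
    unselected = list(cameras)                       # candidate set C_u
    # step 22: best single camera by the weighted three-part score
    best = max(unselected, key=lambda c: a1*face[c] + a2*ig[c] + a3*clarity[c])
    selected = [best]                                # selected set C_s
    unselected.remove(best)
    # steps 23-24: repeatedly add the high-gain, low-redundancy camera
    while len(selected) < n and unselected:
        nxt = max(unselected,
                  key=lambda c: beta*ig[c]
                              - (1 - beta)*sum(mi(c, s) for s in selected))
        selected.append(nxt)
        unselected.remove(nxt)
    return selected

face = {'front': 1, 'side': 0, 'front2': 1}
ig = {'front': 0.9, 'side': 0.8, 'front2': 0.9}
clarity = {'front': 1.0, 'side': 0.9, 'front2': 0.9}
# the two frontal views are redundant with each other; the side view is not
mi_table = {frozenset(['front', 'front2']): 0.9,
            frozenset(['front', 'side']): 0.1,
            frozenset(['front2', 'side']): 0.1}
mi = lambda a, b: mi_table[frozenset([a, b])]
picked = select_cameras(['front', 'side', 'front2'], 2, face, ig, clarity, mi)
```

On this toy input, the first pick is the best frontal view, and the redundancy penalty then prefers the side view over the second frontal camera, illustrating the front-plus-side outcome the summary argues is more informative than two frontal views.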
Beneficial effects: to overcome both the information loss that a single camera suffers from target self-occlusion in applications such as target surveillance and human-computer interaction, and the large redundancy of using several cameras at once, the present invention discloses a camera-combination selection method for these application demands. At each selection time point, through incremental camera-node selection, it picks out the combination with the richest information and the least redundant information, i.e. it chooses n cameras (n < m) from the m candidate cameras observing the same target, satisfying the constraints of computation, storage, and network capacity.
In particular, compared with existing methods the present invention has the following advantages: 1. for visual perception networks with a common field of view, under limited computation and storage capacity, it selects several cameras to form a combination, effectively solving the self-occlusion problems of single-camera selection while reducing the information redundancy of using all cameras; 2. it selects the optimal camera by combining face detection, camera information gain, and image sharpness, guaranteeing that a camera with a frontal, highly detailed, and clear target image is selected; 3. it selects suboptimal cameras step by step in sequential forward fashion, considering both a camera's information contribution to the observed target and, through mutual information, the information redundancy between different cameras, avoiding the same-angle problem that easily arises when several optimal cameras are chosen under the same criterion; 4. the offline-learned visual dictionary and the visual histograms constructed at different viewing angles relate the target images of the several cameras observing the same target; 5. choosing SIFT local feature descriptors as the visual words effectively reduces the influence of factors such as scaling, illumination, and viewing angle across cameras.
Description of drawings
The present invention is further described below in conjunction with the drawings and specific embodiments; the above and other advantages of the present invention will become more apparent.
Fig. 1 is handling process schematic diagram of the present invention.
Fig. 2a ~ 2d are the 180th video frame images of the cameras in the first embodiment.
Fig. 3a ~ 3d are the motion images obtained by applying the step 11 motion detection to Fig. 2a ~ 2d.
Fig. 4a ~ 4d are the target images obtained from Fig. 2a ~ 2d according to the step 11 motion detection results.
Fig. 5a ~ 5d are the visual histograms obtained after applying step 13 to Fig. 4a ~ 4d.
Fig. 6a ~ 6h are the 62nd video frame images of the 8 cameras in the second embodiment.
Fig. 7a ~ 7h are the moving-target images obtained by applying step 11 to Fig. 6a ~ 6h.
Embodiment:
The invention discloses a camera selection method based on camera information gain and the information redundancy between cameras, comprising the following steps:
Step 11, video motion detection: apply a Gaussian mixture model to the video data input by each camera to detect motion; for every frame of detection results, eliminate the shadows cast by the target in the scene with a texture-based method, and extract the region occupied by the moving target in image space.
Step 12, region-image local feature extraction: extract the set of SIFT feature descriptor vectors from the moving-target region image obtained in step 11;
Step 13, visual histogram generation: using the cluster centres obtained by offline computation as histogram buckets (in the present invention the feature space is divided into several small intervals, each interval being a histogram bucket), drop the SIFT feature descriptor vectors extracted from the moving-target region image in step 12 into the corresponding buckets, count the descriptor vectors in each bucket, and finally normalise the histogram, thereby generating the visual histograms of the moving target at the several viewing angles.
Step 2, sequential forward camera selection: select an optimal viewing angle, i.e. an optimal camera. Then, within the set of unselected cameras, use the visual histograms computed in step 1 to calculate the information gain of the target image in each candidate camera's video and the mutual information between the candidate and the set of selected cameras; select as the suboptimal camera the one whose information gain on the observed target is large and whose image similarity to, i.e. mutual information with, the selected cameras is small; add it to the selected camera set and remove it from the candidate camera set. Repeat these steps until the selected-camera count reaches the preset value.
Step 21, initialisation: the set of cameras C = {c_1, c_2, ..., c_m} in the visual perception network scene simultaneously observes the moving target, where m is the total number of cameras; the set of selected cameras C_s is empty and the candidate camera set C_u = C. Merge the scale-invariant feature transform descriptor vector sets of all candidate cameras and generate the merged visual histogram H_merge.
Step 22, optimal camera selection: from the candidate camera set C_u, select an optimal camera c* by criteria combining the face detection result, the moving-target region image information gain, and the image sharpness; add it to the selected camera set, i.e. C_s = {c*}, and at the same time remove it from the candidate camera set, i.e. C_u = C_u \ {c*}; initialise the selected-camera count to 1. The concrete steps are:
Step 221, face detection: use an AdaBoost (Adaptive Boosting) face detector to detect faces in the moving-target region image of candidate camera c'; the detection result is V_face ∈ {0, 1}, where 1 indicates a face is detected and 0 otherwise;
Step 222, moving-target region image information gain: from the visual histogram H_c' of camera c' and the merged visual histogram H_merge, compute the visual information gain V_IG between selecting camera c' and not selecting it, namely

V_IG = Σ_k [ p(b_k, c') log( p(b_k, c') / ( p(b_k) p(c') ) ) + p(b_k, ¬c') log( p(b_k, ¬c') / ( p(b_k) p(¬c') ) ) ]

where p(b_k, c') is the joint probability of selecting camera c' and the k-th bucket of the visual histogram H_merge, p(b_k, ¬c') is the joint probability of not selecting camera c' and the k-th histogram bucket of H_merge, p(b_k) is the probability of the k-th bucket, and p(c') and p(¬c') are respectively the probabilities of selecting and not selecting camera c', all computed from the histograms H_c' and H_merge;
Step 223, target-image sharpness: from the gradient of the target-region image, compute the average gradient magnitude (the mean of the per-pixel gradient magnitudes over the region) to characterise the clarity of the target in the image.
Step 224, optimal camera selection: set weight coefficients α_1, α_2, α_3 with α_1 + α_2 + α_3 = 1, and select as the optimal camera the camera c* that maximises, over the candidate set, the weighted sum of the face detection result, the information gain, and the sharpness; set the selected-camera count to 1;
Step 23, suboptimal camera selection: while cameras remain unselected, in each iteration compute each candidate camera's information gain and its visual-histogram mutual information with the selected cameras, select the suboptimal camera c*, add it to the selected camera set, remove it from the candidate camera set C_u, and increment the selected-camera count, i.e. count = count + 1;

Step 231, compute the target-region image information gain IG_c' of candidate camera c' by the method of step 222;
Step 232, mutual information between a candidate and the selected cameras: compute the mutual information MI(c', c_j) between the visual histogram H_c' of candidate camera c''s target-region image and the visual histogram H_cj of each camera c_j ∈ C_s in the selected set, expressing the similarity of target-region image content between the two cameras:

MI(c', c_j) = Σ_x Σ_y p(b_x, b_y) log( p(b_x, b_y) / ( p(b_x) p(b_y) ) )

where b_x is the x-th bucket of histogram H_c', b_y is the y-th bucket of histogram H_cj, n_c' is the number of buckets of the visual histogram H_c' of candidate camera c', and n_cj is the number of buckets of the visual histogram H_cj of selected camera c_j, the sums running over 1 ≤ x ≤ n_c' and 1 ≤ y ≤ n_cj;
Step 233, suboptimal camera selection: set a weight coefficient β, 0 ≤ β ≤ 1, and select as the suboptimal camera the camera c* that maximises the β-weighted information gain penalised by the (1 − β)-weighted mutual information with the already selected cameras;

Step 234, add the chosen suboptimal camera c* to the selected camera set, C_s = C_s ∪ {c*}, at the same time remove it from the candidate camera set, C_u = C_u \ {c*}, and increment the selected-camera count, i.e. count = count + 1.
Step 24, repeat step 23 until the selected-camera count reaches the preset camera count n.
Visual dictionary offline training: for the input multi-channel video frame images used as training data, first extract from every image the set of scale-invariant feature transform (SIFT) local feature descriptor vectors; then apply k-means clustering to the set of SIFT feature descriptors extracted from all images; each cluster centre is a descriptor vector and is regarded as a visual word, and the resulting set of visual words constitutes the offline-trained visual dictionary. This specifically comprises the following steps:
Extract image SIFT feature descriptor vectors: for each input video frame, filter the image with Gaussian templates to obtain the x- and y-direction gradient components I_x and I_y, and from them compute the per-pixel gradient magnitude and direction θ(x, y) = arctan(I_y, I_x); starting from the upper-left corner of the image, take 16 × 16 windows as feature-extraction sampling windows, spaced every 8 pixels in the x and y directions; divide each window into a 4 × 4 grid of square cells; for the sample points in each cell, compute the gradient direction relative to the sampling-window center, weight the gradient magnitudes by a Gaussian of the distance, and accumulate them into the 8-direction gradient orientation histogram of the cell; each sampling window generates a 4 × 4 × 8 = 128-dimensional feature vector, which is normalized to form the window's local feature descriptor vector; the descriptor vectors computed for each image are added to the feature descriptor set F = {f^(1), f^(2), f^(3), ..., f^(t)}, f^(i) ∈ R^128, 1 ≤ i ≤ t, where f^(i) is the i-th descriptor vector of this image's descriptor set, R^128 indicates that the vector is 128-dimensional, and t is the number of feature descriptor vectors extracted from this image;
Apply k-means clustering to the feature vectors: from the set F of SIFT feature descriptor vectors extracted from the frames, randomly choose k vectors as initial cluster centers; assign all feature vectors to their nearest cluster center, recompute the new cluster centers, and iterate until the iteration limit is reached or the cluster centers move by less than a threshold. In the present invention the stopping condition is 100 iterations or a cluster-center distance of less than 0.02.
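The clustering step above can be sketched in plain Python — a minimal k-means with the stated stopping rule of 100 iterations or a center shift below 0.02 (function and variable names are illustrative, not from the patent):

```python
import random

def kmeans(points, k, max_iter=100, tol=0.02):
    """Plain k-means matching the patent's stopping rule:
    stop after 100 iterations or when centers move < 0.02."""
    centers = random.sample(points, k)  # random initial centers
    for _ in range(max_iter):
        # assignment step: nearest center by squared Euclidean distance
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            clusters[j].append(p)
        # update step: per-dimension mean of each cluster (keep old center if empty)
        new_centers = [
            [sum(dim) / len(c) for dim in zip(*c)] if c else centers[j]
            for j, c in enumerate(clusters)
        ]
        shift = max(sum((a - b) ** 2 for a, b in zip(cn, co)) ** 0.5
                    for cn, co in zip(new_centers, centers))
        centers = new_centers
        if shift < tol:  # convergence threshold
            break
    return centers
```

In the patent's setting, `points` would be the 128-dimensional SIFT descriptors and the returned centers the visual words.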
Visual dictionary construction: regard each cluster center as a visual word; obtain and store the set of visual words, which constitutes the offline-trained visual dictionary.
This embodiment comprises offline training of the visual dictionary, online generation of the target-image visual histograms, and sequential forward camera selection; its processing flow is shown in Figure 1. The whole method divides into two main steps, online target-image visual histogram generation and camera selection, whose main flows are introduced in turn below.
1. Online generation of the target-image visual histogram
To establish the information association between the cameras, this embodiment first selects multi-channel video data of the same scene, extracts the local feature information in the video frames, clusters the local feature vectors, and takes the cluster centers as the visual dictionary generated by offline training, so that online video can generate the corresponding visual histograms according to the visual dictionary and compare their associated information. Because SIFT local features cope well with the appearance differences produced by illumination, scaling, and viewing angle across multiple views of the target, this embodiment extracts the SIFT feature vectors of the input multi-view training video images and applies k-means clustering to them, finally generating the visual dictionary. The concrete steps are:
Extract image SIFT feature descriptor vectors: for each input video frame, filter the image with the Gaussian templates G_x and G_y to obtain the x- and y-direction gradient components I_x and I_y, and from them compute the per-pixel gradient magnitude and direction θ(x, y) = arctan(I_y, I_x); starting from the upper-left corner of the image, take 16 × 16 windows as feature-extraction sampling windows, spaced every 8 pixels in the x and y directions; divide each window into a 4 × 4 grid of square cells; for the sample points in each cell, compute the gradient direction relative to the sampling-window center, weight the gradient magnitudes by a Gaussian of the distance, and accumulate them into a gradient orientation histogram over 8 directions covering [0, 2π) within the cell; each sampling window generates a 4 × 4 × 8 = 128-dimensional feature vector, which is normalized to form the window's local feature descriptor vector; the descriptor vectors computed for each image are added to the feature descriptor set F = {f^(1), f^(2), f^(3), ..., f^(t)}, with each f^(i) ∈ R^128, where f^(i) is the i-th descriptor vector of this image's descriptor set, the vectors are 128-dimensional, and t is the number of feature descriptor vectors extracted from this image.
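The gradient computation above might be sketched as follows, with simple central differences standing in for the Gaussian derivative templates G_x and G_y (an assumption made for brevity; names are illustrative):

```python
import math

def gradient_mag_dir(img):
    """Per-pixel gradient magnitude and direction for a 2-D grayscale
    image given as a list of rows. Central differences stand in for
    the patent's Gaussian derivative templates G_x, G_y."""
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    ang = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix = (img[y][x + 1] - img[y][x - 1]) / 2.0   # I_x
            iy = (img[y + 1][x] - img[y - 1][x]) / 2.0   # I_y
            mag[y][x] = math.hypot(ix, iy)               # sqrt(I_x^2 + I_y^2)
            ang[y][x] = math.atan2(iy, ix) % (2 * math.pi)  # direction in [0, 2*pi)
    return mag, ang
```

The 8-bin orientation histograms of the SIFT cells would then be built by binning `ang` weighted by `mag`.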
Apply k-means clustering to the feature vectors: for the set F of SIFT feature descriptor vectors extracted from the frames, with the number of cluster centers set to k, perform k-means clustering as follows:
Cluster-center selection: randomly choose k local feature descriptor vectors {μ^(1), μ^(2), ..., μ^(k)} from the training-sample SIFT local feature descriptor set as the centers of the k clusters;
Cluster assignment: for each remaining descriptor vector f^(i) in the feature descriptor set, compute its squared distance d = Σ_l (f_l^(i) − μ_l^(j))² to each cluster center μ^(j), where f_l^(i) is the l-th component of the descriptor vector f^(i); assign f^(i) to the cluster with the minimum distance d;
Recompute the cluster centers: from the assignment result, compute the per-dimension mean of all elements in each of the k clusters as the new cluster centers;
Re-cluster: re-assign all elements of the feature descriptor set F to the new cluster centers by the minimum-distance criterion of step 122;
Iterate the center recomputation and re-assignment until the iteration count reaches the predefined limit or the distance between the new centers and the centers before the iteration falls below the set threshold; in the present invention the stopping condition is 100 iterations or a cluster-center distance of less than 0.02.
Visual dictionary construction: regard each cluster center as a visual word; obtain and store the set of visual words, which constitutes the offline-trained visual dictionary.
To establish the statistical representation model of the target image under each camera, the present invention extracts, from the multi-channel video images input online, the region image of the moving target under each view, reducing the visual differences caused by differing backgrounds; SIFT local features are then extracted from the extracted target-region images, and the visual histogram under each view is generated according to the offline-trained visual dictionary. The concrete steps are:
Step 11, video motion detection: perform video motion detection separately on the video data input from each camera, and extract the region of the moving target in image space. The concrete steps are as follows:
Step 111, moving-foreground extraction: use a Gaussian mixture model (GMM) to perform background modeling and foreground extraction on the sequential input images of each camera. The concrete steps are as follows:
Step 1111, initialization: set the first input frame as the background, and set the number of Gaussians, the background threshold, and the window size of the mixture model.
Step 1112: input each new video frame into the model to update the background, and extract the current-frame foreground image.
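As a stand-in for the Gaussian-mixture model of steps 1111–1112 (in practice OpenCV's `createBackgroundSubtractorMOG2` would be used), a minimal single-background running-average sketch shows the update-and-extract cycle; `alpha` and `thresh` are illustrative parameters, not values from the patent:

```python
def update_background(bg, frame, alpha=0.05, thresh=30):
    """Simplified per-pixel background model (a running average standing
    in for the patent's Gaussian mixture): returns the updated background
    and a binary foreground mask for the current frame."""
    fg = [[1 if abs(f - b) > thresh else 0 for f, b in zip(fr, br)]
          for fr, br in zip(frame, bg)]
    new_bg = [[(1 - alpha) * b + alpha * f for f, b in zip(fr, br)]
              for fr, br in zip(frame, bg)]
    return new_bg, fg
```

Pixels far from the background model are flagged as foreground, and the background is blended toward each new frame.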
Step 112, shadow elimination: for each foreground image obtained in step 111, remove the target shadow from the foreground. The concrete steps are as follows:
Step 1121: use the Gaussian templates G_x and G_y to compute the gradients I_x and I_y of the original image I in the x and y directions;
Step 1122: with the same method as step 1121, compute the gradients I_bx and I_by of the background image I_b in the x and y directions;
Step 1123: compute the cosine of the angle between the gradient vectors of image I and background image I_b;
Step 1124: for each image point (x, y), compute the gradient texture S(x, y) over a 5 × 5 neighborhood.
Step 1125: when S(x, y) is greater than a threshold and the motion detection result at point (x, y) is foreground, remove point (x, y) from the foreground as shadow.
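Steps 1121–1125 can be sketched as follows, assuming the gradient-texture measure S(x, y) is the mean cosine similarity between image and background gradient directions over the 5 × 5 window (the exact formula appears only as an image in the source; names and the 0.7 threshold are illustrative):

```python
import math

def shadow_mask(ix, iy, ibx, iby, fg, win=2, thresh=0.7):
    """Gradient-texture shadow test: a foreground pixel whose gradient
    direction agrees with the background over a (2*win+1)^2 window
    (mean cosine similarity S > thresh) is marked as shadow."""
    h, w = len(fg), len(fg[0])
    shadow = [[0] * w for _ in range(h)]
    for y in range(win, h - win):
        for x in range(win, w - win):
            if not fg[y][x]:
                continue
            s, n = 0.0, 0
            for dy in range(-win, win + 1):
                for dx in range(-win, win + 1):
                    gx, gy = ix[y + dy][x + dx], iy[y + dy][x + dx]
                    bx, by = ibx[y + dy][x + dx], iby[y + dy][x + dx]
                    na, nb = math.hypot(gx, gy), math.hypot(bx, by)
                    if na > 1e-9 and nb > 1e-9:
                        s += (gx * bx + gy * by) / (na * nb)  # cosine of gradient angle
                        n += 1
            if n and s / n > thresh:
                shadow[y][x] = 1  # remove from foreground as shadow
    return shadow
```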
Step 113, moving-target region extraction: perform edge detection on the shadow-free foreground image, extract the target boundary, obtain the target's bounding rectangle in image space, and use this rectangle as a template to extract the region image of the moving target from the original video frame.
Step 12, region-image local feature description: extract the set of SIFT feature descriptor vectors from the moving-target region image obtained in step 11.
Step 13, visual histogram generation: using the visual words of the offline-computed visual dictionary as histogram buckets, assign each SIFT feature descriptor vector of the moving-target region image extracted in step 12 to its corresponding bucket; count the number of descriptor vectors in each bucket, and finally normalize the histogram, thereby generating the visual histograms of the moving target under the multiple views.
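Step 13 can be sketched as nearest-word quantization followed by normalization (names are illustrative; `dictionary` is the list of cluster centers from the offline training):

```python
def visual_histogram(descriptors, dictionary):
    """Quantize each descriptor to its nearest visual word and return
    the normalized bucket counts (the 'visual histogram')."""
    counts = [0] * len(dictionary)
    for d in descriptors:
        j = min(range(len(dictionary)),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(d, dictionary[j])))
        counts[j] += 1
    total = sum(counts) or 1  # avoid division by zero on an empty region
    return [c / total for c in counts]
```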
2. Camera selection
At each selection time point (this embodiment sets a selection time point every 10 video frames), one optimal view, i.e. the optimal camera, is selected according to whether a face is detected under the view, the amount of visual information gain of the target region, and the image sharpness of the region as characterized by the average gradient, so as to select cameras that obtain frontal, clearer, and more detailed images; within the unselected camera set, the information gain of the target image in each candidate camera's video and the mutual information between the candidate camera and the selected camera set are computed from the online visual histograms, and the sub-optimal camera with larger information gain about the observed target and smaller mutual information (i.e. lower image similarity) with the selected cameras is chosen, added to the selected camera set, and removed from the candidate camera set; the above steps repeat until the selected-camera count reaches the preset value. The concrete steps are as follows:
Step 21, initialization: the scene contains a camera set C = {c_1, c_2, ..., c_m} simultaneously observing the moving target; the selected camera set C_s is initialized empty and the candidate camera set C_u = C; merge the SIFT feature descriptor vector sets of all candidate cameras and generate the merged visual histogram H_merge by step 13;
Step 22, optimal camera selection: from the candidate camera set C_u, select one optimal camera c* by criteria combining the face detection result, the moving-target region image information gain, and the sharpness; add it to the selected camera set, i.e. C_s = {c*}, and remove it from the candidate camera set, i.e. C_u = C_u \ {c*}; set the selected-camera count count to 1. The concrete steps are:
Step 221, face detection: apply an AdaBoost (Adaptive Boosting, an ensemble of boosted weak classifiers) face detector to the moving-target region image of candidate camera c'; the detection result is V_face ∈ {0, 1}, where 1 indicates a face is detected and 0 otherwise;
Step 222, moving-target region image information gain: from the visual histogram H_c' of camera c' and the merged visual histogram H_merge, compute the information gain V_IG of the amount of visual information when camera c' is selected versus not selected, where p(b_k, c') is the joint probability of selecting camera c' and the k-th bucket of histogram H_merge, p(b_k, ¬c') denotes the joint probability of not selecting camera c' and the k-th histogram bucket of H_merge, p(b_k) is the probability of the k-th bucket, and p(c') and p(¬c') denote respectively the probabilities of selecting and of not selecting camera c', all computed from the histograms H_c' and H_merge;
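The gain formula of step 222 appears only as an image in this text; the sketch below implements one common reading — classical information gain over the histogram buckets split by the binary "camera selected" variable, with the "not selected" conditional modeled as the renormalized remainder of the merged histogram. Both the prior `p_cam = 0.5` and that remainder model are assumptions, not the patent's exact definitions:

```python
import math

def entropy(hist):
    """Shannon entropy (in nats) of a normalized histogram."""
    return -sum(p * math.log(p) for p in hist if p > 0)

def information_gain(h_cam, h_merge, p_cam=0.5):
    """Entropy of the merged histogram minus the expected entropy when
    buckets are split by whether camera c' is selected. h_cam and
    h_merge are normalized histograms of equal length."""
    h_rest = [max(m - p_cam * c, 0.0) for c, m in zip(h_cam, h_merge)]
    s = sum(h_rest) or 1.0
    h_rest = [v / s for v in h_rest]  # renormalized 'not selected' distribution
    return entropy(h_merge) - (p_cam * entropy(h_cam) + (1 - p_cam) * entropy(h_rest))
```

A camera whose histogram matches the merge contributes no gain; a camera concentrating on buckets the others miss scores high.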
Step 223, target-image sharpness computation: from the gradient of the target-region image, compute the average gradient magnitude, which characterizes the sharpness of the target observed by the camera in the video image, where N_x is the width of the video image and N_y is its height.
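The sharpness measure of step 223 might be sketched as follows; central differences stand in for the gradient operator, and the normalization by N_x · N_y follows the text:

```python
import math

def average_gradient(img):
    """Average gradient magnitude over an N_y x N_x grayscale image
    (list of rows), used as the sharpness measure: sharper images
    have stronger local gradients."""
    ny, nx = len(img), len(img[0])
    total = 0.0
    for y in range(1, ny - 1):
        for x in range(1, nx - 1):
            ix = (img[y][x + 1] - img[y][x - 1]) / 2.0
            iy = (img[y + 1][x] - img[y - 1][x]) / 2.0
            total += math.hypot(ix, iy)
    return total / (nx * ny)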
Step 224, optimal camera selection: with weight coefficients α_1, α_2, α_3, α_1 + α_2 + α_3 = 1, select as the optimal camera the camera c* that maximizes the α-weighted combination of the three criteria above. This embodiment sets α_1 = 0.3, α_2 = 0.4, α_3 = 0.3;
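Assuming step 224 maximizes the weighted sum α_1·V_face + α_2·V_IG + α_3·sharpness (a reading consistent with the constraint α_1 + α_2 + α_3 = 1; the exact formula appears only as an image in the source), the selection reduces to:

```python
def best_camera(scores, alphas=(0.3, 0.4, 0.3)):
    """Pick the camera maximizing alpha_1*V_face + alpha_2*V_IG +
    alpha_3*sharpness. `scores` maps camera id -> (v_face, v_ig,
    sharpness); alphas defaults to the embodiment's (0.3, 0.4, 0.3)."""
    return max(scores, key=lambda c: sum(a * v for a, v in zip(alphas, scores[c])))
```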
Step 23, sub-optimal camera selection: given the selected camera set C_s, choose one sub-optimal camera from the candidate camera set C_u by the following steps:
Step 231: using the method of step 222, compute the target-region image information gain IG_c' of candidate camera c';
Step 232, mutual information between the candidate camera and the selected cameras: compute the mutual information MI(c', c_j) between the visual histogram H_c' of the target-region image of candidate camera c' and the visual histogram H_cj of each camera c_j in the selected camera set, c_j ∈ C_s; MI(c', c_j) expresses the similarity of the target-region image visual content between the two cameras,
where b_x is the x-th bucket of histogram H_c', b_y is the y-th bucket of histogram H_cj, n_c' is the total number of buckets of the visual histogram H_c' of candidate camera c', and n_cj is the total number of buckets of the visual histogram H_cj of the selected camera c_j;
Step 233, sub-optimal camera selection: with a weight coefficient β, 0 ≤ β ≤ 1, select camera c* as the sub-optimal camera, such that
This embodiment sets β = 0.5;
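The β-weighted criterion of step 233 is likewise given only as an image in the source; a plausible sketch scores each candidate as β·IG minus (1 − β) times its largest mutual information with any already-selected camera. The exact trade-off form is an assumption, and `ig` and `mi` are precomputed lookup tables standing in for steps 231–232:

```python
def next_camera(candidates, ig, mi, beta=0.5):
    """Choose the candidate maximizing a beta-weighted trade-off between
    its information gain ig[c] and its largest mutual information
    mi[c][s] with any selected camera s (high gain, low redundancy)."""
    def score(c):
        return beta * ig[c] - (1 - beta) * max(mi[c].values())
    return max(candidates, key=score)
```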
Step 234: add the chosen sub-optimal camera c* to the selected camera set, C_s = C_s ∪ {c*}, remove this camera from the candidate camera set, C_u = C_u \ {c*}, and increment the selected-camera count, i.e. count = count + 1.
Step 24: repeat step 23 until the selected-camera count count reaches the predefined camera count n.
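Steps 21–24 as a whole can be sketched as a sequential forward selection loop, using the same assumed β = 0.5 trade-off as in the step-233 sketch; all names are illustrative and `mi[c][s]` is the mutual information between candidate c and selected camera s:

```python
def select_cameras(cameras, n, ig, mi, initial):
    """Sequential forward selection: start from the optimal camera
    `initial` (step 22), then repeatedly add the sub-optimal camera
    (step 23) until n cameras are selected (step 24)."""
    selected = [initial]
    candidates = [c for c in cameras if c != initial]
    while len(selected) < n and candidates:
        best = max(candidates,
                   key=lambda c: 0.5 * ig[c]
                               - 0.5 * max(mi[c][s] for s in selected))
        selected.append(best)
        candidates.remove(best)
    return selected
```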
Embodiment 2
The camera-selection system implemented by this scheme was tested on the Terrace video sequences of the mainly outdoor POM data set (Fleuret F, Berclaz J, Lengagne R, Fua P. Multi-camera people tracking with a probabilistic occupancy map. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008, 30(2): 267-282), which is covered by 4 cameras. The Terrace2 sequence was chosen as scene training data to train the visual dictionary for this scene, and the Terrace1 sequence was used for the online selection test. The 180th original frame is shown in Figure 2, where Figures 2a-2d show the target images obtained by cameras C0, C1, C2, and C3, respectively. Figures 3a-3d show the foreground images detected from Figures 2a-2d after background modeling and foreground extraction with the Gaussian mixture model followed by texture-based shadow elimination; Figures 4a-4d show the target-region images extracted from Figures 2a-2d; and Figures 5a-5d show the normalized visual histograms of Figures 4a-4d, in which each histogram classifies the target local feature descriptor vectors extracted under the corresponding view by the visual words of the visual dictionary (i.e. the histogram buckets). In this embodiment the number of visual words, i.e. histogram buckets, is set to 200 (the x axis in the figures gives the bucket index); the count of local feature descriptor vectors falling into each bucket is normalized by the total number of feature vectors, so each camera's visual histogram shows, in bar-chart form, the probability distribution of the extracted descriptor vectors over the visual dictionary (the y axis gives the probability). For the target region, this scheme detects a face in camera C2; combined with factors such as information gain and image sharpness, the optimal camera selection result is therefore C2, showing that a comparatively frontal video image of the target was obtained at this moment. On this basis the system computes in turn the information gain of the other cameras and their mutual information with the selected camera; the preference order is C2, C0, C3, C1. Thus with a selected-camera count of m = 2 the selection result is {C2, C0}, and with m = 3 it is {C2, C0, C3}.
Embodiment 3
The camera-selection system implemented by this scheme was tested on video sequences of the mainly indoor i3DPost data set (N. Gkalelis, H. Kim, A. Hilton, N. Nikolaidis, and I. Pitas. The i3DPost multi-view and 3D human action/interaction database. In CVMP, 2009), which is covered by 8 cameras. The Walk sequences D1-002 and D1-015 were chosen as training data to train the visual dictionary for this scene, and the Run sequence D1-016 was used for the online selection test. The 62nd original frame is shown in Figures 6a-6h, which show the images obtained by cameras C0, C1, C2, C3, C4, C5, C6, and C7, respectively. Figures 7a-7h show the target regions under each view after motion detection and shadow elimination. For the target region, the face detection value is 1 for C5 and C6 and 0 for the remaining cameras in the optimal camera selection of this scheme; combining information gain and image sharpness, the system chooses C5 as the optimal camera. On this basis the system computes in turn the information gain of the other cameras and their mutual information with the selected cameras; the preference order is C5, C6, C1, C3, C4, C7, C0, C2. Thus with a selected-camera count of m = 2 the selection result is {C5, C6}; with m = 3 it is {C5, C6, C1}; with m = 4 it is {C5, C6, C1, C3}; and so on for the remaining counts.
Claims (6)
1. A method for selecting a camera combination in a visual perception network, characterized in that it comprises the following steps:
Step 1, online generation of the target-image visual histogram: in the situation where the fields of view of multiple cameras overlap, perform motion detection on the video data acquired online from the multiple cameras observing the same target; determine from the detection result the sub-region of the target in video-frame image space, i.e. obtain the target region image; extract local features from the target region image; and compute, according to a visual dictionary generated by prior training, the visual histogram of the target region image under that view;
Step 2, sequential forward camera selection: select one optimal view, i.e. the optimal camera; within the set of unselected cameras, compute from the visual histograms of step 1 the information gain of the target image in each candidate camera's video data and the mutual information between the candidate camera and the selected camera set; select the sub-optimal camera, add it to the selected camera set, and remove it from the candidate camera set; repeat continually until the selected-camera count reaches the required camera count.
2. The method for selecting a camera combination in a visual perception network according to claim 1, characterized in that the visual dictionary generated by training is obtained as follows: for the multi-channel video data input as training data, first extract the set of Scale-Invariant Feature Transform (SIFT) local feature descriptor vectors of each image; apply k-means clustering to the SIFT feature descriptor sets extracted from all images; each cluster center is a descriptor vector and serves as a visual word, and the resulting set of visual words constitutes the offline-trained visual dictionary;
Training the visual dictionary specifically comprises the following steps:
Extract image SIFT feature descriptor vectors: for each video frame, filter the image with Gaussian templates to obtain the x-direction and y-direction gradient components I_x and I_y, and compute the per-pixel gradient magnitude mag(x, y) and direction θ(x, y), where θ(x, y) = arctan(I_y, I_x); starting from the upper-left corner of the image, take 16 × 16 windows as feature-extraction sampling windows, spaced every 8 pixels in the x and y directions; divide each sampling window into a 4 × 4 grid of square cells; for the sample points in each cell, compute the gradient direction relative to the sampling-window center, weight the gradient magnitudes by a Gaussian of the distance, and accumulate them into the 8-direction gradient orientation histogram of the cell; each sampling window generates one 128-dimensional feature vector, which is normalized to form the window's local feature descriptor vector; the descriptor vectors computed for each image are added to the feature descriptor set F = {f^(1), f^(2), f^(3), ..., f^(t)}, f^(i) ∈ R^128, 1 ≤ i ≤ t, where f^(i) is the i-th descriptor vector of this image's descriptor set, R^128 indicates that the vector is 128-dimensional, and t is the number of feature descriptor vectors extracted from this image;
Apply k-means clustering to the feature vectors: from the feature descriptor vector set F, randomly choose k vectors as cluster centers; iteratively compute the distance of all vectors to the cluster centers and assign them to clusters, then recompute the cluster centers from the assignment result, until the prescribed iteration count is reached or the cluster-center movement between iterations falls below a set threshold.
Visual dictionary construction: take each cluster center as a visual word; obtain and store the set of visual words, which constitutes the visual dictionary.
3. The method for selecting a camera combination in a visual perception network according to claim 2, characterized in that the online generation of the target-image visual histogram in step 1 specifically comprises the following steps:
Step 11, video moving-target detection: perform video motion detection based on a Gaussian mixture model separately on the video data input from each camera, eliminate the target shadow from the motion detection result based on texture information, and finally extract the region of the moving target in image space;
Step 12, region-image local feature description: extract the set of SIFT feature descriptor vectors from the moving-target region image obtained in step 11;
Step 13, visual histogram generation: using each cluster center of the pre-trained visual dictionary as a histogram bucket, assign the SIFT feature descriptor vectors of the moving-target region image extracted in step 12 to the corresponding histogram buckets; count the number of descriptor vectors in each bucket, and finally normalize the histogram to generate the visual histograms of the moving target under the multiple views.
4. The method for selecting a camera combination in a visual perception network according to claim 3, characterized in that the sequential forward camera selection of step 2 specifically comprises the following steps:
Step 21, initialization: the visual perception network scene contains a camera set C = {c_1, c_2, ..., c_m} simultaneously observing the moving target, where m is the total number of cameras; the selected camera set C_s is initialized empty and the candidate camera set C_u = C; merge the SIFT feature descriptor vector sets of all candidate cameras and generate the merged visual histogram H_merge;
Step 22, optimal camera selection: select one optimal camera c* from the candidate camera set C_u; add it to the selected camera set C_s, i.e. C_s = {c*}, and remove camera c* from the candidate camera set, i.e. C_u = C_u \ {c*}; initially set the selected-camera count count to 1;
Step 23, sub-optimal camera selection: given the selected camera set C_s, in each iteration compute each candidate camera's information gain and its visual-histogram mutual information with the selected cameras, select the sub-optimal camera c*, add it to the selected camera set, remove it from the candidate camera set C_u, and increment the selected-camera count, i.e. count = count + 1;
Step 24: repeat step 23 until the selected-camera count count reaches the predefined camera count n.
5. The method for selecting a camera combination in a visual perception network according to claim 4, characterized in that the optimal camera selection of step 22 specifically comprises the following steps:
Step 221, face detection: use a face detector on the moving-target region image of candidate camera c'; the detection result is V_face ∈ {0, 1}, where 1 indicates a face is detected and 0 otherwise;
Step 222, moving-target region image information gain: from the visual histogram H_c' of camera c' and the merged visual histogram H_merge, compute the information gain V_IG of the amount of visual information when camera c' is selected versus not selected, where p(b_k, c') is the joint probability of selecting camera c' and the k-th histogram bucket of H_merge, p(b_k, ¬c') denotes the joint probability of not selecting camera c' and the k-th histogram bucket of H_merge, p(b_k) is the probability of the k-th histogram bucket, and p(c') and p(¬c') denote respectively the probabilities of selecting and of not selecting camera c';
Step 223, target-image sharpness computation: compute the average gradient magnitude from the gradient of the target-region image, which characterizes the sharpness of the target in the image;
6. The method for selecting a camera combination in a visual perception network according to claim 5, characterized in that the sub-optimal camera selection of step 23 specifically comprises the following steps:
Step 231: using the method of step 222, compute the target-region image information gain IG_c' of candidate camera c';
Step 232, mutual information between the candidate camera and the selected cameras: compute the mutual information MI(c', c_j) between the visual histogram H_c' of the target-region image in candidate camera c' and the visual histogram H_cj of each camera c_j in the selected camera set C_s, c_j ∈ C_s; MI(c', c_j) expresses the similarity of the target-region image visual content between the two cameras:
where b_x is the x-th bucket of histogram H_c', b_y is the y-th bucket of histogram H_cj, n_c' is the total number of buckets of the visual histogram H_c' of candidate camera c', and n_cj is the total number of buckets of the visual histogram H_cj of the selected camera c_j;
Step 233, sub-optimal camera selection: with a weight coefficient β, 0 ≤ β ≤ 1, select camera c* as the sub-optimal camera, such that
Step 234: add the chosen sub-optimal camera c* to the selected camera set, C_s = C_s ∪ {c*}, and remove this camera from the candidate camera set, C_u = C_u \ {c*}.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210488434.1A CN102932605B (en) | 2012-11-26 | 2012-11-26 | Method for selecting camera combination in visual perception network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210488434.1A CN102932605B (en) | 2012-11-26 | 2012-11-26 | Method for selecting camera combination in visual perception network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN102932605A true CN102932605A (en) | 2013-02-13 |
CN102932605B CN102932605B (en) | 2014-12-24 |
Family
ID=47647293
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201210488434.1A Expired - Fee Related CN102932605B (en) | 2012-11-26 | 2012-11-26 | Method for selecting camera combination in visual perception network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102932605B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080075361A1 (en) * | 2006-09-21 | 2008-03-27 | Microsoft Corporation | Object Recognition Using Textons and Shape Filters |
CN102081666A (en) * | 2011-01-21 | 2011-06-01 | 北京大学 | Index construction method for distributed picture search and server |
CN102208038A (en) * | 2011-06-27 | 2011-10-05 | 清华大学 | Image classification method based on visual dictionary |
CN102509110A (en) * | 2011-10-24 | 2012-06-20 | 中国科学院自动化研究所 | Method for classifying images by performing pairwise-constraint-based online dictionary reweighting |
CN102609732A (en) * | 2012-01-31 | 2012-07-25 | 中国科学院自动化研究所 | Object recognition method based on generalization visual dictionary diagram |
CN102693311A (en) * | 2012-05-28 | 2012-09-26 | 中国人民解放军信息工程大学 | Target retrieval method based on group of randomized visual vocabularies and context semantic information |
Non-Patent Citations (2)
Title |
---|
Zhao Chunhui, et al.: "An Optimized Image Classification Method Based on the Bag-of-Words Model", Journal of Electronics & Information Technology * |
Zhao Chunhui, et al.: "An Improved K-means Clustering Method for Visual Dictionary Construction", Chinese Journal of Scientific Instrument * |
Cited By (33)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103294813A (en) * | 2013-06-07 | 2013-09-11 | 北京捷成世纪科技股份有限公司 | Sensitive image search method and device |
CN104915677A (en) * | 2015-05-25 | 2015-09-16 | 宁波大学 | Three-dimensional video object tracking method |
CN104915677B (en) * | 2015-05-25 | 2018-01-05 | 宁波大学 | Three-dimensional video object tracking method |
CN104918011B (en) * | 2015-05-29 | 2018-04-27 | 华为技术有限公司 | Method and device for playing video |
CN104918011A (en) * | 2015-05-29 | 2015-09-16 | 华为技术有限公司 | Method and device for playing video |
CN107111664A (en) * | 2016-08-09 | 2017-08-29 | 深圳市瑞立视多媒体科技有限公司 | Camera configuration method and device |
CN107111664B (en) * | 2016-08-09 | 2018-03-06 | 深圳市瑞立视多媒体科技有限公司 | Camera configuration method and device |
CN106778777A (en) * | 2016-11-30 | 2017-05-31 | 成都通甲优博科技有限责任公司 | Vehicle matching method and system |
CN107888897A (en) * | 2017-11-01 | 2018-04-06 | 南京师范大学 | Optimization method for video source modeling scenes |
CN107888897B (en) * | 2017-11-01 | 2019-11-26 | 南京师范大学 | Optimization method for video source modeling scenes |
CN108875507A (en) * | 2017-11-22 | 2018-11-23 | 北京旷视科技有限公司 | Pedestrian tracking method, device, system and computer-readable storage medium |
CN108491857B (en) * | 2018-02-11 | 2022-08-09 | 中国矿业大学 | Multi-camera target matching method with overlapping fields of view |
CN108491857A (en) * | 2018-02-11 | 2018-09-04 | 中国矿业大学 | Multi-camera target matching method with overlapping fields of view |
CN108495057B (en) * | 2018-02-13 | 2020-12-08 | 深圳市瑞立视多媒体科技有限公司 | Camera configuration method and device |
CN108234900B (en) * | 2018-02-13 | 2020-11-20 | 深圳市瑞立视多媒体科技有限公司 | Camera configuration method and device |
CN108471496A (en) * | 2018-02-13 | 2018-08-31 | 深圳市瑞立视多媒体科技有限公司 | Camera configuration method and device |
CN108234900A (en) * | 2018-02-13 | 2018-06-29 | 深圳市瑞立视多媒体科技有限公司 | Camera configuration method and device |
CN108495057A (en) * | 2018-02-13 | 2018-09-04 | 深圳市瑞立视多媒体科技有限公司 | Camera configuration method and device |
CN108449551A (en) * | 2018-02-13 | 2018-08-24 | 深圳市瑞立视多媒体科技有限公司 | Camera configuration method and device |
CN108449551B (en) * | 2018-02-13 | 2020-11-03 | 深圳市瑞立视多媒体科技有限公司 | Camera configuration method and device |
CN108471496B (en) * | 2018-02-13 | 2020-11-03 | 深圳市瑞立视多媒体科技有限公司 | Camera configuration method and device |
CN109639961A (en) * | 2018-11-08 | 2019-04-16 | 联想(北京)有限公司 | Acquisition method and electronic equipment |
CN109493634A (en) * | 2018-12-21 | 2019-03-19 | 深圳信路通智能技术有限公司 | Parking management system and method based on multi-device cooperation |
CN111447404B (en) * | 2019-01-16 | 2022-02-01 | 杭州海康威视数字技术股份有限公司 | Video camera |
CN111447404A (en) * | 2019-01-16 | 2020-07-24 | 杭州海康威视数字技术股份有限公司 | Video camera |
CN110505397B (en) * | 2019-07-12 | 2021-08-31 | 北京旷视科技有限公司 | Camera selection method, device and computer storage medium |
CN110505397A (en) * | 2019-07-12 | 2019-11-26 | 北京旷视科技有限公司 | Camera selection method, device and computer storage medium |
CN111866468A (en) * | 2020-07-29 | 2020-10-30 | 浙江大华技术股份有限公司 | Object tracking distribution method and device, storage medium and electronic device |
CN111866468B (en) * | 2020-07-29 | 2022-06-24 | 浙江大华技术股份有限公司 | Object tracking distribution method, device, storage medium and electronic device |
CN114900602A (en) * | 2022-06-08 | 2022-08-12 | 北京爱笔科技有限公司 | Video source camera determining method and device |
CN114900602B (en) * | 2022-06-08 | 2023-10-17 | 北京爱笔科技有限公司 | Method and device for determining video source camera |
CN117750040A (en) * | 2024-02-20 | 2024-03-22 | 浙江宇视科技有限公司 | Video service balancing method, device, equipment and medium of intelligent server cluster |
CN117750040B (en) * | 2024-02-20 | 2024-06-07 | 浙江宇视科技有限公司 | Video service balancing method, device, equipment and medium of intelligent server cluster |
Also Published As
Publication number | Publication date |
---|---|
CN102932605B (en) | 2014-12-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN102932605B (en) | Method for selecting camera combination in visual perception network | |
CN110363122B (en) | Cross-domain target detection method based on multi-layer feature alignment | |
CN111899172A (en) | Vehicle target detection method oriented to remote sensing application scene | |
CN103824070B (en) | Rapid pedestrian detection method based on computer vision | |
CN101950426B (en) | Vehicle relay tracking method in multi-camera scene | |
CN110853032B (en) | Unmanned aerial vehicle video tag acquisition method based on multi-mode deep learning | |
CN112084869B (en) | Compact quadrilateral representation-based building target detection method | |
CN110263712B (en) | Coarse and fine pedestrian detection method based on region candidates | |
CN110688905B (en) | Three-dimensional object detection and tracking method based on key frame | |
CN108564598B (en) | Improved online Boosting target tracking method | |
CN103856727A (en) | Multichannel real-time video splicing processing system | |
CN104517095B (en) | Head segmentation method based on depth images | |
CN102256065A (en) | Automatic video condensing method based on video monitoring network | |
Cepni et al. | Vehicle detection using different deep learning algorithms from image sequence | |
CN111368660A (en) | Single-stage semi-supervised image human body target detection method | |
CN103955888A (en) | High-definition video image mosaic method and device based on SIFT | |
CN104834894A (en) | Gesture recognition method combining binary coding and Hausdorff-like distance | |
CN113255608A (en) | Multi-camera face recognition positioning method based on CNN classification | |
CN106529441A (en) | Human action recognition method based on depth motion maps with fuzzy boundary fragmentation | |
CN116614705A (en) | Coal face camera regulation and control system based on multi-mode video feature analysis | |
CN106127813B (en) | Surveillance video motion segment segmentation method based on visual energy perception | |
CN114170526A (en) | Remote sensing image multi-scale target detection and identification method based on lightweight network | |
CN114495170A (en) | Pedestrian re-identification method and system based on local self-attention inhibition | |
CN110046601B (en) | Pedestrian detection method for crossroad scene | |
CN107730535B (en) | Visible light infrared cascade video tracking method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20141224 Termination date: 20181126 |