CN102932605A - Method for selecting camera combination in visual perception network - Google Patents

Method for selecting camera combination in visual perception network Download PDF

Info

Publication number
CN102932605A
CN102932605A, CN102932605B, CN201210488434A
Authority
CN
China
Prior art keywords
video camera
histogram
video
image
candidate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2012104884341A
Other languages
Chinese (zh)
Other versions
CN102932605B (en)
Inventor
孙正兴
李骞
陈松乐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University filed Critical Nanjing University
Priority to CN201210488434.1A priority Critical patent/CN102932605B/en
Publication of CN102932605A publication Critical patent/CN102932605A/en
Application granted granted Critical
Publication of CN102932605B publication Critical patent/CN102932605B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Landscapes

  • Image Analysis (AREA)
  • Studio Devices (AREA)

Abstract

The invention discloses a method for selecting a camera combination in a visual sensor network. The method comprises the following steps. Online generation of target-image visual histograms: when the fields of view of several cameras overlap, motion detection is performed on the online video data of the multiple cameras observing the same object, the sub-region occupied by the object in the video-frame image space is determined from the detection result to obtain the target image region, local features are extracted from the target image region, and the visual histogram of the target image region at each viewing angle is computed with a visual dictionary generated by prior training. Sequential forward camera selection: the optimal viewing angle, i.e. the optimal camera, is selected first; then, within the set of unselected cameras, a suboptimal camera is selected, added to the set of selected cameras and removed from the set of candidate cameras, and this step is repeated until the number of selected cameras reaches the required number.

Description

A method for selecting a camera combination in a visual sensor network
Technical field
The present invention relates to camera selection methods and belongs to the technical field of computer vision and video data processing; it specifically concerns a method for selecting a camera combination in a visual sensor network.
Background art
In recent years, as cameras have been widely used in fields such as security surveillance, human-computer interaction, navigation and positioning, and battlefield-environment perception, multi-camera systems have become one of the research hotspots of computer vision and its applications. In particular, in video-based surveillance and human-computer interaction, a visual sensor network (VSN) composed of multiple cameras can effectively solve problems such as self-occlusion that a single camera suffers from when observing a target, but it also produces a large amount of redundant information and increases the burden of system storage, visual computation and network transmission. How to choose and push the most informative video streams from the multiple channels has therefore become one of the key issues of visual sensor networks and their applications. A problem similar to video-based camera selection is viewpoint selection for observing three-dimensional models, which has been studied extensively in the graphics field. For example, document 1 (Vazquez P, Sbert M. Fast adaptive selection of best views. Lecture Notes in Computer Science, 2003, 2669:295-305) computes the viewpoint entropy of a known geometric model under different viewing angles and selects the optimal viewpoint according to its value. Unlike camera selection, however, this requires an accurate model definition of the observed object to be available in advance, and because the model is mostly built in a controlled graphics environment the analysis does not need to consider factors such as background and illumination. On the other hand, general sensor-network node selection, e.g. document 2 (Mo Y., Ambrosino R., Sinopoli B. Sensor Selection Strategies for State Estimation in Energy Constrained Wireless Sensor Networks. Automatica, 2011, 47(7):1330-1338) and document 3 (Huber M. F. Optimal pruning for multi-step sensor scheduling. IEEE Transactions on Automatic Control, 2012, 57(5):1338-1343), uses the position between the observed target and the sensor as the basis for node selection, whereas camera perception of the environment is directional: the optimal camera cannot be chosen simply from the positional relationship between the target and the camera nodes. In security surveillance, for example, a frontal image of a person is preferred over a close rear view.
Existing camera selection methods can be divided, according to how the fields of view of the camera nodes in the visual sensor network overlap, into wide-area methods without overlapping fields of view and methods with partially or fully overlapping fields of view. Methods without overlapping fields of view serve demands such as continuous tracking of targets over a large area, and select nodes in a dispersedly deployed camera network according to predictions of the target motion. The present invention, aimed at application demands such as security surveillance and human-computer interaction, mainly studies camera selection when the same target is observed within a partially or fully overlapping field of view. These methods can in turn be divided into single-camera selection and camera-combination selection according to the number of cameras switched on. Single-camera selection chooses, at a given selection time point, only one optimal viewing angle as output according to a proposed selection criterion; the design of the criterion, i.e. the evaluation of the amount of visual information, is then the key to camera selection, and the similarity between the information captured by the different cameras need not be considered. In terms of criterion design, such methods can usually be divided into selection based on video image content and selection based on the spatial position of the target in the objective world. For example, document 4 (Daniyal F., Taj M., Cavallaro A. Content and task-based view selection from multiple video streams. Multimedia Tools and Applications, 2010, 46:235-258) extracts video features such as the number, type, size and position of moving objects and whether events occur in the video, and realizes content-based camera selection according to the contextual information of these features; such methods only extract features from and score the video content captured by each camera, and do not measure the similarity of the content perceived by the different nodes of the camera network. Methods that select cameras according to the spatial position of the target, such as document 5 (Park J, Bhat C, Kak A C. A look-up table based approach for solving the camera selection problem in large camera networks. ACM Workshop on Distributed Smart Cameras, 2006), build a look-up table that maps the space within the cameras' fields of view to the corresponding cameras, and during selection choose the nearest camera node from the table according to the spatial relationship between the target and each camera. These methods presuppose that the cameras in the scene have been accurately calibrated, otherwise the accurate spatial position of the target cannot be obtained from the video images; moreover they do not consider the orientation of the target in the scene, so a frontal image cannot always be captured in applications such as security surveillance. All the above methods select only one optimal camera and do not consider compensating for the viewing-angle limitations of individual cameras by combining several of them, and therefore cannot account for the information similarity and redundancy between cameras.
When resources and processing capacity permit, selecting several cameras to form a camera combination can, compared with single-camera selection, effectively overcome the self-occlusions and blind areas of the latter by increasing the number of information sources. A camera combination could be formed by selecting the optimal viewing angle one camera at a time, but because cameras at different angles capture images with similar content there is data redundancy of varying degree; in general, the combination of individually optimal cameras is not the optimal camera combination. For example, although two cameras that both photograph the front of the target each carry a large amount of information on their own, their combined information is usually less than that of one frontal and one side camera. So far, relatively little research has been devoted to camera-combination selection.
Summary of the invention
Object of the invention: the technical problem to be solved by the present invention is to address the deficiencies of the prior art by proposing a method for selecting a camera combination in a visual sensor network.
Technical solution: the method for selecting a camera combination in a visual sensor network disclosed by the present invention comprises the following steps:
Step 1, online generation of target-image visual histograms: when the fields of view of several cameras overlap, motion detection is performed on the online video data of the multiple cameras that contain the target; the sub-region occupied by the target in the video-frame image space is determined from the detection result, giving the target image region; local features are extracted from the target image region; and, with the visual dictionary generated by prior training, the visual histogram of the target image region at this viewing angle is computed;
Step 2, sequential forward camera selection: at each time point, one optimal viewing angle, i.e. the optimal camera, is selected; within the set of unselected cameras, the information gain of the target image in each candidate camera's video and the mutual information between the candidate camera and the set of selected cameras are computed from the visual histograms of step 1; the suboptimal camera, namely the one with a large information gain for the observed target and a small mutual information with (i.e. low image-content similarity to) the selected cameras, is chosen, added to the set of selected cameras and removed from the set of candidate cameras; the above step is repeated until the selected-camera count reaches the preset value.
Visual dictionary: because the scale-invariant feature transform (SIFT) can well overcome effects such as the illumination differences and scaling produced by different cameras, the present invention uses SIFT descriptors as visual-dictionary words. For the video frame images of the multi-channel video used as training data, the set of SIFT local feature descriptor vectors of each image is first extracted; k-means clustering is then applied to the SIFT feature descriptors extracted from all the images; each cluster centre is regarded as a visual word, and the resulting set of visual words constitutes the off-line-trained visual dictionary. This specifically comprises the following steps:
Extraction of image SIFT feature descriptor vectors: for each input video frame, Gaussian templates are used to filter the image and obtain the gradient components I_x and I_y in the x and y directions, from which the gradient magnitude and direction of each pixel are computed as

mag(x, y) = sqrt( I_x(x, y)² + I_y(x, y)² ), θ(x, y) = arctan(I_y / I_x).

Starting from the upper-left corner of the image, a 16 × 16 window is taken every 8 pixels in the x and y directions as a feature-extraction sampling window; the window is divided into 4 × 4 square grid regions; for the sampled points in each region the gradient direction relative to the window centre is computed, and the gradient magnitudes of the sampled points, after Gaussian weighting by distance, are accumulated into a gradient orientation histogram over 8 directions within the region; each sampling window thus generates a 4 × 4 × 8 = 128-dimensional feature vector, which is normalized to form the window's local feature descriptor vector. The descriptor vectors computed for each image are added to the feature descriptor set F = {f^(1), f^(2), f^(3), ..., f^(t)}, f^(i) ∈ R^128, 1 ≤ i ≤ t, where f^(i) is the i-th descriptor vector of this image's descriptor set, R^128 indicates that the vectors are 128-dimensional, and t is the total number of feature descriptors extracted from this image;
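By way of illustration only, the dense extraction described above might be sketched as follows; this is an assumed NumPy rendering of the 16 × 16 window / 4 × 4 cell / 8 orientation-bin layout, and the Gaussian weighting width, the stride handling and all function names are the sketch's own, not taken from the patent.

```python
import numpy as np

def dense_descriptors(gray, step=8, win=16, cells=4, bins=8):
    """Extract 128-D gradient-orientation descriptors on a dense grid,
    following the 16x16 window / 4x4 cells / 8 orientation bins layout."""
    gy, gx = np.gradient(gray.astype(np.float64))        # I_y, I_x
    mag = np.sqrt(gx ** 2 + gy ** 2)                      # mag(x, y)
    ang = np.arctan2(gy, gx) % (2 * np.pi)                # theta(x, y)

    # Gaussian weighting of magnitudes by distance to the window centre
    c = (win - 1) / 2.0
    yy, xx = np.mgrid[0:win, 0:win]
    gauss = np.exp(-((xx - c) ** 2 + (yy - c) ** 2) / (2 * (win / 2.0) ** 2))

    cell = win // cells
    feats = []
    for y0 in range(0, gray.shape[0] - win + 1, step):
        for x0 in range(0, gray.shape[1] - win + 1, step):
            m = mag[y0:y0 + win, x0:x0 + win] * gauss
            a = ang[y0:y0 + win, x0:x0 + win]
            desc = np.zeros((cells, cells, bins))
            for cy in range(cells):
                for cx in range(cells):
                    mm = m[cy * cell:(cy + 1) * cell, cx * cell:(cx + 1) * cell]
                    aa = a[cy * cell:(cy + 1) * cell, cx * cell:(cx + 1) * cell]
                    idx = (aa / (2 * np.pi) * bins).astype(int) % bins
                    for b in range(bins):                 # 8-direction orientation histogram
                        desc[cy, cx, b] = mm[idx == b].sum()
            v = desc.ravel()
            n = np.linalg.norm(v)
            feats.append(v / n if n > 0 else v)           # normalized 128-D descriptor
    return np.array(feats)
```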
k-means clustering of the feature vectors: for the set F of SIFT feature descriptor vectors extracted from the frame images, k vectors are chosen at random as initial cluster centres; after all feature vectors have been assigned to the nearest cluster centre, new cluster centres are recomputed; the iteration continues until the iteration limit is reached or the change of the cluster-centre distances falls below a threshold. In the present invention the iteration stops when the number of iterations reaches 50-200 or the cluster-centre distance change is less than 0.02.
Construction of the visual dictionary: each cluster centre is regarded as a visual word; the set of visual words is obtained and stored, and constitutes the off-line-trained visual dictionary.
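A minimal sketch of this off-line dictionary training, assuming the descriptors of all training frames have been stacked into one (N, 128) array and that scikit-learn's KMeans stands in for the clustering loop described above; k = 200 follows the later embodiment, while the remaining parameter choices are only illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

def train_visual_dictionary(all_descriptors: np.ndarray, k: int = 200) -> np.ndarray:
    """Cluster the pooled 128-D descriptors; the k cluster centres are the
    visual words making up the off-line visual dictionary. max_iter and tol
    roughly mirror the stopping rule quoted in the text."""
    km = KMeans(n_clusters=k, init="random", n_init=1, max_iter=100,
                tol=0.02, random_state=0)
    km.fit(all_descriptors)              # all_descriptors: shape (N, 128)
    return km.cluster_centers_           # visual dictionary, shape (k, 128)
```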
The online generation of the target-image visual histogram in step 1 of the present invention specifically comprises the following steps:
Step 11, video motion detection: a mixed Gaussian model is applied to the video data input by each camera to perform motion detection; for every frame, the shadow cast by the target in the scene is removed from the detection result with a texture-based method, and the region occupied by the moving target in image space is extracted.
Step 12, extraction of local feature descriptors of the region image: the set of SIFT feature descriptor vectors is extracted from the moving-target region image obtained in step 11;
Step 13, visual histogram generation: with the cluster centres of the pre-trained visual dictionary as histogram buckets, the SIFT feature descriptor vectors extracted from the moving-target region image in step 12 are assigned to the corresponding histogram buckets; the number of descriptor vectors in each bucket is counted, and the histogram is finally normalized, thereby generating the visual histograms of the moving target under the different viewing angles.
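A possible sketch of step 13, assuming the dictionary returned above: each target-region descriptor is assigned to its nearest visual word and the bucket counts are normalized into that view's visual histogram (function and variable names are illustrative).

```python
import numpy as np

def visual_histogram(descriptors, dictionary):
    """Bin the target-region descriptors by nearest visual word and
    normalize the counts into a visual histogram."""
    k = dictionary.shape[0]
    hist = np.zeros(k)
    if len(descriptors) == 0:
        return hist
    # squared distances of every descriptor to every visual word
    d2 = ((descriptors[:, None, :] - dictionary[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)
    for b in nearest:
        hist[b] += 1
    return hist / hist.sum()           # normalized visual histogram
```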
Step 2 of the present invention specifically comprises the following steps:
Step 21, initialization of the selection: a camera set C = {c_1, c_2, ..., c_m} in the scene observes the moving target simultaneously, m being the total number of cameras; the selected camera set C_s = ∅ and the candidate camera set C_u = C; the SIFT feature descriptor vector sets of all candidate cameras are merged, and the merged visual histogram H_merge is generated by step 13.
Step 22, optimal camera selection: from the candidate camera set C_u, the optimal camera c* is selected by jointly considering the face detection result, the information gain of the moving-target region image and the image sharpness; it is added to the selected camera set, i.e. C_s = {c*}, and removed from the candidate camera set, i.e. C_u = C_u \ {c*}; the selected-camera count is initialized to 1. The concrete steps are:
Step 221, face detection: an AdaBoost face detector is applied to the moving-target region image of candidate camera c'. (AdaBoost, Adaptive Boosting, is an iterative algorithm whose core idea is to train different weak classifiers on the same training set and then combine them into a stronger final classifier; it is an improvement of the Boosting algorithm that adaptively adjusts for the errors of the weak classifiers obtained by weak learning.) The detection result is V_face ∈ {0, 1}, where 1 indicates that a face is detected and 0 otherwise;
Step 222, computation of the information gain of the moving-target region image: from the visual histogram H_c' of camera c' and the merged visual histogram H_merge, the information gain V_IG of the amount of visual information between selecting and not selecting camera c' is computed as

V_IG = Σ_k p(b_k, c') · log[ p(b_k, c') / (p(b_k) · p(c')) ] + Σ_k p(b_k, c̄') · log[ p(b_k, c̄') / (p(b_k) · p(c̄')) ],

where p(b_k, c') is the joint probability of selecting camera c' and the k-th bucket of the visual histogram H_merge, p(b_k, c̄') is the joint probability of not selecting camera c' and the k-th bucket of H_merge, p(b_k) is the probability of the k-th bucket, and p(c') and p(c̄') are the probabilities of selecting and not selecting camera c', respectively, all computed from the histograms H_c' and H_merge;
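The patent states that all probabilities in V_IG are computed from H_c' and H_merge but does not spell out the estimate; the sketch below adopts one plausible reading in which p(c') is the share of descriptors contributed by camera c', p(b_k | c') is its normalized histogram, and p(b_k) comes from the merged histogram. This is an assumption, not the authors' stated implementation.

```python
import numpy as np

def information_gain(hist_c, n_c, hist_merge, n_total, eps=1e-12):
    """V_IG between the bucket variable b_k and the event 'camera c' selected'.
    hist_c: normalized histogram of camera c' (built from n_c descriptors);
    hist_merge: normalized merged histogram (built from n_total descriptors)."""
    p_c = n_c / n_total                                   # p(c')
    p_nc = 1.0 - p_c                                      # p(not c')
    p_b = np.asarray(hist_merge, dtype=float)             # p(b_k)
    p_b_c = np.asarray(hist_c, dtype=float)               # p(b_k | c')
    # histogram of the descriptors of the remaining cameras: p(b_k | not c')
    p_b_nc = np.clip((p_b * n_total - p_b_c * n_c) / max(n_total - n_c, 1), 0, None)

    v_ig = 0.0
    for pb, pbc, pbn in zip(p_b, p_b_c, p_b_nc):
        j_c, j_n = pbc * p_c, pbn * p_nc                  # joints p(b_k, c'), p(b_k, not c')
        if j_c > 0:
            v_ig += j_c * np.log(j_c / (pb * p_c + eps))
        if j_n > 0:
            v_ig += j_n * np.log(j_n / (pb * p_nc + eps))
    return v_ig
```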
Step 223, target-image sharpness computation: from the gradient (I_x, I_y) of the target-region image, the average gradient magnitude

V_grad = (1 / (N_x · N_y)) · Σ_x Σ_y sqrt( I_x(x, y)² + I_y(x, y)² )

is computed to characterize the clarity of the target in the image, where N_x and N_y are the width and height of the image.
Step 224, optimal camera selection: with weighting coefficients α_1, α_2, α_3 satisfying α_1 + α_2 + α_3 = 1, the camera

c* = argmax_{c'} ( α_1 · V_face + α_2 · V_IG + α_3 · V_grad )

is selected as the optimal camera, and the selected-camera count value count is set to 1;
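A sketch of the weighted scoring in step 224, assuming V_face, V_IG and the target-region image are already available per candidate; the weights shown are the ones used later in embodiment 1, and the dictionary-based data layout is the sketch's own. In practice the three terms live on different scales, so some normalization of V_IG and the average gradient would typically be applied first.

```python
import numpy as np

def average_gradient(region):
    """V_grad: mean gradient magnitude of the target-region image (sharpness term)."""
    gy, gx = np.gradient(region.astype(np.float64))
    return float(np.sqrt(gx ** 2 + gy ** 2).mean())

def pick_optimal_camera(candidates, a1=0.3, a2=0.4, a3=0.3):
    """candidates: dict camera_id -> {'v_face': 0 or 1, 'v_ig': float, 'region': 2-D array}.
    Returns the id maximizing a1*V_face + a2*V_IG + a3*V_grad (step 224)."""
    def score(cid):
        c = candidates[cid]
        return a1 * c["v_face"] + a2 * c["v_ig"] + a3 * average_gradient(c["region"])
    return max(candidates, key=score)
```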
Step 23, suboptimal camera selection: with the selected camera set C_s no longer empty, each iteration computes the information gain of every candidate camera and its visual-histogram mutual information with the selected cameras, selects the suboptimal camera c*, adds it to the selected camera set, removes it from the candidate camera set C_u and increments the selected-camera count, i.e. count = count + 1;
Step 231, the target-region image information gain IG_c' of candidate camera c' is computed with the method of step 222;
Step 232, computation of the mutual information between a candidate camera and the selected cameras: the mutual information MI(c', c_j) between the visual histogram H_c' of the target-region image of candidate camera c' and the visual histogram H_cj of each camera c_j ∈ C_s in the selected camera set expresses the similarity of the visual content of the target-region images of the two cameras:

MI(c', c_j) = Σ_{x=1}^{n_c'} Σ_{y=1}^{n_cj} p(H_c'^x, H_cj^y) · log[ p(H_c'^x, H_cj^y) / (p(H_c'^x) · p(H_cj^y)) ],

where H_c'^x is the x-th bucket of histogram H_c', H_cj^y is the y-th bucket of histogram H_cj, n_c' is the total number of buckets of the visual histogram H_c' of candidate camera c', and n_cj is the total number of buckets of the visual histogram H_cj of selected camera c_j;
Step 233, suboptimal camera selection: with a weighting coefficient β, 0 ≤ β ≤ 1, the camera

c'* = argmax_{c'} ( IG_c' − β · Σ_{c_j ∈ C_s} MI(c', c_j) )

is selected as the suboptimal camera;
Step 234, the chosen suboptimal camera c'* is added to the selected camera set, C_s = C_s ∪ {c'*}, removed from the candidate camera set, C_u = C_u \ {c'*}, and the selected-camera count is incremented, i.e. count = count + 1.
Step 24, step 23 is repeated until the selected-camera count count reaches the preset camera count n, where n is a natural number set according to the concrete needs.
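The greedy loop of steps 23-24 can be sketched as follows. The patent does not fix how the joint distribution p(H_c'^x, H_cj^y) over bucket pairs is estimated, so the sketch simply takes a joint matrix supplied by a caller-provided joint_fn and computes MI from it; everything else (names, data layout) is likewise assumed.

```python
import numpy as np

def mutual_information(joint, eps=1e-12):
    """MI(c', c_j) from a joint distribution over bucket pairs (n_c' x n_cj matrix)."""
    joint = np.asarray(joint, dtype=float)
    joint = joint / (joint.sum() + eps)
    px = joint.sum(axis=1, keepdims=True)      # marginal over H_c' buckets
    py = joint.sum(axis=0, keepdims=True)      # marginal over H_cj buckets
    nz = joint > 0
    return float((joint[nz] * np.log(joint[nz] / ((px @ py)[nz] + eps))).sum())

def sequential_forward_selection(cameras, first, n, joint_fn, beta=0.5):
    """cameras: dict id -> {'ig': information gain IG_c'}; first: id of the
    optimal camera chosen in step 22; joint_fn(ci, cj): joint bucket distribution.
    Implements c'* = argmax_c' ( IG_c' - beta * sum_j MI(c', c_j) )."""
    selected = [first]                          # C_s after step 22
    candidates = set(cameras) - {first}         # C_u
    while len(selected) < n and candidates:
        def objective(c):
            mi_sum = sum(mutual_information(joint_fn(c, s)) for s in selected)
            return cameras[c]["ig"] - beta * mi_sum
        best = max(candidates, key=objective)   # step 233
        selected.append(best)                   # step 234: C_s = C_s U {c'*}
        candidates.remove(best)                 # C_u = C_u \ {c'*}
    return selected
```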
Beneficial effect: to overcome the loss of information caused by self-occlusion of the target when a single camera is used in applications such as target surveillance and human-computer interaction, and the large amount of redundant information produced when all cameras are used simultaneously, the present invention discloses a camera-combination selection method tailored to these application demands. At each selection time point, camera nodes are selected progressively to pick out the camera combination with the richest information and the least redundancy, i.e. n cameras (n < m) are chosen from the m candidate cameras observing the same target, so as to satisfy the constraints of computation, storage and network capacity.
Specifically, compared with existing methods the present invention has the following advantages: 1. for a visual sensor network with a common field of view, several cameras are selected to form a camera combination under constrained computation and storage capacity, which effectively solves problems such as self-occlusion brought by single-camera selection while reducing the information redundancy of using all cameras; 2. the optimal camera is selected by combining face detection, camera information gain and image sharpness, ensuring that a camera with a frontal, detailed and clear target image is selected; 3. suboptimal cameras are selected progressively from the candidate cameras in a sequential forward manner, which considers both the information each camera contributes to the observed target and, through the mutual-information term, the information redundancy between different cameras, avoiding the same-angle problem that easily arises when several optimal cameras are selected under the same criterion; 4. the off-line-learned visual dictionary and the visual histograms constructed under the different viewing angles establish an association between the target images observed by the several cameras; 5. SIFT local feature descriptors are chosen as visual words, which effectively reduces the influence of factors such as scaling, illumination and viewing angle across different cameras.
Description of drawings
The present invention is further described below in conjunction with the drawings and specific embodiments, and the above and/or other advantages of the present invention will become more apparent.
Fig. 1 is a schematic diagram of the processing flow of the present invention.
Figs. 2a-2d are the 180th video frame images of the cameras in the first embodiment.
Figs. 3a-3d are the motion images obtained after applying the motion detection of step 11 to Figs. 2a-2d.
Figs. 4a-4d are the target images obtained from Figs. 2a-2d according to the motion detection result of step 11.
Figs. 5a-5d are the visual histograms obtained after applying step 13 to Figs. 4a-4d.
Figs. 6a-6h are the 62nd video frame images of the 8 cameras in the second embodiment.
Figs. 7a-7h are the moving-target images obtained after applying step 11 to Figs. 6a-6h.
Embodiment:
The invention discloses a camera selection method based on camera information gain and inter-camera information redundancy, comprising the following steps:
Step 1, online generation of target-image visual histograms: the fields of view of several cameras in the system are assumed to overlap, so the moving target in the scene can be observed simultaneously. First, motion detection is performed on the online multi-channel video data containing the target, and the sub-region occupied by the target in the video-frame image space is determined from the detection result; local features are extracted from the target image region; and, with the visual dictionary generated by prior training, the visual histogram of the target image region at this viewing angle is computed. This specifically comprises the following content:
Step 11, video motion detection: a mixed Gaussian model is applied to the video data input by each camera to perform motion detection; for every frame, the shadow cast by the target in the scene is removed from the detection result with a texture-based method, and the region occupied by the moving target in image space is extracted.
Step 12, extraction of local feature descriptors of the region image: the set of SIFT feature descriptor vectors is extracted from the moving-target region image obtained in step 11;
Step 13, visual histogram generation: with the cluster centres obtained by off-line computation as histogram buckets (in the present invention the feature space is divided into several small intervals, each interval being one histogram bucket), the SIFT feature descriptor vectors extracted from the moving-target region image in step 12 are assigned to the corresponding buckets; the number of descriptor vectors in each bucket is counted, and the histogram is finally normalized, thereby generating the visual histograms of the moving target under the different viewing angles.
Step 2, sequential forward camera selection: one optimal viewing angle, i.e. the optimal camera, is selected; within the set of unselected cameras, the information gain of the target image in each candidate camera's video and the mutual information between the candidate camera and the set of selected cameras are computed from the visual histograms of step 1; the suboptimal camera, with a large information gain for the observed target and a small mutual information with (i.e. low image similarity to) the selected cameras, is chosen, added to the set of selected cameras and removed from the set of candidate cameras; the above step is repeated until the selected-camera count reaches the preset value.
Step 21, initialization of the selection: a camera set C = {c_1, c_2, ..., c_m} in the visual-sensor-network scene observes the moving target simultaneously, m being the total number of cameras; the selected camera set C_s = ∅ and the candidate camera set C_u = C; the SIFT feature descriptor vector sets of all candidate cameras are merged, and the merged visual histogram H_merge is generated.
Step 22, optimal camera selection: from the candidate camera set C_u, the optimal camera c* is selected by jointly considering the face detection result, the information gain of the moving-target region image and the image sharpness; it is added to the selected camera set, i.e. C_s = {c*}, and removed from the candidate camera set, i.e. C_u = C_u \ {c*}; the selected-camera count is initialized to 1. The concrete steps are:
Step 221, face detection: an AdaBoost (Adaptive Boosting) face detector is applied to the moving-target region image of candidate camera c'; the detection result is V_face ∈ {0, 1}, where 1 indicates that a face is detected and 0 otherwise;
Step 222, computation of the information gain of the moving-target region image: from the visual histogram H_c' of camera c' and the merged visual histogram H_merge, the information gain V_IG of the amount of visual information between selecting and not selecting camera c' is computed as

V_IG = Σ_k p(b_k, c') · log[ p(b_k, c') / (p(b_k) · p(c')) ] + Σ_k p(b_k, c̄') · log[ p(b_k, c̄') / (p(b_k) · p(c̄')) ],

where p(b_k, c') is the joint probability of selecting camera c' and the k-th bucket of the visual histogram H_merge, p(b_k, c̄') is the joint probability of not selecting camera c' and the k-th bucket of H_merge, p(b_k) is the probability of the k-th bucket, and p(c') and p(c̄') are the probabilities of selecting and not selecting camera c', respectively, all computed from the histograms H_c' and H_merge;
Step 223, target-image sharpness computation: from the gradient (I_x, I_y) of the target-region image, the average gradient magnitude

V_grad = (1 / (N_x · N_y)) · Σ_x Σ_y sqrt( I_x(x, y)² + I_y(x, y)² )

is computed to characterize the clarity of the target in the image.
Step 224, optimal camera selection: with weighting coefficients α_1, α_2, α_3 satisfying α_1 + α_2 + α_3 = 1, the camera c* = argmax_{c'} ( α_1 · V_face + α_2 · V_IG + α_3 · V_grad ) is selected as the optimal camera, and the selected-camera count value count is set to 1;
Step 23, suboptimal camera selection: with the selected camera set C_s no longer empty, each iteration computes the information gain of every candidate camera and its visual-histogram mutual information with the selected cameras, selects the suboptimal camera c*, adds it to the selected camera set, removes it from the candidate camera set C_u and increments the selected-camera count, i.e. count = count + 1;
Step 231, the target-region image information gain IG_c' of candidate camera c' is computed with the method of step 222;
Step 232, computation of the mutual information between a candidate camera and the selected cameras: the mutual information MI(c', c_j) between the visual histogram H_c' of the target-region image of candidate camera c' and the visual histogram H_cj of each camera c_j ∈ C_s in the selected camera set expresses the similarity of the visual content of the target-region images of the two cameras:

MI(c', c_j) = Σ_{x=1}^{n_c'} Σ_{y=1}^{n_cj} p(H_c'^x, H_cj^y) · log[ p(H_c'^x, H_cj^y) / (p(H_c'^x) · p(H_cj^y)) ],

where H_c'^x is the x-th bucket of histogram H_c', H_cj^y is the y-th bucket of histogram H_cj, n_c' is the total number of buckets of the visual histogram H_c' of candidate camera c', and n_cj is the total number of buckets of the visual histogram H_cj of selected camera c_j;
Step 233, suboptimal camera selection: with a weighting coefficient β, 0 ≤ β ≤ 1, the camera c'* = argmax_{c'} ( IG_c' − β · Σ_{c_j ∈ C_s} MI(c', c_j) ) is selected as the suboptimal camera;
Step 234, the chosen suboptimal camera c'* is added to the selected camera set, C_s = C_s ∪ {c'*}, removed from the candidate camera set, C_u = C_u \ {c'*}, and the selected-camera count is incremented, i.e. count = count + 1.
Step 24, step 23 is repeated until the selected-camera count count reaches the preset camera count n.
Off-line training of the visual dictionary: for the multi-channel video frame images used as training data, the set of scale-invariant feature transform (SIFT) local feature descriptor vectors of each image is first extracted; k-means clustering is then applied to the SIFT feature descriptors extracted from all images; each cluster centre is a descriptor vector and is regarded as a visual word, and the resulting set of visual words constitutes the off-line-trained visual dictionary. This specifically comprises the following steps:
Extraction of image SIFT feature descriptor vectors: for each input video frame, Gaussian templates are used to filter the image and obtain the gradient components I_x and I_y in the x and y directions, from which the gradient magnitude and direction of each pixel are computed as mag(x, y) = sqrt( I_x(x, y)² + I_y(x, y)² ) and θ(x, y) = arctan(I_y / I_x). Starting from the upper-left corner of the image, a 16 × 16 window is taken every 8 pixels in the x and y directions as a feature-extraction sampling window; the window is divided into 4 × 4 square grid regions; for the sampled points in each region the gradient direction relative to the window centre is computed, and the gradient magnitudes of the sampled points, after Gaussian weighting by distance, are accumulated into a gradient orientation histogram over 8 directions within the region; each sampling window thus generates a 4 × 4 × 8 = 128-dimensional feature vector, which is normalized to form the window's local feature descriptor vector. The descriptor vectors computed for each image are added to the feature descriptor set F = {f^(1), f^(2), f^(3), ..., f^(t)}, f^(i) ∈ R^128, 1 ≤ i ≤ t, where f^(i) is the i-th descriptor vector of this image's descriptor set, R^128 indicates that the vectors are 128-dimensional, and t is the total number of feature descriptor vectors extracted from this image;
k-means clustering of the feature vectors: for the set F of SIFT feature descriptor vectors extracted from the frame images, k vectors are chosen at random as initial cluster centres; after all feature vectors have been assigned to the nearest cluster centre, new cluster centres are recomputed; the iteration continues until the iteration limit is reached or the change of the cluster centres falls below a threshold. In the present invention the iteration stops when the number of iterations reaches 100 or the cluster-centre distance change is less than 0.02.
Construction of the visual dictionary: each cluster centre is regarded as a visual word; the set of visual words is obtained and stored, and constitutes the off-line-trained visual dictionary.
Embodiment 1
The present embodiment comprises off-line training that generates the visual dictionary, online generation of the target-image visual histograms, and sequential forward camera selection; its processing flow is shown in Fig. 1. The whole method is divided into two main steps, online generation of the target-image visual histogram and camera selection, and the main flow of each part of the embodiment is introduced below.
1. Online generation of the target-image visual histogram
To establish the information association between the cameras, the present embodiment first takes multi-channel video data of the same scene, extracts the local feature information in the video frames, clusters the local feature vectors, and uses the cluster centres as the visual dictionary generated by off-line training, so that online video can generate the corresponding visual histograms according to the dictionary and use them for associated information comparison. Because SIFT local features can well overcome the differences in visual appearance produced by illumination, scaling and viewing angle across multiple views, the present embodiment extracts the SIFT feature vectors of the input multi-view training video images, applies k-means clustering to them, and finally generates the visual dictionary. The concrete steps are:
Extraction of image SIFT feature descriptor vectors: for each input video frame, Gaussian templates G_x and G_y are used to filter the image and obtain the gradient components I_x and I_y in the x and y directions, where

G_x =
[ 0.0067  0.0085  0  -0.0085  -0.0067
  0.0964  0.1224  0  -0.1224  -0.0964
  0.2344  0.2977  0  -0.2977  -0.2344
  0.0964  0.1224  0  -0.1224  -0.0964
  0.0067  0.0085  0  -0.0085  -0.0067 ],

G_y =
[  0.0067   0.0964   0.2344   0.0964   0.0067
   0.0085   0.1224   0.2977   0.1224   0.0085
   0        0        0        0        0
  -0.0085  -0.1224  -0.2977  -0.1224  -0.0085
  -0.0067  -0.0964  -0.2344  -0.0964  -0.0067 ],

and from these the gradient magnitude and direction of each pixel are computed as mag(x, y) = sqrt( I_x(x, y)² + I_y(x, y)² ) and θ(x, y) = arctan(I_y / I_x). Starting from the upper-left corner of the image, a 16 × 16 window is taken every 8 pixels in the x and y directions as a feature-extraction sampling window; the window is divided into 4 × 4 square grid regions; for the sampled points in each region the gradient direction relative to the window centre is computed, and the gradient magnitudes of the sampled points, after Gaussian weighting by distance, are accumulated into gradient orientation histograms over the 8 directions 0, π/4, π/2, 3π/4, π, 5π/4, 3π/2 and 7π/4 within the region; each sampling window thus generates a 4 × 4 × 8 = 128-dimensional feature vector, which is normalized to form the window's local feature descriptor vector. The descriptor vectors computed for each image are added to the feature descriptor set F = {f^(1), f^(2), f^(3), ..., f^(t)}, with every f^(i) ∈ R^128, where f^(i) is the i-th descriptor vector of this image's descriptor set, the vectors are 128-dimensional, and t is the total number of feature descriptor vectors extracted from this image.
k-means clustering of the feature vectors: for the set F of SIFT feature descriptor vectors extracted from the frame images, with the number of cluster centres set to k, k-means clustering is performed as follows:
Cluster-centre selection: k local feature descriptor vectors {μ^(1), μ^(2), ..., μ^(k)} are chosen at random from the training-sample SIFT descriptor set as the centres of the k clusters;
Cluster assignment: for each remaining descriptor vector f^(i) in the feature descriptor set, its squared distance to each cluster centre μ^(j),

d = Σ_{l=1}^{128} ( f_l^(i) − μ_l^(j) )²,

is computed, where f_l^(i) is the l-th component of descriptor vector f^(i), and f^(i) is assigned to the cluster with the minimum distance d;
Cluster-centre recomputation: according to the clustering result, the mean of every dimension of all elements in each of the k clusters is computed and taken as the new cluster centre;
Re-clustering: all elements of the feature descriptor set F are clustered again to the new cluster centres according to the minimum-distance criterion of the assignment step;
The cluster centres are computed iteratively and the feature descriptor set is re-clustered to the new centres until the number of iterations reaches the preset limit or the distance between the new centres and the centres before the iteration is less than a set threshold; in the present invention the iteration stops when the number of iterations reaches 100 or the cluster-centre distance change is less than 0.02.
Construction of the visual dictionary: each cluster centre is regarded as a visual word; the set of visual words is obtained and stored, and constitutes the off-line-trained visual dictionary.
To establish a statistical representation model of the target image under each camera, the present invention extracts, from the online multi-channel video images, the region image of the moving target under each viewing angle, so as to reduce the differing visual effects produced by different backgrounds; SIFT local features are extracted from the extracted target-region images, and the visual histogram under each viewing angle is generated according to the off-line-trained visual dictionary. The concrete steps are:
Step 11, video motion detection: motion detection is performed on the video data input by each camera, and the region occupied by the moving target in image space is extracted. The concrete steps are as follows:
Step 111, moving-foreground image extraction: a Gaussian mixture model (GMM) is used for background modelling and foreground extraction on the sequentially input camera images. The concrete steps are as follows:
Step 1111, initialization: the first input frame is taken as the background, and the number of Gaussians, the background threshold and the window size of the mixture model are set.
Step 1112, each new video frame is fed to the model to update the background, and the foreground image of the current frame is extracted.
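As an illustration of steps 1111-1112, OpenCV's MOG2 background subtractor can stand in for the Gaussian mixture model; the history length, variance threshold and mixture count below are assumed values, not the patent's settings.

```python
import cv2

# One subtractor per camera; MOG2 is OpenCV's per-pixel Gaussian mixture model.
subtractor = cv2.createBackgroundSubtractorMOG2(history=200, varThreshold=16,
                                                detectShadows=False)
subtractor.setNMixtures(5)            # number of Gaussians per pixel

def foreground_mask(frame):
    """Update the background model with the new frame and return the
    foreground mask of the current frame (step 1112)."""
    return subtractor.apply(frame)
```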
Step 112, shadow elimination: for each foreground image obtained by step 111, the shadow of the target is removed from the foreground image. The concrete steps are as follows:
Step 1121, the Gaussian templates G_x and G_y are used to compute the gradients I_x and I_y of the original image I in the x and y directions;
Step 1122, the same procedure as step 1121 is used to compute the gradients I_bx and I_by of the background image I_b in the x and y directions;
Step 1123, the cosine of the angle between the gradient vectors of image I and background image I_b is computed:

cos θ = ( I_x·I_bx + I_y·I_by ) / sqrt( (I_x² + I_y²)(I_bx² + I_by²) );

Step 1124, the gradient texture is computed for each image point (x, y) over a 5 × 5 neighbourhood:

S(x, y) = [ Σ_{x−2}^{x+2} Σ_{y−2}^{y+2} 2·sqrt( (I_x² + I_y²)(I_bx² + I_by²) )·cos θ ] / [ Σ_{x−2}^{x+2} Σ_{y−2}^{y+2} ( I_x² + I_y² + I_bx² + I_by² ) ];

Step 1125, when S(x, y) is greater than a certain threshold and the motion detection result of point (x, y) is a foreground point, point (x, y) is treated as shadow and removed from the foreground.
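A sketch of the texture test in steps 1121-1125, using NumPy/SciPy; note that 2·sqrt((I_x²+I_y²)(I_bx²+I_by²))·cos θ simplifies to 2·(I_x·I_bx + I_y·I_by), which the code exploits. The threshold value is only an assumed example, since the patent speaks of "a certain threshold".

```python
import numpy as np
from scipy.ndimage import uniform_filter

def remove_shadow(frame_gray, background_gray, fg_mask, s_thresh=0.7):
    """Suppress shadow pixels: shadowed regions keep the background's gradient
    structure, so a high texture score S(x, y) inside the foreground mask is
    treated as shadow and removed (step 1125)."""
    gy, gx = np.gradient(frame_gray.astype(np.float64))
    gby, gbx = np.gradient(background_gray.astype(np.float64))

    # 5x5 neighbourhood means; the common 1/25 factor cancels in the ratio
    num = uniform_filter(2.0 * (gx * gbx + gy * gby), size=5)          # numerator of S
    den = uniform_filter(gx ** 2 + gy ** 2 + gbx ** 2 + gby ** 2, size=5) + 1e-12
    s = num / den                                                      # S(x, y)

    cleaned = fg_mask.copy()
    cleaned[(s > s_thresh) & (fg_mask > 0)] = 0    # drop shadow points from the foreground
    return cleaned
```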
Step 113, moving-target region image extraction: edge detection is applied to the foreground image after shadow elimination to extract the target boundary and obtain the bounding rectangle of the target in image space; with this rectangle as a template, the region image of the moving target is extracted from the original video frame.
Step 12, extraction of local feature descriptors of the region image: the set of SIFT feature descriptor vectors is extracted from the moving-target region image obtained in step 11.
Step 13, visual histogram generation: with the visual words of the off-line-computed visual dictionary as histogram buckets, the SIFT feature descriptor vectors extracted from the moving-target region image in step 12 are assigned to the corresponding buckets; the number of descriptor vectors in each bucket is counted, and the histogram is finally normalized, thereby generating the visual histograms of the moving target under the different viewing angles.
2. Camera selection
At each selection time point (the present embodiment sets a selection time point every 10 video frames), one optimal viewing angle, i.e. the optimal camera, is selected according to whether a face is detected under the viewing angle, the visual-information gain of the target region, and the image sharpness characterized by the region's average gradient, so as to ensure that a camera with a frontal, clearer and more detailed view is selected as far as possible. Within the set of unselected cameras, the information gain of the target image in each candidate camera's video and the mutual information between the candidate camera and the set of selected cameras are computed from the online visual histograms; the suboptimal camera, with a large information gain for the observed target and a small mutual information with (i.e. low image similarity to) the selected cameras, is chosen, added to the set of selected cameras and removed from the set of candidate cameras; this step is repeated until the selected-camera count reaches the preset value. The concrete steps are as follows:
Step 21, initialization of the selection: a camera set C = {c_1, c_2, ..., c_m} in the scene observes the moving target simultaneously; the selected camera set C_s = ∅ and the candidate camera set C_u = C; the SIFT feature descriptor vector sets of all candidate cameras are merged, and the merged visual histogram H_merge is generated by step 13.
Step 22, optimal camera selection: from the candidate camera set C_u, the optimal camera c* is selected by jointly considering the face detection result, the information gain of the moving-target region image and the image sharpness; it is added to the selected camera set, i.e. C_s = {c*}, and removed from the candidate camera set, i.e. C_u = C_u \ {c*}; the selected-camera count value count is set to 1. The concrete steps are:
Step 221, face detection: an AdaBoost (Adaptive Boosting) face detector is applied to the moving-target region image of candidate camera c'; the detection result is V_face ∈ {0, 1}, where 1 indicates that a face is detected and 0 otherwise;
Step 222, computation of the information gain of the moving-target region image: from the visual histogram H_c' of camera c' and the merged visual histogram H_merge, the information gain V_IG of the amount of visual information between selecting and not selecting camera c' is computed as

V_IG = Σ_k p(b_k, c') · log[ p(b_k, c') / (p(b_k) · p(c')) ] + Σ_k p(b_k, c̄') · log[ p(b_k, c̄') / (p(b_k) · p(c̄')) ],

where p(b_k, c') is the joint probability of selecting camera c' and the k-th bucket of the visual histogram H_merge, p(b_k, c̄') is the joint probability of not selecting camera c' and the k-th bucket of H_merge, p(b_k) is the probability of the k-th bucket, and p(c') and p(c̄') are the probabilities of selecting and not selecting camera c', respectively, all computed from the histograms H_c' and H_merge;
Step 223, target-image sharpness computation: from the gradient (I_x, I_y) of the target-region image, the average gradient magnitude

V_grad = (1 / (N_x · N_y)) · Σ_x Σ_y sqrt( I_x(x, y)² + I_y(x, y)² )

is computed to characterize the clarity of the target observed by the camera in the video image, where N_x is the width of the video image and N_y is its height.
Step 224, optimal camera selection: with weighting coefficients α_1, α_2, α_3 satisfying α_1 + α_2 + α_3 = 1, the camera c* = argmax_{c'} ( α_1 · V_face + α_2 · V_IG + α_3 · V_grad ) is selected as the optimal camera; the present embodiment sets α_1 = 0.3, α_2 = 0.4, α_3 = 0.3;
Step 23, suboptimal camera selection: with the selected camera set C_s no longer empty, a suboptimal camera is chosen from the candidate camera set C_u as follows:
Step 231, the target-region image information gain IG_c' of candidate camera c' is computed with the method of step 222;
Step 232, computation of the mutual information between a candidate camera and the selected cameras: the mutual information MI(c', c_j) between the visual histogram H_c' of the target-region image of candidate camera c' and the visual histogram H_cj of each camera c_j ∈ C_s in the selected camera set expresses the similarity of the visual content of the target-region images of the two cameras:

MI(c', c_j) = Σ_{x=1}^{n_c'} Σ_{y=1}^{n_cj} p(H_c'^x, H_cj^y) · log[ p(H_c'^x, H_cj^y) / (p(H_c'^x) · p(H_cj^y)) ],

where H_c'^x is the x-th bucket of histogram H_c', H_cj^y is the y-th bucket of histogram H_cj, n_c' is the total number of buckets of the visual histogram H_c' of candidate camera c', and n_cj is the total number of buckets of the visual histogram H_cj of selected camera c_j;
Step 233, suboptimal camera selection: with a weighting coefficient β, 0 ≤ β ≤ 1, the camera c'* = argmax_{c'} ( IG_c' − β · Σ_{c_j ∈ C_s} MI(c', c_j) ) is selected as the suboptimal camera; the present embodiment sets β = 0.5;
Step 234, the chosen suboptimal camera c'* is added to the selected camera set, C_s = C_s ∪ {c'*}, removed from the candidate camera set, C_u = C_u \ {c'*}, and the selected-camera count is incremented, i.e. count = count + 1.
Step 24, step 23 is repeated until the selected-camera count count reaches the preset camera count n.
Embodiment 2
The camera selection system realized by this scheme performs selection on the Terrace video sequences of the mainly outdoor POM data set (Fleuret F, Berclaz J, Lengagne R, Fua P. Multi-camera people tracking with a probabilistic occupancy map. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008, 30(2):267-282). This data set is configured with 4 cameras in total; the Terrace2 video sequence is chosen as the scene training data to train and generate the visual dictionary for this scene, and the Terrace1 video sequence is used for the online selection test. The 180th original frames are shown in Fig. 2, where Figs. 2a-2d are the target images obtained by cameras C0, C1, C2 and C3 respectively. Figs. 3a-3d show the foreground images of Figs. 2a-2d detected after background modelling and foreground extraction with the Gaussian mixture model, followed by texture-based shadow elimination; Figs. 4a-4d are the target-region images extracted from Figs. 2a-2d; and Figs. 5a-5d are the normalized visual histograms of Figs. 4a-4d, in which the target local feature descriptor vectors extracted under the corresponding viewing angle are classified by the visual words of the visual dictionary (i.e. the histogram buckets). The number of visual words of the visual dictionary in the present embodiment is set to 200, i.e. there are 200 histogram buckets (the x coordinate in the figures denotes the bucket index); the number of local feature descriptor vectors falling in each bucket is divided by the total number of feature vectors for normalization, and the bar charts show the probability distribution with which the extracted local feature descriptor vectors are assigned to the visual dictionary (the y coordinate denotes the probability); this per-bucket probability distribution of the feature descriptor vectors is the content of each camera's visual histogram in the present embodiment. For the target region, this scheme detects a face in camera C2; combining factors such as information gain and image sharpness, the result of the optimal camera selection is therefore C2, and the selection result shows that a rather frontal video image of the target at this moment is obtained. On this basis the system successively computes the information gain of the other cameras and their mutual information with the selected cameras; the order of preference is C2, C0, C3, C1, that is, when the selected-camera count is m = 2 the camera-combination selection result is {C2, C0}, and when the selected-camera count is set to m = 3 the selection result is {C2, C0, C3}.
Embodiment 3
The camera selection system realized by this scheme performs selection on video sequences of the mainly indoor i3DPost data set (N. Gkalelis, H. Kim, A. Hilton, N. Nikolaidis, and I. Pitas. The i3dpost multi-view and 3d human action/interaction database. In CVMP, 2009). This video set is configured with 8 cameras in total; the Walk video sequences D1-002 and D1-015 are chosen as training data to train and generate the visual dictionary for this scene, and the Run video sequence D1-016 is used for the online selection test. The 62nd original frames are shown in Figs. 6a-6h, which are the images obtained by cameras C0, C1, C2, C3, C4, C5, C6 and C7 respectively. Figs. 7a-7h show the target regions under each viewing angle after motion detection and shadow elimination. For the target region, the face detection values of C5 and C6 are 1 in the optimal camera selection of this scheme and those of the remaining cameras are 0; combining information gain and image sharpness, the system chooses C5 as the optimal camera. On this basis the system successively computes the information gain of the other cameras and their mutual information with the selected cameras; the order of preference is C5, C6, C1, C3, C4, C7, C0, C2, that is, when the selected-camera count is m = 2 the camera-combination selection result is {C5, C6}, when the selected-camera count is set to m = 3 the selection result is {C5, C6, C1}, when the selected-camera count is set to m = 4 the selection result is {C5, C6, C1, C3}, and the selection results for the remaining combination sizes follow in the same way.

Claims (6)

1. A method for selecting a camera combination in a visual sensor network, characterized in that it comprises the following steps:
Step 1, online generation of target-image visual histograms: when the fields of view of several cameras overlap, motion detection is performed on the online video data of the multiple cameras observing the same target; the sub-region occupied by the target in the video-frame image space is determined from the detection result, giving the target image region; local features are extracted from the target image region; and, according to the visual dictionary generated by prior training, the visual histogram of the target image region at this viewing angle is computed;
Step 2, sequential forward camera selection: one optimal viewing angle, i.e. the optimal camera, is selected; within the set of unselected cameras, the information gain of the target image in each candidate camera's video data and the mutual information between the candidate camera and the set of selected cameras are computed from the visual histograms of step 1; the suboptimal camera is selected, added to the set of selected cameras and removed from the set of candidate cameras; this is repeated until the selected-camera count reaches the required camera count.
2. the combination system of selection of video camera in a kind of visually-perceptible network according to claim 1, it is characterized in that, the visual dictionary that described training generates is: to the multi-path video data as training data of input, at first extract the constant transform characteristics of the yardstick local feature description subvector set of every width of cloth image; The k-mean cluster is carried out in the constant transform characteristics descriptor set of yardstick that all images extract; Each cluster centre is a descriptor vector, and as a visual word, the set of the visual word that obtains consists of the visual dictionary of off-line training;
Training generates visual dictionary and specifically may further comprise the steps:
Extract the constant transform characteristics descriptor vector of graphical rule: to every frame video frame images, adopt respectively Gauss's template that image is carried out filtering and ask for x direction and y direction gradient component I xWith gradient component I y, and calculating pixel point gradient magnitude mag (x, y) and direction θ (x, y), wherein
Figure FDA00002468276800011
θ (x, y)=arctan (I y, I x); From the image upper left corner, get the window of 16 * 16 sizes as the feature extraction sampling window in 8 pixels of the x and the every interval of y direction of image, each sampling window is divided into 4 * 4 square nets zone, sampled point in each zone is calculated respectively gradient relative direction with the sampling window center, the gradient magnitude of sampled point is included into respectively gradient orientation histogram on interior 8 directions in zone after by distance Gauss weighting, each sampling window generates the characteristic vector of one 128 dimension, the gained characteristic vector is carried out normalization form window local feature description subvector; The descriptor vector that every width of cloth image calculation is obtained adds Feature Descriptor set F={f (1), f (2), f (3)... f (t), f (i)∈ R 128, 1≤i≤t, wherein f (i)For this width of cloth image feature descriptor is gathered i descriptor vector, R 128Represent that this vector dimension is 128 dimensions, the Feature Descriptor vector sum that t extracts for this width of cloth image;
Performing k-means clustering on the feature vectors: for the feature descriptor vector set F, randomly choose k vectors as the initial cluster centres; iteratively compute the distance from every vector to the cluster centres, assign each vector to a cluster, and recompute the cluster centres from the assignment, until the prescribed number of iterations is reached or the change of the cluster centres between successive iterations is smaller than a set threshold;
Constructing the visual dictionary: take each cluster centre as one visual word; the set of visual words is obtained and stored, and constitutes the visual dictionary.
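To make the dictionary training of claim 2 concrete, the following minimal Python sketch extracts 128-dimensional SIFT descriptors from training frames and clusters them with k-means, assuming OpenCV and scikit-learn; OpenCV's keypoint-based SIFT stands in for the dense 16 × 16 sampling grid described above, and the vocabulary size k, the iteration limits, and all function names are illustrative choices rather than part of the claim.

```python
# Hedged sketch of the claim-2 dictionary training: cluster 128-dimensional
# SIFT descriptors with k-means; each cluster centre is one visual word.
# OpenCV keypoint-based SIFT replaces the dense 16x16 sampling grid of the
# claim, purely for brevity (an assumption).
import cv2
import numpy as np
from sklearn.cluster import KMeans

def extract_descriptors(image_bgr):
    """Return an (n, 128) array of SIFT descriptors for one frame."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    _, desc = cv2.SIFT_create().detectAndCompute(gray, None)
    return desc if desc is not None else np.empty((0, 128), np.float32)

def train_visual_dictionary(training_frames, k=256, max_iter=100):
    """Pool all training descriptors and cluster them; the k centres form the dictionary."""
    all_desc = np.vstack([extract_descriptors(f) for f in training_frames])
    km = KMeans(n_clusters=k, max_iter=max_iter, n_init=4, random_state=0).fit(all_desc)
    return km.cluster_centers_  # shape (k, 128): the visual dictionary
```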
3. The method for selecting a camera combination in a visual perception network according to claim 2, characterized in that the online generation of the target-image visual histogram in Step 1 specifically comprises the following steps:
Step 11, video moving-target detection: perform video motion detection based on a Gaussian mixture model on the video data input by each camera; remove target shadows from the motion detection result using texture information; finally extract the region occupied by the moving target in the image space;
Step 12, extraction of local feature descriptors of the region image: extract the set of SIFT descriptor vectors from the moving-target region image obtained in Step 11;
Step 13, visual histogram generation: take each cluster centre of the pre-trained visual dictionary as one histogram bin; assign each SIFT descriptor vector of the moving-target region image extracted in Step 12 to its corresponding histogram bin; count the number of descriptor vectors falling in each bin; finally normalize the histogram, producing the visual histograms of the moving target under the multiple viewing angles.
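A minimal sketch of Steps 11–13, assuming OpenCV's Gaussian-mixture background subtractor (its built-in shadow flag stands in for the texture-based shadow removal of Step 11) and a trained dictionary from the previous sketch; the single-blob target extraction and all helper names are illustrative assumptions.

```python
# Hedged sketch of Steps 11-13: mixture-of-Gaussians motion detection,
# SIFT extraction on the target region, and quantization into a normalized
# visual histogram against the trained dictionary.
import cv2
import numpy as np

bg_sub = cv2.createBackgroundSubtractorMOG2(detectShadows=True)  # shadows marked 127

def target_region(frame_bgr):
    """Step 11: foreground mask -> crop of the largest moving blob (or None)."""
    mask = bg_sub.apply(frame_bgr)
    mask = np.where(mask == 255, 255, 0).astype(np.uint8)  # drop shadow pixels
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    x, y, w, h = cv2.boundingRect(max(contours, key=cv2.contourArea))
    return frame_bgr[y:y + h, x:x + w]

def visual_histogram(region_bgr, dictionary):
    """Steps 12-13: assign each descriptor to its nearest visual word, then normalize."""
    gray = cv2.cvtColor(region_bgr, cv2.COLOR_BGR2GRAY)
    _, desc = cv2.SIFT_create().detectAndCompute(gray, None)
    hist = np.zeros(len(dictionary), dtype=np.float64)
    if desc is None or len(desc) == 0:
        return hist
    dists = np.linalg.norm(desc[:, None, :] - dictionary[None, :, :], axis=2)
    for word in dists.argmin(axis=1):
        hist[word] += 1.0
    return hist / hist.sum()
```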
4. The method for selecting a camera combination in a visual perception network according to claim 3, characterized in that the sequential forward camera selection of Step 2 specifically comprises the following steps:
Step 21, initialization of the selection: the visual perception network scene contains a camera set C = { c_1, c_2, ..., c_m } simultaneously observing the moving target, where m is the total number of cameras; the selected camera set is initialized as the empty set, C_s = ∅, and the candidate camera set as C_u = C; merge the SIFT descriptor vector sets of all candidate cameras and generate the merged visual histogram H_merge;
Step 22, optimal camera selection: select an optimal camera c* from the candidate camera set C_u and add it to the selected camera set C_s, i.e. C_s = { c* }, while removing camera c* from the candidate camera set, i.e. C_u = C_u \ { c* }; initialize the selected-camera count to count = 1;
Step 23, suboptimal camera selection: with the selected camera set C_s non-empty, in each iteration compute, for every candidate camera, its information gain and the mutual information of its visual histogram with the selected cameras; select the suboptimal camera c*, add it to the selected camera set and remove it from the candidate camera set C_u, and increase the selected-camera count, i.e. count = count + 1;
Step 24, repeat Step 23 until the selected-camera count count reaches the preset number of cameras n.
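The claim-4 procedure reduces to a greedy loop; the following skeleton shows it with `score_optimal` and `score_suboptimal` left as placeholders for the criteria of claims 5 and 6, so their names and signatures are assumptions for illustration only.

```python
# Hedged skeleton of claim 4: greedy sequential forward camera selection.
def select_cameras(cameras, hists, n, score_optimal, score_suboptimal):
    """cameras: camera ids; hists[c]: visual histogram of camera c; n: cameras wanted."""
    candidates = list(cameras)   # C_u = C
    selected = []                # C_s = empty set
    # Step 22: pick the single best view with the first-camera criterion.
    best = max(candidates, key=lambda c: score_optimal(c, hists))
    selected.append(best)
    candidates.remove(best)
    # Steps 23-24: repeatedly add the camera contributing the most new information.
    while len(selected) < n and candidates:
        nxt = max(candidates, key=lambda c: score_suboptimal(c, hists, selected))
        selected.append(nxt)
        candidates.remove(nxt)
    return selected
```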
5. The method for selecting a camera combination in a visual perception network according to claim 4, characterized in that the optimal camera selection of Step 22 specifically comprises the following steps:
Step 221, face detection: apply a face detector to the moving-target region image of candidate camera c'; the detection result is V_face ∈ {0, 1}, where V_face = 1 indicates that a face is detected and V_face = 0 otherwise;
Step 222, computation of the information gain of the moving-target region image: from the visual histogram H_c' of camera c' and the merged visual histogram H_merge, compute the information gain V_IG of the visual information between selecting and not selecting camera c', namely

V_IG = Σ_k [ p(b_k, c') · log( p(b_k, c') / ( p(b_k) · p(c') ) ) + p(b_k, ¬c') · log( p(b_k, ¬c') / ( p(b_k) · p(¬c') ) ) ],

where p(b_k, c') is the joint probability of selecting camera c' and the k-th bin of histogram H_merge, p(b_k, ¬c') is the joint probability of not selecting camera c' and the k-th bin of H_merge, p(b_k) is the probability of the k-th histogram bin, and p(c') and p(¬c') are the probabilities of selecting and not selecting camera c', respectively;
Step 223, target image sharpness computation: from the gradients of the target region image compute the average gradient magnitude V_clarity = (1/|R|) · Σ_{(x, y) ∈ R} mag(x, y) over the target region R, which characterizes the sharpness of the target in the image;
Step 224, optimal camera selection: set weight coefficients α_1, α_2, α_3 with α_1 + α_2 + α_3 = 1, and select as the optimal camera the camera c* that maximizes the weighted score, i.e.

c* = argmax_{c' ∈ C_u} ( α_1 · V_face + α_2 · V_IG + α_3 · V_clarity ).
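A sketch of the claim-5 scoring under stated assumptions: a Haar-cascade face detector replaces the unspecified detector of Step 221, the information-gain term follows the reconstruction above with an assumed prior p(c') = 0.5, sharpness is the mean Sobel gradient magnitude, and the default weights are placeholders; in practice the three terms would be normalized to comparable ranges before weighting.

```python
# Hedged sketch of the claim-5 score: face presence, information gain of the
# camera histogram against the merged histogram, and average-gradient
# sharpness, combined with weights alpha1 + alpha2 + alpha3 = 1. The Haar
# cascade, the prior p(c') = 0.5 and the default weights are assumptions.
import cv2
import numpy as np

_face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def face_score(region_bgr):
    """Step 221: V_face = 1 if a face is detected, else 0."""
    gray = cv2.cvtColor(region_bgr, cv2.COLOR_BGR2GRAY)
    return 1.0 if len(_face_cascade.detectMultiScale(gray)) > 0 else 0.0

def information_gain(h_cam, h_merge, p_select=0.5, eps=1e-12):
    """Step 222: mutual information between the bin variable and the
    select / not-select indicator, with p(b_k | c') = h_cam and
    p(b_k | not c') = h_merge (a modelling assumption)."""
    p_b = p_select * h_cam + (1.0 - p_select) * h_merge
    ig = 0.0
    for k in range(len(h_cam)):
        for p_joint, p_c in ((p_select * h_cam[k], p_select),
                             ((1.0 - p_select) * h_merge[k], 1.0 - p_select)):
            if p_joint > eps:
                ig += p_joint * np.log(p_joint / (p_b[k] * p_c + eps))
    return ig

def sharpness(region_bgr):
    """Step 223: mean gradient magnitude of the target region."""
    gray = cv2.cvtColor(region_bgr, cv2.COLOR_BGR2GRAY).astype(np.float64)
    gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0)
    gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1)
    return float(np.mean(np.hypot(gx, gy)))

def optimal_score(region_bgr, h_cam, h_merge, a1=0.3, a2=0.4, a3=0.3):
    """Step 224: weighted combination; each term would normally be rescaled
    to a comparable range first (not shown)."""
    return (a1 * face_score(region_bgr)
            + a2 * information_gain(h_cam, h_merge)
            + a3 * sharpness(region_bgr))
```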
6. The method for selecting a camera combination in a visual perception network according to claim 5, characterized in that the suboptimal camera selection of Step 23 specifically comprises the following steps:
Step 231, compute the information gain IG_c' of the target region image of candidate camera c' by the method of Step 222;
Step 232, computation of the mutual information between the candidate camera and the selected cameras: compute the mutual information MI(c', c_j) between the visual histogram H_c' of the target region image of candidate camera c' and the visual histogram H_{c_j} of each camera c_j in the selected camera set C_s, c_j ∈ C_s; MI(c', c_j) expresses the degree of similarity of the visual content of the target region images of the two cameras:

MI(c', c_j) = Σ_{x=1..n_c'} Σ_{y=1..n_{c_j}} p(b_x^{c'}, b_y^{c_j}) · log( p(b_x^{c'}, b_y^{c_j}) / ( p(b_x^{c'}) · p(b_y^{c_j}) ) ),

where b_x^{c'} is the x-th bin of histogram H_c', b_y^{c_j} is the y-th bin of histogram H_{c_j}, n_c' is the total number of bins of the visual histogram H_c' of candidate camera c', and n_{c_j} is the total number of bins of the visual histogram H_{c_j} of the selected camera c_j;
Step 233, suboptimal camera selection: set a weight coefficient β, 0 ≤ β ≤ 1, and select as the suboptimal camera the camera c* that maximizes the β-weighted trade-off between its information gain and its mutual information with the already-selected cameras, i.e.

c* = argmax_{c' ∈ C_u} ( β · IG_c' − (1 − β) · Σ_{c_j ∈ C_s} MI(c', c_j) );
Step 234, for the chosen suboptimal camera c*, add it to the selected camera set, C_s = C_s ∪ { c* }, and simultaneously remove it from the candidate camera set, C_u = C_u \ { c* }.
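To make the claim-6 redundancy term concrete, the sketch below computes the mutual information of two views from a joint bin-occurrence matrix and combines it with the information gain in a β-weighted trade-off; how the joint distribution p(b_x, b_y) is estimated, the summation over the selected set, and β's default value are assumptions not fixed by the claim.

```python
# Hedged sketch of claim 6: mutual information between two views from a joint
# bin-occurrence matrix, and the beta-weighted suboptimal-camera score.
import numpy as np

def mutual_information(joint, eps=1e-12):
    """joint[x, y] ~ p(b_x of camera c', b_y of camera c_j); normalized internally."""
    joint = joint / max(joint.sum(), eps)
    px = joint.sum(axis=1, keepdims=True)   # marginal over c_j's bins
    py = joint.sum(axis=0, keepdims=True)   # marginal over c''s bins
    ratio = joint / (px @ py + eps)
    return float(np.sum(np.where(joint > eps, joint * np.log(ratio + eps), 0.0)))

def suboptimal_score(ig_candidate, mi_with_selected, beta=0.7):
    """Step 233: favour high information gain, penalise redundancy with the
    already-selected cameras (summation and beta's default are assumptions)."""
    return beta * ig_candidate - (1.0 - beta) * sum(mi_with_selected)
```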
CN201210488434.1A 2012-11-26 2012-11-26 Method for selecting camera combination in visual perception network Expired - Fee Related CN102932605B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210488434.1A CN102932605B (en) 2012-11-26 2012-11-26 Method for selecting camera combination in visual perception network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210488434.1A CN102932605B (en) 2012-11-26 2012-11-26 Method for selecting camera combination in visual perception network

Publications (2)

Publication Number Publication Date
CN102932605A true CN102932605A (en) 2013-02-13
CN102932605B CN102932605B (en) 2014-12-24

Family

ID=47647293

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210488434.1A Expired - Fee Related CN102932605B (en) 2012-11-26 2012-11-26 Method for selecting camera combination in visual perception network

Country Status (1)

Country Link
CN (1) CN102932605B (en)

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103294813A (en) * 2013-06-07 2013-09-11 北京捷成世纪科技股份有限公司 Sensitive image search method and device
CN104918011A (en) * 2015-05-29 2015-09-16 华为技术有限公司 Method and device for playing video
CN104915677A (en) * 2015-05-25 2015-09-16 宁波大学 Three-dimensional video object tracking method
CN106778777A (en) * 2016-11-30 2017-05-31 成都通甲优博科技有限责任公司 A kind of vehicle match method and system
CN107111664A (en) * 2016-08-09 2017-08-29 深圳市瑞立视多媒体科技有限公司 A kind of video camera collocation method and device
CN107888897A (en) * 2017-11-01 2018-04-06 南京师范大学 A kind of optimization method of video source modeling scene
CN108234900A (en) * 2018-02-13 2018-06-29 深圳市瑞立视多媒体科技有限公司 A kind of camera configuration method and apparatus
CN108449551A (en) * 2018-02-13 2018-08-24 深圳市瑞立视多媒体科技有限公司 A kind of camera configuration method and apparatus
CN108471496A (en) * 2018-02-13 2018-08-31 深圳市瑞立视多媒体科技有限公司 A kind of camera configuration method and apparatus
CN108491857A (en) * 2018-02-11 2018-09-04 中国矿业大学 A kind of multiple-camera target matching method of ken overlapping
CN108495057A (en) * 2018-02-13 2018-09-04 深圳市瑞立视多媒体科技有限公司 A kind of camera configuration method and apparatus
CN108875507A (en) * 2017-11-22 2018-11-23 北京旷视科技有限公司 Pedestrian tracting method, equipment, system and computer readable storage medium
CN109493634A (en) * 2018-12-21 2019-03-19 深圳信路通智能技术有限公司 A kind of parking management system and method based on multiple-equipment team working
CN109639961A (en) * 2018-11-08 2019-04-16 联想(北京)有限公司 Acquisition method and electronic equipment
CN110505397A (en) * 2019-07-12 2019-11-26 北京旷视科技有限公司 The method, apparatus and computer storage medium of camera selection
CN111447404A (en) * 2019-01-16 2020-07-24 杭州海康威视数字技术股份有限公司 Video camera
CN111866468A (en) * 2020-07-29 2020-10-30 浙江大华技术股份有限公司 Object tracking distribution method and device, storage medium and electronic device
CN114900602A (en) * 2022-06-08 2022-08-12 北京爱笔科技有限公司 Video source camera determining method and device
CN117750040A (en) * 2024-02-20 2024-03-22 浙江宇视科技有限公司 Video service balancing method, device, equipment and medium of intelligent server cluster

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080075361A1 (en) * 2006-09-21 2008-03-27 Microsoft Corporation Object Recognition Using Textons and Shape Filters
CN102081666A (en) * 2011-01-21 2011-06-01 北京大学 Index construction method for distributed picture search and server
CN102208038A (en) * 2011-06-27 2011-10-05 清华大学 Image classification method based on visual dictionary
CN102509110A (en) * 2011-10-24 2012-06-20 中国科学院自动化研究所 Method for classifying images by performing pairwise-constraint-based online dictionary reweighting
CN102609732A (en) * 2012-01-31 2012-07-25 中国科学院自动化研究所 Object recognition method based on generalization visual dictionary diagram
CN102693311A (en) * 2012-05-28 2012-09-26 中国人民解放军信息工程大学 Target retrieval method based on group of randomized visual vocabularies and context semantic information

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ZHAO Chunhui, et al.: "An Optimized Image Classification Method Based on the Bag-of-Words Model", Journal of Electronics & Information Technology *
ZHAO Chunhui, et al.: "An Improved K-means Clustering Method for Visual Dictionary Construction", Chinese Journal of Scientific Instrument *

Cited By (33)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103294813A (en) * 2013-06-07 2013-09-11 北京捷成世纪科技股份有限公司 Sensitive image search method and device
CN104915677A (en) * 2015-05-25 2015-09-16 宁波大学 Three-dimensional video object tracking method
CN104915677B (en) * 2015-05-25 2018-01-05 宁波大学 A kind of 3 D video method for tracking target
CN104918011B (en) * 2015-05-29 2018-04-27 华为技术有限公司 A kind of method and device for playing video
CN104918011A (en) * 2015-05-29 2015-09-16 华为技术有限公司 Method and device for playing video
CN107111664A (en) * 2016-08-09 2017-08-29 深圳市瑞立视多媒体科技有限公司 A kind of video camera collocation method and device
CN107111664B (en) * 2016-08-09 2018-03-06 深圳市瑞立视多媒体科技有限公司 A kind of video camera collocation method and device
CN106778777A (en) * 2016-11-30 2017-05-31 成都通甲优博科技有限责任公司 A kind of vehicle match method and system
CN107888897A (en) * 2017-11-01 2018-04-06 南京师范大学 A kind of optimization method of video source modeling scene
CN107888897B (en) * 2017-11-01 2019-11-26 南京师范大学 A kind of optimization method of video source modeling scene
CN108875507A (en) * 2017-11-22 2018-11-23 北京旷视科技有限公司 Pedestrian tracting method, equipment, system and computer readable storage medium
CN108491857B (en) * 2018-02-11 2022-08-09 中国矿业大学 Multi-camera target matching method with overlapped vision fields
CN108491857A (en) * 2018-02-11 2018-09-04 中国矿业大学 A kind of multiple-camera target matching method of ken overlapping
CN108495057B (en) * 2018-02-13 2020-12-08 深圳市瑞立视多媒体科技有限公司 Camera configuration method and device
CN108234900B (en) * 2018-02-13 2020-11-20 深圳市瑞立视多媒体科技有限公司 Camera configuration method and device
CN108471496A (en) * 2018-02-13 2018-08-31 深圳市瑞立视多媒体科技有限公司 A kind of camera configuration method and apparatus
CN108234900A (en) * 2018-02-13 2018-06-29 深圳市瑞立视多媒体科技有限公司 A kind of camera configuration method and apparatus
CN108495057A (en) * 2018-02-13 2018-09-04 深圳市瑞立视多媒体科技有限公司 A kind of camera configuration method and apparatus
CN108449551A (en) * 2018-02-13 2018-08-24 深圳市瑞立视多媒体科技有限公司 A kind of camera configuration method and apparatus
CN108449551B (en) * 2018-02-13 2020-11-03 深圳市瑞立视多媒体科技有限公司 Camera configuration method and device
CN108471496B (en) * 2018-02-13 2020-11-03 深圳市瑞立视多媒体科技有限公司 Camera configuration method and device
CN109639961A (en) * 2018-11-08 2019-04-16 联想(北京)有限公司 Acquisition method and electronic equipment
CN109493634A (en) * 2018-12-21 2019-03-19 深圳信路通智能技术有限公司 A kind of parking management system and method based on multiple-equipment team working
CN111447404B (en) * 2019-01-16 2022-02-01 杭州海康威视数字技术股份有限公司 Video camera
CN111447404A (en) * 2019-01-16 2020-07-24 杭州海康威视数字技术股份有限公司 Video camera
CN110505397B (en) * 2019-07-12 2021-08-31 北京旷视科技有限公司 Camera selection method, device and computer storage medium
CN110505397A (en) * 2019-07-12 2019-11-26 北京旷视科技有限公司 The method, apparatus and computer storage medium of camera selection
CN111866468A (en) * 2020-07-29 2020-10-30 浙江大华技术股份有限公司 Object tracking distribution method and device, storage medium and electronic device
CN111866468B (en) * 2020-07-29 2022-06-24 浙江大华技术股份有限公司 Object tracking distribution method, device, storage medium and electronic device
CN114900602A (en) * 2022-06-08 2022-08-12 北京爱笔科技有限公司 Video source camera determining method and device
CN114900602B (en) * 2022-06-08 2023-10-17 北京爱笔科技有限公司 Method and device for determining video source camera
CN117750040A (en) * 2024-02-20 2024-03-22 浙江宇视科技有限公司 Video service balancing method, device, equipment and medium of intelligent server cluster
CN117750040B (en) * 2024-02-20 2024-06-07 浙江宇视科技有限公司 Video service balancing method, device, equipment and medium of intelligent server cluster

Also Published As

Publication number Publication date
CN102932605B (en) 2014-12-24

Similar Documents

Publication Publication Date Title
CN102932605B (en) Method for selecting camera combination in visual perception network
CN110363122B (en) Cross-domain target detection method based on multi-layer feature alignment
CN111899172A (en) Vehicle target detection method oriented to remote sensing application scene
CN103824070B (en) A kind of rapid pedestrian detection method based on computer vision
CN101950426B (en) Vehicle relay tracking method in multi-camera scene
CN110853032B (en) Unmanned aerial vehicle video tag acquisition method based on multi-mode deep learning
CN112084869B (en) Compact quadrilateral representation-based building target detection method
CN110263712B (en) Coarse and fine pedestrian detection method based on region candidates
CN110688905B (en) Three-dimensional object detection and tracking method based on key frame
CN108564598B (en) Improved online Boosting target tracking method
CN103856727A (en) Multichannel real-time video splicing processing system
CN104517095B (en) A kind of number of people dividing method based on depth image
CN102256065A (en) Automatic video condensing method based on video monitoring network
Cepni et al. Vehicle detection using different deep learning algorithms from image sequence
CN111368660A (en) Single-stage semi-supervised image human body target detection method
CN103955888A (en) High-definition video image mosaic method and device based on SIFT
CN104834894A (en) Gesture recognition method combining binary coding and Hausdorff-like distance
CN113255608A (en) Multi-camera face recognition positioning method based on CNN classification
CN106529441A (en) Fuzzy boundary fragmentation-based depth motion map human body action recognition method
CN116614705A (en) Coal face camera regulation and control system based on multi-mode video feature analysis
CN106127813B (en) The monitor video motion segments dividing method of view-based access control model energy sensing
CN114170526A (en) Remote sensing image multi-scale target detection and identification method based on lightweight network
CN114495170A (en) Pedestrian re-identification method and system based on local self-attention inhibition
CN110046601B (en) Pedestrian detection method for crossroad scene
CN107730535B (en) Visible light infrared cascade video tracking method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20141224

Termination date: 20181126