CN113192125A - Multi-camera video concentration method and system in geographic scene with optimal virtual viewpoint - Google Patents

Multi-camera video concentration method and system in geographic scene with optimal virtual viewpoint

Info

Publication number
CN113192125A
Authority
CN
China
Prior art keywords
camera
video
virtual viewpoint
geographic
scene
Prior art date
Legal status
Granted
Application number
CN202110327605.1A
Other languages
Chinese (zh)
Other versions
CN113192125B (en)
Inventor
解愉嘉
毛波
王崴
Current Assignee
Nanjing University of Finance and Economics
Original Assignee
Nanjing University of Finance and Economics
Priority date
Filing date
Publication date
Application filed by Nanjing University of Finance and Economics
Priority to CN202110327605.1A
Publication of CN113192125A
Application granted
Publication of CN113192125B
Legal status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/70 - Determining position or orientation of objects or cameras
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 - Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/05 - Geographic models
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T - CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 - Road transport of goods or passengers
    • Y02T10/10 - Internal combustion engine [ICE] based vehicles
    • Y02T10/40 - Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Remote Sensing (AREA)
  • Computer Graphics (AREA)
  • Processing Or Creating Images (AREA)

Abstract

A multi-camera video concentration method and system in a geographic scene with an optimized virtual viewpoint: video sequence image information is acquired, homonymous point pairs are collected from the video images and a three-dimensional geographic scene model, and the coordinate data of the homonymous point pairs, comprising image coordinates and geographic coordinates, are obtained; a mapping relation between the video images and geographic space is established from the coordinate data of the homonymous point pairs, and the camera fields of view are located; a camera observation domain model is constructed by analyzing the observable distance and the sight-line deflection angle, and the observable sets of the camera groups are generated; the observable sets are optimized by constructing an evaluation model to obtain a virtual viewpoint group; and display parameters of the moving targets are preset, and multi-camera video concentration is performed according to these display parameters. The notable effects are that a mapping relation between video targets and the geographic scene is established, the fused expression of surveillance video in the geographic scene is enhanced, and fast retrieval and efficient understanding of integrated video and geographic scene information are greatly facilitated.

Description

Multi-camera video concentration method and system in geographic scene with optimal virtual viewpoint
Technical Field
The invention relates to the technical field of real-time fusion of video streams and three-dimensional models, in particular to a method and a system for concentrating multi-camera videos in a geographic scene with optimal virtual viewpoints.
Background
As the Virtual Geographic Environment (VGE) demands ever higher accuracy and real-time performance in scene simulation, multi-source heterogeneous data are introduced to enhance the visual expression and analysis functions of the VGE. Video data can not only present a realistic display of the geographic environment but also describe the spatio-temporal motion of moving objects (pedestrians, vehicles and the like) in the geographic scene. When a user views a video in a VGE, the virtual viewpoint is typically placed at a virtual position close to the original geographic location of the camera.
However, in actual deployment and use, this conventional virtual viewpoint selection method encounters the following difficulties and problems:
First, this approach is convenient for viewing a single camera and short videos, but if the scene contains multiple video feeds whose cameras have different sight directions and non-overlapping, discretely distributed fields of view, it is difficult for the user to view all the videos from a single virtual viewpoint. If each video feed is assigned its own virtual viewpoint and watched one by one, the viewing time grows considerably, which prevents the user from browsing the video content quickly.
Second, a moving object usually appears successively in cameras covering different areas; when each video feed is viewed independently, the cross-camera global motion of the moving objects in the scene cannot be expressed.
Therefore, how to effectively select a small number of virtual viewpoints in the virtual scene so that multi-camera video targets can be viewed quickly and their cross-camera motion can be shown has become an urgent technical problem.
Disclosure of Invention
Therefore, the invention provides a multi-camera video concentration method and system in a geographic scene with a preferred virtual viewpoint, aiming at the problems of the prior art: when a user watches videos in a VGE, the virtual viewpoint is generally placed at a virtual position close to the original geographic position of the camera, which is convenient for watching a single camera and short videos; but when the scene contains multiple video feeds whose cameras have different sight directions and non-overlapping, discretely distributed fields of view, the user can hardly watch all the videos from a single virtual viewpoint, and assigning each feed its own virtual viewpoint to be watched one by one greatly prolongs the viewing time and prevents the user from browsing the video content quickly; moreover, moving objects usually appear successively in cameras covering different areas, so viewing each feed independently cannot express the cross-camera global motion of the moving objects in the scene.
In order to achieve the above purpose, the invention provides the following technical scheme: in a first aspect, a method for concentrating multi-camera video in a geographic scene with a preferred virtual viewpoint is provided, which includes the following steps:
acquiring video sequence image information, acquiring homonymous point pairs in a video image and a three-dimensional geographic scene model, and acquiring coordinate data of the homonymous point pairs, wherein the coordinate data comprises image coordinates and geographic coordinates;
establishing a mapping relation between a video image and a geographic space according to coordinate data of the same-name point pair, and positioning a camera view;
constructing a camera observation domain model by analyzing the observable distance and the sight line deflection angle, and generating an observable set of a camera group;
optimizing the observable set by constructing an evaluation model to obtain a virtual viewpoint group;
presetting display parameters of the moving target, and performing multi-camera video concentration according to the display parameters.
As a preferred scheme of the multi-camera video concentration method in the geographic scene with the preferred virtual viewpoint, the video image is the first frame image extracted from the surveillance video.
As a preferred scheme of the multi-camera video concentration method in the geographic scene with the preferred virtual viewpoint, the three-dimensional geographic scene model is a three-dimensional scene model constructed according to measurement information of the real geographic scene; the number of homonymous point pairs collected on the video image and the virtual geographic scene is not less than three pairs, and the three pairs are not all collinear.
As a preferred scheme of the multi-camera video concentration method in the geographic scene with the preferred virtual viewpoint, establishing the mapping relation between the video image and geographic space comprises the following steps:
a1) presetting the object-point geographic space coordinate Q corresponding to a given image-point image space coordinate q, and expressing q and Q in homogeneous coordinate form:
q = [x y 1]^T
Q = [X Y Z 1]^T;
denoting the homography matrix by M, the relationship between q and Q is:
q = MQ;
where M is the 3 x 4 mapping matrix between image space and geographic space (its explicit element expression is given as a formula figure in the original);
a2) solving the geographic space coordinate of the object point corresponding to each image point by inverting the mapping q = MQ (the explicit solution formula is likewise given as a formula figure in the original);
a3) assuming there are L cameras in the current video camera network, the mapping matrix of the k-th camera (k = 1, 2, ..., L) is denoted M_k; each camera position in geographic space is defined as a point P_k^cam, and the geospatial position of each camera field-of-view polygon as F_k = {P_k,1, P_k,2, ..., P_k,o}; the camera position P_k^cam is regarded as a point in geographic space, and the camera view polygon is recorded as the polygon formed by sequentially connecting its o boundary points P_k,num.
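For illustration only, the following is a minimal sketch of such an image-to-geographic mapping, under the simplifying assumption that the camera observes an approximately planar ground surface, so that the mapping reduces to an affine transform with six unknowns and can be estimated from at least three non-collinear homonymous point pairs; it is not the patent's exact formulation of M, and the point values shown are hypothetical.

```python
import numpy as np

def fit_image_to_ground_affine(img_pts, geo_pts):
    """img_pts: (n, 2) image coordinates; geo_pts: (n, 2) planimetric X, Y coordinates, n >= 3."""
    img_pts = np.asarray(img_pts, dtype=float)
    geo_pts = np.asarray(geo_pts, dtype=float)
    n = len(img_pts)
    A = np.zeros((2 * n, 6))
    b = geo_pts.reshape(-1)          # [X1, Y1, X2, Y2, ...]
    A[0::2, 0:2] = img_pts           # X = a*x + b*y + c
    A[0::2, 2] = 1.0
    A[1::2, 3:5] = img_pts           # Y = d*x + e*y + f
    A[1::2, 5] = 1.0
    params, *_ = np.linalg.lstsq(A, b, rcond=None)   # least-squares solution of the 6 unknowns
    return params.reshape(2, 3)                      # [[a, b, c], [d, e, f]]

def image_to_geo(M_affine, x, y, z0=0.0):
    """Map one image point to geographic coordinates on the assumed ground plane at height z0."""
    X, Y = M_affine @ np.array([x, y, 1.0])
    return X, Y, z0

# usage: three hypothetical homonymous point pairs picked on the first video frame
M_aff = fit_image_to_ground_affine(
    [(120, 80), (600, 90), (350, 420)],
    [(305210.5, 3201870.2), (305260.1, 3201872.8), (305235.7, 3201820.4)])
print(image_to_geo(M_aff, 400, 300))
```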
As a preferred scheme of the multi-camera video concentration method in the geographic scene with the preferred virtual viewpoint, two factors, the virtual line-of-sight distance and the camera-virtual viewpoint included angle, are selected as constraint conditions in the process of locating the camera field of view;
the virtual line-of-sight distance is the geographic space distance between the virtual viewpoint and a given point in the field of view; the camera-virtual viewpoint included angle is the angle formed, in the horizontal-plane projection, by a given point in the field of view, the virtual viewpoint and the camera position point;
a distance threshold T_dis and an angle threshold T_ang are defined as constraints; assuming T_dis and T_ang are given, the regions in the scene model that satisfy the constraints are found and used as the virtual viewpoint range.
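A small sketch of the two constraint tests follows, with illustrative threshold values; the included angle is measured here at the field-of-view point, between the directions to the camera and to the candidate viewpoint, which is an assumption about the exact geometry in Fig. 4.

```python
import numpy as np

def satisfies_constraints(V, C, P, T_dis, T_ang_deg):
    """V: candidate virtual viewpoint, C: camera position, P: a point in the field of view."""
    V, C, P = (np.asarray(p, dtype=float)[:2] for p in (V, C, P))  # horizontal-plane projection
    dist = np.linalg.norm(V - P)                                   # virtual line-of-sight distance
    u, w = V - P, C - P
    cos_a = np.dot(u, w) / (np.linalg.norm(u) * np.linalg.norm(w) + 1e-12)
    angle = np.degrees(np.arccos(np.clip(cos_a, -1.0, 1.0)))       # camera-viewpoint included angle
    return dist <= T_dis and angle <= T_ang_deg

print(satisfies_constraints(V=(10, 40, 5), C=(0, 0, 6), P=(12, 20, 0), T_dis=30.0, T_ang_deg=25.0))
```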
As a preferred scheme of the multi-camera video concentration method in the geographic scene with the preferred virtual viewpoint, constructing the camera observation domain model by analyzing the observable distance and the sight-line deflection angle and generating the observable sets of the camera groups comprises the following steps:
b1) recording the camera position P_k^cam and the camera field-of-view polygon F_k; among the boundary segments of F_k, the line segment closest to P_k^cam is P_k,n1 P_k,n2 and the farthest line segment is P_k,n3 P_k,n4;
b2) taking the points P_k,n3 and P_k,n4 in turn as centres and the distance threshold T_dis as radius, a semicircle is drawn on the side of the line segment P_k,n3 P_k,n4 facing the camera position P_k^cam; the intersection region of the two semicircles on the side of the line segment P_k,n1 P_k,n2 near the camera position P_k^cam is taken as the virtual viewpoint distance-reasonable region A_k,dis;
b3) taking the points P_k,n1 and P_k,n2 in turn as corner points and the angle threshold T_ang as the deflection angle, four rays are drawn by deflecting T_ang clockwise and counter-clockwise at each point; the intersection region bounded by the four rays on the side of the line segment P_k,n1 P_k,n2 near the camera position P_k^cam is taken as the virtual viewpoint angle-reasonable region A_k,ang;
b4) the virtual viewpoint range A_k of the camera is the intersection of A_k,dis and A_k,ang;
b5) let Obj be the total set of all video moving objects in all video cameras; suppose there are N_k video moving objects in the k-th camera, and the trajectory of the i-th one is denoted C_k,i; the expressions of Obj and C_k,i are:
Obj = {C_k,i, (k = 1, 2, ..., L)}
C_k,i = {I_k,i,j, P_k,i,j, (i = 1, 2, ..., N_k)(j = 1, 2, ..., n)};
where L is the number of cameras, and I_k,i,j and P_k,i,j are the sub-image of the i-th video moving object in the j-th video frame of the k-th camera and the geographic space position of that sub-image; through cross-camera association analysis of the video moving objects, the single-camera trajectories are merged into multi-camera video moving-object trajectories, realising the associated organisation of the multi-camera video moving objects:
Cube_io = {C_k1,i1, C_k2,i2, ..., C_ko,iL, (k1, k2, ..., ko) ∈ (1, 2, ..., L)};
where L_o is the total number of video moving objects after the cross-camera same-name video moving objects in the surveillance video network are merged, Cube_io (io = 1, 2, ..., L_o) is the global trajectory of the video moving object with sequence number io in the surveillance video network, and C_ko,io is the sub-trajectory of the video moving object with sequence number io in the ko-th camera.
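The sketch below illustrates one possible data organisation for step b5), with assumed class and field names; the cross-camera association itself (re-identification of same-name targets) is taken as given and supplied as a lookup table.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class TrackPoint:
    frame: int                      # video frame index j
    sub_image: object               # I_{k,i,j}: cropped target sub-image (e.g. a numpy array)
    geo_pos: Tuple[float, float]    # P_{k,i,j}: geographic position of the sub-image

@dataclass
class SingleCameraTrack:            # C_{k,i}
    camera_id: int                  # k
    object_id: int                  # i (per-camera numbering)
    points: List[TrackPoint] = field(default_factory=list)

@dataclass
class GlobalTrack:                  # Cube_io
    global_id: int                  # io
    sub_tracks: List[SingleCameraTrack] = field(default_factory=list)

def merge_cross_camera(tracks: List[SingleCameraTrack],
                       same_target: Dict[Tuple[int, int], int]) -> List[GlobalTrack]:
    """same_target maps (camera_id, object_id) to a global id produced by the
    cross-camera association analysis, which is outside the scope of this sketch."""
    globals_: Dict[int, GlobalTrack] = {}
    for t in tracks:
        gid = same_target[(t.camera_id, t.object_id)]
        globals_.setdefault(gid, GlobalTrack(global_id=gid)).sub_tracks.append(t)
    return list(globals_.values())
```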
As a preferred scheme of the multi-camera video concentration method in the geographic scene with the preferred virtual viewpoint, optimizing the observable sets by constructing an evaluation model to obtain the virtual viewpoint group specifically comprises the following steps:
c1) the number of cameras is denoted L, and the set formed by all combination modes of the cameras is denoted M:
M = {m_i}
m_i = {n_i,j}
n_i,j = {cam_i,j,l};
where m_i is the i-th camera combination mode and contains all camera groups of that mode; n_i,j is the j-th camera group in combination mode m_i and contains all cameras of that group; cam_i,j,l denotes the l-th camera in the j-th camera group of the i-th camera combination mode;
c2) using the defined distance threshold T_dis and angle threshold T_ang, the observable domain of every camera cam_i,j,l in each camera group n_i,j of each combination mode m_i is computed and the intersections are taken; if, for a combination mode m_i, the intersection of the camera observation domains is non-empty in every camera group n_i,j, the combination mode m_i is recorded as an observable combination; otherwise the combination mode m_i is recorded as an unobservable combination;
c3) based on the multi-camera video target trajectory data, the following video concentration optimization targets are specified to realize the optimization of the camera groups:
(1) coherent cross-camera expression of same-name targets, i.e. the cameras in which a single target appears are expressed consistently with as few virtual viewpoints as possible;
(2) the total number of virtual viewpoints used by all video targets is as small as possible;
c4) the multi-camera video target expression effect of the camera combination corresponding to a virtual viewpoint group is comprehensively evaluated by a score value (the explicit formula is given as a formula figure in the original), in which n_c is the total number of cameras, n_v the number of virtual viewpoints, N the total number of video moving objects, m_i the number of virtual viewpoints used to express each video moving object, and μ a weight parameter;
c5) when the distance threshold T_dis and the angle threshold T_ang are fixed, the value scores of all current observable camera sets are computed by defining a parameter α; the combination with the maximum value is taken as the camera combination selection result, and multi-camera video concentration in the virtual scene is performed.
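Since the closed-form value expression appears only as a formula image in the original, the stand-in score below simply encodes the two stated optimisation goals (few virtual viewpoints overall, and few viewpoints per target), weighted by μ; it is an assumption, not the patented formula, and the candidate combinations are hypothetical.

```python
def value_score(n_c, n_v, viewpoints_per_object, mu=0.5):
    """n_c: total cameras; n_v: virtual viewpoints; viewpoints_per_object: list m_i per target."""
    N = len(viewpoints_per_object)
    global_term = 1.0 - n_v / n_c                                   # reward few viewpoints overall
    per_object_term = sum(1.0 / m_i for m_i in viewpoints_per_object) / N  # reward coherent targets
    return mu * global_term + (1.0 - mu) * per_object_term

# choose the observable camera combination with the highest score
combos = {"combo_A": (2, [1, 1, 2]), "combo_B": (3, [1, 1, 1])}     # (n_v, m_i list) per candidate
best = max(combos, key=lambda k: value_score(n_c=4, n_v=combos[k][0],
                                             viewpoints_per_object=combos[k][1]))
print(best)
```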
As a preferred scheme of the multi-camera video concentration method in the geographic scene with the preferred virtual viewpoint, presetting the display parameters of the moving targets and performing multi-camera video concentration according to the display parameters comprises the following steps:
d1) recording that W virtual viewpoints (W ≤ L) are needed to view the video moving objects of all L cameras under the current camera combination; meanwhile, the frame rate fps at which the video moving-object sub-images are displayed in the three-dimensional scene is set as the number of sub-images displayed per second for a single video moving object; an object display interval time t_0 is set as the time interval at which additional new video moving objects are displayed;
d2) for a given virtual viewpoint w (w ≤ W), the geospatial trajectory T_0 of the first appearing moving object O_0 is displayed first, and the order in which this video object appears among the different cameras is identified;
the video-object sub-images are screened according to the frame rate fps, the plane coordinates corresponding to the screened sub-images are converted into geographic coordinates, and the sub-images are scaled by the proportionality coefficients P_w and P_h (their explicit calculation formulas are given as formula figures in the original); to obtain them, the average pixel width and height of an appropriate number of sub-images randomly selected from the video-object sub-image library are computed, and the upper-left, lower-left and upper-right corner coordinates of the selected sub-images in the original video frames are mapped to the corresponding geographic positions in the virtual scene to obtain the average length and height of the video-object sub-images displayed in the virtual scene;
d3) during dynamic display, the sub-image of O_0 for the current frame is displayed at the corresponding geographic position within the camera field of view in the virtual scene at the frame rate fps, and older video-object sub-images are no longer displayed;
at the times t_0, 2t_0, ..., nt_0, the video objects O_1, O_2, ..., O_n are additionally expressed dynamically in the three-dimensional scene model, realising the concentration of the multi-camera video objects.
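The following scheduling sketch illustrates steps d1)-d3) with illustrative parameter values; the forms assumed for P_w and P_h (average scene footprint divided by average pixel size) are a reconstruction, since the exact formulas are given only as figure images in the original.

```python
def scale_factors(avg_pixel_w, avg_pixel_h, avg_scene_w, avg_scene_h):
    """Assumed form of the proportionality coefficients P_w, P_h."""
    return avg_scene_w / avg_pixel_w, avg_scene_h / avg_pixel_h

def condensation_schedule(objects, fps, t0, duration):
    """objects: per-object sub-image sequences. Returns (time, object_index, sub_image_index)
    display events for one virtual viewpoint; object i enters the scene at time i * t0."""
    events = []
    n_frames = int(duration * fps)
    for step in range(n_frames):
        t = step / fps
        for obj_idx, frames in enumerate(objects):
            start = obj_idx * t0
            if t < start:
                continue
            k = int((t - start) * fps)          # sub-image shown at time t; older ones are hidden
            if k < len(frames):
                events.append((round(t, 3), obj_idx, k))
    return events

print(condensation_schedule(objects=[list(range(50)), list(range(40))], fps=5, t0=2.0, duration=4.0)[:8])
```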
As a preferred scheme of the multi-camera video concentration method in the geographic scene with the preferred virtual viewpoint, when the same segment of an object trajectory lies in an overlap of camera shooting areas and is therefore captured by several cameras, the camera from which the object sub-image is taken is determined by comparing the included angles formed by the three points virtual viewpoint, object trajectory point and camera position for each of the two cameras:
cameras a and b share a field-of-view overlap C; for a video object passing through C, the included angles formed by the camera position, the trajectory point and the virtual viewpoint V, namely α and β, are compared; if α ≤ β, the video-object sub-image acquired by camera a is used; otherwise the video-object sub-image acquired by camera b is used.
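A minimal sketch of this camera choice follows; the angle vertex is taken here at the trajectory point, which is an assumption about the geometry described above.

```python
import numpy as np

def pick_camera(track_pt, viewpoint, cam_a, cam_b):
    """Return 'a' if the angle alpha (camera a, track point, viewpoint) does not exceed beta."""
    def angle_at_track(cam):
        u = np.asarray(cam, float)[:2] - np.asarray(track_pt, float)[:2]
        w = np.asarray(viewpoint, float)[:2] - np.asarray(track_pt, float)[:2]
        c = np.dot(u, w) / (np.linalg.norm(u) * np.linalg.norm(w) + 1e-12)
        return np.arccos(np.clip(c, -1.0, 1.0))
    return "a" if angle_at_track(cam_a) <= angle_at_track(cam_b) else "b"

print(pick_camera(track_pt=(5, 5), viewpoint=(5, 20), cam_a=(0, 0), cam_b=(12, 0)))
```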
In a second aspect, a multi-camera video concentration system in a geographic scene with a preferred virtual viewpoint is provided, which adopts the multi-camera video concentration method in a geographic scene with a preferred virtual viewpoint of the first aspect or any possible implementation thereof. The concentration system comprises:
the homonymous point acquisition module: used for acquiring video sequence image information, collecting homonymous point pairs in the video images and the three-dimensional geographic scene model, and obtaining the coordinate data of the homonymous point pairs, the coordinate data comprising image coordinates and geographic coordinates;
the mapping relation construction module: used for establishing the mapping relation between the video images and geographic space according to the coordinate data of the homonymous point pairs, and locating the camera fields of view;
the camera group observable set generation module: used for constructing the camera observation domain model by analyzing the observable distance and the sight-line deflection angle, and generating the observable sets of the camera groups;
the virtual viewpoint group generation module: used for optimizing the observable sets by constructing an evaluation model to obtain the virtual viewpoint group;
the video target space-time motion expression module: used for presetting the display parameters of the moving targets and performing multi-camera video concentration according to the display parameters.
The invention has the following advantages: video sequence image information is acquired, homonymous point pairs are collected from the video images and the three-dimensional geographic scene model, and the coordinate data of the homonymous point pairs, comprising image coordinates and geographic coordinates, are obtained; the mapping relation between the video images and geographic space is established from the coordinate data of the homonymous point pairs, and the camera fields of view are located; the camera observation domain model is constructed by analyzing the observable distance and the sight-line deflection angle, and the observable sets of the camera groups are generated; the observable sets are optimized by constructing an evaluation model to obtain the virtual viewpoint group; and the display parameters of the moving targets are preset, and multi-camera video concentration is performed according to these display parameters. The notable effects are that a mapping relation between video targets and the geographic scene is established, the fused expression of surveillance video in the geographic scene is enhanced, and fast retrieval and efficient understanding of integrated video and geographic scene information are greatly facilitated.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed for the description of the embodiments or the prior art are briefly introduced below. It should be apparent that the drawings in the following description are merely exemplary, and that other drawings can be derived from them by those of ordinary skill in the art without inventive effort.
The structures, proportions and sizes shown in the present specification are only used to match the contents disclosed in the specification so that those skilled in the art can understand and read the invention; they do not limit the conditions under which the invention can be implemented and therefore have no substantive technical significance. Any structural modification, change of proportion or adjustment of size that does not affect the functions and purposes of the invention still falls within the scope of the invention.
Fig. 1 is a schematic diagram illustrating an integrated representation of multiple channels of video in a VGE according to an embodiment of the present invention;
fig. 2 is a schematic flowchart of a method for concentrating multi-camera video in a geographic scene with a preferred virtual viewpoint according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a camera and a geospatial coordinate system and an image space coordinate system provided in an embodiment of the invention;
FIG. 4(a) is a schematic diagram of a virtual line-of-sight distance provided in an embodiment of the present invention;
fig. 4(b) is a schematic diagram of a camera-virtual viewpoint included angle provided in the embodiment of the present invention;
FIG. 5(a) is a schematic diagram of the closest/farthest line segment of a camera to a video field provided in an embodiment of the present invention;
fig. 5(b) is a schematic diagram of a virtual viewpoint distance reasonable region provided in the embodiment of the present invention;
fig. 5(c) is a schematic diagram of a virtual viewpoint angle reasonable region provided in the embodiment of the present invention;
fig. 5(d) is a schematic diagram of a virtual viewpoint angle and distance plausible region provided in the embodiment of the present invention;
FIG. 6 is a schematic view of a set of observable cameras provided in an embodiment of the present invention;
FIG. 7 is a schematic diagram illustrating the concentration of multiple camera video objects in a geographic scene provided in an embodiment of the present invention;
FIG. 8 is a schematic view of a camera view overlap process provided in an embodiment of the invention;
fig. 9 is a schematic diagram of a multi-camera video concentrating system in a geographic scene with a preferred virtual viewpoint according to an embodiment of the present invention.
Detailed Description
The present invention is described below by means of particular embodiments, and other advantages and effects of the invention will become readily apparent to those skilled in the art from the disclosure herein. It should be understood that the described embodiments are merely a part of the embodiments of the invention, not all of them, and are not intended to limit the invention. All other embodiments obtained by a person skilled in the art from the embodiments given herein without creative effort fall within the protection scope of the present invention.
Example 1
Referring to fig. 1, by introducing a video into the VGE and supporting video intelligent analysis with geospatial information, related functions such as video data organization management, spatial mapping, video-scene fusion expression and the like in the VGE can be realized.
Referring to fig. 2, a method for concentrating multi-camera video in a geographical scene with a preferred virtual viewpoint is provided, which comprises the following steps:
s1, acquiring video sequence image information, acquiring homonymy point pairs in a video image and a three-dimensional geographic scene model, and acquiring coordinate data of the homonymy point pairs, wherein the coordinate data comprises image coordinates and geographic coordinates;
s2, establishing a mapping relation between a video image and a geographic space according to coordinate data of the same-name point pair, and positioning a camera view;
s3, constructing a camera observation domain model by analyzing the observable distance and the sight line deflection angle, and generating an observable set of a camera group;
s4, optimizing the observable set by constructing an evaluation model to obtain a virtual viewpoint group;
and S5, presetting display parameters of the moving target, and concentrating the multi-camera video according to the display parameters.
Specifically, in step S1, the video image is a first frame image of the captured monitoring video. In step S1, the three-dimensional geographic scene model is a three-dimensional scene model constructed according to the real geographic scene measurement information, the number of the same-name point pairs collected on the video image and the virtual geographic scene is not less than three pairs, and the three pairs of the same-name point pairs are not all collinear.
In particular, referring to fig. 3, the relationship between the camera and the image space and geospatial coordinate systems is illustrated. The camera station centre is recorded as C, the image space coordinate system as O_i-X_iY_i, and the geospatial coordinate system as O_g-X_gY_gZ_g. In step S2, establishing the mapping relation between the video image and geographic space comprises the following steps:
a1) presetting the object-point geographic space coordinate Q corresponding to a given image-point image space coordinate q, and expressing q and Q in homogeneous coordinate form:
q = [x y 1]^T
Q = [X Y Z 1]^T;
denoting the homography matrix by M, the relationship between q and Q is:
q = MQ;
where M is the 3 x 4 mapping matrix between image space and geographic space (its explicit element expression is given as a formula figure in the original);
a2) because M has 6 unknowns, at least 3 groups of known image-point image space coordinates and object-point geographic space coordinates are needed to solve M; after M is determined, the geographic space coordinate of the object point corresponding to each image point is solved by inverting the mapping (the explicit solution formula is given as a formula figure in the original);
a3) assuming there are L cameras in the current video camera network, the mapping matrix of the k-th camera (k = 1, 2, ..., L) is denoted M_k; on this basis, each camera position in geographic space is defined as a point P_k^cam, and the geospatial position of each camera field-of-view polygon as F_k = {P_k,1, P_k,2, ..., P_k,o}; the camera position P_k^cam is regarded as a point in geographic space, and the camera view polygon is recorded as the polygon formed by sequentially connecting its o boundary points P_k,num.
In this embodiment, in step S2, two factors, the virtual line-of-sight distance and the camera-virtual viewpoint included angle, are selected as constraint conditions in the process of locating the camera field of view;
referring specifically to fig. 4, the virtual line-of-sight distance is the geographic space distance between the virtual viewpoint and a given point in the field of view; the camera-virtual viewpoint included angle is the angle formed, in the horizontal-plane projection, by a given point in the field of view, the virtual viewpoint and the camera position point;
a distance threshold T_dis and an angle threshold T_ang are defined as constraints; assuming T_dis and T_ang are given, on this basis the regions in the scene model that satisfy the constraints are found and used as the virtual viewpoint range.
Specifically, in step S3, constructing the camera observation domain model by analyzing the observable distance and the sight-line deflection angle and generating the observable sets of the camera groups comprises the following steps:
b1) referring specifically to fig. 5(a), recording the camera position P_k^cam and the camera field-of-view polygon F_k; among the boundary segments of F_k, the line segment closest to P_k^cam is P_k,n1 P_k,n2 and the farthest line segment is P_k,n3 P_k,n4;
b2) referring specifically to fig. 5(b), taking the points P_k,n3 and P_k,n4 in turn as centres and the distance threshold T_dis as radius, a semicircle is drawn on the side of the line segment P_k,n3 P_k,n4 facing the camera position P_k^cam; the intersection region of the two semicircles on the side of the line segment P_k,n1 P_k,n2 near the camera position P_k^cam is taken as the virtual viewpoint distance-reasonable region A_k,dis;
b3) referring specifically to fig. 5(c), taking the points P_k,n1 and P_k,n2 in turn as corner points and the angle threshold T_ang as the deflection angle, four rays are drawn by deflecting T_ang clockwise and counter-clockwise at each point; the intersection region bounded by the four rays on the side of the line segment P_k,n1 P_k,n2 near the camera position P_k^cam is taken as the virtual viewpoint angle-reasonable region A_k,ang;
b4) referring specifically to fig. 5(d), the virtual viewpoint range A_k of the camera is the intersection of A_k,dis and A_k,ang;
b5) let Obj be the total set of all video moving objects in all video cameras; suppose there are N_k video moving objects in the k-th camera, and the trajectory of the i-th one is denoted C_k,i; the expressions of Obj and C_k,i are:
Obj = {C_k,i, (k = 1, 2, ..., L)}
C_k,i = {I_k,i,j, P_k,i,j, (i = 1, 2, ..., N_k)(j = 1, 2, ..., n)};
where L is the number of cameras, and I_k,i,j and P_k,i,j are the sub-image of the i-th video moving object in the j-th video frame of the k-th camera and the geographic space position of that sub-image; through cross-camera association analysis of the video moving objects, the single-camera video moving-object trajectories are merged to obtain the multi-camera video moving-object trajectories, realising the associated organisation of the multi-camera video moving objects:
Cube_io = {C_k1,i1, C_k2,i2, ..., C_ko,iL, (k1, k2, ..., ko) ∈ (1, 2, ..., L)};
where L_o is the total number of video moving objects after the cross-camera same-name video moving objects in the surveillance video network are merged, Cube_io (io = 1, 2, ..., L_o) is the global trajectory of the video moving object with sequence number io in the surveillance video network, and C_ko,io is the sub-trajectory of the video moving object with sequence number io in the ko-th camera.
Specifically, in step S4, optimizing the observable sets by constructing an evaluation model to obtain the virtual viewpoint group specifically comprises the following steps:
c1) the number of cameras is denoted L, and the set formed by all combination modes of the cameras is denoted M:
M = {m_i}
m_i = {n_i,j}
n_i,j = {cam_i,j,l};
referring specifically to fig. 6, m_i is the i-th camera combination mode and contains all camera groups of that mode; n_i,j is the j-th camera group in combination mode m_i and contains all cameras of that group; cam_i,j,l denotes the l-th camera in the j-th camera group of the i-th camera combination mode;
c2) using the defined distance threshold T_dis and angle threshold T_ang, the observable domain of every camera cam_i,j,l in each camera group n_i,j of each combination mode m_i is computed and the intersections are taken; if, for a combination mode m_i, the intersection of the camera observation domains is non-empty in every camera group n_i,j, the combination mode m_i is recorded as an observable combination; otherwise the combination mode m_i is recorded as an unobservable combination;
c3) based on the multi-camera video target trajectory data, the following video concentration optimization targets are specified to realize the optimization of the camera groups:
(1) coherent cross-camera expression of same-name targets, i.e. the cameras in which a single target appears are expressed consistently with as few virtual viewpoints as possible;
(2) the total number of virtual viewpoints used by all video targets is as small as possible;
c4) the multi-camera video target expression effect of the camera combination corresponding to a virtual viewpoint group is comprehensively evaluated by a score value (the explicit formula is given as a formula figure in the original), in which n_c is the total number of cameras, n_v the number of virtual viewpoints, N the total number of video moving objects, m_i the number of virtual viewpoints used to express each video moving object, and μ a weight parameter;
c5) when the distance threshold T_dis and the angle threshold T_ang are fixed, the value scores of all current observable camera sets are computed by defining a parameter α; the combination with the maximum value is taken as the camera combination selection result, and multi-camera video concentration in the virtual scene is performed.
Referring specifically to fig. 7, on the basis of the observable set optimization result and with the camera display combination selected, step S4 takes the central point of the observable domain of each camera group as the virtual viewpoint, after which the moving-target display parameters are set and multi-camera video concentration is performed.
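A small sketch of this viewpoint placement using the shapely package (an assumed dependency), given each camera's viewpoint range A_k as a 2D polygon; the polygons below are illustrative only.

```python
from shapely.geometry import Polygon

def group_virtual_viewpoint(ranges):
    """ranges: list of shapely Polygons, one viewpoint range A_k per camera in the group."""
    common = ranges[0]
    for r in ranges[1:]:
        common = common.intersection(r)
    if common.is_empty:
        return None                     # the group is not an observable combination
    c = common.centroid                 # central point used as the group's virtual viewpoint
    return (c.x, c.y)

# usage with two overlapping illustrative ranges
a1 = Polygon([(0, 0), (10, 0), (10, 10), (0, 10)])
a2 = Polygon([(5, 5), (15, 5), (15, 15), (5, 15)])
print(group_virtual_viewpoint([a1, a2]))    # roughly the centre of the overlap square
```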
In step S5, presetting the display parameters of the moving targets and performing multi-camera video concentration according to the display parameters comprises the following steps:
d1) recording that W virtual viewpoints (W ≤ L) are needed to view the video moving objects of all L cameras under the current camera combination; meanwhile, the frame rate fps at which the video moving-object sub-images are displayed in the three-dimensional scene is set as the number of sub-images displayed per second for a single video moving object; an object display interval time t_0 is set as the time interval at which additional new video moving objects are displayed;
d2) for a given virtual viewpoint w (w ≤ W), the geospatial trajectory T_0 of the first appearing moving object O_0 is displayed first, and the order in which this video object appears among the different cameras is identified;
the video-object sub-images are screened according to the frame rate fps, the plane coordinates corresponding to the screened sub-images are converted into geographic coordinates, and the sub-images are scaled by the proportionality coefficients P_w and P_h (their explicit calculation formulas are given as formula figures in the original); to obtain them, the average pixel width and height of an appropriate number of sub-images randomly selected from the video-object sub-image library are computed, and the upper-left, lower-left and upper-right corner coordinates of the selected sub-images in the original video frames are mapped to the corresponding geographic positions in the virtual scene to obtain the average length and height of the video-object sub-images displayed in the virtual scene;
d3) during dynamic display, the sub-image of O_0 for the current frame is displayed at the corresponding geographic position within the camera field of view in the virtual scene at the frame rate fps, and older video-object sub-images are no longer displayed;
on the other hand, at the times t_0, 2t_0, ..., nt_0, the video objects O_1, O_2, ..., O_n are additionally expressed dynamically in the three-dimensional scene model, realising the concentration of the multi-camera video objects.
Specifically, in step S5, when the same segment of an object trajectory lies in an overlap of camera shooting areas and is therefore captured by several cameras, the camera from which the object sub-image is taken is determined by comparing the included angles formed by the three points virtual viewpoint, object trajectory point and camera position for each of the two cameras:
referring specifically to fig. 8, cameras a and b share a field-of-view overlap C; for a video object passing through C, the included angles formed by the camera position, the trajectory point and the virtual viewpoint V, namely α and β, are compared; if α ≤ β, the video-object sub-image acquired by camera a is used; otherwise the video-object sub-image acquired by camera b is used.
Example 2
Referring to fig. 9, the present invention further provides a multi-camera video concentration system in a geographical scene with a preferred virtual viewpoint, which employs the multi-camera video concentration method in a geographical scene with a preferred virtual viewpoint in embodiment 1 or any possible implementation manner thereof, and the concentration system includes:
the homologous point acquisition module 1: the system comprises a video sequence image acquisition module, a video acquisition module, a three-dimensional geographic scene model acquisition module, a coordinate acquisition module and a display module, wherein the video sequence image acquisition module is used for acquiring video sequence image information, acquiring homonymous point pairs in a video image and the three-dimensional geographic scene model and acquiring coordinate data of the homonymous point pairs, and the coordinate data comprises image coordinates and geographic coordinates;
the mapping relation building module 2: the method comprises the steps of establishing a mapping relation between a video image and a geographic space according to coordinate data of a same-name point pair, and positioning a camera view;
the camera group observable collection generation module 3: the system comprises a camera observation domain model and a camera group observation set, wherein the camera observation domain model is constructed by analyzing an observable distance and a sight line deflection angle to generate a camera group observable set;
virtual viewpoint group generation module 4: the system is used for optimizing the observable collection by constructing an evaluation model to obtain a virtual viewpoint group;
specifically, a video image visual domain model is constructed to describe the range in which each camera in a virtual scene can be effectively observed, then a virtual viewpoint is generated, and the global motion condition of a video motion target in a multi-camera geographic scene is checked; on the basis of a camera observation domain model, exhaustively exhausting a camera observable set, and preferably selecting a combination with the best video target information expression effect as a virtual viewpoint generation region;
the video target space-time motion expression module 5: the method comprises the steps of presetting display parameters of a moving target, and carrying out multi-camera video concentration according to the display parameters.
Specifically, based on the optimal result of the observable set, under the condition that the camera display combination is selected, the central point of the observable domain of each camera group is used as a virtual viewpoint, the moving object display parameters are set, and multi-camera video concentration is performed.
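For orientation, a structural sketch of the system follows, with hypothetical class and method names; each stub corresponds to one of the five modules above and would wrap the computations sketched earlier in this description.

```python
class MultiCameraVideoConcentrationSystem:
    def __init__(self, scene_model, cameras):
        self.scene_model = scene_model          # three-dimensional geographic scene model
        self.cameras = cameras                  # camera metadata (ids, first-frame images, ...)

    def collect_homonymous_points(self, videos):          # homonymous point acquisition module
        raise NotImplementedError

    def build_mappings(self, point_pairs):                 # mapping relation construction module
        raise NotImplementedError

    def generate_observable_sets(self, mappings):          # camera group observable set generation module
        raise NotImplementedError

    def select_viewpoint_group(self, observable_sets):     # virtual viewpoint group generation module
        raise NotImplementedError

    def express_motion(self, viewpoints, videos, params):  # video target space-time motion expression module
        raise NotImplementedError

    def run(self, videos, display_params):
        pairs = self.collect_homonymous_points(videos)
        mappings = self.build_mappings(pairs)
        observable_sets = self.generate_observable_sets(mappings)
        viewpoints = self.select_viewpoint_group(observable_sets)
        return self.express_motion(viewpoints, videos, display_params)
```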
It should be noted that the information interaction and execution processes between the modules/units of the multi-camera video concentration system in the geographic scene with the preferred virtual viewpoint are based on the same concept as the method embodiments of the present application and produce the same technical effects; for details, reference may be made to the description in the foregoing method embodiments.
Video sequence image information is acquired, homonymous point pairs are collected from the video images and the three-dimensional geographic scene model, and the coordinate data of the homonymous point pairs, comprising image coordinates and geographic coordinates, are obtained; the mapping relation between the video images and geographic space is established from the coordinate data of the homonymous point pairs, and the camera fields of view are located; the camera observation domain model is constructed by analyzing the observable distance and the sight-line deflection angle, and the observable sets of the camera groups are generated; the observable sets are optimized by constructing an evaluation model to obtain the virtual viewpoint group; and the display parameters of the moving targets are preset, and multi-camera video concentration is performed according to these display parameters. The notable effects are that a mapping relation between video targets and the geographic scene is established, the fused expression of surveillance video in the geographic scene is enhanced, and fast retrieval and efficient understanding of integrated video and geographic scene information are greatly facilitated.
As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, a method or a program product. Thus, aspects of the invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects, which may all generally be referred to herein as a "module" or "platform".
Although the invention has been described in detail above with reference to a general description and specific examples, it will be apparent to one skilled in the art that modifications or improvements may be made thereto based on the invention. Accordingly, such modifications and improvements are intended to be within the scope of the invention as claimed.

Claims (10)

1. A method for concentrating multi-camera video in a geographic scene with a preferred virtual viewpoint is characterized by comprising the following steps:
acquiring video sequence image information, acquiring homonymous point pairs in a video image and a three-dimensional geographic scene model, and acquiring coordinate data of the homonymous point pairs, wherein the coordinate data comprises image coordinates and geographic coordinates;
establishing a mapping relation between a video image and a geographic space according to coordinate data of the same-name point pair, and positioning a camera view;
constructing a camera observation domain model by analyzing the observable distance and the sight line deflection angle, and generating an observable set of a camera group;
optimizing the observable set by constructing an evaluation model to obtain a virtual viewpoint group;
presetting display parameters of the moving target, and performing multi-camera video concentration according to the display parameters.
2. The method for concentrating multi-camera video in a geographic scene with a preferred virtual viewpoint as claimed in claim 1, wherein the video image is the first frame image extracted from the surveillance video.
3. The method as claimed in claim 2, wherein the three-dimensional geo-scene model is a three-dimensional scene model constructed according to real geo-scene measurement information, the number of the same-name point pairs collected on the video image and the virtual geo-scene is not less than three, and the three same-name point pairs are not all collinear.
4. The method as claimed in claim 3, wherein the step of establishing the mapping relation between the video images and geographic space and locating the camera fields of view comprises the following steps:
a1) presetting the object-point geographic space coordinate Q corresponding to a given image-point image space coordinate q, and expressing q and Q in homogeneous coordinate form:
q = [x y 1]^T
Q = [X Y Z 1]^T;
denoting the homography matrix by M, the relationship between q and Q is:
q = MQ;
where M is the 3 x 4 mapping matrix between image space and geographic space (its explicit element expression is given as a formula figure in the original);
a2) solving the geographic space coordinate of the object point corresponding to each image point by inverting the mapping q = MQ (the explicit solution formula is given as a formula figure in the original);
a3) assuming there are L cameras in the current video camera network, the mapping matrix of the k-th camera (k = 1, 2, ..., L) is denoted M_k; each camera position in geographic space is defined as a point P_k^cam, and the geospatial position of each camera field-of-view polygon as F_k = {P_k,1, P_k,2, ..., P_k,o}; the camera position P_k^cam is regarded as a point in geographic space, and the camera view polygon is recorded as the polygon formed by sequentially connecting its o boundary points P_k,num.
5. The method for concentrating multi-camera video in a geographic scene with a preferred virtual viewpoint as claimed in claim 4, wherein two factors, the virtual line-of-sight distance and the camera-virtual viewpoint included angle, are selected as constraint conditions in the process of locating the camera field of view;
the virtual line-of-sight distance is the geographic space distance between the virtual viewpoint and a given point in the field of view; the camera-virtual viewpoint included angle is the angle formed, in the horizontal-plane projection, by a given point in the field of view, the virtual viewpoint and the camera position point;
a distance threshold T_dis and an angle threshold T_ang are defined as constraints; assuming T_dis and T_ang are given, the regions in the scene model that satisfy the constraints are found and used as the virtual viewpoint range.
6. The method for concentrating multi-camera video in a geographic scene with a preferred virtual viewpoint as claimed in claim 5, wherein constructing the camera observation domain model by analyzing the observable distance and the sight-line deflection angle and generating the observable sets of the camera groups comprises the following steps:
b1) recording the camera position P_k^cam and the camera field-of-view polygon F_k; among the boundary segments of F_k, the line segment closest to P_k^cam is P_k,n1 P_k,n2 and the farthest line segment is P_k,n3 P_k,n4;
b2) taking the points P_k,n3 and P_k,n4 in turn as centres and the distance threshold T_dis as radius, a semicircle is drawn on the side of the line segment P_k,n3 P_k,n4 facing the camera position P_k^cam; the intersection region of the two semicircles on the side of the line segment P_k,n1 P_k,n2 near the camera position P_k^cam is taken as the virtual viewpoint distance-reasonable region A_k,dis;
b3) taking the points P_k,n1 and P_k,n2 in turn as corner points and the angle threshold T_ang as the deflection angle, four rays are drawn by deflecting T_ang clockwise and counter-clockwise at each point; the intersection region bounded by the four rays on the side of the line segment P_k,n1 P_k,n2 near the camera position P_k^cam is taken as the virtual viewpoint angle-reasonable region A_k,ang;
b4) the virtual viewpoint range A_k of the camera is the intersection of A_k,dis and A_k,ang;
b5) let Obj be the total set of all video moving objects in all video cameras; suppose there are N_k video moving objects in the k-th camera, and the trajectory of the i-th one is denoted C_k,i; the expressions of Obj and C_k,i are:
Obj = {C_k,i, (k = 1, 2, ..., L)}
C_k,i = {I_k,i,j, P_k,i,j, (i = 1, 2, ..., N_k)(j = 1, 2, ..., n)};
where L is the number of cameras, and I_k,i,j and P_k,i,j are the sub-image of the i-th video moving object in the j-th video frame of the k-th camera and the geographic space position of that sub-image; through cross-camera association analysis of the video moving objects, the single-camera video moving-object trajectories are merged to obtain the multi-camera video moving-object trajectories, realising the associated organisation of the multi-camera video moving objects:
Cube_io = {C_k1,i1, C_k2,i2, ..., C_ko,iL, (k1, k2, ..., ko) ∈ (1, 2, ..., L)};
where L_o is the total number of video moving objects after the cross-camera same-name video moving objects in the surveillance video network are merged, Cube_io (io = 1, 2, ..., L_o) is the global trajectory of the video moving object with sequence number io in the surveillance video network, and C_ko,io is the sub-trajectory of the video moving object with sequence number io in the ko-th camera.
7. The method for concentrating multi-camera video in a geographic scene with a preferred virtual viewpoint as claimed in claim 6, wherein optimizing the observable sets by constructing an evaluation model to obtain the virtual viewpoint group specifically comprises the following steps:
c1) the number of cameras is denoted L, and the set formed by all combination modes of the cameras is denoted M:
M = {m_i}
m_i = {n_i,j}
n_i,j = {cam_i,j,l};
where m_i is the i-th camera combination mode and contains all camera groups of that mode; n_i,j is the j-th camera group in combination mode m_i and contains all cameras of that group; cam_i,j,l denotes the l-th camera in the j-th camera group of the i-th camera combination mode;
c2) using the defined distance threshold T_dis and angle threshold T_ang, the observable domain of every camera cam_i,j,l in each camera group n_i,j of each combination mode m_i is computed and the intersections are taken; if, for a combination mode m_i, the intersection of the camera observation domains is non-empty in every camera group n_i,j, the combination mode m_i is recorded as an observable combination; otherwise the combination mode m_i is recorded as an unobservable combination;
c3) based on the multi-camera video target trajectory data, the following video concentration optimization targets are specified to realize the optimization of the camera groups:
(1) coherent cross-camera expression of same-name targets, i.e. the cameras in which a single target appears are expressed consistently with as few virtual viewpoints as possible;
(2) the total number of virtual viewpoints used by all video targets is as small as possible;
c4) the multi-camera video target expression effect of the camera combination corresponding to a virtual viewpoint group is comprehensively evaluated by a score value (the explicit formula is given as a formula figure in the original), in which n_c is the total number of cameras, n_v the number of virtual viewpoints, N the total number of video moving objects, m_i the number of virtual viewpoints used to express each video moving object, and μ a weight parameter;
c5) when the distance threshold T_dis and the angle threshold T_ang are fixed, the value scores of all current observable camera sets are computed by defining a parameter α; the combination with the maximum value is taken as the camera combination selection result, and multi-camera video concentration in the virtual scene is performed.
8. The method of claim 7, wherein the method for concentrating the multi-camera video in the geographic scene with the virtual viewpoint being preferred comprises the following steps:
d1) recording that W virtual viewpoints (W is less than or equal to L) are needed for viewing the video moving objects of all L cameras under the current camera combination; meanwhile, setting a frame rate fps displayed by the video moving target subgraph in a three-dimensional scene as the number of the subgraphs displayed by a single video moving target per second; setting an object display interval time t0As time intervals for additional display of new video moving objects;
d2) for a certain virtual viewpoint W (W ≦ W), the first occurring moving object O is first displayed0In geospatial track T0And identifying the sequence of appearance of the video object among different cameras;
the video object subgraphs are screened according to the frame rate fps, the plane coordinates corresponding to the screened video object subgraphs are converted into geographic coordinates, and meanwhile the video object subgraphs are scaled according to the proportionality coefficients P_w and P_h, calculated as:

P_w = W̄_scene / W̄_frame

P_h = H̄_scene / H̄_frame

wherein W̄_frame and H̄_frame are the average width and height, in the original video frames, of an appropriate number of subgraphs randomly selected from the video object subgraph library; the coordinates of three points of each selected subgraph in the original video frame, namely the upper-left, lower-left and upper-right corners, are mapped to the corresponding geographic positions in the virtual scene to obtain the width and height of the video object subgraph in three-dimensional space; and W̄_scene and H̄_scene are the resulting average width and height of the video object subgraphs displayed in the virtual scene;
d3) during the dynamic display process, the video object subgraph of the current frame of O_0 is displayed at the corresponding geographic position within the camera view field in the virtual scene according to the frame rate fps, and the earlier video object subgraphs are no longer displayed;
at times t_0, 2t_0, ..., nt_0, the video objects O_1, O_2, ..., O_n are additionally expressed dynamically in the three-dimensional scene model, so as to realize the concentration of the multi-camera video objects; a sketch of these steps follows this claim.
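As an illustration of steps d1)-d3), the sketch below computes the P_w and P_h scaling coefficients and the staggered display schedule. The direction of the ratio (virtual-scene size over original-frame size) follows the reconstruction given above, and img_to_geo(), like every other name here, is a hypothetical stand-in for the image-to-geographic mapping established earlier in the method.

```python
# Sketch of d1)-d3): sub-graph scaling coefficients and staggered display schedule.
from statistics import mean
from typing import Callable, List, Tuple

Point = Tuple[float, float]

def scale_coefficients(
    samples: List[Tuple[Point, Point, Point]],  # (upper-left, lower-left, upper-right) corners per sampled sub-graph, image coords
    img_to_geo: Callable[[Point], Point],       # hypothetical image -> geographic mapping
) -> Tuple[float, float]:
    """Assumed reading of P_w / P_h: average sub-graph width/height in the virtual
    scene divided by the average width/height in the original video frames."""
    img_w, img_h, geo_w, geo_h = [], [], [], []
    for ul, ll, ur in samples:
        img_w.append(abs(ur[0] - ul[0]))
        img_h.append(abs(ll[1] - ul[1]))
        g_ul, g_ll, g_ur = img_to_geo(ul), img_to_geo(ll), img_to_geo(ur)
        geo_w.append(((g_ur[0] - g_ul[0]) ** 2 + (g_ur[1] - g_ul[1]) ** 2) ** 0.5)
        geo_h.append(((g_ll[0] - g_ul[0]) ** 2 + (g_ll[1] - g_ul[1]) ** 2) ** 0.5)
    return mean(geo_w) / mean(img_w), mean(geo_h) / mean(img_h)

def display_schedule(object_ids: List[str], t0: float) -> List[Tuple[str, float]]:
    """d3): O_0 starts at time 0, O_1 at t0, O_2 at 2*t0, ...; each object's
    sub-graphs are then drawn at the chosen frame rate fps from its start time."""
    return [(obj, i * t0) for i, obj in enumerate(object_ids)]
```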
9. The method according to claim 8, wherein, for the case that the same object trajectory is captured by a plurality of cameras because their shooting areas overlap, the camera from which the object subgraph is taken is determined by comparing the included angles formed by the three-point connecting lines among the virtual viewpoint, the object track point and each of the two camera positions:
a view overlap region C exists between camera a and camera b; for a video object passing through region C, the included angles α and β formed among the three points of camera position, track point and virtual viewpoint V are compared; if α ≤ β, the video object subgraph acquired by camera a is used, otherwise the video object subgraph acquired by camera b is used; a sketch of this comparison follows this claim.
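A minimal sketch of the claim-9 rule, assuming the angle is measured at the track point between the directions to the camera and to the virtual viewpoint (one plausible reading of the claim); the function names and coordinate conventions are illustrative only.

```python
# Sketch of claim 9: in the overlap region of cameras a and b, pick the camera
# whose direction from the track point is closer to the virtual-viewpoint direction.
import math
from typing import Sequence

def angle_at(track_pt: Sequence[float], cam_pos: Sequence[float], viewpoint: Sequence[float]) -> float:
    """Angle at the track point between the ray to the camera and the ray to the virtual viewpoint."""
    u = [c - t for c, t in zip(cam_pos, track_pt)]
    v = [w - t for w, t in zip(viewpoint, track_pt)]
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return math.acos(max(-1.0, min(1.0, dot / (nu * nv))))

def choose_camera(track_pt, cam_a, cam_b, viewpoint) -> str:
    """Use camera a's sub-graph if alpha <= beta, otherwise camera b's."""
    alpha = angle_at(track_pt, cam_a, viewpoint)
    beta = angle_at(track_pt, cam_b, viewpoint)
    return "a" if alpha <= beta else "b"
```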
10. A multi-camera video concentration system in a geographic scene with a preferred virtual viewpoint, which adopts the multi-camera video concentration method in a geographic scene with a preferred virtual viewpoint according to any one of claims 1 to 9, wherein the concentration system comprises:
a same-name point acquisition module: used for acquiring video sequence image information, acquiring same-name point pairs from the video images and the three-dimensional geographic scene model, and obtaining the coordinate data of the same-name point pairs, the coordinate data comprising image coordinates and geographic coordinates;
a mapping relation construction module: used for establishing the mapping relation between the video images and the geographic space according to the coordinate data of the same-name point pairs, and locating the camera view field;
a camera group observable set generation module: used for constructing the camera observation domain model by analyzing the observable distance and the sight-line deflection angle, and generating the observable set of the camera group;
a virtual viewpoint group generation module: used for optimizing the observable set by constructing an evaluation model to obtain the virtual viewpoint group;
a video target spatio-temporal motion expression module: used for presetting the display parameters of the moving targets and performing multi-camera video concentration according to the display parameters; a structural sketch of these modules follows.
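A structural sketch of the claim-10 system, with one Python class per module and the responsibilities listed in the claim as method stubs; the class and method names are illustrative and not part of the filing.

```python
# Module skeleton corresponding to the claim-10 system (illustrative names only).
class SameNamePointAcquisitionModule:
    def acquire(self, video_frames, scene_model):
        """Collect same-name point pairs and return their image and geographic coordinates."""
        raise NotImplementedError

class MappingConstructionModule:
    def build(self, point_pairs):
        """Fit the video-image-to-geographic-space mapping and locate each camera's field of view."""
        raise NotImplementedError

class ObservableSetGenerationModule:
    def generate(self, cameras, max_distance, max_deflection):
        """Build per-camera observation-domain models and the camera-group observable set."""
        raise NotImplementedError

class VirtualViewpointGroupModule:
    def optimise(self, observable_set):
        """Score observable combinations with the evaluation model and return the virtual viewpoint group."""
        raise NotImplementedError

class SpatioTemporalExpressionModule:
    def condense(self, trajectories, fps, t0):
        """Render moving-object sub-graphs in the 3D scene according to the preset display parameters."""
        raise NotImplementedError
```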
CN202110327605.1A 2021-03-26 2021-03-26 Multi-camera video concentration method and system in virtual viewpoint-optimized geographic scene Active CN113192125B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110327605.1A CN113192125B (en) 2021-03-26 2021-03-26 Multi-camera video concentration method and system in virtual viewpoint-optimized geographic scene

Publications (2)

Publication Number Publication Date
CN113192125A true CN113192125A (en) 2021-07-30
CN113192125B CN113192125B (en) 2024-02-20

Family

ID=76974146

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110327605.1A Active CN113192125B (en) 2021-03-26 2021-03-26 Multi-camera video concentration method and system in virtual viewpoint-optimized geographic scene

Country Status (1)

Country Link
CN (1) CN113192125B (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013033442A1 (en) * 2011-08-30 2013-03-07 Digimarc Corporation Methods and arrangements for identifying objects
US20130088592A1 (en) * 2011-09-30 2013-04-11 OOO "ITV Group" Method for searching for objects in video data received from a fixed camera
CN110516014A (en) * 2019-01-18 2019-11-29 南京泛在地理信息产业研究院有限公司 A method of two-dimensional map is mapped to towards urban road monitor video
CN110009561A (en) * 2019-04-10 2019-07-12 南京财经大学 A kind of monitor video target is mapped to the method and system of three-dimensional geographical model of place
CN110148223A (en) * 2019-06-03 2019-08-20 南京财经大学 Monitor video target concentration expression and system in three-dimensional geography model of place
CN111161130A (en) * 2019-11-25 2020-05-15 北京智汇云舟科技有限公司 Video correction method based on three-dimensional geographic information
CN111582022A (en) * 2020-03-26 2020-08-25 深圳大学 Fusion method and system of mobile video and geographic scene and electronic equipment
CN112381935A (en) * 2020-09-29 2021-02-19 西安应用光学研究所 Synthetic vision generation and multi-element fusion device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JIANQING ZHU; SHENGCAI LIAO; STAN Z. LI: "Multicamera Joint Video Synopsis", IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY *
解愉嘉: "Research on Surveillance Video Condensation Methods in Geographic Scenes", China Doctoral Dissertations Full-text Database, Information Science and Technology Series *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113840159A (en) * 2021-09-26 2021-12-24 北京沃东天骏信息技术有限公司 Video processing method, device, computer system and readable storage medium
CN114067071A (en) * 2021-11-26 2022-02-18 湖南汽车工程职业学院 High-precision map making system based on big data
CN114067071B (en) * 2021-11-26 2022-08-30 湖南汽车工程职业学院 High-precision map making system based on big data

Also Published As

Publication number Publication date
CN113192125B (en) 2024-02-20

Similar Documents

Publication Publication Date Title
Morse et al. UAV video coverage quality maps and prioritized indexing for wilderness search and rescue
US7522186B2 (en) Method and apparatus for providing immersive surveillance
CN103795976B (en) A kind of full-time empty 3 d visualization method
CN104156972B (en) Perspective imaging method based on laser scanning distance measuring instrument and multiple cameras
RU2612378C1 (en) Method of replacing objects in video stream
Sturm et al. Evaluating egomotion and structure-from-motion approaches using the TUM RGB-D benchmark
US8180107B2 (en) Active coordinated tracking for multi-camera systems
US20120147152A1 (en) 3d image generation
US20100103173A1 (en) Real time object tagging for interactive image display applications
US20180075590A1 (en) Image processing system, image processing method, and program
US20110172556A1 (en) System And Method For Quantifying And Mapping Visual Salience
US20220141425A1 (en) Target Tracking in a Multi-Camera Surveillance System
TW200818916A (en) Wide-area site-based video surveillance system
CN113192125B (en) Multi-camera video concentration method and system in virtual viewpoint-optimized geographic scene
JP2006197373A (en) Viewer information measuring instrument
TW201025193A (en) Method for automatic detection and tracking of multiple targets with multiple cameras and system therefor
WO2008132741A2 (en) Apparatus and method for tracking human objects and determining attention metrics
CN114387679B (en) System and method for realizing sight estimation and attention analysis based on recurrent convolutional neural network
Cui et al. Fusing surveillance videos and three‐dimensional scene: A mixed reality system
CN111753112B (en) Information generation method, device and storage medium
den Hollander et al. Automatic inference of geometric camera parameters and inter-camera topology in uncalibrated disjoint surveillance cameras
US20050071105A1 (en) Method and system for calibrating relative fields of view of multiple cameras
KR20160099933A (en) Method for analyzing a visible area of a closed circuit television considering the three dimensional features
KR101051355B1 (en) 3D coordinate acquisition method of camera image using 3D spatial data and camera linkage control method using same
Bagdanov et al. Acquisition of high-resolution images through on-line saccade sequence planning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant