CN117036448B - Scene construction method and system of multi-view camera - Google Patents
- Publication number
- CN117036448B CN117036448B CN202311300861.7A CN202311300861A CN117036448B CN 117036448 B CN117036448 B CN 117036448B CN 202311300861 A CN202311300861 A CN 202311300861A CN 117036448 B CN117036448 B CN 117036448B
- Authority
- CN
- China
- Prior art keywords
- event
- view camera
- human body
- dimensional
- constructing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/50—Depth or shape recovery
- G06T7/55—Depth or shape recovery from multiple images
- G06T7/593—Depth or shape recovery from multiple images from stereo images
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/80—Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/46—Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
- G06V10/462—Salient features, e.g. scale invariant feature transforms [SIFT]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30196—Human being; Person
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Multimedia (AREA)
- Health & Medical Sciences (AREA)
- Software Systems (AREA)
- Geometry (AREA)
- General Health & Medical Sciences (AREA)
- Psychiatry (AREA)
- Social Psychology (AREA)
- Human Computer Interaction (AREA)
- Computer Graphics (AREA)
- Length Measuring Devices By Optical Means (AREA)
- Image Processing (AREA)
- Image Analysis (AREA)
Abstract
The invention belongs to the technical field of cameras, and particularly relates to a scene construction method and system of a multi-view camera. The method comprises the following steps: step 1: constructing a multi-information-source event sensor based on a multi-view camera array; step 2: constructing an event trigger source in a complex three-dimensional space area; step 3: constructing a trigger object based on three-dimensional human body posture key points; step 4: triggering and recording an event. According to the method, the multi-information-source event sensor is built from a multi-view camera array, achieving three-dimensional reconstruction of the human body from a plurality of view angles and thereby completing event sensing; meanwhile, a programmable three-dimensional space is adopted as the event trigger source, realizing event triggering for complex structures.
Description
Technical Field
The invention belongs to the technical field of cameras, and particularly relates to a scene construction method and system of a multi-view camera.
Background
Three-dimensional reconstruction (3D reconstruction) is the establishment of a mathematical model of a three-dimensional object suitable for computer representation and processing. It is the basis for processing, operating on and analyzing three-dimensional objects in a computer environment, and a key technology for creating virtual reality that expresses the objective world in a computer.
In computer vision, three-dimensional reconstruction refers to the process of recovering three-dimensional information from single-view or multi-view images. Because the information in a single view is incomplete, reconstruction from it must rely on empirical knowledge, whereas three-dimensional reconstruction from multiple views (analogous to human binocular vision) is comparatively easy.
In three-dimensional reconstruction, constructing an event trigger source for a complex three-dimensional space region is key; event triggering and recording are the marking and recording performed during video processing to accurately locate and describe scenes with activity or abnormalities.
In currently common uncoupled video scenes, events are typically defined in a single view only and described by a simple two-dimensional image region. This is prone to large numbers of missed and false detections and makes complex scenes difficult to handle; moreover, complex event definitions are often hard to express in a single view.
Disclosure of Invention
Therefore, a main object of the present invention is to provide a scene construction method and system for a multi-view camera. The method uses a multi-view camera array to construct a multi-information-source event sensor, realizing three-dimensional reconstruction of the human body from a plurality of view angles and completing event sensing, while adopting a programmable three-dimensional space as the event trigger source to realize event triggering for complex structures.
In order to achieve the above purpose, the technical scheme of the invention is realized as follows:
a scene construction method of a multi-view camera, the method comprising the steps of:
step 1: the construction of the multi-information source event sensor based on the multi-view camera array specifically comprises the following steps:
step 1.1: performing internal parameter calibration of the multi-view camera;
step 1.2: performing external parameter calibration of the multi-view camera and bundling adjustment of parameters of the multi-view camera;
step 1.3: the multi-view camera acquires multiple views, performs three-dimensional reconstruction and tracking of human bodies based on the multiple views, and completes construction of the multi-information-source event sensor;
step 2: constructing an event trigger source in a complex three-dimensional space area;
step 3: constructing a trigger object based on three-dimensional human body posture key points;
step 4: the event triggering and recording method specifically comprises the following steps:
step 4.1: operating a multi-information source event sensor based on a multi-view camera array;
step 4.2: continuously detecting and reconstructing human body information;
step 4.3: checking the inclusion relation between the human body information and the set three-dimensional space region forming the event trigger source. If the inclusion condition is met and the event has not yet been triggered, detecting whether the human body posture matches the event-start posture key point signal registered with the event, and if so, triggering the event corresponding to the event trigger source and starting recording. If the event has already been triggered, detecting whether the posture matches the event-continuation posture key point signal, and if so, continuing the recording, otherwise ending the event recording. If the inclusion condition is not satisfied and the event has been triggered, the event recording is ended. (A minimal sketch of this loop follows.)
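The step-4 logic amounts to a small state machine per trigger source. The following is a minimal Python sketch of that loop under assumed helper predicates (`contains`, `matches`) and an assumed data layout; the patent does not specify an API, so all names here are illustrative.

```python
# Minimal sketch of the step-4 trigger-and-record loop. `contains` and
# `matches` are hypothetical predicates standing in for the inclusion
# check (step 4.3) and the pose key point signal checks (step 3).
from dataclasses import dataclass, field

@dataclass
class EventTriggerSource:
    region: object            # polyhedral 3D region from step 2
    start_signal: object      # event-start pose key point signal
    continue_signal: object   # event-continuation pose key point signal
    triggered: bool = False
    frames: list = field(default_factory=list)

def process_frame(source, body, frame, contains, matches):
    """body: reconstructed 3D human key points for the current frame."""
    inside = contains(source.region, body)
    if inside and not source.triggered:
        if matches(body, source.start_signal):   # start condition met
            source.triggered = True
            source.frames.append(frame)          # begin recording
    elif inside and source.triggered:
        if matches(body, source.continue_signal):
            source.frames.append(frame)          # keep recording
        else:
            source.triggered = False             # signal lost: end event
    elif source.triggered:                       # left the region
        source.triggered = False                 # end recording
```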
Further, the method for constructing the event trigger source in the complex three-dimensional space region in the step 2 includes: step 2.1: selecting a critical point of an event area in the multiple views; step 2.2: calculating a critical point under a camera coordinate system based on a triangulation method; step 2.3: and constructing an event three-dimensional space region which is wrapped by the polyhedron, and taking the event three-dimensional space region as an event trigger source of the region.
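Step 2.2 recovers each critical point in the camera coordinate system from its pixel locations in the multiple views. The patent does not name a specific triangulation algorithm; the standard linear (DLT) triangulation below, using numpy, is one common realization given the calibrated projection matrices from step 1.

```python
import numpy as np

def triangulate_point(projections, pixels):
    """Linear (DLT) triangulation of one critical point.

    projections: list of 3x4 camera projection matrices (steps 1.1-1.2)
    pixels: list of (u, v) observations of the point, one per view
    Returns the 3D point in the common camera coordinate system.
    """
    rows = []
    for P, (u, v) in zip(projections, pixels):
        rows.append(u * P[2] - P[0])   # u * (p3 . X) - p1 . X = 0
        rows.append(v * P[2] - P[1])   # v * (p3 . X) - p2 . X = 0
    _, _, vt = np.linalg.svd(np.asarray(rows))
    X = vt[-1]                          # null-space direction
    return X[:3] / X[3]                 # dehomogenize
```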
Further, the method for constructing the event three-dimensional space region wrapped by the polyhedron in step 2.3 includes: based on the critical points under the camera coordinate system, constructing the boundary of a regular N-hedron and wrapping the event area inside that boundary. The value of N must satisfy a constraint condition (given as a formula in the original, not reproduced in this text source) in which n is the number of views of the multi-view camera and (x_n, y_n) are the coordinates of the critical point.
Further, the method for constructing the triggering object based on the three-dimensional human body posture key point in the step 3 includes: step 3.1: selecting an event trigger source; step 3.2: and defining human body posture key point signals of a plurality of events and human body posture key point signals of continuous events, registering the human body posture key point signals and the corresponding events in an event trigger source, and completing the construction of a trigger object.
Further, the method for performing internal reference calibration in step 1.1 includes: combining a plurality of groups of high-precision cubes of known sizes into irregular cubes of mutually different shapes; shooting the irregular cubes from a plurality of different angles with the multi-view camera to obtain a plurality of groups of shooting results; generating depth data for each group of shooting results and projecting the depth data back into three-dimensional space under the camera's local coordinate system through the inverse of the perspective projection, generating normal maps with the same number of frames, and segmenting the planes of the calibration object on the normal maps to obtain a plurality of groups of plane depth data; fusing the plurality of groups of plane depth data into fused depth data; projecting the fused depth data back into three-dimensional space under the camera's local coordinate system through the inverse of the perspective projection to obtain the three-dimensional point set corresponding to each plane; fitting each three-dimensional point set by least squares to obtain its corresponding plane; calculating the included angles and distances between the marked planes from the fitted planes; actually measuring the included angles and distances between the marked planes of the high-precision cubes, comparing the computed included angles and distances against the measured ones, constructing an optimization objective function that minimizes their difference, and optimizing the internal parameters of the multi-view camera with a nonlinear iterative optimization method so as to minimize that objective function, thereby completing the internal parameter calibration of the multi-view camera. (A sketch of the plane fitting and residual computation follows.)
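The least-squares plane fitting and the angle comparison at the heart of this calibration can be sketched as follows; the exact form of the optimized objective is the patent's, so the residual below only illustrates the angle term (distance terms would be accumulated analogously), and all function names are illustrative.

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane through an (N, 3) point set.
    Returns (unit normal n, offset d) with n . x + d = 0."""
    c = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - c)
    n = vt[-1]                      # direction of least variance
    return n, -float(n @ c)

def plane_angle(n1, n2):
    """Included angle between two fitted planes, in radians."""
    return float(np.arccos(np.clip(abs(n1 @ n2), 0.0, 1.0)))

def angle_residual(point_sets, measured_angles):
    """Sum of squared differences between fitted and measured plane
    angles -- one term of the calibration objective to be minimized
    over the camera intrinsics by a nonlinear iterative optimizer."""
    normals = [fit_plane(ps)[0] for ps in point_sets]
    return sum((plane_angle(normals[i], normals[j]) - a) ** 2
               for (i, j), a in measured_angles.items())
```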
Further, the data fusion of the multiple groups of plane depth data into fused depth data is computed by a formula (not reproduced in this text source) in which R is the fused depth data, m is the number of groups of plane depth data, o is the number of cubes contained in each set of high-precision cube combinations, r_i is the plane depth data, S is the surface area of the irregular cube, and M is the average number of faces of the cubes in each set of high-precision cube combinations.
Further, the method for performing external parameter calibration in step 1.2 comprises the following steps: acquiring the included angles of the multi-view camera in three regular directions; then obtaining the projections of the external parameters of the multi-view camera in the three regular directions under those included angles, yielding three external parameter projection sets, the three regular directions being a first, a second and a third regular direction. The fitting error for each of the three regular directions is calculated by a formula (not reproduced in this text source) in which l is the number of parameters in the external parameter projection set for each regular direction, w_1 is the error calculation function, y_l is the external parameter of the multi-view camera in a given regular direction, ŷ_l is the projection of that external parameter, and θ_k is the included angle of the multi-view camera in that regular direction. The regular direction with the minimum fitting error is taken as the standard projection direction, and the external parameter projection set obtained by projecting in that direction is taken as the external parameters.
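Since the text source does not reproduce the fitting-error formula, the sketch below assumes a simple squared-difference form for w_1 and shows only the selection logic: compute one error per regular direction and keep the projection set with the smallest error as the extrinsics.

```python
import numpy as np

def fitting_error(y, y_hat):
    """Assumed squared-difference stand-in for the error function w_1
    applied over the l parameters of one projection set."""
    y, y_hat = np.asarray(y), np.asarray(y_hat)
    return float(np.sum((y - y_hat) ** 2))

def select_extrinsics(per_direction):
    """per_direction: {direction_name: (extrinsics y, projection y_hat)}
    for the three regular directions. Returns the name of the standard
    projection direction and its projection set."""
    errors = {k: fitting_error(y, yh) for k, (y, yh) in per_direction.items()}
    best = min(errors, key=errors.get)
    return best, per_direction[best][1]
```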
Further, the method for performing bundle adjustment of the parameters of the multi-view camera in step 1.2 is a parallel bundle adjustment method.
Further, the method in step 1.3 by which the multi-view camera acquires multiple views, performs three-dimensional reconstruction and tracking of the human body based on those views, and completes the construction of the multi-information-source event sensor comprises: under each view, performing three-dimensional reconstruction and tracking of the human body to obtain a plurality of reconstruction and tracking results; for each result, constructing an information-source event sensor; and combining the individual information-source event sensors into the multi-information-source event sensor.
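One way to read this combination step is as simple composition: each per-view reconstruction-and-tracking result feeds its own event sensor, and the multi-information-source sensor pools their outputs. The class below is an illustrative sketch under that reading; the patent only states that the per-source sensors are "combined", so the `perceive` interface is an assumption.

```python
class MultiSourceEventSensor:
    """Illustrative composition of per-view event sensors; each sensor
    is assumed to expose perceive(frame) -> list of detections."""
    def __init__(self, sensors):
        self.sensors = sensors

    def perceive(self, frame):
        detections = []
        for sensor in self.sensors:
            detections.extend(sensor.perceive(frame))  # pool per-view results
        return detections
```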
A scene construction system of a multi-view camera, the system comprising: a multi-information-source event sensor construction unit, configured to construct the multi-information-source event sensor based on the multi-view camera array; a regional event trigger source construction unit, configured to construct the event trigger source of the complex three-dimensional space region; a trigger object construction unit, configured to construct the trigger object based on three-dimensional human body posture key points; and an event processing unit, configured to perform event triggering and recording.
The scene construction method and system of the multi-view camera have the following beneficial effects. By constructing the multi-information-source event sensor from a multi-view camera array, the invention can perceive complex scenes effectively and is highly robust to occlusion: the complementary information of multiple view angles enables three-dimensional reconstruction of the human body and thus completion of the event sensing task. Meanwhile, the invention adopts a programmable three-dimensional space area as the event trigger source, and by programming that area it constructs trigger sources with various event types and complex structures. In addition, three-dimensional human body posture key points serve as the event triggering objects: the corresponding events are triggered and recorded by detecting specific postures of those key points and the interaction between the key points and the three-dimensional space area.
Drawings
Fig. 1 is a schematic flow chart of a scene construction method of a multi-view camera according to an embodiment of the present invention.
Detailed Description
The method of the present invention will be described in further detail with reference to the accompanying drawings.
Example 1
A scene construction method of a multi-view camera, the method comprising the steps of:
step 1: the construction of the multi-information source event sensor based on the multi-view camera array specifically comprises the following steps:
step 1.1: performing internal parameter calibration of the multi-view camera;
step 1.2: performing external parameter calibration of the multi-view camera and bundling adjustment of parameters of the multi-view camera;
step 1.3: the multi-view camera acquires multiple views, performs three-dimensional reconstruction and tracking of human bodies based on the multiple views, and completes construction of the multi-information-source event sensor;
step 2: constructing an event trigger source in a complex three-dimensional space area;
step 3: constructing a trigger object based on three-dimensional human body posture key points;
step 4: the event triggering and recording method specifically comprises the following steps:
step 4.1: operating a multi-information source event sensor based on a multi-view camera array;
step 4.2: continuously detecting and reconstructing human body information;
step 4.3: checking the inclusion relation between the human body information and the set three-dimensional space region forming the event trigger source. If the inclusion condition is met and the event has not yet been triggered, detecting whether the human body posture matches the event-start posture key point signal registered with the event, and if so, triggering the event corresponding to the event trigger source and starting recording. If the event has already been triggered, detecting whether the posture matches the event-continuation posture key point signal, and if so, continuing the recording, otherwise ending the event recording. If the inclusion condition is not satisfied and the event has been triggered, the event recording is ended.
Specifically, in computer vision, three-dimensional reconstruction is the process of recovering three-dimensional information from single-view or multi-view images. Because the information in a single view is incomplete, reconstruction from it must rely on prior knowledge, whereas multi-view three-dimensional reconstruction can build the three-dimensional model from the two-dimensional images of more viewpoints. However, most current three-dimensional reconstruction algorithms do not exploit the two-dimensional information accurately and comprehensively, and their computation depends too heavily on information supplied by external equipment, such as the depth information provided by a depth camera, or on the segmentation of target and background, so the reconstructed results remain rough.
According to the invention, the multi-information-source event sensor constructed from a multi-view camera array can perceive complex scenes effectively. The result does not depend on segmenting target from background; instead, the corresponding event is triggered and recorded by detecting specific postures of the three-dimensional human posture key points and the interaction between the key points and the three-dimensional space region, so the reconstruction is more accurate and better suited to complex three-dimensional scenes.
Example 2
On the basis of the above embodiment, the method for constructing the complex three-dimensional space region event trigger source in the step 2 includes: step 2.1: selecting a critical point of an event area in the multiple views; step 2.2: calculating a critical point under a camera coordinate system based on a triangulation method; step 2.3: and constructing an event three-dimensional space region which is wrapped by the polyhedron, and taking the event three-dimensional space region as an event trigger source of the region.
Specifically, the event trigger source defines the area in which an event is triggered.
Example 3
On the basis of the above embodiment, the method for constructing the event three-dimensional space region enclosed by the polyhedron in step 2.3 includes: based on the critical points under the camera coordinate system, constructing the boundary of a regular N-hedron and wrapping the event area inside that boundary. The value of N must satisfy a constraint condition (given as a formula in the original, not reproduced in this text source) in which n is the number of views of the multi-view camera and (x_n, y_n) are the coordinates of the critical point.
Specifically, the larger the value of N for the regular N-hedron, in general, the more accurately the event area is delimited and the more accurate the subsequent scene construction result.
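At trigger time (step 4.3), the inclusion check between reconstructed key points and the regular-N-hedron region reduces, for a convex polyhedron, to a half-space test against its faces. A sketch under that convexity assumption follows; the `min_fraction` rule for deciding when a whole body counts as inside is an assumption, since the patent does not spell out the inclusion criterion.

```python
import numpy as np

def inside_convex_polyhedron(point, faces):
    """faces: list of (outward unit normal n, offset d) half-spaces;
    the region is {x : n . x + d <= 0 for every face}."""
    return all(n @ point + d <= 1e-9 for n, d in faces)

def body_in_region(keypoints, faces, min_fraction=1.0):
    """True if at least min_fraction of the 3D pose key points lie
    inside the event region (threshold rule is an assumption)."""
    hits = sum(inside_convex_polyhedron(p, faces) for p in keypoints)
    return hits >= min_fraction * len(keypoints)
```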
Example 4
On the basis of the above embodiment, the method for constructing the triggering object based on the three-dimensional human body posture key point in the step 3 includes: step 3.1: selecting an event trigger source; step 3.2: and defining human body posture key point signals of a plurality of events and human body posture key point signals of continuous events, registering the human body posture key point signals and the corresponding events in an event trigger source, and completing the construction of a trigger object.
In particular, in image processing a key point is essentially a feature: an abstract description of a fixed region or spatial physical relationship that characterizes a combination or context within a certain neighborhood. It is not merely a point or a location, but a combination of context and surrounding neighborhood. The goal of key point detection is to have a computer find the coordinates of these points in an image; it is a basic task in computer vision, and is crucial for high-level tasks such as identification and classification.
Specifically, internal parameter calibration algorithms in the prior art are often based on only a single parameter or parameter set and yield results of low accuracy. Since the scene construction of the invention is based on multi-view cameras, using a conventional internal reference calibration method in this setting would lower the accuracy further.
Human body posture key point detection (human keypoint detection), also called human body posture recognition, aims to accurately locate the positions of human body joints in images and is a front-end task for human action recognition, human behavior analysis and human-computer interaction. Unlike facial key point detection, the human trunk is more flexible and its changes are harder to predict, so coordinate-regression methods struggle to compete, and a heatmap-regression key point detection method is generally used. (A sketch of the usual heatmap decoding step follows.)
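For the heatmap-regression approach mentioned above, the detector outputs one heatmap per joint and the key point coordinates are read off by a per-channel argmax. This decoding step is generic to such detectors, not something specific to the patent:

```python
import numpy as np

def decode_heatmaps(heatmaps):
    """heatmaps: (K, H, W) array, one channel per body key point.
    Returns a (K, 3) array of (x, y, confidence) per key point --
    the usual argmax decoding for heatmap-regression detectors."""
    K, H, W = heatmaps.shape
    flat = heatmaps.reshape(K, -1)
    idx = flat.argmax(axis=1)                 # peak per channel
    ys, xs = np.unravel_index(idx, (H, W))
    conf = flat[np.arange(K), idx]
    return np.stack([xs, ys, conf], axis=1).astype(float)
```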
Example 5
On the basis of the above embodiment, the method for performing internal reference calibration in step 1.1 includes: combining a plurality of groups of high-precision cubes of known sizes into irregular cubes of mutually different shapes; shooting the irregular cubes from a plurality of different angles with the multi-view camera to obtain a plurality of groups of shooting results; generating depth data for each group of shooting results and projecting the depth data back into three-dimensional space under the camera's local coordinate system through the inverse of the perspective projection, generating normal maps with the same number of frames, and segmenting the planes of the calibration object on the normal maps to obtain a plurality of groups of plane depth data; fusing the plurality of groups of plane depth data into fused depth data; projecting the fused depth data back into three-dimensional space under the camera's local coordinate system through the inverse of the perspective projection to obtain the three-dimensional point set corresponding to each plane; fitting each three-dimensional point set by least squares to obtain its corresponding plane; calculating the included angles and distances between the marked planes from the fitted planes; actually measuring the included angles and distances between the marked planes of the high-precision cubes, comparing the computed included angles and distances against the measured ones, constructing an optimization objective function that minimizes their difference, and optimizing the internal parameters of the multi-view camera with a nonlinear iterative optimization method so as to minimize that objective function, thereby completing the internal parameter calibration of the multi-view camera.
Key point detection methods can generally be divided into two types: one solves for coordinates by direct regression; the other models each key point as a heatmap and obtains its position by regressing the heatmap distribution through a per-pixel classification task. Both are means to the same end: finding the positions of the points in the image and the relations between them.
Example 6
On the basis of the above embodiment, the data fusion of the multiple groups of plane depth data into fused depth data is computed by a formula (not reproduced in this text source) in which R is the fused depth data, m is the number of groups of plane depth data, o is the number of cubes contained in each set of high-precision cube combinations, r_i is the plane depth data, S is the surface area of the irregular cube, and M is the average number of faces of the cubes in each set of high-precision cube combinations.
Specifically, fusing the multiple groups of plane depth data yields fused depth data that reflects, as a whole, the results of shooting with the multi-view camera from multiple angles, which makes the calibration result more accurate. (A placeholder fusion sketch follows.)
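Because the fusion formula itself is not reproduced in this text source, the sketch below substitutes a plain per-pixel average over the m groups as a placeholder; the patent's actual formula additionally involves the cube count o, the surface area S and the average face count, which are omitted here.

```python
import numpy as np

def fuse_plane_depth(groups):
    """Placeholder fusion: per-pixel mean of the m groups of plane
    depth data, ignoring invalid (zero) pixels. Stands in for the
    patent's formula, which this text source does not reproduce."""
    stack = np.stack(groups)                  # (m, H, W)
    valid = stack > 0
    total = np.where(valid, stack, 0.0).sum(axis=0)
    count = np.maximum(valid.sum(axis=0), 1)  # avoid divide-by-zero
    return total / count
```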
Example 7
Based on the above embodiment, the method for performing the external parameter calibration in step 1.2 is as follows: acquiring the included angles of the multi-view camera in three regular directions; then obtaining the projections of the external parameters of the multi-view camera in the three regular directions under those included angles, yielding three external parameter projection sets, the three regular directions being a first, a second and a third regular direction. The fitting error for each of the three regular directions is calculated by a formula (not reproduced in this text source) in which l is the number of parameters in the external parameter projection set for each regular direction, w_1 is the error calculation function, y_l is the external parameter of the multi-view camera in a given regular direction, ŷ_l is the projection of that external parameter, and θ_k is the included angle of the multi-view camera in that regular direction. The regular direction with the minimum fitting error is taken as the standard projection direction, and the external parameter projection set obtained by projecting in that direction is taken as the external parameters.
In particular, if a prior-art method were carried over directly to a multi-view camera for external parameter calibration, the result would easily be inaccurate. By projecting in several regular directions and calibrating based on the fitting error of each direction, the result is made more accurate.
Example 8
On the basis of the above embodiment, the method for performing bundle adjustment of the parameters of the multi-view camera in step 1.2 is a parallel bundle adjustment method.
Example 9
On the basis of the above embodiment, the method in step 1.3 by which the multi-view camera acquires multiple views, performs three-dimensional reconstruction and tracking of the human body based on those views, and completes the construction of the multi-information-source event sensor includes: under each view, performing three-dimensional reconstruction and tracking of the human body to obtain a plurality of reconstruction and tracking results; for each result, constructing an information-source event sensor; and combining the individual information-source event sensors into the multi-information-source event sensor.
Example 10
A scene construction system of a multi-view camera, the system comprising: a multi-information-source event sensor construction unit, configured to construct the multi-information-source event sensor based on the multi-view camera array; a regional event trigger source construction unit, configured to construct the event trigger source of the complex three-dimensional space region; a trigger object construction unit, configured to construct the trigger object based on three-dimensional human body posture key points; and an event processing unit, configured to perform event triggering and recording.
It should be noted that the system provided in the foregoing embodiment is illustrated only with the above division of functional units. In practical application, the functions may be allocated to different functional units; that is, the units or steps in the embodiment of the present invention may be further decomposed or combined. For example, the units of the above embodiment may be merged into one unit or further split into multiple sub-units, so long as all or part of the functions described above are accomplished. The names of the units and steps in the embodiment of the invention are only for distinguishing them and are not to be construed as undue limitations on the invention.
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process of the storage device and the processing device described above and the related description may refer to the corresponding process in the foregoing method embodiment, which is not repeated herein.
Those of skill in the art will appreciate that the illustrative units and method steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or a combination of both, and that programs corresponding to the software units and method steps may be placed in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, removable disk, CD-ROM, or any other form of storage medium known in the art. To clearly illustrate this interchangeability of electronic hardware and software, the components and steps of the various examples have been described above generally in terms of their functionality. Whether such functionality is implemented as electronic hardware or software depends upon the particular application and the design constraints imposed on the solution. Those skilled in the art may implement the described functionality in different ways for each particular application, but such implementation choices are not intended to be limiting.
The terms "first," "another portion," and the like, are used for distinguishing between similar objects and not for describing a particular sequential or chronological order.
The terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or unit/apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or unit/apparatus.
Thus far, the technical solution of the present invention has been described with reference to the preferred embodiments shown in the drawings, but those skilled in the art will readily understand that the scope of protection of the present invention is not limited to these specific embodiments. Equivalent modifications and substitutions of the related technical features may be made without departing from the principles of the present invention, and such modifications and substitutions fall within the scope of the present invention.
The foregoing description is only of the preferred embodiments of the present invention and is not intended to limit the scope of the present invention.
Claims (9)
1. A scene construction method of a multi-view camera, the method comprising the steps of:
step 1: the construction of the multi-information source event sensor based on the multi-view camera array specifically comprises the following steps:
step 1.1: performing internal parameter calibration of the multi-view camera;
step 1.2: performing external parameter calibration of the multi-view camera and bundling adjustment of parameters of the multi-view camera;
step 1.3: the multi-view camera acquires multiple views, performs three-dimensional reconstruction and tracking of human bodies based on the multiple views, and completes construction of the multi-information-source event sensor;
step 2: constructing an event trigger source in a complex three-dimensional space area;
step 3: constructing a trigger object based on three-dimensional human body posture key points;
step 4: the event triggering and recording method specifically comprises the following steps:
step 4.1: operating a multi-information source event sensor based on a multi-view camera array;
step 4.2: continuously detecting and reconstructing human body information;
step 4.3: checking the inclusion relation between the human body information and the set three-dimensional space region forming the event trigger source. If the inclusion condition is met and the event has not yet been triggered, detecting whether the human body posture matches the event-start posture key point signal registered with the event, and if so, triggering the event corresponding to the event trigger source and starting recording. If the event has already been triggered, detecting whether the posture matches the event-continuation posture key point signal, and if so, continuing the recording, otherwise ending the event recording. If the inclusion condition is not satisfied and the event has been triggered, ending the event recording;
the method for performing internal reference calibration in the step 1.1 comprises the following steps: combining a plurality of groups of high-precision cubes with known sizes to form irregular cubes with mutually different shapes; shooting the irregular cube at a plurality of different angles by using a multi-view camera to obtain a plurality of groups of shooting results; generating depth data of a plurality of groups of shooting results through perspective projection inverse process, respectively, projecting the depth data back into a three-dimensional space under a local coordinate system of a camera, generating a normal map with the same frame number, and dividing planes of a calibration object on the normal map to obtain a plurality of groups of plane depth data; carrying out data fusion on a plurality of groups of plane depth data to obtain fusion depth data; the fusion depth data is projected back to a three-dimensional space under a local coordinate system of the camera through a perspective projection inverse process, and a three-dimensional point set corresponding to each plane is obtained; performing a least square fitting method on each obtained three-dimensional point set to obtain a plane corresponding to the three-dimensional point set; calculating the included angle and the distance between the marked planes based on the obtained planes; the method comprises the steps of actually measuring the angle and the distance between marked planes of a high-precision cube, comparing the obtained included angle with the distance, constructing an optimized objective function with the purpose of minimum difference, optimizing the internal parameters of the multi-view camera by using a nonlinear iterative optimization method through the optimized objective function, minimizing the objective function, and completing the internal parameter calibration of the multi-view camera.
2. The method according to claim 1, wherein the method for constructing the complex three-dimensional space region event trigger source in step 2 comprises: step 2.1: selecting a critical point of an event area in the multiple views; step 2.2: calculating a critical point under a camera coordinate system based on a triangulation method; step 2.3: and constructing an event three-dimensional space region which is wrapped by the polyhedron, and taking the event three-dimensional space region as an event trigger source of the region.
3. The method according to claim 2, wherein the method for constructing the event three-dimensional space region enclosed by the polyhedron in step 2.3 comprises: based on the critical points under the camera coordinate system, constructing the boundary of a regular N-hedron and wrapping the event area inside that boundary, the value of N being required to satisfy a constraint condition (given as a formula in the original, not reproduced in this text source) in which n is the number of views of the multi-view camera and (x_n, y_n) are the coordinates of the critical point.
4. The method according to claim 3, wherein the method for constructing the trigger object based on the three-dimensional human body posture key point in the step 3 comprises the following steps: step 3.1: selecting an event trigger source; step 3.2: and defining human body posture key point signals of a plurality of events and human body posture key point signals of continuous events, registering the human body posture key point signals and the corresponding events in an event trigger source, and completing the construction of a trigger object.
5. The method of claim 4, wherein the data fusion of the plurality of sets of planar depth data into fused depth data is computed by a formula (not reproduced in this text source) in which R is the fused depth data, m is the number of groups of plane depth data, o is the number of cubes contained in each set of high-precision cube combinations, r_i is the plane depth data, S is the surface area of the irregular cube, and h is the average number of faces of the cubes in each set of high-precision cube combinations.
6. The method according to claim 5, wherein the method for performing the external parameter calibration in step 1.2 comprises: acquiring the included angles of the multi-view camera in three regular directions; then obtaining the projections of the external parameters of the multi-view camera in the three regular directions under those included angles, yielding three external parameter projection sets, the three regular directions being a first, a second and a third regular direction; calculating the fitting error for each of the three regular directions by a formula (not reproduced in this text source) in which l is the number of parameters in the external parameter projection set for each regular direction, w_1 is the error calculation function, y_l is the external parameter of the multi-view camera in a given regular direction, ŷ_l is the projection of that external parameter, and θ_k is the included angle of the multi-view camera in that regular direction; and taking the regular direction with the minimum fitting error as the standard projection direction and the external parameter projection set obtained by projecting in that direction as the external parameters.
7. The method of claim 6, wherein the bundle adjustment of the multi-view camera parameters in step 1.2 is a parallel bundle adjustment method.
8. The method of claim 7, wherein the multi-view camera in step 1.3 acquires multiple views, and the method for performing three-dimensional reconstruction and tracking of the human body based on the multiple views to complete the construction of the multi-information source event sensor comprises: under each multi-view, carrying out three-dimensional reconstruction and tracking on the human body to obtain a plurality of three-dimensional reconstruction and tracking results of the human body; and under the three-dimensional reconstruction and tracking results of each human body, constructing information source event perceptrons, and combining each information source event perceptrons into a multi-information source event perceptrons.
9. A scene construction system for a multi-view camera for implementing the method of one of claims 1 to 8, characterized in that the system comprises: a multi-information-source event sensor construction unit, configured to construct the multi-information-source event sensor based on the multi-view camera array; a regional event trigger source construction unit, configured to construct the event trigger source of the complex three-dimensional space region; a trigger object construction unit, configured to construct the trigger object based on three-dimensional human body posture key points; and an event processing unit, configured to perform event triggering and recording.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311300861.7A CN117036448B (en) | 2023-10-10 | 2023-10-10 | Scene construction method and system of multi-view camera |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311300861.7A CN117036448B (en) | 2023-10-10 | 2023-10-10 | Scene construction method and system of multi-view camera |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117036448A CN117036448A (en) | 2023-11-10 |
CN117036448B (en) | 2024-04-02
Family
ID=88634106
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311300861.7A Active CN117036448B (en) | 2023-10-10 | 2023-10-10 | Scene construction method and system of multi-view camera |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117036448B (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112470188A (en) * | 2018-05-25 | 2021-03-09 | 艾奎菲股份有限公司 | System and method for multi-camera placement |
CN113066168A (en) * | 2021-04-08 | 2021-07-02 | 云南大学 | Multi-view stereo network three-dimensional reconstruction method and system |
CN113870322A (en) * | 2021-08-23 | 2021-12-31 | 首都师范大学 | Event camera-based multi-target tracking method and device and computer equipment |
CN114359744A (en) * | 2021-12-07 | 2022-04-15 | 中山大学 | Depth estimation method based on fusion of laser radar and event camera |
WO2022194884A1 (en) * | 2021-03-17 | 2022-09-22 | Robovision | Improved vision-based measuring |
CN116188750A (en) * | 2023-02-06 | 2023-05-30 | 深圳纷来智能有限公司 | 3D human body joint movement sequence recording method |
CN116205991A (en) * | 2023-02-03 | 2023-06-02 | 深圳纷来智能有限公司 | Construction method of multi-information source event sensor based on multi-view camera array |
CN116258817A (en) * | 2023-02-16 | 2023-06-13 | 浙江大学 | Automatic driving digital twin scene construction method and system based on multi-view three-dimensional reconstruction |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8427656B2 (en) * | 2008-04-21 | 2013-04-23 | Max-Planck-Gesellschaft Zur Foderung Der Wissenschaften E.V. | Robust three-dimensional shape acquisition method and system |
- 2023-10-10: CN application CN202311300861.7A filed; patent CN117036448B (en), status Active
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112470188A (en) * | 2018-05-25 | 2021-03-09 | 艾奎菲股份有限公司 | System and method for multi-camera placement |
WO2022194884A1 (en) * | 2021-03-17 | 2022-09-22 | Robovision | Improved vision-based measuring |
CN113066168A (en) * | 2021-04-08 | 2021-07-02 | 云南大学 | Multi-view stereo network three-dimensional reconstruction method and system |
CN113870322A (en) * | 2021-08-23 | 2021-12-31 | 首都师范大学 | Event camera-based multi-target tracking method and device and computer equipment |
CN114359744A (en) * | 2021-12-07 | 2022-04-15 | 中山大学 | Depth estimation method based on fusion of laser radar and event camera |
CN116205991A (en) * | 2023-02-03 | 2023-06-02 | 深圳纷来智能有限公司 | Construction method of multi-information source event sensor based on multi-view camera array |
CN116188750A (en) * | 2023-02-06 | 2023-05-30 | 深圳纷来智能有限公司 | 3D human body joint movement sequence recording method |
CN116258817A (en) * | 2023-02-16 | 2023-06-13 | 浙江大学 | Automatic driving digital twin scene construction method and system based on multi-view three-dimensional reconstruction |
Also Published As
Publication number | Publication date |
---|---|
CN117036448A (en) | 2023-11-10 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||