CN117036448B - Scene construction method and system of multi-view camera - Google Patents

Scene construction method and system of multi-view camera

Info

Publication number
CN117036448B
CN117036448B (application number CN202311300861.7A)
Authority
CN
China
Prior art keywords
event
view camera
human body
dimensional
constructing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311300861.7A
Other languages
Chinese (zh)
Other versions
CN117036448A (en)
Inventor
顾平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Fanlai Intelligent Co ltd
Original Assignee
Shenzhen Fanlai Intelligent Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Fanlai Intelligent Co ltd filed Critical Shenzhen Fanlai Intelligent Co ltd
Priority to CN202311300861.7A
Publication of CN117036448A
Application granted
Publication of CN117036448B
Legal status: Active (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/50 Depth or shape recovery
    • G06T7/55 Depth or shape recovery from multiple images
    • G06T7/593 Depth or shape recovery from multiple images from stereo images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/80 Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30196 Human being; Person
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Geometry (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Computer Graphics (AREA)
  • Length Measuring Devices By Optical Means (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the technical field of cameras, and particularly relates to a scene construction method and system of a multi-view camera. The method comprises the following steps: step 1: constructing a multi-information source event sensor based on a multi-view camera array; step 2: constructing an event trigger source in a complex three-dimensional space area; step 3: constructing a trigger object based on three-dimensional human body posture key points; step 4: triggering and recording an event. The method builds the multi-information source event sensor with a multi-view camera array, realizing three-dimensional reconstruction of the human body under a plurality of view angles and completing event sensing, while a programmable three-dimensional space is adopted as the event trigger source, realizing event triggering of complex structure.

Description

Scene construction method and system of multi-view camera
Technical Field
The invention belongs to the technical field of cameras, and particularly relates to a scene construction method and system of a multi-view camera.
Background
Three-dimensional reconstruction (3D reconstruction) is the establishment of a mathematical model of three-dimensional objects suitable for computer representation and processing; it is the basis for processing, operating on, and analyzing three-dimensional objects in a computer environment, and a key technology for creating virtual reality that expresses the objective world in a computer.
In computer vision, three-dimensional reconstruction refers to the process of reconstructing three-dimensional information from single-view or multi-view images. Because the information in a single view is incomplete, three-dimensional reconstruction from it must rely on empirical knowledge, whereas multi-view three-dimensional reconstruction (similar to human binocular positioning) is comparatively easy.
In three-dimensional reconstruction, constructing an event trigger source for a complex three-dimensional space region is key; event triggering and recording mark and record activity or anomalies so that such scenes can be accurately located and described during video processing.
In currently common uncoupled video scenes, events are typically defined in a single view only and described by a simple two-dimensional image region; this is prone to a large number of missed and false detections, and complex scenes are difficult to detect. Furthermore, complex events are often difficult to define in a single view.
Disclosure of Invention
Therefore, a main object of the present invention is to provide a scene construction method and system of a multi-view camera, in which a multi-view camera array is used to build a multi-information source event sensor, realizing three-dimensional reconstruction of the human body under a plurality of view angles and completing event sensing, while a programmable three-dimensional space serves as the event trigger source, realizing event triggering of complex structure.
In order to achieve the above purpose, the technical scheme of the invention is realized as follows:
a scene construction method of a multi-view camera, the method comprising the steps of:
step 1: the construction of the multi-information source event sensor based on the multi-view camera array specifically comprises the following steps:
step 1.1: performing internal parameter calibration of the multi-view camera;
step 1.2: performing external parameter calibration of the multi-view camera and bundle adjustment of the multi-view camera parameters;
step 1.3: the multi-view camera acquires multiple views, performs three-dimensional reconstruction and tracking of human bodies based on the multiple views, and completes construction of the multi-information-source event sensor;
step 2: constructing an event trigger source in a complex three-dimensional space area;
step 3: constructing a trigger object based on three-dimensional human body posture key points;
step 4: the event triggering and recording method specifically comprises the following steps:
step 4.1: operating a multi-information source event sensor based on a multi-view camera array;
step 4.2: continuously detecting and reconstructing human body information;
step 4.3: checking the inclusion relation between the human body information and the set three-dimensional space regions that form the event trigger sources; if the inclusion condition is met, detecting whether the human body posture matches the event-start human body posture key point signal registered for the event, and if so, triggering the event corresponding to the event trigger source and starting recording; if the event has already been triggered, detecting whether the event-continuation human body posture key point signal is satisfied, and if so, continuing the recording, otherwise ending the event recording; if the inclusion condition is not satisfied and the event has been triggered, ending the event recording.
Further, the method for constructing the event trigger source in the complex three-dimensional space region in the step 2 includes: step 2.1: selecting a critical point of an event area in the multiple views; step 2.2: calculating a critical point under a camera coordinate system based on a triangulation method; step 2.3: and constructing an event three-dimensional space region which is wrapped by the polyhedron, and taking the event three-dimensional space region as an event trigger source of the region.
Further, the method for constructing the event three-dimensional space region wrapped by the polyhedron in step 2.3 includes: based on the critical points under the camera coordinate system, constructing the boundary of a regular N-hedron and wrapping the event area inside the boundary; the value of N is required to satisfy a constraint condition (the formula appears only as an image in the source and is not reproduced here),
wherein n is the number of views of the multi-view camera and (x_n, y_n) are the coordinates of the critical points.
Further, the method for constructing the trigger object based on three-dimensional human body posture key points in step 3 includes: step 3.1: selecting an event trigger source; step 3.2: defining event-start human body posture key point signals and event-continuation human body posture key point signals for a plurality of events, and registering these signals with the corresponding events in the event trigger source, completing the construction of the trigger object.
Further, the method for performing internal parameter calibration in step 1.1 includes: combining several groups of high-precision cubes of known size into irregular solids of mutually different shapes; photographing the irregular solids from several different angles with the multi-view camera to obtain several groups of shooting results; generating depth data for each group of shooting results, projecting the depth data back into three-dimensional space under the camera's local coordinate system through the inverse perspective projection process, generating normal maps with the same number of frames, and segmenting the planes of the calibration object on the normal maps to obtain several groups of plane depth data; fusing the groups of plane depth data to obtain fused depth data; projecting the fused depth data back into three-dimensional space under the camera's local coordinate system through the inverse perspective projection process to obtain the three-dimensional point set corresponding to each plane; fitting a plane to each obtained three-dimensional point set by least squares; calculating the included angles and distances between the marked planes from the fitted planes; and physically measuring the included angles and distances between the marked planes of the high-precision cubes, comparing them with the calculated values, constructing an optimization objective function aimed at minimizing the difference, and optimizing the internal parameters of the multi-view camera with a nonlinear iterative optimization method so as to minimize the objective function, completing the internal parameter calibration of the multi-view camera.
Further, the groups of plane depth data are fused into the fused depth data according to a formula (the formula appears only as an image in the source and is not reproduced here), wherein R is the fused depth data; m is the number of groups of plane depth data; o is the number of cubes contained in each group of the high-precision cube combination; r_i is the plane depth data; S is the surface area of the irregular solid; and M is the average number of faces of the cubes in each group of the high-precision cube combination.
Further, the method for performing external parameter calibration in step 1.2 includes the following steps: acquiring the included angles of the multi-view camera in three canonical directions; then obtaining the projections of the multi-view camera's external parameters in the three canonical directions under those included angles, yielding three external parameter projection sets; the three canonical directions are respectively the first, second, and third canonical directions; and calculating the fitting error for the three canonical directions with a formula (the formula appears only as an image in the source and is not reproduced here),
wherein l is the number of parameters in the external parameter projection set under each canonical direction; w_1 is the error calculation function; y_l is an external parameter of the multi-view camera in a given canonical direction; ȳ_l is the projection of that external parameter in the same direction; and θ_k is the included angle of the multi-view camera in that direction. According to the calculated fitting errors, the canonical direction corresponding to the minimum fitting error is taken as the standard projection direction, and the external parameter projection set obtained by projecting in that direction is taken as the external parameters.
Further, the method for performing bundle adjustment of the parameters of the multi-view camera in step 1.2 is a parallel bundle adjustment method.
Further, the method in step 1.3 by which the multi-view camera acquires multiple views, performs three-dimensional reconstruction and tracking of the human body based on the multiple views, and completes the construction of the multi-information source event sensor includes: performing three-dimensional reconstruction and tracking of the human body under each view to obtain a plurality of human body three-dimensional reconstruction and tracking results; constructing an information source event sensor from each human body three-dimensional reconstruction and tracking result; and combining the individual information source event sensors into the multi-information source event sensor.
A scene construction system of a multi-view camera, the system comprising: a multi-information source event sensor construction unit configured to construct the multi-information source event sensor based on the multi-view camera array; a regional event trigger source construction unit configured to construct an event trigger source for a complex three-dimensional space region; a trigger object construction unit configured to construct a trigger object based on three-dimensional human body posture key points; and an event processing unit configured to perform event triggering and recording.
The scene construction method and system of the multi-view camera have the following beneficial effects: by constructing the multi-information source event sensor with a multi-view camera array, the invention can effectively sense complex scenes and is highly robust to occlusion; the complementary information from multiple view angles realizes three-dimensional reconstruction of the human body and thereby completes the event sensing task. Meanwhile, the invention adopts a programmable three-dimensional space area as the event trigger source, and by programming this area constructs event trigger sources of various event types and complex structures. In addition, three-dimensional human body posture key points serve as the event triggering objects: the corresponding events are triggered and recorded by detecting specific postures of the key points and the interaction between the key points and the three-dimensional space area.
Drawings
Fig. 1 is a schematic flow chart of a scene construction method of a multi-view camera according to an embodiment of the present invention.
Detailed Description
The method of the present invention will be described in further detail with reference to the accompanying drawings.
Example 1
A scene construction method of a multi-view camera, the method comprising the steps of:
step 1: the construction of the multi-information source event sensor based on the multi-view camera array specifically comprises the following steps:
step 1.1: performing internal parameter calibration of the multi-view camera;
step 1.2: performing external parameter calibration of the multi-view camera and bundle adjustment of the multi-view camera parameters;
step 1.3: the multi-view camera acquires multiple views, performs three-dimensional reconstruction and tracking of human bodies based on the multiple views, and completes construction of the multi-information-source event sensor;
step 2: constructing an event trigger source in a complex three-dimensional space area;
step 3: constructing a trigger object based on three-dimensional human body posture key points;
step 4: the event triggering and recording method specifically comprises the following steps:
step 4.1: operating a multi-information source event sensor based on a multi-view camera array;
step 4.2: continuously detecting and reconstructing human body information;
step 4.3: checking the inclusion relation between the human body information and the set three-dimensional space regions that form the event trigger sources; if the inclusion condition is met, detecting whether the human body posture matches the event-start human body posture key point signal registered for the event, and if so, triggering the event corresponding to the event trigger source and starting recording; if the event has already been triggered, detecting whether the event-continuation human body posture key point signal is satisfied, and if so, continuing the recording, otherwise ending the event recording; if the inclusion condition is not satisfied and the event has been triggered, ending the event recording.
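In procedural terms, step 4.3 is a small per-event state machine. The following Python sketch is a minimal illustration of that loop; the Event data model and its predicate fields (inside_region, start_signal, continue_signal) are illustrative assumptions, since the patent does not define a data model.

    # Minimal sketch of the step 4.3 trigger/record loop. The Event type
    # and its predicate fields are assumptions introduced for illustration.
    from dataclasses import dataclass, field
    from typing import Callable, List

    @dataclass
    class Event:
        inside_region: Callable[[object], bool]    # 3D containment test
        start_signal: Callable[[object], bool]     # event-start posture check
        continue_signal: Callable[[object], bool]  # event-continuation check
        recording: bool = False
        frames: List[object] = field(default_factory=list)

    def step(events: List[Event], body_3d, frame) -> None:
        """Run one detection cycle over all registered events (step 4.3)."""
        for ev in events:
            contained = ev.inside_region(body_3d)
            if contained and not ev.recording:
                # Inclusion satisfied: check the event-start posture signal.
                if ev.start_signal(body_3d):
                    ev.recording = True
                    ev.frames.append(frame)
            elif contained and ev.recording:
                # Already triggered: check the continuation posture signal.
                if ev.continue_signal(body_3d):
                    ev.frames.append(frame)
                else:
                    ev.recording = False  # signal lost: end the event recording
            elif ev.recording:
                ev.recording = False      # left the region: end the recording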
Specifically, in computer vision, three-dimensional reconstruction is the process of recovering three-dimensional information from single-view or multi-view images. Because the information in a single view is incomplete, reconstruction from it must rely on prior knowledge, whereas multi-view three-dimensional reconstruction can rebuild a three-dimensional model from the two-dimensional images of more viewpoints. However, most current three-dimensional reconstruction algorithms do not use the two-dimensional information accurately and comprehensively enough, and their computation depends excessively on information provided by external equipment, such as depth information from a depth camera, or on segmentation of the target and the background, so the reconstructed results remain rough.
According to the invention, the multi-information source event sensor is constructed with a multi-view camera array, so complex scenes can be sensed effectively. The result does not depend on segmentation of the target and the background; instead, the corresponding event is triggered and recorded by detecting specific postures of the three-dimensional human body posture key points and the interaction between the key points and the three-dimensional space region, so the reconstruction result is more accurate and better suited to complex three-dimensional scenes.
Example 2
On the basis of the above embodiment, the method for constructing the complex three-dimensional space region event trigger source in the step 2 includes: step 2.1: selecting a critical point of an event area in the multiple views; step 2.2: calculating a critical point under a camera coordinate system based on a triangulation method; step 2.3: and constructing an event three-dimensional space region which is wrapped by the polyhedron, and taking the event three-dimensional space region as an event trigger source of the region.
Specifically, the event trigger source defines the area within which an event is triggered.
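As a concrete illustration of step 2.2, a critical point can be triangulated from its pixel positions in two calibrated views with OpenCV; the 3x4 projection matrices P1 and P2 are assumed to come from the intrinsics and extrinsics obtained in steps 1.1 and 1.2. A minimal Python sketch:

    # Sketch: triangulate one critical point from two calibrated views.
    # P1, P2 are 3x4 projection matrices K[R|t]; pt1, pt2 are the pixel
    # coordinates of the same critical point in the two views.
    import numpy as np
    import cv2

    def triangulate_critical_point(P1, P2, pt1, pt2):
        pts1 = np.asarray(pt1, dtype=np.float64).reshape(2, 1)
        pts2 = np.asarray(pt2, dtype=np.float64).reshape(2, 1)
        X_h = cv2.triangulatePoints(P1, P2, pts1, pts2)  # 4x1 homogeneous
        return (X_h[:3] / X_h[3]).ravel()                # 3D point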
Example 3
On the basis of the above embodiment, the method for constructing the event three-dimensional space region wrapped by the polyhedron in step 2.3 includes: based on the critical points under the camera coordinate system, constructing the boundary of a regular N-hedron and wrapping the event area inside the boundary; the value of N is required to satisfy a constraint condition (the formula appears only as an image in the source and is not reproduced here), wherein n is the number of views of the multi-view camera and (x_n, y_n) are the coordinates of the critical points.
Specifically, the larger the value of N of the regular N-hedron, in general, the more accurately the event area is delimited and the more accurate the subsequent scene construction result.
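Checking whether a reconstructed point lies inside the polyhedral trigger region reduces, for a convex region such as a regular N-hedron, to a point-in-convex-polyhedron test. A minimal Python sketch, assuming the region is given by its vertex list:

    # Sketch: containment test for a convex event region built from the
    # triangulated critical points. Each convex-hull facet defines a
    # half-space a.x + b <= 0 that interior points must satisfy.
    import numpy as np
    from scipy.spatial import ConvexHull

    def point_in_region(vertices, point, eps=1e-9):
        hull = ConvexHull(np.asarray(vertices, dtype=float))
        a, b = hull.equations[:, :3], hull.equations[:, 3]
        return bool(np.all(a @ np.asarray(point, dtype=float) + b <= eps))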
Example 4
On the basis of the above embodiment, the method for constructing the trigger object based on three-dimensional human body posture key points in step 3 includes: step 3.1: selecting an event trigger source; step 3.2: defining event-start human body posture key point signals and event-continuation human body posture key point signals for a plurality of events, and registering these signals with the corresponding events in the event trigger source, completing the construction of the trigger object.
In particular, in image processing a key point is essentially a feature: an abstract description of a fixed region or of a spatial physical relationship, characterizing a combination or context within a certain neighborhood. It is not merely point information or a location, but a combination of context and surrounding neighborhood. The goal of key point detection is to have a computer find the coordinates of these points in an image; this is a basic task in the field of computer vision, and key point detection is crucial to high-level tasks such as recognition and classification.
Specifically, internal parameter calibration algorithms in the prior art are often based on only a single parameter or parameter set, so the accuracy of the result is low. The scene construction of the invention, however, is based on a multi-view camera, and in this case the conventional internal parameter calibration methods are even less accurate.
Human body posture key point detection (human keypoint detection), also called human posture recognition, aims to accurately locate the positions of human joints in an image; it is a front-end task for human action recognition, human behavior analysis, and human-computer interaction. Unlike facial key point detection, the human trunk is more flexible and its variation harder to predict, so coordinate regression methods struggle to compete, and heatmap regression key point detection methods are generally used.
Example 5
On the basis of the above embodiment, the method for performing internal parameter calibration in step 1.1 includes: combining several groups of high-precision cubes of known size into irregular solids of mutually different shapes; photographing the irregular solids from several different angles with the multi-view camera to obtain several groups of shooting results; generating depth data for each group of shooting results, projecting the depth data back into three-dimensional space under the camera's local coordinate system through the inverse perspective projection process, generating normal maps with the same number of frames, and segmenting the planes of the calibration object on the normal maps to obtain several groups of plane depth data; fusing the groups of plane depth data to obtain fused depth data; projecting the fused depth data back into three-dimensional space under the camera's local coordinate system through the inverse perspective projection process to obtain the three-dimensional point set corresponding to each plane; fitting a plane to each obtained three-dimensional point set by least squares; calculating the included angles and distances between the marked planes from the fitted planes; and physically measuring the included angles and distances between the marked planes of the high-precision cubes, comparing them with the calculated values, constructing an optimization objective function aimed at minimizing the difference, and optimizing the internal parameters of the multi-view camera with a nonlinear iterative optimization method so as to minimize the objective function, completing the internal parameter calibration of the multi-view camera.
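The numerical core of this calibration is the least-squares plane fit and the comparison of fitted versus physically measured angles and distances. The Python sketch below illustrates those two pieces; the residual form is an assumption consistent with the description, and the inverse-projection and normal-map segmentation steps are omitted.

    # Sketch: total-least-squares plane fitting plus the angle/distance
    # residuals fed to the intrinsic-calibration objective. The exact
    # residual weighting is not specified in the patent.
    import numpy as np

    def fit_plane(points):
        """Fit n.x + d = 0 to an Nx3 point set via SVD."""
        pts = np.asarray(points, dtype=float)
        centroid = pts.mean(axis=0)
        _, _, vt = np.linalg.svd(pts - centroid)
        n = vt[-1]                   # unit normal = smallest singular vector
        return n, -n @ centroid      # plane parameters (n, d)

    def angle_deg(n1, n2):
        c = abs(n1 @ n2)             # normals are unit length
        return np.degrees(np.arccos(np.clip(c, -1.0, 1.0)))

    def calibration_residuals(plane_pairs, measured_angles, measured_dists):
        """Fitted-minus-measured angles and distances over marked plane pairs."""
        res = []
        for ((n1, d1), (n2, d2)), ang, dist in zip(plane_pairs,
                                                   measured_angles,
                                                   measured_dists):
            res.append(angle_deg(n1, n2) - ang)
            res.append(abs(d1 - d2) - dist)   # distance of parallel planes
        return np.asarray(res)  # minimized over intrinsics by a nonlinear solver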
Key point detection methods can generally be divided into two classes: one solves the problem by coordinate regression; the other models the key points as heatmaps and obtains the key point positions by regressing the heatmap distribution through a per-pixel classification task. Both are means to the same end: finding the positions of, and relations between, these points in the image.
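For concreteness, the heatmap route typically decodes each key point as the location of the maximum response in its predicted heatmap. A minimal Python decoding sketch:

    # Sketch: decode key point pixel coordinates from predicted heatmaps,
    # one HxW heatmap per key point (the common argmax decoding).
    import numpy as np

    def decode_heatmaps(heatmaps):
        """heatmaps: (K, H, W) array -> (K, 2) array of (x, y) coordinates."""
        k, h, w = heatmaps.shape
        flat_idx = heatmaps.reshape(k, -1).argmax(axis=1)
        ys, xs = np.unravel_index(flat_idx, (h, w))
        return np.stack([xs, ys], axis=1).astype(float)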
Example 6
On the basis of the above embodiment, the groups of plane depth data are fused into the fused depth data according to a formula (the formula appears only as an image in the source and is not reproduced here), wherein R is the fused depth data; m is the number of groups of plane depth data; o is the number of cubes contained in each group of the high-precision cube combination; r_i is the plane depth data; S is the surface area of the irregular solid; and M is the average number of faces of the cubes in each group of the high-precision cube combination.
Specifically, fusing the several groups of plane depth data yields fused depth data that reflects, as a whole, the results obtained by shooting with the multi-view camera at multiple angles, which makes the calibration result more accurate.
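Because the fusion formula appears only as an image in the source, the following Python sketch uses a masked per-pixel mean over the m groups of plane depth data as a plausible stand-in; it is not the patent's actual formula.

    # Stand-in sketch only: the claimed fusion formula is not reproduced
    # in the source text. This is a masked per-pixel mean of m depth maps.
    import numpy as np

    def fuse_plane_depths(depth_maps):
        """depth_maps: (m, H, W) array, with 0 marking invalid pixels."""
        d = np.asarray(depth_maps, dtype=float)
        valid = d > 0
        counts = valid.sum(axis=0)
        return np.where(counts > 0, d.sum(axis=0) / np.maximum(counts, 1), 0.0)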
Example 7
Based on the above embodiment, the method for performing external parameter calibration in step 1.2 is as follows: acquiring the included angles of the multi-view camera in three canonical directions; then obtaining the projections of the multi-view camera's external parameters in the three canonical directions under those included angles, yielding three external parameter projection sets; the three canonical directions are respectively the first, second, and third canonical directions; and calculating the fitting error for the three canonical directions with a formula (the formula appears only as an image in the source and is not reproduced here),
wherein l is the number of parameters in the external parameter projection set under each canonical direction; w_1 is the error calculation function; y_l is an external parameter of the multi-view camera in a given canonical direction; ȳ_l is the projection of that external parameter in the same direction; and θ_k is the included angle of the multi-view camera in that direction. According to the calculated fitting errors, the canonical direction corresponding to the minimum fitting error is taken as the standard projection direction, and the external parameter projection set obtained by projecting in that direction is taken as the external parameters.
In particular, if a prior-art external parameter calibration method is carried over to a multi-view camera, the result is easily inaccurate. By projecting in several canonical directions and calibrating the external parameters based on the fitting error in each canonical direction, the result becomes more accurate.
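As one concrete reading of this selection step (the error formula itself is not reproduced in the source), the sketch below scores each canonical direction with a squared-error fit between the external parameters and their projections and keeps the projection set of the best direction; the squared-error form is an assumption.

    # Sketch: choose the canonical direction whose extrinsic projections
    # best fit the extrinsics; the squared-error score is an assumption.
    import numpy as np

    def select_extrinsics(param_sets, projected_sets):
        """param_sets, projected_sets: three arrays each, one per direction."""
        errors = [float(np.sum((y - y_hat) ** 2))
                  for y, y_hat in zip(param_sets, projected_sets)]
        best = int(np.argmin(errors))   # direction with minimum fitting error
        return projected_sets[best], best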
Example 8
On the basis of the above embodiment, the bundle adjustment of the multi-view camera parameters in step 1.2 is performed with a parallel bundle adjustment method.
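Bundle adjustment jointly refines the camera poses and 3D points by minimizing reprojection error; a parallel variant distributes the residual evaluation across workers. The Python sketch below shows a minimal, non-parallel adjustment with SciPy's trust-region solver under a simple pinhole model with known intrinsics K; it illustrates the adjustment itself, not the patent's specific parallel scheme.

    # Sketch: minimal bundle adjustment of camera poses and 3D points by
    # reprojection error (pinhole model, shared known intrinsics K).
    import numpy as np
    from scipy.optimize import least_squares
    from scipy.spatial.transform import Rotation

    def residuals(x, n_cams, n_pts, K, cam_idx, pt_idx, observed_uv):
        poses = x[:n_cams * 6].reshape(n_cams, 6)    # rotation vector + t
        points = x[n_cams * 6:].reshape(n_pts, 3)
        R = Rotation.from_rotvec(poses[cam_idx, :3]).as_matrix()
        Xc = np.einsum('mij,mj->mi', R, points[pt_idx]) + poses[cam_idx, 3:]
        uvw = Xc @ K.T
        uv = uvw[:, :2] / uvw[:, 2:3]                # perspective divide
        return (uv - observed_uv).ravel()

    def bundle_adjust(poses0, points0, K, cam_idx, pt_idx, observed_uv):
        x0 = np.hstack([poses0.ravel(), points0.ravel()])
        sol = least_squares(residuals, x0, method='trf', x_scale='jac',
                            args=(len(poses0), len(points0), K,
                                  cam_idx, pt_idx, observed_uv))
        n = len(poses0) * 6
        return sol.x[:n].reshape(-1, 6), sol.x[n:].reshape(-1, 3)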
Example 9
On the basis of the above embodiment, the method in step 1.3 by which the multi-view camera acquires multiple views, performs three-dimensional reconstruction and tracking of the human body based on the multiple views, and completes the construction of the multi-information source event sensor includes: performing three-dimensional reconstruction and tracking of the human body under each view to obtain a plurality of human body three-dimensional reconstruction and tracking results; constructing an information source event sensor from each human body three-dimensional reconstruction and tracking result; and combining the individual information source event sensors into the multi-information source event sensor.
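One simple way to combine the per-view results, offered only as an illustration (the patent does not specify the combination rule), is to average each reconstructed key point over the views in which the body was observed:

    # Illustrative sketch: fuse per-view 3D reconstruction results by
    # averaging each key point over the views that observed it.
    import numpy as np

    def fuse_views(per_view_keypoints):
        """per_view_keypoints: list of (K, 3) arrays or None per view."""
        stacks = [kp for kp in per_view_keypoints if kp is not None]
        return np.mean(np.stack(stacks, axis=0), axis=0)  # (K, 3) consensus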
Example 10
A scene construction system of a multi-view camera, the system comprising: a multi-information source event sensor construction unit configured to construct the multi-information source event sensor based on the multi-view camera array; a regional event trigger source construction unit configured to construct an event trigger source for a complex three-dimensional space region; a trigger object construction unit configured to construct a trigger object based on three-dimensional human body posture key points; and an event processing unit configured to perform event triggering and recording.
It should be noted that the system provided in the foregoing embodiment is illustrated only with the division of the functional units described above; in practical applications, the functions may be allocated to different functional units as needed, that is, the units or steps in the embodiment of the present invention may be further decomposed or combined. For example, the units of the above embodiment may be merged into one unit or further split into multiple sub-units so as to perform all or part of the functions described above. The names of the units and steps involved in the embodiment of the present invention are used only to distinguish the units or steps and are not to be construed as an undue limitation of the invention.
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the storage device and the processing device described above, together with the related description, may refer to the corresponding processes in the foregoing method embodiment and are not repeated herein.
Those of skill in the art will appreciate that the various illustrative units and method steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the programs corresponding to the software units and method steps may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. To clearly illustrate this interchangeability of electronic hardware and software, various illustrative components and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as electronic hardware or software depends upon the particular application and the design constraints imposed on the solution. Those skilled in the art may implement the described functionality in different ways for each particular application, but such implementations are not intended to be limiting.
The terms "first," "another portion," and the like, are used for distinguishing between similar objects and not for describing a particular sequential or chronological order.
The terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or unit/apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or unit/apparatus.
Thus far, the technical solution of the present invention has been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of protection of the present invention is not limited to these specific embodiments. Equivalent modifications or substitutions of the related technical features may be made by those skilled in the art without departing from the principles of the present invention, and such modifications and substitutions will fall within the scope of the present invention.
The foregoing description is only of the preferred embodiments of the present invention and is not intended to limit the scope of the present invention.

Claims (9)

1. A scene construction method of a multi-view camera, the method comprising the steps of:
step 1: the construction of the multi-information source event sensor based on the multi-view camera array specifically comprises the following steps:
step 1.1: performing internal parameter calibration of the multi-view camera;
step 1.2: performing external parameter calibration of the multi-view camera and bundle adjustment of the multi-view camera parameters;
step 1.3: the multi-view camera acquires multiple views, performs three-dimensional reconstruction and tracking of human bodies based on the multiple views, and completes construction of the multi-information-source event sensor;
step 2: constructing an event trigger source in a complex three-dimensional space area;
step 3: constructing a trigger object based on three-dimensional human body posture key points;
step 4: the event triggering and recording method specifically comprises the following steps:
step 4.1: operating a multi-information source event sensor based on a multi-view camera array;
step 4.2: continuously detecting and reconstructing human body information;
step 4.3: checking the inclusion relation between the human body information and the set three-dimensional space regions that form the event trigger sources; if the inclusion condition is met, detecting whether the human body posture matches the event-start human body posture key point signal registered for the event, and if so, triggering the event corresponding to the event trigger source and starting recording; if the event has already been triggered, detecting whether the event-continuation human body posture key point signal is satisfied, and if so, continuing the recording, otherwise ending the event recording; if the inclusion condition is not satisfied and the event has been triggered, ending the event recording;
the method for performing internal reference calibration in the step 1.1 comprises the following steps: combining a plurality of groups of high-precision cubes with known sizes to form irregular cubes with mutually different shapes; shooting the irregular cube at a plurality of different angles by using a multi-view camera to obtain a plurality of groups of shooting results; generating depth data of a plurality of groups of shooting results through perspective projection inverse process, respectively, projecting the depth data back into a three-dimensional space under a local coordinate system of a camera, generating a normal map with the same frame number, and dividing planes of a calibration object on the normal map to obtain a plurality of groups of plane depth data; carrying out data fusion on a plurality of groups of plane depth data to obtain fusion depth data; the fusion depth data is projected back to a three-dimensional space under a local coordinate system of the camera through a perspective projection inverse process, and a three-dimensional point set corresponding to each plane is obtained; performing a least square fitting method on each obtained three-dimensional point set to obtain a plane corresponding to the three-dimensional point set; calculating the included angle and the distance between the marked planes based on the obtained planes; the method comprises the steps of actually measuring the angle and the distance between marked planes of a high-precision cube, comparing the obtained included angle with the distance, constructing an optimized objective function with the purpose of minimum difference, optimizing the internal parameters of the multi-view camera by using a nonlinear iterative optimization method through the optimized objective function, minimizing the objective function, and completing the internal parameter calibration of the multi-view camera.
2. The method according to claim 1, wherein the method for constructing the complex three-dimensional space region event trigger source in step 2 comprises: step 2.1: selecting a critical point of an event area in the multiple views; step 2.2: calculating a critical point under a camera coordinate system based on a triangulation method; step 2.3: and constructing an event three-dimensional space region which is wrapped by the polyhedron, and taking the event three-dimensional space region as an event trigger source of the region.
3. The method according to claim 2, wherein the method for constructing the event three-dimensional space region wrapped by the polyhedron in step 2.3 comprises: based on the critical points under the camera coordinate system, constructing the boundary of a regular N-hedron and wrapping the event area inside the boundary; the value of N is required to satisfy a constraint condition (the formula appears only as an image in the source and is not reproduced here), wherein n is the number of views of the multi-view camera and (x_n, y_n) are the coordinates of the critical points.
4. The method according to claim 3, wherein the method for constructing the trigger object based on three-dimensional human body posture key points in step 3 comprises: step 3.1: selecting an event trigger source; step 3.2: defining event-start human body posture key point signals and event-continuation human body posture key point signals for a plurality of events, and registering these signals with the corresponding events in the event trigger source, completing the construction of the trigger object.
5. The method of claim 4, wherein the groups of plane depth data are fused into the fused depth data according to a formula (the formula appears only as an image in the source and is not reproduced here), wherein R is the fused depth data; m is the number of groups of plane depth data; o is the number of cubes contained in each group of the high-precision cube combination; r_i is the plane depth data; S is the surface area of the irregular solid; and h is the average number of faces of the cubes in each group of the high-precision cube combination.
6. The method according to claim 5, wherein the method for performing external parameter calibration in step 1.2 comprises: acquiring the included angles of the multi-view camera in three canonical directions; then obtaining the projections of the multi-view camera's external parameters in the three canonical directions under those included angles, yielding three external parameter projection sets; the three canonical directions being respectively the first, second, and third canonical directions; and calculating the fitting error for the three canonical directions with a formula (the formula appears only as an image in the source and is not reproduced here), wherein l is the number of parameters in the external parameter projection set under each canonical direction; w_1 is the error calculation function; y_l is an external parameter of the multi-view camera in a given canonical direction; ȳ_l is the projection of that external parameter in the same direction; and θ_k is the included angle of the multi-view camera in that direction; according to the calculated fitting errors, the canonical direction corresponding to the minimum fitting error is taken as the standard projection direction, and the external parameter projection set obtained by projecting in that direction is taken as the external parameters.
7. The method of claim 6, wherein the bundle adjustment of the multi-view camera parameters in step 1.2 is a parallel bundle adjustment method.
8. The method of claim 7, wherein the multi-view camera in step 1.3 acquires multiple views, and the method for performing three-dimensional reconstruction and tracking of the human body based on the multiple views to complete the construction of the multi-information source event sensor comprises: performing three-dimensional reconstruction and tracking of the human body under each view to obtain a plurality of human body three-dimensional reconstruction and tracking results; constructing an information source event sensor from each human body three-dimensional reconstruction and tracking result; and combining the individual information source event sensors into the multi-information source event sensor.
9. A scene construction system of a multi-view camera for implementing the method of one of claims 1 to 8, characterized in that the system comprises: a multi-information source event sensor construction unit configured to construct the multi-information source event sensor based on the multi-view camera array; a regional event trigger source construction unit configured to construct an event trigger source for a complex three-dimensional space region; a trigger object construction unit configured to construct a trigger object based on three-dimensional human body posture key points; and an event processing unit configured to perform event triggering and recording.
CN202311300861.7A (filed 2023-10-10, priority 2023-10-10): Scene construction method and system of multi-view camera. Status: Active. Granted as CN117036448B.

Priority Applications (1)

Application: CN202311300861.7A, priority date 2023-10-10, filing date 2023-10-10 (granted as CN117036448B)
Title: Scene construction method and system of multi-view camera

Applications Claiming Priority (1)

Application: CN202311300861.7A, priority date 2023-10-10, filing date 2023-10-10 (granted as CN117036448B)
Title: Scene construction method and system of multi-view camera

Publications (2)

Publication number and date:
CN117036448A: 2023-11-10
CN117036448B: 2024-04-02

Family

ID=88634106

Family Applications (1)

Application: CN202311300861.7A (Active, granted as CN117036448B)
Title: Scene construction method and system of multi-view camera

Country Status (1)

Country Link
CN (1) CN117036448B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112470188A (en) * 2018-05-25 2021-03-09 艾奎菲股份有限公司 System and method for multi-camera placement
CN113066168A (en) * 2021-04-08 2021-07-02 云南大学 Multi-view stereo network three-dimensional reconstruction method and system
CN113870322A (en) * 2021-08-23 2021-12-31 首都师范大学 Event camera-based multi-target tracking method and device and computer equipment
CN114359744A (en) * 2021-12-07 2022-04-15 中山大学 Depth estimation method based on fusion of laser radar and event camera
WO2022194884A1 (en) * 2021-03-17 2022-09-22 Robovision Improved vision-based measuring
CN116188750A (en) * 2023-02-06 2023-05-30 深圳纷来智能有限公司 3D human body joint movement sequence recording method
CN116205991A (en) * 2023-02-03 2023-06-02 深圳纷来智能有限公司 Construction method of multi-information source event sensor based on multi-view camera array
CN116258817A (en) * 2023-02-16 2023-06-13 浙江大学 Automatic driving digital twin scene construction method and system based on multi-view three-dimensional reconstruction

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8427656B2 (en) * 2008-04-21 2013-04-23 Max-Planck-Gesellschaft Zur Foderung Der Wissenschaften E.V. Robust three-dimensional shape acquisition method and system


Also Published As

Publication number Publication date
CN117036448A (en) 2023-11-10


Legal Events

Code: Description
PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant