CN114818992B - Image data analysis method, scene estimation method and 3D fusion method - Google Patents

Image data analysis method, scene estimation method and 3D fusion method

Info

Publication number
CN114818992B
CN114818992B (application CN202210714675.7A)
Authority
CN
China
Prior art keywords
data
image
fusion
scene
camera
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210714675.7A
Other languages
Chinese (zh)
Other versions
CN114818992A (en)
Inventor
何金龙
袁霞
温序铭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Sobey Digital Technology Co Ltd
Original Assignee
Chengdu Sobey Digital Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Sobey Digital Technology Co Ltd filed Critical Chengdu Sobey Digital Technology Co Ltd
Priority to CN202210714675.7A priority Critical patent/CN114818992B/en
Publication of CN114818992A publication Critical patent/CN114818992A/en
Application granted granted Critical
Publication of CN114818992B publication Critical patent/CN114818992B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G06T2207/10021 Stereoscopic video; Stereoscopic image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20212 Image combination
    • G06T2207/20221 Image fusion; Image merging

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an image data analysis method, a scene estimation method and a 3D fusion method, belonging to the field of video technology. Aimed at the remote fusion of multi-terminal live-action data, they comprise an image data analysis step (image raw data analysis, focal stack data analysis and camera parameter analysis), a scene estimation step and an image fusion step, so that real-time remote fusion of video images with multi-terminal camera cooperation and multiple applications is realized while the complex constraints on the screen terminal are avoided. The approach is also decoupled from any three-dimensional rendering engine: the live-action data of several different remote terminals can be reconstructed to obtain a correct 3D visual geometric relationship, providing a solution for fusing live scenes with live scenes and avoiding over-reliance on 3D virtual scene modeling. In addition, the viewpoint can be changed freely while the fused data remains consistently expressed; after the local viewpoint changes, the scene content of the fused data changes with it, achieving real-time, visually seamless fusion of the remote scene and the local scene.

Description

Image data analysis method, scene estimation method and 3D fusion method
Technical Field
The invention relates to the technical field of videos, in particular to an image data analysis method, a scene estimation method and a 3D fusion method.
Background
With the development of video technology, applications of AR (augmented reality), VR (virtual reality), glasses-free 3D, MR (mixed reality) and XR (extended reality) are becoming more sophisticated, which has driven the rapid maturation of a variety of 3D visual products and applications. However, such 3D visual products were originally designed around a technical architecture based on virtual-real fusion. Meanwhile, in academia, image fusion still remains at the stage of fusing two-dimensional data such as scene color and brightness.
To meet the growing experience requirements of users, the approach commonly adopted by fusion technology is to implant a virtual object into an actually shot image sequence to achieve a three-dimensional fusion effect; traditional image fusion can hardly achieve presentation at the three-dimensional level.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing an image data analysis method, a scene estimation method and a 3D fusion method, so that real-time remote fusion of video images with multi-terminal camera cooperation and multiple applications is realized and the complex constraints on the screen terminal in the prior art are avoided. The methods are decoupled from any three-dimensional rendering engine, so the live-action data of several remote terminals can take the place of a virtual scene, providing for the first time a solution for fusing live scenes with live scenes. The consistent expression of the fused data is maintained while the viewpoint is changed freely, the position of the main imaging terminal is allowed to change, the content and scene of the fused data change in real time with the viewpoint, and hole artifacts and similar phenomena are avoided.
The object of the invention is achieved by the following scheme:
An image data analysis method for analyzing the image data required for remote fusion of multi-terminal live-action data, specifically comprising the following steps:
S10, image raw data analysis;
S11, focal stack data analysis;
and S12, camera parameter analysis.
Further, in step S10, the parsing of the image raw data includes the sub-steps of:
S101, calculating the subject data extracted from each frame of a two-dimensional image and the scale of that subject data in the current frame;
S102, performing a quantization operation on the image data, making inter-frame similarity judgments on video data from the same-end camera and difference judgments on video data from different ends; the quantization operation is based on image color information, gray-scale information, gradient information and amplitude data in the frequency domain, and generates different intermediate data;
S103, measurement processing, including similarity measurement of same-end data and difference measurement of different-end data; the similarity measurement of same-end data includes marking video frames at non-equidistant intervals to determine the position and scale of the dynamic subject in the video, so that the subject keeps a stable size and position during subsequent processing; the difference measurement of different-end data includes estimating, frame by frame, the relation factors between the dynamic subjects in the multi-terminal video, and confirming these relation factors ensures that each frame of fused data achieves local consistency;
and S104, modeling, in which the similarity measurement parameters of the same-end data and the difference measurement parameters of the different-end data are jointly modeled and estimated to obtain a global measurement factor, which is used to guarantee the global consistency of the analyzed video data.
Further, in step S11, the focal stack data parsing includes the sub-steps of:
S111, focal stack estimation, namely normalizing the focal stack data of the multi-terminal videos to a common scale, processing each frame of image data in the frequency domain, and estimating the position of the focal segment in which each frame lies;
and S112, focal stack fusion, namely completing, in the frequency domain, the focal stack state conversion of the image data processed by the focal stack estimation of step S111, and then completing this part of the image data fusion.
Further, in step S12, the camera parameter parsing includes the sub-steps of:
S121, image-based camera parameter estimation, namely establishing a 3D relationship between multi-frame image data using the data from the image raw data analysis and the focal stack data analysis, and estimating the CCD (charge-coupled device) size, FOV (field of view) and physical focal length of the camera through a re-projection process, thereby recovering the view-frustum data of the camera's imaging;
and S122, solving the mapping between the physical focal length of the camera and the image focal stack data, namely using the discrete focal stack range of each end device obtained in the focal stack estimation, combined with the image-based camera parameter estimation result, to estimate the mapping between the actual focal length range of the camera and the focal stack data, and fitting the functional relationship between them.
A scene estimation method includes a three-dimensional scene data reconstruction step, in which the data analyzed by the image data analysis method described above is used to reconstruct three-dimensional scene data, specifically comprising the following sub-steps:
S201, screen parameterization estimation, namely displaying a dot-matrix image on a screen, extracting the coordinates of the dot matrix from the captured screen image, and estimating a parameterized function of the screen data in Euclidean space;
S202, scene scale estimation, namely stitching the camera imaging view-frustum data of different ends processed by the camera parameter analysis unit so that multiple cameras jointly form an equivalent visual imaging system, obtaining the final output scale of the scene;
S203, for a static scene, combining the scale data obtained by the image raw data analysis unit and simulating several planes to approximate the three-dimensional static scene space based on the camera view-frustum structure; and for a dynamic scene, estimating its motion trajectory and geometric skeleton, and restoring its three-dimensional data to the real scale in combination with the scale data obtained by the image raw data analysis unit.
A 3D fusion method includes a fusion step in which the data processed by the scene estimation method and the data analyzed by the image data analysis method are fused, specifically comprising the following sub-steps:
S301, geometric fusion: extracting matching data using image information, establishing a 3D geometric relationship, and converting the multi-terminal three-dimensional scene data into two-dimensional image data on the equivalent visual imaging system;
S302, image fusion: dividing the image data into different image blocks according to the 3D geometric relationship in the geometric fusion, building a pixel histogram for each block, calculating the degree of similarity between different blocks, and then generating corresponding mask images to assist the edge fusion between blocks;
and S303, fusion consistency processing: calculating, from the geometric fusion and the image fusion, the conversion of the multi-terminal video data into image data under the shooting terminal, and projecting it onto the display medium according to the parameters of the scene estimation.
Further, in step S303, the fusion consistency processing includes geometric consistency processing and image data consistency processing; first, the geometric consistency processing links the extrinsic parameter data of the multi-terminal cameras with the extrinsic parameter data of the main imaging terminal's camera and calculates the relative pose relationship; then, the image data consistency processing corrects the image data on the display medium using a color mapping algorithm.
The beneficial effects of the invention include:
the invention provides a real-time multi-terminal camera cooperation and multi-application remote fusion technology. Firstly, the invention avoids the complex limitation at the screen end in the prior art scheme, and can be applied to any number of screens, screens with any shape and structure, and common screens (LED, liquid crystal television, curtain projection screen, etc.) of any kind; secondly, the method is separated from a three-dimensional rendering engine, and can replace the live-action data of a plurality of different ground ends for a virtual scene, thereby creatively completing a perfect solution of fusing live-actions with live-actions; thirdly, the invention can freely change the viewpoint and simultaneously keep the consistent expression of the fusion data, allows the position of the main imaging terminal to change, and changes the content and the scene of the fusion data in real time along with the change of the observation point, thereby avoiding the phenomenon of a cavity.
The embodiment of the invention provides a two-dimensional image data analysis process aiming at multi-terminal live-action data 3D effect fusion, which comprises a method for solving image raw data analysis, focal stack data analysis and camera parameter analysis of videos at any terminal. Meanwhile, the embodiment of the invention carries out diversified processing on the object of the two-dimensional image data analysis so as to meet the application of different video products.
The embodiment of the invention provides a scene estimation method which can promote the data fusion degree of a data fusion module to be higher, and simultaneously recover a three-dimensional scene as completely as possible so as to reduce the limitation brought by a display medium (display screen).
The embodiment of the invention provides a 3D visual fusion method, which is different from a common image fusion process in that a data fusion process of the embodiment of the invention is respectively added with two processes of geometric fusion and fusion consistency processing at the head and the tail, and the rationality and the accuracy of fusion output data in different places can be ensured according to the geometric fusion, the image fusion and the fusion consistency processing.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a system framework diagram of an embodiment of the present invention;
FIG. 2 is a schematic diagram of an equivalent viewing cone system and a multi-end imaging system in an embodiment of the present invention; (a) a multi-end imaging system, and (b) an equivalent viewing cone system;
FIG. 3 is a flowchart illustrating steps of parsing image raw data according to an embodiment of the present invention;
FIG. 4 is a flowchart illustrating a step of resolving focus stack data according to an embodiment of the present invention;
FIG. 5 is a flowchart illustrating steps of estimating a mapping relationship between an actual camera focal length range and focal stack data according to an embodiment of the present invention;
FIG. 6 is a flowchart illustrating steps for reconstructing three-dimensional scene data according to an embodiment of the present invention;
FIG. 7 is a flowchart illustrating the steps of data fusion according to an embodiment of the present invention.
Detailed Description
All features disclosed in all embodiments in this specification, or all methods or process steps implicitly disclosed, may be combined and/or expanded, or substituted, in any way, except for mutually exclusive features and/or steps.
Regarding the technical problem solved: in analyzing the problems of the existing architecture based on virtual-real fusion described in the background, the embodiments of the present invention identify the following technical problems: image fusion still remains at the stage of fusing two-dimensional data such as scene color and brightness. To meet the growing experience requirements of users, the approach commonly adopted by existing fusion technology is to implant a virtual object into an actually shot image sequence to achieve a three-dimensional fusion effect, and traditional image fusion can hardly achieve presentation at the three-dimensional level.
One of the technical ideas of the invention is to provide a 3D remote fusion system for video images, which mainly comprises the following three functional modules: image data analysis, multi-terminal scene estimation and data fusion; the overall system framework is shown in fig. 1.
One. Image data analysis
Unlike general three-dimensional scene reconstruction, the core of the embodiment of the invention is to take the subject in the video data (such as a person) as the fusion core, which greatly reduces the amount of computation in the subsequent data fusion stage. Whether for traditional image fusion technology or for products such as AR, VR and XR based on virtual-real fusion, the functional limitations are obvious: a more vivid 3D fusion effect can only be obtained when complete three-dimensional information is available. To escape the difficulties this causes on both the product and the technology side, the embodiment of the invention provides a technology for remote fusion of multi-terminal live-action data, aimed at solving image raw data analysis, focal stack data analysis and camera parameter analysis for the video at any terminal. At the same time, the embodiment diversifies the processing of the objects of two-dimensional image data analysis to suit different video product applications.
1) Image raw data analysis: as in conventional video signal processing, the input data of such products is a sequence of two-dimensional images, so in this step, while data analysis is performed on the images in the conventional way, a corresponding method is also designed to analyze the relevant intermediate data in the frequency domain in preparation for the subsequent image focal stack estimation. The specific implementation comprises the following steps:
step a: firstly, a deep learning-based method is adopted for human body recognition, human body recognition data sets under different conditions such as indoor, outdoor, strong light, cloudy days and the like are collected, a model with high accuracy is obtained through a large amount of training and optimization, human body data can be efficiently extracted from a two-dimensional image, and figure data in each frame and the scale of the figure data in the current frame are calculated.
Step b: after step a is completed, a quantization operation is performed on the image data; its purpose is to make inter-frame similarity judgments on the video data of the same-end camera and difference judgments on the video data of different ends. The quantization operation is based on image color information, gray-scale information, gradient information and amplitude data in the frequency domain, and the different intermediate data it generates feed different subsequent processes.
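A minimal sketch of what such a quantization pass could look like is given below; the exact descriptors and their weighting are not specified in the patent, so the per-channel color statistic, Sobel gradient and FFT amplitude used here are assumptions.

```python
# Assumed quantization step: derive the intermediate data named in the text
# (color information, gray-scale, gradient, frequency-domain amplitude) per frame.
import cv2
import numpy as np

def quantize_frame(frame_bgr):
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    color_stats = frame_bgr.reshape(-1, 3).mean(axis=0)        # per-channel mean
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)
    grad_mag = cv2.magnitude(gx, gy)                           # gradient information
    amplitude = np.abs(np.fft.fftshift(np.fft.fft2(gray.astype(np.float32))))
    return {"color": color_stats, "gray": gray,
            "gradient": grad_mag, "amplitude": amplitude}
```

Inter-frame similarity at the same end and differences across ends can then be scored on these intermediate arrays, for example by histogram or correlation comparison.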
Step c: after the quantization of step b, the image data enters the measurement stage, which has two parts: similarity measurement of same-end data and difference measurement of different-end data. Similarity measurement of same-end data mainly marks video frames at non-equidistant intervals to determine information such as the position and scale of the dynamic subject in the video, so that the subject (person) keeps a stable size and position during subsequent processing. The main purpose of the difference measurement of different-end data is to estimate, frame by frame, the relation factors between the dynamic subjects in the multi-terminal videos; once these relation factors are confirmed, each frame of fused data can be guaranteed to achieve local consistency.
Step d: after the data analysis of the above three steps, the data has only achieved local consistency, whether between same-end data or different-end data. To obtain a global analysis result over all terminals' video data, the similarity measurement parameters of the same-end data and the difference measurement parameters of the different-end data from step c must be jointly modeled and estimated to obtain a global measurement factor, which guarantees the global consistency of the analyzed video data.
2) Focal stack data analysis: for studio products, the fidelity of the 3D perception of the fused data is strongly influenced by the camera parameters. With multi-terminal shooting, the camera equipment at different ends may differ and, because of intrinsic parameters such as focal length, the depth of field at each end differs and the blurred and in-focus regions of the images are completely different; the focal stack information of each end's video frames therefore needs to be estimated in the image data analysis stage to enrich the analysis data at the 2D level.
Step a: focal stack estimation: focal stack information must be estimated frame by frame, at the image level, for the video data of each end; as a reverse engineering of the imaging process, the method devised here is based on frequency-domain data. The focal stack data of the multi-terminal videos are first normalized to a common scale, then each frame of image data is processed in the frequency domain, and the position of the focal segment in which each frame lies is estimated.
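As a rough illustration (not the disclosed estimator), the position of a frame in the focal stack can be approximated by its high-frequency spectral energy after all terminals are normalized to a common scale; the working resolution, cutoff ratio and number of focal segments below are assumed values.

```python
# Assumed focus measure: ratio of high-frequency to total spectral energy per frame.
import cv2
import numpy as np

def focus_measure(gray, cutoff_ratio=0.25):
    f = np.abs(np.fft.fftshift(np.fft.fft2(gray.astype(np.float32))))
    h, w = gray.shape
    cy, cx = h // 2, w // 2
    ry, rx = int(h * cutoff_ratio / 2), int(w * cutoff_ratio / 2)
    low = f[cy - ry:cy + ry, cx - rx:cx + rx].sum()
    return float((f.sum() - low) / (f.sum() + 1e-8))

def estimate_focal_segments(gray_frames, n_segments=8):
    """Normalize grayscale frames to a common scale, then bin each frame's focus measure."""
    common = [cv2.resize(g, (640, 360)) for g in gray_frames]
    scores = np.array([focus_measure(g) for g in common])
    bins = np.linspace(scores.min(), scores.max() + 1e-8, n_segments + 1)
    return np.digitize(scores, bins) - 1                       # segment index per frame
```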
Step b: focal stack fusion: since remote fusion is a time-sequence-based processing flow, the focal stack data of the multi-terminal video frames usually differ at the fusion stage, so the image data of the other ends need to be converted into the focal stack state of the main imaging end. Similarly to an image refocusing operation, and using the frequency-domain data of step a of part 2), this step completes the focal stack state conversion of the image data in the frequency domain and completes this part of the data fusion.
3) Camera parameter analysis: parts 1) and 2) only analyze and convert the input source data at the 2D image level and provide data support for the subsequent fusion step. To bring the fusion effect closer to an ideal state, 3D analysis of the data is also required. It mainly comprises the following steps:
Step a: image-based camera parameter estimation: when the camera parameters are unknown, a more flexible scheme is provided to estimate them. Combining the data of step a of part 1) with the data of part 2), this step establishes a 3D relationship between the multi-frame image data and estimates the CCD size, FOV and physical focal length of the camera through a relatively simple re-projection process, thereby recovering the view-frustum data of the camera's imaging.
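The patent does not detail the re-projection procedure, but once pixel-unit intrinsics are available the quantities named above follow from pinhole geometry; the sketch below is such a derivation under that assumption.

```python
# Pinhole-geometry sketch: FOV and frustum cross-section from assumed intrinsics K.
import numpy as np

def fov_and_frustum(K, image_size, depth):
    """K: 3x3 intrinsics in pixels; image_size: (width, height); depth in scene units."""
    w, h = image_size
    fx, fy = K[0, 0], K[1, 1]
    fov_x = 2.0 * np.degrees(np.arctan(w / (2.0 * fx)))
    fov_y = 2.0 * np.degrees(np.arctan(h / (2.0 * fy)))
    half_w, half_h = depth * w / (2.0 * fx), depth * h / (2.0 * fy)
    corners = np.array([[-half_w, -half_h, depth], [ half_w, -half_h, depth],
                        [ half_w,  half_h, depth], [-half_w,  half_h, depth]])
    return fov_x, fov_y, corners

# With a known sensor (CCD) width in mm, the physical focal length follows as
# f_mm = fx * sensor_width_mm / image_width_px.
```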
Step b: solving the mapping between the physical focal length of the camera and the image focal stack data: the discrete focal stack range of each end device is obtained at the image level in step a of part 2); combined with step a of part 3), the mapping between the actual camera focal length range and the focal stack data is estimated and the functional relationship between them is fitted, which facilitates data fusion at any focal length state in the subsequent fusion stage.
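A hypothetical sketch of the fitting is shown below: a low-order polynomial relating a few calibrated physical focal lengths to their measured focal-stack positions, so that an arbitrary zoom state can later be mapped onto the stack. The sample values and the polynomial degree are illustrative assumptions, not data from the patent.

```python
# Assumed curve fit between physical focal length (mm) and focal-stack position.
import numpy as np

focal_lengths_mm = np.array([24.0, 35.0, 50.0, 70.0, 105.0])    # calibrated samples
focal_stack_idx = np.array([0.0, 2.0, 4.0, 5.0, 7.0])           # measured stack bins

coeffs = np.polyfit(focal_lengths_mm, focal_stack_idx, deg=2)
stack_position = np.poly1d(coeffs)

print(stack_position(60.0))   # predicted focal-stack position for a 60 mm focal length
```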
Two. Multi-terminal scene estimation
The image data analysis module above analyzes the image data and camera parameters and passes them to the multi-terminal scene estimation module for reconstruction of three-dimensional scene data. The main purpose is to enable a higher degree of data fusion in the next (data fusion) module while recovering the three-dimensional scene as completely as possible, so as to reduce the limitations imposed by the display medium (display screen). The specific steps are as follows:
1) Screen parameterization estimation: although the final rendering effect of remote fusion does not depend on the kind and structure of the display medium, if the rendering has to be completed on a specific display screen, the display medium must be parameterized and estimated so that the input data is projected onto the screen with the correct geometric relationship. Common screens include single flat screens, L-shaped screens, tri-fold screens, curved screens and so on; for these various display devices, the invention designs a unified screen parameterization estimation method. A dot-matrix image is displayed on the screen and the dot coordinates are extracted from the captured screen image, from which a parameterized function of the screen data in Euclidean space can be estimated.
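For the planar-screen case only, one possible realization of the dot-matrix step is sketched below: the circle grid is detected in the captured image and a homography is fitted against the known screen-space dot coordinates. Curved or multi-segment screens would need a richer parametric model; the grid layout and detector choice are assumptions.

```python
# Assumed planar screen parameterization: homography from screen-space dot
# coordinates to their detected positions in the captured camera image.
import cv2
import numpy as np

def estimate_screen_homography(captured_gray, pattern_size, screen_pts):
    """pattern_size: (cols, rows) of the displayed dot matrix;
    screen_pts: Nx2 dot coordinates in screen (Euclidean) space, same row-major order."""
    found, centers = cv2.findCirclesGrid(captured_gray, pattern_size,
                                         flags=cv2.CALIB_CB_SYMMETRIC_GRID)
    if not found:
        return None
    H, _ = cv2.findHomography(np.asarray(screen_pts, np.float32),
                              centers.reshape(-1, 2), cv2.RANSAC)
    return H
```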
2) Scene scale estimation: the screen parameterization estimation of step 1) solves the scale problem on the imaging hardware side; its output directly affects the accuracy of the final projected data and guarantees the geometric consistency between the data projected on the screen and the picture in the real scene. However, since remote fusion does not consider only two-end data fusion, there may be more than one input signal; combining the two-dimensional scale factors obtained in the image data analysis module, the scale factor at the three-dimensional scene level must additionally be estimated to complete the scene scale estimation. Because data from different ends are projected to different locations of the display device, angular rotation and translation of the scene necessarily occur. In this step, the output of the camera parameter analysis in the image data analysis module is used to stitch together the camera view-frustum data of the different ends, which removes the visual deviation caused by the combined effect of different focal lengths and image-level scale factors; finally, the multi-terminal cameras jointly form an equivalent visual imaging system, as shown in fig. 2, where (a) is the multi-terminal imaging system and (b) is the equivalent view-frustum system. When this step is completed, the final output scale of the scene, i.e. the transformation standard for all scene data, is obtained.
3) Static scene reconstruction: video data usually contains a relatively static background part and a moving subject part (generally a person); to improve the efficiency of the whole pipeline, different methods are designed for recovering 3D data from 2D images for these two cases. For the static scene, the scale data obtained by the image data analysis module is combined and several planes are simulated to approximate the three-dimensional static scene space based on the camera view-frustum structure. Unlike the three-dimensional reconstruction in traditional SLAM or SFM techniques, there is no need to introduce reconstruction errors through a triangulation algorithm, and the efficiency problems caused by re-projection errors are also avoided; frustum-based multi-plane three-dimensional scene reconstruction is well suited to the application scenarios developed in the embodiments of the invention.
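As an illustration of the frustum-based multi-plane idea (depth range and plane count are assumed values), fronto-parallel planes can be sampled inside the camera frustum, similar to a plane-sweep layout, so that the static background is approximated without triangulation:

```python
# Assumed multi-plane layout: sample fronto-parallel planes inside the camera frustum.
import numpy as np

def frustum_planes(K, image_size, near=2.0, far=20.0, n_planes=8):
    """Return sampled depths and, for each depth, the plane's 4 corners in camera space."""
    w, h = image_size
    fx, fy = K[0, 0], K[1, 1]
    # Uniform in inverse depth, which places planes more densely near the camera.
    depths = 1.0 / np.linspace(1.0 / near, 1.0 / far, n_planes)
    planes = []
    for d in depths:
        half_w, half_h = d * w / (2 * fx), d * h / (2 * fy)
        planes.append(np.array([[-half_w, -half_h, d], [ half_w, -half_h, d],
                                [ half_w,  half_h, d], [-half_w,  half_h, d]]))
    return depths, planes
```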
4) Dynamic scene reconstruction: considering the real-time requirement of the whole remote fusion system, the recovery of three-dimensional data for the subject (person) in the video data differs from the recovery of the static three-dimensional scene. In this step, a complete three-dimensional reconstruction of the dynamic scene is not needed; only its motion trajectory and geometric skeleton are estimated, and its three-dimensional data is restored to the real scale using the scale data obtained by the image data analysis module.
Three. Data fusion
All the input data required for fusion are obtained after the image data analysis and multi-terminal scene estimation of the first two modules. Unlike a common image fusion process, the data fusion process of the invention adds two stages, geometric fusion at the beginning and fusion consistency processing at the end. Only by strictly following the order geometric fusion, image fusion, fusion consistency processing can the rationality and accuracy of the remotely fused output data be guaranteed.
1) Geometric fusion: an equivalent visual imaging system is obtained from the scene scale estimation of the multi-terminal scene estimation module, so each terminal's imaging system becomes, relative to a standard visual imaging system, a non-standard frustum with an offset and a tilt angle. To correctly perceive the three-dimensional relationships of the multi-terminal scene data on the equivalent visual imaging system, corresponding matching data are extracted from the image information, the fused 3D geometric relationship is established, and the multi-terminal three-dimensional scene data are converted into two-dimensional image data on the equivalent visual imaging system.
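A minimal sketch of this conversion step, assuming the equivalent imaging system is described by intrinsics K_eq and a world-to-camera pose (R, t): the multi-terminal 3D scene points are mapped to 2D by standard pinhole projection.

```python
# Assumed projection of multi-terminal 3D scene points onto the equivalent system.
import numpy as np

def project_to_equivalent_view(points_3d, K_eq, R, t):
    """points_3d: Nx3 world points; (R, t): world-to-camera transform of the
    equivalent imaging system; returns Nx2 pixel coordinates."""
    cam = R @ points_3d.T + t.reshape(3, 1)       # world -> equivalent camera frame
    uv = K_eq @ cam                               # pinhole projection
    return (uv[:2] / uv[2]).T
```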
2) Image fusion (3D visual fusion): step 1) of the data fusion yields the image data of the imaging end fused from the multi-terminal video data, but because of the 3D geometric stitching relationship this image data inevitably shows hard seams at the stitching edges. This step mainly solves, at the image processing level, the fusion of the stitching edges of the image data. The image data is divided into different image blocks according to the 3D geometric relationship of step 1), a pixel histogram is built for each block, the degree of similarity between different blocks is calculated, and corresponding mask images are then generated to assist the edge fusion between the blocks.
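The sketch below is one plausible rendering of this step (bin count and feather width are assumptions, and the seam is taken to be vertical): neighbouring blocks are compared with normalized histograms, and a feathered mask softens the seam between them.

```python
# Assumed block-histogram comparison and feathered mask blending across a seam.
import cv2
import numpy as np

def block_similarity(block_a_gray, block_b_gray):
    """Histogram correlation between two single-channel (uint8) image blocks."""
    ha = cv2.calcHist([block_a_gray], [0], None, [64], [0, 256])
    hb = cv2.calcHist([block_b_gray], [0], None, [64], [0, 256])
    cv2.normalize(ha, ha)
    cv2.normalize(hb, hb)
    return cv2.compareHist(ha, hb, cv2.HISTCMP_CORREL)

def feathered_blend(block_a, block_b, feather_px=32):
    """Blend two equally sized blocks across a vertical seam with a linear ramp mask."""
    h, w = block_a.shape[:2]
    ramp = np.clip((np.arange(w) - (w // 2 - feather_px)) / (2.0 * feather_px), 0, 1)
    mask = np.tile(ramp, (h, 1))
    if block_a.ndim == 3:
        mask = mask[..., None]                    # broadcast over colour channels
    return (block_a * (1 - mask) + block_b * mask).astype(block_a.dtype)
```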
3) Fusion consistency processing: after the corrected image data is obtained, the conversion of the multi-terminal video data into image data under the shooting terminal is calculated according to steps 1) and 2), and the image data is then projected onto the display medium according to the relevant parameters of the multi-terminal scene estimation module. This step mainly deals with the consistency between the projected data and the real scene data on the display medium, which includes geometric consistency and image data consistency. The geometric part is initialized by the multi-terminal scene estimation; only by linking the extrinsic parameter data of the multi-terminal cameras with the extrinsic parameters of the main imaging terminal's camera and calculating the relative pose relationship can the geometric consistency of the remote fusion system be maintained in the data at all times. Image consistency is mainly embodied in the consistent mapping between the screen color space and the real scene color space; the image data on the display medium is corrected by a color mapping algorithm to complete the image consistency processing.
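As a sketch under stated assumptions (camera extrinsics given as camera-to-world poses, and a simple per-channel linear map standing in for the unspecified color mapping algorithm), the two consistency computations could look as follows.

```python
# Assumed consistency computations: relative pose between camera extrinsics and a
# per-channel linear colour correction fitted by least squares.
import numpy as np

def relative_pose(R_main, t_main, R_remote, t_remote):
    """Pose of a remote camera in the main imaging terminal's frame,
    assuming (R, t) are camera-to-world poses."""
    R_rel = R_main.T @ R_remote
    t_rel = R_main.T @ (t_remote - t_main)
    return R_rel, t_rel

def fit_color_mapping(projected, reference):
    """Per-channel gain/offset so the projected image matches the reference colours."""
    gains, offsets = [], []
    for c in range(3):
        x = projected[..., c].ravel().astype(np.float64)
        y = reference[..., c].ravel().astype(np.float64)
        a, b = np.polyfit(x, y, 1)                # least-squares fit of y ~ a*x + b
        gains.append(a)
        offsets.append(b)
    return np.array(gains), np.array(offsets)
```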
Example 1
An image data analysis method for analyzing the image data required for remote fusion of multi-terminal live-action data, specifically comprising the following steps:
S10, image raw data analysis;
S11, focal stack data analysis;
and S12, camera parameter analysis.
Example 2
Based on embodiment 1, as shown in fig. 3, in step S10 the image raw data analysis includes the following sub-steps:
S101, calculating the subject data extracted from each frame of a two-dimensional image and the scale of that subject data in the current frame;
S102, performing a quantization operation on the image data, making inter-frame similarity judgments on video data from the same-end camera and difference judgments on video data from different ends; the quantization operation is based on image color information, gray-scale information, gradient information and amplitude data in the frequency domain, and generates different intermediate data;
S103, measurement processing, including similarity measurement of same-end data and difference measurement of different-end data; the similarity measurement of same-end data includes marking video frames at non-equidistant intervals to determine the position and scale of the dynamic subject in the video, so that the subject keeps a stable size and position during subsequent processing; the difference measurement of different-end data includes estimating, frame by frame, the relation factors between the dynamic subjects in the multi-terminal video, and confirming these relation factors ensures that each frame of fused data achieves local consistency;
and S104, modeling, in which the similarity measurement parameters of the same-end data and the difference measurement parameters of the different-end data are jointly modeled and estimated to obtain a global measurement factor, which is used to guarantee the global consistency of the analyzed video data.
Example 3
On the basis of embodiment 1 or 2, as shown in fig. 4, in step S11 the focal stack data analysis includes the following sub-steps:
S111, focal stack estimation, namely normalizing the focal stack data of the multi-terminal videos to a common scale, then processing each frame of image data in the frequency domain, and estimating the position of the focal segment in which each frame lies;
and S112, focal stack fusion, namely completing, in the frequency domain, the focal stack state conversion of the image data processed by the focal stack estimation of step S111, and then completing this part of the image data fusion.
Example 4
On the basis of embodiment 3, as shown in fig. 5, in step S12 the camera parameter analysis includes the following sub-steps:
S121, image-based camera parameter estimation, namely establishing a 3D relationship between multi-frame image data using the data from the image raw data analysis and the focal stack data analysis, and estimating the CCD (charge-coupled device) size, FOV (field of view) and physical focal length of the camera through a re-projection process, thereby recovering the view-frustum data of the camera's imaging;
and S122, solving the mapping between the physical focal length of the camera and the image focal stack data, namely using the discrete focal stack range of each end device obtained in the focal stack estimation, combined with the image-based camera parameter estimation result, to estimate the mapping between the actual focal length range of the camera and the focal stack data, and fitting the functional relationship between them.
Example 5
A scene estimation method, as shown in fig. 6, includes a three-dimensional scene data reconstruction step in which the data analyzed by the image data analysis method of embodiment 1 or embodiment 2 is used to reconstruct three-dimensional scene data, specifically comprising the following sub-steps:
S201, screen parameterization estimation, namely displaying a dot-matrix image on a screen, extracting the coordinates of the dot matrix from the captured screen image, and estimating a parameterized function of the screen data in Euclidean space;
S202, scene scale estimation, namely stitching the camera imaging view-frustum data of different ends processed by the camera parameter analysis unit so that multiple cameras jointly form an equivalent visual imaging system, obtaining the final output scale of the scene;
S203, for a static scene, combining the scale data obtained by the image raw data analysis unit and simulating several planes to approximate the three-dimensional static scene space based on the camera view-frustum structure; and for a dynamic scene, estimating its motion trajectory and geometric skeleton, and restoring its three-dimensional data to the real scale in combination with the scale data obtained by the image raw data analysis unit.
Example 6
A 3D fusion method, as shown in fig. 7, includes a fusion step used to fuse the data processed by the scene estimation method of embodiment 5 with the data analyzed by the image data analysis methods of embodiments 1 and 2, specifically comprising the following sub-steps:
S301, geometric fusion: extracting matching data using image information, establishing a 3D geometric relationship, and converting the multi-terminal three-dimensional scene data into two-dimensional image data on the equivalent visual imaging system;
S302, image fusion: dividing the image data into different image blocks according to the 3D geometric relationship in the geometric fusion, building a pixel histogram for each block, calculating the degree of similarity between different blocks, and then generating corresponding mask images to assist the edge fusion between blocks;
and S303, fusion consistency processing: calculating, from the geometric fusion and the image fusion, the conversion of the multi-terminal video data into image data under the shooting terminal, and projecting the image data onto the display medium according to the parameters of the scene estimation.
Example 7
On the basis of embodiment 6, in step S303 the fusion consistency processing includes geometric consistency processing and image data consistency processing; first, the geometric consistency processing links the extrinsic parameter data of the multi-terminal cameras with the extrinsic parameter data of the main imaging terminal and calculates the relative pose relationship; then, the image data consistency processing corrects the image data on the display medium using a color mapping algorithm.
Parts not described in the present invention are the same as, or can be implemented using, the prior art.
Based on the foregoing disclosure, or by adapting knowledge or techniques of the relevant art, those skilled in the art may devise embodiments other than the examples above, and features of the various embodiments may be interchanged or substituted; such modifications and variations that do not depart from the spirit and scope of the present invention are intended to fall within the scope of the following claims.

Claims (4)

1. An image data analysis method, characterized in that the image data required for remote fusion of multi-terminal live-action data is analyzed, the method specifically comprising the following steps:
S10, image raw data analysis; in step S10, the image raw data analysis includes the following sub-steps:
S101, calculating the subject data extracted from each frame of a two-dimensional image and the scale of that subject data in the current frame;
S102, performing a quantization operation on the image data, making inter-frame similarity judgments on video data from the same-end camera and difference judgments on video data from different ends; the quantization operation is based on image color information, gray-scale information, gradient information and amplitude data in the frequency domain, and generates different intermediate data;
S103, measurement processing, including similarity measurement of same-end data and difference measurement of different-end data; the similarity measurement of same-end data includes marking video frames at non-equidistant intervals to determine the position and scale of the dynamic subject in the video, so that the subject keeps a stable size and position during subsequent processing; the difference measurement of different-end data includes estimating, frame by frame, the relation factors between the dynamic subjects in the multi-terminal video, and confirming these relation factors ensures that each frame of fused data achieves local consistency;
S104, modeling, namely jointly modeling and estimating the similarity measurement parameters of the same-end data and the difference measurement parameters of the different-end data to obtain a global measurement factor, and using the global measurement factor to guarantee the global consistency of the analyzed video data;
S11, focal stack data analysis; in step S11, the focal stack data analysis includes the following sub-steps:
S111, focal stack estimation, namely normalizing the focal stack data of the multi-terminal videos to a common scale, then processing each frame of image data in the frequency domain, and estimating the position of the focal segment in which each frame lies;
S112, focal stack fusion, namely completing, in the frequency domain, the focal stack state conversion of the image data processed by the focal stack estimation of step S111, and then completing this part of the image data fusion;
S12, camera parameter analysis; in step S12, the camera parameter analysis includes the following sub-steps:
S121, image-based camera parameter estimation, namely establishing a 3D relationship between multi-frame image data using the data from the image raw data analysis and the focal stack data analysis, and estimating the CCD (charge-coupled device) size, FOV (field of view) and physical focal length of the camera through a re-projection process, thereby recovering the view-frustum data of the camera's imaging;
and S122, solving the mapping between the physical focal length of the camera and the image focal stack data, namely using the discrete focal stack range of each end device obtained in the focal stack estimation, combined with the image-based camera parameter estimation result, to estimate the mapping between the actual focal length range of the camera and the focal stack data, and fitting the functional relationship between them.
2. A scene estimation method, characterized by comprising a three-dimensional scene data reconstruction step in which the data analyzed by the image data analysis method according to claim 1 is used to reconstruct three-dimensional scene data, specifically comprising the following sub-steps:
S201, screen parameterization estimation, namely displaying a dot-matrix image on a screen, extracting the coordinates of the dot matrix from the captured screen image, and estimating a parameterized function of the screen data in Euclidean space;
S202, scene scale estimation, namely stitching the camera imaging view-frustum data of different ends processed by the camera parameter analysis unit so that multiple cameras jointly form an equivalent visual imaging system, obtaining the final output scale of the scene;
S203, for a static scene, combining the scale data obtained by the image raw data analysis unit and simulating several planes to approximate the three-dimensional static scene space based on the camera view-frustum structure; and for a dynamic scene, estimating its motion trajectory and geometric skeleton, and restoring its three-dimensional data to the real scale in combination with the scale data obtained by the image raw data analysis unit.
3. A 3D fusion method, characterized by comprising a fusion step of fusing the data processed by the scene estimation method according to claim 2 with the data analyzed by the image data analysis method according to claim 1, specifically comprising the following sub-steps:
S301, geometric fusion: extracting matching data using image information, establishing a 3D geometric relationship, and converting the multi-terminal three-dimensional scene data into two-dimensional image data on the equivalent visual imaging system;
S302, image fusion: dividing the image data into different image blocks according to the 3D geometric relationship in the geometric fusion, building a pixel histogram for each block, calculating the degree of similarity between different blocks, and then generating corresponding mask images to assist the edge fusion between blocks;
and S303, fusion consistency processing: calculating, from the geometric fusion and the image fusion, the conversion of the multi-terminal video data into image data under the shooting terminal, and projecting it onto the display medium according to the parameters of the scene estimation.
4. The 3D fusion method according to claim 3, wherein in step S303 the fusion consistency processing includes geometric consistency processing and image data consistency processing; first, the geometric consistency processing links the extrinsic parameter data of the multi-terminal cameras with the extrinsic parameter data of the main imaging terminal's camera and calculates the relative pose relationship; then, the image data consistency processing corrects the image data on the display medium using a color mapping algorithm.
CN202210714675.7A 2022-06-23 2022-06-23 Image data analysis method, scene estimation method and 3D fusion method Active CN114818992B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210714675.7A CN114818992B (en) 2022-06-23 2022-06-23 Image data analysis method, scene estimation method and 3D fusion method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210714675.7A CN114818992B (en) 2022-06-23 2022-06-23 Image data analysis method, scene estimation method and 3D fusion method

Publications (2)

Publication Number Publication Date
CN114818992A CN114818992A (en) 2022-07-29
CN114818992B true CN114818992B (en) 2022-09-23

Family

ID=82521933

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210714675.7A Active CN114818992B (en) 2022-06-23 2022-06-23 Image data analysis method, scene estimation method and 3D fusion method

Country Status (1)

Country Link
CN (1) CN114818992B (en)


Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013085927A1 (en) * 2011-12-07 2013-06-13 Sheridan Martin Small Updating printed content with personalized virtual data
CN102568026A (en) * 2011-12-12 2012-07-11 浙江大学 Three-dimensional enhancing realizing method for multi-viewpoint free stereo display
CN104183014A (en) * 2014-08-13 2014-12-03 浙江大学 An information labeling method having high fusion degree and oriented to city augmented reality
CN110516639A (en) * 2019-08-30 2019-11-29 成都索贝数码科技股份有限公司 A kind of personage's three-dimensional position real-time computing technique based on video flowing natural scene
WO2021042957A1 (en) * 2019-09-06 2021-03-11 华为技术有限公司 Image processing method and device
CN111836012A (en) * 2020-06-28 2020-10-27 航天图景(北京)科技有限公司 Video fusion and video linkage method based on three-dimensional scene and electronic equipment
CN112465401A (en) * 2020-12-17 2021-03-09 国网四川省电力公司电力科学研究院 Electric power operation safety control system based on multi-dimensional information fusion and control method thereof
CN113096763A (en) * 2021-04-08 2021-07-09 杭州深睿博联科技有限公司 Multi-terminal asynchronous remote visual training method and system

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
Virtual-Real Fusion with Dynamic Scene from Videos; Chengwei Pan et al.; 2016 International Conference on Cyberworlds (CW); 2016-11-24; pp. 65-72 *
3D panoramic fusion technology: Beijing Zhihui Yunzhou combines and renders virtual and real panoramas together; Video Fusion Family; https://page.om.qq.com/page/O4r96KP6y2w8DQNcdz4DhePg0; 2022-01-24; p. 1 *
Research and implementation of Silverlight-based 3D virtual real scenes; Wang Tingting; China Master's Theses Full-text Database (Information Science and Technology); 2015-01-15 (No. 1); pp. I138-1543 *
Online acquisition of dynamic indoor scene lighting; Yuan Xia; Modern Computer; 2020-11-25 (No. 33); pp. 58-63, 69 *
Research on virtual-real scene fusion for a mobile augmented reality assembly system; Yan Jiapeng; China Master's Theses Full-text Database (Information Science and Technology); 2016-11-15 (No. 11); pp. I138-440 *
Virtual-real fusion technology: what does virtual-real fusion mean?; Xingxiang Siwei VR; http://www.siweivr.com/bksy/10729.html; 2020-08-21; p. 1 *
A visual localization algorithm fusing 3D scene geometric information; Robot Innovation Ecosystem; https://www.elecfans.com/d/1378940.html; 2020-11-13; p. 1 *

Also Published As

Publication number Publication date
CN114818992A (en) 2022-07-29

Similar Documents

Publication Publication Date Title
Hu et al. Revisiting single image depth estimation: Toward higher resolution maps with accurate object boundaries
CN111047510B (en) Large-field-angle image real-time splicing method based on calibration
CN108711185B (en) Three-dimensional reconstruction method and device combining rigid motion and non-rigid deformation
CN103810685B (en) A kind of super-resolution processing method of depth map
CN102592275B (en) Virtual viewpoint rendering method
CN106875437B (en) RGBD three-dimensional reconstruction-oriented key frame extraction method
CN103337094A (en) Method for realizing three-dimensional reconstruction of movement by using binocular camera
Sanches et al. Mutual occlusion between real and virtual elements in augmented reality based on fiducial markers
CN112734890B (en) Face replacement method and device based on three-dimensional reconstruction
CN105894443A (en) Method for splicing videos in real time based on SURF (Speeded UP Robust Features) algorithm
Sharma et al. A flexible architecture for multi-view 3DTV based on uncalibrated cameras
CN112712487A (en) Scene video fusion method and system, electronic equipment and storage medium
Bleyer et al. Temporally consistent disparity maps from uncalibrated stereo videos
CN108090877A (en) A kind of RGB-D camera depth image repair methods based on image sequence
CN114049464A (en) Reconstruction method and device of three-dimensional model
CN115428027A (en) Neural opaque point cloud
CN112734914A (en) Image stereo reconstruction method and device for augmented reality vision
CN107958489B (en) Curved surface reconstruction method and device
CN108564654B (en) Picture entering mode of three-dimensional large scene
Li et al. Gyroflow+: Gyroscope-guided unsupervised deep homography and optical flow learning
CN114818992B (en) Image data analysis method, scene estimation method and 3D fusion method
CN116681579A (en) Real-time video face replacement method, medium and system
CN115063561A (en) Image data analysis device, scene estimation device, and 3D fusion system
Seitner et al. Trifocal system for high-quality inter-camera mapping and virtual view synthesis
Fan et al. Learning Bilateral Cost Volume for Rolling Shutter Temporal Super-Resolution

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant