CN114373153B - Video imaging optimization system and method based on multi-scale array camera - Google Patents


Info

Publication number: CN114373153B
Authority: CN (China)
Prior art keywords: video, resolution, camera, picture, feature point
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Application number: CN202210030211.4A
Other languages: Chinese (zh)
Other versions: CN114373153A
Inventors: 温建伟, 赵月峰
Current assignee: Beijing Zhuohe Technology Co Ltd (the listed assignees may be inaccurate)
Original assignee: Beijing Zhuohe Technology Co Ltd
Application filed by Beijing Zhuohe Technology Co Ltd
Priority to CN202210030211.4A
Publication of CN114373153A (application) and of CN114373153B (grant); application granted

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G06T 5/70
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/13 Edge detection
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 5/00 Details of television systems
    • H04N 5/222 Studio circuitry; Studio devices; Studio equipment
    • H04N 5/262 Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; Cameras specially adapted for the electronic generation of special effects
    • H04N 5/265 Mixing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10016 Video; Image sequence


Abstract

The invention provides a video imaging optimization method and system based on a multi-scale array camera, belonging to the technical field of video optimization processing. The method comprises the following steps: extracting first picture feature points and second picture feature points; matching the first picture feature points with the second picture feature points; merging and screening the matched feature points to obtain effective feature points; and splicing the video data of the first-scale array camera with the video data of the second-scale array camera based on the effective feature points to obtain optimized video imaging data. The system connects the high-resolution detail camera array with the low-resolution global camera array and splices the video data output by the different array cameras to obtain optimized video imaging data. The invention increases the number of effective feature points matched between corresponding regions of the high-resolution detail video and the low-resolution global video, thereby enhancing the robustness of the algorithm and the imaging effect of the high-resolution video.

Description

Video imaging optimization system and method based on multi-scale array camera
Technical Field
The invention belongs to the technical field of video optimization processing, and particularly relates to a video imaging optimization system and method based on a multi-scale array camera, electronic equipment for realizing the method and a computer-readable storage medium.
Background
The multi-scale array camera comprises a wide-angle camera serving as a reference, which is responsible for acquiring a low-resolution global reference video of a preset scene, and an array of long-focus (telephoto) cameras responsible for shooting high-resolution detail videos of specific areas.
When a high-resolution full-width video or a high-resolution local video in any area needs to be displayed, a global reference video is used as a reference to splice the high-resolution detail video. In the splicing process, feature point matching is carried out on corresponding areas of the high-resolution detail video and the low-resolution global video.
The feature point matching algorithm widely used at present is a Scale-invariant feature transform (SIFT) algorithm. When the algorithm is used for matching two pictures with large resolution difference, the number of matched feature points is small, and the number of effective feature points is small, so that the final high-resolution video imaging effect is not ideal, and the robustness of the video imaging algorithm is poor.
Because the resolution difference between the high-resolution and low-resolution videos is large (10 or even 40 times), most features present in the high-resolution detail video disappear from the low-resolution global video. As a result, few feature points can be matched, which degrades image splicing and makes the imaging effect of the final high-resolution video unsatisfactory.
Disclosure of Invention
In order to solve the above technical problems, the present invention provides a video imaging optimization system and method based on a multi-scale array camera, an electronic device and a computer-readable storage medium for implementing the method.
In a first aspect of the present invention, a video imaging optimization method based on a multi-scale array camera is provided, and the method is applied to a scene including a high-resolution detail camera array and a low-resolution global camera array, and can improve the number of effective feature points matched with corresponding regions of a high-resolution detail video and a low-resolution global video, so as to improve the robustness of an algorithm and the imaging effect of the high-resolution video.
Specifically, the method comprises the following steps:
s100: acquiring first scale array camera video data and second scale array camera video data;
s200: extracting a first picture feature point in the first scale array camera video data and a second picture feature point in the second scale array camera video data;
s300: carrying out feature point matching on the first picture feature point and the second picture feature point;
s400: combining and screening the matched feature points to obtain effective feature points;
s500: and splicing the video data of the first scale array camera and the video data of the second scale array camera based on the effective characteristic points to obtain optimized video imaging data.
In step S100, the first scale array camera video data is first high resolution video data, and the second scale array camera video data is second low resolution video data;
the first high resolution is N times the second low resolution, where N is greater than 10.
As a more specific key technical improvement, the step S300 specifically comprises:
S301: selecting a high-resolution video picture A₁ of the detail camera and the corresponding low-resolution video picture A₂ in the global camera;
S302: performing first feature point matching based on a scale invariant feature transform algorithm;
s303: and performing second-time feature point matching based on an edge detection feature point matching algorithm.
The technical scheme of the first aspect provides a video imaging optimization algorithm based on a multi-scale array camera: a high-resolution video picture A₁ of the detail camera and the corresponding low-resolution video picture A₂ in the global camera are selected, and feature point matching is performed on the two pictures. On the basis of the feature points matched by the traditional scale-invariant feature transform algorithm, feature point matching is additionally performed by an edge detection feature point matching module.
And combining the feature points matched by the traditional scale invariant feature transformation algorithm and the feature points matched by the image after edge detection to obtain all matched feature point information.
Then, the step S400 specifically includes:
s401: combining the feature point matching results of the step S302 and the step S303;
s402: and screening out effective characteristic points by a comparison method.
Specifically, the number of feature points affects the computational complexity of the algorithm. On the premise that the splicing effect is not affected, the technical scheme of the invention reduces the number of feature points as much as possible, retaining effective feature points and deleting useless ones to improve the computational efficiency of the algorithm. Especially in the real-time processing of hundred-megapixel video, every millisecond is precious, so extracting effective feature points is particularly critical. Furthermore, some matched feature points may be mismatched, and mismatched feature points must also be deleted.
Based on the improvement, when the feature points of the images with great resolution difference are matched, the technical scheme of the invention can obviously increase the number of effective feature points, improve the robustness of a video imaging algorithm, improve the image splicing quality and obviously optimize the video imaging effect.
In a second aspect of the present invention, a video imaging optimization system based on a multi-scale array camera is provided, where the system connects a high-resolution detail camera array and a low-resolution global camera array, and can be used to implement the method of the first aspect, i.e., the number of effective feature points matched in corresponding regions of a high-resolution detail video and a low-resolution global video can be increased, thereby increasing the robustness of the algorithm and the imaging effect of the high-resolution video.
In a specific structure, the system comprises:
the picture characteristic point extraction module is used for extracting a first picture characteristic point of a video picture acquired by the high-resolution detail camera array and a second picture characteristic point of the video picture acquired by the low-resolution global camera array;
the characteristic point matching module is used for carrying out characteristic point matching on the first picture characteristic point and the second picture characteristic point;
the characteristic point screening module is used for combining and screening the matched characteristic points to obtain effective characteristic points;
and the video splicing module is used for splicing the video data output by the high-resolution detail camera array and the low-resolution global camera array based on the effective characteristic points to obtain optimized video imaging data.
The feature point matching module comprises a scale-invariant feature transformation feature point matching module and an edge detection feature point matching module;
the edge detection feature point matching module comprises a Gaussian smoothing filter and a difference value detection module.
In the edge detection feature point matching module, a Gaussian smoothing filter first filters the two pictures to remove noise introduced during image digitization, reducing its interference with edge detection and preventing false detection; this yields the Gaussian-smoothed image G₁ of picture A₁ and the Gaussian-smoothed image G₂ of picture A₂. A difference method is then used to detect the parts of the image where the contrast between a pixel and its surroundings is large, while the other parts are turned gray.
And the feature point combining module combines the feature points matched by the traditional scale invariant feature transformation algorithm and the feature points matched by the image after the edge detection to obtain the information of all the matched feature points.
The effective feature point screening module screens all the matched feature points. The number of feature points affects the computational complexity of the algorithm; on the premise that the splicing effect is not affected, the number of feature points should be reduced as much as possible, retaining effective feature points and deleting useless ones to improve computational efficiency. Especially in the real-time processing of hundred-megapixel video, every millisecond is precious, so extracting effective feature points is particularly critical. Furthermore, some matched feature points may be mismatched, and mismatched feature points must also be deleted.
To implement the above method steps, in a third aspect of the present invention, an electronic device, such as an image processing terminal, is provided, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the above steps are implemented.
In a fourth aspect of the invention, a data storage medium is provided, which may be, for example, a computer-readable storage medium, storing a computer program which, when executed by a processor, performs the above steps.
The method aims to improve the number of effective characteristic points matched with corresponding areas of a high-resolution detail video and a low-resolution global video on the basis of the existing algorithm, so that the robustness of the algorithm and the imaging effect of the high-resolution video are improved.
When the feature point matching is carried out between images with great resolution difference, the algorithm can obviously increase the number of effective feature points, improve the robustness of the video imaging algorithm, improve the image splicing quality and obviously optimize the video imaging effect.
Further advantages of the invention will be apparent in the detailed description section in conjunction with the drawings attached hereto.
Drawings
To more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the embodiments are briefly described below. It is apparent that the drawings described below are only some embodiments of the present invention, and that those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a main flow chart of a video imaging optimization method based on a multi-scale array camera according to an embodiment of the present invention;
FIGS. 2-3 are flowcharts of further preferred embodiments of parts of the sub-steps of the method of FIG. 1;
FIG. 4 is a flow chart of steps of a more detailed embodiment of the method of FIG. 1;
FIG. 5 is a schematic diagram of an electronic device implementing the method of FIGS. 1-4;
fig. 6 is a schematic structural diagram of a main body of a video imaging optimization system based on a multi-scale array camera according to an embodiment of the present invention;
FIG. 7 is a functional block architecture diagram of the multi-scale array camera based video imaging optimization system of FIG. 6;
FIG. 8 (A) is a schematic diagram of feature points obtained based on a scale invariant feature transform algorithm;
FIG. 8 (B) is a schematic diagram of feature points obtained based on an edge detection feature point matching algorithm and merging and screening;
FIG. 9 (A) is a schematic diagram illustrating the effect of video splicing without the technical solution of the present invention;
fig. 9 (B) is a schematic diagram of the effect of video stitching by using the technical solution of the present invention.
Detailed Description
The invention is further described with reference to the following drawings and detailed description.
Referring to fig. 1, a main flowchart of a video imaging optimization method based on a multi-scale array camera according to an embodiment of the present invention is shown.
In fig. 1, the method includes steps S100 to S500, and each step is implemented as follows:
s100: acquiring first scale array camera video data and second scale array camera video data;
s200: extracting a first picture feature point in the first scale array camera video data and a second picture feature point in the second scale array camera video data;
s300: carrying out feature point matching on the first picture feature point and the second picture feature point;
s400: combining and screening the matched feature points to obtain effective feature points;
s500: and splicing the video data of the first scale array camera and the video data of the second scale array camera based on the effective characteristic points to obtain optimized video imaging data.
On the basis of fig. 1, see fig. 2-3.
In fig. 2, it is shown that the step S300 specifically includes:
S301: selecting the high-resolution video picture A₁ of the detail camera and the corresponding low-resolution video picture A₂ in the global camera;
S302: performing first feature point matching based on a scale invariant feature transform algorithm;
s303: and performing second-time feature point matching based on an edge detection feature point matching algorithm.
In fig. 3, it is shown that the step S400 specifically includes:
s401: combining the feature point matching results of the step S302 and the step S303;
s402: and screening out effective characteristic points by a comparison method.
More specifically, in step S100, the first-scale array camera video data is first high-resolution video data, and the second-scale array camera video data is second low-resolution video data;
the first high resolution is N times the second low resolution, where N is greater than 10.
Preferably, 10 ≤ N ≤ 40.
As another preferable mode, in step S100, the first-scale array camera video data is a detailed high-resolution video picture taken for a local area, and the second-scale array camera video data is a global low-resolution video picture taken for a global area.
Specifically, the multi-scale array camera comprises a wide-angle camera as a reference, namely the second-scale array camera is responsible for acquiring a low-resolution global reference video of a preset scene, and the first-scale array camera consisting of a plurality of long-focus cameras is responsible for shooting a high-resolution detail video of a specific area.
By way of further introduction, step S302 performs a first feature point matching based on a Scale-invariant feature transform (SIFT) algorithm.
However, with this step alone, when matching two pictures with a large resolution difference, few feature points can be matched and few of them are effective, so the final high-resolution video imaging effect is not ideal and the robustness of the video imaging algorithm is poor.
Therefore, the present invention further includes step S303: and performing secondary feature point matching based on an edge detection feature point matching algorithm.
By the algorithm, the number of effective characteristic points matched with corresponding areas of the high-resolution detail video and the low-resolution global video can be increased on the basis of the existing algorithm, so that the robustness of the algorithm and the imaging effect of the high-resolution video are improved.
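As an illustrative sketch of the first matching pass (step S302), the following function matches two sets of precomputed SIFT-style descriptors by nearest neighbour with Lowe's ratio test. The function name, the descriptor representation, and the ratio value 0.75 are assumptions for illustration; descriptor extraction itself (e.g. with an off-the-shelf SIFT implementation) is assumed to be done elsewhere:

```python
import numpy as np

def match_descriptors(desc1, desc2, ratio=0.75):
    """Nearest-neighbour descriptor matching with Lowe's ratio test,
    the acceptance rule commonly paired with SIFT descriptors.
    desc2 must contain at least two descriptors for the ratio test."""
    matches = []
    for i, d in enumerate(desc1):
        dist = np.linalg.norm(desc2 - d, axis=1)   # L2 distance to every candidate
        order = np.argsort(dist)
        best, second = order[0], order[1]
        if dist[best] < ratio * dist[second]:      # keep only unambiguous matches
            matches.append((i, int(best)))
    return matches
```

A match is accepted only when the best candidate is clearly closer than the second best, which suppresses the ambiguous matches that arise when many low-resolution patches look alike.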
When the second feature point matching is performed based on the edge detection feature point matching algorithm, a Gaussian smoothing filter is first used to filter the two pictures, removing noise introduced during image digitization, reducing its interference with edge detection and preventing false detection; this yields the Gaussian-smoothed image G₁ of picture A₁ and the Gaussian-smoothed image G₂ of picture A₂. A difference method is then used to detect the parts of the image where the contrast between a pixel and its surroundings is large, while the other parts are turned gray:

E₁ = A₁ - G₁

E₂ = A₂ - G₂

Then, a threshold value T is specified. The threshold directly influences the edge detection result: if T is too small, nearly the whole image is treated as edge; if T is too large, only a small amount of edge information can be detected. The suggested value range of T is therefore 100-130. Each value eᵢⱼ in E₁ and E₂ is compared with T: pixels greater than T are turned white and pixels less than T are turned black, giving the edge-detected images B₁ and B₂. Feature point matching is then performed on images B₁ and B₂.
Therefore, the step S303 specifically includes:
s3031: filtering the two images by using a Gaussian smoothing filter;
s3032: the difference method is used to detect the part of the image where the contrast between the pixel and the surrounding exceeds the set threshold, determine the edge information, and make the other parts except the edge information become grey.
Then, step S401 combines the feature points matched by the conventional scale invariant feature transform algorithm with the feature points matched by the image after edge detection, to obtain information of all matched feature points.
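A minimal sketch of this merging step (S401), assuming each match is represented as a pair of pixel coordinates (one point from each picture); the representation and the function name are illustrative assumptions:

```python
def merge_matches(sift_matches, edge_matches):
    """Step S401: union of the SIFT matches (S302) and the edge-detection
    matches (S303), preserving order and dropping exact duplicates."""
    merged = []
    seen = set()
    for m in list(sift_matches) + list(edge_matches):
        if m not in seen:          # keep each matched pair only once
            seen.add(m)
            merged.append(m)
    return merged
```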
However, the number of feature points affects the computational complexity of the algorithm. Therefore, as a further improvement, in step S402, on the premise that the splicing effect is not affected, the number of feature points is reduced as much as possible: effective feature points are retained and useless ones are deleted, improving the computational efficiency of the algorithm. Especially in the real-time processing of hundred-megapixel video, every millisecond is precious, so extracting effective feature points is particularly critical. Furthermore, some matched feature points may be mismatched, and mismatched feature points must also be deleted.
Specifically, the effective feature points are screened by the comparison method, i.e., step S402.
By integrating the above steps, the step flow chart of the more detailed embodiment of the method of FIG. 1 illustrated in FIG. 4 is obtained, where the step S402 specifically comprises:

S4021: connecting all matched feature points, the total number of connecting lines being n;

S4022: sorting the connecting lines by length, X = [x₁, x₂, … xₙ], and calculating the median x̃ of the connecting-line lengths;

S4023: calculating the upper quartile Q₃, at sorted position (n + 1) × 0.75, and the lower quartile Q₁, at sorted position (n + 1) × 0.25;

S4024: calculating the normalized measure R = (Q₃ - Q₁) × 0.7413 from Q₃ and Q₁;

S4025: calculating the confidence Zᵢ of each connecting-line length:

Zᵢ = (xᵢ - x̃) / R

wherein xᵢ is the length of each connecting line, x̃ is the median of the connecting-line lengths, and R is the normalized measure;

S4026: if |Zᵢ| ≤ 2, the corresponding feature point is retained; if |Zᵢ| > 2, the length xᵢ is an outlier and the corresponding feature point is deleted.
The method steps described in fig. 1-4 may all be automated in the form of a computer program. Fig. 5 is a schematic structural diagram of an electronic device implementing the method described in fig. 1-4.
Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices.
In particular, the electronic device may be an image processing terminal comprising a processor and a memory.
As shown in fig. 5, the image processing terminal device 700 includes a computing unit 701, which can perform various appropriate actions and processes according to a computer program stored in a Read-Only Memory (ROM) 702 or loaded from a storage unit 708 into a Random Access Memory (RAM) 703. The RAM 703 can also store various programs and data required for the operation of the device 700. The computing unit 701, the ROM 702, and the RAM 703 are connected to each other by a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.
A plurality of components in the image processing terminal device 700 are connected to the I/O interface 705, including: an input unit 706 such as a keyboard, a mouse, or the like; an output unit 707 such as various types of displays, speakers, and the like; a storage unit 708 such as a magnetic disk, optical disk, or the like; and a communication unit 709 such as a network card, modem, wireless communication transceiver, etc. The communication unit 709 allows the device 700 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), the Internet, and blockchain networks.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server (also called a cloud computing server or cloud host), a host product in the cloud computing service system that remedies the defects of difficult management and weak service scalability in traditional physical hosts and VPS ("Virtual Private Server") services. The server may also be a server of a distributed system, or a server incorporating a blockchain.
On the basis, referring to fig. 6, fig. 6 is a schematic structural diagram of a main body of a video imaging optimization system based on a multi-scale array camera according to an embodiment of the present invention.
Fig. 6 shows a video imaging optimization system based on a multi-scale array camera, where the system connects a high-resolution detail camera array and a low-resolution global camera array, and splices video data output by the high-resolution detail camera array and the low-resolution global camera array, so as to obtain optimized video imaging data.
In particular, see fig. 7. The system comprises:
the picture characteristic point extraction module is used for extracting a first picture characteristic point of a video picture acquired by the high-resolution detail camera array and a second picture characteristic point of the video picture acquired by the low-resolution global camera array;
the characteristic point matching module is used for carrying out characteristic point matching on the first picture characteristic point and the second picture characteristic point;
the characteristic point screening module is used for combining and screening the matched characteristic points to obtain effective characteristic points;
and the video splicing module is used for splicing the video data output by the high-resolution detail camera array and the low-resolution global camera array based on the effective characteristic points to obtain optimized video imaging data.
The feature point matching module comprises a scale-invariant feature transformation feature point matching module and an edge detection feature point matching module;
the edge detection feature point matching module comprises a Gaussian smoothing filter and a difference value detection module.
In the specific function realization, the system first selects the high-resolution video picture A₁ of the detail camera and the corresponding low-resolution video picture A₂ in the global camera, and performs feature point matching on the two pictures. On the basis of the feature points matched by the traditional scale-invariant feature transform algorithm, feature point matching is additionally performed by the edge detection feature point matching module.
In the edge detection feature point matching module, a Gaussian smoothing filter first filters the two pictures to remove noise introduced during image digitization, reducing its interference with edge detection and preventing false detection; this yields the Gaussian-smoothed image G₁ of picture A₁ and the Gaussian-smoothed image G₂ of picture A₂. A difference method is then used to detect the parts of the image where the contrast between a pixel and its surroundings is large, while the other parts are turned gray:
E1 = A1 − G1
E2 = A2 − G2
A threshold T is then specified. This threshold directly affects the edge detection result: if T is too small, almost the entire image is treated as edge, and if T is too large, only a small amount of edge information is detected; the suggested range for T is therefore 100–130. Each value eij in E1 and E2 is compared with T: pixels greater than T become white and pixels less than T become black, yielding the edge-detected images B1 and B2. Feature point matching is then performed on images B1 and B2.
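The smoothing/difference/threshold pipeline above can be sketched in plain NumPy as follows; the kernel size and sigma are illustrative choices that the patent does not specify:

```python
import numpy as np

def gaussian_kernel(size, sigma):
    # 1-D Gaussian kernel normalized to sum to 1
    ax = np.arange(size) - size // 2
    k = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def gaussian_smooth(img, size, sigma):
    # separable Gaussian filtering: convolve rows first, then columns
    k = gaussian_kernel(size, sigma)
    pad = size // 2
    p = np.pad(np.asarray(img, dtype=float), pad, mode="edge")
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, p)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)

def edge_detect(a, T=100, size=13, sigma=3.0):
    # E = A - G, then binarize: white (255) where e_ij > T, black otherwise
    g = gaussian_smooth(a, size, sigma)
    e = np.asarray(a, dtype=float) - g
    return np.where(e > T, 255, 0).astype(np.uint8)
```

On a step edge (0 on the left, 255 on the right), only the pixels on the bright side of the boundary have e greater than T, so the edge map marks a thin white line along the boundary.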
The feature point merging module merges the feature points matched by the traditional scale-invariant feature transform algorithm with the feature points matched on the edge-detected images, obtaining all matched feature point information.
The effective feature point screening module screens all matched feature points. The number of feature points affects the computational complexity of the algorithm, so, provided the stitching result is not degraded, the number of feature points should be reduced as far as possible: effective feature points are retained and useless ones deleted, improving the computational efficiency of the algorithm. This is especially critical in the real-time processing of gigapixel video, where every millisecond counts. In addition, some matched feature points may be mismatches, and these must also be deleted.
Effective feature points are screened by a comparison method. All matched feature point pairs are connected; the total number of connecting lines is n. The line lengths are sorted as X = [x1, x2, …, xn], and the median x̃ of the line lengths is calculated:

x̃ = median(x1, x2, …, xn)

The upper quartile Q3 = (n + 1) × 0.75 and the lower quartile Q1 = (n + 1) × 0.25 are then calculated, and from Q3 and Q1 a normalized metric value R = (Q3 − Q1) × 0.7413 is obtained. The confidence Zi of each line length is then calculated:

Zi = (xi − x̃) / R

where xi is the length of each connecting line, x̃ is the median of the line lengths, and R is the normalized metric value.

If |Zi| ≤ 2, the corresponding feature points are retained; if |Zi| > 2, the length xi is an outlier and the corresponding feature points are deleted.
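The quartile screening above can be sketched as follows; the function and variable names are illustrative, and `np.percentile` computes quartile values directly rather than the (n + 1)-based positions in the text, which is a close practical equivalent:

```python
import numpy as np

def screen_feature_pairs(lengths, z_max=2.0):
    """Keep the match pairs whose connecting-line length is not an outlier."""
    x = np.asarray(lengths, dtype=float)
    med = np.median(x)
    # quartile values; R rescales the interquartile range into a robust
    # estimate of the standard deviation (hence the 0.7413 factor)
    q3, q1 = np.percentile(x, [75, 25])
    r = (q3 - q1) * 0.7413
    if r == 0:
        # degenerate case: at least half of the lengths are identical
        return x == med
    z = (x - med) / r
    return np.abs(z) <= z_max

# demo: six consistent match lengths and one obvious mismatch
lengths = [10.0, 10.5, 9.8, 10.2, 10.1, 9.9, 42.0]
keep = screen_feature_pairs(lengths)  # the 42.0 mismatch gets keep[-1] == False
```

Because the statistic is built from the median and the interquartile range, a single grossly mismatched pair cannot distort the cutoff the way it would with a mean/standard-deviation test.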
Fig. 8 (A) is a schematic diagram of the feature points obtained with the scale-invariant feature transform algorithm alone; fig. 8 (B) is a schematic diagram of the feature points obtained with the edge detection feature point matching algorithm after merging and screening.
The comparison of fig. 8 (A) and fig. 8 (B) shows that when the scale-invariant feature transform (SIFT) algorithm alone is used to match two pictures with a large resolution difference, few feature points can be matched and few effective feature points are available, so the final high-resolution video imaging effect is unsatisfactory and the video imaging algorithm has poor robustness.
FIG. 9 (A) is a schematic diagram of the effect of video splicing without the technical solution of the present invention; fig. 9 (B) is a schematic diagram of the effect of video stitching by using the technical solution of the present invention.
As the comparison of fig. 9 (A) and fig. 9 (B) shows, without the technical solution of the present invention, matching two pictures with a large resolution difference yields few matched feature points and few effective feature points, so the final high-resolution video imaging effect is not ideal and the robustness of the video imaging algorithm is poor.
Through the improvement of the invention, the feature points matched by the traditional scale-invariant feature transform algorithm are merged with the feature points matched on the edge-detected images to obtain all matched feature point information, and mismatched feature points are deleted. As a result, when matching feature points between pictures with a huge resolution difference, the algorithm markedly increases the number of effective feature points, improves the robustness of the video imaging algorithm, improves the stitching quality, and significantly optimizes the video imaging effect.
The present invention is not limited to the specific module structures described above. The prior art mentioned in the background section may be used as part of the invention to understand the meaning of certain technical features or parameters. The scope of the present invention is defined by the claims.

Claims (7)

1. A video imaging optimization method based on a multi-scale array camera is characterized in that,
the method comprises the following steps:
s100: acquiring first scale array camera video data and second scale array camera video data;
s200: extracting a first picture feature point in the first scale array camera video data and a second picture feature point in the second scale array camera video data;
s300: carrying out feature point matching on the first picture feature point and the second picture feature point;
s301: selecting a high-resolution video picture A1 of a detail camera and a corresponding low-resolution video picture A2 in a global camera;
s302: performing first feature point matching based on a scale invariant feature transform algorithm;
s303: performing second feature point matching based on an edge detection feature point matching algorithm;
s3031: filtering the two images by using a Gaussian smoothing filter;
s3032: detecting a part of the image, of which the contrast between the pixels and the periphery exceeds a set threshold value, by using a difference value method, determining edge information, and changing other parts except the edge information into grey;
s400: combining and screening the matched feature points to obtain effective feature points;
s401: combining the feature point matching results of the step S302 and the step S303;
s402: screening out effective characteristic points by a comparison method;
s4021: connecting all the matched characteristic points, wherein the total number of the connecting lines is n;
s4022: sorting the connecting lines by length, X = [x1, x2, …, xn], and calculating the median x̃ of the line lengths;
s4023: calculating the upper quartile Q3 = (n + 1) × 0.75 and the lower quartile Q1 = (n + 1) × 0.25;
s4024: calculating, from Q3 and Q1, a normalized metric value R = (Q3 − Q1) × 0.7413;
s4025: calculating the confidence Zi = (xi − x̃) / R of each line length, wherein xi is the length of each connecting line, x̃ is the median of the line lengths, and R is the normalized metric value;
s4026: if |Zi| ≤ 2, retaining the corresponding feature points; if |Zi| > 2, the length xi is an outlier and the corresponding feature points are deleted;
s500: and splicing the video data of the first scale array camera and the video data of the second scale array camera based on the effective characteristic points to obtain optimized video imaging data.
2. The method of claim 1, wherein the video image is optimized based on a multi-scale array camera,
in step S100, the first scale array camera video data is first high resolution video data, and the second scale array camera video data is second low resolution video data;
the first high resolution is N times the second low resolution, the N being greater than 10.
3. The method for optimizing video imaging based on multi-scale array camera as claimed in claim 1 or 2,
in step S100, the first scale array camera video data is a detailed high-resolution video picture shot for a local area, and the second scale array camera video data is a global low-resolution video picture shot for a global area.
4. The method of claim 1, wherein the video image is optimized based on a multi-scale array camera,
the step S300 specifically includes:
s301: selecting a high-resolution video picture A1 of a detail camera and a corresponding low-resolution video picture A in a global camera 2
S302: performing first feature point matching based on a scale invariant feature transform algorithm;
s303: and performing second-time feature point matching based on an edge detection feature point matching algorithm.
5. A video imaging optimization system implementing the video imaging optimization method of any one of claims 1-4, the system connecting a high resolution detail camera array and a low resolution global camera array, the system comprising:
the picture characteristic point extraction module is used for extracting a first picture characteristic point of a video picture acquired by the high-resolution detail camera array and a second picture characteristic point of the video picture acquired by the low-resolution global camera array;
the characteristic point matching module is used for carrying out characteristic point matching on the first picture characteristic point and the second picture characteristic point;
the characteristic point screening module is used for combining and screening the matched characteristic points to obtain effective characteristic points;
and the video splicing module is used for splicing the video data output by the high-resolution detail camera array and the low-resolution global camera array based on the effective characteristic points to obtain optimized video imaging data.
6. The video imaging optimization system based on a multi-scale array camera according to claim 5, characterized in that:
the feature point matching module comprises a scale-invariant feature transformation feature point matching module and an edge detection feature point matching module;
the edge detection feature point matching module comprises a Gaussian smoothing filter and a difference value detection module.
7. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform a method for multi-scale array camera based video imaging optimization according to any one of claims 1-4.
CN202210030211.4A 2022-01-12 2022-01-12 Video imaging optimization system and method based on multi-scale array camera Active CN114373153B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210030211.4A CN114373153B (en) 2022-01-12 2022-01-12 Video imaging optimization system and method based on multi-scale array camera


Publications (2)

Publication Number Publication Date
CN114373153A CN114373153A (en) 2022-04-19
CN114373153B true CN114373153B (en) 2022-12-27

Family

ID=81144911

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210030211.4A Active CN114373153B (en) 2022-01-12 2022-01-12 Video imaging optimization system and method based on multi-scale array camera

Country Status (1)

Country Link
CN (1) CN114373153B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103115614A (en) * 2013-01-21 2013-05-22 武汉大学 Associated parallel matching method for multi-source multi-track long-strip satellite remote sensing images
CN107301661A (en) * 2017-07-10 2017-10-27 中国科学院遥感与数字地球研究所 High-resolution remote sensing image method for registering based on edge point feature
CN107959805A (en) * 2017-12-04 2018-04-24 深圳市未来媒体技术研究院 Light field video imaging system and method for processing video frequency based on Hybrid camera array
CN108399600A (en) * 2018-02-23 2018-08-14 清华-伯克利深圳学院筹备办公室 A kind of omnidirectional imaging system and method
CN108921781A (en) * 2018-05-07 2018-11-30 清华大学深圳研究生院 A kind of light field joining method based on depth
CN110648363A (en) * 2019-09-16 2020-01-03 腾讯科技(深圳)有限公司 Camera posture determining method and device, storage medium and electronic equipment
CN111080529A (en) * 2019-12-23 2020-04-28 大连理工大学 Unmanned aerial vehicle aerial image splicing method for enhancing robustness
CN111640065A (en) * 2020-05-29 2020-09-08 北京拙河科技有限公司 Image stitching method and imaging device based on camera array
KR102262397B1 (en) * 2019-12-17 2021-06-07 연세대학교 산학협력단 Method and Apparatus for Automatically Matching between Multi-Temporal SAR Images

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102855649B (en) * 2012-08-23 2015-07-15 山东电力集团公司电力科学研究院 Method for splicing high-definition image panorama of high-pressure rod tower on basis of ORB (Object Request Broker) feature point
WO2020137259A1 (en) * 2018-12-28 2020-07-02 ソニーセミコンダクタソリューションズ株式会社 Solid-state imaging device and electronic apparatus
CN116433475A (en) * 2019-06-26 2023-07-14 图码思(成都)科技有限公司 Image stitching method and terminal based on feature point extraction


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"Quartile robust statistics interactive interface design and outlier data screening"; Wei Xiaoling; Journal of Qiqihar University (Natural Science Edition); 20171130; full text *

Also Published As

Publication number Publication date
CN114373153A (en) 2022-04-19

Similar Documents

Publication Publication Date Title
US11222211B2 (en) Method and apparatus for segmenting video object, electronic device, and storage medium
CN110675404A (en) Image processing method, image processing apparatus, storage medium, and terminal device
CN109214996B (en) Image processing method and device
CN112991180B (en) Image stitching method, device, equipment and storage medium
CN112602088B (en) Method, system and computer readable medium for improving quality of low light images
WO2021179826A1 (en) Image processing method and related product
CN111080595A (en) Image processing method, image processing device, electronic equipment and computer readable medium
CN109035167B (en) Method, device, equipment and medium for processing multiple faces in image
CN111882565A (en) Image binarization method, device, equipment and storage medium
Wu et al. Reflectance-guided histogram equalization and comparametric approximation
CN109615620B (en) Image compression degree identification method, device, equipment and computer readable storage medium
CN113888438A (en) Image processing method, device and storage medium
CN114373153B (en) Video imaging optimization system and method based on multi-scale array camera
CN111833285A (en) Image processing method, image processing device and terminal equipment
CN115861077A (en) Panoramic image determination method, device, equipment and storage medium
CN110855957A (en) Image processing method and device, storage medium and electronic equipment
CN110880160A (en) Picture frame super-division method and device, terminal equipment and computer readable storage medium
CN113438386B (en) Dynamic and static judgment method and device applied to video processing
CN114764839A (en) Dynamic video generation method and device, readable storage medium and terminal equipment
CN115829911A (en) Method, apparatus and computer storage medium for detecting imaging consistency of a system
CN113628192A (en) Image blur detection method, device, apparatus, storage medium, and program product
CN113012085A (en) Image processing method and device
CN114596210A (en) Noise estimation method, device, terminal equipment and computer readable storage medium
CN113283319A (en) Method and device for evaluating face ambiguity, medium and electronic equipment
CN112085002A (en) Portrait segmentation method, portrait segmentation device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant