CN109922331B - Image processing method and device - Google Patents

Image processing method and device

Info

Publication number
CN109922331B
CN109922331B (application CN201910037310.3A)
Authority
CN
China
Prior art keywords
depth map
image
lens group
depth
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910037310.3A
Other languages
Chinese (zh)
Other versions
CN109922331A (en)
Inventor
杨萌
戴付建
赵烈烽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Sunny Optics Co Ltd
Original Assignee
Zhejiang Sunny Optics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Sunny Optics Co Ltd
Priority to CN201910037310.3A
Publication of CN109922331A
Application granted
Publication of CN109922331B
Active legal status
Anticipated expiration of legal status

Landscapes

  • Processing Or Creating Images (AREA)
  • Studio Devices (AREA)

Abstract

The invention provides an image processing method and device. The method comprises: shooting a scene through a first lens group, a second lens group and a third lens group respectively to obtain a first image, a second image and a third image; determining three depth maps from the offsets of pixels between pairs of the first image, the second image and the third image; combining one or more of the three depth maps with received image information to generate virtual media information; and combining the virtual media information with one of the first image, the second image and the third image. This solves the problem in the related art of how to complete depth information processing using only the mobile terminal carried by the user, without adding extra equipment, so that the combination of virtual and real content is completed by the mobile terminal alone and the user experience is improved.

Description

Image processing method and device
Technical Field
The present invention relates to the field of communications, and in particular, to an image processing method and apparatus.
Background
Augmented Reality (AR) is a technology that adds virtual media information, including video, images, text, sound and other computer-generated information, to real-world visual information. An important application of this technology is to let users experience things that cannot be reached in the current scene because of physical distance or time, and to increase or improve their perception of information in real-world scenes. However, AR technology may require specialized systems or hardware devices, such as head-mounted displays, smart glasses, or computers with discrete graphics cards, which entail a certain cost or usage environment and in practice limit the scenarios in which AR can be used. In particular, the depth information processing in AR is the key to fusing virtual and real scenes in an AR system or device, and how to complete depth information processing with only the mobile terminal carried by the user, without adding extra devices, remains an important problem to be solved.
For the problem in the related art of how to complete depth information processing using the mobile terminal carried by the user without adding extra equipment, no effective solution has yet been proposed.
Disclosure of Invention
The embodiments of the present invention provide an image processing method and apparatus, which at least solve the problem in the related art of how to complete depth information processing using the mobile terminal carried by the user without adding extra equipment.
According to an embodiment of the present invention, there is provided an image processing method including:
shooting a scene through a first lens group, a second lens group and a third lens group respectively to obtain a first image, a second image and a third image;
determining a first depth map by the offset of the pixels in the first image and the second image, determining a second depth map by the offset of the pixels in the second image and the third image, and determining a third depth map by the offset of the pixels in the first image and the third image;
combining one or more of the first depth map, the second depth map, and the third depth map with the received image information to generate virtual media information;
combining the virtual media information with one of the first image, the second image, and the third image.
Optionally, combining one or more of the first depth map, the second depth map, and the third depth map with the received image information, and generating the virtual media information includes:
determining a depth map with the smallest error in the first depth map, the second depth map and the third depth map, obtaining depth information of environmental features in the depth map with the smallest error, and combining the depth information with received image information to generate virtual media information; or
Determining a depth map with the minimum error in the first depth map, the second depth map and the third depth map, supplementing a corresponding part with the maximum error in the depth map with the minimum error through a part with the minimum error in the depth maps except the depth map with the minimum error, obtaining depth information of environmental features in the supplemented depth map with the minimum error, and combining the depth information with the received image information to generate virtual media information;
optionally, combining one or more of the first depth map, the second depth map, and the third depth map with the received image information, and generating the virtual media information includes:
determining a depth map with the highest definition in the first depth map, the second depth map and the third depth map, obtaining depth information of environmental features in the depth map with the highest definition, and combining the depth information with received image information to generate virtual media information; or
Determining the depth map with the highest definition in the first depth map, the second depth map and the third depth map, supplementing the corresponding part with the larger error in the depth map with the highest definition through the part with the smaller error in the depth maps except the depth map with the highest definition, obtaining the depth information of the environmental characteristics in the supplemented depth map with the highest definition, and combining the depth information with the received image information to generate virtual media information.
Optionally, combining one or more of the first depth map, the second depth map, and the third depth map with the received image information, and generating the virtual media information includes:
adjusting the size, the rotation direction and the movement direction of the received image information according to one or more depth maps of the first depth map, the second depth map and the third depth map;
creating a three-dimensional scene from one or more of the first, second and third depth maps;
and positioning the adjusted image information in the three-dimensional scene to generate the virtual media information.
Optionally, after the first image, the second image and the third image are obtained by shooting the scene through the first lens group, the second lens group and the third lens group respectively, the method further comprises:
adjusting brightness and contrast of the first image, the second image, and the third image.
Optionally, the offset is a difference between coordinates of the same pixel in the two images; or
The offset amount is a difference between coordinates of the same pixel in projected images of the two images, the projected images being obtained by converting the first image, the second image, and the third image respectively according to a correction matrix stored in advance.
Optionally, the first lens group, the second lens group and the third lens group are positioned on the same line, and the second lens group is positioned between the first lens group and the third lens group.
Optionally, a distance between the first lens group and the second lens group is smaller than a distance between the second lens group and the third lens group.
Optionally, the first lens group, the second lens group and the third lens group have the same angle of view; and/or
The first lens group, the second lens group and the third lens group image in an infrared band.
According to another embodiment of the present invention, there is also provided an image processing apparatus including:
the shooting module is used for shooting a scene through the first lens group, the second lens group and the third lens group respectively to obtain a first image, a second image and a third image;
the determining module is used for determining a first depth map according to the offset of pixels in the first image and the second image, determining a second depth map according to the offset of pixels in the second image and the third image, and determining a third depth map according to the offset of pixels in the first image and the third image;
a generating module, configured to combine one or more of the first depth map, the second depth map, and the third depth map with received image information to generate virtual media information;
a combining module to combine the virtual media information with one of the first image, the second image, and the third image.
Optionally, the generating module includes:
the first generating unit is configured to determine a depth map with the smallest error in the first depth map, the second depth map, and the third depth map, acquire depth information of an environmental feature in the depth map with the smallest error, and combine the depth information with the received image information to generate virtual media information;
a second generating unit, configured to determine a depth map with a minimum error in the first depth map, the second depth map, and the third depth map, supplement a corresponding portion with a larger error in the depth map with the minimum error by using a portion with a smaller error in the depth maps except the depth map with the minimum error, obtain depth information of an environmental characteristic in the supplemented depth map with the minimum error, and combine the depth information with received image information to generate virtual media information;
optionally, the generating module includes:
a third generating unit, configured to determine a depth map with the highest definition in the first depth map, the second depth map, and the third depth map, obtain depth information of an environmental feature in the depth map with the highest definition, and combine the depth information with received image information to generate virtual media information;
a fourth generating unit, configured to determine a depth map with the highest definition in the first depth map, the second depth map, and the third depth map, supplement a corresponding portion with a larger error in the depth map with the highest definition by using a portion with a smaller error in the depth maps except for the depth map with the highest definition, obtain depth information of an environmental characteristic in the supplemented depth map with the highest definition, and combine the depth information with received image information to generate virtual media information.
Optionally, the generating module includes:
an adjusting unit, configured to adjust a size, a rotation direction, and a movement direction of the received image information according to one or more depth maps of the first depth map, the second depth map, and the third depth map;
a building unit for building a three-dimensional scene from one or more of the first, second and third depth maps;
and the positioning generation unit is used for positioning the adjusted image information in the three-dimensional scene to generate the virtual media information.
Optionally, the apparatus further comprises:
and the adjusting module is used for adjusting the brightness and the contrast of the first image, the second image and the third image.
Optionally, the offset is a difference between coordinates of the same pixel in the two images; or
The offset amount is a difference between coordinates of the same pixel in projected images of the two images, the projected images being obtained by converting the first image, the second image, and the third image respectively according to a correction matrix stored in advance.
Optionally, the first lens group, the second lens group and the third lens group are positioned on the same line, and the second lens group is positioned between the first lens group and the third lens group.
Optionally, a distance between the first lens group and the second lens group is smaller than a distance between the second lens group and the third lens group.
Optionally, the first lens group, the second lens group and the third lens group have the same angle of view; and/or
The first lens group, the second lens group and the third lens group image in an infrared band.
According to a further embodiment of the present invention, there is also provided a storage medium having a computer program stored therein, wherein the computer program is arranged to perform the steps of any of the above method embodiments when executed.
According to yet another embodiment of the present invention, there is also provided an electronic device, including a memory in which a computer program is stored and a processor configured to execute the computer program to perform the steps in any of the above method embodiments.
According to the invention, a scene is shot through the first lens group, the second lens group and the third lens group respectively to obtain a first image, a second image and a third image; a first depth map is determined from the offset of pixels between the first image and the second image, a second depth map from the offset of pixels between the second image and the third image, and a third depth map from the offset of pixels between the first image and the third image; one or more of the first depth map, the second depth map and the third depth map are combined with received image information to generate virtual media information; and the virtual media information is combined with one of the first image, the second image and the third image. This solves the problem in the related art of how to complete depth information processing using only the mobile terminal carried by the user, without adding extra equipment, so that the combination of virtual and real content is completed by the mobile terminal alone and the user experience is improved.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
fig. 1 is a block diagram of a hardware configuration of a mobile terminal of an image processing method according to an embodiment of the present invention;
FIG. 2 is a flow chart of a method of image processing according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of measuring a positional difference between images according to an embodiment of the present invention;
fig. 4 is a schematic view of a lens group according to an embodiment of the present invention;
fig. 5 is a block diagram of an image processing apparatus according to an embodiment of the present invention;
fig. 6 is a block diagram of an image processing apparatus according to a preferred embodiment of the present invention.
Detailed Description
The invention will be described in detail hereinafter with reference to the accompanying drawings in conjunction with embodiments. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.
Example 1
The method provided by the first embodiment of the present application may be executed in a mobile terminal, a computer terminal, or a similar computing device. Taking a mobile terminal as an example, fig. 1 is a hardware structure block diagram of a mobile terminal of an image processing method according to an embodiment of the present invention, as shown in fig. 1, a mobile terminal 10 may include one or more processors 102 (only one is shown in fig. 1) (the processor 102 may include, but is not limited to, a processing device such as a microprocessor MCU or a programmable logic device FPGA), and a memory 104 for storing data, and optionally, the mobile terminal may further include a transmission device 106 for communication function and an input/output device 108. It will be understood by those skilled in the art that the structure shown in fig. 1 is only an illustration, and does not limit the structure of the mobile terminal. For example, the mobile terminal 10 may also include more or fewer components than shown in FIG. 1, or have a different configuration than shown in FIG. 1.
The memory 104 may be used to store a computer program, for example, a software program of application software and a module, such as a computer program corresponding to the image processing method in the embodiment of the present invention; the processor 102 executes various functional applications and data processing by running the computer program stored in the memory 104, so as to implement the method described above. The memory 104 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some instances, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the mobile terminal 10 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used for receiving or transmitting data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the mobile terminal 10. In one example, the transmission device 106 includes a Network adapter (NIC), which can be connected to other Network devices through a base station so as to communicate with the internet. In one example, the transmission device 106 may be a Radio Frequency (RF) module, which is used for communicating with the internet in a wireless manner.
In this embodiment, an image processing method operating in the mobile terminal or the network architecture is provided, and fig. 2 is a flowchart of an image processing method according to an embodiment of the present invention, as shown in fig. 2, the flowchart includes the following steps:
step S202, shooting a scene through a first lens group, a second lens group and a third lens group respectively to obtain a first image, a second image and a third image;
step S204, determining a first depth map according to the offset of the pixels in the first image and the second image, determining a second depth map according to the offset of the pixels in the second image and the third image, and determining a third depth map according to the offset of the pixels in the first image and the third image;
step S206, combining one or more of the first depth map, the second depth map and the third depth map with the received image information to generate virtual media information;
step S208, combining the virtual media information with one of the first image, the second image and the third image.
Through the steps S202 to S208, a first image, a second image and a third image are obtained by shooting a scene through the first lens group, the second lens group and the third lens group respectively; a first depth map is determined from the offset of pixels between the first image and the second image, a second depth map from the offset of pixels between the second image and the third image, and a third depth map from the offset of pixels between the first image and the third image; one or more of the first depth map, the second depth map and the third depth map are combined with received image information to generate virtual media information; and the virtual media information is combined with one of the first image, the second image and the third image. This solves the problem in the related art of how to complete depth information processing using only the mobile terminal carried by the user, without adding extra equipment, so that the combination of virtual and real content is completed by the mobile terminal alone and the user experience is improved.
The embodiment of the invention acquires three depth maps according to the combination of any two of the three lens groups, judges the depth information (such as scene distance, position, edge, angle and the like) of the environmental characteristics according to all three depth maps, receives image information (such as an object and a person) from another user terminal in communication, and combines the image information and the depth information to generate a virtual object which can be closely combined with a real scene.
In an embodiment, the step S206 may specifically include:
determining a depth map with the smallest error in the first depth map, the second depth map and the third depth map, obtaining depth information of environmental features in the depth map with the smallest error, and combining the depth information with received image information to generate virtual media information;
determining a depth map with the minimum error in the first depth map, the second depth map and the third depth map, supplementing a corresponding part with the maximum error in the depth map with the minimum error through a part with the minimum error in the depth maps except the depth map with the minimum error, obtaining depth information of environmental features in the supplemented depth map with the minimum error, and combining the depth information with the received image information to generate virtual media information;
in another embodiment, the step S206 may further include:
determining a depth map with the highest definition in the first depth map, the second depth map and the third depth map, obtaining depth information of environmental features in the depth map with the highest definition, and combining the depth information with received image information to generate virtual media information;
determining the depth map with the highest definition in the first depth map, the second depth map and the third depth map, supplementing the corresponding part with the larger error in the depth map with the highest definition through the part with the smaller error in the depth maps except the depth map with the highest definition, obtaining the depth information of the environmental characteristics in the supplemented depth map with the highest definition, and combining the depth information with the received image information to generate virtual media information.
In the embodiment of the invention, the depth maps may be combined by weighted combination or averaging, or by selecting the depth map with the smallest error or the clearest one.
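By way of illustration only, the following Python sketch (using NumPy; the function name and arguments are examples invented for this description, not part of the claimed method) shows how such a combination could be carried out: a weighted combination or plain average of the depth maps, or selection of the map with the smallest error score.

```python
import numpy as np

def fuse_depth_maps(depth_maps, weights=None, errors=None):
    """Combine several depth maps of the same scene (illustrative sketch).

    If per-map error scores are supplied, the map with the smallest error
    is selected; otherwise a plain or weighted average is returned."""
    stack = np.stack([d.astype(np.float32) for d in depth_maps])
    if errors is not None:
        return stack[int(np.argmin(errors))]          # smallest-error map
    if weights is None:
        return stack.mean(axis=0)                     # plain averaging
    w = np.asarray(weights, dtype=np.float32)
    return np.tensordot(w / w.sum(), stack, axes=1)   # weighted combination
```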
In another embodiment, the step S206 may further include: adjusting the size, the rotation direction and the movement direction of the received image information according to one or more depth maps of the first depth map, the second depth map and the third depth map; creating a three-dimensional scene from one or more of the first, second and third depth maps; and positioning the adjusted image information in the three-dimensional scene to generate the virtual media information.
The three-dimensional map is constructed from the depth map, and a three-dimensional model of the received image or video is displayed in the three-dimensional map, for example in an open portion where occlusion is small; the model can then be moved according to the coordinates in the three-dimensional map.
In the embodiment of the present invention, for the case that the ambient light is too strong or too weak, the brightness and the contrast of the first image, the second image, and the third image may be adjusted.
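A minimal sketch of such an adjustment, assuming the OpenCV library is available on the terminal (the gain and offset values below are arbitrary example values):

```python
import cv2

def adjust_brightness_contrast(image, alpha=1.2, beta=10):
    """Linear brightness/contrast adjustment: output = alpha * image + beta,
    clipped to the valid pixel range (alpha and beta are example values)."""
    return cv2.convertScaleAbs(image, alpha=alpha, beta=beta)
```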
In the embodiment of the invention, when no error occurs in the coaxiality and the coplanarity of the lens groups, the offset is the difference between the coordinates of the same pixel in the two images; or
When errors occur in the coaxiality and coplanarity of the lens groups, a correction matrix is obtained from pre-stored lens errors; specifically, the offset is the difference between the coordinates of the same pixel in the projected images of the two images, the projected images being obtained by transforming the first image, the second image and the third image respectively according to the pre-stored correction matrix.
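For illustration, the following sketch applies pre-stored 3x3 correction matrices to the three images using OpenCV; it assumes the correction matrices were obtained from a prior calibration of the lens errors, and the names H1, H2 and H3 are placeholders introduced here.

```python
import cv2

def rectify_with_stored_matrices(img1, img2, img3, H1, H2, H3):
    """Warp the three images with pre-stored 3x3 correction matrices so that
    they become equivalent to images from coaxial, coplanar lens groups."""
    size = (img1.shape[1], img1.shape[0])   # (width, height)
    proj1 = cv2.warpPerspective(img1, H1, size)
    proj2 = cv2.warpPerspective(img2, H2, size)
    proj3 = cv2.warpPerspective(img3, H3, size)
    return proj1, proj2, proj3

# The offset for a matched pixel is then simply the difference between its
# coordinates in two projected images, e.g. offset = x_in_proj1 - x_in_proj2.
```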
In an embodiment of the present invention, the first lens group, the second lens group and the third lens group are located on the same line, and the second lens group is located between the first lens group and the third lens group. The calculation range of the depth of field can be changed by changing the baseline length, so the pair of lens groups can be selected among the three according to the relative distance of the scene. Further, the distance between the first lens group and the second lens group is smaller than the distance between the second lens group and the third lens group. This provides as many different baseline lengths as possible and optimizes the range over which the depth of field can be adjusted.
In an embodiment of the present invention, the first lens group, the second lens group and the third lens group have the same field angle, which may be 60 degrees, 80 degrees, 100 degrees or the like, to better reduce errors; and/or the first lens group, the second lens group and the third lens group image in an infrared band, which may specifically be 850-1050 nm or the like, in order to reduce errors.
In the embodiment of the invention, two lens groups of the multi-lens camera module are used to shoot the scene, and the depth of field of an object is judged from the position difference of the same object in the different pictures. In the ideal case, the same object in the pictures shot by the two lens groups lies on the same shooting horizontal line; in the non-ideal case, image correction is performed with pre-stored data to convert the images into the ideal case, equivalent to the lens groups being coaxial and coplanar. Matching objects are then searched for along the shooting horizontal line; the search can use various known image matching methods and various features such as color, brightness and energy in pixels, pixel matrices or windows. After a matching object is found, the position difference Li − Lj of the object on the two images of the i-th lens group and the j-th lens group is calculated, and the depth of field can be determined as
z = EFL[1 + Bi/(Li − Lj)],
where i is 1, 2, 3, …, j is 1, 2, 3, … and is not equal to i, and EFL is the focal length. By analogy, a depth profile of the entire image is obtained, which can be used to insert the 3D object of the AR and to better match it to the environment.
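A minimal Python sketch of this step, assuming rectified grayscale images and using a plain sum-of-absolute-differences window match along the shooting horizontal line (the window size, search range and matching cost are example choices; any known matching method may be used instead):

```python
import numpy as np

def depth_from_pair(img_i, img_j, baseline, efl, max_disp=64, win=5):
    """Estimate depth z = EFL * (1 + B / (Li - Lj)) for each pixel of two
    rectified images from the i-th and j-th lens groups (illustrative only).
    A value of 0 is left where no match is found (a 'hole')."""
    h, w = img_i.shape
    half = win // 2
    depth = np.zeros((h, w), dtype=np.float32)
    for y in range(half, h - half):
        for x in range(half + max_disp, w - half):
            ref = img_i[y - half:y + half + 1, x - half:x + half + 1].astype(np.float32)
            best_d, best_cost = 0, np.inf
            for d in range(1, max_disp):            # search along the same line
                cand = img_j[y - half:y + half + 1,
                             x - d - half:x - d + half + 1].astype(np.float32)
                cost = np.abs(ref - cand).sum()     # sum of absolute differences
                if cost < best_cost:
                    best_cost, best_d = cost, d
            if best_d > 0:
                depth[y, x] = efl * (1.0 + baseline / best_d)
    return depth
```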
Since the position difference has its limits (for example, a too-small difference is prone to errors, and the difference cannot in any case be resolved below the scale of one pixel), the detected depth of field has an upper limit and is prone to error when approaching that limit. Conversely, a too-large difference indicates that an improper baseline length is used or that the object is too close; in that case the large difference between the images may cause matching errors and increase the time needed to find matching features. Therefore, in a mobile phone with a camera module of three or more lenses, the distance between any two of the lens groups can be suitably optimized, so that the pair of lens groups used in depth detection can be changed and ranging is performed with the most suitable combination. Furthermore, the accuracy, resolution and edge regularity of the several (for example three) depth maps may have different errors owing to different internal camera parameters and external factors such as mutual occlusion and illumination; in particular, holes may appear where a depth was not correctly computed and a blank value is left. The depth map with the best quality and the smallest error can be selected from the multiple depth maps for use, the parts with smaller error in the other depth maps can be used to make up the parts with larger error in a given depth map, in particular to fill the holes, and the depth maps can also be combined in other ways.
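As an illustration of selecting the most suitable lens-group combination, the following sketch picks, among candidate pairs, the one whose measured position differences stay farthest from both the too-small and the too-large limits; the thresholds and the selection heuristic are examples invented for this description, not part of the claimed method.

```python
import numpy as np

def choose_lens_pair(disparity_maps, min_disp=2.0, max_disp=48.0):
    """Return the index of the lens-group pair whose median position
    difference lies most comfortably inside the usable range."""
    best_idx, best_margin = None, -np.inf
    for idx, disp in enumerate(disparity_maps):
        valid = disp[disp > 0]
        if valid.size == 0:
            continue
        med = float(np.median(valid))
        margin = min(med - min_disp, max_disp - med)   # distance to both limits
        if margin > best_margin:
            best_idx, best_margin = idx, margin
    return best_idx
```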
The following examples illustrate the present invention in detail.
Fig. 3 is a schematic diagram of measuring a position difference between images according to an embodiment of the present invention. As shown in fig. 3, when i is 1 and j is 2, the position difference L1 − L2 of the object is calculated on the two images formed by the first lens group and the second lens group, and the depth of field can then be determined as
z = EFL[1 + B1/(L1 − L2)],
where EFL is the focal length. By analogy, a depth profile of the entire image is obtained, which can be used to insert the 3D object of the AR and to better match it to the environment.
When i is 5 and j is 2, the position difference L5 − L2 of the object on the two images of the corresponding lens groups is calculated, and the depth of field can then be determined as
z = EFL[1 + B5/(L5 − L2)],
where EFL is the focal length. By analogy, a depth profile of the entire image is obtained, which can be used to insert the 3D object of the AR and to better match it to the environment. The foregoing is merely an illustration of embodiments of the present invention and is not intended to limit the embodiments of the present invention.
Since L1 − L2 has its limits (for example, a too-small difference is prone to errors, and the difference cannot in any case be resolved below the scale of one pixel), the detected depth of field has an upper limit and is prone to error when approaching that limit. If the difference is too large, for example when an improper B1 or B2 is used or the object is too close, it may cause a determination error and increase the time needed to find matching features. Therefore, in a three-camera mobile phone (fig. 4 is a schematic view of lens groups according to an embodiment of the present invention), as shown in fig. 4, the distance between any two of the three lens groups can be suitably optimized, so that the two lens groups used in depth detection can be changed and ranging is performed with the most suitable combination. For example, the present scheme uses the three lens groups for imaging simultaneously, whereby three depth maps are obtained, including, in addition to z1:
z2 = EFL[1 + B2/(L2 − L3)] and
z3 = EFL[1 + (B1 + B2)/(L1 − L3)].
The accuracy, resolution and edge regularity of the three depth maps have different errors owing to different internal camera parameters and external factors such as mutual occlusion and illumination; in particular, holes may be generated where the depth is not correctly calculated and a blank value is left. The depth map with the best quality and the smallest error can be selected from the three depth maps for use, the parts with smaller error in the other depth maps can be used to make up the parts with larger error in a given depth map, in particular to fill the holes, and the depth maps can also be combined in other ways.
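A minimal sketch of filling the holes of the best depth map with values taken from the other two maps, assuming (as in the sketch above) that a hole is marked by the value 0:

```python
import numpy as np

def fill_holes(best_depth, other_depths):
    """Fill zero-valued holes of the least-error depth map using the
    corresponding pixels of the remaining depth maps (illustrative only)."""
    filled = best_depth.copy()
    for other in other_depths:
        holes = (filled == 0) & (other > 0)
        filled[holes] = other[holes]
    return filled
```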
After the depth information of the matched features is obtained, the image information received from another mobile terminal is combined with the depth information, and the generated virtual media information is combined onto the real image. Preferably, the scenes used in the interaction can be defined by the user, or several types of specific scenes can be searched according to pre-stored templates. The image information can be either pictures or videos; the received image information can be made into a virtual 3D image, and the depth information of the actual scene can be applied to the 3D image. For example, it is also possible to convert the depth map into a 3D map and arrange a virtual 3D character in the 3D map, whereby the size, position, direction, etc. of the 3D character are determined. The 3D map can be converted by the following formula:
X = (x − Cx) · Z / fx,    Y = (y − Cy) · Z / fy,
where x and y are the image coordinates, X and Y are the 3D map coordinates, Z is the depth, fx and fy are the focal lengths, and Cx and Cy are the offset parameters; these parameters can be calibrated in advance. The 3D map may be further rotated, scaled and displaced to provide the desired AR playback environment.
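For illustration, the conversion of a depth map into 3D map coordinates with the calibrated parameters fx, fy, Cx and Cy can be sketched as follows (a minimal NumPy example under the formula above, not the claimed implementation):

```python
import numpy as np

def depth_to_3d_map(depth, fx, fy, cx, cy):
    """Back-project a depth map into 3D map coordinates:
    X = (x - Cx) * Z / fx, Y = (y - Cy) * Z / fy, Z = depth."""
    h, w = depth.shape
    x, y = np.meshgrid(np.arange(w), np.arange(h))
    X = (x - cx) * depth / fx
    Y = (y - cy) * depth / fy
    return np.stack([X, Y, depth], axis=-1)   # shape (h, w, 3)
```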
The image information received from another terminal comprises either a 3D model or a 2D image; in the latter case the terminal generates a 3D model from the 2D image. The 3D model is then positioned in the 3D map generated in the above step, so that the virtual character blends better with the environment of the application scene, e.g. remote video.
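A minimal sketch of positioning a received 3D model in the 3D map, assuming the model is represented as an N x 3 array of points and that a scale, rotation and anchor point have been chosen from the local depth information (all names here are placeholders introduced for this description):

```python
import numpy as np

def place_virtual_model(model_points, anchor_xyz, scale=1.0, yaw_deg=0.0):
    """Scale, rotate (about the vertical axis) and translate a virtual 3D
    model so that it sits at an anchor point taken from the 3D map."""
    theta = np.deg2rad(yaw_deg)
    rot_y = np.array([[np.cos(theta), 0.0, np.sin(theta)],
                      [0.0,           1.0, 0.0],
                      [-np.sin(theta), 0.0, np.cos(theta)]], dtype=np.float32)
    pts = np.asarray(model_points, dtype=np.float32)
    return scale * pts @ rot_y.T + np.asarray(anchor_xyz, dtype=np.float32)
```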
Example 2
In this embodiment, an image processing apparatus is further provided. The apparatus is used to implement the foregoing embodiments and preferred implementations, and details that have already been described are not repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the apparatus described in the embodiments below is preferably implemented in software, an implementation in hardware, or a combination of software and hardware, is also possible and contemplated.
Fig. 5 is a block diagram of an image processing apparatus according to an embodiment of the present invention, as shown in fig. 5, including:
a shooting module 52, configured to respectively shoot a scene through the first lens group, the second lens group, and the third lens group to obtain a first image, a second image, and a third image;
a determining module 54, configured to determine a first depth map according to an offset between pixels in the first image and the second image, determine a second depth map according to an offset between pixels in the second image and the third image, and determine a third depth map according to an offset between pixels in the first image and the third image;
a generating module 56, configured to combine one or more of the first depth map, the second depth map, and the third depth map with the received image information to generate virtual media information;
a combining module 58 for combining the virtual media information with one of the first image, the second image and the third image.
Optionally, the generating module 56 includes:
the first generating unit is configured to determine a depth map with the smallest error in the first depth map, the second depth map, and the third depth map, acquire depth information of an environmental feature in the depth map with the smallest error, and combine the depth information with the received image information to generate virtual media information;
a second generating unit, configured to determine a depth map with a minimum error in the first depth map, the second depth map, and the third depth map, supplement a corresponding portion with a larger error in the depth map with the minimum error by using a portion with a smaller error in the depth maps except the depth map with the minimum error, obtain depth information of an environmental characteristic in the supplemented depth map with the minimum error, and combine the depth information with received image information to generate virtual media information;
optionally, the generating module 56 includes:
a third generating unit, configured to determine a depth map with the highest definition in the first depth map, the second depth map, and the third depth map, obtain depth information of an environmental feature in the depth map with the highest definition, and combine the depth information with received image information to generate virtual media information;
a fourth generating unit, configured to determine a depth map with the highest definition in the first depth map, the second depth map, and the third depth map, supplement a corresponding portion with a larger error in the depth map with the highest definition by using a portion with a smaller error in the depth maps except the depth map with the highest definition, obtain depth information of an environmental characteristic in the supplemented depth map with the highest definition, and combine the depth information with received image information to generate virtual media information;
fig. 6 is a block diagram of an image processing apparatus according to a preferred embodiment of the present invention, and as shown in fig. 6, the generating module 56 includes:
an adjusting unit 62, configured to adjust a size, a rotation direction, and a moving direction of the received image information according to one or more depth maps of the first depth map, the second depth map, and the third depth map;
a building unit 64 for building a three-dimensional scene from one or more of the first, second and third depth maps;
and a positioning generating unit 66, configured to position the adjusted image information in the three-dimensional scene, and generate the virtual media information.
Optionally, the apparatus further comprises:
and the adjusting module is used for adjusting the brightness and the contrast of the first image, the second image and the third image.
Optionally, the offset is a difference between coordinates of the same pixel in the two images; or
The offset amount is a difference between coordinates of the same pixel in projected images of the two images, the projected images being obtained by converting the first image, the second image, and the third image respectively according to a correction matrix stored in advance.
Optionally, the first lens group, the second lens group and the third lens group are positioned on the same line, and the second lens group is positioned between the first lens group and the third lens group.
Optionally, a distance between the first lens group and the second lens group is smaller than a distance between the second lens group and the third lens group.
Optionally, the first lens group, the second lens group and the third lens group have the same angle of view; and/or
The first lens group, the second lens group and the third lens group image in an infrared band.
It should be noted that, the above modules may be implemented by software or hardware, and for the latter, the following may be implemented, but not limited to: the modules are all positioned in the same processor; alternatively, the modules are respectively located in different processors in any combination.
Example 3
Embodiments of the present invention also provide a storage medium having a computer program stored therein, wherein the computer program is arranged to perform the steps of any of the above method embodiments when executed.
Alternatively, in the present embodiment, the storage medium may be configured to store a computer program for executing the steps of:
s11, shooting a scene through the first lens group, the second lens group and the third lens group respectively to obtain a first image, a second image and a third image;
s12, determining a first depth map according to the offset of the pixels in the first image and the second image, determining a second depth map according to the offset of the pixels in the second image and the third image, and determining a third depth map according to the offset of the pixels in the first image and the third image;
s13, combining one or more of the first depth map, the second depth map, and the third depth map with the received image information to generate virtual media information;
s14, combining the virtual media information with one of the first image, the second image, and the third image.
Optionally, in this embodiment, the storage medium may include, but is not limited to: various media capable of storing computer programs, such as a usb disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
Example 4
Embodiments of the present invention also provide an electronic device comprising a memory having a computer program stored therein and a processor arranged to run the computer program to perform the steps of any of the above method embodiments.
Optionally, the electronic apparatus may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.
Optionally, in this embodiment, the processor may be configured to execute the following steps by a computer program:
s11, shooting a scene through the first lens group, the second lens group and the third lens group respectively to obtain a first image, a second image and a third image;
s12, determining a first depth map according to the offset of the pixels in the first image and the second image, determining a second depth map according to the offset of the pixels in the second image and the third image, and determining a third depth map according to the offset of the pixels in the first image and the third image;
s13, combining one or more of the first depth map, the second depth map, and the third depth map with the received image information to generate virtual media information;
s14, combining the virtual media information with one of the first image, the second image, and the third image.
Optionally, the specific examples in this embodiment may refer to the examples described in the above embodiments and optional implementation manners, and this embodiment is not described herein again.
It will be apparent to those skilled in the art that the modules or steps of the present invention described above may be implemented by a general purpose computing device, they may be centralized on a single computing device or distributed across a network of multiple computing devices, and alternatively, they may be implemented by program code executable by a computing device, such that they may be stored in a storage device and executed by a computing device, and in some cases, the steps shown or described may be performed in an order different than that described herein, or they may be separately fabricated into individual integrated circuit modules, or multiple ones of them may be fabricated into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the principle of the present invention should be included in the protection scope of the present invention.

Claims (14)

1. An image processing method, comprising:
shooting a scene through a first lens group, a second lens group and a third lens group respectively to obtain a first image, a second image and a third image;
determining a first depth map by the offset of the pixels in the first image and the second image, determining a second depth map by the offset of the pixels in the second image and the third image, and determining a third depth map by the offset of the pixels in the first image and the third image, wherein the offset is the difference between the coordinates of the same pixels in the two images; or the offset is a difference between coordinates of the same pixel in the projected images of the two images, the projected images are obtained by transforming the first image, the second image and the third image according to a pre-stored correction matrix, respectively, wherein the first lens group, the second lens group and the third lens group are located on the same straight line, the second lens group is located between the first lens group and the third lens group, and the distance between the first lens group and the second lens group is smaller than the distance between the second lens group and the third lens group;
determining depth information of a characteristic environment according to one or more of the first depth map, the second depth map and the third depth map, and combining the depth information with the received image information to generate virtual media information;
combining the virtual media information with one of the first image, the second image, and the third image.
2. The method of claim 1, wherein combining one or more of the first depth map, the second depth map, and the third depth map with the received image information to generate virtual media information comprises:
determining a depth map with the smallest error in the first depth map, the second depth map and the third depth map, obtaining depth information of environmental features in the depth map with the smallest error, and combining the depth information with received image information to generate virtual media information; or
Determining a depth map with the minimum error in the first depth map, the second depth map and the third depth map, supplementing a corresponding part with the maximum error in the depth map with the minimum error through a part with the minimum error in the depth maps except the depth map with the minimum error, obtaining depth information of environmental features in the supplemented depth map with the minimum error, and combining the depth information with the received image information to generate virtual media information.
3. The method of claim 1, wherein combining one or more of the first depth map, the second depth map, and the third depth map with the received image information to generate virtual media information comprises:
adjusting the size, the rotation direction and the movement direction of the received image information according to one or more depth maps of the first depth map, the second depth map and the third depth map;
creating a three-dimensional scene from one or more of the first, second and third depth maps;
and positioning the adjusted image information in the three-dimensional scene to generate the virtual media information.
4. The method of claim 1, wherein combining one or more of the first depth map, the second depth map, and the third depth map with the received image information to generate virtual media information comprises:
determining a depth map with the highest definition in the first depth map, the second depth map and the third depth map, obtaining depth information of environmental features in the depth map with the highest definition, and combining the depth information with received image information to generate virtual media information; or
Determining the depth map with the highest definition in the first depth map, the second depth map and the third depth map, supplementing the corresponding part with the larger error in the depth map with the highest definition through the part with the smaller error in the depth maps except the depth map with the highest definition, obtaining the depth information of the environmental characteristics in the supplemented depth map with the highest definition, and combining the depth information with the received image information to generate virtual media information.
5. The method of any one of claims 1 to 4, wherein after capturing the first image, the second image, and the third image of the scene with the first lens group, the second lens group, and the third lens group, respectively, the method further comprises:
adjusting brightness and contrast of the first image, the second image, and the third image.
6. The method according to any one of claims 1 to 4,
the first lens group, the second lens group and the third lens group have the same angle of view; and/or
The first lens group, the second lens group and the third lens group image in an infrared band.
7. An image processing apparatus characterized by comprising:
the shooting module is used for shooting a scene through the first lens group, the second lens group and the third lens group respectively to obtain a first image, a second image and a third image;
the determining module is used for determining a first depth map according to the offset of pixels in the first image and the second image, determining a second depth map according to the offset of pixels in the second image and the third image, and determining a third depth map according to the offset of pixels in the first image and the third image, wherein the offset is the difference between the coordinates of the same pixels in the two images; or the offset is a difference between coordinates of the same pixel in the projected images of the two images, the projected images are obtained by transforming the first image, the second image and the third image according to a pre-stored correction matrix, respectively, wherein the first lens group, the second lens group and the third lens group are located on the same straight line, the second lens group is located between the first lens group and the third lens group, and the distance between the first lens group and the second lens group is smaller than the distance between the second lens group and the third lens group;
a generating module, configured to determine depth information of a characteristic environment according to one or more of the first depth map, the second depth map, and the third depth map, and combine the depth information with received image information to generate virtual media information;
a combining module to combine the virtual media information with one of the first image, the second image, and the third image.
8. The apparatus of claim 7, wherein the generating module comprises:
the first generating unit is configured to determine a depth map with the smallest error in the first depth map, the second depth map, and the third depth map, acquire depth information of an environmental feature in the depth map with the smallest error, and combine the depth information with the received image information to generate virtual media information;
a second generating unit, configured to determine a depth map with a minimum error in the first depth map, the second depth map, and the third depth map, supplement a corresponding portion with a larger error in the depth map with the minimum error by using a portion with a smaller error in the depth maps except for the depth map with the minimum error, obtain depth information of an environmental characteristic in the supplemented depth map with the minimum error, and combine the depth information with received image information to generate virtual media information.
9. The apparatus of claim 7, wherein the generating module comprises:
an adjusting unit, configured to adjust a size, a rotation direction, and a movement direction of the received image information according to one or more depth maps of the first depth map, the second depth map, and the third depth map;
a building unit for building a three-dimensional scene from one or more of the first, second and third depth maps;
and the positioning generation unit is used for positioning the adjusted image information in the three-dimensional scene to generate the virtual media information.
10. The apparatus of claim 7, wherein the generating module comprises:
a third generating unit, configured to determine a depth map with the highest definition in the first depth map, the second depth map, and the third depth map, obtain depth information of an environmental feature in the depth map with the highest definition, and combine the depth information with received image information to generate virtual media information;
a fourth generating unit, configured to determine a depth map with the highest definition in the first depth map, the second depth map, and the third depth map, supplement a corresponding portion with a larger error in the depth map with the highest definition by using a portion with a smaller error in the depth maps except for the depth map with the highest definition, obtain depth information of an environmental characteristic in the supplemented depth map with the highest definition, and combine the depth information with received image information to generate virtual media information.
11. The apparatus of any one of claims 7 to 10, further comprising:
and the adjusting module is used for adjusting the brightness and the contrast of the first image, the second image and the third image.
12. The apparatus according to any one of claims 7 to 10,
the first lens group, the second lens group and the third lens group have the same angle of view; and/or
The first lens group, the second lens group and the third lens group image in an infrared band.
13. A storage medium, in which a computer program is stored, wherein the computer program is arranged to perform the method of any of claims 1 to 6 when executed.
14. An electronic device comprising a memory and a processor, wherein the memory has stored therein a computer program, and wherein the processor is arranged to execute the computer program to perform the method of any of claims 1 to 6.
CN201910037310.3A 2019-01-15 2019-01-15 Image processing method and device Active CN109922331B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910037310.3A CN109922331B (en) 2019-01-15 2019-01-15 Image processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910037310.3A CN109922331B (en) 2019-01-15 2019-01-15 Image processing method and device

Publications (2)

Publication Number Publication Date
CN109922331A CN109922331A (en) 2019-06-21
CN109922331B true CN109922331B (en) 2021-12-07

Family

ID=66960429

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910037310.3A Active CN109922331B (en) 2019-01-15 2019-01-15 Image processing method and device

Country Status (1)

Country Link
CN (1) CN109922331B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102204259A (en) * 2007-11-15 2011-09-28 微软国际控股私有有限公司 Dual mode depth imaging
CN103853913A (en) * 2012-12-03 2014-06-11 三星电子株式会社 Method for operating augmented reality contents and device and system for supporting the same
CN104392045A (en) * 2014-11-25 2015-03-04 沈阳建筑大学 Real-time enhanced virtual reality system and method based on intelligent mobile terminal
WO2017172528A1 (en) * 2016-04-01 2017-10-05 Pcms Holdings, Inc. Apparatus and method for supporting interactive augmented reality functionalities
CN108182730A (en) * 2018-01-12 2018-06-19 北京小米移动软件有限公司 Actual situation object synthetic method and device
CN108307675A (en) * 2015-04-19 2018-07-20 快图凯曼有限公司 More baseline camera array system architectures of depth enhancing in being applied for VR/AR

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10848731B2 (en) * 2012-02-24 2020-11-24 Matterport, Inc. Capturing and aligning panoramic image and depth data

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102204259A (en) * 2007-11-15 2011-09-28 微软国际控股私有有限公司 Dual mode depth imaging
CN103853913A (en) * 2012-12-03 2014-06-11 三星电子株式会社 Method for operating augmented reality contents and device and system for supporting the same
CN104392045A (en) * 2014-11-25 2015-03-04 沈阳建筑大学 Real-time enhanced virtual reality system and method based on intelligent mobile terminal
CN108307675A (en) * 2015-04-19 2018-07-20 快图凯曼有限公司 More baseline camera array system architectures of depth enhancing in being applied for VR/AR
WO2017172528A1 (en) * 2016-04-01 2017-10-05 Pcms Holdings, Inc. Apparatus and method for supporting interactive augmented reality functionalities
CN108182730A (en) * 2018-01-12 2018-06-19 北京小米移动软件有限公司 Actual situation object synthetic method and device

Also Published As

Publication number Publication date
CN109922331A (en) 2019-06-21

Similar Documents

Publication Publication Date Title
CN110335211B (en) Method for correcting depth image, terminal device and computer storage medium
US10855909B2 (en) Method and apparatus for obtaining binocular panoramic image, and storage medium
US11625896B2 (en) Face modeling method and apparatus, electronic device and computer-readable medium
US20210042950A1 (en) Depth-Aware Photo Editing
Wu et al. Fusing multiview and photometric stereo for 3d reconstruction under uncalibrated illumination
Matsuyama et al. 3D video and its applications
CN109445103B (en) Display picture updating method and device, storage medium and electronic device
CN101631257A (en) Method and device for realizing three-dimensional playing of two-dimensional video code stream
CN110895822A (en) Method of operating a depth data processing system
CN109997175A (en) Determine the size of virtual objects
CN112207821B (en) Target searching method of visual robot and robot
CN111385461B (en) Panoramic shooting method and device, camera and mobile terminal
TWI820246B (en) Apparatus with disparity estimation, method and computer program product of estimating disparity from a wide angle image
CN116168076A (en) Image processing method, device, equipment and storage medium
KR20220044572A (en) Method, apparatus, processor and electronics for obtaining calibration parameters
US20240037856A1 (en) Walkthrough view generation method, apparatus and device, and storage medium
CN109842791B (en) Image processing method and device
CN109978945B (en) Augmented reality information processing method and device
US20220222842A1 (en) Image reconstruction for virtual 3d
WO2018175217A1 (en) System and method for relighting of real-time 3d captured content
CN109922331B (en) Image processing method and device
CN115834860A (en) Background blurring method, apparatus, device, storage medium, and program product
CN110892706B (en) Method for displaying content derived from light field data on a 2D display device
CN114723923B (en) Transmission solution simulation display system and method
US11758101B2 (en) Restoration of the FOV of images for stereoscopic rendering

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant