CN112927271B - Image processing method, image processing device, storage medium and electronic apparatus

Info

Publication number
CN112927271B
Authority
CN
China
Prior art keywords
depth value
image
target
scene
original
Prior art date
Legal status
Active
Application number
CN202110350503.1A
Other languages
Chinese (zh)
Other versions
CN112927271A (en)
Inventor
宫振飞
Current Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN202110350503.1A
Publication of CN112927271A
Application granted
Publication of CN112927271B

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/248Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving reference images or patches
    • G06T5/70
    • G06T5/73
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/215Motion-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • G06T7/55Depth or shape recovery from multiple images
    • G06T7/579Depth or shape recovery from multiple images from motion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • G06T7/55Depth or shape recovery from multiple images
    • G06T7/593Depth or shape recovery from multiple images from stereo images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds

Abstract

The disclosure provides an image processing method, an image processing device, a computer readable storage medium and electronic equipment, and relates to the technical field of image processing. The image processing method comprises the following steps: acquiring at least two original images, wherein the at least two original images are images acquired by cameras in different poses for a target scene; generating a three-dimensional point cloud of the target scene according to the at least two original images; and rendering the three-dimensional point cloud based on the sampling pose in the preset motion path of the camera to obtain a target image. The method and the device can generate a new target image from at least two original images in a simple and convenient manner.

Description

Image processing method, image processing device, storage medium and electronic apparatus
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to an image processing method, an image processing apparatus, a computer readable storage medium, and an electronic device.
Background
With the development of computer technology and multimedia technology, many image capturing devices and applications have emerged, generating a large amount of image data. In order to meet people's personalized demands in various application scenarios, such as making emoticon packages or special-effect short videos, images need to be processed. In such scenarios, a user is often required to capture or collect, through a camera, a large number of continuous frame images of a target scene or photographed object, and corresponding video data is then generated by synthesizing these continuous frames. However, this manner places a high requirement on the number of acquired images, and processing based on a large number of continuous frame images also increases the processing load of the terminal device, so the processing procedure is complex and not simple enough.
Disclosure of Invention
The present disclosure provides an image processing method, an image processing apparatus, a computer-readable storage medium, and an electronic device, thereby alleviating, at least to some extent, the problem that image processing methods in the prior art are complex.
Other features and advantages of the present disclosure will be apparent from the following detailed description, or may be learned in part by the practice of the disclosure.
According to a first aspect of the present disclosure, there is provided an image processing method including: acquiring at least two original images, wherein the at least two original images are images acquired by cameras in different poses for a target scene; generating a three-dimensional point cloud of the target scene according to the at least two original images; and rendering the three-dimensional point cloud based on the sampling pose in the preset motion path of the camera to obtain a target image.
According to a second aspect of the present disclosure, there is provided an image processing apparatus including: the original image acquisition module is used for acquiring at least two original images, wherein the at least two original images are images acquired by cameras in different poses for a target scene; the three-dimensional point cloud establishing module is used for generating a three-dimensional point cloud of the target scene according to the at least two original images; and the target image acquisition module is used for rendering the three-dimensional point cloud based on the sampling pose in the preset motion path of the camera to obtain a target image.
According to a third aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the image processing method of the first aspect described above and possible implementations thereof.
According to a fourth aspect of the present disclosure, there is provided an electronic device comprising: a processor; and the memory is used for storing executable instructions of the processor. Wherein the processor is configured to perform the image processing method of the first aspect and possible implementations thereof via execution of the executable instructions.
The technical scheme of the present disclosure has the following beneficial effects:
acquiring at least two original images, wherein the at least two original images are images acquired by cameras in different poses for a target scene; generating a three-dimensional point cloud of the target scene according to the at least two original images; and rendering the three-dimensional point cloud based on the sampling pose in the preset motion path of the camera to obtain a target image. On the one hand, the present exemplary embodiment proposes a new image processing method, which can construct a three-dimensional point cloud of a target scene from a small number of collected original images and generate a new target image based on the three-dimensional point cloud; the original image data are easy to collect and few in number, and a new image can be generated from a small number of existing images in a simple and convenient manner, so that it can be applied to other application scenarios, increasing the usability and interest of the original images. On the other hand, the flow of this embodiment is simple and its computational complexity is low, so it can be applied to various types of terminal devices. On yet another hand, the target image and the original images lie on the preset motion path of the same camera, so a video in which the camera moves its lens (a virtual camera movement) through the target scene can be generated simply and quickly based on the target image, which is well suited to generating emoticon packages or special-effect videos.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure. It will be apparent to those of ordinary skill in the art that the drawings in the following description are merely examples of the disclosure and that other drawings may be derived from them without undue effort.
Fig. 1 shows a schematic diagram of a system architecture in the present exemplary embodiment;
fig. 2 shows a structural diagram of an electronic device in the present exemplary embodiment;
fig. 3 shows a flowchart of an image processing method in the present exemplary embodiment;
fig. 4 shows a sub-flowchart of an image processing method in the present exemplary embodiment;
fig. 5 shows a flowchart of generating a three-dimensional point cloud of a target scene in the present exemplary embodiment;
fig. 6 shows a sub-flowchart of another image processing method in the present exemplary embodiment;
fig. 7 shows a schematic diagram of a virtual camera movement (virtual lens movement) in the present exemplary embodiment;
FIG. 8 is a flowchart showing a hole area repair in the present exemplary embodiment;
fig. 9 shows a schematic diagram of another virtual camera movement in the present exemplary embodiment;
fig. 10 shows a configuration diagram of an image processing apparatus in the present exemplary embodiment.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments may be embodied in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the present disclosure. However, those skilled in the art will recognize that the aspects of the present disclosure may be practiced with one or more of the specific details, or with other methods, components, devices, steps, etc. In other instances, well-known technical solutions have not been shown or described in detail to avoid obscuring aspects of the present disclosure.
Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus a repetitive description thereof will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in software or in one or more hardware modules or integrated circuits or in different networks and/or processor devices and/or microcontroller devices.
In view of one or more of the problems described above, exemplary embodiments of the present disclosure provide an image processing method. Fig. 1 shows a system architecture diagram of an operating environment of the present exemplary embodiment. As shown in fig. 1, the system architecture 100 may include a server 110 and a terminal 120, where communication interaction is formed between the two through a network, for example, the terminal 120 sends an acquired original image to the server 110, and the server 110 returns the obtained target image to the terminal 120. The server 110 refers to a background server that provides internet services; the terminal 120 may include, but is not limited to, a smart phone, a tablet computer, a gaming machine, a wearable device, etc.
It should be understood that the number of devices in fig. 1 is merely exemplary. Any number of clients may be set, or the server may be a cluster formed by a plurality of servers, according to implementation requirements.
The image processing method provided in the embodiments of the present disclosure may be executed by the server 110, for example, after the server 110 acquires the original image, the server processes the original image to obtain the target image and returns the target image to the terminal 120; or may be performed by the terminal 120, for example, by directly capturing an original image by the terminal 120, performing processing to obtain a target image, etc., which is not limited in this disclosure.
The exemplary embodiments of the present disclosure provide an electronic device for implementing an image processing method, which may be the server 110 or the terminal 120 in fig. 1. The electronic device comprises at least a processor and a memory for storing executable instructions of the processor, the processor being configured to perform the image processing method via execution of the executable instructions.
The configuration of the above-described electronic device will be exemplarily described below taking the mobile terminal 200 in fig. 2 as an example. It will be appreciated by those skilled in the art that, apart from components intended specifically for mobile use, the configuration of fig. 2 can also be applied to stationary devices.
As shown in fig. 2, the mobile terminal 200 may specifically include: processor 210, internal memory 221, external memory interface 222, USB (Universal Serial Bus) interface 230, charge management module 240, power management module 241, battery 242, antenna 1, antenna 2, mobile communication module 250, wireless communication module 260, audio module 270, speaker 271, receiver 272, microphone 273, headset interface 274, sensor module 280, display screen 290, camera module 291, indicator 292, motor 293, keys 294, and SIM (Subscriber Identity Module) card interface 295, and the like.
Processor 210 may include one or more processing units. For example, the processor 210 may include an AP (Application Processor), a modem processor, a GPU (Graphics Processing Unit), an ISP (Image Signal Processor), a controller, an encoder, a decoder, a DSP (Digital Signal Processor), a baseband processor and/or an NPU (Neural-network Processing Unit), and the like. An encoder may encode (i.e., compress) image or video data; a decoder may decode (i.e., decompress) the code stream data of an image or video to restore the image or video data.
In some embodiments, processor 210 may include one or more interfaces through which connections are made with other components of mobile terminal 200.
The internal memory 221 may be used to store computer executable program code that includes instructions. The internal memory 221 may include a volatile memory, a nonvolatile memory, and the like. The processor 210 performs various functional applications of the mobile terminal 200 and data processing by executing instructions stored in the internal memory 221 and/or instructions stored in a memory provided in the processor.
The external memory interface 222 may be used to connect an external memory, such as a Micro SD card, to enable expansion of the memory capabilities of the mobile terminal 200. The external memory communicates with the processor 210 through the external memory interface 222 to implement data storage functions, such as storing files of music, video, etc.
The USB interface 230 is an interface conforming to the USB standard specification, and may be used to connect a charger to charge the mobile terminal 200, or may be connected to a headset or other electronic device.
The charge management module 240 is configured to receive a charge input from a charger. The charging management module 240 may also supply power to the device through the power management module 241 while charging the battery 242; the power management module 241 may also monitor the status of the battery.
The wireless communication function of the mobile terminal 200 may be implemented by the antenna 1, the antenna 2, the mobile communication module 250, the wireless communication module 260, a modem processor, a baseband processor, and the like. The antennas 1 and 2 are used for transmitting and receiving electromagnetic wave signals. The mobile communication module 250 may provide a solution including 2G/3G/4G/5G wireless communication applied on the mobile terminal 200. The wireless communication module 260 may provide wireless communication solutions applied on the mobile terminal 200, including WLAN (Wireless Local Area Network) (e.g., a Wi-Fi (Wireless Fidelity) network), BT (Bluetooth), GNSS (Global Navigation Satellite System), FM (Frequency Modulation), NFC (Near Field Communication), IR (Infrared), etc.
The mobile terminal 200 may implement a display function through a GPU, a display screen 290, an AP, and the like, and display a user interface. The mobile terminal 200 may implement a photographing function through an ISP, an image capturing module 291, an encoder, a decoder, a GPU, a display screen 290, an AP, etc., and may implement an audio function through an audio module 270, a speaker 271, a receiver 272, a microphone 273, an earphone interface 274, an AP, etc.
The sensor module 280 may include a depth sensor 2801, a pressure sensor 2802, a gyroscope sensor 2803, a barometric pressure sensor 2804, etc. to implement different sensing functions.
The indicator 292 may be an indicator light, which may be used to indicate a state of charge, a change in power, a message indicating a missed call, a notification, etc. The motor 293 may generate vibration cues, may also be used for touch vibration feedback, or the like. The keys 294 include a power on key, a volume key, etc.
The mobile terminal 200 may support one or more SIM card interfaces 295 for interfacing with a SIM card to enable telephony and data communications, among other functions.
Fig. 3 shows an exemplary flow of an image processing method, which may be performed by the server 110 or the terminal 120 described above, including the following steps S310 to S330:
step S310, at least two original images are acquired, wherein the at least two original images are images acquired by cameras in different poses for a target scene.
The target scene is a scene for image acquisition, and may include a plurality of different types of objects and backgrounds, where different objects may have the same or different backgrounds; for example, buildings, characters, animals, plants, or vehicles may all serve as objects in the target scene, and pure-color curtains, buildings, sky, vegetation, etc. may serve as the background. The at least two original images are images acquired in the same target scene by cameras in different poses, where the cameras may be cameras or camera modules configured in the terminal device. For example, at least two cameras may be configured at different positions along the same horizontal or vertical line in the terminal device, each acquiring an original image of the same target scene; or, in the target scene, a terminal device provided with a monocular camera may be used, acquiring an image at each slightly adjusted position so as to obtain at least two original images. In particular, the at least two original images may be two original images; for example, a binocular camera is configured in the terminal device, and a main image and a secondary image are directly acquired as the original images by the binocular camera in the target scene.
The present exemplary embodiment may obtain the at least two original images by configuring corresponding cameras in the terminal device, for example, configuring a binocular camera; the original images may also be obtained from a preset image source, for example, a main image and a secondary image previously captured by a binocular camera may be obtained from an album, which is not particularly limited in this disclosure.
Step S320, generating a three-dimensional point cloud of the target scene according to at least two original images.
The three-dimensional point cloud is a basic three-dimensional model capable of expressing three-dimensional space information, and compared with a planar two-dimensional graph, the three-dimensional point cloud has depth direction information, so that any object in a target scene and a scene background can be decoupled in the depth direction. The present exemplary embodiment may determine depth information of a target scene from at least two original images, and further determine a three-dimensional point cloud of the target scene from the depth information.
In an exemplary embodiment, as shown in fig. 4, the step S320 may include the steps of:
step S410, determining the relative pose between two original images, and correcting the two original images by using the relative pose;
step S420, determining the depth value of the pixel point according to the parallax of the pixel point with the corresponding relation between the two original images;
Step S430, generating a three-dimensional point cloud of the target scene based on the original image and the depth value.
In particular, the present exemplary embodiment may generate the three-dimensional point cloud of the target scene by performing stereo matching on the two original images and determining depth values from the parallax of the image feature points in the two images. Taking a main image and a secondary image collected by a binocular camera as an example, the specific process may include the following. First, data such as the internal parameters and external parameters of the cameras are calibrated; the internal parameters of a camera reflect the projection relationship between the camera coordinate system and the image coordinate system, the external parameters reflect the rotation matrix R and translation vector T between the camera coordinate system and the world coordinate system, and the position conversion relationship between one camera coordinate system and the other can be determined from the external and internal parameters. Second, the relative pose of the main image and the secondary image is corrected, which may include distortion correction, stereo rectification, and the like; distortion correction can be performed directly through the distortion coefficients, while stereo rectification requires the camera parameters, such as the rotation matrix R and translation vector T, and then performs epipolar (parallel) rectification of the stereo image pair on the main image and the secondary image. Further, for pixel points having a correspondence between the main image and the secondary image, i.e. the image feature points, the parallax is calculated by a specific algorithm, such as the SGBM (Semi-Global Block Matching) algorithm or the BM (Block Matching) algorithm; a conversion formula between parallax and depth value is then determined according to the geometric relationship of parallel binocular vision, and the parallax is converted into the depth value of the corresponding pixel point based on the conversion formula. Finally, the three-dimensional point cloud of the target scene is generated from the main image or the secondary image and the depth values of the corresponding pixel points.
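As an illustration only (not the claimed implementation), the following Python sketch shows how steps S410 to S430 could be realized with OpenCV: stereo rectification from the relative pose, SGBM disparity, disparity-to-depth conversion, and reprojection into a colored point cloud. The calibration values (K1, D1, K2, D2, R, T), the image file names and the SGBM parameters are placeholder assumptions rather than values from this disclosure.

```python
import cv2
import numpy as np

# Placeholder calibration data; real values come from calibrating the binocular camera.
K1 = K2 = np.array([[700.0, 0.0, 320.0], [0.0, 700.0, 240.0], [0.0, 0.0, 1.0]])  # intrinsics
D1 = D2 = np.zeros(5)                     # distortion coefficients
R = np.eye(3)                             # rotation between the two cameras
T = np.array([0.05, 0.0, 0.0])            # translation: assumed 50 mm horizontal baseline

left = cv2.imread("main.png")             # main image (left camera), assumed file name
right = cv2.imread("secondary.png")       # secondary image (right camera)
h, w = left.shape[:2]

# Step S410: epipolar (parallel) rectification of the stereo pair from the relative pose.
R1, R2, P1, P2, Q, _, _ = cv2.stereoRectify(K1, D1, K2, D2, (w, h), R, T)
m1x, m1y = cv2.initUndistortRectifyMap(K1, D1, R1, P1, (w, h), cv2.CV_32FC1)
m2x, m2y = cv2.initUndistortRectifyMap(K2, D2, R2, P2, (w, h), cv2.CV_32FC1)
left_r = cv2.remap(left, m1x, m1y, cv2.INTER_LINEAR)
right_r = cv2.remap(right, m2x, m2y, cv2.INTER_LINEAR)

# Step S420: parallax of corresponding pixels via SGBM (fixed-point result divided by 16).
sgbm = cv2.StereoSGBM_create(minDisparity=0, numDisparities=128, blockSize=5)
disparity = sgbm.compute(cv2.cvtColor(left_r, cv2.COLOR_BGR2GRAY),
                         cv2.cvtColor(right_r, cv2.COLOR_BGR2GRAY)).astype(np.float32) / 16.0

# Step S430: reproject to 3D; the Q matrix encodes baseline and focal length, so this is
# equivalent to applying depth = baseline * focal_length / parallax for every pixel.
points = cv2.reprojectImageTo3D(disparity, Q)
valid = disparity > 0
cloud_xyz = points[valid]                                   # 3D point cloud coordinates
cloud_rgb = cv2.cvtColor(left_r, cv2.COLOR_BGR2RGB)[valid]  # per-point color from the main image
```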
In an exemplary embodiment, after determining the depth value of the pixel point, the image processing method may further include:
and performing scene segmentation on at least one original image in the two original images, and performing optimization processing on the depth value according to a scene segmentation result.
In order to improve the accuracy of the depth value of each pixel point and avoid abnormal situations such as deformation or blurring, in the target image, of an object in the target scene due to inaccurate depth estimation, the present exemplary embodiment may perform scene segmentation on at least one of the two original images and optimize the depth values according to the scene segmentation result. Scene segmentation of the original image may refer to segmenting the foreground objects from the background in the original image; it may also be semantic segmentation of each object in the target scene, understanding the image at the pixel level, classifying pixels belonging to the same category into one class, and determining which objects are included in the target scene and where each object is located. For example, pixels belonging to people may be classified into a first category, pixels belonging to trees into a second category, pixels belonging to vehicles into a third category, and so on, thereby identifying objects of different categories in the image. Performing scene segmentation on the original image is essentially segmenting the region where each object contained in the original image is located, so as to identify the category of that object. When semantic segmentation is performed on the original image, a semantic tag, such as road, sky, human, or cat/dog, may be assigned to each pixel point in the image, and the semantic tag may be regarded as the category information of one or more objects.
The present exemplary embodiment may perform scene segmentation on the original image using a pre-trained semantic segmentation model. In order to give the present exemplary embodiment a wider application range, so that it can also be applied to portable mobile terminals, a lightweight semantic segmentation model may be used, which may include an encoder and a decoder: the encoder downsamples the original image to obtain intermediate feature data corresponding to the original image, and the decoder upsamples the intermediate feature data to obtain the category information of each object in the original image. The encoder and decoder may have a symmetric or asymmetric structure. In this exemplary embodiment, the encoder may downsample the input original image with a convolutional neural network through convolution and pooling operations, so as to extract features from the perspective of image semantics and perform feature learning; the decoder may gradually recover the detailed features of the image through deconvolution and other operations, further learn features at different scales, and finally output a scene classification result with the same resolution as the original image.
In addition, in order to improve the segmentation and recognition capability of the semantic segmentation model on the infrared image, an attention layer can be added in the decoder, so that the obtained output result has higher accuracy, and the degree of distinction between similar images and the generalization capability of the model are improved.
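The disclosure does not specify a concrete network, so the PyTorch sketch below only illustrates the described encoder-decoder structure: convolution and pooling to downsample, transposed convolution to upsample back to the input resolution, and per-pixel class scores as output. The layer sizes and the number of classes are illustrative assumptions, and the attention layer mentioned above is omitted for brevity.

```python
import torch
import torch.nn as nn

class TinySegNet(nn.Module):
    """Minimal sketch of a lightweight encoder-decoder semantic segmentation model."""
    def __init__(self, num_classes: int = 8):      # class count is an assumption
        super().__init__()
        # Encoder: convolution + pooling downsample the image and extract semantic features.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(inplace=True), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(inplace=True), nn.MaxPool2d(2),
        )
        # Decoder: transposed convolutions recover the original resolution.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 2, stride=2), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(32, num_classes, 2, stride=2),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(x))        # class logits at the input resolution

# Usage: per-pixel category labels for one original image (e.g. the main image).
model = TinySegNet().eval()
image = torch.rand(1, 3, 480, 640)                  # placeholder for the main image tensor
with torch.no_grad():
    seg_labels = model(image).argmax(dim=1)         # (1, 480, 640) scene segmentation result
```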
Further, the depth value may be optimized according to the scene segmentation result, and specifically, the depth value may be optimized from the following two aspects.
First, according to the scene segmentation result, optimization operations such as smoothing are performed on the depth values of the pixel points where the objects included in the target scene are located, so as to avoid subsequent deformation of the objects.
Smoothing the depth values may reduce image noise. The present exemplary embodiment may smooth the depth values of the pixels in various manners, such as box-template denoising/smoothing (i.e., averaging), Gaussian-template denoising/smoothing, or median-filter denoising/smoothing, which is not particularly limited in this disclosure.
Secondly, sharpening is carried out on the depth values of the pixel points where the objects included in the target scene are located according to the scene segmentation result, so that the edges of the objects in the depth image are more obvious, and the depths of the objects are more clear and visible.
Wherein the sharpening process is to reduce blurring in the image by enhancing high frequency components, which can enhance depth edges of individual objects in the image. The present exemplary embodiment can sharpen the depth value of a pixel point by bilateral filtering.
It should be noted that, in the present exemplary embodiment, any one of the above-described optimization processes may be performed, or a combination of the above-described two optimization processes may be performed, for example, only the depth value may be subjected to smoothing or sharpening, or the depth value may be subjected to smoothing and then sharpening, or the like, which is not particularly limited in this disclosure.
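A minimal sketch of the two optimization options above, assuming the depth map and the segmentation result are single-channel arrays of the same size and that label 0 denotes the background; median filtering stands in for the smoothing and bilateral filtering for the edge-preserving sharpening, and the kernel parameters are illustrative only.

```python
import cv2
import numpy as np

def optimize_depth(depth: np.ndarray, seg: np.ndarray,
                   smooth: bool = True, sharpen: bool = True) -> np.ndarray:
    """Smooth and/or sharpen depth values of pixels belonging to segmented objects."""
    out = depth.astype(np.float32).copy()
    filtered = out
    if smooth:
        # Median filtering suppresses depth noise (box or Gaussian templates also work).
        filtered = cv2.medianBlur(filtered, 5)
    if sharpen:
        # Bilateral filtering smooths flat regions while keeping object depth edges crisp.
        filtered = cv2.bilateralFilter(filtered, d=9, sigmaColor=25, sigmaSpace=9)
    object_mask = seg > 0          # assumption: label 0 is background, other labels are objects
    out[object_mask] = filtered[object_mask]
    return out
```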
Fig. 5 shows a flowchart of generating a three-dimensional point cloud of a target scene in the present exemplary embodiment, which may specifically include the following steps:
step S510, acquiring two original images acquired by a binocular camera;
step S520, determining initial depth information of a target scene according to the original image;
step S530, performing scene segmentation on the original image to obtain a scene segmentation result;
step S540, optimizing the initial depth information according to the scene segmentation result to obtain optimized depth information;
step S550, generating a three-dimensional point cloud of the target scene according to the original image and the optimized depth information.
The two original images are a main image and a secondary image acquired by the binocular camera, respectively. When scene segmentation is performed on the original image, either the main image or the secondary image may be selected for segmentation. The present exemplary embodiment can thus establish a three-dimensional point cloud of the target scene based on the main image and the secondary image acquired by the binocular camera.
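Putting the steps of fig. 5 together, a sketch of the overall flow might look as follows; estimate_depth, segment_scene and lift_to_point_cloud are assumed helper names standing in for the stereo-matching, segmentation and back-projection steps sketched above, not functions defined by this disclosure.

```python
def build_point_cloud(main_img, secondary_img):
    # S510-S520: initial depth of the target scene from the binocular pair
    # (e.g. rectification + SGBM as in the earlier sketch).
    initial_depth = estimate_depth(main_img, secondary_img)   # assumed helper
    # S530: scene segmentation of one original image (here the main image).
    seg = segment_scene(main_img)                             # assumed helper
    # S540: optimize (smooth/sharpen) the depth using the segmentation result.
    depth = optimize_depth(initial_depth, seg)
    # S550: back-project pixels with optimized depth into a colored 3D point cloud.
    return lift_to_point_cloud(main_img, depth)               # assumed helper
```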
Step S330, rendering the three-dimensional point cloud based on the sampling pose in the preset motion path of the camera to obtain a target image.
The preset motion path of the camera is not a motion path produced by an actually moving camera, but a virtual lens-movement path of the camera. After the original images of the target scene are acquired, the present exemplary embodiment may determine the position of one or more objects in the target scene, such as a portrait; the preset motion path may be the motion path that would be generated if the camera moved in a preset direction or the lens moved in a preset manner, and specifically may be generated taking the camera position used for acquiring the original images as a reference. The camera pose during this virtual lens movement is the sampling pose. The three-dimensional point cloud can be projected and rendered under the different sampling poses to obtain target images under those sampling poses. There may be one or more target images, and the specific number can be customized as needed, for example, presetting the generation of 10 target images; it may also be adapted to the application scenario, for example, in a scenario where a video is generated from the target images, the number of target images may be determined according to the required duration of the video, which is not specifically limited in the present disclosure.
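One possible way to realize the projection and rendering of step S330 is a plain pinhole projection of the colored point cloud under each sampled pose, keeping the nearest point per pixel with a simple z-buffer. The intrinsic matrix K and the world-to-camera pose convention (R, t) used below are assumptions; a production renderer would use a GPU pipeline instead of this per-point loop.

```python
import numpy as np

def render_point_cloud(xyz, rgb, K, R, t, height, width):
    """Project a colored 3D point cloud into the image plane of a virtual camera pose."""
    cam = xyz @ R.T + t                       # world -> camera coordinates at the sampled pose
    keep = cam[:, 2] > 1e-6                   # discard points behind the camera
    cam, colors = cam[keep], rgb[keep]
    proj = cam @ K.T                          # pinhole projection
    u = (proj[:, 0] / proj[:, 2]).astype(int)
    v = (proj[:, 1] / proj[:, 2]).astype(int)
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    u, v, z, colors = u[inside], v[inside], cam[inside, 2], colors[inside]

    image = np.zeros((height, width, 3), dtype=np.uint8)
    zbuf = np.full((height, width), np.inf)
    for ui, vi, zi, ci in zip(u, v, z, colors):
        if zi < zbuf[vi, ui]:                 # keep the nearest point per pixel
            zbuf[vi, ui] = zi
            image[vi, ui] = ci
    return image
```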
In an exemplary embodiment, the image processing method may further include:
generating a target video based on the target image; or alternatively
A target video is generated based on the original image and the target image.
In practical application, after the target images are determined, the target video can be generated in two ways. In the first way, the target video is generated from the target images; specifically, the generated target images may be arranged in a certain order to produce the target video, and the generated target video does not contain the original images. In the second way, the original images and the target images may be arranged together in a certain order so that the target video is determined jointly by the original images and the target images, where one of the original images or all of the at least two original images may be used, which is not particularly limited in this disclosure. According to the preset motion path of the camera, the generated target video may be a video that moves in a preset motion direction around any one or more objects in the target scene.
In this exemplary embodiment, the number of target images may be determined according to the duration of the target video. For example, when a 5-second target video at 30 fps (frames per second) is required, if the preset motion path is 150 mm from left to right, the camera passes through 150 gradually increasing translation positions, such as 1 mm, 2 mm, …, 150 mm, during the lens movement; the three-dimensional point cloud can then be rendered in step S330 to obtain 150 target images, from which the final target video is obtained. In this way, the present disclosure can generate a 3D (three-dimensional) target video from at least two original images; the generation process is simple, it can be applied to application scenarios such as making emoticon packages or special effects, and it can improve the interest and application range of the original images.
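For the 5-second, 30 fps, 150 mm left-to-right example above, the pose sampling and video assembly could look like the sketch below, reusing render_point_cloud and the point-cloud variables from the earlier sketches; the intrinsics and the identity rotation are assumptions, and cv2.VideoWriter is only one possible way to encode the target video.

```python
import cv2
import numpy as np

fps, seconds = 30, 5
n_frames = fps * seconds                           # 150 target images for this example path
K = np.array([[700.0, 0.0, 320.0], [0.0, 700.0, 240.0], [0.0, 0.0, 1.0]])  # assumed intrinsics

writer = cv2.VideoWriter("target.mp4", cv2.VideoWriter_fourcc(*"mp4v"), fps, (640, 480))
for i in range(1, n_frames + 1):
    t = np.array([i * 0.001, 0.0, 0.0])            # sampled pose: i mm of translation along x
    frame = render_point_cloud(cloud_xyz, cloud_rgb, K, np.eye(3), t, 480, 640)
    writer.write(cv2.cvtColor(frame, cv2.COLOR_RGB2BGR))
writer.release()
```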
In summary, in this exemplary embodiment, at least two original images are acquired, where the at least two original images are images acquired by cameras in different poses for a target scene; a three-dimensional point cloud of the target scene is generated according to the at least two original images; and the three-dimensional point cloud is rendered based on the sampling pose in the preset motion path of the camera to obtain a target image. On the one hand, the present exemplary embodiment proposes a new image processing method, which can construct a three-dimensional point cloud of the target scene from a small number of collected original images and generate a new target image based on the three-dimensional point cloud; the original image data are easy to collect and few in number, and a new image can be generated from a small number of existing images in a simple and convenient manner, so that it can be applied to other application scenarios, increasing the usability and interest of the original images. On the other hand, the flow of this embodiment is simple and its computational complexity is low, so it can be applied to various types of terminal devices. On yet another hand, the target image and the original images lie on the preset motion path of the same camera, so a video in which the camera moves its lens through the target scene can be generated simply and quickly based on the target image, which is well suited to generating emoticon packages or special-effect videos.
In an exemplary embodiment, as shown in fig. 6, the step S430 may include the steps of:
step S610, repairing the cavity area of the target scene by using the original image and the depth value;
step S620, based on the repaired hole area, generating a three-dimensional point cloud of the target scene containing the point cloud data of the hole area according to the original image and the depth value.
In practical applications, many objects are included in the target scene, and an object in the target scene may block part of the background; for example, when a portrait stands in front of a building, part of the building is blocked by the portrait, and when the camera moves in a certain direction, the blocked part will be exposed. Fig. 7 simulates a schematic diagram of the camera performing a virtual movement to the right: a triangle represents an object T in the target scene, the left diagram of fig. 7 shows the relationship between the object T and the background before any movement, and the right diagram of fig. 7 shows the relationship between the object T and the background during the rightward virtual camera movement, where the gray area represents the hole area produced by the virtual camera movement.
For example, with a binocular camera, during the movement of the camera an object at a close distance generally moves to a greater extent than an object at a far distance; for example, a portrait close to the camera moves more than the distant background, and the two are inversely related. The relationship can be expressed by the following formula:

Y₂ − Y₁ = s·f / d

where s is the baseline of the binocular camera, f is the camera focal length, Y₁ is the coordinate of an image point along the baseline direction in the main image (assuming the binocular camera is placed horizontally and the original image acquired by the left camera is the main image), Y₂ is the coordinate of that point along the baseline direction in the secondary image, and Y₂ − Y₁ is the distance that a point with depth value d needs to move. From this formula, the mathematical relationship that the depth value of an object in the target scene is inversely related to the distance it moves can be determined. Therefore, the hole area in the target scene is related to the distance between the object and the camera and to the preset motion path of the camera.
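A quick numeric check of this inverse relation, using illustrative baseline and focal-length values that are not taken from the disclosure:

```python
s, f = 0.05, 700.0             # assumed baseline (m) and focal length (px)
for d in (0.5, 2.0, 10.0):     # near portrait, mid-range object, far background (m)
    shift = s * f / d          # pixel displacement Y2 - Y1 for a point at depth d
    print(f"depth {d:5.1f} m -> shift {shift:5.1f} px")
# Nearer objects shift further, so they uncover larger hole areas behind them.
```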
In an exemplary embodiment, the image processing method may further include the steps of:
and determining a cavity area according to the preset motion path and the depth value.
Considering that objects at different depths in the target scene do not move to the same degree, for example objects closer to the camera move more, the present exemplary embodiment may determine the hole area according to the preset motion path of the camera, that is, the virtual lens-movement path, and the depth values of the objects. The position and extent of the hole area in the target scene can be roughly determined from the preset motion path; the size of the hole area can be determined from the depth value of the object: the closer the object, the larger the hole area it produces, and the farther the object, the smaller the hole area.
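One hedged way to realize this determination is to forward-warp every pixel by the depth-dependent shift implied by the preset path and mark destination pixels that receive no source pixel as the hole area; the function below assumes a purely horizontal virtual movement of shift_m (same units as the stereo baseline) and ignores occlusion ordering for simplicity.

```python
import numpy as np

def hole_mask(depth: np.ndarray, f: float, shift_m: float) -> np.ndarray:
    """Mark pixels of the virtual view that no original pixel maps onto (the hole area)."""
    h, w = depth.shape
    covered = np.zeros((h, w), dtype=bool)
    ys, xs = np.mgrid[0:h, 0:w]
    # Horizontal pixel displacement is inversely proportional to depth (dx = f * shift / d).
    dx = np.where(depth > 0, f * shift_m / np.maximum(depth, 1e-6), 0.0)
    new_x = np.clip((xs + dx).astype(int), 0, w - 1)
    covered[ys, new_x] = True
    return ~covered            # True where a hole appears and needs to be repaired
```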
After the hole area is determined, it may be repaired so as to determine the image information of the hole area. In particular, in an exemplary embodiment, as shown in fig. 8, the above step S610 may include the following steps:
step S810, performing color restoration on the empty hole area by using the original image;
in step S820, the depth value is used to complement the depth value of the hollow region.
In the present exemplary embodiment, the repair of the hole area may include two aspects, color repair and depth-value repair, where the color repair can be implemented based on the color data of the original image, and the repair of the depth value can be completed using the depth values of the pixels determined in step S420. Specifically, the present exemplary embodiment may employ various image inpainting algorithms, for example the Criminisi algorithm (an exemplar-based image inpainting algorithm), to fill the texture of the hole area, that is, its color values and depth values, according to how the background area changes during the virtual camera movement.
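The Criminisi algorithm is not included in core OpenCV, so the sketch below substitutes OpenCV's built-in cv2.inpaint (Telea method) for the color repair and inpaints a normalized copy of the depth map for the depth completion; it is a stand-in for steps S810 and S820, not the algorithm named above.

```python
import cv2
import numpy as np

def repair_hole(color: np.ndarray, depth: np.ndarray, hole: np.ndarray):
    """Fill color values and depth values inside the hole area (steps S810 and S820)."""
    mask = hole.astype(np.uint8) * 255
    # S810: color repair of the hole area from the surrounding original-image texture.
    color_filled = cv2.inpaint(color, mask, inpaintRadius=3, flags=cv2.INPAINT_TELEA)
    # S820: depth completion; inpaint an 8-bit normalized copy, then map back to metric depth.
    d_min, d_max = float(depth[~hole].min()), float(depth[~hole].max())
    d8 = np.clip((depth - d_min) / max(d_max - d_min, 1e-6) * 255.0, 0, 255).astype(np.uint8)
    d8_filled = cv2.inpaint(d8, mask, inpaintRadius=3, flags=cv2.INPAINT_TELEA)
    depth_filled = depth.copy()
    depth_filled[hole] = d8_filled[hole].astype(np.float32) / 255.0 * (d_max - d_min) + d_min
    return color_filled, depth_filled
```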
In practical application, the camera may move its lens along different preset motion paths, or with different amplitudes or degrees, so that the resulting hole areas differ; a repair strategy for the hole area can therefore be determined in a targeted manner according to the particular hole area, improving image processing efficiency.
In an exemplary embodiment, the image processing method may further include the steps of:
determining depth differences between different objects in a target scene according to a scene segmentation result and a depth value obtained by performing scene segmentation on at least one of the two original images;
and when the depth difference is determined to meet the restoration condition, restoration of the cavity area of the target scene by using the original image and the depth value is executed.
In order to obtain a better image effect while keeping the algorithm processing timely, the repair strategy for the hole area needs to be targeted and flexible. The present exemplary embodiment may determine the difference in depth values between an object in the target scene and its corresponding background according to the scene segmentation result and the depth values of the pixel points, and then determine a corresponding hole-area repair strategy according to this depth difference. The repair condition is the condition for deciding whether to perform, on the hole area, the image repair process described in step S620 or steps S810 to S820, and may be a depth-difference threshold. When the depth difference exceeds the threshold condition, the gap between the object and the background is large, a more thorough repair process is needed, and the process of repairing the hole area of the target scene using the original image and the depth values is triggered. When the depth difference does not exceed the threshold condition, the depth difference between the object and the background is small, or the displacement difference between them is small, and the extent of the hole area is small, so the hole area has little influence on the generation of the current three-dimensional point cloud; in this case the repair process for the hole area can be omitted, or another simple repair manner, such as direct interpolation, can be used to determine the color values and depth values of the hole area, and since the hole area is small the interpolation is not obvious.
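A minimal sketch of this decision, comparing the median depth of each segmented object with that of the background and triggering the full repair only when the gap exceeds a threshold; the threshold value and the convention that label 0 denotes the background are assumptions.

```python
import numpy as np

def needs_full_repair(depth: np.ndarray, seg: np.ndarray, threshold: float = 0.5) -> bool:
    """Check whether any object/background depth difference meets the repair condition."""
    bg_depth = np.median(depth[seg == 0])              # label 0 assumed to be background
    for label in np.unique(seg):
        if label == 0:
            continue
        obj_depth = np.median(depth[seg == label])
        if abs(bg_depth - obj_depth) > threshold:      # large gap -> a visible hole is expected
            return True                                 # run the full hole repair
    return False                                        # small gap -> simple interpolation suffices
```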
Finally, based on the repaired hole area, a three-dimensional point cloud of the target scene containing the point cloud data of the hole area is generated according to the original image and the depth values. This three-dimensional point cloud may include the data of all points of the original image and of the hole area, so that no hole appears when the camera performs the virtual lens movement along the preset motion path. As shown in fig. 9, taking the preset motion path from left to right as an example, when the camera performs the virtual movement, the direction of the object T relative to the camera movement is actually opposite, that is, the object T moves from right to left, and the three quadrilaterals with different gray levels represent the change of the position of the object T during the virtual camera movement. Based on the repair of the hole area, the background area corresponding to the quadrilateral has been repaired, so the generated target image can be complete and unbroken, which further ensures that the target video generated from the target images is complete and smooth.
Exemplary embodiments of the present disclosure also provide an image processing apparatus. As shown in fig. 10, the image processing apparatus 1000 may include: the original image obtaining module 1010 is configured to obtain at least two original images, where the at least two original images are images collected by cameras in different poses for a target scene; the three-dimensional point cloud establishing module 1020 is used for generating a three-dimensional point cloud of the target scene according to at least two original images; the target image obtaining module 1030 is configured to render a three-dimensional point cloud based on a sampling pose in a preset motion path of the camera, so as to obtain a target image.
In an exemplary embodiment, the image processing apparatus further includes: the video generation module is used for generating a target video based on the target image; alternatively, the target video is generated based on the original image and the target image.
In an exemplary embodiment, the three-dimensional point cloud establishment module includes: the image correction unit is used for determining the relative pose between the two original images and correcting the two original images by using the relative pose; the depth value determining unit is used for determining the depth value of the pixel point according to the parallax of the pixel point with the corresponding relation between the two original images; and the point cloud generating unit is used for generating a three-dimensional point cloud of the target scene based on the original image and the depth value.
In an exemplary embodiment, the image processing apparatus further includes: the depth value optimizing module is used for carrying out scene segmentation on at least one original image in the two original images after determining the depth value of the pixel point, and carrying out optimizing processing on the depth value according to a scene segmentation result.
In an exemplary embodiment, the point cloud generating unit includes: the region restoration subunit is used for restoring the cavity region of the target scene by utilizing the original image and the depth value; and the point cloud generation subunit is used for generating a three-dimensional point cloud of the target scene containing the point cloud data of the hole area according to the original image and the depth value based on the repaired hole area.
In an exemplary embodiment, the image processing apparatus further includes: and the cavity area determining module is used for determining a cavity area according to the preset motion path and the depth value.
In an exemplary embodiment, the image processing apparatus further includes: the depth difference determining module is used for determining the depth difference between different objects in the target scene according to a scene segmentation result obtained by carrying out scene segmentation on at least one original image in the two original images and a depth value; and the depth difference judging module is used for executing restoration of the cavity area of the target scene by using the original image and the depth value when the depth difference meets the restoration condition.
In an exemplary embodiment, the area repair subunit includes: a color restoration subunit, configured to perform color restoration on the hole area by using the original image; and the depth restoration subunit is used for supplementing the depth value to the hollow area by utilizing the depth value.
The specific details of each part in the above apparatus are already described in the method part embodiments, and thus will not be repeated.
Exemplary embodiments of the present disclosure also provide a computer readable storage medium, which may be implemented in the form of a program product, comprising program code for causing a terminal device to perform the steps according to the various exemplary embodiments of the present disclosure described in the above section of the "exemplary method" when the program product is run on the terminal device, e.g. any one or more of the steps of fig. 3, 4, 5, 6 or 8 may be performed. The program product may employ a portable compact disc read-only memory (CD-ROM) and comprise program code and may be run on a terminal device, such as a personal computer. However, the program product of the present disclosure is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium would include the following: an electrical connection having one or more wires, a portable disk, a hard disk, a random access memory, a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The computer readable signal medium may include a data signal propagated in baseband or as part of a carrier wave with readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of remote computing devices, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., connected via the Internet using an Internet service provider).
Those skilled in the art will appreciate that the various aspects of the present disclosure may be implemented as a system, method, or program product. Accordingly, various aspects of the disclosure may be embodied in the following forms, namely: an entirely hardware embodiment, an entirely software embodiment (including firmware, micro-code, etc.) or an embodiment combining hardware and software aspects, which may be referred to herein as a "circuit," "module," or "system." Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following the general principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (9)

1. An image processing method, comprising:
acquiring at least two original images, wherein the at least two original images are images acquired by cameras in different poses for a target scene;
determining the relative pose between two original images, and correcting the two original images by using the relative pose; determining a depth value of a pixel point according to the parallax of the pixel point with the corresponding relation between the two original images; generating a three-dimensional point cloud of the target scene based on the original image and the depth value;
based on a sampling pose in a preset motion path of a camera, projecting and rendering the three-dimensional point cloud to obtain a target image under the sampling pose; the preset motion path is a virtual lens-movement path of the camera, and the sampling pose is a camera pose in the virtual lens-movement path;
after determining the depth value of the pixel point, the method further includes:
Performing scene segmentation on at least one original image in the two original images, and performing optimization processing on the depth value according to a scene segmentation result;
the optimizing the depth value according to the scene segmentation result includes:
according to a scene segmentation result, smoothing the depth value of a pixel point where an object included in a target scene is located; and/or
And according to a scene segmentation result, sharpening the depth value of the pixel point where the object included in the target scene is located.
2. The method according to claim 1, wherein the method further comprises:
generating a target video based on the target image; or alternatively
Generating a target video based on the original image and the target image.
3. The method of claim 1, wherein the generating a three-dimensional point cloud of the target scene based on the original image and the depth values comprises:
repairing a hole area of the target scene by using the original image and the depth value;
and generating a three-dimensional point cloud of the target scene containing the point cloud data of the hole area according to the original image and the depth value based on the repaired hole area.
4. The method according to claim 3, wherein the method further comprises:
determining the hole area according to the preset motion path and the depth value.
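One plausible reading of claim 4 is to forward-project the depth map along the preset motion path and mark the target pixels that receive no source point as the hole area. The sketch below follows that reading; the single intrinsic matrix K and the (R, t) pose convention are illustrative assumptions.

```python
import cv2
import numpy as np

def hole_mask_for_pose(depth, K, pose_rt, size):
    """Mark target-view pixels that no source pixel projects to; these
    disocclusions form the hole area to be repaired later."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.reshape(-1)
    valid = z > 0
    x = (u.reshape(-1) - K[0, 2]) * z / K[0, 0]   # back-project with source intrinsics
    y = (v.reshape(-1) - K[1, 2]) * z / K[1, 1]
    xyz = np.stack([x, y, z], axis=1)[valid].astype(np.float32)

    R_v, t_v = pose_rt
    proj, _ = cv2.projectPoints(xyz, cv2.Rodrigues(R_v)[0], t_v, K, None)
    uv = np.round(proj.reshape(-1, 2)).astype(int)
    covered = np.zeros((size[1], size[0]), dtype=bool)
    ok = (uv[:, 0] >= 0) & (uv[:, 0] < size[0]) & (uv[:, 1] >= 0) & (uv[:, 1] < size[1])
    covered[uv[ok, 1], uv[ok, 0]] = True
    return ~covered  # True where the sampled view sees no reprojected point
```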
5. The method according to claim 3, wherein the method further comprises:
determining a depth difference between different objects in the target scene according to the depth value and a scene segmentation result obtained by performing scene segmentation on at least one of the two original images;
and repairing the hole area of the target scene by using the original images and the depth value when it is determined that the depth difference meets a repairing condition.
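The repairing condition of claim 5 can be approximated by comparing per-object depth statistics: when the gap between the nearest and farthest segmented object is large, moving the virtual camera will expose a hole. A sketch under that assumption, with the label map and threshold as illustrative placeholders:

```python
import numpy as np

def meets_repair_condition(depth, labels, min_depth_gap=0.5):
    """Return True when the depth difference between segmented objects
    is large enough that the hole area should be repaired."""
    medians = []
    for lbl in np.unique(labels):
        region = depth[(labels == lbl) & (depth > 0)]
        if region.size:
            medians.append(float(np.median(region)))
    return len(medians) >= 2 and (max(medians) - min(medians)) >= min_depth_gap
```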
6. The method according to claim 3, wherein the repairing a hole area of the target scene by using the original images and the depth value comprises:
performing color restoration on the hole area by using the original images;
and performing depth value completion on the hole area by using the depth value.
7. An image processing apparatus, comprising:
an original image acquisition module, configured to acquire at least two original images, wherein the at least two original images are images of a target scene acquired by a camera in different poses;
a three-dimensional point cloud establishing module, configured to determine a relative pose between two of the original images and correct the two original images by using the relative pose; determine a depth value of a pixel point according to a parallax of the pixel point having a corresponding relation between the two original images; and generate a three-dimensional point cloud of the target scene based on the original images and the depth value;
a target image acquisition module, configured to project and render the three-dimensional point cloud based on a sampling pose in a preset motion path of a camera, to obtain a target image under the sampling pose, wherein the preset motion path is a virtual camera-movement path of the camera, and the sampling pose is a camera pose in the virtual camera-movement path;
wherein, after determining the depth value of the pixel point, the apparatus is further configured to:
perform scene segmentation on at least one of the two original images, and optimize the depth value according to a scene segmentation result;
wherein optimizing the depth value according to the scene segmentation result comprises:
smoothing, according to the scene segmentation result, the depth value of a pixel point where an object included in the target scene is located; and/or
sharpening, according to the scene segmentation result, the depth value of a pixel point where an object included in the target scene is located.
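The segmentation-guided depth optimisation recited in claims 1 and 7 can be approximated per object: smooth the depth values inside objects that should be flat, or sharpen them where depth edges must stay crisp. The kernel sizes and the unsharp-mask style sharpening below are illustrative choices, not the claimed processing.

```python
import cv2
import numpy as np

def optimize_depth(depth, labels, smooth_ids=(), sharpen_ids=()):
    """Smooth or sharpen the depth value of pixels belonging to selected
    objects, following the scene segmentation result (an integer label map)."""
    out = depth.astype(np.float32).copy()
    blurred = cv2.GaussianBlur(out, (9, 9), 0)
    median = cv2.medianBlur(out, 5)
    for lbl in smooth_ids:
        mask = labels == lbl
        out[mask] = blurred[mask]                                     # suppress depth noise inside the object
    for lbl in sharpen_ids:
        mask = labels == lbl
        out[mask] = (2.0 * depth.astype(np.float32) - median)[mask]   # boost depth edges around the object
    return out
```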
8. A computer readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the method of any one of claims 1 to 6.
9. An electronic device, comprising:
a processor;
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the method of any one of claims 1 to 6 via execution of the executable instructions.
CN202110350503.1A 2021-03-31 2021-03-31 Image processing method, image processing device, storage medium and electronic apparatus Active CN112927271B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110350503.1A CN112927271B (en) 2021-03-31 2021-03-31 Image processing method, image processing device, storage medium and electronic apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110350503.1A CN112927271B (en) 2021-03-31 2021-03-31 Image processing method, image processing device, storage medium and electronic apparatus

Publications (2)

Publication Number Publication Date
CN112927271A CN112927271A (en) 2021-06-08
CN112927271B true CN112927271B (en) 2024-04-05

Family

ID=76173542

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110350503.1A Active CN112927271B (en) 2021-03-31 2021-03-31 Image processing method, image processing device, storage medium and electronic apparatus

Country Status (1)

Country Link
CN (1) CN112927271B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113733354B (en) * 2021-08-09 2023-04-25 中科云谷科技有限公司 Control method, processor and control device for mixer truck
CN113469930B (en) * 2021-09-06 2021-12-07 腾讯科技(深圳)有限公司 Image processing method and device and computer equipment
CN114979785B (en) * 2022-04-15 2023-09-08 荣耀终端有限公司 Video processing method, electronic device and storage medium
CN115379195B (en) * 2022-08-26 2023-10-03 维沃移动通信有限公司 Video generation method, device, electronic equipment and readable storage medium
CN115937291B (en) * 2022-09-14 2023-12-15 北京字跳网络技术有限公司 Binocular image generation method and device, electronic equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106713890A (en) * 2016-12-09 2017-05-24 宇龙计算机通信科技(深圳)有限公司 Image processing method and device
CN110176032A (en) * 2019-04-28 2019-08-27 暗物智能科技(广州)有限公司 A kind of three-dimensional rebuilding method and device
CN110349251A (en) * 2019-06-28 2019-10-18 深圳数位传媒科技有限公司 A kind of three-dimensional rebuilding method and device based on binocular camera
WO2020113423A1 (en) * 2018-12-04 2020-06-11 深圳市大疆创新科技有限公司 Target scene three-dimensional reconstruction method and system, and unmanned aerial vehicle
CN111462311A (en) * 2020-03-31 2020-07-28 北京小米松果电子有限公司 Panorama generation method and device and storage medium
CN112102340A (en) * 2020-09-25 2020-12-18 Oppo广东移动通信有限公司 Image processing method, image processing device, electronic equipment and computer readable storage medium

Also Published As

Publication number Publication date
CN112927271A (en) 2021-06-08

Similar Documents

Publication Publication Date Title
CN112927271B (en) Image processing method, image processing device, storage medium and electronic apparatus
US20200134848A1 (en) System and method for disparity estimation using cameras with different fields of view
US11410275B2 (en) Video coding for machine (VCM) based system and method for video super resolution (SR)
CN109788189A (en) The five dimension video stabilization device and methods that camera and gyroscope are fused together
CN111784614A (en) Image denoising method and device, storage medium and electronic equipment
CN112270755B (en) Three-dimensional scene construction method and device, storage medium and electronic equipment
CN113096185B (en) Visual positioning method, visual positioning device, storage medium and electronic equipment
CN112269851A (en) Map data updating method and device, storage medium and electronic equipment
CN112927362A (en) Map reconstruction method and device, computer readable medium and electronic device
CN113313832B (en) Semantic generation method and device of three-dimensional model, storage medium and electronic equipment
CN112598780B (en) Instance object model construction method and device, readable medium and electronic equipment
CN111784734A (en) Image processing method and device, storage medium and electronic equipment
CN111652933B (en) Repositioning method and device based on monocular camera, storage medium and electronic equipment
CN113989717A (en) Video image processing method and device, electronic equipment and storage medium
CN112714263A (en) Video generation method, device, equipment and storage medium
CN112927281A (en) Depth detection method, depth detection device, storage medium, and electronic apparatus
US20220406013A1 (en) Three-dimensional scene recreation using depth fusion
CN113920023A (en) Image processing method and device, computer readable medium and electronic device
CN113362260A (en) Image optimization method and device, storage medium and electronic equipment
CN114187415A (en) Topographic map generation method and device
CN113537194A (en) Illumination estimation method, illumination estimation device, storage medium, and electronic apparatus
CN113379624A (en) Image generation method, training method, device and equipment of image generation model
EP3391330B1 (en) Method and device for refocusing at least one plenoptic video
CN113781336B (en) Image processing method, device, electronic equipment and storage medium
CN111626929B (en) Depth image generation method and device, computer readable medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant