CN111815666A - Image processing method and device, computer readable storage medium and electronic device - Google Patents


Info

Publication number
CN111815666A
Authority
CN
China
Prior art keywords
dimensional image
information
depth
depth information
region
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010796552.3A
Other languages
Chinese (zh)
Other versions
CN111815666B (en)
Inventor
樊欢欢
李姬俊男
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN202010796552.3A
Publication of CN111815666A
Application granted
Publication of CN111815666B
Legal status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/10 - Segmentation; Edge detection
    • G06T7/194 - Segmentation; Edge detection involving foreground-background segmentation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 - 3D [Three Dimensional] image rendering
    • G06T15/005 - General purpose rendering architectures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00 - Manipulating 3D models or images for computer graphics
    • G06T19/006 - Mixed reality
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20081 - Training; Learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20084 - Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Graphics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The disclosure provides an image processing method, an image processing apparatus, a computer-readable storage medium and an electronic device, and relates to the technical field of image processing. The image processing method comprises the following steps: acquiring a two-dimensional image, performing semantic segmentation on the two-dimensional image, and determining a foreground region and a background region of the two-dimensional image; determining depth information of the background region; determining pixel information and depth information of an occlusion region by using the pixel information and the depth information of the background region, where the position of the occlusion region corresponds to the position of the foreground region on the two-dimensional image; and generating a three-dimensional image corresponding to the two-dimensional image by combining the pixel information and the depth information of the occlusion region. The method and the apparatus can convert a two-dimensional image into a three-dimensional image, thereby improving the three-dimensional representation capability of the image.

Description

Image processing method and device, computer readable storage medium and electronic device
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to an image processing method, an image processing apparatus, a computer-readable storage medium, and an electronic device.
Background
With the popularization of electronic devices such as mobile phones and tablet computers, and the continuous upgrading of their camera modules, users' expectations for photographing effects keep rising.
At present, photos taken by users with electronic devices tend to look flat and lack a stereoscopic impression; in particular, in edutainment scenarios they cannot fully convey the content they are meant to show.
Disclosure of Invention
The present disclosure provides an image processing method, an image processing apparatus, a computer-readable storage medium, and an electronic device, thereby overcoming, at least to some extent, the problem that photographed photos lack a stereoscopic impression.
According to a first aspect of the present disclosure, there is provided an image processing method including: acquiring a two-dimensional image, performing semantic segmentation on the two-dimensional image, and determining a foreground region and a background region of the two-dimensional image; determining depth information of the background region; determining pixel information and depth information of an occlusion region by using the pixel information and the depth information of the background region, where the position of the occlusion region corresponds to the position of the foreground region on the two-dimensional image; and generating a three-dimensional image corresponding to the two-dimensional image by combining the pixel information and the depth information of the occlusion region.
According to a second aspect of the present disclosure, there is provided an image processing apparatus comprising: a semantic segmentation module, configured to acquire a two-dimensional image, perform semantic segmentation on the two-dimensional image, and determine a foreground region and a background region of the two-dimensional image; a depth determining module, configured to determine the depth information of the background region; an occlusion information determining module, configured to determine pixel information and depth information of an occlusion region by using the pixel information and the depth information of the background region, where the position of the occlusion region corresponds to the position of the foreground region on the two-dimensional image; and a three-dimensional image generation module, configured to generate a three-dimensional image corresponding to the two-dimensional image by combining the pixel information and the depth information of the occlusion region.
According to a third aspect of the present disclosure, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the image processing method described above.
According to a fourth aspect of the present disclosure, there is provided an electronic device comprising a processor; a memory for storing one or more programs which, when executed by the processor, cause the processor to implement the image processing method described above.
In the technical solutions provided by some embodiments of the present disclosure, a two-dimensional image is subjected to semantic segmentation to obtain a foreground region and a background region, depth information of the background region is determined, pixel information and depth information of an occlusion region are determined by using the pixel information and depth information of the background region, and a three-dimensional image is then generated by using the pixel information and depth information of the occlusion region. First, the method can convert a two-dimensional image into a three-dimensional image, improving both the stereoscopic impression of image display and the visual effect; second, in edutainment scenarios, the method can fully present the information of the images, so that users can understand the image content more easily; third, the scheme can be applied to augmented reality or virtual reality technology to construct different types of application scenarios, improving users' sense of perception and participation.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. It is to be understood that the drawings in the following description are merely exemplary of the disclosure, and that other drawings may be derived from those drawings by one of ordinary skill in the art without the exercise of inventive faculty. In the drawings:
FIG. 1 shows a schematic diagram of an exemplary system architecture for an image processing scheme of an embodiment of the present disclosure;
FIG. 2 illustrates a schematic structural diagram of an electronic device suitable for use in implementing embodiments of the present disclosure;
FIG. 3 schematically shows a flow chart of an image processing method according to an exemplary embodiment of the present disclosure;
FIG. 4 schematically illustrates an effect graph of semantic segmentation according to an embodiment of the present disclosure;
FIG. 5 schematically illustrates a network architecture diagram for determining pixel information and depth information for an occlusion region using a neural network;
FIG. 6 schematically shows a flow diagram of an overall image processing procedure according to an embodiment of the disclosure;
fig. 7 schematically shows a block diagram of an image processing apparatus according to an exemplary embodiment of the present disclosure;
fig. 8 schematically shows a block diagram of an image processing apparatus according to another exemplary embodiment of the present disclosure;
fig. 9 schematically shows a block diagram of an image processing apparatus according to yet another exemplary embodiment of the present disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the subject matter of the present disclosure can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and the like. In other instances, well-known technical solutions have not been shown or described in detail to avoid obscuring aspects of the present disclosure.
Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the steps. For example, some steps may be decomposed, and some steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
With the development of terminal technology and camera technology, users have ever higher requirements for images. Two-dimensional images in a photo album look flat and lack a stereoscopic impression; if a two-dimensional image is converted into a three-dimensional image, the image content can become richer, and both entertainment value and user experience can be improved.
In an exemplary embodiment of the disclosure, a semantic segmentation technology and a depth estimation technology are combined, so that a two-dimensional image can be converted into a three-dimensional image, and a two-dimensional photo album can be further converted into a three-dimensional photo album. In some scenarios, animation can also be formed using three-dimensional images for the purpose of edutainment. In addition, the generated three-dimensional image can be applied to an augmented reality scene or a virtual reality scene, and the application range of the generated three-dimensional image is not limited by the disclosure.
FIG. 1 shows a schematic diagram of an exemplary system architecture for an image processing scheme of an embodiment of the present disclosure.
As shown in fig. 1, the system architecture 1000 may include one or more of terminal devices 1001, 1002, 1003, a network 1004, and a server 1005. The network 1004 is used to provide a medium for communication links between the terminal devices 1001, 1002, 1003 and the server 1005. Network 1004 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation. For example, the server 1005 may be a server cluster composed of a plurality of servers.
A user may use the terminal devices 1001, 1002, 1003 to interact with a server 1005 via a network 1004 to receive or transmit messages or the like. The terminal devices 1001, 1002, 1003 may be various electronic devices having a display screen, including but not limited to smart phones, tablet computers, portable computers, desktop computers, and the like.
In an example in which the image processing scheme of the exemplary embodiments of the present disclosure is implemented only by the terminal devices 1001, 1002, and 1003, when a terminal device determines a two-dimensional image that needs to be converted into a three-dimensional image, it may first, on the one hand, perform semantic segmentation on the two-dimensional image to determine its foreground region and background region, and, on the other hand, perform depth estimation on the two-dimensional image to obtain depth information of each pixel and thereby determine depth information of the background region. Next, the terminal device may determine pixel information and depth information of an occlusion region by using the pixel information and depth information of the background region, where the position of the occlusion region corresponds to the position of the foreground region on the two-dimensional image. Then, a three-dimensional image corresponding to the two-dimensional image is generated by combining the pixel information and the depth information of the occlusion region.
In this case, an image processing apparatus described below may be configured in the terminal devices 1001, 1002, 1003.
The image processing scheme of the present disclosure may also be performed by the server 1005. First, the server 1005 acquires a two-dimensional image from the terminal devices 1001, 1002, 1003 via the network 1004, or the server 1005 may acquire a two-dimensional image from another server or storage device; next, the server 1005 may perform semantic segmentation on the two-dimensional image to determine a foreground region and a background region of the two-dimensional image, and may further perform depth estimation on the two-dimensional image and determine depth information of the background region based on a result of the depth estimation; subsequently, the server 1005 may determine the pixel information and the depth information of the occlusion region using the pixel information and the depth information of the background region, and generate a three-dimensional image corresponding to the two-dimensional image in combination with the pixel information and the depth information of the occlusion region. Further, the server 1005 may also generate a three-dimensional album using the three-dimensional image and/or transmit the three-dimensional image to the terminal devices 1001, 1002, 1003.
In this case, an image processing apparatus described below may be configured in the server 1005.
Fig. 2 shows a schematic diagram of an electronic device suitable for implementing exemplary embodiments of the present disclosure, which may be configured in the form of the electronic device shown in fig. 2. It should be further noted that the electronic device shown in fig. 2 is only an example, and should not bring any limitation to the functions and the scope of the embodiments of the present disclosure.
The electronic device of the present disclosure includes at least a processor and a memory for storing one or more programs, which when executed by the processor, cause the processor to implement the image processing method of the exemplary embodiments of the present disclosure.
Specifically, as shown in fig. 2, the electronic device 200 may include: a processor 210, an internal memory 221, an external memory interface 222, a Universal Serial Bus (USB) interface 230, a charging management module 240, a power management module 241, a battery 242, an antenna 1, an antenna 2, a mobile communication module 250, a wireless communication module 260, an audio module 270, a speaker 271, a receiver 272, a microphone 273, an earphone interface 274, a sensor module 280, a display screen 290, a camera module 291, an indicator 292, a motor 293, keys 294, a Subscriber Identity Module (SIM) card interface 295, and the like. The sensor module 280 may include a depth sensor, a pressure sensor, a gyroscope sensor, an air pressure sensor, a magnetic sensor, an acceleration sensor, a distance sensor, a proximity light sensor, a fingerprint sensor, a temperature sensor, a touch sensor, an ambient light sensor, a bone conduction sensor, and the like.
It is to be understood that the illustrated structure of the embodiment of the present application does not specifically limit the electronic device 200. In other embodiments of the present application, the electronic device 200 may include more or fewer components than shown, or combine certain components, or split certain components, or a different arrangement of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
Processor 210 may include one or more processing units, such as: the Processor 210 may include an Application Processor (AP), a modem Processor, a Graphics Processor (GPU), an Image Signal Processor (ISP), a controller, a video codec, a Digital Signal Processor (DSP), a baseband Processor, and/or a Neural Network Processor (NPU), and the like. The different processing units may be separate devices or may be integrated into one or more processors. Additionally, a memory may be provided in processor 210 for storing instructions and data.
The USB interface 230 is an interface conforming to the USB standard specification, and may specifically be a Mini USB interface, a Micro USB interface, a USB Type-C interface, or the like. The USB interface 230 may be used to connect a charger to charge the electronic device 200, and may also be used to transmit data between the electronic device 200 and a peripheral device. It can also be used to connect earphones and play audio through them. The interface may also be used to connect other electronic devices, such as AR devices.
The charge management module 240 is configured to receive a charging input from a charger. The charger may be a wireless charger or a wired charger. The power management module 241 is used for connecting the battery 242, the charging management module 240 and the processor 210. The power management module 241 receives the input of the battery 242 and/or the charging management module 240, and supplies power to the processor 210, the internal memory 221, the display screen 290, the camera module 291, the wireless communication module 260, and the like.
The wireless communication function of the electronic device 200 may be implemented by the antenna 1, the antenna 2, the mobile communication module 250, the wireless communication module 260, a modem processor, a baseband processor, and the like.
The mobile communication module 250 may provide a solution including 2G/3G/4G/5G wireless communication applied on the electronic device 200.
The Wireless Communication module 260 may provide a solution for Wireless Communication applied to the electronic device 200, including Wireless Local Area Networks (WLANs) (e.g., Wireless Fidelity (Wi-Fi) network), Bluetooth (BT), Global Navigation Satellite System (GNSS), Frequency Modulation (FM), Near Field Communication (NFC), Infrared (IR), and the like.
The electronic device 200 implements a display function through the GPU, the display screen 290, the application processor, and the like. The GPU is a microprocessor for image processing, and is connected to the display screen 290 and an application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. Processor 210 may include one or more GPUs that execute program instructions to generate or alter display information.
The electronic device 200 may implement a shooting function through the ISP, the camera module 291, the video codec, the GPU, the display screen 290, the application processor, and the like. In some embodiments, the electronic device 200 may include 1 or N camera modules 291, where N is a positive integer greater than 1, and if the electronic device 200 includes N cameras, one of the N cameras is a main camera.
Internal memory 221 may be used to store computer-executable program code, including instructions. The internal memory 221 may include a program storage area and a data storage area. The external memory interface 222 may be used to connect an external memory card, such as a Micro SD card, to extend the memory capability of the electronic device 200.
The electronic device 200 may implement an audio function through the audio module 270, the speaker 271, the receiver 272, the microphone 273, the headphone interface 274, the application processor, and the like. Such as music playing, recording, etc.
Audio module 270 is used to convert digital audio information into an analog audio signal output and also to convert an analog audio input into a digital audio signal. Audio module 270 may also be used to encode and decode audio signals. In some embodiments, the audio module 270 may be disposed in the processor 210, or some functional modules of the audio module 270 may be disposed in the processor 210.
The speaker 271, also called a "loudspeaker", is used to convert an audio electrical signal into a sound signal. The electronic device 200 can play music or handle hands-free calls through the speaker 271. The receiver 272, also called an "earpiece", is used to convert an audio electrical signal into an acoustic signal. When the electronic device 200 receives a call or a voice message, the voice can be heard by holding the receiver 272 close to the ear. The microphone 273, also known as a "mic", is used to convert sound signals into electrical signals. When making a call or sending a voice message, the user can speak close to the microphone 273 to input a voice signal. The electronic device 200 may be provided with at least one microphone 273. The earphone interface 274 is used to connect wired earphones.
For sensors that the sensor module 280 may include in the electronic device 200, a depth sensor is used to obtain depth information of a scene. The pressure sensor is used for sensing a pressure signal and converting the pressure signal into an electric signal. The gyro sensor may be used to determine the motion pose of the electronic device 200. The air pressure sensor is used for measuring air pressure. The magnetic sensor includes a hall sensor. The electronic device 200 may detect the opening and closing of the flip holster using a magnetic sensor. The acceleration sensor may detect the magnitude of acceleration of the electronic device 200 in various directions (typically three axes). The distance sensor is used for measuring distance. The proximity light sensor may include, for example, a Light Emitting Diode (LED) and a light detector, such as a photodiode. The fingerprint sensor is used for collecting fingerprints. The temperature sensor is used for detecting temperature. The touch sensor can communicate the detected touch operation to the application processor to determine the touch event type. Visual output related to the touch operation may be provided through the display screen 290. The ambient light sensor is used for sensing the ambient light brightness. The bone conduction sensor may acquire a vibration signal.
The keys 294 include a power-on key, a volume key, and the like. The keys 294 may be mechanical keys or touch keys. The motor 293 may generate a vibration prompt. The motor 293 may be used both for vibration prompts and for touch vibration feedback. The indicator 292 may be an indicator light that may be used to indicate a charging state or a change in battery level, or to indicate a message, a missed call, a notification, and the like. The SIM card interface 295 is used to connect a SIM card. The electronic device 200 interacts with the network through the SIM card to implement functions such as calls and data communication.
The present application also provides a computer-readable storage medium, which may be contained in the electronic device described in the above embodiments; or may exist separately without being assembled into the electronic device.
A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable storage medium may transmit, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable storage medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The computer-readable storage medium carries one or more programs which, when executed by an electronic device, cause the electronic device to implement the method as described in the embodiments below.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software, or may be implemented by hardware, and the described units may also be disposed in a processor. Wherein the names of the elements do not in some way constitute a limitation on the elements themselves.
The following description takes the case where the terminal device executes the image processing scheme of the present disclosure as an example.
Fig. 3 schematically shows a flowchart of an image processing method of an exemplary embodiment of the present disclosure. Referring to fig. 3, the image processing method may include the steps of:
S32, acquiring a two-dimensional image, performing semantic segmentation on the two-dimensional image, and determining a foreground region and a background region of the two-dimensional image.
In an exemplary embodiment of the present disclosure, the two-dimensional image may be an image captured by a camera module of the terminal device, or may be an image acquired from another device or a server, and the present disclosure does not limit the format, size, source, and the like of the two-dimensional image.
The two-dimensional images may be stored in a two-dimensional album from which the user may sort out the two-dimensional images to be three-dimensionally transformed to perform the steps of the disclosed scheme. The terminal device can also classify the two-dimensional images in the photo album according to time sequence, shooting places and the like, and the scheme of converting the two-dimensional images into the three-dimensional images is executed according to categories.
In other embodiments, each time the terminal device captures a two-dimensional image, it executes the scheme of the present disclosure to obtain a corresponding three-dimensional image.
After the terminal equipment acquires the two-dimensional image to be subjected to three-dimensional conversion, semantic segmentation can be performed on the two-dimensional image. Semantic segmentation refers to classification at the pixel level, where pixels belonging to the same class are classified into one class.
According to some embodiments of the present disclosure, semantic segmentation of a two-dimensional image may be implemented using a semantic segmentation model, which may be implemented based on a deep neural network. Firstly, a semantic segmentation model can be trained by using a training data set, then, a two-dimensional image is input into the trained semantic segmentation model, and a foreground region and a background region of the two-dimensional image can be obtained according to the output of the model. For example, the foreground region may include objects corresponding to the user interest points, such as people, animals, and cars, and the background region may correspond to the background where the people, animals, and cars are located, such as grassland, trees, sky, and the like.
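As a concrete illustration only (the disclosure does not mandate any particular segmentation model), the sketch below runs a publicly available pre-trained DeepLabV3 network and treats every pixel that is not assigned the background class as foreground; the model choice, the preprocessing, and the VOC label convention are all assumptions made for the example.

```python
import torch
import torchvision
from torchvision import transforms
from PIL import Image

def segment_foreground_background(image_path):
    """Return boolean foreground/background masks for a two-dimensional image."""
    model = torchvision.models.segmentation.deeplabv3_resnet50(weights="DEFAULT")
    model.eval()
    preprocess = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])
    image = Image.open(image_path).convert("RGB")
    x = preprocess(image).unsqueeze(0)           # 1 x 3 x H x W
    with torch.no_grad():
        logits = model(x)["out"]                 # 1 x C x H x W class scores
    labels = logits.argmax(dim=1)[0]             # H x W per-pixel class labels
    foreground_mask = labels != 0                # class 0 is "background" (VOC convention)
    background_mask = ~foreground_mask
    return foreground_mask, background_mask
```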
The implementation of semantic segmentation is not specifically limited by the present disclosure, however, it should be noted that the scheme of applying the concept of semantic segmentation to the conversion of two-dimensional images into three-dimensional images is included in the present disclosure.
Fig. 4 schematically shows an effect diagram of semantic segmentation according to an embodiment of the present disclosure. Referring to fig. 4, after semantic segmentation is performed on the two-dimensional image 40, a background region 41 and a foreground region 42 can be obtained.
S34, determining the depth information of the background region.
After the terminal device acquires the two-dimensional image, depth estimation can be performed on the two-dimensional image. Depth estimation is to determine depth information of each pixel point on a two-dimensional image.
According to some embodiments of the present disclosure, depth estimation of a two-dimensional image may be implemented using a depth estimation model, which may also be implemented based on a neural network. Firstly, a depth estimation model can be trained by utilizing a large number of images with pixel-level depth labels to obtain a trained depth estimation model; next, the two-dimensional image may be input into the trained depth estimation model, and a result of depth estimation of the two-dimensional image, that is, depth information of the two-dimensional image, may be obtained according to an output of the model.
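For illustration only, the following sketch obtains a per-pixel depth map with a publicly available monocular depth model loaded through torch.hub; the disclosure merely requires some depth estimation model trained on images with pixel-level depth labels, so the specific model and the input preparation assumed here are example choices rather than part of the scheme.

```python
import torch
import torch.nn.functional as F

def estimate_depth(image_tensor):
    """image_tensor: 1 x 3 x H x W tensor prepared with the model's own input
    transform (an assumption here); returns an H x W relative depth map."""
    model = torch.hub.load("intel-isl/MiDaS", "MiDaS_small")
    model.eval()
    with torch.no_grad():
        prediction = model(image_tensor)                    # 1 x H' x W'
        depth = F.interpolate(prediction.unsqueeze(1),      # resize back to input size
                              size=image_tensor.shape[-2:],
                              mode="bicubic",
                              align_corners=False).squeeze()
    return depth
```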
It should be noted that the present disclosure does not limit the order of the process of performing the depth estimation and the process of performing the semantic segmentation at step S32. That is, the process of semantic segmentation may be performed first and then the process of depth estimation, the process of depth estimation may be performed first and then the process of semantic segmentation, or the processes of semantic segmentation and depth estimation may be performed simultaneously.
After depth estimation is performed on the two-dimensional image, depth information of the background area may be determined based on the result of the depth estimation.
For example, after the background region of the two-dimensional image is determined, the coordinates of the background region may be obtained. Next, depth information of the background region may be determined from the depth information of the two-dimensional image using the coordinates of the background region.
Similarly, the terminal device may also determine depth information of the foreground region.
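Continuing the earlier sketches (the variable names are assumptions carried over from them), the region-wise depth information then follows directly by indexing the depth map with the segmentation masks:

```python
# depth: H x W depth map from the depth estimation sketch above;
# foreground_mask / background_mask: H x W boolean masks from semantic segmentation.
background_depth_values = depth[background_mask]
foreground_depth_values = depth[foreground_mask]

# A single representative value per region, e.g. the mean depth, can then be
# used in later steps such as the depth-difference check.
background_depth_mean = background_depth_values.mean().item()
foreground_depth_mean = foreground_depth_values.mean().item()
```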
In addition, in other embodiments of the present disclosure, before performing depth estimation on the two-dimensional image, whether a target object is included in the foreground region may also be identified. If the foreground region contains the target object, depth estimation is performed on the two-dimensional image; if the foreground region does not contain the target object, the two-dimensional image is not processed.
The target object may be set by the user in advance, for example, in the case where the user desires to perform three-dimensional image conversion only on an image containing a person (or a specific person such as the person himself) in the two-dimensional album, the user may set the target object as the person. Specifically, the setting function may be configured in the album, and the user may set the target object by sliding, clicking, checking, and the like. By adding the setting function in the photo album, the requirements of different users can be met.
Specifically, in the process of identifying whether the foreground region contains the target object, under the condition that the semantic segmentation algorithm can directly determine the object type contained in the segmented region, whether the foreground region contains the target object can be directly determined according to the result of the semantic segmentation.
In the case that the semantic segmentation algorithm cannot directly determine the type of the object included in the segmented region, the foreground region may be additionally identified to obtain a result of whether the foreground region includes the target object. The process of image recognition on the foreground region can also be implemented in a neural network manner, which is not limited by the present disclosure.
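One simple way to realise this check, sketched below under the assumptions that the segmentation model yields per-pixel class labels (as in the earlier sketch) and that the VOC label convention applies, is to look the target class up directly in the label map:

```python
PERSON_CLASS_ID = 15  # "person" index in the PASCAL VOC label set (assumed convention)

def foreground_contains_target(labels, foreground_mask, target_class=PERSON_CLASS_ID):
    """labels: H x W per-pixel class labels; foreground_mask: H x W boolean mask."""
    return bool((labels[foreground_mask] == target_class).any())
```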
The process of determining the depth information of the background area is described above by taking depth estimation of the two-dimensional image as an example. However, in other embodiments of the present disclosure, a depth sensor may be configured on the terminal device, and when a two-dimensional image is captured, the depth information of the two-dimensional image may be directly obtained by the depth sensor, so that the depth information of the background area may be directly determined.
S36, determining pixel information and depth information of an occlusion region by using the pixel information and the depth information of the background region; the position of the occlusion region corresponds to the position of the foreground region on the two-dimensional image.
In an exemplary embodiment of the present disclosure, an occlusion region refers to a region where a foreground region occludes a background. The position of the occlusion region corresponds to the position of the foreground region on the two-dimensional image, that is, the occlusion region may be an image region missing from the two-dimensional image after the foreground region is removed from the two-dimensional image, that is, the occlusion region is located at a position corresponding to the foreground region. Referring to fig. 4, the occlusion region is a region occluded by a puppy.
Under the condition that the mobile terminal has determined the pixel information and the depth information of the background region, the pixel information and the depth information of the occlusion region can be predicted.
First, the pixel information and the depth information of the background region may be feature extracted to generate intermediate information. Next, in one aspect, a pixel information prediction process may be performed on the intermediate information to determine pixel information for the occlusion region; in another aspect, a depth information prediction process may be performed on the intermediate information to determine depth information for the occlusion region.
Specifically, the pixel information prediction process may be implemented by one Convolutional Neural Network (CNN), and the depth information prediction process may be implemented by another convolutional neural network.
Referring to fig. 5, first, pixel information and depth information of a background region may be input to the first neural network 51 for feature extraction, generating intermediate information. Specifically, the first neural network 51 may be configured by using a VGG16 network, and the first neural network 51 may also be configured by using a CNN network, which is not limited by the present disclosure.
Next, in one aspect, the intermediate information may be input to a second neural network 52, where the second neural network 52 may be a CNN network, to predict pixel information of the occlusion region and output the pixel information of the occlusion region.
On the other hand, the intermediate information may be input to a third neural network 53, which may be another CNN network, to predict the depth information of the occlusion region and output the depth information of the occlusion region.
The network structure and the training process of the neural network involved in fig. 5 are not limited by the present disclosure.
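Since neither the network structure nor the training process is fixed by the disclosure, the PyTorch sketch below is just one illustrative way to wire up the three-network layout of fig. 5: a shared feature extractor followed by a pixel-prediction branch and a depth-prediction branch. All layer and channel sizes are assumptions.

```python
import torch
import torch.nn as nn

class OcclusionPredictor(nn.Module):
    """Feature extractor plus two prediction branches, mirroring the layout of fig. 5."""

    def __init__(self):
        super().__init__()
        # First network: extracts intermediate information from the concatenated
        # background pixel information (3 channels) and depth information (1 channel).
        self.encoder = nn.Sequential(
            nn.Conv2d(4, 64, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(inplace=True),
        )
        # Second network: predicts pixel information (RGB) of the occlusion region.
        self.pixel_head = nn.Sequential(
            nn.Conv2d(128, 64, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 3, kernel_size=1),
        )
        # Third network: predicts depth information of the occlusion region.
        self.depth_head = nn.Sequential(
            nn.Conv2d(128, 64, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 1, kernel_size=1),
        )

    def forward(self, background_rgb, background_depth):
        x = torch.cat([background_rgb, background_depth], dim=1)  # N x 4 x H x W
        features = self.encoder(x)                                # intermediate information
        occluded_rgb = self.pixel_head(features)
        occluded_depth = self.depth_head(features)
        return occluded_rgb, occluded_depth
```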
In addition, for some two-dimensional images the depth difference between the foreground region and the background region is small, and it is not worth spending resources on converting them into three-dimensional images. Therefore, before the pixel information and the depth information of the occlusion region are determined, a step of determining the depth difference between the foreground region and the background region may also be included.
First, the terminal device may determine the depth information of the foreground region; next, a depth difference between the foreground region and the background region is determined based on the depth information of the foreground region and the depth information of the background region; the depth difference is then compared with a depth threshold. The depth threshold may be set in advance, for example, to 10 cm or 0.5 m.
If the depth difference is greater than a depth threshold, a process of determining pixel information and depth information of the occlusion region is performed. If the depth difference is not larger than the depth threshold, the processing procedure of the scheme is stopped, and a prompt of 'conversion is not recommended because the depth difference is small' can be fed back to the user.
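A minimal sketch of this gating step follows; the 0.5 m threshold is one of the example values mentioned above, and the per-region mean depths are the assumed representative values from the earlier sketch.

```python
DEPTH_THRESHOLD = 0.5  # metres; one of the example values mentioned in the text

def should_convert(foreground_depth_mean, background_depth_mean,
                   threshold=DEPTH_THRESHOLD):
    depth_difference = abs(background_depth_mean - foreground_depth_mean)
    if depth_difference > threshold:
        return True   # proceed with occlusion prediction and 3D conversion
    # Depth difference too small: skip the conversion and prompt the user instead.
    return False
```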
S38, generating a three-dimensional image corresponding to the two-dimensional image by combining the pixel information and the depth information of the occlusion region.
Firstly, depth information of a foreground region can be determined based on a depth estimation result, and pixel information of the foreground region is obtained; next, a three-dimensional image corresponding to the two-dimensional image may be generated in combination with the pixel information and the depth information of the occlusion region and the pixel information and the depth information of the foreground region.
In some embodiments of the present disclosure, the three-dimensional image of the present disclosure may be an image of the same size as the two-dimensional image on the two-dimensional plane.
In this case, the process of generating the three-dimensional image needs to utilize the pixel information and the depth information of the background region in addition to the pixel information and the depth information of the occlusion region and the pixel information and the depth information of the foreground region.
In other embodiments of the present disclosure, the three-dimensional image of the present disclosure may be a three-dimensional image for only the foreground region. In the case of the two-dimensional image as shown in fig. 4, the generated three-dimensional image may be a three-dimensional image including only the puppy without including the background region.
Specifically, a three-dimensional image of an object corresponding to the foreground region may be generated as a three-dimensional image corresponding to the two-dimensional image by using the pixel information and the depth information of the occlusion region and the pixel information and the depth information of the foreground region.
It should be understood that the process of generating a three-dimensional image includes a process of three-dimensional rendering. In addition, because a three-dimensional image is used, the occlusion relationship among objects in the image can be mapped for different viewing angles, and viewing effects under different viewing angles can be obtained from this occlusion relationship. On this basis, a three-dimensional animation can be generated so that a user can view the three-dimensional image from different angles.
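The disclosure does not prescribe a particular rendering pipeline; as one illustrative preparation step, the sketch below back-projects an image and its depth map into a coloured point cloud under an assumed pinhole camera model (fx, fy, cx and cy are assumed intrinsics), which a renderer could then display from different viewing angles.

```python
import numpy as np

def depth_to_point_cloud(rgb, depth, fx, fy, cx, cy):
    """rgb: H x W x 3 array, depth: H x W array; returns (N, 3) points and (N, 3) colours."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))   # pixel coordinates
    z = depth
    x = (u - cx) * z / fx                            # back-project with the pinhole model
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    colours = rgb.reshape(-1, 3)
    return points, colours
```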
The entire image processing procedure of the embodiment of the present disclosure will be explained with reference to fig. 6.
In step S602, the terminal device may acquire a two-dimensional image; in step S604, the terminal device may perform semantic segmentation on the two-dimensional image; in step S606, the terminal device may perform depth estimation on the two-dimensional image.
Based on the result of the semantic segmentation at step S604, a foreground region may be determined at step S608, and a background region may be determined at step S610. Based on the result of the depth estimation at step S606, a depth value (i.e., depth information) of each pixel on the two-dimensional image may be determined at step S612.
In step S614, pixel estimation and depth estimation may be performed on the occlusion part according to the pixel information of the background area and the depth information of the background area.
For the neural network based pixel estimation process, in step S616, pixel information of the occlusion portion may be determined.
For another neural network based depth estimation process, in step S618, depth information of the occlusion portion may be determined.
In step S620, three-dimensional rendering is performed by combining the pixel information and depth information of the occlusion part with the information of the foreground region.
In step S622, the terminal device may output the rendered three-dimensional image. In addition, a three-dimensional animation may be generated for display, and a three-dimensional photo album may be generated based on the three-dimensional images; specifically, the three-dimensional photo album may be configured as a cloud album, so that the storage space of the terminal device is saved.
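Purely to make the sequencing of the figure concrete, the compact driver below chains together the hypothetical helper functions introduced in the earlier sketches; none of these names or interfaces are defined by the disclosure, and an untrained OcclusionPredictor is instantiated only to show where the prediction step sits in the flow.

```python
def convert_to_three_dimensional(image_path, image_tensor, rgb, intrinsics):
    # S604/S608/S610: semantic segmentation into foreground and background masks
    foreground_mask, background_mask = segment_foreground_background(image_path)
    # S606/S612: per-pixel depth estimation
    depth = estimate_depth(image_tensor)
    # S614-S618: predict pixel and depth information of the occluded part
    predictor = OcclusionPredictor()
    occluded_rgb, occluded_depth = predictor(
        image_tensor * background_mask,        # background pixel information
        depth[None, None] * background_mask)   # background depth information
    # S620/S622: three-dimensional rendering, here sketched as a coloured point cloud
    fx, fy, cx, cy = intrinsics
    points, colours = depth_to_point_cloud(rgb, depth.numpy(), fx, fy, cx, cy)
    return points, colours, occluded_rgb, occluded_depth
```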
It should be noted that although the various steps of the methods of the present disclosure are depicted in the drawings in a particular order, this does not require or imply that these steps must be performed in this particular order, or that all of the depicted steps must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions, etc.
Further, an image processing apparatus is also provided in the present exemplary embodiment.
Fig. 7 schematically shows a block diagram of an image processing apparatus of an exemplary embodiment of the present disclosure. Referring to fig. 7, the image processing apparatus 7 according to an exemplary embodiment of the present disclosure may include a semantic segmentation module 71, a depth determination module 73, an occlusion information determination module 75, and a three-dimensional image generation module 77.
Specifically, the semantic segmentation module 71 may be configured to obtain a two-dimensional image, perform semantic segmentation on the two-dimensional image, and determine a foreground region and a background region of the two-dimensional image; the depth determination module 73 may be used to determine depth information of the background region; the occlusion information determining module 75 may be configured to determine pixel information and depth information of an occlusion region by using the pixel information and depth information of the background region, where the position of the occlusion region corresponds to the position of the foreground region on the two-dimensional image; the three-dimensional image generation module 77 may be configured to generate a three-dimensional image corresponding to the two-dimensional image in combination with the pixel information and the depth information of the occlusion region.
Based on the image processing apparatus of the exemplary embodiments of the present disclosure, first, a two-dimensional image can be converted into a three-dimensional image, improving both the stereoscopic impression of image display and the visual effect; second, in edutainment scenarios, the information of the images can be fully presented, so that users can understand the image content more easily; third, the scheme can be applied to augmented reality or virtual reality technology to construct different types of application scenarios, improving users' sense of perception and participation.
According to an exemplary embodiment of the present disclosure, the occlusion information determination module 75 may be configured to perform: extracting the characteristics of the pixel information and the depth information of the background area to generate intermediate information; performing a pixel information prediction process on the intermediate information to determine pixel information of the occlusion region; a depth information prediction process is performed on the intermediate information to determine depth information for the occlusion region.
According to an exemplary embodiment of the present disclosure, referring to fig. 8, the image processing apparatus 8 may further include a depth difference comparing module 81 compared to the image processing apparatus 7.
In particular, the depth difference comparison module 81 may be configured to perform: determining depth information of a foreground region; determining a depth difference between the foreground region and the background region based on the depth information of the foreground region and the depth information of the background region; comparing the depth difference to a depth threshold; wherein if the depth difference is greater than the depth threshold, the occlusion information determination module 75 is controlled to perform a process of determining pixel information and depth information of the occlusion region.
According to an exemplary embodiment of the present disclosure, the three-dimensional image generation module 77 may be configured to perform: acquiring pixel information and depth information of the foreground region; and generating a three-dimensional image corresponding to the two-dimensional image by combining the pixel information and the depth information of the occlusion region with the pixel information and the depth information of the foreground region.
According to an exemplary embodiment of the present disclosure, the process of generating the three-dimensional image by the three-dimensional image generation module 77 may be configured to perform: generating a three-dimensional image of the object corresponding to the foreground region as the three-dimensional image corresponding to the two-dimensional image by using the pixel information and the depth information of the occlusion region and the pixel information and the depth information of the foreground region.
According to an exemplary embodiment of the present disclosure, the process of generating the three-dimensional image by the three-dimensional image generation module 77 may be further configured to perform: generating a three-dimensional image corresponding to the two-dimensional image by using the pixel information and the depth information of the occlusion region, the pixel information and the depth information of the foreground region, and the pixel information and the depth information of the background region.
According to an exemplary embodiment of the present disclosure, the depth determination module 73 may be configured to perform: depth estimation is performed on the two-dimensional image, and depth information of the background area is determined based on the result of the depth estimation.
According to an exemplary embodiment of the present disclosure, referring to fig. 9, the image processing apparatus 9 may further include an object recognition module 91 compared to the image processing apparatus 7.
In particular, the object recognition module 91 may be configured to perform: identifying whether a target object is contained in the foreground area; wherein if the foreground region contains the target object, the depth determination module 73 is controlled to perform a process of depth estimation on the two-dimensional image.
It should be understood that the object recognition module 91 may also be configured in the image processing apparatus 8 described above. Similarly, the depth difference comparing module 81 included in the image processing apparatus 8 may also be configured in the image processing apparatus 9.
Since each functional module of the image processing apparatus according to the embodiment of the present disclosure is the same as that in the embodiment of the method described above, it is not described herein again.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, a terminal device, or a network device, etc.) to execute the method according to the embodiments of the present disclosure.
Furthermore, the above-described figures are merely schematic illustrations of processes included in methods according to exemplary embodiments of the present disclosure, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, e.g., in multiple modules.
It should be noted that although in the above detailed description several modules or units of the device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit, according to embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into embodiments by a plurality of modules or units.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is to be limited only by the terms of the appended claims.

Claims (11)

1. An image processing method, comprising:
acquiring a two-dimensional image, performing semantic segmentation on the two-dimensional image, and determining a foreground area and a background area of the two-dimensional image;
determining depth information of the background area;
determining pixel information and depth information of an occlusion region by using the pixel information and the depth information of the background area; wherein the position of the occlusion region corresponds to the position of the foreground region on the two-dimensional image;
and generating a three-dimensional image corresponding to the two-dimensional image by combining the pixel information and the depth information of the occlusion region.
2. The image processing method according to claim 1, wherein determining pixel information and depth information of an occlusion region using the pixel information and depth information of the background region comprises:
extracting features from the pixel information and the depth information of the background region to generate intermediate information;
performing a pixel information prediction process on the intermediate information to determine pixel information of the occlusion region;
performing a depth information prediction process on the intermediate information to determine depth information of the occlusion region.
3. The image processing method according to claim 2, wherein before determining the pixel information and the depth information of the occlusion region, the image processing method further comprises:
determining depth information of the foreground region;
determining a depth difference between the foreground region and the background region based on the depth information of the foreground region and the depth information of the background region;
comparing the depth difference to a depth threshold;
wherein if the depth difference is greater than the depth threshold, the process of determining the pixel information and the depth information of the occlusion region is performed.
4. The image processing method according to claim 1, wherein generating a three-dimensional image corresponding to the two-dimensional image by combining the pixel information and the depth information of the occlusion region comprises:
acquiring pixel information and depth information of the foreground region;
and generating a three-dimensional image corresponding to the two-dimensional image by combining the pixel information and the depth information of the occlusion region and the pixel information and the depth information of the foreground region.
5. The image processing method according to claim 4, wherein generating a three-dimensional image corresponding to the two-dimensional image by combining pixel information and depth information of the occlusion region and pixel information and depth information of the foreground region comprises:
and generating a three-dimensional image of an object corresponding to the foreground region, as the three-dimensional image corresponding to the two-dimensional image, by using the pixel information and the depth information of the occlusion region and the pixel information and the depth information of the foreground region.
6. The image processing method according to claim 4, wherein generating a three-dimensional image corresponding to the two-dimensional image by combining pixel information and depth information of the occlusion region and pixel information and depth information of the foreground region comprises:
and generating a three-dimensional image corresponding to the two-dimensional image by using the pixel information and the depth information of the occlusion region, the pixel information and the depth information of the foreground region, and the pixel information and the depth information of the background region.
7. The image processing method of claim 1, wherein determining depth information for the background region comprises:
performing depth estimation on the two-dimensional image, and determining the depth information of the background region based on the depth estimation result.
8. The image processing method according to claim 7, wherein before the depth estimation of the two-dimensional image, the image processing method further comprises:
identifying whether a target object is contained in the foreground region;
wherein if the foreground region contains a target object, depth estimation is performed on the two-dimensional image.
9. An image processing apparatus, comprising:
a semantic segmentation module, for acquiring a two-dimensional image, performing semantic segmentation on the two-dimensional image, and determining a foreground region and a background region of the two-dimensional image;
a depth determination module for determining depth information of the background region;
an occlusion information determination module, for determining pixel information and depth information of an occlusion region by using the pixel information and the depth information of the background region; wherein the position of the occlusion region corresponds to the position of the foreground region on the two-dimensional image;
and a three-dimensional image generation module, for generating a three-dimensional image corresponding to the two-dimensional image by combining the pixel information and the depth information of the occlusion region.
10. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the image processing method according to any one of claims 1 to 8.
11. An electronic device, comprising:
a processor;
a memory for storing one or more programs which, when executed by the processor, cause the processor to implement the image processing method of any one of claims 1 to 8.
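For orientation, the sketch below traces the pipeline recited in claims 1 to 8: semantic segmentation splits the two-dimensional image into foreground and background regions, per-pixel depth is estimated, and, when the foreground-to-background depth difference exceeds a threshold, pixel and depth information for the occlusion region hidden behind the foreground is predicted from the visible background before the layers are combined into a three-dimensional representation. This is a minimal, non-normative illustration, not the claimed implementation: the segmentation, depth-estimation, and occlusion-filling functions are trivial placeholders standing in for the learned models implied by the claims, and all identifiers (two_d_to_layered_three_d, fill_occluded, the 0.5 depth threshold, and so on) are hypothetical.

```python
import numpy as np

def segment_foreground(image: np.ndarray) -> np.ndarray:
    """Placeholder for semantic segmentation (claim 1): bright pixels are
    treated as foreground. A real system would use a trained network."""
    return image.mean(axis=-1) > 0.6  # boolean mask, True = foreground

def estimate_depth(image: np.ndarray) -> np.ndarray:
    """Placeholder for monocular depth estimation (claim 7): brighter pixels
    are assumed closer. Returns one depth value per pixel, arbitrary units."""
    return 5.0 - 4.0 * image.mean(axis=-1)

def fill_occluded(values: np.ndarray, occluded: np.ndarray) -> np.ndarray:
    """Placeholder for predicting pixel/depth information of the occlusion
    region (claim 2): occluded entries are filled with the mean of the
    visible background, standing in for a learned encoder/decoder."""
    filled = values.copy()
    filled[occluded] = values[~occluded].mean(axis=0)
    return filled

def two_d_to_layered_three_d(image: np.ndarray, depth_threshold: float = 0.5):
    """Illustrative 2D-to-layered-3D pipeline following claims 1-8."""
    fg_mask = segment_foreground(image)   # foreground/background split
    depth = estimate_depth(image)         # per-pixel depth
    fg_depth, bg_depth = depth[fg_mask].mean(), depth[~fg_mask].mean()

    # Claim 3: only synthesise the occluded layer when foreground and
    # background are separated by more than a depth threshold.
    if abs(fg_depth - bg_depth) <= depth_threshold:
        return {"image": image, "depth": depth, "occluded_background": None}

    h, w, c = image.shape
    occluded_pixels = fill_occluded(image.reshape(-1, c), fg_mask.ravel()).reshape(h, w, c)
    occluded_depth = fill_occluded(depth.reshape(-1, 1), fg_mask.ravel()).reshape(h, w)

    # Claims 4-6: the foreground layer plus the completed background layer
    # form a layered representation that can be rendered from new viewpoints
    # to produce the three-dimensional image.
    return {
        "foreground": {"pixels": image, "depth": depth, "mask": fg_mask},
        "occluded_background": {"pixels": occluded_pixels, "depth": occluded_depth},
    }

if __name__ == "__main__":
    rgb = np.full((4, 6, 3), 0.2, dtype=np.float32)
    rgb[1:3, 2:4] = 0.9  # a bright "foreground" patch in front of a darker background
    layers = two_d_to_layered_three_d(rgb)
    print(sorted(layers.keys()))  # ['foreground', 'occluded_background']
```

The layered output mirrors the design implied by claims 4 to 6: keeping the foreground and the completed background as separate depth-tagged layers is what allows the result to be re-rendered from new viewpoints as a three-dimensional image.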
CN202010796552.3A 2020-08-10 2020-08-10 Image processing method and device, computer readable storage medium and electronic equipment Active CN111815666B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010796552.3A CN111815666B (en) 2020-08-10 2020-08-10 Image processing method and device, computer readable storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN111815666A (en) 2020-10-23
CN111815666B (en) 2024-04-02

Family

ID=72863802

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010796552.3A Active CN111815666B (en) 2020-08-10 2020-08-10 Image processing method and device, computer readable storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN111815666B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102360489A (en) * 2011-09-26 2012-02-22 盛乐信息技术(上海)有限公司 Method and device for realizing conversion from two-dimensional image to three-dimensional image
US20150379720A1 (en) * 2013-01-31 2015-12-31 Threevolution Llc Methods for converting two-dimensional images into three-dimensional images
WO2020059575A1 (en) * 2018-09-21 2020-03-26 富士フイルム株式会社 Three-dimensional image generation device, three-dimensional image generation method, and program
CN111447428A (en) * 2020-03-12 2020-07-24 黄胜海 Method and device for converting plane image into three-dimensional image, computer readable storage medium and equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Wang Yuan: "Research on Binocular Vision Three-Dimensional Reconstruction Technology for Intelligent Video Surveillance Systems", China Master's Theses Full-text Database, Information Science and Technology, No. 11, 15 November 2018 (2018-11-15), pages 136-413 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022089168A1 (en) * 2020-10-26 2022-05-05 腾讯科技(深圳)有限公司 Generation method and apparatus and playback method and apparatus for video having three-dimensional effect, and device
CN112785492A (en) * 2021-01-20 2021-05-11 北京百度网讯科技有限公司 Image processing method, image processing device, electronic equipment and storage medium
CN112950641A (en) * 2021-02-24 2021-06-11 Oppo广东移动通信有限公司 Image processing method and device, computer readable storage medium and electronic device
CN112950641B (en) * 2021-02-24 2024-06-25 Oppo广东移动通信有限公司 Image processing method and device, computer readable storage medium and electronic equipment
CN113570702A (en) * 2021-07-14 2021-10-29 Oppo广东移动通信有限公司 3D photo generation method and device, terminal and readable storage medium
CN113837978A (en) * 2021-09-28 2021-12-24 北京奇艺世纪科技有限公司 Image synthesis method, device, terminal equipment and readable storage medium
CN113837978B (en) * 2021-09-28 2024-04-05 北京奇艺世纪科技有限公司 Image synthesis method, device, terminal equipment and readable storage medium

Also Published As

Publication number Publication date
CN111815666B (en) 2024-04-02

Similar Documents

Publication Publication Date Title
CN111325842B (en) Map construction method, repositioning method and device, storage medium and electronic equipment
CN111815666B (en) Image processing method and device, computer readable storage medium and electronic equipment
CN111445583B (en) Augmented reality processing method and device, storage medium and electronic equipment
CN112270754B (en) Local grid map construction method and device, readable medium and electronic equipment
CN111935486B (en) Image processing method and device, computer readable storage medium and electronic device
CN111476783B (en) Image processing method, device and equipment based on artificial intelligence and storage medium
CN111429517A (en) Relocation method, relocation device, storage medium and electronic device
CN111311758A (en) Augmented reality processing method and device, storage medium and electronic equipment
CN111917980B (en) Photographing control method and device, storage medium and electronic equipment
CN111243105B (en) Augmented reality processing method and device, storage medium and electronic equipment
CN111641829B (en) Video processing method, device and system, storage medium and electronic equipment
CN110706339B (en) Three-dimensional face reconstruction method and device, electronic equipment and storage medium
CN111766606A (en) Image processing method, device and equipment of TOF depth image and storage medium
CN112581358A (en) Training method of image processing model, image processing method and device
CN110807769B (en) Image display control method and device
CN112165575A (en) Image blurring processing method and device, storage medium and electronic equipment
CN112565598B (en) Focusing method and apparatus, terminal, computer-readable storage medium, and electronic device
CN113743517A (en) Model training method, image depth prediction method, device, equipment and medium
CN112672076A (en) Image display method and electronic equipment
CN114240843B (en) Image detection method, image detection device, computer readable storage medium and electronic device
WO2021129444A1 (en) File clustering method and apparatus, and storage medium and electronic device
CN111982293B (en) Body temperature measuring method and device, electronic equipment and storage medium
CN111310701B (en) Gesture recognition method, device, equipment and storage medium
CN114996515A (en) Training method of video feature extraction model, text generation method and device
CN111524518B (en) Augmented reality processing method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant