CN111815666B - Image processing method and device, computer readable storage medium and electronic equipment - Google Patents

Image processing method and device, computer readable storage medium and electronic equipment

Info

Publication number
CN111815666B
Authority
CN
China
Prior art keywords
dimensional image
information
depth
depth information
pixel information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010796552.3A
Other languages
Chinese (zh)
Other versions
CN111815666A (en)
Inventor
樊欢欢
李姬俊男
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd filed Critical Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN202010796552.3A priority Critical patent/CN111815666B/en
Publication of CN111815666A publication Critical patent/CN111815666A/en
Application granted granted Critical
Publication of CN111815666B publication Critical patent/CN111815666B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/194 Segmentation; Edge detection involving foreground-background segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 3D [Three Dimensional] image rendering
    • G06T15/005 General purpose rendering architectures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00 Manipulating 3D models or images for computer graphics
    • G06T19/006 Mixed reality
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Graphics (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Processing Or Creating Images (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure provides an image processing method, an image processing apparatus, a computer-readable storage medium, and an electronic device, and relates to the technical field of image processing. The image processing method comprises the following steps: acquiring a two-dimensional image, performing semantic segmentation on the two-dimensional image, and determining a foreground area and a background area of the two-dimensional image; determining depth information of a background area; determining pixel information and depth information of a shielding area by using the pixel information and the depth information of a background area; the position of the shielding area corresponds to the position of the foreground area on the two-dimensional image; and combining the pixel information and the depth information of the shielding area to generate a three-dimensional image corresponding to the two-dimensional image. The method and the device can convert the two-dimensional image into the three-dimensional image so as to improve the stereoscopic representation capability of the image.

Description

Image processing method and device, computer readable storage medium and electronic equipment
Technical Field
The present disclosure relates to the field of image processing technology, and in particular, to an image processing method, an image processing apparatus, a computer-readable storage medium, and an electronic device.
Background
With the popularization of electronic devices such as mobile phones and tablet computers and the continuous improvement of camera module configurations, users' expectations for photographing effects keep rising.
At present, photographs taken by users with electronic devices tend to look flat and lack stereoscopic impression; in particular, such photographs cannot fully serve their display purpose in scenarios that aim to educate through vivid, engaging presentation.
Disclosure of Invention
The present disclosure provides an image processing method, an image processing apparatus, a computer-readable storage medium, and an electronic device, thereby overcoming, at least to some extent, the problem of weak stereoscopic impression in captured photographs.
According to a first aspect of the present disclosure, there is provided an image processing method including: acquiring a two-dimensional image, performing semantic segmentation on the two-dimensional image, and determining a foreground area and a background area of the two-dimensional image; determining depth information of a background area; determining pixel information and depth information of a shielding area by using the pixel information and the depth information of a background area; the position of the shielding area corresponds to the position of the foreground area on the two-dimensional image; and combining the pixel information and the depth information of the shielding area to generate a three-dimensional image corresponding to the two-dimensional image.
According to a second aspect of the present disclosure, there is provided an image processing apparatus including: the semantic segmentation module is used for acquiring a two-dimensional image, carrying out semantic segmentation on the two-dimensional image and determining a foreground area and a background area of the two-dimensional image; the depth determining module is used for determining depth information of the background area; the shielding information determining module is used for determining the pixel information and the depth information of the shielding area by utilizing the pixel information and the depth information of the background area; the position of the shielding area corresponds to the position of the foreground area on the two-dimensional image; and the three-dimensional image generation module is used for combining the pixel information and the depth information of the shielding area to generate a three-dimensional image corresponding to the two-dimensional image.
According to a third aspect of the present disclosure, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the above-described image processing method.
According to a fourth aspect of the present disclosure, there is provided an electronic device comprising a processor; and a memory for storing one or more programs which, when executed by the processor, cause the processor to implement the image processing method described above.
In some embodiments of the present disclosure, semantic segmentation is performed on a two-dimensional image to obtain a foreground region and a background region, depth information of the background region is determined, pixel information and depth information of an occlusion region are determined by using the pixel information and depth information of the background region, and a three-dimensional image is generated by using the pixel information and depth information of the occlusion region. On one hand, the method and the device can convert the two-dimensional image into a three-dimensional image, improving the stereoscopic impression of the displayed image and the visual effect; on the other hand, in scenarios that aim to educate through vivid presentation, the method and the device can display the information of the image more fully, so that a user can grasp the content of the image more easily; in still another aspect, the scheme can be applied to augmented reality or virtual reality technology to construct different types of application scenes, improving the user's sense of perception and participation.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure. It will be apparent to those of ordinary skill in the art that the drawings in the following description are merely examples of the disclosure and that other drawings may be derived from them without undue effort. In the drawings:
FIG. 1 shows a schematic diagram of an exemplary system architecture of an image processing scheme of an embodiment of the present disclosure;
FIG. 2 illustrates a schematic diagram of an electronic device suitable for use in implementing embodiments of the present disclosure;
fig. 3 schematically illustrates a flowchart of an image processing method according to an exemplary embodiment of the present disclosure;
FIG. 4 schematically illustrates an effect diagram of semantic segmentation according to an embodiment of the present disclosure;
FIG. 5 schematically illustrates a network structure for determining pixel information and depth information of an occlusion region using neural networks;
FIG. 6 schematically illustrates a flowchart of an overall image processing procedure according to an embodiment of the present disclosure;
Fig. 7 schematically illustrates a block diagram of an image processing apparatus according to an exemplary embodiment of the present disclosure;
fig. 8 schematically illustrates a block diagram of an image processing apparatus according to another exemplary embodiment of the present disclosure;
fig. 9 schematically shows a block diagram of an image processing apparatus according to still another exemplary embodiment of the present disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments may be embodied in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the present disclosure. One skilled in the relevant art will recognize, however, that the aspects of the disclosure may be practiced without one or more of the specific details, or with other methods, components, devices, steps, etc. In other instances, well-known technical solutions have not been shown or described in detail to avoid obscuring aspects of the present disclosure.
Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus a repetitive description thereof will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in software or in one or more hardware modules or integrated circuits or in different networks and/or processor devices and/or microcontroller devices.
The flow diagrams depicted in the figures are exemplary only and not necessarily all steps are included. For example, some steps may be decomposed, and some steps may be combined or partially combined, so that the order of actual execution may be changed according to actual situations.
With the development of terminal technology and camera technology, users' requirements on images are becoming higher and higher. Two-dimensional images in an album can look flat and lack stereoscopic impression; if they are converted into three-dimensional images, the image content becomes richer, and both entertainment value and user experience can be improved.
In an exemplary embodiment of the present disclosure, a two-dimensional image may be converted into a three-dimensional image by combining a semantic segmentation technique and a depth estimation technique, and thus a two-dimensional album may be converted into a three-dimensional album. In some scenarios, the three-dimensional images may also be used to generate animations, serving the purpose of teaching through vivid, engaging content. In addition, the generated three-dimensional image can be applied to an augmented reality scene or a virtual reality scene, and the application range of the generated three-dimensional image is not limited by the present disclosure.
Fig. 1 shows a schematic diagram of an exemplary system architecture of an image processing scheme of an embodiment of the present disclosure.
As shown in fig. 1, system architecture 1000 may include one or more of terminal devices 1001, 1002, 1003, a network 1004, and a server 1005. The network 1004 serves as a medium for providing a communication link between the terminal apparatuses 1001, 1002, 1003 and the server 1005. The network 1004 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation. For example, the server 1005 may be a server cluster formed by a plurality of servers.
A user can interact with a server 1005 via a network 1004 using terminal apparatuses 1001, 1002, 1003 to receive or transmit messages or the like. The terminal devices 1001, 1002, 1003 may be various electronic devices having a display screen including, but not limited to, smartphones, tablet computers, portable computers, desktop computers, and the like.
In an example of implementing the image processing scheme of the exemplary embodiment of the present disclosure by using only the terminal devices 1001, 1002, 1003, when the terminal devices 1001, 1002, 1003 determine a two-dimensional image that needs to be converted into a three-dimensional image, firstly, on one hand, semantic segmentation may be performed on the two-dimensional image to determine a foreground area and a background area of the two-dimensional image, and on the other hand, depth estimation may be performed on the two-dimensional image to obtain depth information of each pixel on the two-dimensional image, and further determine depth information of the background area; next, pixel information and depth information of an occlusion region may be determined using the pixel information and depth information of the background region, wherein a position of the occlusion region corresponds to a position of the foreground region on the two-dimensional image; then, a three-dimensional image corresponding to the two-dimensional image is generated by combining the pixel information and the depth information of the occlusion region.
In this case, an image processing apparatus described below may be configured in the terminal devices 1001, 1002, 1003.
The image processing scheme described in the present disclosure may also be executed by the server 1005. First, the server 1005 acquires a two-dimensional image from the terminal apparatuses 1001, 1002, 1003 via the network 1004, or the server 1005 may acquire a two-dimensional image from another server or a storage apparatus; next, the server 1005 may perform semantic segmentation on the two-dimensional image, determine a foreground region and a background region of the two-dimensional image, and further may perform depth estimation on the two-dimensional image, and determine depth information of the background region based on a result of the depth estimation; subsequently, the server 1005 may determine pixel information and depth information of the occlusion region using the pixel information and depth information of the background region, and generate a three-dimensional image corresponding to the two-dimensional image in combination with the pixel information and depth information of the occlusion region. In addition, the server 1005 may also generate a three-dimensional album using the three-dimensional image and/or transmit the three-dimensional image to the terminal apparatuses 1001, 1002, 1003.
In this case, an image processing apparatus described below may be configured in the server 1005.
Fig. 2 shows a schematic diagram of an electronic device suitable for implementing an exemplary embodiment of the present disclosure, which may be configured in the form of the electronic device shown in fig. 2. It should be further noted that the electronic device shown in fig. 2 is only an example, and should not impose any limitation on the functions and the scope of use of the embodiments of the present disclosure.
The electronic device of the present disclosure includes at least a processor and a memory for storing one or more programs, which when executed by the processor, enable the processor to implement the image processing method of the exemplary embodiments of the present disclosure.
Specifically, as shown in fig. 2, the electronic device 200 may include: processor 210, internal memory 221, external memory interface 222, universal serial bus (Universal Serial Bus, USB) interface 230, charge management module 240, power management module 241, battery 242, antenna 1, antenna 2, mobile communication module 250, wireless communication module 260, audio module 270, speaker 271, receiver 272, microphone 273, headset interface 274, sensor module 280, display screen 290, camera module 291, indicator 292, motor 293, keys 294, and subscriber identity module (Subscriber Identification Module, SIM) card interface 295, and the like. The sensor module 280 may include a depth sensor, a pressure sensor, a gyroscope sensor, a barometric sensor, a magnetic sensor, an acceleration sensor, a distance sensor, a proximity sensor, a fingerprint sensor, a temperature sensor, a touch sensor, an ambient light sensor, a bone conduction sensor, and the like.
It is to be understood that the structure illustrated in the embodiments of the present application does not constitute a specific limitation on the electronic device 200. In other embodiments of the present application, electronic device 200 may include more or fewer components than shown, or certain components may be combined, or certain components may be split, or different arrangements of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
Processor 210 may include one or more processing units. For example, the processor 210 may include an application processor (Application Processor, AP), a modem processor, a graphics processor (Graphics Processing Unit, GPU), an image signal processor (Image Signal Processor, ISP), a controller, a video codec, a digital signal processor (Digital Signal Processor, DSP), a baseband processor, and/or a neural network processor (Neural-network Processing Unit, NPU), etc. The different processing units may be separate devices or may be integrated in one or more processors. In addition, a memory may be provided in the processor 210 for storing instructions and data.
The USB interface 230 is an interface conforming to the USB standard specification, and may specifically be a MiniUSB interface, a micro USB interface, a USB Type-C interface, or the like. The USB interface 230 may be used to connect a charger to charge the electronic device 200, or may be used to transfer data between the electronic device 200 and a peripheral device. It can also be used to connect a headset and play audio through the headset. The interface may also be used to connect other electronic devices, such as AR devices.
The charge management module 240 is configured to receive a charge input from a charger. The charger can be a wireless charger or a wired charger. The power management module 241 is used for connecting the battery 242, the charge management module 240 and the processor 210. The power management module 241 receives input from the battery 242 and/or the charge management module 240 and provides power to the processor 210, the internal memory 221, the display 290, the camera module 291, the wireless communication module 260, and the like.
The wireless communication function of the electronic device 200 may be implemented by the antenna 1, the antenna 2, the mobile communication module 250, the wireless communication module 260, a modem processor, a baseband processor, and the like.
The mobile communication module 250 may provide a solution for wireless communication including 2G/3G/4G/5G, etc., applied on the electronic device 200.
The wireless communication module 260 may provide solutions for wireless communication including wireless local area network (Wireless Local Area Networks, WLAN) (e.g., wireless fidelity (Wireless Fidelity, wi-Fi) network), bluetooth (BT), global navigation satellite system (Global Navigation Satellite System, GNSS), frequency modulation (Frequency Modulation, FM), near field wireless communication technology (Near Field Communication, NFC), infrared technology (IR), etc., as applied on the electronic device 200.
The electronic device 200 implements display functions through a GPU, a display screen 290, an application processor, and the like. The GPU is a microprocessor for image processing, and is connected to the display screen 290 and the application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. Processor 210 may include one or more GPUs that execute program instructions to generate or change display information.
The electronic device 200 may implement a photographing function through an ISP, a camera module 291, a video codec, a GPU, a display screen 290, an application processor, and the like. In some embodiments, the electronic device 200 may include 1 or N camera modules 291, where N is a positive integer greater than 1, and if the electronic device 200 includes N cameras, one of the N cameras is a master camera.
Internal memory 221 may be used to store computer executable program code that includes instructions. The internal memory 221 may include a storage program area and a storage data area. The external memory interface 222 may be used to connect an external memory card, such as a Micro SD card, to enable expansion of the memory capabilities of the electronic device 200.
The electronic device 200 may implement audio functions through an audio module 270, a speaker 271, a receiver 272, a microphone 273, a headphone interface 274, an application processor, and the like. Such as music playing, recording, etc.
The audio module 270 is used to convert digital audio information into an analog audio signal output and also to convert an analog audio input into a digital audio signal. The audio module 270 may also be used to encode and decode audio signals. In some embodiments, the audio module 270 may be disposed in the processor 210, or some functional modules of the audio module 270 may be disposed in the processor 210.
A speaker 271, also called a "horn", is used to convert the audio electrical signal into a sound signal. The electronic device 200 may play music or conduct a hands-free call through the speaker 271. A receiver 272, also referred to as an "earpiece", is used to convert the audio electrical signal into a sound signal. When the electronic device 200 is answering a telephone call or a voice message, the voice can be heard by placing the receiver 272 close to the human ear. A microphone 273, also called a "mike", is used to convert sound signals into electrical signals. When making a call or transmitting voice information, the user can speak with the mouth close to the microphone 273, inputting a sound signal into the microphone 273. The electronic device 200 may be provided with at least one microphone 273. The earphone interface 274 is used to connect a wired earphone.
Among the sensors that the sensor module 280 in the electronic device 200 may include, the depth sensor is used to obtain depth information of a scene. The pressure sensor is used for sensing a pressure signal and can convert the pressure signal into an electric signal. The gyroscopic sensor may be used to determine a motion pose of the electronic device 200. The air pressure sensor is used for measuring air pressure. The magnetic sensor includes a hall sensor. The electronic device 200 may detect the opening and closing of the flip cover using a magnetic sensor. The acceleration sensor may detect the magnitude of acceleration of the electronic device 200 in various directions (typically three axes). The distance sensor is used to measure distance. The proximity light sensor may include, for example, a Light Emitting Diode (LED) and a light detector, such as a photodiode. The fingerprint sensor is used for collecting fingerprints. The temperature sensor is used for detecting temperature. The touch sensor may communicate the detected touch operation to the application processor to determine the touch event type. Visual output related to touch operations may be provided through display screen 290. The ambient light sensor is used for sensing ambient light brightness. The bone conduction sensor may acquire a vibration signal.
The keys 294 include a power on key, a volume key, etc. The keys 294 may be mechanical keys. Or may be a touch key. The motor 293 may generate a vibratory alert. The motor 293 may be used for incoming call vibration alerting as well as for touch vibration feedback. The indicator 292 may be an indicator light, which may be used to indicate a state of charge, a change in power, a message indicating a missed call, a notification, etc. The SIM card interface 295 is for interfacing with a SIM card. The electronic device 200 interacts with the network through the SIM card to realize functions such as communication and data communication.
The present application also provides a computer-readable storage medium that may be included in the electronic device described in the above embodiments; or may exist alone without being incorporated into the electronic device.
The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable storage medium may transmit, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable storage medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The computer-readable storage medium carries one or more programs which, when executed by one of the electronic devices, cause the electronic device to implement the methods described in the embodiments below.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units involved in the embodiments of the present disclosure may be implemented by means of software, or may be implemented by means of hardware, and the described units may also be provided in a processor. Wherein the names of the units do not constitute a limitation of the units themselves in some cases.
The following will describe an example in which the terminal device executes the image processing scheme of the present disclosure.
Fig. 3 schematically shows a flowchart of an image processing method of an exemplary embodiment of the present disclosure. Referring to fig. 3, the image processing method may include the steps of:
s32, acquiring a two-dimensional image, performing semantic segmentation on the two-dimensional image, and determining a foreground area and a background area of the two-dimensional image.
In the exemplary embodiment of the present disclosure, the two-dimensional image may be an image captured by the camera module of the terminal device, or may be an image obtained from another device or a server, and the format, size, source, and the like of the two-dimensional image are not limited in the present disclosure.
The two-dimensional images may be stored in a two-dimensional album from which a user may select two-dimensional images to be three-dimensionally converted to perform the steps of the disclosed scheme. The terminal device can also classify the two-dimensional images in the album according to time sequence, shooting place and the like, and execute the scheme of converting the two-dimensional images into the three-dimensional images according to the classification.
In other embodiments, each time a two-dimensional image is taken by a terminal device, the terminal device performs the disclosed aspects to obtain a corresponding three-dimensional image.
After the terminal device acquires the two-dimensional image to be subjected to three-dimensional conversion, semantic segmentation can be performed on the two-dimensional image. Semantic segmentation refers to classification at the pixel level, in which pixels belonging to the same semantic class are grouped into one class.
According to some embodiments of the present disclosure, semantic segmentation of a two-dimensional image may be implemented using a semantic segmentation model, which may be implemented based on a deep neural network. Firstly, training a semantic segmentation model by using a training data set, then inputting a two-dimensional image into the trained semantic segmentation model, and obtaining a foreground region and a background region of the two-dimensional image according to the output of the model. For example, the foreground region may contain objects corresponding to points of interest of the user, such as humans, animals, automobiles, etc., while the background region corresponds to the background in which humans, animals, automobiles, etc., are located, such as grass, trees, sky, etc.
The present disclosure does not particularly limit the implementation of semantic segmentation; however, it should be noted that any scheme that applies the concept of semantic segmentation to the conversion of two-dimensional images into three-dimensional images falls within the scope of the present disclosure.
Fig. 4 schematically illustrates an effect diagram of semantic segmentation according to an embodiment of the present disclosure. Referring to fig. 4, after semantic segmentation of a two-dimensional image 40, a background region 41 and a foreground region 42 may be obtained.
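As a non-limiting illustration of how the segmentation step above might be realized, the following sketch uses a pretrained DeepLabV3 model from torchvision; the choice of model, the label convention (class 0 treated as background), and the function name are assumptions for illustration only, not part of the disclosed method.

```python
# Illustrative sketch: split a two-dimensional image into foreground and
# background masks with an off-the-shelf semantic segmentation model.
import numpy as np
import torch
from PIL import Image
from torchvision import models, transforms

def segment_foreground_background(image_path: str):
    model = models.segmentation.deeplabv3_resnet50(weights="DEFAULT").eval()
    preprocess = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])
    image = Image.open(image_path).convert("RGB")
    with torch.no_grad():
        logits = model(preprocess(image).unsqueeze(0))["out"][0]
    labels = logits.argmax(0).cpu().numpy()     # per-pixel class indices
    foreground_mask = labels != 0               # assumes class 0 is "background"
    background_mask = ~foreground_mask
    return np.array(image), labels, foreground_mask, background_mask
```

In practice, any segmentation network trained to separate subjects of interest from their surroundings could be substituted here.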
S34, determining depth information of the background area.
After the terminal device acquires the two-dimensional image, the depth estimation can also be performed on the two-dimensional image. The depth estimation is to determine depth information of each pixel point on a two-dimensional image.
According to some embodiments of the present disclosure, depth estimation of a two-dimensional image may be implemented using a depth estimation model, which may also be implemented based on a neural network. Firstly, training a depth estimation model by utilizing a large number of images with pixel-level depth labels to obtain a trained depth estimation model; then, the two-dimensional image can be input into a trained depth estimation model, and the result of the depth estimation of the two-dimensional image, namely the depth information of the two-dimensional image, can be obtained according to the output of the model.
It should be noted that the present disclosure does not limit the order between the depth estimation process and the semantic segmentation process of step S32. That is, semantic segmentation may be performed first and depth estimation afterwards; depth estimation may be performed first and semantic segmentation afterwards; or semantic segmentation and depth estimation may be performed simultaneously.
After the depth estimation is performed on the two-dimensional image, depth information of the background area may be determined based on the result of the depth estimation.
For example, after determining the background area of the two-dimensional image, the coordinates of the background area may be obtained. Next, depth information of the background region may be determined from the depth information of the two-dimensional image using the coordinates of the background region.
Similarly, the terminal device may also determine depth information for the foreground region.
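A minimal sketch of this depth-determination step is given below; `depth_model` stands in for any trained monocular depth estimator producing one depth value per pixel (its exact architecture and output shape are assumptions), and the segmentation masks from the previous step are used to split the full depth map into background and foreground depth.

```python
import numpy as np
import torch

def estimate_region_depth(image: np.ndarray,
                          background_mask: np.ndarray,
                          foreground_mask: np.ndarray,
                          depth_model: torch.nn.Module):
    """Run a placeholder monocular depth estimator, then mask its output."""
    tensor = torch.from_numpy(image).permute(2, 0, 1).float().unsqueeze(0) / 255.0
    with torch.no_grad():
        depth = depth_model(tensor)[0, 0].cpu().numpy()   # assumed (1, 1, H, W) output

    background_depth = np.where(background_mask, depth, np.nan)   # NaN outside the region
    foreground_depth = np.where(foreground_mask, depth, np.nan)
    return depth, background_depth, foreground_depth
```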
Furthermore, in other embodiments of the present disclosure, it may also be identified whether a target object is contained within the foreground region prior to depth estimation of the two-dimensional image. In the case where the foreground region contains the target object, depth estimation is then performed on the two-dimensional image. In the case where the foreground region does not contain the target object, the two-dimensional image is not processed.
The target object may be set in advance by the user. For example, in the case where the user desires to perform three-dimensional conversion only on images containing a person (or a specific person such as himself or herself) in the two-dimensional album, the user may set the target object to be a person. Specifically, a setting function may be configured in the album, and the user may set the target object by sliding, clicking, ticking, or the like. By adding the setting function to the album, the requirements of different users can be met.
Specifically, for the process of identifying whether the foreground region contains the target object, in the case that the semantic segmentation algorithm can directly determine the type of the object contained in the segmented region, whether the foreground region contains the target object can be directly determined according to the semantic segmentation result.
Under the condition that the semantic segmentation algorithm cannot directly determine the types of the objects contained in the segmented regions, the recognition operation of the foreground region can be additionally executed to obtain a result of whether the foreground region contains the target object. The process of image recognition of the foreground region may also be implemented by a neural network, which is not limited by the present disclosure.
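Where the segmentation model already produces per-pixel class labels, this check can reduce to a simple label comparison, as in the hypothetical sketch below; the target class index and the minimum-pixel threshold are illustrative assumptions.

```python
import numpy as np

def foreground_contains_target(labels: np.ndarray,
                               foreground_mask: np.ndarray,
                               target_class: int,
                               min_pixels: int = 100) -> bool:
    """True if the foreground region contains enough pixels of the
    user-selected target class (the threshold is chosen arbitrarily here)."""
    target_pixels = np.logical_and(foreground_mask, labels == target_class)
    return int(target_pixels.sum()) >= min_pixels
```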
The procedure of determining the depth information of the background area is described above taking the case of depth estimation of a two-dimensional image. However, in other embodiments of the present disclosure, a depth sensor may be configured on the terminal device, and when capturing a two-dimensional image, depth information of the two-dimensional image may be directly obtained through the depth sensor, and further depth information of a background area may be directly determined.
S36, determining pixel information and depth information of a shielding area by using the pixel information and the depth information of a background area; wherein the position of the occlusion region corresponds to the position of the foreground region on the two-dimensional image.
In an exemplary embodiment of the present disclosure, the occlusion region refers to the region of the background that is occluded by the foreground. Its position corresponds to the position of the foreground region on the two-dimensional image; that is, the occlusion region is the image region that is missing after the foreground region is removed from the two-dimensional image. Referring to fig. 4, the occlusion region is the area occluded by the puppy.
Under the condition that the mobile terminal determines the pixel information and the depth information of the background area, the pixel information and the depth information of the shielding area can be predicted.
First, feature extraction may be performed on pixel information and depth information of a background area, generating intermediate information. Next, on the one hand, a pixel information prediction process may be performed on the intermediate information to determine pixel information of the occlusion region; on the other hand, a depth information prediction process may be performed on the intermediate information to determine depth information of the occlusion region.
Specifically, the pixel information prediction process may be implemented by one convolutional neural network (Convolutional Neural Networks, CNN) and the depth information prediction process may be implemented by another convolutional neural network.
Referring to fig. 5, first, pixel information and depth information of a background area may be input to the first neural network 51 for feature extraction, generating intermediate information. Specifically, the first neural network 51 may be configured using the VGG16 network, or the first neural network 51 may be configured using one CNN network, which is not limited in the present disclosure.
Next, in one aspect, the intermediate information may be input to a second neural network 52, which may be a CNN network, to predict pixel information of the occlusion region and output the pixel information of the occlusion region.
On the other hand, the intermediate information may be input to the third neural network 53, which may be another CNN network, to predict depth information of the occlusion region and output the depth information of the occlusion region.
The present disclosure is not limited to the network structure and training process of the neural network of fig. 5.
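Since the disclosure leaves the concrete network structure open, the following is only a minimal PyTorch sketch of the layout in fig. 5: a shared feature extractor standing in for the first neural network 51, plus two convolutional heads standing in for the second and third networks 52 and 53. The four-channel input (background RGB concatenated with the background depth map) and all channel sizes are assumptions.

```python
import torch
import torch.nn as nn

class OcclusionPredictor(nn.Module):
    """Sketch of fig. 5: encoder -> intermediate information -> two heads."""

    def __init__(self):
        super().__init__()
        # First network: extracts intermediate features from the background
        # pixel information (3 channels) plus its depth map (1 channel).
        self.encoder = nn.Sequential(
            nn.Conv2d(4, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
        )
        # Second network: predicts pixel information of the occlusion region.
        self.pixel_head = nn.Sequential(
            nn.Conv2d(128, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, kernel_size=3, padding=1), nn.Sigmoid(),
        )
        # Third network: predicts depth information of the occlusion region.
        self.depth_head = nn.Sequential(
            nn.Conv2d(128, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 1, kernel_size=3, padding=1),
        )

    def forward(self, background_rgbd: torch.Tensor):
        features = self.encoder(background_rgbd)      # intermediate information
        occluded_rgb = self.pixel_head(features)       # occlusion-region pixels
        occluded_depth = self.depth_head(features)     # occlusion-region depth
        return occluded_rgb, occluded_depth
```

Training such a model would require image pairs with and without the occluding foreground (or synthetically masked backgrounds), which the disclosure does not prescribe.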
In addition, for some two-dimensional images the depth difference between the foreground region and the background region is small, and it is not worth consuming resources on three-dimensional conversion. Thus, before the pixel information and depth information of the occlusion region are determined, a process of determining the depth difference between the foreground region and the background region may also be included.
First, the terminal device may determine depth information of a foreground region; next, determining a depth difference between the foreground region and the background region based on the depth information of the foreground region and the depth information of the background region; the depth difference is then compared to a depth threshold. The depth threshold value may be set in advance, for example, to 10cm, 0.5m, or the like.
If the depth difference is greater than the depth threshold, the process of determining the pixel information and depth information of the occlusion region is performed. If the depth difference is not greater than the depth threshold, the processing of the present scheme is stopped, and a prompt such as "conversion is not recommended because the depth difference is small" may be fed back to the user.
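In code, this gating check could be as simple as comparing the mean depths of the two regions against a configurable threshold; using the mean, and the particular default value below, are illustrative choices rather than requirements of the scheme.

```python
import numpy as np

def should_convert(foreground_depth: np.ndarray,
                   background_depth: np.ndarray,
                   depth_threshold: float = 0.5) -> bool:
    """Proceed with 3D conversion only when the foreground/background depth
    difference exceeds the preset threshold (units depend on the depth
    estimator; 0.5 is an arbitrary placeholder)."""
    depth_difference = abs(np.nanmean(background_depth) - np.nanmean(foreground_depth))
    return depth_difference > depth_threshold
```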
S38, combining pixel information and depth information of the shielding area to generate a three-dimensional image corresponding to the two-dimensional image.
First, depth information of a foreground region may be determined based on a result of depth estimation, and pixel information of the foreground region may be acquired; next, a three-dimensional image corresponding to the two-dimensional image may be generated in combination with the pixel information and the depth information of the occlusion region and the pixel information and the depth information of the foreground region.
In some embodiments of the present disclosure, the three-dimensional image of the present disclosure may be an image of the same size as the two-dimensional image in a two-dimensional plane.
In this case, the process of generating the three-dimensional image needs to use the pixel information and the depth information of the background area in addition to the pixel information and the depth information of the occlusion area and the pixel information and the depth information of the foreground area.
In other embodiments of the present disclosure, the three-dimensional image of the present disclosure may be a three-dimensional image for only a foreground region. With respect to the two-dimensional image as shown in fig. 4, the generated three-dimensional image may be a three-dimensional image including only a puppy and not including a background area.
Specifically, the three-dimensional image of the object corresponding to the foreground region may be generated as the three-dimensional image corresponding to the two-dimensional image by using the pixel information and the depth information of the shielding region and the pixel information and the depth information of the foreground region.
It should be understood that the process of generating a three-dimensional image includes a process of three-dimensional rendering. In addition, since the image is a three-dimensional image, the occlusion relationship between objects in the image can be mapped according to the viewing angle, and viewing effects at different viewing angles can be obtained according to this occlusion relationship. On this basis, a three-dimensional animation can be generated so that the user views the three-dimensional image from different angles.
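One simple way to realize the "pixel information plus depth information to three dimensions" step is to back-project every pixel into a coloured point set under a pinhole camera model, as sketched below; the placeholder intrinsics and the point-cloud representation are assumptions, and a mesh-based or layered renderer could equally be used for the three-dimensional rendering described above.

```python
import numpy as np

def back_project_to_points(rgb: np.ndarray, depth: np.ndarray,
                           fx: float = 500.0, fy: float = 500.0) -> np.ndarray:
    """Convert an H x W RGB image plus per-pixel depth into an (N, 6) array of
    XYZRGB points, assuming a pinhole camera with placeholder intrinsics."""
    h, w = depth.shape
    cx, cy = w / 2.0, h / 2.0
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    points = np.stack([x, y, depth], axis=-1).reshape(-1, 3)
    colours = rgb.reshape(-1, 3) / 255.0
    valid = np.isfinite(points).all(axis=1)        # drop pixels without depth
    return np.concatenate([points, colours], axis=1)[valid]
```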
The entire image processing procedure of the embodiment of the present disclosure will be described below with reference to fig. 6.
In step S602, the terminal device may acquire a two-dimensional image; in step S604, the terminal device may perform semantic segmentation on the two-dimensional image; in step S606, the terminal device may perform depth estimation on the two-dimensional image.
Based on the result of the semantic segmentation of step S604, in step S608, a foreground region may be determined, and in step S610, a background region may be determined. Based on the result of the depth estimation in step S606, in step S612, the depth value (i.e., depth information) of each pixel on the two-dimensional image can be determined.
In step S614, pixel estimation and depth estimation may be performed on the occlusion part according to the pixel information of the background area and the depth information of the background area.
For the neural network-based pixel estimation process, in step S616, pixel information of the occlusion part may be determined.
For a depth estimation process based on another neural network, depth information of the occlusion part may be determined in step S618.
In step S620, three-dimensional rendering is performed in combination with the depth information of the occlusion part and the information of the foreground region.
In step S622, the terminal device may output the rendered three-dimensional image. In addition, the three-dimensional animation can be generated for display, the three-dimensional album can be generated based on the three-dimensional images, and specifically, the three-dimensional album can be configured as a cloud album, so that the storage space of the terminal equipment is saved.
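Tying the steps of fig. 6 together, a hypothetical end-to-end driver built from the illustrative helpers sketched earlier in this description might look as follows; none of these function names or interfaces are defined by the disclosure itself.

```python
import numpy as np
import torch

def two_d_to_three_d(image_path: str, depth_model, occlusion_model):
    """Illustrative flow of fig. 6 using the hypothetical sketches above."""
    # S602-S610: acquire the image, segment it, obtain foreground/background.
    rgb, labels, fg_mask, bg_mask = segment_foreground_background(image_path)
    # S606, S612: estimate depth and split it per region.
    depth, bg_depth, fg_depth = estimate_region_depth(rgb, bg_mask, fg_mask, depth_model)

    if not should_convert(fg_depth, bg_depth):      # optional gating check
        return None

    # S614-S618: predict pixel and depth information of the occluded part
    # from the background RGB-D input (foreground pixels zeroed out).
    bg_rgbd = np.dstack([
        np.where(bg_mask[..., None], rgb / 255.0, 0.0),
        np.where(bg_mask, np.nan_to_num(bg_depth), 0.0)[..., None],
    ])
    with torch.no_grad():
        occ_rgb, occ_depth = occlusion_model(
            torch.from_numpy(bg_rgbd).permute(2, 0, 1).float().unsqueeze(0))

    # S620-S622: here, simply back-project the completed background and the
    # foreground into one coloured point set as a stand-in for full rendering.
    background_points = back_project_to_points(
        occ_rgb[0].permute(1, 2, 0).numpy() * 255.0, occ_depth[0, 0].numpy())
    foreground_points = back_project_to_points(rgb, np.where(fg_mask, depth, np.nan))
    return np.concatenate([background_points, foreground_points], axis=0)
```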
It should be noted that although the steps of the methods in the present disclosure are depicted in the accompanying drawings in a particular order, this does not require or imply that the steps must be performed in that particular order, or that all illustrated steps be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step to perform, and/or one step decomposed into multiple steps to perform, etc.
Further, an image processing apparatus is also provided in the present exemplary embodiment.
Fig. 7 schematically shows a block diagram of an image processing apparatus of an exemplary embodiment of the present disclosure. Referring to fig. 7, the image processing apparatus 7 according to an exemplary embodiment of the present disclosure may include a semantic segmentation module 71, a depth determination module 73, an occlusion information determination module 75, and a three-dimensional image generation module 77.
Specifically, the semantic segmentation module 71 may be configured to acquire a two-dimensional image, perform semantic segmentation on the two-dimensional image, and determine a foreground area and a background area of the two-dimensional image; the depth determination module 73 may be configured to determine depth information of a background region; the occlusion information determining module 75 may be configured to determine pixel information and depth information of an occlusion region using pixel information and depth information of a background region; the position of the shielding area corresponds to the position of the foreground area on the two-dimensional image; the three-dimensional image generation module 77 may be used to combine pixel information and depth information of the occlusion region to generate a three-dimensional image corresponding to the two-dimensional image.
With the image processing apparatus according to the exemplary embodiment of the disclosure, on one hand, a two-dimensional image can be converted into a three-dimensional image, improving the stereoscopic impression of the displayed image and the visual effect; on the other hand, in scenarios that aim to educate through vivid presentation, the information of the image can be displayed more fully, so that a user can grasp the content of the image more easily; in still another aspect, the scheme can be applied to augmented reality or virtual reality technology to construct different types of application scenes, improving the user's sense of perception and participation.
According to an example embodiment of the present disclosure, the occlusion information determination module 75 may be configured to perform: extracting characteristics of pixel information and depth information of a background area to generate intermediate information; performing a pixel information prediction process on the intermediate information to determine pixel information of the occlusion region; a depth information prediction process is performed on the intermediate information to determine depth information of the occlusion region.
According to an exemplary embodiment of the present disclosure, referring to fig. 8, the image processing apparatus 8 may further include a depth difference comparing module 81, compared to the image processing apparatus 7.
Specifically, the depth difference comparison module 81 may be configured to perform: determining depth information of a foreground region; determining a depth difference between the foreground region and the background region based on the depth information of the foreground region and the depth information of the background region; comparing the depth difference with a depth threshold; wherein if the depth difference is greater than the depth threshold, the occlusion information determining module 75 is controlled to perform a process of determining pixel information and depth information of the occlusion region.
According to an exemplary embodiment of the present disclosure, the three-dimensional image generation module 77 may be configured to perform: acquiring pixel information and depth information of a foreground region; and combining the pixel information and the depth information of the shielding area and the pixel information and the depth information of the foreground area to generate a three-dimensional image corresponding to the two-dimensional image.
According to an exemplary embodiment of the present disclosure, the process of generating a three-dimensional image by the three-dimensional image generation module 77 may be configured to perform: and generating a three-dimensional image of the object corresponding to the foreground region as a three-dimensional image corresponding to the two-dimensional image by using the pixel information and the depth information of the shielding region and the pixel information and the depth information of the foreground region.
According to an exemplary embodiment of the present disclosure, the process of generating a three-dimensional image by the three-dimensional image generation module 77 may be further configured to perform: and generating a three-dimensional image corresponding to the two-dimensional image by using the pixel information and the depth information of the shielding region, the pixel information and the depth information of the foreground region and the pixel information and the depth information of the background region.
According to an example embodiment of the present disclosure, the depth determination module 73 may be configured to perform: and carrying out depth estimation on the two-dimensional image, and determining depth information of the background area based on the result of the depth estimation.
According to an exemplary embodiment of the present disclosure, referring to fig. 9, the image processing apparatus 9 may further include an object recognition module 91, compared to the image processing apparatus 7.
Specifically, the object recognition module 91 may be configured to perform: identifying whether a target object is contained in the foreground region; wherein if the foreground region contains a target object, the control depth determination module 73 performs a process of depth estimation of the two-dimensional image.
It should be understood that the object recognition module 91 may also be configured in the image processing apparatus 8 described above. Similarly, the depth difference comparing module 81 included in the image processing apparatus 8 may also be configured in the image processing apparatus 9.
Since each functional module of the image processing apparatus according to the embodiment of the present disclosure is the same as that of the above-described method embodiment, a detailed description thereof will be omitted.
From the above description of embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or may be implemented in software in combination with the necessary hardware. Thus, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (may be a CD-ROM, a U-disk, a mobile hard disk, etc.) or on a network, including several instructions to cause a computing device (may be a personal computer, a server, a terminal device, or a network device, etc.) to perform the method according to the embodiments of the present disclosure.
Furthermore, the above-described figures are only schematic illustrations of processes included in the method according to the exemplary embodiments of the present disclosure, and are not intended to be limiting. It will be readily appreciated that the processes shown in the above figures do not indicate or limit the temporal order of these processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, for example, among a plurality of modules.
It should be noted that although in the above detailed description several modules or units of a device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit in accordance with embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into a plurality of modules or units to be embodied.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (11)

1. An image processing method, comprising:
acquiring a two-dimensional image, performing semantic segmentation on the two-dimensional image, and determining a foreground area and a background area of the two-dimensional image;
determining depth information of the background area;
determining pixel information and depth information of a shielding area by using the pixel information and the depth information of the background area; wherein the position of the shielding area corresponds to the position of the foreground area on the two-dimensional image;
and combining the pixel information and the depth information of the shielding area to generate a three-dimensional image corresponding to the two-dimensional image.
2. The image processing method according to claim 1, wherein determining the pixel information and the depth information of the occlusion region using the pixel information and the depth information of the background region comprises:
extracting features of the pixel information and the depth information of the background region to generate intermediate information;
performing a pixel information prediction process on the intermediate information to determine the pixel information of the occlusion region;
and performing a depth information prediction process on the intermediate information to determine the depth information of the occlusion region.
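A minimal sketch of the structure described in claim 2, assuming a small convolutional encoder whose shared features (the "intermediate information") feed two separate heads, one predicting occluded pixel values and one predicting occluded depth. The PyTorch framework, layer sizes, and class name are assumptions of this example, not the patented network.

```python
import torch
import torch.nn as nn

class OcclusionPredictor(nn.Module):
    def __init__(self, feat: int = 32):
        super().__init__()
        # Input: 3 RGB channels + 1 depth channel of the background region.
        self.encoder = nn.Sequential(
            nn.Conv2d(4, feat, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat, feat, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.pixel_head = nn.Conv2d(feat, 3, 3, padding=1)  # predicts occluded RGB
        self.depth_head = nn.Conv2d(feat, 1, 3, padding=1)  # predicts occluded depth

    def forward(self, bg_rgb: torch.Tensor, bg_depth: torch.Tensor):
        x = torch.cat([bg_rgb, bg_depth], dim=1)  # (N, 4, H, W)
        intermediate = self.encoder(x)            # shared intermediate information
        return self.pixel_head(intermediate), self.depth_head(intermediate)

if __name__ == "__main__":
    net = OcclusionPredictor()
    occl_rgb, occl_depth = net(torch.rand(1, 3, 64, 64), torch.rand(1, 1, 64, 64))
    print(occl_rgb.shape, occl_depth.shape)  # (1, 3, 64, 64) and (1, 1, 64, 64)
```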
3. The image processing method according to claim 2, characterized in that before determining the pixel information and the depth information of the occlusion region, the image processing method further comprises:
determining depth information of the foreground region;
determining a depth difference between the foreground region and the background region based on the depth information of the foreground region and the depth information of the background region;
comparing the depth difference to a depth threshold;
wherein if the depth difference is greater than the depth threshold, the process of determining the pixel information and the depth information of the occlusion region is performed.
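Claim 3 gates the occlusion-filling step on the depth gap between the foreground and the background. A minimal sketch of that gate follows; the use of mean depths and the 0.5-unit threshold are assumptions of this example only.

```python
import numpy as np

def should_fill_occlusion(fg_depth: np.ndarray, bg_depth: np.ndarray,
                          depth_threshold: float = 0.5) -> bool:
    """Return True when the foreground/background depth gap exceeds the threshold,
    i.e. when the foreground clearly stands out from the background."""
    depth_difference = abs(np.nanmean(bg_depth) - np.nanmean(fg_depth))
    return depth_difference > depth_threshold

# Example: a subject ~1.5 m from the camera in front of a wall ~4 m away.
print(should_fill_occlusion(np.full((10, 10), 1.5), np.full((10, 10), 4.0)))  # True
```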
4. The image processing method according to claim 1, wherein generating a three-dimensional image corresponding to the two-dimensional image in combination with the pixel information and the depth information of the occlusion region includes:
acquiring pixel information and depth information of the foreground region;
and combining the pixel information and the depth information of the occlusion region and the pixel information and the depth information of the foreground region to generate a three-dimensional image corresponding to the two-dimensional image.
5. The image processing method according to claim 4, wherein generating a three-dimensional image corresponding to the two-dimensional image in combination with the pixel information and the depth information of the occlusion region and the pixel information and the depth information of the foreground region includes:
and generating a three-dimensional image of an object corresponding to the foreground region by using the pixel information and the depth information of the occlusion region and the pixel information and the depth information of the foreground region, and using the three-dimensional image as a three-dimensional image corresponding to the two-dimensional image.
6. The image processing method according to claim 4, wherein generating a three-dimensional image corresponding to the two-dimensional image in combination with the pixel information and the depth information of the occlusion region and the pixel information and the depth information of the foreground region includes:
and generating a three-dimensional image corresponding to the two-dimensional image by using the pixel information and the depth information of the occlusion region, the pixel information and the depth information of the foreground region, and the pixel information and the depth information of the background region.
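Claims 4 to 6 combine the pixel and depth information of the occlusion region, the foreground region, and optionally the background region into a three-dimensional image. One possible (assumed) realization is to back-project each layer into a shared point cloud with a pinhole camera model, as sketched below; the camera intrinsics and the layer contents are made-up values for illustration only.

```python
import numpy as np

def backproject(rgb: np.ndarray, depth: np.ndarray,
                fx: float = 500.0, fy: float = 500.0,
                cx: float = 80.0, cy: float = 60.0) -> np.ndarray:
    """Return an (N, 6) array of XYZRGB points for every pixel with valid depth."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]
    valid = np.isfinite(depth)
    z = depth[valid]
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    return np.column_stack([x, y, z, rgb[valid]])

# Foreground layer + filled occlusion/background layer together form the 3D image.
fg_rgb, fg_depth = np.random.rand(120, 160, 3), np.full((120, 160), 1.5)
bg_rgb, bg_depth = np.random.rand(120, 160, 3), np.full((120, 160), 4.0)
cloud = np.vstack([backproject(fg_rgb, fg_depth), backproject(bg_rgb, bg_depth)])
print(cloud.shape)  # (38400, 6)
```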
7. The image processing method according to claim 1, wherein determining depth information of the background area includes:
performing depth estimation on the two-dimensional image, and determining the depth information of the background region based on the result of the depth estimation.
8. The image processing method according to claim 7, characterized in that before the depth estimation of the two-dimensional image, the image processing method further comprises:
identifying whether a target object is contained within the foreground region;
and if the foreground region contains the target object, performing depth estimation on the two-dimensional image.
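A sketch of the pre-check in claims 7 and 8: depth estimation is run only when the foreground region is recognized to contain a target object. Both helper functions below are stand-ins (an area-ratio heuristic and a constant depth map) and their names are assumptions; a real system would use a trained recognizer and a monocular depth estimator.

```python
import numpy as np

def contains_target_object(fg_mask: np.ndarray, min_area_ratio: float = 0.05) -> bool:
    """Stand-in recognizer: report a target object if the foreground covers at
    least min_area_ratio of the frame (a real system would classify its content)."""
    return fg_mask.mean() >= min_area_ratio

def estimate_depth(image: np.ndarray) -> np.ndarray:
    """Stand-in monocular depth estimator (constant depth map)."""
    return np.full(image.shape[:2], 3.0)

image = np.random.rand(120, 160, 3)
fg_mask = np.zeros((120, 160), dtype=bool)
fg_mask[30:90, 50:110] = True  # pretend segmentation found a subject here

if contains_target_object(fg_mask):
    depth = estimate_depth(image)          # only now pay for depth estimation
    print("depth estimated:", depth.shape)
else:
    print("no target object; skip depth estimation")
```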
9. An image processing apparatus, comprising:
a semantic segmentation module, configured to acquire a two-dimensional image, perform semantic segmentation on the two-dimensional image, and determine a foreground region and a background region of the two-dimensional image;
a depth determination module, configured to determine depth information of the background region;
an occlusion information determination module, configured to determine pixel information and depth information of an occlusion region by using the pixel information and the depth information of the background region; wherein the position of the occlusion region corresponds to the position of the foreground region on the two-dimensional image;
and a three-dimensional image generation module, configured to combine the pixel information and the depth information of the occlusion region to generate a three-dimensional image corresponding to the two-dimensional image.
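Read as software, the apparatus of claim 9 maps naturally onto one callable per claimed module, composed by a thin device class. The sketch below shows only this wiring; the module implementations are assumed to be supplied elsewhere (for example, the stand-ins sketched under the method claims above), and all names are illustrative.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class ImageProcessingDevice:
    # One attribute per claimed module; each is an injected callable.
    semantic_segmentation_module: Callable[[Any], Any]            # image -> foreground mask
    depth_determination_module: Callable[[Any, Any], Any]         # image, mask -> background depth
    occlusion_information_module: Callable[[Any, Any, Any], Any]  # image, depth, mask -> occlusion info
    three_d_generation_module: Callable[[Any, Any, Any], Any]     # image, depth, occlusion -> 3D image

    def process(self, image: Any) -> Any:
        """Run the claimed steps in order on a single two-dimensional image."""
        fg_mask = self.semantic_segmentation_module(image)
        bg_depth = self.depth_determination_module(image, fg_mask)
        occlusion = self.occlusion_information_module(image, bg_depth, fg_mask)
        return self.three_d_generation_module(image, bg_depth, occlusion)
```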
10. A computer-readable storage medium, on which a computer program is stored, characterized in that the program, when executed by a processor, implements the image processing method according to any one of claims 1 to 8.
11. An electronic device, comprising:
a processor;
a memory for storing one or more programs that, when executed by the processor, cause the processor to implement the image processing method of any of claims 1 to 8.
CN202010796552.3A 2020-08-10 2020-08-10 Image processing method and device, computer readable storage medium and electronic equipment Active CN111815666B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010796552.3A CN111815666B (en) 2020-08-10 2020-08-10 Image processing method and device, computer readable storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN111815666A CN111815666A (en) 2020-10-23
CN111815666B true CN111815666B (en) 2024-04-02

Family

ID=72863802

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010796552.3A Active CN111815666B (en) 2020-08-10 2020-08-10 Image processing method and device, computer readable storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN111815666B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112272295B (en) * 2020-10-26 2022-06-10 腾讯科技(深圳)有限公司 Method for generating video with three-dimensional effect, method for playing video, device and equipment
CN112785492A (en) * 2021-01-20 2021-05-11 北京百度网讯科技有限公司 Image processing method, image processing device, electronic equipment and storage medium
CN113570702A (en) * 2021-07-14 2021-10-29 Oppo广东移动通信有限公司 3D photo generation method and device, terminal and readable storage medium
CN113837978B (en) * 2021-09-28 2024-04-05 北京奇艺世纪科技有限公司 Image synthesis method, device, terminal equipment and readable storage medium


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014121108A1 (en) * 2013-01-31 2014-08-07 Threevolution Llc Methods for converting two-dimensional images into three-dimensional images

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102360489A (en) * 2011-09-26 2012-02-22 盛乐信息技术(上海)有限公司 Method and device for realizing conversion from two-dimensional image to three-dimensional image
WO2020059575A1 (en) * 2018-09-21 2020-03-26 富士フイルム株式会社 Three-dimensional image generation device, three-dimensional image generation method, and program
CN111447428A (en) * 2020-03-12 2020-07-24 黄胜海 Method and device for converting plane image into three-dimensional image, computer readable storage medium and equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on binocular vision 3D reconstruction technology for intelligent video surveillance systems; Wang Yuan; China Master's Theses Full-text Database, Information Science and Technology Series; 2018-11-15 (No. 11); pp. I136-413 *

Also Published As

Publication number Publication date
CN111815666A (en) 2020-10-23

Similar Documents

Publication Publication Date Title
CN111325842B (en) Map construction method, repositioning method and device, storage medium and electronic equipment
CN111815666B (en) Image processing method and device, computer readable storage medium and electronic equipment
CN110502954B (en) Video analysis method and device
CN111445583B (en) Augmented reality processing method and device, storage medium and electronic equipment
WO2020192458A1 (en) Image processing method and head-mounted display device
CN111179435B (en) Augmented reality processing method, device, system, storage medium and electronic equipment
CN112270754B (en) Local grid map construction method and device, readable medium and electronic equipment
CN111243105B (en) Augmented reality processing method and device, storage medium and electronic equipment
CN111476911A (en) Virtual image implementation method and device, storage medium and terminal equipment
CN111311758A (en) Augmented reality processing method and device, storage medium and electronic equipment
CN111917980B (en) Photographing control method and device, storage medium and electronic equipment
CN110706339B (en) Three-dimensional face reconstruction method and device, electronic equipment and storage medium
CN111641829B (en) Video processing method, device and system, storage medium and electronic equipment
CN113467603A (en) Audio processing method and device, readable medium and electronic equipment
CN111766606A (en) Image processing method, device and equipment of TOF depth image and storage medium
CN112581358A (en) Training method of image processing model, image processing method and device
CN110807769B (en) Image display control method and device
CN111325786B (en) Image processing method and device, electronic equipment and storage medium
CN112672076A (en) Image display method and electronic equipment
WO2021129444A1 (en) File clustering method and apparatus, and storage medium and electronic device
CN111982293B (en) Body temperature measuring method and device, electronic equipment and storage medium
CN112565598B (en) Focusing method and apparatus, terminal, computer-readable storage medium, and electronic device
CN111400004B (en) Video scanning interrupt processing method and device, storage medium and electronic equipment
CN111310701B (en) Gesture recognition method, device, equipment and storage medium
CN113743517A (en) Model training method, image depth prediction method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant