CN114391259A - Information processing method, terminal device and storage medium - Google Patents

Information processing method, terminal device and storage medium

Info

Publication number
CN114391259A
Authority
CN
China
Prior art keywords
depth information
video image
original depth information
code stream
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201980100362.9A
Other languages
Chinese (zh)
Inventor
贾玉虎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd filed Critical Guangdong Oppo Mobile Telecommunications Corp Ltd
Publication of CN114391259A
Legal status: Pending

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00Stereoscopic video systems; Multi-view video systems; Details thereof

Abstract

The invention discloses an information processing method, which comprises the following steps: acquiring original depth information corresponding to the depth information under the condition that the depth information of a target object is acquired through a depth information sensor, wherein the original depth information represents the acquisition state of the depth information acquired by the depth information sensor, or information other than the acquired depth information; acquiring video image data of the target object through an image sensor; and merging and coding the original depth information and the video image data to obtain a video image code stream, and outputting the video image code stream. The invention also discloses another information processing method, a terminal device and a storage medium.

Description

Information processing method, terminal device and storage medium Technical Field
The present invention relates to computer technologies, and in particular, to an information processing method, a terminal device, and a storage medium.
Background
Today, more and more terminals are equipped with camera devices, making it convenient for users to take pictures or record videos anytime and anywhere. In practical applications, the encoding side acquires depth information of a target object by using a depth information sensor, such as a Time of Flight (TOF) camera or a binocular camera, in an existing imaging device, and the decoding side recovers a depth image of the target object from the depth information. However, the depth image provides only the depth information of the target object and cannot improve the image quality of the video image of the target object.
Disclosure of Invention
Embodiments of the present invention provide an information processing method, a terminal device, and a storage medium, which can improve image quality of a video image of a target object.
In a first aspect, an embodiment of the present invention provides an information processing method, including:
acquiring original depth information corresponding to the depth information under the condition that the depth information of a target object is acquired through a depth information sensor, wherein the original depth information represents the acquisition state of the depth information acquired by the depth information sensor, or information other than the acquired depth information;
acquiring video image data of the target object through an image sensor;
and merging and coding the original depth information and the video image data to obtain a video image code stream, and outputting the video image code stream.
In a second aspect, an embodiment of the present invention provides an information processing method, including:
receiving a video image code stream, wherein the video image code stream is obtained by merging and coding original depth information and video image data, the original depth information is obtained under the condition that a depth information sensor acquires depth information of a target object, the video image data is acquired from the target object through an image sensor, and the original depth information represents the acquisition state of the depth information acquired by the depth information sensor, or information other than the acquired depth information;
decoding the video image code stream to obtain the original depth information and a video image corresponding to the video image data;
and carrying out image processing on the original depth information and the video image to obtain a target video image.
In a third aspect, an embodiment of the present invention provides a terminal device, including:
the first acquisition unit is configured to acquire original depth information corresponding to the depth information under the condition that the depth information of a target object is acquired through the depth information sensing unit, wherein the original depth information represents the acquisition state of the depth information acquired by the depth information sensing unit, or information other than the acquired depth information;
a second acquisition unit configured to acquire video image data of the target object by an image sensing unit;
the encoding unit is configured to carry out merging encoding on the original depth information and the video image data to obtain a video image code stream;
and the output unit is configured to output the video image code stream.
In a fourth aspect, an embodiment of the present invention provides a terminal device, including:
a receiving unit, configured to receive a video image code stream, wherein the video image code stream is obtained by merging and coding original depth information and video image data, the original depth information is obtained under the condition that the depth information of a target object is acquired through a depth information sensing unit, and the video image data is acquired from the target object through an image sensing unit; the original depth information represents the acquisition state of the depth information acquired by the depth information sensing unit, or information other than the acquired depth information;
the decoding unit is configured to decode the video image code stream to obtain the original depth information and a video image corresponding to the video image data;
and the image processing unit is configured to perform image processing on the original depth information and the video image to obtain a target video image.
In a fifth aspect, an embodiment of the present invention provides a terminal device, including a processor and a memory configured to store a computer program that is executable on the processor, where the processor is configured to execute, when executing the computer program, the steps of the information processing method executed by the terminal device.
In a sixth aspect, an embodiment of the present invention provides a storage medium, which stores an executable program, and when the executable program is executed by a processor, the information processing method executed by the terminal device is implemented.
The information processing method provided by the embodiment of the invention comprises the following steps: at the encoding end, acquiring original depth information corresponding to depth information under the condition that the depth information of a target object is acquired through a depth information sensor; acquiring video image data of the target object through an image sensor; and merging and coding the original depth information and the video image data to obtain a video image code stream, and outputting the video image code stream. At the decoding end, receiving the video image code stream; decoding the video image code stream to obtain the original depth information and a video image corresponding to the video image data; and performing image processing on the original depth information and the video image to obtain a target video image. In this way, the original depth information obtained by the depth information sensor is written directly into the video image code stream at the encoding end and parsed out at the decoding end, where it is applied to the video image recovered from the image data collected by the image sensor to obtain the target video image, thereby improving the quality of the video image and bringing a more realistic video experience to the user.
Drawings
FIG. 1A is a schematic diagram of an alternative configuration of an information handling system according to an embodiment of the present invention;
FIG. 1B is a schematic diagram of an alternative structure of an encoding end according to an embodiment of the present invention;
FIG. 1C is a schematic diagram of an alternative structure of a decoding end according to an embodiment of the present invention;
FIG. 2 is a schematic view of an alternative processing flow of an information processing method according to an embodiment of the present invention;
FIG. 3 is a schematic view of an alternative processing flow of an information processing method according to an embodiment of the present invention;
FIG. 4 is a schematic view of an alternative processing flow of an information processing method according to an embodiment of the present invention;
FIG. 5 is a schematic view of an alternative processing flow of an information processing method according to an embodiment of the present invention;
FIG. 6 is a schematic view of an alternative processing flow of an information processing method according to an embodiment of the present invention;
FIG. 7 is a schematic view of an alternative processing flow of an information processing method according to an embodiment of the present invention;
FIG. 8A is a block diagram of an alternative embodiment of an information handling system;
FIG. 8B is a block diagram of an alternative embodiment of an information handling system;
FIG. 9A is a block diagram of an alternative embodiment of a decoding end;
FIG. 9B is a block diagram of an alternative embodiment of a decoding end;
FIG. 9C is a block diagram of an alternative decoding end according to an embodiment of the present invention;
FIG. 9D is a block diagram of an alternative embodiment of a decoder according to the present invention;
FIG. 10 is a schematic diagram of sampling original depth information according to an embodiment of the present invention;
fig. 11 is an alternative structural diagram of a terminal device according to an embodiment of the present invention;
fig. 12 is an alternative structural diagram of a terminal device according to an embodiment of the present invention;
fig. 13 is an alternative structural schematic diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
So that the manner in which the features and technical contents of the embodiments of the present invention can be understood in detail, a more particular description of the embodiments of the present invention will be rendered by reference to the appended drawings, which are included for purposes of illustration and not limitation.
Before the information processing method provided by the embodiment of the present invention is described in detail, depth images are briefly introduced.
A depth image, also called a range image, is an image whose pixel values are the distances (depths) from the image sensor to points in the scene, and it directly reflects the geometry of the visible surface of the target object. A depth image can be converted into point cloud data through coordinate conversion, and regularly organized point cloud data carrying the necessary information can likewise be inversely calculated into depth image data.
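The coordinate conversion between a depth image and a point cloud mentioned above can be sketched with a pinhole camera model. The function names and the intrinsic parameters (fx, fy, cx, cy) below are illustrative assumptions, not part of the patent:

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a depth image into a point cloud (pinhole model).

    depth: (H, W) array of distances along the optical axis.
    fx, fy, cx, cy: camera intrinsics (focal lengths, principal point).
    Returns an (H*W, 3) array of (X, Y, Z) points.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

def point_cloud_to_depth(points, fx, fy, cx, cy, h, w):
    """Inverse calculation: project regular points back into a depth image."""
    depth = np.zeros((h, w))
    for X, Y, Z in points:
        if Z <= 0:
            continue  # points behind the camera cannot be imaged
        u = int(round(X * fx / Z + cx))
        v = int(round(Y * fy / Z + cy))
        if 0 <= u < w and 0 <= v < h:
            depth[v, u] = Z
    return depth
```

With point clouds produced by `depth_to_point_cloud` (one point per pixel, on the pixel grid), the inverse mapping recovers the original depth image exactly, which is the "regular and necessary information" condition the text refers to.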
Here, the encoding end performs video encoding on the depth image captured and formed by the depth information sensor to obtain encoded depth image information, and the decoding end can only restore the depth image from the encoded depth image information. However, the amount of information received by the depth information sensor far exceeds the amount of information in the depth image; this vast remainder is discarded as redundancy once the depth image is generated. The above scheme therefore ignores other potential uses of this discarded information, such as image enhancement at the decoding end.
In view of the above, embodiments of the present invention provide an information processing method, which can be applied to an information processing system.
For example, an information processing system 100 to which an embodiment of the present invention is applied may be as shown in fig. 1A. The information processing system 100 may include an encoding end 101 and a decoding end 102. The encoding end 101 is configured to acquire video image data and original depth information, and to encode the video image data and the original depth information to form a video image code stream. The decoding end 102 is configured to decode the video image code stream to obtain the video image data and the original depth information, and to perform image processing on them to obtain a target video image.
Encoding end 101 and decoding end 102 may each comprise any of a wide range of devices, including desktop computers, mobile computing devices, notebook (e.g., laptop) computers, tablet computers, set-top boxes, smartphones, televisions, cameras, display devices, digital media players, video game consoles, in-vehicle computers, or the like.
As shown in fig. 1A, the decoding end 102 may receive the encoded video image code stream from the encoding end 101 via the link 103. Link 103 may include one or more media and/or devices capable of moving a video image bitstream from encoding end 101 to decoding end 102.
In an example, the link 103 can include one or more communication media that enable the encoding end 101 to transmit encoded video data directly to the decoding end 102 in real-time. In this example, the encoding end 101 may modulate the video image code stream according to a communication standard (e.g., a wireless communication protocol), and may send the modulated video image code stream to the decoding end 102.
In one example, the link 103 may comprise a storage medium storing a video image bitstream formed by the encoding end 101. In this example, the decode end 102 may access the storage medium via disk access or card access. The storage medium may comprise a variety of locally-accessed data storage media such as blu-ray discs, DVDs, CD-ROMs, flash memory, or other suitable digital storage media for storing a bitstream of video images.
In yet another example, the link 103 may comprise a file server or another intermediate storage device that stores the video image codestream formed by the encoding end 101. In this example, the decoding end 102 may access a video image codestream stored at a file server or other intermediate storage device via streaming or download. The file server may be of a type capable of storing the video image bitstream and transmitting the video image bitstream to the decoding end 102. File servers include web servers (e.g., for web sites), file transfer protocol servers, network attached storage, and local disk drives, among others.
The decoding end 102 may access the video image codestream via a standard data connection, such as an internet connection. Example types of data connections include a wireless link (e.g., a Wi-Fi connection), a wired connection (e.g., DSL, cable modem, etc.), or a combination of both, suitable for accessing a stream of video images stored on a file server.
As shown in fig. 1B, the encoding end 101 includes: the depth information sensor 1011 is used for acquiring original depth information, the image sensor 1012 is used for acquiring video image data, and the video image encoder 1013 is used for encoding the original depth information and the video image data to form a video image code stream.
As shown in fig. 1C, the decoding end 102 includes: the video image decoder 1021 is used for decoding a video image code stream to obtain video images corresponding to the original depth information and the video image data, and the image processor 1022 is used for processing the original depth information and the video images to obtain target video images. Here, the original depth information is applied to the video image, and a high-quality video image with high definition, low noise, and the like can be obtained.
In one example, as shown in fig. 1C, the decoding end 102 further includes: a depth image generator 1023 for generating a depth image based on the original depth information.
An optional processing flow of the information processing method provided in the embodiment of the present invention is applied to an encoding end, and as shown in fig. 2, includes the following steps:
s201, under the condition that the depth information of the target object is acquired through the depth information sensor, acquiring original depth information corresponding to the depth information.
The original depth information represents the acquisition state of the depth information acquired by the depth information sensor, or information other than the acquired depth information.
The depth information sensor is a sensor capable of acquiring depth information of a target object. In one example, the depth information sensor is a TOF module that employs a TOF ranging method. In one example, the depth information sensor is a binocular camera.
In the embodiment of the invention, the encoding end acquires the original depth information through the depth information sensor under the condition that the depth information sensor acquires the depth information, wherein the original depth information comprises at least one of the following: charge information, phase information, and attribute parameters of the depth information sensor. The charge information and the phase information are information other than the depth information acquired by the depth information sensor, and the attribute parameters of the depth information sensor represent the acquisition state in which the depth information sensor acquires the depth information.
Taking the original depth information being the charge information as an example, the charge information at a time point can be embodied as a charge image. Here, the depth information sensor converts the light signal received while acquiring the depth information into an electric signal through photoelectric conversion, and the electric signal is quantized to generate the charge image.
Taking the original depth information as the phase information, the phase information at a time point can be embodied as a phase image.
Taking the original depth information as the attribute parameter of the depth information sensor as an example, the original depth information may include: temperature, pose and other attribute parameters.
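The patent does not prescribe any data layout for the original depth information, but the three kinds listed above (charge information, phase information, and sensor attribute parameters) could be grouped per video frame as in the following sketch. All names here are hypothetical:

```python
from dataclasses import dataclass, field
from typing import Optional, Dict
import numpy as np

@dataclass
class RawDepthInfo:
    """Original depth information accompanying one video frame.

    At least one of the following is expected to be present:
    - charge_image: quantised charge readings from the depth sensor
    - phase_image: per-pixel phase measurements (TOF ranging)
    - sensor_attributes: acquisition-state parameters, e.g. temperature
      and pose of the sensor at capture time
    """
    frame_index: int                        # one-to-one mapping to a video frame
    charge_image: Optional[np.ndarray] = None
    phase_image: Optional[np.ndarray] = None
    sensor_attributes: Dict[str, float] = field(default_factory=dict)

    def is_valid(self) -> bool:
        """True when the record carries at least one kind of information."""
        return (self.charge_image is not None
                or self.phase_image is not None
                or bool(self.sensor_attributes))
```

The `frame_index` field reflects the one-to-one correspondence between original depth information and video frames described in S202 below.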
S202, acquiring video image data of the target object through an image sensor.
The encoding end acquires video image data of a target object through an image sensor in an image preview or video shooting scene, wherein the video image data comprises at least one frame of image frame.
In the embodiment of the invention, the original depth information corresponds to the video frames one-to-one. In one example, different charge images or phase images correspond to different image frames, respectively.
S203, merging and coding the original depth information and the video image data to obtain a video image code stream, and outputting the video image code stream.
The encoding end carries out merging encoding on the original depth information and the video image data through a video image encoder, the video image encoder outputs a video image code stream, and outputs the video image code stream output by the video image encoder to the decoding end, so that the decoding end carries out image processing on a video image corresponding to the video image data based on the original depth information.
Optionally, the video image encoder encodes the video image frames or the original depth information by using a video image coding and decoding protocol to obtain the video image code stream; the video codec protocol may be H.264, H.265, H.266, VP9, AV1, or the like.
Optionally, only the original depth information and the video image data are encoded by using the video image coding and decoding protocol; in this case, the data carried by the video image code stream does not include the depth information.
In the embodiment of the present invention, when the data carried by the video image code stream does not include depth information, the encoding end may obtain only the original depth information when the depth information sensor acquires depth information, without obtaining the depth information acquired by the depth information sensor, or may discard the acquired depth information.
Optionally, a video image coding and decoding protocol is used to encode the original depth information and the video image data, and the video image coding and decoding protocol is also used to encode the depth information collected by the depth information sensor; in this case, the data carried by the video image code stream includes the original depth information and the video image data.
In the embodiment of the present invention, the processing of the depth information acquired by the depth information sensor is not limited at all.
Optionally, the video image encoder encodes the video image frame or the original depth information by using an industry standard or a specific standard of a specific organization to obtain a video image code stream.
The encoding end can input all the original depth information into the video image encoder so as to encode all the original depth information, or can input only part of the original depth information into the video image encoder so as to encode part of the original depth information. Optionally, the partial original depth information is original depth information corresponding to the specified image frame. Optionally, the partial original depth information is original depth information corresponding to a specified image position.
Taking the case where the partial original depth information is the original depth information corresponding to a specified image frame as an example, the merging and encoding of the original depth information and the video image data to obtain a video image code stream includes: merging and encoding the original depth information corresponding to the specified image frame among the image frames corresponding to the video image data with the video image data to obtain the video image code stream.
Optionally, the image frame is designated as one of the image frames corresponding to the video image data. Optionally, the designated image frame includes a plurality of image frames of the image frames corresponding to the video image data.
The embodiment of the invention does not set any limit to the number of the designated image frames.
The encoding end performs merging and encoding only on the original depth information corresponding to the specified image frame and the video image data, and does not encode the original depth information corresponding to the non-specified image frames, i.e., the frames other than the specified image frame among the image frames corresponding to the video image data.
Taking part of the original depth information as the original depth information corresponding to the designated image position as an example, the merging and encoding the original depth information and the video image data to obtain a video image code stream includes: and merging and coding the original depth information corresponding to the specified image position and the video image data to obtain a video image code stream.
The designated image position is the position of the designated point in the image acquisition range. Optionally, the designated image position is a position where the designated area is located within the image acquisition range. The embodiment of the invention does not limit the range size or the position of the designated image position at all.
The encoding end performs merging and encoding only on the original depth information corresponding to the designated image position and the video image data, and does not encode the original depth information corresponding to positions in the image frame other than the designated image position.
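The two ways of selecting partial original depth information described above (by designated image frame and by designated image position) might be sketched as a filtering step before merging and encoding. The function and parameter names are illustrative assumptions:

```python
import numpy as np

def select_raw_depth(raw_depth_per_frame, designated_frames=None, region=None):
    """Keep only the original depth information that will be merge-encoded.

    raw_depth_per_frame: dict mapping frame index -> 2-D raw depth array
                         (e.g. a charge image or a phase image).
    designated_frames:   iterable of frame indices to keep; None keeps all.
    region:              (top, left, height, width) of a designated image
                         position; None keeps the full image.
    """
    keep = None if designated_frames is None else set(designated_frames)
    selected = {}
    for idx, info in raw_depth_per_frame.items():
        if keep is not None and idx not in keep:
            continue  # raw depth of non-specified frames is not encoded
        if region is not None:
            t, l, h, w = region
            info = info[t:t + h, l:l + w]  # crop to the designated position
        selected[idx] = info
    return selected
```

Both filters can be applied together; the patent places no limit on the number of designated frames or the size of the designated position.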
In this embodiment of the present invention, a coding method for performing merging coding on the original depth information and the video image data by a coding end includes one of the following:
according to the correlation between the original depth information and the video image data, performing mixed coding on the original depth information and the video image data to obtain a video image code stream;
and a second coding mode: independently coding the original depth information and the video image data respectively to obtain a video image code stream comprising a first code stream and a second code stream, wherein the first code stream is obtained by coding the original depth information, and the second code stream is obtained by coding the video image data.
In the first encoding mode, the encoding and decoding protocols used for encoding the original depth information and the video image data are the same.
Optionally, in the first encoding mode, the encoding information in the video image code stream is mixed encoding information obtained by jointly encoding the original depth information and the video image data. The video image encoder may jointly encode the original depth information and the video image data by using the spatial correlation or temporal correlation between them.
Optionally, in the first encoding mode, the first encoding information corresponding to the original depth information is written into a specified position of the second encoding information corresponding to the video image data. Alternatively, the designated location may be a picture header, a sequence header, an additional parameter set, or any other location.
Optionally, in the first encoding mode, the original depth information is encoded by using the spatial correlation or temporal correlation between the original depth information and the video image data to obtain first encoding information, the video image data is encoded to obtain second encoding information, and the first encoding information is written into a specified position of the second encoding information to obtain the video image code stream.
In the second encoding mode, the encoding and decoding protocol used for encoding the original depth information is independent of the encoding and decoding protocol used for encoding the video image data. Optionally, the original depth information is encoded using the same codec protocol as used for encoding the video image data. Optionally, the original depth information is encoded using a different codec protocol than the codec protocol used to encode the video image data.
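A minimal sketch of the second coding mode (the two parts independently coded into a first and a second code stream, then combined) is shown below. `zlib` stands in for the real depth and video codecs, which the patent leaves open, and the length-prefixed container format is an assumption made here for illustration:

```python
import struct
import zlib

def merge_encode(raw_depth_bytes: bytes, video_bytes: bytes) -> bytes:
    """Second coding mode: code each part independently, then concatenate
    [len | first code stream | len | second code stream]."""
    first = zlib.compress(raw_depth_bytes)   # stand-in for a depth codec
    second = zlib.compress(video_bytes)      # stand-in for H.264/H.265/...
    return (struct.pack(">I", len(first)) + first +
            struct.pack(">I", len(second)) + second)

def split_decode(stream: bytes):
    """Decoder side: recover the two independently coded parts."""
    n1 = struct.unpack(">I", stream[:4])[0]
    first = zlib.decompress(stream[4:4 + n1])
    off = 4 + n1
    n2 = struct.unpack(">I", stream[off:off + 4])[0]
    second = zlib.decompress(stream[off + 4:off + 4 + n2])
    return first, second
```

Because the two sub-streams carry explicit lengths, the decoding end can split them without knowing which codec produced each part, which is consistent with the two codecs being chosen independently.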
In one embodiment, as shown in fig. 3, before S203, the method includes:
204A, preprocessing the original depth information.
Merging and encoding the original depth information and the video image data in S203, which may be performed as S203A: and merging and coding the preprocessed original depth information and the video image data to obtain a video image code stream.
In the embodiment of the present invention, the preprocessing may be one or more processing modes such as phase calibration, filtering, denoising, and signal amplification, and may also be other processing modes; the specific preprocessing may be determined according to the actual situation, which is not limited in the embodiment of the present invention.
Optionally, the encoding end preprocesses the original depth information through the depth information sensor.
In one embodiment, as shown in fig. 4, before S203, the method includes:
204B, performing redundancy elimination processing on the original depth information to eliminate redundant information in the original depth information.
Merging and encoding the original depth information and the video image data in S203, which may be performed as S203B: and merging and coding the original depth information subjected to redundancy elimination and the video image data to obtain a video image code stream.
By performing redundancy elimination processing on the original depth information, the encoding end can eliminate redundant information in the original depth information, thereby compressing the amount of original depth information and reducing the size of the video image code stream.
In this embodiment of the present invention, the performing redundancy elimination processing on the original depth information includes at least one of:
performing redundant elimination processing on the original depth information based on phase correlation;
performing redundancy elimination processing on the original depth information based on spatial correlation;
performing redundancy elimination processing on the original depth information based on time correlation;
performing redundancy elimination processing on the original depth information based on the specified depth;
performing redundancy elimination processing on the original depth information based on frequency domain correlation;
and performing redundancy elimination processing on the coded bits of the original depth information based on the correlation between the coded binary data.
Optionally, the original depth information is converted into a frequency domain, and the original depth information converted into the frequency domain is subjected to redundancy elimination processing based on frequency domain correlation.
Optionally, the specified depth is the depth range to which the scene where the target object is located is sensitive; the original depth information is subjected to redundancy elimination processing based on the specified depth, and original depth information corresponding to depths outside this range is eliminated as redundancy.
Optionally, the original depth information is entropy-encoded, and redundancy elimination processing is performed on the coded bits of the entropy-encoded result of the original depth information based on the correlation between the coded binary data.
Taking redundancy elimination processing based on spatial correlation as an example, the original depth information at the encoding end corresponds to at least one viewpoint; interval viewpoints are determined from the at least one viewpoint, and the original depth information corresponding to the interval viewpoints is taken as interval original depth information; the original depth information other than the interval original depth information is eliminated as redundancy, and the interval original depth information and the video image data are merged and encoded to obtain the video image code stream.
Taking redundancy elimination processing on the original depth information based on time correlation as an example: the encoding end acquires original depth information within a period of time, samples the acquired original depth information at a sampling interval, and retains the sampled original depth information; the original depth information other than the sampled original depth information is eliminated as redundancy, and the sampled original depth information and the video image data are merged and encoded to obtain the video image code stream.
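A minimal sketch of this temporal sampling at the encoding end, assuming (for illustration only) a fixed sampling interval of 3 frames:

```python
def sample_depth_over_time(depth_frames, interval=3):
    """Retain one original-depth frame per `interval` frames.

    The frames in between are eliminated as temporal redundancy before
    merge-encoding; the decoding end restores them from the temporally
    adjacent retained frames (e.g. by interpolation).
    """
    return depth_frames[::interval]

# Frames 1..10: frames 1, 4, 7 and 10 are retained and merge-encoded.
retained = sample_depth_over_time(list(range(1, 11)))
```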
An optional processing flow of the information processing method provided in the embodiment of the present invention, applied to a decoding end, is shown in fig. 5 and includes the following steps:
S501, receiving a video image code stream.
And the decoding end receives the video image code stream sent by the encoding end through the link. The video image code stream is obtained by merging and coding original depth information and video image data, wherein the original depth information is obtained under the condition that a depth information sensor obtains the depth information of a target object, and the video image data is obtained from the target object through an image sensor; the original depth information represents the acquisition state of the depth information acquired by the depth information sensor or information except the acquired depth information.
S502, decoding the video image code stream to obtain the original depth information and the video image corresponding to the video image data.
Here, the video image code stream is decoded by a video image decoder to obtain the original depth information and a video image corresponding to the video image data.
And the decoding end sends the received video image code stream to a video image decoder, and the video image decoder decodes the video image code stream.
Optionally, the video image decoder and the video encoder at the encoding end support the same video image codec protocol.
Optionally, in the case where the video image encoder performs hybrid encoding of the original depth information and the video image data, the video image decoder performs hybrid decoding on the video image code stream to obtain the original depth information and the video image corresponding to the video image data.
Optionally, when the video image encoder independently encodes the original depth information and the video image data, the video image decoder independently decodes a first code stream and a second code stream in the video image code stream: the first code stream is decoded to obtain the original depth information, and the second code stream is decoded to obtain the video image corresponding to the video image data. Here, the video image corresponding to the video image data may also be referred to as the original video image. The original video image obtained by decoding the video image code stream may include one or more frames.
S503, carrying out image processing on the original depth information and the video image to obtain a target video image.
And carrying out image processing on the original depth information and the video image through an image processor to obtain a target video image.
After the decoding end decodes the original depth information and the video image data, the image processor applies the original depth information to the video image to process it and obtain the target video image. The image quality of the target video image is higher than that of the original video image.
Optionally, the decoding end may perform redundancy restoration on original depth information obtained by decoding based on phase correlation, spatial correlation, temporal correlation, specified depth, frequency domain correlation, and correlation between coded binary data to obtain original depth information after redundancy restoration, and perform image processing on a video image based on the original depth information after redundancy restoration to obtain a target video image.
Taking redundancy restoration based on spatial correlation as an example: the decoding end performs independent decoding or hybrid decoding on the video image code stream to obtain the original depth information of the interval viewpoints and the video image of the at least one viewpoint; the original depth information of the interval viewpoints is interpolated to obtain the original depth information of the viewpoints other than the interval viewpoints; and image processing is performed on the video image using the original depth information of the interval viewpoints and of the other viewpoints to obtain the target video image.
Taking redundancy restoration based on time correlation as an example: the decoding end performs independent decoding or hybrid decoding on the video image code stream to obtain the sampled original depth information; the original depth information between temporally adjacent samples is restored from those samples; and image processing is performed on the video image using both the decoded and the restored original depth information to obtain the target video image.
Optionally, the video image decoder and the image processor are independent of each other. Optionally, the image processor is integrated within the video image decoder.
In an example in which the original depth information is charge information, performing image processing on the original depth information and the video image to obtain a target video image includes: performing denoising or white balance adjustment on the video image according to the original depth information to obtain the target video image.
In an example in which the original depth information is phase information, performing image processing on the original depth information and the video image to obtain a target video image includes: deblurring the video image according to the original depth information to obtain the target video image.
An image processor at the decoding end parses each piece of phase information to obtain a parsing result, and uses the parsing result to deblur the corresponding video frame to obtain the target video image.
In an example, in High Dynamic Range (HDR) video, each HDR frame is obtained by fusing one long-exposure image and one short-exposure image. At the current time, for the same scene, the image sensor is controlled to capture the long-exposure image and the short-exposure image, the depth information sensor is controlled to capture a phase image, and the phase image is used as the original depth information. The phase image is hybrid-encoded or independently encoded with the long-exposure image, and likewise with the short-exposure image, to obtain a video image code stream, which is output to the decoding end. The decoding end decodes the long-exposure image, the short-exposure image, and the phase image from the video image code stream; the phase image is then used to deblur the long-exposure image and the short-exposure image respectively; and the deblurred long-exposure image and the deblurred short-exposure image are fused to obtain a clearer HDR image.
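The fusion step at the end of this HDR flow can be sketched as follows. This is a minimal illustration only: a fixed per-pixel weight is assumed, whereas practical HDR fusion uses exposure-dependent weighting, and the phase-based deblurring itself is outside the sketch:

```python
def fuse_hdr(long_exp, short_exp, w=0.5):
    """Fuse a deblurred long-exposure frame and a deblurred short-exposure
    frame into one HDR frame by per-pixel weighted averaging (fixed weight
    assumed purely for illustration; frames are flat pixel lists)."""
    return [w * l + (1 - w) * s for l, s in zip(long_exp, short_exp)]

# Two 2-pixel frames: each fused pixel lies between the two exposures.
hdr = fuse_hdr([200.0, 40.0], [100.0, 20.0])
```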
As shown in fig. 6, after S502, the method further includes:
S504, restoring the original depth information to obtain a depth image.
Optionally, the original depth information is restored by a depth image generator, so as to obtain the depth image.
It should be noted that fig. 6 places S504 after S503 merely to illustrate one order of obtaining the target video image and the depth image; in practical applications, S503 and S504 may be executed in either order.
Optionally, the depth image generator is independent of the video image decoder. Optionally, the depth image generator is integrated within the video image decoder.
In one example, the video image decoder, the depth image generator, and the image processor are independent of each other. The video image code stream is input to the video image decoder, which outputs the original depth information and the video image; the original depth information and the video image are input to the image processor, which outputs the target video image; and the original depth information is input to the depth image generator, which outputs the depth image.
In one example, the depth image generator and the image processor are integrated in the video image decoder. The video image code stream is input to the video image decoder, which outputs the target video image and the depth image.
In one example, the depth image generator is integrated in the video image decoder, while the image processor and the video image decoder are independent of each other. The video image code stream is input to the video image decoder, which outputs the original depth information, the video image, and the depth image; the original depth information and the video image are input to the image processor, which outputs the target video image.
In one example, the image processor is integrated in the video image decoder, while the depth image generator and the video image decoder are independent of each other. The video image code stream is input to the video image decoder, which outputs the original depth information and the target video image; the original depth information is input to the depth image generator, which outputs the depth image.
An embodiment of the present invention further provides an information processing method, which is applied to an information processing system including an encoding end and a decoding end, and as shown in fig. 7, the information processing method includes:
S701, the encoding end acquires original depth information corresponding to the depth information under the condition that the depth information sensor acquires the depth information of the target object.
The original depth information represents the acquisition state of the depth information acquired by the depth information sensor or information except the acquired depth information;
s702, the encoding end acquires video image data of a target object through an image sensor;
S703, the encoding end merges and encodes the original depth information and the video image data to obtain a video image code stream, and outputs the video image code stream.
S704, the decoding end receives the video image code stream.
S705, the decoding end decodes the video image code stream to obtain the original depth information and the video image corresponding to the video image data.
And S706, the decoding end performs image processing on the original depth information and the video image to obtain a target video image.
In the embodiment of the invention, the decoding end receives a video image code stream that includes both the encoding information of the original depth information and the encoding information of the video image data, so the decoding end can decode the original depth information and the video image from the video image code stream. The decoding end can then recover the depth image from the original depth information, and can also use the original depth information to optimize the video image through denoising, white balance adjustment, deblurring, and the like. This improves the information utilization rate, and the image quality of the target video image obtained after such optimization is higher than that of the original video image.
The information processing method provided by the embodiment of the present invention is illustrated by a scene example.
The framework of the information processing system of the embodiment of the present invention is shown in fig. 8A and 8B. The video image encoder 1013 merges and encodes the original depth information 801 acquired by the depth information sensor 1011 and the video image data 802 acquired by the image sensor 1012 to form a video image code stream 803. After obtaining the video image code stream 803, the video image decoder 1021 parses it to obtain the original depth information 804 and the video image 805; the depth image generator 1023 restores the original depth information 804 to obtain the depth image 806, and the image processor 1022 processes the video image 805 with the original depth information 804 to obtain the target video image 807. The depth image generator 1023, the image processor 1022, and the video image decoder 1021 may be independent, as shown in fig. 8A; the depth image generator 1023 and the image processor 1022 may also be included as parts of the video image decoder 1021, as shown in fig. 8B.
The original depth information output by the depth information sensor may be the initial data information collected by the depth information sensor, i.e., original depth information that has not been preprocessed, or intermediate data information obtained by preprocessing the initial data information, i.e., preprocessed original depth information. When the output is the initial data information, it may be an electric signal after photoelectric conversion, such as charge information or phase information; when the output is intermediate data information, it may be intermediate video image data from which a depth image can be generated after phase alignment or other processing is performed on the initial data information.
The video image encoder encodes the input original depth information to form the video image code stream. The encoding modes include:
Encoding mode 1: hybrid-encode the video image data and the original depth information by utilizing the correlation between them;
Encoding mode 2: encode the video image data and the original depth information independently of each other.
In encoding mode 1, the encoding information of the original depth information may be located at any position within the encoding information of the video image data, such as a header, a sequence header, or an additional parameter set.
In encoding mode 2, the original depth information is encoded separately by exploiting its own correlations, such as its spatial correlation and temporal correlation.
In the video image encoder, the original depth information corresponding to each video image may be encoded, or only the original depth information corresponding to the designated image or designated image position may be encoded, and the original depth information corresponding to other non-designated images or non-designated image positions may not be encoded.
For the image processor, in a shooting or preview scene, the original depth information can be applied directly to the video image to form a target video image with depth of field, without generating the depth of field by superimposing a depth image on the video image.
In the process of encoding the original depth information, in order to compress the data volume, the encoding end can eliminate redundancy by using, but not limited to, the following correlations:
1. if the original depth information comprises phase information of a plurality of video images, eliminate phase-data redundancy by utilizing the correlation among the phases; if the original depth information is other data, eliminate data redundancy by utilizing correlations such as the spatial correlation among the data;
2. eliminate data redundancy by utilizing the time correlation of the original depth information;
3. eliminate scene-based data redundancy by using a specified depth;
4. convert the original depth information into the frequency domain, and eliminate frequency-domain data redundancy by utilizing frequency-domain correlation;
5. eliminate coded-bit redundancy by utilizing the correlation between coded binary data; here, the coding may be entropy coding.
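As a toy stand-in for item 5, the run-length sketch below exploits runs of identical bits in coded binary data; a practical encoder would use entropy coding (e.g. arithmetic coding) rather than this simplified scheme:

```python
def run_length_encode(bits):
    """Collapse runs of identical bits into (bit, run_length) pairs,
    illustrating how correlation between coded binary data reduces
    coded-bit redundancy."""
    runs, i = [], 0
    while i < len(bits):
        j = i
        while j < len(bits) and bits[j] == bits[i]:
            j += 1                 # extend the current run
        runs.append((bits[i], j - i))
        i = j
    return runs

# Eight coded bits collapse into three (bit, run) pairs.
encoded = run_length_encode("00001100")
```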
In the embodiment of the invention, in the video image code stream containing the original depth information formed by the video image encoder, the original depth information and the video image data can be decoded independently; that is, the video image code stream is decoupled. A video image decoder adopting any of various video image standard codec protocols can therefore extract only the video image from the video image code stream without extracting the original depth information, or extract only the original depth information without extracting the video image.
As shown in figs. 9A to 9D, the video image decoder, the depth image generator, and the image processor cooperate to decode the video image code stream according to a video image standard codec protocol and generate the processed image and the original depth information. The video image standard codec protocol may be a proprietary standard customized by a manufacturer or an industry standard. The video image decoder, the depth image generator, and the image processor can be composed in the following four modes:
in the composition mode 1, as shown in fig. 9A, the video image decoder 1021, the depth image generator 1023 and the image processor 1022 are independent of each other, and after the video image decoder 1021 analyzes the video image code stream 803 to obtain the video image 805 and the original depth information 804, the original depth information 804 is sent to the depth image generator 1023 to generate the depth image 806, and the video image 805 and the original depth information 804 are sent to the image processor 1022 to generate the processed target video image 807;
as shown in fig. 9B, in configuration mode 2, the depth image generator 1023 and the image processor 1022 are embedded inside the video image decoder 1021, the video image stream 803 is processed inside the video image decoder 1021, and the depth image 806 and the processed target video image 807 are directly output.
In the composition mode 3, as shown in fig. 9C, the depth image generator 1023 is embedded inside the video image decoder 1021, the video image code stream 803 is processed inside the video image decoder 1021, the depth image 806 and the video image 805 are output, the video image 805 and the original depth information 804 are sent to the image processor 1022, and the processed target video image 807 is output;
In the composition mode 4, as shown in fig. 9D, the image processor 1022 is embedded inside the video image decoder 1021; the video image code stream 803 is processed inside the video image decoder 1021, the original depth information 804 and the processed target video image 807 are output, and the original depth information 804 is then sent to the depth image generator 1023, which outputs the depth image 806.
In the information processing method provided by the embodiment of the invention, at an encoding end, original depth information obtained by a depth information sensor is subjected to video image encoding to form a video image code stream for transmission; at a decoding end, the depth image can be recovered through the video image code stream, and the original video image can be processed through the original depth information obtained through analysis, so that a target video image with higher image quality is obtained.
In an example in which the original depth information is phase information, a depth image can be recovered from a plurality of phase images sampled at different time points. When the original video image is blurred due to motion, the blurred original video image can be restored by motion estimation based on the phase information, because the plurality of phase images carry additional information from different time points, yielding a clearer target video image.
In another example, the depth information sensor is a TOF architecture or module and the original depth information is charge information. Not only can a depth image be generated, but the noise and external visible light of the shooting scene can also be judged from the charge information, which is then used to denoise and white-balance the original video image, obtaining a video image with better image quality and providing the user with a more attractive and realistic image and video experience.
In the embodiment of the present invention, the obtaining manner of the original depth information includes, but is not limited to, the following manners:
Mode one
A continuous-wave modulation TOF method is adopted. Under two different emission signal frequencies, the TOF sensor obtains 8 groups of optical signals with different phases by controlling the integration time; the 8 groups of optical signals are photoelectrically converted into 8 groups of charge signals, and the 8 groups of charge signals are quantized with 10 bits to generate 8 original charge images. The encoding end encodes the 8 original charge images together with attribute parameters of the TOF sensor, such as temperature, as the original depth information; alternatively, the 8 original charge images are preprocessed to generate 2 pieces of process depth data and one piece of background data, which are encoded as the original depth information.
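For context, the phase-to-depth relation of a continuous-wave TOF sensor at a single modulation frequency can be sketched as below. The four-sample arctangent form is one common convention and is an illustration only; the embodiment's eight samples across two frequencies additionally resolve phase ambiguity:

```python
import math

def tof_depth(q0, q90, q180, q270, f_mod):
    """Depth from four phase-shifted charge samples (0/90/180/270 degrees)
    of a continuous-wave TOF sensor modulated at f_mod hertz."""
    c = 299_792_458.0                      # speed of light, m/s
    phi = math.atan2(q270 - q90, q0 - q180)
    if phi < 0:
        phi += 2 * math.pi                 # wrap phase into [0, 2*pi)
    return c * phi / (4 * math.pi * f_mod)

# A quarter-cycle phase shift at 20 MHz corresponds to depth c / (8 * f_mod).
d = tof_depth(0.0, 0.0, 0.0, 1.0, 20e6)
```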
Mode two
A binocular imaging principle is adopted. Two video images are captured by a binocular camera, information such as disparity is calculated according to the poses of the two video images, and the disparity information and the camera parameters are encoded as the original depth information.
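The disparity-to-depth conversion implied by mode two follows the standard pinhole stereo relation Z = f·B/d; a minimal sketch with assumed parameter names:

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth of a point from its disparity between the two views of a
    binocular camera: Z = focal_length * baseline / disparity. This is
    why the camera parameters are encoded alongside the disparity."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

# 1000 px focal length, 10 cm baseline, 50 px disparity -> 2 m depth.
z = depth_from_disparity(1000.0, 0.1, 50.0)
```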
In the embodiment of the present invention, taking 3D High Efficiency Video Coding (3D-HEVC) as an example, when encoding the original depth information, one possible implementation is to encode each viewpoint with its corresponding original depth information. Another possible implementation is to encode the original depth information at viewpoint intervals: since there is strong correlation between different viewpoints at the same time, original depth information such as a phase map or a charge image can exploit this correlation to reduce the amount of transmitted video image code stream data. In an example of video coding with three viewpoints, the encoding end only needs to retain the original depth data of the left and right viewpoints in the video image code stream, and the decoding end can obtain the original depth information of the middle viewpoint by interpolating the original depth information of the left and right viewpoints.
In the embodiment of the invention, taking redundancy elimination of the original depth information based on time correlation as an example, as a possible implementation, not all the original depth information needs to be encoded: it is sufficient to sample the original depth information collected by the depth information sensor with a fixed step size and encode only the sampled signals with the video image encoder. After the decoding end recovers the sampled signals, the original depth information that was not sampled is recovered by methods such as interpolation.
In one example, as shown in fig. 10, the original depth information comprises signals numbered signal 1, signal 2, signal 3, signal 4, ..., signal N. Sampling the original depth information with a fixed step size of 3 yields the sampled original depth information: signal 1, signal 4, signal 7, ..., signal N. The sampled original depth information is encoded and decoded, and after decoding, each non-sampled signal is recovered from its adjacent sampled signals; for example, signal 2 is obtained by interpolating signal 1 and signal 4, signal 3 is obtained by interpolating signal 2 and signal 4, and so on.
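The recovery in fig. 10 can be sketched with linear interpolation between the retained signals. Signal values here stand in for per-frame original depth information; note that with linear interpolation, recovering signal 3 from signals 2 and 4 (the progressive variant in the text) gives the same result as recovering it from signals 1 and 4 directly:

```python
def restore_by_interpolation(sampled):
    """Recover non-sampled signals from their nearest retained neighbours.

    sampled: {signal_number: value} for the retained signals, e.g. the
    signals 1, 4, 7, ... kept after fixed-step-3 sampling.
    """
    keys = sorted(sampled)
    full = dict(sampled)
    for a, b in zip(keys, keys[1:]):
        for k in range(a + 1, b):
            t = (k - a) / (b - a)          # position between neighbours
            full[k] = (1 - t) * sampled[a] + t * sampled[b]
    return full

# Signals 1, 4, 7 retained; signals 2, 3, 5, 6 are interpolated.
full = restore_by_interpolation({1: 10.0, 4: 40.0, 7: 70.0})
```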
In the embodiment of the invention, in an AR scene, as a possible implementation, the original depth information corresponding to the whole depth image does not need to be encoded; only the part corresponding to a partial picture needs to be encoded, thereby realizing encoded transmission of specified local original depth information.
In order to implement the information processing method, an embodiment of the present invention further provides a terminal device, where a composition structure of the terminal device is shown in fig. 11, and a terminal device 1100 includes:
a first obtaining unit 1101, configured to obtain original depth information corresponding to depth information when the depth information of a target object is obtained by a depth information sensing unit, where the original depth information represents an acquisition state of the depth information acquired by the depth information sensing unit or information other than the acquired depth information;
a second acquisition unit 1102 configured to acquire video image data of the target object by an image sensing unit;
an encoding unit 1103 configured to perform merging encoding on the original depth information and the video image data to obtain a video image code stream;
an output unit 1104 configured to output the video image code stream.
In this embodiment of the present invention, the encoding unit 1103 is further configured to:
and merging and encoding the original depth information corresponding to a specified image frame, among the image frames corresponding to the video image data, with the video image data to obtain the video image code stream.
In this embodiment of the present invention, the encoding unit 1103 is further configured to:
and merging and coding the original depth information corresponding to the specified image position and the video image data to obtain the video image code stream.
In this embodiment of the present invention, the encoding unit 1103 is further configured to:
and performing mixed coding on the original depth information and the video image data according to the correlation between the original depth information and the video image data to obtain the video image code stream.
In this embodiment of the present invention, the encoding unit 1103 is further configured to:
coding the original depth information to obtain first coding information;
writing the first coding information into a designated position of the video image data;
and coding the video image data written in the first coding information to obtain the video image code stream.
In this embodiment of the present invention, the encoding unit 1103 is further configured to:
coding the original depth information to obtain first coding information;
coding the video image data to obtain second coding information;
and merging the first coding information and the second coding information to obtain the video image code stream.
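One simple way to realize this merging in the independent-encoding case is length-prefixed concatenation, which preserves the decouplability described elsewhere in this document: either code stream can be located and decoded without the other. The 4-byte big-endian framing below is an assumption for illustration, not mandated by the embodiment:

```python
import struct

def merge_code_streams(first: bytes, second: bytes) -> bytes:
    """Concatenate the first coding information (original depth) and the
    second coding information (video image data) with length prefixes."""
    return (struct.pack(">I", len(first)) + first +
            struct.pack(">I", len(second)) + second)

def split_code_streams(stream: bytes):
    """Recover the two independently decodable code streams."""
    n1 = struct.unpack(">I", stream[:4])[0]
    first = stream[4:4 + n1]
    n2 = struct.unpack(">I", stream[4 + n1:8 + n1])[0]
    second = stream[8 + n1:8 + n1 + n2]
    return first, second
```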
In this embodiment of the present invention, the terminal device further includes:
a pre-processing unit configured to:
and preprocessing the original depth information before merging and coding the original depth information and the video image data to obtain a video image code stream.
In this embodiment of the present invention, the terminal device further includes:
a cancellation unit configured to:
before the original depth information and the video image data are merged and coded to obtain a video image code stream, redundancy elimination processing is carried out on the original depth information to eliminate redundant information in the original depth information.
In this embodiment of the present invention, the eliminating unit is further configured to at least one of:
performing redundant elimination processing on the original depth information based on phase correlation;
performing redundancy elimination processing on the original depth information based on spatial correlation;
performing redundancy elimination processing on the original depth information based on time correlation;
performing redundancy elimination processing on the original depth information based on the specified depth;
performing redundancy elimination processing on the original depth information based on frequency domain correlation;
and performing redundancy elimination processing on the coded bits of the original depth information based on the correlation between the coded binary data.
In this embodiment of the present invention, the original depth information includes at least one of: charge information, phase information and property parameters of the depth information sensing unit.
The embodiment of the present invention further provides a terminal device, which includes a processor and a memory configured to store a computer program capable of running on the processor; the processor is configured to execute, when running the computer program, the steps of the information processing method performed by the terminal device.
It should be noted that the depth information sensing unit, the image sensing unit, and the video image encoding unit in the embodiment of the present invention may be a depth information sensor, an image sensor, and a video image encoder, respectively.
In order to implement the information processing method, an embodiment of the present invention further provides a terminal device, where a composition structure of the terminal device is shown in fig. 12, and the terminal device 1200 includes:
a receiving unit 1201 configured to receive a video image code stream, where the video image code stream is obtained by merging and encoding original depth information and video image data, the original depth information is obtained by a depth information sensing unit when depth information of a target object is obtained, and the video image data is obtained by the image sensing unit from the target object; the original depth information represents the acquisition state of the depth information acquired by the depth information sensing unit or information except the acquired depth information;
a decoding unit 1202, configured to decode the video image code stream to obtain the original depth information and a video image corresponding to the video image data;
a processing unit 1203, configured to perform image processing on the original depth information and the video image, so as to obtain a target video image.
In this embodiment of the present invention, the decoding unit 1202 is further configured to decode the video image code stream by a video image decoding unit to obtain the original depth information and a video image corresponding to the video image data;
the processing unit 1203 is further configured to perform image processing on the original depth information and the video image through the video image decoding unit to obtain a target video image.
In the embodiment of the present invention, the video image decoding unit and the image processing unit are independent from each other, or the image processing unit is integrated in the video image decoding unit.
In this embodiment of the present invention, the original depth information includes at least one of: charge information, phase information, and attribute parameters of the depth information sensing unit.
In this embodiment of the present invention, the processing unit 1203 is further configured to:
and when the original depth information is charge information, perform denoising or white balance adjustment on the video image according to the charge information to obtain the target video image.
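One plausible way to use charge information for denoising is to treat the per-pixel charge (photon count) as a confidence weight, so that photon-starved pixels are smoothed more strongly toward a local mean. The sketch below is an assumption-laden illustration, not the patent's algorithm; `charge_weighted_denoise` and its parameters are hypothetical.

```python
import numpy as np

def charge_weighted_denoise(image: np.ndarray, charge: np.ndarray,
                            strength: float = 1.0) -> np.ndarray:
    """Blend each pixel toward a 3x3 local mean, weighting by charge:
    low-charge (noisy) pixels are pulled harder toward the local mean."""
    pad = np.pad(image.astype(np.float64), 1, mode="edge")
    # 3x3 box blur as the local mean estimate
    local = sum(pad[i:i + image.shape[0], j:j + image.shape[1]]
                for i in range(3) for j in range(3)) / 9.0
    # per-pixel confidence in [0, 1): high charge -> trust the original pixel
    conf = charge / (charge + strength * charge.mean())
    return conf * image + (1.0 - conf) * local
```

A uniform image passes through unchanged, while low-charge regions of a noisy image are averaged away; a white balance adjustment could similarly scale color channels by charge-derived gains.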
In this embodiment of the present invention, the processing unit 1203 is further configured to:
and when the original depth information is phase information, perform deblurring on the video image according to the phase information to obtain the target video image.
In this embodiment of the present invention, the terminal device further includes:
and the generating unit is configured to restore the original depth information to obtain a depth image.
In this embodiment of the present invention, the generating unit is further configured to restore the original depth information through a depth image generating unit to obtain the depth image.
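For a continuous-wave time-of-flight sensor, restoring raw phase samples to a depth image is commonly done with the standard four-phase formula (charge samples taken at 0°, 90°, 180°, and 270° offsets). The patent does not specify this method, so the sketch below is a generic illustration with hypothetical names, not the disclosed depth image generating unit.

```python
import numpy as np

C = 299_792_458.0  # speed of light, m/s

def tof_depth_from_phases(a0, a1, a2, a3, f_mod):
    """Recover depth from four phase-shifted charge samples of a
    continuous-wave ToF sensor modulated at f_mod hertz."""
    phase = np.arctan2(a3 - a1, a0 - a2)   # quadrant-aware wrapped phase
    phase = np.mod(phase, 2.0 * np.pi)     # map to [0, 2*pi)
    # depth within the unambiguous range c / (2 * f_mod)
    return C * phase / (4.0 * np.pi * f_mod)
```

The inputs can be scalars or per-pixel arrays, so applying the function to full charge frames yields a depth image directly.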
An embodiment of the present invention further provides a terminal device, which includes a processor and a memory configured to store a computer program capable of running on the processor, wherein the processor is configured to execute, when running the computer program, the steps of the information processing method performed by the terminal device.
It should be noted that the video image decoding unit, the image processing unit, and the depth image generating unit in the embodiment of the present invention may be a video image decoder, an image processor, and a depth image generator, respectively.
Fig. 13 is a schematic diagram of a hardware composition structure of an electronic device (terminal device) according to an embodiment of the present invention. The electronic device 1300 includes: at least one processor 1301, a memory 1302, and at least one network interface 1304. The various components in the electronic device 1300 are coupled together by a bus system 1305. It is understood that the bus system 1305 is used to implement connection and communication among these components. In addition to a data bus, the bus system 1305 includes a power bus, a control bus, and a status signal bus. However, for clarity of illustration, the various buses are labeled in Fig. 13 as the bus system 1305.
It will be appreciated that the memory 1302 can be volatile memory, non-volatile memory, or both. The non-volatile memory may be Read-Only Memory (ROM), Programmable Read-Only Memory (PROM), Erasable Programmable Read-Only Memory (EPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Ferroelectric Random Access Memory (FRAM), Flash Memory, magnetic surface memory, optical disc, or Compact Disc Read-Only Memory (CD-ROM); the magnetic surface memory may be disk memory or tape memory. The volatile memory may be Random Access Memory (RAM), which acts as an external cache. By way of illustration and not limitation, many forms of RAM are available, such as Static Random Access Memory (SRAM), Synchronous Static Random Access Memory (SSRAM), Dynamic Random Access Memory (DRAM), Synchronous Dynamic Random Access Memory (SDRAM), Double Data Rate Synchronous Dynamic Random Access Memory (DDR SDRAM), Enhanced Synchronous Dynamic Random Access Memory (ESDRAM), SyncLink Dynamic Random Access Memory (SLDRAM), and Direct Rambus Random Access Memory (DRRAM). The memory 1302 described in the embodiments of the present invention is intended to include, without being limited to, these and any other suitable types of memory.
The memory 1302 in embodiments of the present invention is used to store various types of data in support of the operation of the electronic device 1300. Examples of such data include: any computer program for operating on the electronic device 1300, such as application 13021. A program for implementing the method of an embodiment of the present invention can be included in the application 13021.
The method disclosed in the above embodiments of the present invention may be applied to the processor 1301, or implemented by the processor 1301. The processor 1301 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in the processor 1301 or by instructions in the form of software. The processor 1301 described above may be a general purpose processor, a Digital Signal Processor (DSP), another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The processor 1301 may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present invention. A general purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed in the embodiments of the present invention may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software modules may be located in a storage medium located in the memory 1302; the processor 1301 reads the information in the memory 1302 and performs the steps of the aforementioned methods in conjunction with its hardware.
In an exemplary embodiment, the electronic device 1300 may be implemented by one or more Application Specific Integrated Circuits (ASICs), DSPs, Programmable Logic Devices (PLDs), Complex Programmable Logic Devices (CPLDs), Field-Programmable Gate Arrays (FPGAs), general purpose processors, controllers, Micro Controller Units (MCUs), Micro Processor Units (MPUs), or other electronic components for performing the foregoing methods.
The embodiment of the invention also provides a storage medium for storing the computer program.
Optionally, the storage medium may be applied to the terminal device in the embodiment of the present invention, and the computer program enables the computer to execute corresponding processes in each method in the embodiment of the present invention, which is not described herein again for brevity.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer-implemented process, such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only exemplary of the present invention and should not be construed as limiting the scope of the present invention, and any modifications, equivalents, improvements, etc. made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (27)

  1. An information processing method, the method comprising:
    acquiring original depth information corresponding to the depth information under the condition that the depth information of a target object is acquired through a depth information sensor, wherein the original depth information represents the acquisition state of the depth information acquired by the depth information sensor or information except the acquired depth information;
    acquiring video image data of the target object through an image sensor;
    and merging and coding the original depth information and the video image data to obtain a video image code stream, and outputting the video image code stream.
  2. The method of claim 1, wherein said merging and encoding the original depth information and the video image data to obtain a video image bitstream comprises:
    and carrying out merging coding on the original depth information corresponding to the specified image frame in the image frame corresponding to the video image data and the video image data to obtain the video image code stream.
  3. The method of claim 1, wherein said merging and encoding the original depth information and the video image data to obtain a video image bitstream comprises:
    and merging and coding the original depth information corresponding to the specified image position and the video image data to obtain the video image code stream.
  4. The method according to any one of claims 1 to 3, wherein said merging and encoding the original depth information and the video image data to obtain a video image code stream comprises:
    and performing mixed coding on the original depth information and the video image data according to the correlation between the original depth information and the video image data to obtain the video image code stream.
  5. The method of claim 4, wherein the merging and encoding the original depth information and the video image data to obtain a video image code stream further comprises:
    and writing first coding information corresponding to the original depth information into a specified position of second coding information corresponding to the video image data.
  6. The method of any of claims 1 to 3, wherein said jointly encoding said original depth information and said video image data comprises:
    and independently coding the original depth information and the video image data respectively to obtain a video image code stream comprising a first code stream and a second code stream, wherein the first code stream is a code stream obtained after the original depth information is coded, and the second code stream is a code stream obtained after the video image data is coded.
  7. The method according to any one of claims 1 to 6, wherein before performing the merging encoding on the original depth information and the video image data to obtain a video image code stream, the method further comprises:
    preprocessing the original depth information;
    the merging and encoding the original depth information and the video image data to obtain a video image code stream includes:
    and merging and coding the preprocessed original depth information and the video image data to obtain a video image code stream.
  8. The method according to any one of claims 1 to 7, wherein before performing the merging encoding on the original depth information and the video image data to obtain a video image bitstream, the method further comprises:
    and carrying out redundancy elimination processing on the original depth information so as to eliminate redundant information in the original depth information.
  9. The method of claim 8, wherein the performing redundancy elimination processing on the original depth information comprises at least one of:
    performing redundant elimination processing on the original depth information based on phase correlation;
    performing redundancy elimination processing on the original depth information based on spatial correlation;
    performing redundancy elimination processing on the original depth information based on time correlation;
    performing redundancy elimination processing on the original depth information based on the specified depth;
    performing redundancy elimination processing on the original depth information based on frequency domain correlation;
    and performing redundancy elimination processing on the coded bits of the original depth information based on the correlation between the coded binary data.
  10. The method of any of claims 1 to 9, wherein the original depth information comprises at least one of: charge information, phase information, and attribute parameters of the depth information sensor.
  11. An information processing method, the method comprising:
    receiving a video image code stream, wherein the video image code stream is obtained by merging and coding original depth information and video image data, the original depth information is obtained under the condition that a depth information sensor obtains depth information of a target object, and the video image data is obtained from the target object through an image sensor; the original depth information represents the acquisition state of the depth information acquired by the depth information sensor or information except the acquired depth information;
    decoding the video image code stream to obtain the original depth information and a video image corresponding to the video image data;
    and carrying out image processing on the original depth information and the video image to obtain a target video image.
  12. The method of claim 11, wherein,
    decoding the video image code stream through a video image decoder to obtain the original depth information and a video image corresponding to the video image data;
    and carrying out image processing on the original depth information and the video image through an image processor to obtain a target video image.
  13. The method of claim 12, wherein the video image decoder and the image processor are independent of each other or the image processor is integrated within the video image decoder.
  14. The method of any of claims 11 to 13, wherein the original depth information comprises at least one of: charge information, phase information, and attribute parameters of the depth information sensor.
  15. The method of claim 14, wherein when the original depth information is charge information, the image processing the original depth information and the video image to obtain a target video image comprises:
    and denoising or white balance adjustment is carried out on the video image according to the charge information to obtain the target video image.
  16. The method of claim 14, wherein when the original depth information is phase information, the image processing the original depth information and the video image to obtain a target video image comprises:
    and deblurring the video image according to the phase information to obtain the target video image.
  17. The method of any of claims 11 to 16, wherein the method further comprises:
    and restoring the original depth information to obtain a depth image.
  18. The method of claim 17, wherein the original depth information is recovered by a depth image generator resulting in the depth image.
  19. A terminal device, the terminal device comprising:
    the first acquisition unit is configured to acquire original depth information corresponding to the depth information under the condition that the depth information of a target object is acquired through the depth information sensing unit, wherein the original depth information represents the acquisition state of the depth information acquired by the depth information sensing unit or information except the acquired depth information;
    a second acquisition unit configured to acquire video image data of the target object by an image sensing unit;
    the encoding unit is configured to carry out merging encoding on the original depth information and the video image data to obtain a video image code stream;
    and the output unit is configured to output the video image code stream.
  20. The terminal device of claim 19, wherein the encoding unit is further configured to:
    and performing mixed coding on the original depth information and the video image data according to the correlation between the original depth information and the video image data to obtain the video image code stream.
  21. The terminal device of claim 19 or 20, wherein the encoding unit is further configured to:
    coding the original depth information to obtain first coding information;
    coding the video image data to obtain second coding information;
    and merging the first coding information and the second coding information to obtain the video image code stream.
  22. The terminal device of any of claims 19 to 21, wherein the terminal device further comprises:
    a pre-processing unit configured to:
    and preprocessing the original depth information before merging and coding the original depth information and the video image data to obtain a video image code stream.
  23. The terminal device of any of claims 19 to 22, wherein the terminal device further comprises:
    a cancellation unit configured to:
    before the original depth information and the video image data are merged and coded to obtain a video image code stream, redundancy elimination processing is carried out on the original depth information to eliminate redundant information in the original depth information.
  24. A terminal device, the terminal device comprising:
    a receiving unit, configured to receive a video image code stream, wherein the video image code stream is obtained by merging and encoding original depth information and video image data, the original depth information is obtained when the depth information of a target object is acquired through a depth information sensing unit, and the video image data of the target object is acquired through an image sensing unit; the original depth information represents the acquisition state of the depth information acquired by the depth information sensing unit or information except the acquired depth information;
    the decoding unit is configured to decode the video image code stream to obtain the original depth information and a video image corresponding to the video image data;
    and the processing unit is configured to perform image processing on the original depth information and the video image to obtain a target video image.
  25. The terminal device of claim 24, wherein the terminal device further comprises:
    and the generating unit is configured to restore the depth information to obtain a depth image.
  26. A terminal device comprising a processor and a memory configured to store a computer program operable on the processor, wherein the processor is configured to execute the steps of the information processing method of any one of claims 1 to 10 or the steps of the information processing method of any one of claims 11 to 18 when executing the computer program.
  27. A storage medium storing an executable program which, when executed by a processor, implements the information processing method of any one of claims 1 to 10 or implements the information processing method of any one of claims 11 to 18.
CN201980100362.9A 2019-11-06 2019-11-06 Information processing method, terminal device and storage medium Pending CN114391259A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/116055 WO2021087819A1 (en) 2019-11-06 2019-11-06 Information processing method, terminal device and storage medium

Publications (1)

Publication Number Publication Date
CN114391259A true CN114391259A (en) 2022-04-22

Family

ID=75849423

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201980100362.9A Pending CN114391259A (en) 2019-11-06 2019-11-06 Information processing method, terminal device and storage medium

Country Status (2)

Country Link
CN (1) CN114391259A (en)
WO (1) WO2021087819A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110189371A (en) * 2019-05-20 2019-08-30 东南大学 A kind of mouse equilibrium state discriminating gear and its method based on TOF depth camera

Families Citing this family (1)

Publication number Priority date Publication date Assignee Title
CN116170581B (en) * 2023-02-17 2024-01-23 厦门瑞为信息技术有限公司 Video information encoding and decoding method based on target perception and electronic equipment

Citations (4)

Publication number Priority date Publication date Assignee Title
CN102630026A (en) * 2011-02-03 2012-08-08 美国博通公司 Method and system for processing video
WO2019142163A1 (en) * 2018-01-19 2019-07-25 Interdigital Vc Holdings, Inc. Processing a point cloud
CN110335211A (en) * 2019-06-24 2019-10-15 Oppo广东移动通信有限公司 Bearing calibration, terminal device and the computer storage medium of depth image
CN110336973A (en) * 2019-07-29 2019-10-15 联想(北京)有限公司 Information processing method and its device, electronic equipment and medium

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
US20130021446A1 (en) * 2011-07-20 2013-01-24 GM Global Technology Operations LLC System and method for enhanced sense of depth video
CN103440662B (en) * 2013-09-04 2016-03-09 清华大学深圳研究生院 Kinect depth image acquisition method and device
CN110355758B (en) * 2019-07-05 2021-02-09 北京史河科技有限公司 Machine following method and equipment and following robot system


Cited By (2)

Publication number Priority date Publication date Assignee Title
CN110189371A (en) * 2019-05-20 2019-08-30 东南大学 A kind of mouse equilibrium state discriminating gear and its method based on TOF depth camera
CN110189371B (en) * 2019-05-20 2023-06-30 东南大学 Mouse balance state discriminating device and method based on TOF depth camera

Also Published As

Publication number Publication date
WO2021087819A1 (en) 2021-05-14

Similar Documents

Publication Publication Date Title
US8374444B2 (en) Method and apparatus for providing higher resolution images in an embedded device
CN102326391B (en) Multi-view image coding device, multi-view image decoding method, multi-view image decoding device, multi-view image decoding method
JP2019534606A (en) Method and apparatus for reconstructing a point cloud representing a scene using light field data
US9729870B2 (en) Video coding efficiency with camera metadata
EP3146719B1 (en) Re-encoding image sets using frequency-domain differences
CN113508592A (en) Encoder, decoder and corresponding inter-frame prediction method
CN113785573A (en) Encoder, decoder and corresponding methods using an adaptive loop filter
CN114391259A (en) Information processing method, terminal device and storage medium
US11641470B2 (en) Planar prediction mode for visual media encoding and decoding
JP7247349B2 (en) Inter-component linear modeling method, apparatus, decoder, encoder, and program for intra prediction
EP3907991A1 (en) Method for image processing and apparatus for implementing the same
KR20090065214A (en) Stereo vision system and its video processing method
WO2015056712A1 (en) Moving image encoding method, moving image decoding method, moving image encoding device, moving image decoding device, moving image encoding program, and moving image decoding program
JP2018033127A (en) Method and device for encoding signal representative of light-field content
KR20200094071A (en) Image block coding based on pixel-domain pre-processing operations on image block
JP2021527362A (en) Methods and equipment for intra-prediction
US20210400295A1 (en) Null tile coding in video coding
WO2021057676A1 (en) Video coding method and apparatus, video decoding method and apparatus, electronic device and readable storage medium
KR20230137459A (en) Point cloud encoding method and apparatus, point cloud decoding method and apparatus, and computer-readable media, and electronic devices
WO2022136065A1 (en) Compression of temporal data by using geometry-based point cloud compression
WO2021087810A1 (en) Information processing methods and systems, and encoding apparatus, decoding apparatus and storage medium
CN110809152A (en) Information processing method, encoding device, decoding device, system, and storage medium
CN110771167A (en) Video compression method, device, computer system and movable equipment
US20220230361A1 (en) Information processing method, and encoding device
US20240114147A1 (en) Systems, methods and bitstream structure for hybrid feature video bitstream and decoder

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination