CN108063894B - Video processing method and mobile terminal

Info

Publication number: CN108063894B
Application number: CN201711405235.9A
Authority: CN
Inventors: 李兵
Original assignee: Vivo Mobile Communication Co Ltd
Current assignee: Vivo Mobile Communication Co Ltd
Priority date: 2017-12-22
Filing date: 2017-12-22
Publication date: 2020-05-12
Anticipated expiration: 2037-12-22
Also published as: CN108063894A

Abstract

The invention provides a video processing method and a mobile terminal, wherein the method comprises the following steps: if a video layering instruction is received, acquiring depth-of-field information corresponding to each pixel point in each image frame of the video; layering the video according to depth information corresponding to each pixel point in each image frame of the video to obtain at least two layers of sub-videos; and if a target operation instruction for the sub-video of the target layer in the at least two layers of sub-videos is received, processing the sub-video of the target layer in the at least two layers of sub-videos according to the target operation instruction, wherein the sub-video of the target layer is the sub-video of any one layer in the at least two layers of sub-videos. By the video processing method provided by the invention, the videos are layered according to the depth of field information, and the sub-videos of any layer obtained after the videos are layered can be independently controlled, so that the problem that objects of different layers in a video picture cannot be independently controlled in the prior art is solved.

Description

Video processing method and mobile terminal

Technical Field

The present invention relates to the field of communications technologies, and in particular, to a video processing method and a mobile terminal.

Background

Video recording means continuously capturing and recording image frames and arranging them in time order to obtain a video. At present, video recording has become an important way for people to record and share their lives; compared with photographing, it reflects activity in the picture more vividly. However, each image frame captured during video recording usually contains a subject and a background and can only be controlled as a whole frame, for example by playing the video frame by frame. Objects at different layers in the video picture cannot be controlled independently, so only a single video processing mode is available.

Disclosure of Invention

The embodiment of the invention provides a video processing method and a mobile terminal, and aims to solve the problem that in the prior art, objects of different layers in a video picture cannot be independently controlled, so that the video processing mode is single.

In order to solve the technical problem, the invention is realized as follows:

In a first aspect, an embodiment of the present invention provides a video processing method. The method comprises the following steps:

if a video layering instruction is received, acquiring depth-of-field information corresponding to each pixel point in each image frame of the video;

layering the video according to depth information corresponding to each pixel point in each image frame of the video to obtain at least two layers of sub-videos;

and if a target operation instruction for the sub-video of the target layer in the at least two layers of sub-videos is received, processing the sub-video of the target layer in the at least two layers of sub-videos according to the target operation instruction, wherein the sub-video of the target layer is the sub-video of any one layer in the at least two layers of sub-videos.

In a second aspect, an embodiment of the present invention further provides a mobile terminal. The mobile terminal includes:

the acquisition module is used for acquiring depth-of-field information corresponding to each pixel point in each image frame of the video if a video layering instruction is received;

the layering module is used for layering the video according to the depth of field information corresponding to each pixel point in each image frame of the video to obtain at least two layers of sub-videos;

and the processing module is used for processing the sub-video of the target layer in the at least two layers of sub-videos according to the target operation instruction if the target operation instruction for the sub-video of the target layer in the at least two layers of sub-videos is received, wherein the sub-video of the target layer is the sub-video of any one layer in the at least two layers of sub-videos.

In a third aspect, an embodiment of the present invention further provides a mobile terminal, including a processor, a memory, and a computer program stored on the memory and operable on the processor, where the computer program, when executed by the processor, implements the steps of the video processing method described above.

In a fourth aspect, the embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the video processing method are implemented.

In the embodiment of the invention, if a video layering instruction is received, the depth-of-field information corresponding to each pixel point in each image frame of a video is acquired; layering the video according to depth information corresponding to each pixel point in each image frame of the video to obtain at least two layers of sub-videos; and if a target operation instruction for the sub-video of the target layer in the at least two layers of sub-videos is received, processing the sub-video of the target layer in the at least two layers of sub-videos according to the target operation instruction, wherein the sub-video of the target layer is the sub-video of any one layer in the at least two layers of sub-videos. The video is layered according to the depth of field information, and the sub-video of any one layer of the at least two layers of sub-videos obtained after the video is layered can be independently controlled, so that the video control mode is enriched, and the problems that the objects of different layers in the video picture cannot be independently controlled and the video processing mode is single in the prior art are solved.

Drawings

In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive effort.

Fig. 1 is a flowchart of a video processing method according to an embodiment of the present invention;

Fig. 2 is a flowchart of a video processing method according to another embodiment of the present invention;

Fig. 3 is a schematic diagram of a first camera and a second camera arranged in parallel along a width direction of a mobile terminal according to an embodiment of the present invention;

Fig. 4 is a schematic diagram of a first camera and a second camera arranged in parallel along a length direction of a mobile terminal according to an embodiment of the present invention;

Fig. 5 is a schematic diagram of a distance interval between a first camera and a second camera arranged in parallel along a width direction of a mobile terminal according to an embodiment of the present invention;

Fig. 6 is a schematic diagram of a first camera and a second camera acquiring image frames according to an embodiment of the present invention;

Fig. 7 is a schematic diagram of a first image frame acquired by a first camera according to an embodiment of the present invention;

Fig. 8 is a schematic diagram of a second image frame acquired by a second camera according to an embodiment of the present invention;

Fig. 9 is a schematic diagram of calculating depth-of-field information of an object P according to an embodiment of the present invention;

Fig. 10 is a schematic diagram of image frame layering of a video according to an embodiment of the present invention;

Fig. 11 is a schematic diagram of a video playing interface after video layering according to an embodiment of the present invention;

Fig. 12 is a block diagram of a mobile terminal according to an embodiment of the present invention;

Fig. 13 is a block diagram of a mobile terminal according to still another embodiment of the present invention;

Fig. 14 is a schematic hardware structure diagram of a mobile terminal implementing various embodiments of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The embodiment of the invention provides a video processing method. Referring to fig. 1, which is a flowchart of a video processing method according to an embodiment of the present invention, the method includes the following steps:

Step 101, if a video layering instruction is received, acquiring depth-of-field information corresponding to each pixel point in each image frame of a video.

For example, a video layering touch button may be preset in a video playing interface, and if a touch operation for the video layering touch button is received, it is determined that a video layering instruction is received.

Optionally, when the video layering instruction is received, the depth of field information corresponding to each pixel point in each image frame of the video may be obtained through calculation, for example, the depth of field information corresponding to each pixel point in each image frame of the video is calculated through methods such as image shifting, continuous image prediction, and the like; or acquiring depth information corresponding to each pixel point in each image frame of the pre-stored video. The depth information may include a depth value, which may represent a distance of each object in the image frame from the camera.

Step 102, layering the video according to the depth-of-field information corresponding to each pixel point in each image frame of the video to obtain at least two layers of sub-videos.

Specifically, the depth information corresponding to each pixel point in each image frame of the video may be compared with a threshold value, so as to determine the hierarchy to which each pixel point of each image frame belongs. For example, when the video needs to be divided into three layers, depth information corresponding to the pixel points of each image frame may be respectively compared with a first threshold and a second threshold, where the first threshold is smaller than the second threshold, the pixel points in each image frame whose depth information corresponding to the pixel points is smaller than or equal to the first threshold are divided into a first layer, the pixel points in each image frame whose depth information corresponding to the pixel points is greater than the first threshold and smaller than the second threshold are divided into a second layer, and the pixel points in each image frame whose depth information corresponding to the pixel points is greater than or equal to the second threshold are divided into a third layer.
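The three-layer threshold comparison described above can be sketched as follows. This is only an illustrative sketch: the function name `assign_layers`, the nested-list depth map, and the sample threshold values are assumptions made for the example and do not appear in the embodiment.

```python
from typing import List


def assign_layers(depth_map: List[List[float]],
                  first_threshold: float,
                  second_threshold: float) -> List[List[int]]:
    """Return, for every pixel point, the layer (1, 2 or 3) it belongs to.

    depth_map[y][x] is the depth-of-field value of the pixel point at (x, y).
    """
    if not first_threshold < second_threshold:
        raise ValueError("the first threshold must be smaller than the second")

    layers = []
    for row in depth_map:
        layer_row = []
        for depth in row:
            if depth <= first_threshold:
                layer_row.append(1)   # depth <= first threshold
            elif depth < second_threshold:
                layer_row.append(2)   # first threshold < depth < second threshold
            else:
                layer_row.append(3)   # depth >= second threshold
        layers.append(layer_row)
    return layers


# Hypothetical depth values in metres, thresholds chosen for illustration.
example = assign_layers([[0.5, 2.0], [5.0, 30.0]], 2.0, 10.0)
```

Applying the same comparison to every image frame yields a per-frame layer map from which the sub-video of each layer can be assembled.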

Step 103, if a target operation instruction for a sub video of a target layer in the at least two layers of sub videos is received, processing the sub video of the target layer in the at least two layers of sub videos according to the target operation instruction, wherein the sub video of the target layer is a sub video of any one layer in the at least two layers of sub videos.

In the embodiment of the present invention, the target operation instruction may be reasonably set according to an actual situation, for example, the target operation instruction may be a pause operation instruction, a play operation instruction, a blurring operation instruction, and the like. It can be understood that, in the embodiment of the present invention, a target operation touch button corresponding to a target operation instruction may be preset in a video playing interface, and when a user touches the target operation touch button, a corresponding target operation instruction is generated.

Specifically, after the video is divided into three layers of sub-videos, each layered sub-video may be controlled separately, for example, after the video is divided into three layers of sub-videos, the sub-video of the first layer may be controlled to play, the sub-videos of the other two layers may be controlled to pause playing, or only the sub-video of the second layer may be subjected to blurring processing, or only the sub-video of the first layer may be subjected to filter processing, and the like.

In the embodiment of the present invention, the mobile terminal may be a mobile phone, a tablet computer (Tablet Personal Computer), a laptop computer (Laptop Computer), a personal digital assistant (PDA), a wearable device (Wearable Device), or the like.

According to the video processing method, if a video layering instruction is received, depth of field information corresponding to each pixel point in each image frame of a video is obtained; layering the video according to depth information corresponding to each pixel point in each image frame of the video to obtain at least two layers of sub-videos; and if a target operation instruction for the sub-video of the target layer in the at least two layers of sub-videos is received, processing the sub-video of the target layer in the at least two layers of sub-videos according to the target operation instruction, wherein the sub-video of the target layer is the sub-video of any one layer in the at least two layers of sub-videos. The video is layered according to the depth of field information, and the sub-video of any one layer of the at least two layers of sub-videos obtained after the video is layered can be independently controlled, so that the video control mode is enriched, and the problems that the objects of different layers in the video picture cannot be independently controlled and the video processing mode is single in the prior art are solved.

Referring to fig. 2, fig. 2 is a flowchart of a video processing method according to an embodiment of the present invention. The difference between the embodiment of the present invention and the previous embodiment is mainly that the depth information corresponding to each pixel point in each image frame is further limited to be calculated in the process of recording the video. In an embodiment of the present invention, before the step 101, the method further includes: in the process of recording a video through a camera of the mobile terminal, respectively calculating depth-of-field information corresponding to each pixel point in each image frame acquired by the camera of the mobile terminal; and respectively storing the depth information corresponding to each pixel point in each image frame and the corresponding pixel point in an associated manner.

As shown in fig. 2, the video processing method provided by the embodiment of the present invention includes the following steps:

Step 201, in the process of recording a video through a camera of a mobile terminal, respectively calculating depth-of-field information corresponding to each pixel point in each image frame acquired by the camera of the mobile terminal.

In the embodiment of the invention, the depth of field information corresponding to each pixel point in each image frame acquired by the camera of the mobile terminal can be calculated in the process of recording the video.

Optionally, in order to improve the accuracy of the calculated depth-of-field information, the mobile terminal includes at least a first camera and a second camera. The first camera and the second camera are arranged in parallel on the same side of the mobile terminal along a target direction and have the same focal length, where the target direction is the width direction or the length direction of the mobile terminal. In this case, step 201, that is, respectively calculating, in the process of recording a video through the camera of the mobile terminal, the depth-of-field information corresponding to each pixel point in each image frame acquired by the camera of the mobile terminal, includes:

in the process of recording a video through the first camera and the second camera, respectively acquiring the image frames captured by the first camera and the second camera at the same moment for the same object, to obtain a first image frame and a second image frame;

respectively determining the target coordinate X_a, in the first image frame, and the target coordinate X_b, in the second image frame, of the pixel point corresponding to the object, wherein the target coordinates are coordinates along the target direction;

by using

$$D = \frac{fT}{X_a - X_b} - f$$

to calculate the depth-of-field value corresponding to the pixel point corresponding to the object;

wherein D represents a depth of field value corresponding to a pixel point corresponding to the object, f represents a focal length of the first camera or a focal length of the second camera, and T represents a distance interval between the first camera and the second camera.

In the embodiment of the invention, the mobile terminal includes at least a first camera and a second camera, and the first camera and the second camera are arranged in parallel on the same side of the mobile terminal along the target direction. For example, referring to fig. 3, the mobile terminal 1 includes a first camera 11 and a second camera 12, which are disposed in parallel on the front or back of the mobile terminal 1 along the width direction of the mobile terminal 1. Referring to fig. 4, the mobile terminal 1 includes a first camera 11 and a second camera 12, which are disposed in parallel on the front or back of the mobile terminal 1 along the length direction of the mobile terminal 1. The focal lengths of the first camera and the second camera are the same, and as shown in fig. 5, the distance interval between the first camera 11 and the second camera 12 is T.

Taking the first camera and the second camera arranged in parallel along the width direction of the mobile terminal as an example, when a video recording instruction is received, the first camera and the second camera may be started to acquire image frames simultaneously, as shown in fig. 6, to obtain the first image frame 111 and the second image frame 121 shown in fig. 7 and fig. 8, respectively, where pixel points corresponding to the same object P exist in both the first image frame 111 and the second image frame 121. Fig. 9 is a schematic diagram of calculating the depth-of-field information of the object P. Referring to fig. 9, T is the distance interval between the first camera and the second camera, f is the focal length of the first camera or the second camera (the two focal lengths are the same), X_a is the abscissa of the object P in the first image frame, X_b is the abscissa of the object P in the second image frame, and Z is the distance from the object P to the line OaOb. As can be seen from fig. 9, the triangle PPaPb and the triangle POaOb are similar triangles, so according to the similar-triangle principle:

$$\frac{T - (X_a - X_b)}{T} = \frac{Z - f}{Z}$$

from the above equation:

$$Z = \frac{fT}{X_a - X_b}$$

that is, the distance from the object P to the camera is D = Z - f, which is the depth-of-field value corresponding to the pixel point corresponding to the object P.
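The two-camera calculation can be sketched in a few lines. The sketch assumes the usual similar-triangle relation for two parallel cameras with equal focal length, Z = f·T / (X_a - X_b), followed by D = Z - f as stated above; the function name `depth_of_field`, the parameter names, and the numeric values are hypothetical, and all lengths are assumed to share one unit.

```python
def depth_of_field(x_a: float, x_b: float,
                   focal_length: float, camera_spacing: float) -> float:
    """Depth-of-field value D for one object seen by both cameras.

    x_a, x_b        target coordinates of the object in the first and second
                    image frames (coordinates along the target direction)
    focal_length    f, identical for both cameras
    camera_spacing  T, the distance interval between the two cameras
    """
    disparity = x_a - x_b
    if disparity <= 0:
        # Assumption of this sketch: coordinates are ordered so that the
        # disparity is positive; zero would put the object at infinity.
        raise ValueError("x_a - x_b must be positive")
    z = focal_length * camera_spacing / disparity  # from the similar triangles
    return z - focal_length                        # D = Z - f


# Hypothetical values in millimetres: f = 4, T = 20, disparity = 0.5.
d = depth_of_field(x_a=1.5, x_b=1.0, focal_length=4.0, camera_spacing=20.0)
```

A smaller disparity gives a larger depth-of-field value, which matches the intuition that distant objects shift less between the two image frames.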

The embodiment of the invention can adopt the mode to respectively calculate the depth of field value corresponding to each pixel point in the image frame. It can be understood that, in the embodiment of the present invention, after the depth of field value corresponding to each pixel point in the image frame is obtained through calculation, only the image frame acquired by the first camera or the image frame acquired by the second camera may be stored, so as to reduce the occupation of the storage space.

According to the embodiment of the invention, the two cameras are used for collecting the image frames aiming at the same object at the same time so as to calculate the depth of field information corresponding to the pixel points corresponding to all objects in the image frames, and the accuracy of the depth of field information corresponding to each pixel point in each image frame obtained by calculation can be improved.

Step 202, respectively storing each pixel point in each image frame, depth of field information corresponding to each pixel point in each image frame, and an association relationship between the depth of field information corresponding to each pixel point in each image frame and the corresponding pixel point.

Specifically, each image frame collected by the camera and the depth information corresponding to each pixel point of the image frame can be packaged into one frame of data, where the depth information corresponding to each pixel point can be represented as D (x, y), where (x, y) represents a position of the object in the image frame, that is, a position of a pixel point corresponding to the object in the image frame, and D represents the depth information corresponding to the object, that is, the depth information corresponding to the pixel point corresponding to the object.
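One possible way to package an image frame together with its per-pixel depth-of-field information is sketched below. The class name `FrameData`, its field names, and the RGB tuple representation are assumptions made for illustration; the embodiment only requires that D(x, y) be stored in association with the pixel point at (x, y).

```python
from dataclasses import dataclass
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # assumed RGB representation


@dataclass
class FrameData:
    """One frame of data: the image plus depth-of-field information."""
    pixels: List[List[Pixel]]  # pixels[y][x]
    depth: List[List[float]]   # depth[y][x], same shape as pixels

    def __post_init__(self) -> None:
        # The association is positional, so both arrays must match in shape.
        same_rows = len(self.pixels) == len(self.depth)
        same_cols = all(len(p) == len(d)
                        for p, d in zip(self.pixels, self.depth))
        if not (same_rows and same_cols):
            raise ValueError("pixels and depth must have the same shape")

    def D(self, x: int, y: int) -> float:
        """Depth-of-field information of the pixel point at position (x, y)."""
        return self.depth[y][x]


frame = FrameData(pixels=[[(255, 0, 0), (0, 255, 0)]], depth=[[1.2, 8.0]])
```

Storing the depth array alongside the pixels means that a later layering step can read D(x, y) directly instead of recomputing it.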

Step 203, if a video layering instruction is received, acquiring depth-of-field information corresponding to each pixel point in each image frame of the video.

In the embodiment of the invention, when the video layering command is received, the depth of field information corresponding to each pixel point in each stored image frame can be acquired, so that the acquisition efficiency of the depth of field information can be improved, and the video layering efficiency is further improved.

Step 204, layering the video according to the depth-of-field information corresponding to each pixel point in each image frame of the video to obtain at least two layers of sub-videos.

In the embodiment of the invention, a plurality of threshold values for video layering can be preset, and the layer to which each pixel point in each image frame belongs can be determined by comparing the threshold values with the depth information corresponding to each pixel point in each image frame of the video.

Optionally, in order to improve the flexibility of video layering, in step 204, that is, layering the video according to the depth information corresponding to each pixel point in each image frame of the video, to obtain at least two layers of sub-videos, the method includes:

acquiring the layering number and the layering distance;

and layering the video according to the layering number, the layering distance and depth information corresponding to each pixel point in each image frame of the video to obtain the sub-videos with the layering number.

In the embodiment of the invention, the layering number and the layering distance can be reasonably set by a user. Specifically, when a video layering instruction is received, a setting interface is popped up for a user to set the layering number and the layering distance. It is understood that the number of layers and the distance between layers may be preset.

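As an example, if the number of layers is n and the layering distances are d1, d2, d3, …, dn, the depth-of-field interval (i.e., the range of depth-of-field values D) corresponding to the i-th layer is:

$$\begin{cases} 0 \le D < \dfrac{d_1 + d_2}{2}, & i = 1 \\ \dfrac{d_{i-1} + d_i}{2} \le D < \dfrac{d_i + d_{i+1}}{2}, & 1 < i < n \\ \dfrac{d_{n-1} + d_n}{2} \le D < \infty, & i = n \end{cases}$$

wherein i = 1, 2, 3, …, n. In this way, all the image regions Σ(x, y) corresponding to the i-th layer can be obtained from D(x, y).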

For example, when n = 3, d1 = 1 m, d2 = 10 m, and d3 = 20 m, the depth-of-field interval of each layer may be as follows:

the depth-of-field interval of the first layer is: 0 ≤ D < 5.5 m;

the depth-of-field interval of the second layer is: 5.5 m ≤ D < 15 m;

the depth-of-field interval of the third layer is: 15 m ≤ D < ∞.
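The intervals in this example are consistent with placing each boundary midway between two adjacent layering distances, with the first layer starting at 0 and the last layer unbounded. The sketch below assumes that midpoint rule, which is inferred from the numbers above; the function names `layer_intervals` and `layer_of` are hypothetical.

```python
import math
from typing import List, Tuple


def layer_intervals(distances: List[float]) -> List[Tuple[float, float]]:
    """Half-open depth-of-field intervals [low, high) for layers 1..n."""
    if not distances or any(b <= a for a, b in zip(distances, distances[1:])):
        raise ValueError("layering distances must be strictly increasing")
    # Boundaries: 0, midpoints between adjacent distances, then infinity.
    midpoints = [(a + b) / 2 for a, b in zip(distances, distances[1:])]
    bounds = [0.0] + midpoints + [math.inf]
    return list(zip(bounds[:-1], bounds[1:]))


def layer_of(depth: float, intervals: List[Tuple[float, float]]) -> int:
    """1-based index of the layer whose interval contains the depth value."""
    for index, (low, high) in enumerate(intervals, start=1):
        if low <= depth < high:
            return index
    raise ValueError("depth value lies outside every interval")


intervals = layer_intervals([1.0, 10.0, 20.0])  # n = 3, distances in metres
```

With user-supplied layering distances, `layer_of` can then be applied to D(x, y) for every pixel point of every image frame.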

Specifically, each image frame of the video may be divided into 3 layers of sub-images according to the above 3 depth-of-field intervals. For example, as shown in fig. 10, the first image frame 111 is divided into 3 layers of sub-images, namely the first-layer sub-image 1111, the second-layer sub-image 1112, and the third-layer sub-image 1113. It is to be understood that the first-layer sub-image 1111, the second-layer sub-image 1112 and the third-layer sub-image 1113 are only used to represent different image layers, and the order of the layers is not limited. In addition, the video playing interface after video layering may display an operation interface to receive operation instructions, where the operation interface may be as shown in fig. 11.

According to the embodiment of the invention, the video is layered according to the layering number, the layering distance and the depth of field information corresponding to each pixel point in each image frame of the video, so that the video layering can be flexibly controlled according to actual requirements.

Optionally, in order to improve the video layering effect, in step 204, that is, layering the video according to the depth information corresponding to each pixel point in each image frame of the video, to obtain at least two layers of sub-videos, the method includes:

dividing image areas of each image frame of the video to obtain at least two image areas corresponding to each image frame of the video;

determining a layer to which each image area in at least two image areas corresponding to each image frame of the video belongs according to depth information corresponding to the pixel point of each image area in at least two image areas corresponding to each image frame of the video;

and layering the video according to the layer to which each image area belongs in at least two image areas corresponding to each image frame of the video to obtain at least two layers of sub-videos.

In an actual situation, when a video is layered according to depth-of-field information, if an object is large, the pixel points belonging to the object may fall into different layers, which affects the display effect of the sub-videos obtained by layering the video. For example, after the video is layered, one part of a bus in the video belongs to the sub-video of the first layer and the other part belongs to the sub-video of the second layer, so that when the sub-video of the first layer or the sub-video of the second layer is played alone, the bus picture is incomplete and the display effect is affected.

In the embodiment of the invention, the image area division can be performed on each image frame, for example, the contour of each object in the image frame can be detected, and the image frame can be divided into different image areas according to the contour of each object. After dividing an image region of a certain image frame to obtain at least two image regions, the layer to which each image region belongs may be determined, for example, the number of pixel points belonging to each layer in each image region may be counted, and the layer including the largest number of pixel points of the image region is determined as the layer to which the image region belongs. Specifically, after a layer to which each image region of each image frame of the video belongs is obtained, each image frame of the video may be layered respectively to obtain a sub-image frame of each image frame, and the sub-image frames belonging to the same layer in all the image frames of the video constitute the sub-video of the layer.
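The majority-count rule for image regions can be sketched as follows. Contour detection is outside the scope of the sketch, so each region is simply assumed to be given as a list of (x, y) positions; the function names `region_layer` and `unify_regions` are hypothetical, and ties are resolved in favour of the layer encountered first, which is an arbitrary choice of this sketch.

```python
from collections import Counter
from typing import List, Tuple

Position = Tuple[int, int]  # (x, y)


def region_layer(region: List[Position],
                 pixel_layers: List[List[int]]) -> int:
    """Layer that contains the largest number of the region's pixel points."""
    if not region:
        raise ValueError("an image region must contain at least one pixel")
    counts = Counter(pixel_layers[y][x] for x, y in region)
    return counts.most_common(1)[0][0]


def unify_regions(pixel_layers: List[List[int]],
                  regions: List[List[Position]]) -> List[List[int]]:
    """Copy of the layer map in which every region lies in a single layer."""
    result = [row[:] for row in pixel_layers]
    for region in regions:
        layer = region_layer(region, pixel_layers)
        for x, y in region:
            result[y][x] = layer
    return result


# Hypothetical 3x2 layer map; the "bus" straddles layers 1 and 2.
layer_map = [[1, 1, 2],
             [1, 2, 2]]
bus = [(0, 0), (1, 0), (2, 0), (0, 1)]
other = [(1, 1), (2, 1)]
unified = unify_regions(layer_map, [bus, other])
```

After this step, every pixel point of the bus is assigned to the first layer, so the bus appears complete when that layer is played alone.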

According to the embodiment of the invention, each image frame of the video is divided into the image areas, and the video is layered according to the layer to which each image area belongs in at least two image areas corresponding to each image frame of the video, so that the image areas of the same object can be ensured to be positioned in the same layer, and the display effect of the sub-video obtained after the video is layered can be improved.

Step 205, if a target operation instruction for a sub video of a target layer in the at least two layers of sub videos is received, processing the sub video of the target layer in the at least two layers of sub videos according to the target operation instruction, where the sub video of the target layer is a sub video of any one layer in the at least two layers of sub videos.

Optionally, in step 205, that is, if the target operation instruction for the sub-video of the target layer in the at least two layers of sub-videos is received, processing the sub-video of the target layer in the at least two layers of sub-videos according to the target operation instruction includes:

when the target operation instruction comprises a playing control instruction, playing the sub-video of the target layer;

when the target operation instruction comprises a pause playing control instruction, pausing playing of the sub-video of the target layer;

when the target operation instruction comprises a blurring operation instruction, performing blurring processing on the sub video of the target layer;

and when the target operation instruction comprises a filter operation instruction, performing filter processing on the sub-video of the target layer.
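The four cases above amount to a dispatch on the type of the target operation instruction. The sketch below keeps one state record per layer and updates only the record of the target layer; the names `Op`, `LayerState`, and `apply_instruction`, as well as the default filter name, are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class Op(Enum):
    PLAY = "play"
    PAUSE = "pause"
    BLUR = "blur"
    FILTER = "filter"


@dataclass
class LayerState:
    playing: bool = True
    blurred: bool = False
    filter_name: Optional[str] = None


def apply_instruction(states: Dict[int, LayerState], target_layer: int,
                      op: Op, filter_name: str = "oil_painting") -> None:
    """Apply one target operation instruction to the target layer only."""
    if target_layer not in states:
        raise KeyError(f"no sub-video for layer {target_layer}")
    state = states[target_layer]
    if op is Op.PLAY:
        state.playing = True
    elif op is Op.PAUSE:
        state.playing = False
    elif op is Op.BLUR:
        state.blurred = True
    elif op is Op.FILTER:
        state.filter_name = filter_name
    else:
        raise ValueError(f"unsupported instruction: {op}")


states = {layer: LayerState() for layer in (1, 2, 3)}
apply_instruction(states, 2, Op.PAUSE)
apply_instruction(states, 3, Op.FILTER)
```

Because each layer has its own record, pausing the second layer or filtering the third leaves the first layer untouched.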

In practical application, after the video is layered to obtain at least two layers of sub-videos, the sub-video of any one of the at least two layers may be played under separate control while the sub-videos of the other layers are kept in a paused state. For example, when a user wants to observe the activity of an object in the first layer of the video picture (e.g., the layer where a person in the video is located), the sub-video of the second layer and the sub-video of the third layer may be paused, so that the user can concentrate on observing the activity of the person in the first layer without interference from the objects in the other layers. For example, the playing of the car of the second layer and the mountain of the third layer shown in fig. 11 is paused, and only the moving picture of the person of the first layer is played.
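One simple way to realise per-layer pausing when rendering is to take pixels of the playing layers from the current image frame and pixels of the paused layers from the frame at which they were paused. The sketch below makes that assumption and, as a further simplification, uses the layer map of the current frame for the whole composite; the function name `compose_frame` and the single-number pixel values are hypothetical.

```python
from typing import List, Set


def compose_frame(current: List[List[int]],
                  frozen: List[List[int]],
                  layer_map: List[List[int]],
                  playing_layers: Set[int]) -> List[List[int]]:
    """Build the displayed frame from a current frame and a frozen frame.

    Pixels whose layer is playing come from `current`; all other pixels
    keep the value they had in `frozen` (the frame shown when paused).
    """
    return [
        [cur if layer in playing_layers else old
         for cur, old, layer in zip(cur_row, old_row, layer_row)]
        for cur_row, old_row, layer_row in zip(current, frozen, layer_map)
    ]


# One-row frames with grey values; only the first layer keeps playing.
shown = compose_frame(current=[[10, 20, 30]],
                      frozen=[[1, 2, 3]],
                      layer_map=[[1, 2, 3]],
                      playing_layers={1})
```

A fuller implementation would also need to handle pixels uncovered when a playing object moves over a paused layer, which this sketch does not attempt.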

Optionally, the embodiment of the present invention may also perform blurring processing on the sub-videos. For example, when the user touches the blurring button shown in fig. 11, one or more layers of sub-videos may be blurred while the other layers are left unblurred, so that the picture of a certain layer is highlighted. For example, the car of the second layer and the peak of the third layer are blurred, and only the picture of the person of the first layer remains clear.
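Layer-selective blurring can be illustrated with a small box blur that is applied only to pixel points whose layer is selected. The embodiment does not specify a blurring algorithm, so the box blur, the function name `blur_layers`, and the grey-value image representation are all assumptions of this sketch; for simplicity the averaging window also reads neighbouring pixels from other layers.

```python
from typing import List, Set


def blur_layers(image: List[List[float]],
                layer_map: List[List[int]],
                layers_to_blur: Set[int],
                radius: int = 1) -> List[List[float]]:
    """Box-blur only the pixel points that belong to the selected layers."""
    height, width = len(image), len(image[0])
    result = [row[:] for row in image]  # unselected layers stay unchanged
    for y in range(height):
        for x in range(width):
            if layer_map[y][x] not in layers_to_blur:
                continue
            # Average over the window, clipped at the image borders.
            window = [image[j][i]
                      for j in range(max(0, y - radius),
                                     min(height, y + radius + 1))
                      for i in range(max(0, x - radius),
                                     min(width, x + radius + 1))]
            result[y][x] = sum(window) / len(window)
    return result


# One-row image: layer 1 stays sharp, layer 2 is blurred.
blurred = blur_layers(image=[[0, 90, 0]],
                      layer_map=[[1, 2, 2]],
                      layers_to_blur={2})
```

The same masking pattern applies to filter processing: only the per-pixel operation inside the loop changes.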

Optionally, in the embodiment of the present invention, filter processing may be performed on one or more layers of sub-videos while no filter processing is performed on the other layers, which can likewise highlight the picture of a certain layer and enhance the display effect. For example, as shown in fig. 11, the car of the second layer and the peak of the third layer are processed with an oil painting filter and the person of the first layer is not processed, so that when the video is played, the car of the second layer and the peak of the third layer both have an oil painting effect, and only the picture of the person of the first layer is clear.

In the embodiment of the invention, images are collected by a plurality of cameras to calculate the depth of field information of the images, and the images are then divided into a plurality of layers according to the depth of field information. Specifically, in the process of recording the video, each frame of data comprises the image collected by the cameras (namely the original image) and the corresponding depth of field information. When the video is played, the original images can be divided into a plurality of layers according to the depth of field information, and the images of different layers can be processed differently. For example, the main body layer that the user wants to observe is left unprocessed while the playing of the other layers is paused, or blurring or filter processing is applied to the other layers, so that the activity of the image objects of the main body layer is highlighted and the interest and operability of the video are improved.
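The division of a frame into layers by depth can be sketched as below. This is a sketch under stated assumptions: the layer boundaries are given as an ascending list of layering distances, depths run from 0 to infinity, and a depth exactly equal to a boundary is assigned to the nearer layer. The boundary convention and the names `layer_of`, `layer_map`, and `boundaries` are assumptions for illustration only.

```python
import bisect
from typing import List


def layer_of(depth: float, boundaries: List[float]) -> int:
    """Return the 1-based layer index for one depth value.

    boundaries holds the finite layering distances d_1 < d_2 < ... < d_(n-1);
    layer i is taken to cover d_(i-1) < depth <= d_i, with d_0 = 0 and the
    last layer extending to infinity (assumed convention).
    """
    return bisect.bisect_left(boundaries, depth) + 1


def layer_map(depth_map: List[List[float]], boundaries: List[float]) -> List[List[int]]:
    """Assign a layer index to every pixel of a per-pixel depth map."""
    return [[layer_of(d, boundaries) for d in row] for row in depth_map]
```

With boundaries of 2.0 and 10.0 (in whatever unit the depth map uses), pixels closer than 2.0 fall in the first layer, pixels up to 10.0 in the second, and everything beyond in the third.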

Referring to fig. 12, fig. 12 is a block diagram of a mobile terminal according to an embodiment of the present invention. As shown in fig. 12, the mobile terminal 1200 includes: an obtaining module 1201, a layering module 1202, and a processing module 1203, wherein:

an obtaining module 1201, configured to obtain depth-of-field information corresponding to each pixel point in each image frame of a video if a video layering instruction is received;

the layering module 1202 is configured to layer the video according to depth-of-field information corresponding to each pixel point in each image frame of the video, so as to obtain at least two layers of sub-videos;

the processing module 1203 is configured to, if a target operation instruction for a sub video of a target layer in the at least two layers of sub videos is received, process the sub video of the target layer in the at least two layers of sub videos according to the target operation instruction, where the sub video of the target layer is a sub video of any one layer in the at least two layers of sub videos.

Optionally, referring to fig. 13, the mobile terminal 1200 further includes:

a calculating module 1204, configured to, before obtaining depth-of-field information corresponding to each pixel point in each image frame of the video if the layering instruction is received, respectively calculate depth-of-field information corresponding to each pixel point in each image frame acquired by a camera of the mobile terminal during a process of recording the video by the camera of the mobile terminal;

the storage module 1205 is configured to store each pixel point in each image frame, depth of field information corresponding to each pixel point in each image frame, and an association relationship between the depth of field information corresponding to each pixel point in each image frame and the corresponding pixel point.
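One simple way to keep the association between each pixel point and its depth of field information is to store two grids of identical shape, so that the pixel at position (x, y) and its depth share the same index. The sketch below illustrates that idea only; the `DepthFrame` class, its fields, and the `depth_at` helper are hypothetical and are not described in the patent.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DepthFrame:
    """One recorded frame: pixel values plus a depth value per pixel."""

    pixels: List[List[int]]    # original image, row-major
    depths: List[List[float]]  # depth of field per pixel, same shape

    def __post_init__(self) -> None:
        # The positional association only holds if both grids match in shape.
        if [len(r) for r in self.pixels] != [len(r) for r in self.depths]:
            raise ValueError("pixels and depths must have the same shape")

    def depth_at(self, x: int, y: int) -> float:
        """Look up the depth stored for the pixel at column x, row y."""
        return self.depths[y][x]
```

A recorded video would then be a list of such frames, which a later layering step can read without recomputing depth.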

Optionally, the mobile terminal includes at least a first camera and a second camera, the first camera and the second camera are arranged in parallel along a target direction at the same side of the mobile terminal, and focal lengths of the first camera and the second camera are the same, the target direction is the width direction of the mobile terminal or the length direction of the mobile terminal, and the calculation module 1204 is specifically configured to:

respectively determining the target coordinate X_a, in the first image frame, and the target coordinate X_b, in the second image frame, of the pixel point corresponding to the object, wherein the target coordinates are coordinates along the target direction;

by using
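The relation used by the calculation module does not appear in this text. As an illustration only, the sketch below uses the standard triangulation relation for two parallel cameras with equal focal length, Z = f · B / |X_a − X_b|, where B is the distance between the cameras. The function name, the parameters `focal_px` and `baseline_mm`, and the choice of units are assumptions, and the patent's own formula may differ.

```python
def depth_from_disparity(
    x_a: float, x_b: float, focal_px: float, baseline_mm: float
) -> float:
    """Estimate depth (in mm) of a point seen by two parallel cameras.

    x_a and x_b are the coordinates, along the target direction, of the same
    object point in the first and second image frames (in pixels).
    focal_px is the shared focal length in pixels and baseline_mm the
    distance between the two cameras; both are assumed inputs.
    """
    disparity = abs(x_a - x_b)
    if disparity == 0:
        # Zero disparity means the point is effectively at infinity.
        return float("inf")
    return focal_px * baseline_mm / disparity
```

Under these assumptions, nearer objects produce a larger coordinate difference between the two frames and therefore a smaller computed depth.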

Optionally, the processing module 1203 is specifically configured to:

Optionally, the layering module 1202 is specifically configured to:

acquiring the layering number and the layering distance;

Optionally, the layering module 1202 is specifically configured to:

The mobile terminal 1200 provided in the embodiment of the present invention can implement each process implemented by the mobile terminal in the method embodiments of fig. 1 to fig. 2, and is not described here again to avoid repetition.

The mobile terminal 1200 of the embodiment of the present invention includes an obtaining module 1201, configured to obtain depth-of-field information corresponding to each pixel point in each image frame of a video if a video layering instruction is received; the layering module 1202 is configured to layer the video according to depth-of-field information corresponding to each pixel point in each image frame of the video, so as to obtain at least two layers of sub-videos; the processing module 1203 is configured to, if a target operation instruction for a sub video of a target layer in the at least two layers of sub videos is received, process the sub video of the target layer in the at least two layers of sub videos according to the target operation instruction, where the sub video of the target layer is a sub video of any one layer in the at least two layers of sub videos. The video is layered according to the depth of field information, and the sub-video of any one layer of the at least two layers of sub-videos obtained after the video is layered can be independently controlled, so that the video control mode is enriched, and the problems that the objects of different layers in the video picture cannot be independently controlled and the video processing mode is single in the prior art are solved.

Fig. 14 is a schematic hardware structure diagram of a mobile terminal implementing various embodiments of the present invention. Referring to fig. 14, the mobile terminal 1400 includes, but is not limited to: radio frequency unit 1401, network module 1402, audio output unit 1403, input unit 1404, sensor 1405, display unit 1406, user input unit 1407, interface unit 1408, memory 1409, processor 1410, and power supply 1411. Those skilled in the art will appreciate that the mobile terminal architecture shown in fig. 14 is not intended to be limiting of mobile terminals, and that a mobile terminal may include more or fewer components than shown, or some components may be combined, or a different arrangement of components. In the embodiment of the present invention, the mobile terminal includes, but is not limited to, a mobile phone, a tablet computer, a notebook computer, a palm computer, a vehicle-mounted terminal, a wearable device, a pedometer, and the like.

The processor 1410 is configured to, if a video layering instruction is received, obtain depth-of-field information corresponding to each pixel point in each image frame of the video; layering the video according to depth information corresponding to each pixel point in each image frame of the video to obtain at least two layers of sub-videos; and if a target operation instruction for the sub-video of the target layer in the at least two layers of sub-videos is received, processing the sub-video of the target layer in the at least two layers of sub-videos according to the target operation instruction, wherein the sub-video of the target layer is the sub-video of any one layer in the at least two layers of sub-videos.

According to the embodiment of the invention, the video is layered according to the depth of field information, and the sub-video of any one layer of the at least two layers of sub-videos obtained after the video is layered can be independently controlled, so that the video control mode is enriched, and the problems that the objects of different layers in the video picture cannot be independently controlled and the video processing mode is single in the prior art are solved.

Optionally, before the obtaining depth information corresponding to each pixel point in each image frame of the video if the layering instruction is received, the method further includes:

in the process of recording a video through a camera of the mobile terminal, respectively calculating depth-of-field information corresponding to each pixel point in each image frame acquired by the camera of the mobile terminal;

and respectively storing each pixel point in each image frame, depth of field information corresponding to each pixel point in each image frame and the incidence relation between the depth of field information corresponding to each pixel point in each image frame and the corresponding pixel point.

Optionally, the mobile terminal includes at least a first camera and a second camera, the first camera and the second camera are arranged in parallel along a target direction at the same side of the mobile terminal, and focal lengths of the first camera and the second camera are the same, the target direction is the width direction of the mobile terminal or the length direction of the mobile terminal, in a process of recording a video through the camera of the mobile terminal, depth of field information corresponding to each pixel point in each image frame collected by the camera of the mobile terminal is calculated respectively, including:

by using

Optionally, if a target operation instruction for a sub video of a target layer in the at least two layers of sub videos is received, processing the sub video of the target layer in the at least two layers of sub videos according to the target operation instruction includes:

Optionally, the layering the video according to the depth information corresponding to each pixel point in each image frame of the video to obtain at least two layers of sub-videos includes:

acquiring the layering number and the layering distance;

It should be understood that, in the embodiment of the present invention, the radio frequency unit 1401 may be configured to receive and transmit signals during message transmission or a call. Specifically, it receives downlink data from a base station and passes the received downlink data to the processor 1410 for processing; in addition, it transmits uplink data to the base station. In general, the radio frequency unit 1401 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier, a duplexer, and the like. The radio frequency unit 1401 may also communicate with a network and other devices via a wireless communication system.

The mobile terminal provides the user with wireless broadband internet access through the network module 1402, such as helping the user send and receive e-mails, browse webpages, access streaming media, and the like.

The audio output unit 1403 can convert audio data received by the radio frequency unit 1401 or the network module 1402, or stored in the memory 1409, into an audio signal and output it as sound. The audio output unit 1403 may also provide audio output related to a specific function performed by the mobile terminal 1400 (e.g., a call signal reception sound, a message reception sound, etc.). The audio output unit 1403 includes a speaker, a buzzer, a receiver, and the like.

The input unit 1404 is used to receive audio or video signals. The input unit 1404 may include a graphics processing unit (GPU) 14041 and a microphone 14042. The graphics processor 14041 processes image data of still pictures or video obtained by an image capturing device (e.g., a camera) in a video capturing mode or an image capturing mode. The processed image frames may be displayed on the display unit 1406. The image frames processed by the graphics processor 14041 may be stored in the memory 1409 (or another storage medium) or transmitted via the radio frequency unit 1401 or the network module 1402. The microphone 14042 may receive sound and process it into audio data. In a phone call mode, the processed audio data may be converted into a format that can be transmitted to a mobile communication base station via the radio frequency unit 1401 and then output.

The mobile terminal 1400 also includes at least one sensor 1405, such as a light sensor, a motion sensor, and other sensors. Specifically, the light sensor includes an ambient light sensor that can adjust the brightness of the display panel 14061 according to the brightness of ambient light, and a proximity sensor that can turn off the display panel 14061 and/or the backlight when the mobile terminal 1400 moves to the ear. As one of the motion sensors, the accelerometer sensor can detect the magnitude of acceleration in each direction (generally three axes), detect the magnitude and direction of gravity when stationary, and can be used to identify the posture of the mobile terminal (such as horizontal and vertical screen switching, related games, magnetometer posture calibration), and vibration identification related functions (such as pedometer, tapping); the sensors 1405 may also include fingerprint sensors, pressure sensors, iris sensors, molecular sensors, gyroscopes, barometers, hygrometers, thermometers, infrared sensors, etc., which are not described in detail herein.

The display unit 1406 is used to display information input by the user or information provided to the user. The display unit 1406 may include a display panel 14061, and the display panel 14061 may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, or the like.

The user input unit 1407 may be used to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the mobile terminal. Specifically, the user input unit 1407 includes a touch panel 14071 and other input devices 14072. The touch panel 14071, also referred to as a touch screen, may collect touch operations performed by a user on or near it (e.g., operations performed on or near the touch panel 14071 using a finger, a stylus, or any other suitable object or attachment). The touch panel 14071 may include two parts: a touch detection device and a touch controller. The touch detection device detects the touch position of the user, detects the signal produced by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into touch point coordinates, sends the coordinates to the processor 1410, and receives and executes commands from the processor 1410. In addition, the touch panel 14071 can be implemented in various types, such as resistive, capacitive, infrared, and surface acoustic wave. The other input devices 14072 may include, but are not limited to, a physical keyboard, function keys (such as volume control keys, switch keys, etc.), a trackball, a mouse, and a joystick, which are not described herein.

Further, the touch panel 14071 may be overlaid on the display panel 14061, and when the touch panel 14071 detects a touch operation on or near the touch panel 14071, the touch operation is transmitted to the processor 1410 to determine the type of the touch event, and then the processor 1410 provides a corresponding visual output on the display panel 14061 according to the type of the touch event. Although in fig. 14, the touch panel 14071 and the display panel 14061 are two independent components to implement the input and output functions of the mobile terminal, in some embodiments, the touch panel 14071 and the display panel 14061 may be integrated to implement the input and output functions of the mobile terminal, which is not limited herein.

The interface unit 1408 is an interface through which an external device is connected to the mobile terminal 1400. For example, the external device may include a wired or wireless headset port, an external power supply (or battery charger) port, a wired or wireless data port, a memory card port, a port for connecting a device having an identification module, an audio input/output (I/O) port, a video I/O port, an earphone port, and the like. Interface unit 1408 may be used to receive input from external devices (e.g., data information, power, etc.) and transmit the received input to one or more elements within mobile terminal 1400 or may be used to transmit data between mobile terminal 1400 and external devices.

The memory 1409 may be used to store software programs as well as various data. The memory 1409 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required for at least one function (such as a sound playing function, an image playing function, etc.), and the like; the data storage area may store data created according to the use of the mobile terminal (such as audio data, a phonebook, etc.), and the like. In addition, the memory 1409 can include high speed random access memory and can also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid state storage device.

The processor 1410 is a control center of the mobile terminal, connects various parts of the entire mobile terminal using various interfaces and lines, performs various functions of the mobile terminal and processes data by operating or executing software programs and/or modules stored in the memory 1409 and calling data stored in the memory 1409, thereby performing overall monitoring of the mobile terminal. Processor 1410 may include one or more processing units; preferably, the processor 1410 may integrate an application processor, which mainly handles operating systems, user interfaces, application programs, etc., and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into processor 1410.

The mobile terminal 1400 may further include a power supply 1411 (e.g., a battery) for powering the various components. Preferably, the power supply 1411 may be logically connected to the processor 1410 via a power management system, so that functions such as charging management, discharging management, and power consumption management are implemented through the power management system.

In addition, the mobile terminal 1400 includes some functional modules that are not shown, and are not described herein again.

Preferably, an embodiment of the present invention further provides a mobile terminal, including a processor 1410, a memory 1409, and a computer program stored in the memory 1409 and capable of running on the processor 1410, where the computer program, when executed by the processor 1410, implements the processes of the video processing method embodiment, and can achieve the same technical effect, and in order to avoid repetition, details are not repeated here.

The embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the computer program implements each process of the video processing method embodiment, and can achieve the same technical effect, and in order to avoid repetition, details are not repeated here. The computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.

While the present invention has been described with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, which are illustrative and not restrictive, and it will be apparent to those skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the invention as defined in the appended claims.

Claims

1. A video processing method, applied to a mobile terminal, characterized by comprising the following steps:

if a target operation instruction for a sub video of a target layer in the at least two layers of sub videos is received, processing the sub video of the target layer in the at least two layers of sub videos according to the target operation instruction, wherein the sub video of the target layer is a sub video of any one layer in the at least two layers of sub videos;

if a target operation instruction for a sub video of a target layer in the at least two layers of sub videos is received, processing the sub video of the target layer in the at least two layers of sub videos according to the target operation instruction, including:

the layering the video according to the depth information corresponding to each pixel point in each image frame of the video to obtain at least two layers of sub-videos comprises the following steps:

acquiring the layering number and the layering distance;

layering the video according to the layering number, the layering distance and depth information corresponding to each pixel point in each image frame of the video to obtain sub-videos with the layering number;

wherein the number of layers is n, the layer distances are d1, d2, d3, … dn, and the depth information corresponding to the i-th layer is:

wherein i = 1, 2, 3, …, n, d_0 = 0, d_n = ∞, and all image regions Σ(x, y) corresponding to the i-th layer can be obtained from the depth information of each pixel.

2. The method according to claim 1, wherein before the obtaining depth information corresponding to each pixel point in each image frame of the video if the layering instruction is received, the method further comprises:

3. The method according to claim 2, wherein the mobile terminal includes at least a first camera and a second camera, the first camera and the second camera are disposed in parallel on a same side of the mobile terminal along a target direction, and focal lengths of the first camera and the second camera are the same, the target direction is a width direction of the mobile terminal or a length direction of the mobile terminal, and in a process of recording a video by the camera of the mobile terminal, depth information corresponding to each pixel point in each image frame acquired by the camera of the mobile terminal is calculated respectively, including:

by using

calculating the depth of field value corresponding to the pixel point corresponding to the object;

4. The method according to any one of claims 1 to 3, wherein the layering the video according to the depth information corresponding to each pixel point in each image frame of the video to obtain at least two layers of sub-videos comprises: dividing image areas of each image frame of the video to obtain at least two image areas corresponding to each image frame of the video; determining a layer to which each image area in at least two image areas corresponding to each image frame of the video belongs according to depth information corresponding to the pixel point of each image area in at least two image areas corresponding to each image frame of the video; and layering the video according to the layer to which each image area belongs in at least two image areas corresponding to each image frame of the video to obtain at least two layers of sub-videos.

5. A mobile terminal, comprising:

the processing module is used for processing the sub-video of the target layer in the at least two layers of sub-videos according to a target operation instruction if the target operation instruction for the sub-video of the target layer in the at least two layers of sub-videos is received, wherein the sub-video of the target layer is the sub-video of any one layer in the at least two layers of sub-videos;

the processing module is specifically configured to:

the layering module is specifically configured to:

acquiring the layering number and the layering distance;

6. The mobile terminal of claim 5, wherein the mobile terminal further comprises:

the calculation module is used for respectively calculating the depth of field information corresponding to each pixel point in each image frame acquired by the camera of the mobile terminal in the process of recording the video through the camera of the mobile terminal before the depth of field information corresponding to each pixel point in each image frame of the video is acquired if the layering instruction is received;

and the storage module is used for respectively storing each pixel point in each image frame, the depth of field information corresponding to each pixel point in each image frame and the incidence relation between the depth of field information corresponding to each pixel point in each image frame and the corresponding pixel point.

7. The mobile terminal according to claim 6, wherein the mobile terminal includes at least a first camera and a second camera, the first camera and the second camera are disposed in parallel on a same side of the mobile terminal along a target direction, and focal lengths of the first camera and the second camera are the same, the target direction is a width direction of the mobile terminal or a length direction of the mobile terminal, and the calculating module is specifically configured to:

by using

8. The mobile terminal according to any of claims 5 to 7, wherein the layering module is specifically configured to: dividing image areas of each image frame of the video to obtain at least two image areas corresponding to each image frame of the video; determining a layer to which each image area in at least two image areas corresponding to each image frame of the video belongs according to depth information corresponding to the pixel point of each image area in at least two image areas corresponding to each image frame of the video; and layering the video according to the layer to which each image area belongs in at least two image areas corresponding to each image frame of the video to obtain at least two layers of sub-videos.

9. A mobile terminal, characterized in that it comprises a processor, a memory and a computer program stored on the memory and executable on the processor, the computer program, when executed by the processor, implementing the steps of the video processing method according to any one of claims 1 to 4.