CN114697517A - Video processing method and device, terminal equipment and storage medium - Google Patents

Video processing method and device, terminal equipment and storage medium

Info

Publication number
CN114697517A
CN114697517A (application CN202011587224.9A)
Authority
CN
China
Prior art keywords
target, video, frame, frames, preset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011587224.9A
Other languages
Chinese (zh)
Inventor
梁瑀航
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Xiaomi Mobile Software Co Ltd
Original Assignee
Beijing Xiaomi Mobile Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Xiaomi Mobile Software Co Ltd
Priority to CN202011587224.9A
Publication of CN114697517A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60 Control of cameras or camera modules
    • H04N23/61 Control of cameras or camera modules based on recognised objects
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60 Control of cameras or camera modules
    • H04N23/67 Focus control based on electronic image sensor signals
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/95 Computational photography systems, e.g. light-field imaging systems
    • H04N23/951 Computational photography systems, e.g. light-field imaging systems, by using two or more images to influence resolution, frame rate or aspect ratio

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computing Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Studio Devices (AREA)

Abstract

The disclosure relates to a video processing method and apparatus, a terminal device, and a storage medium. The method includes: acquiring a plurality of first video frames in response to a shooting instruction, wherein each first video frame includes a target subject and a background image; determining a zoom-adjusted second video frame each time a first video frame is acquired; fusing the target subject in each second video frame with the corresponding first video frame to determine a plurality of target frames, wherein the target subject is located in a preset region of each target frame; and generating a sliding zoom target video from the plurality of target frames. With the disclosed method, the size of the target subject remains unchanged throughout the process of obtaining the sliding zoom video, and the target subject always stays within the preset region, which ensures a stable picture and a good sliding zoom effect.

Description

Video processing method and device, terminal equipment and storage medium
Technical Field
The present disclosure relates to the field of terminals, and in particular, to a video processing method and apparatus, a terminal device, and a storage medium.
Background
With the development of camera functions on terminal devices, camera work that originally required professional equipment can now be accomplished with terminal devices such as mobile phones, for example, shooting video with a sliding zoom effect. In video shooting, the sliding zoom (dolly zoom) refers to a visual effect in which a target object in the picture stays still while the background appears to recede from or approach the subject. The background of a sliding zoom shot usually has depth or is a landmark, such as a bridge underpass or tunnel the camera passes through, a towering building or tower, or a terrace with an open background.
In the related art, obtaining a sliding zoom video with non-professional equipment such as a mobile phone has the following technical problem: the photographer has to keep moving during shooting, and it is difficult to hold a mobile phone steady while moving, so the picture shakes and the shooting effect suffers.
Disclosure of Invention
To overcome the problems in the related art, the present disclosure provides a video processing method and apparatus, a terminal device, and a storage medium.
According to a first aspect of the embodiments of the present disclosure, a video processing method is provided, including:
acquiring a plurality of first video frames in response to a shooting instruction; wherein each first video frame includes a target subject and a background image;
determining a zoom-adjusted second video frame each time a first video frame is acquired; wherein the first video frames correspond to the second video frames one to one, and each second video frame includes a target subject satisfying a preset size;
fusing the target subject in each second video frame with the corresponding first video frame, and determining a plurality of target frames, wherein the target subject is located in a preset region of each target frame;
and generating a sliding zoom target video according to the plurality of target frames.
Optionally, the acquiring a plurality of first video frames includes:
acquiring the plurality of first video frames in response to a moving state along a preset direction;
and the determining a zoom-adjusted second video frame each time a first video frame is acquired includes:
acquiring the target subject in each first video frame and adjusting the target subject to the preset size to serve as the corresponding second video frame; or,
after each first video frame is acquired, adjusting to a preset zoom parameter, focusing on the target subject, and acquiring the zoom-adjusted second video frame.
Optionally, keeping the target subject located in the preset region of the target frame includes:
adjusting or cropping each first video frame so that the target subject is located in the preset region of the target frame; wherein the preset region is the picture center of the target frame.
Optionally, the adjusting or cropping each of the first video frames includes:
determining picture offset information of the background images in two adjacent first video frames;
in response to the picture offset information being within a preset range, cropping the one of the two adjacent first video frames that displays more of the background image;
and in response to the picture offset information being outside the preset range, filling the one of the two adjacent first video frames that displays less of the background image.
Optionally, the method further includes:
determining the target subject and a target box in a viewing interface in response to a touch operation; wherein the target subject is located in the target box;
and identifying feature points of the target subject in the target box.
Optionally, the target box includes a plurality of recognition units, and the identifying feature points of the target subject in the target box includes:
identifying, in each first video frame or second video frame, the feature points of the target subject located in each recognition unit;
and determining the complete target subject in the target box according to the feature points in each recognition unit.
Optionally, the method further includes:
determining an inserted image frame in response to a non-uniform picture change between two adjacent target frames;
and controlling the inserted image frame to be displayed between the two adjacent target frames.
Optionally, the method further includes:
determining an adjustment strategy according to a playing speed of the target video, wherein the adjustment strategy includes: inserting a preset frame between adjacent target frames to reduce the playing speed, or removing preset frames from the plurality of target frames to increase the playing speed.
According to a second aspect of the embodiments of the present disclosure, there is provided a video processing apparatus, including:
an acquisition module, configured to acquire a plurality of first video frames in response to a shooting instruction; wherein each first video frame includes a target subject and a background image;
a first determining module, configured to determine a zoom-adjusted second video frame each time a first video frame is acquired; wherein the first video frames correspond to the second video frames one to one, and each second video frame includes a target subject satisfying a preset size;
a second determining module, configured to fuse the target subject in each second video frame with the corresponding first video frame and determine a plurality of target frames, wherein the target subject is located in a preset region of each target frame;
and a generating module, configured to generate a sliding zoom target video according to the plurality of target frames.
Optionally, the acquisition module is configured to:
acquire the plurality of first video frames in response to a moving state along a preset direction;
and the first determining module is configured to:
acquire the target subject in each first video frame and adjust the target subject to the preset size to serve as the corresponding second video frame; or,
after each first video frame is acquired, adjust to a preset zoom parameter, focus on the target subject, and acquire the zoom-adjusted second video frame.
Optionally, the second determining module is specifically configured to:
adjust or crop each first video frame so that the target subject is located in the preset region of the target frame; wherein the preset region is the picture center of the target frame.
Optionally, the second determining module is specifically configured to:
determine picture offset information of the background images in two adjacent first video frames;
in response to the picture offset information being within a preset range, crop the one of the two adjacent first video frames that displays more of the background image;
and in response to the picture offset information being outside the preset range, fill the one of the two adjacent first video frames that displays less of the background image.
Optionally, the apparatus further includes:
a third determining module, configured to determine the target subject and a target box in a viewing interface in response to a touch operation; wherein the target subject is located in the target box;
and a recognition module, configured to identify feature points of the target subject in the target box.
Optionally, the target box includes a plurality of recognition units, and the recognition module is specifically configured to:
identify, in each first video frame or second video frame, the feature points of the target subject located in each recognition unit; and determine the complete target subject in the target box according to the feature points in each recognition unit.
Optionally, the second determining module is further configured to:
determine an inserted image frame in response to a non-uniform picture change between two adjacent target frames;
and control the inserted image frame to be displayed between the two adjacent target frames.
Optionally, the second determining module is further configured to:
determine an adjustment strategy according to a playing speed of the target video, wherein the adjustment strategy includes: inserting a preset frame between adjacent target frames to reduce the playing speed, or removing preset frames from the plurality of target frames to increase the playing speed.
According to a third aspect of the embodiments of the present disclosure, a terminal device is provided, including:
a processor;
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the video processing method as described in any one of the above.
According to a fourth aspect of embodiments of the present disclosure, a non-transitory computer-readable storage medium is presented, in which instructions, when executed by a processor of a terminal device, enable the terminal device to perform a video processing method as described in any one of the above.
The technical solution provided by the embodiments of the present disclosure may have the following beneficial effects: with the disclosed method, the size of the target subject remains unchanged throughout the process of obtaining the sliding zoom video, and the target subject always stays within the preset region, which ensures a stable picture and a good sliding zoom effect.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
FIG. 1 is a flow chart illustrating a method according to an example embodiment.
FIG. 2 is a flow chart illustrating a method according to an example embodiment.
FIG. 3 is a flow chart illustrating a method according to an example embodiment.
FIG. 4 is a flow chart illustrating a method according to an example embodiment.
Fig. 5 is a block diagram illustrating an apparatus according to an example embodiment.
Fig. 6 is a block diagram of a terminal device shown according to an example embodiment.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.
Terminal devices such as mobile phones are indispensable communication tools in daily life. When using a terminal device such as a mobile phone, people no longer pursue practicality alone, and gradually expect more from the functions and user experience of the device.
With the development of camera functions on terminal devices, camera work that originally required professional equipment can now be accomplished with terminal devices such as mobile phones, for example, shooting video with a sliding zoom effect. In video shooting, the sliding zoom (dolly zoom) refers to a visual effect in which a target object in the picture stays still while the background appears to recede from or approach the subject. The background of a sliding zoom shot usually has depth or is a landmark, such as a bridge underpass or tunnel the camera passes through, a towering building or tower, or a terrace with an open background.
In the related art, there are two main ways of obtaining a sliding zoom video: with professional equipment such as a slide rail, where the photographer moves a camera with a zoom lens while zooming; or with non-professional equipment such as a mobile phone, where the footage is shot on the phone and the sliding zoom video is obtained through post-processing of the picture.
In the related art, obtaining a sliding zoom video with non-professional equipment such as a mobile phone has the following technical problem: the photographer has to keep moving during shooting, and it is difficult to hold a mobile phone steady while moving, so the picture shakes and the shooting effect suffers.
To solve the technical problem in the related art, the present disclosure provides a video processing method, including: acquiring a plurality of first video frames in response to a shooting instruction, wherein each first video frame includes a target subject and a background image; determining a zoom-adjusted second video frame each time a first video frame is acquired, wherein the first video frames correspond to the second video frames one to one, and each second video frame includes a target subject satisfying a preset size; fusing the target subject in each second video frame with the corresponding first video frame to determine a plurality of target frames, wherein the target subject is located in a preset region of each target frame; and generating a sliding zoom target video according to the plurality of target frames. With the disclosed method, the size of the target subject remains unchanged throughout the process of obtaining the sliding zoom video, and the target subject always stays within the preset region, which ensures a stable picture and a good sliding zoom effect.
In an exemplary embodiment, the video processing method of this embodiment is applied to a terminal device. The terminal device may be, for example, an electronic device with a camera module, such as a mobile phone, a notebook computer, a tablet computer, or a smart watch. In this embodiment, a camera application may be installed in the operating system of the terminal device.
When the camera application is opened, the CPU or the underlying driver layer of the terminal device loads the algorithm resources required for shooting. It can be understood that the operating system of the terminal device integrates a number of image processing algorithms, such as algorithms for image data conversion and algorithms for image effect processing (beautification, blurring, watermarking, etc.). Different image processing algorithms may be chained together in pipelines, and different processing algorithms may be placed in different pipelines. All pipelines required by the current mode need to be created when the camera application starts, or before shooting. In this embodiment, the algorithm resources include at least an image recognition algorithm, an image segmentation algorithm, a matting algorithm, and an image fusion algorithm.
As shown in fig. 1, the method of this embodiment specifically includes the following steps:
and S110, responding to a shooting instruction, and acquiring a plurality of first video frames.
And S120, determining a second video frame after zooming adjustment every time one first video frame is acquired.
S130, fusing the target main body in each second video frame with the corresponding first video frame, and determining a plurality of target frames, wherein the target main body is located in a preset area of the target frames.
And S140, generating a sliding and zooming target video according to the plurality of target frames.
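Read purely as an illustration, steps S110 to S140 form the following top-level loop. This is a sketch, not the disclosed implementation; the helper names (capture_frames, zoom_adjust, fuse, encode) are assumptions that stand in for the operations detailed below.

```python
# Illustrative sketch of steps S110-S140; all helper names are assumptions.
def make_sliding_zoom_video(capture_frames, zoom_adjust, fuse, encode):
    """capture_frames yields first video frames while the device moves;
    zoom_adjust, fuse and encode stand in for steps S120, S130 and S140."""
    target_frames = []
    for first in capture_frames():       # S110: acquire a first video frame
        second = zoom_adjust(first)      # S120: zoom-adjusted second frame
        target = fuse(second, first)     # S130: subject at preset size,
        target_frames.append(target)     #       in the preset region
    return encode(target_frames)         # S140: sliding zoom target video
```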
In step S110, the shooting instruction may be, for example, an operation of the user clicking a shooting key. The target subject refers to the subject that is to remain unchanged in the sliding zoom effect, and may be, for example, a person or a building. The background refers to the environment in which the target subject is located. In this embodiment, the target subject is a person by way of example. After the user issues the shooting instruction, the terminal device may be in a moving state together with the user.
For example, the terminal device moves with the user along a preset direction, and the preset direction may be, for example, a straight-line direction away from the target subject or a straight-line direction toward the target subject. In this case, the step specifically includes: the processor of the terminal device, in response to the moving state along the preset direction, acquires a plurality of first video frames collected by the camera assembly in real time. Each first video frame includes the target subject and a background image.
In step S120, the first video frames correspond to the second video frames one to one, that is, each time a first video frame is acquired, a corresponding second video frame is obtained. Each second video frame includes the target subject, and the target subject always satisfies a preset size. The preset size may be, for example, the size of the subject in the initial video frame in the viewfinder interface at the start of shooting (the initial video frame may be an image captured before the first of the first video frames, or the first of the first video frames itself). In the subsequent process of determining the second video frames, the target subject can always be kept at this size, which ensures that the target subject in each synthesized target frame also keeps this size.
In one example, the second video frame may be an image that includes only the region where the target subject is located. In this example, step S120 may specifically include the following step:
acquiring the target subject in each first video frame, and adjusting the target subject to the preset size to serve as the corresponding second video frame.
In this example, the region containing the target subject may be cut out of each first video frame by cropping, or the target subject may be extracted from each first video frame with a matting algorithm.
The extracted target subject image is then adjusted to the preset size. For example, if the terminal device moves away from the target subject, the target subject extracted from the first video frame can be enlarged to the preset size, so that the target subject always satisfies the preset size.
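As a minimal sketch of this variant, assuming the subject's bounding box in the current frame is already known (for example from the target box) and using OpenCV only as an example toolkit; the function and parameter names are illustrative:

```python
import cv2

def subject_as_second_frame(first_frame, bbox, preset_size):
    """Cut the subject region out of a first video frame and scale it to
    preset_size, a (width, height) tuple equal to the subject's size in
    the initial video frame."""
    x, y, w, h = bbox                         # subject bounding box
    subject = first_frame[y:y + h, x:x + w]
    # Enlarged when the camera moved away from the subject, shrunk otherwise.
    return cv2.resize(subject, preset_size, interpolation=cv2.INTER_LINEAR)
```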
In another example, the second video frame may also be a frame that includes both the target subject and a background image. In this example, step S120 may specifically include the following step:
after each first video frame is acquired, adjusting to a preset zoom parameter, focusing on the target subject, and acquiring the zoom-adjusted second video frame.
In this example, the processor of the terminal device may be provided with a preset zoom program. After each first video frame is acquired, the processor can control the camera assembly to adjust the zoom parameter, focus on the target subject, and acquire the zoom-adjusted second video frame. After the second video frame is obtained, the processor automatically restores the zoom parameter of the camera assembly and acquires the next first video frame.
In this example, both the second video frame and the first video frame include the target subject and the background. However, through zoom adjustment under the preset zoom parameter, the target subject in the second video frame satisfies the preset size.
The processor may determine the preset zoom parameter according to the moving distance of the user. A correspondence between moving distance and preset zoom parameter can be established and stored in advance, and the processor determines the corresponding preset zoom parameter according to the distance data detected by a sensor.
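The stored correspondence can be as simple as a sorted table of (distance, zoom) pairs with linear interpolation between entries. The sketch below assumes such a table; the numbers in DISTANCE_TO_ZOOM are placeholders, not values from the disclosure.

```python
import bisect

# Hypothetical pre-stored mapping: camera displacement (m) -> zoom factor.
DISTANCE_TO_ZOOM = [(0.0, 1.0), (0.5, 1.3), (1.0, 1.7), (2.0, 2.5)]

def preset_zoom_for(distance):
    """Look up the zoom parameter for the distance reported by the sensor,
    interpolating linearly between the stored entries."""
    dists = [d for d, _ in DISTANCE_TO_ZOOM]
    i = bisect.bisect_right(dists, distance)
    if i == 0:
        return DISTANCE_TO_ZOOM[0][1]
    if i == len(DISTANCE_TO_ZOOM):
        return DISTANCE_TO_ZOOM[-1][1]
    (d0, z0), (d1, z1) = DISTANCE_TO_ZOOM[i - 1], DISTANCE_TO_ZOOM[i]
    return z0 + (z1 - z0) * (distance - d0) / (d1 - d0)
```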
In step S130, each target frame includes: a background image taken from a first video frame, and a target subject taken from the corresponding second video frame.
After the target subject in each second video frame, which satisfies the preset size, is fused with the corresponding first video frame, the target subject in each resulting target frame also always satisfies the preset size. In addition, during fusion it must be ensured that the target subject in each target frame is located in the preset region of the target frame, so that when the target video is subsequently generated, the target subject does not drift, which keeps the video picture stable.
The preset region may be, for example, the center of the picture of the target frame. It can be understood that the target frame is adapted to the viewing interface or display of the terminal device, that is, the center of the target frame may also be the center of the viewing interface or display. In step S140, the plurality of target frames are combined to generate a target video with a sliding zoom effect, in which the size of the target subject is constant, the target subject is always located at the center of the picture, and the background changes.
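One way to read the fusion step is a masked paste: the preset-size subject (with the mask produced by segmentation or matting) is composited over the background taken from the first video frame, centred in the target frame. The sketch below assumes this realisation; it is not the only way the fusion could be done.

```python
import numpy as np

def fuse_at_center(background, subject, mask):
    """Composite the preset-size subject onto the background so that the
    subject sits at the picture centre of the target frame.
    mask is a float array in [0, 1] marking subject pixels."""
    target = background.copy()
    H, W = target.shape[:2]
    h, w = subject.shape[:2]
    y, x = (H - h) // 2, (W - w) // 2        # preset region: frame centre
    roi = target[y:y + h, x:x + w]
    m = mask[..., None] if mask.ndim == 2 else mask
    blended = m * subject + (1 - m) * roi    # masked paste of the subject
    target[y:y + h, x:x + w] = blended.astype(target.dtype)
    return target
```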
In an exemplary embodiment, step S130 of this embodiment specifically includes: adjusting or cropping each first video frame so that the target subject is located in the preset region of the target frame, wherein the preset region is the picture center of the target frame.
During shooting and composition, shaking may shift the background picture and, with it, the target subject.
In this embodiment, the target subject is kept at the center of the picture by adjusting or cropping the background portion of the first video frame.
As shown in fig. 2, this step may specifically include:
S1301: determining picture offset information of the background images in two adjacent first video frames.
S1302: in response to the picture offset information being within a preset range, cropping the one of the two adjacent first video frames that displays more of the background image.
S1303: in response to the picture offset information being outside the preset range, filling the one of the two adjacent first video frames that displays less of the background image.
In step S1301, the picture offset information of the background images in adjacent first video frames may be determined according to the position of the target subject. For example, taking the target subject in one first video frame as a reference, the offset information of the target subjects in the two adjacent first video frames is determined. Alternatively, taking preset pixel points in the background image as a reference, the offset information of the background pictures in the two adjacent first video frames is determined.
In step S1302, the preset range may be a user-defined index, for example, that the target subject offset between adjacent first video frames stays within a predetermined number of pixels.
When the picture offset is within the preset range, the offset of the background image between adjacent first video frames is small and can be corrected by fine adjustment, such as cropping or shifting. The processor can determine which of the two adjacent first video frames displays more of the background image, crop the pixel region at its edge, and shift the cropped image. In this way, in the two adjacent target frames after synthesis, the target subjects are located at corresponding positions (the preset region), which keeps the picture stable.
In step S1303, when the picture offset is outside the preset range, the offset of the background image between adjacent first video frames is large. The processor can determine which of the two adjacent first video frames displays less of the background image, generate a pixel region from the background image of the first video frame that displays more, and use the generated pixel region to fill the corresponding region of the first video frame that displays less. After filling, the target subjects of the two synthesized target frames can again be brought to corresponding positions (the preset region) by shifting.
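A sketch of steps S1301 to S1303 under simplifying assumptions: the picture offset is estimated by phase correlation between the two frames and treated as a pure translation (rather than the subject-based or pixel-based references described above), and crop versus fill is approximated by the border mode of a single warp. A real implementation would use more robust registration and inpainting.

```python
import cv2
import numpy as np

def align_background(prev_frame, cur_frame, max_shift=8):
    """Estimate the background offset between two adjacent first video
    frames and compensate it (S1301-S1303). max_shift plays the role of
    the preset range, in pixels."""
    g0 = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY).astype(np.float32)
    g1 = cv2.cvtColor(cur_frame, cv2.COLOR_BGR2GRAY).astype(np.float32)
    (dx, dy), _ = cv2.phaseCorrelate(g0, g1)     # S1301: picture offset
    M = np.float32([[1, 0, -dx], [0, 1, -dy]])   # translation undoing it
    if max(abs(dx), abs(dy)) <= max_shift:
        # S1302: small offset; shift the frame and leave a thin blank
        # border that can then be cropped away.
        border = cv2.BORDER_CONSTANT
    else:
        # S1303: large offset; fill the exposed region by replicating
        # edge pixels (standing in for filling from the adjacent frame).
        border = cv2.BORDER_REPLICATE
    h, w = cur_frame.shape[:2]
    return cv2.warpAffine(cur_frame, M, (w, h), borderMode=border)
```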
In an exemplary embodiment, as shown in fig. 3, the method of this embodiment further includes the following step:
S100: determining the target subject and a target box in the viewing interface in response to a touch operation, and identifying feature points of the target subject in the target box.
The touch operation may be, for example, the user tapping the viewing interface. After opening the camera application, the user can select the sliding zoom target subject in the framed scene by himself or herself.
The processor determines the target subject according to the user's touch operation and displays a target box around the determined subject, i.e., the target subject is located in the target box.
The processor can control the identification of feature points of the target subject in the target box, so as to know or confirm in real time that the target subject remains within the viewing interface. The feature points may be, for example, human-body key points or human-body contour points of the target subject.
In an exemplary embodiment, the target box includes a plurality of recognition units, and the step of identifying the feature points of the target subject in the target box specifically includes the following steps:
in each first video frame or second video frame, identifying the feature points of the target subject located in each recognition unit; and determining the complete target subject in the target box according to the feature points in each recognition unit.
While acquiring each first video frame or second video frame, the processor controls the identification of the target subject in the target box and focuses or zooms on the target subject.
In one example, when the second video frame is cropped from the corresponding first video frame, the processor may perform segmented recognition of the target subject located in the target box; for example, the target box may be divided into a plurality of recognition units (recognition cells) in a grid form. In this way, the area occupied by the target subject in each recognition unit, and the background area, can be determined, and the target subject in the first video frame can be obtained accurately and completely.
In another example, when the second video frame is obtained by zoom adjustment after the corresponding first video frame, the processor may also keep focusing and zooming on the target subject while acquiring the second video frame. In the process of acquiring the second video frame, the processor can perform segmented recognition of the target subject in the target box, so as to extract the target subject in the second video frame accurately and completely and subtract the background.
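A sketch of the grid idea, assuming the feature points are already available as (x, y) pixel coordinates: bucketing them per recognition unit tells the processor which cells of the target box the subject occupies and which are pure background. All names here are illustrative.

```python
def bucket_keypoints(box, keypoints, rows=4, cols=4):
    """Assign subject feature points to grid cells (recognition units)
    inside the target box. Returns {(row, col): [points...]}."""
    bx, by, bw, bh = box
    cells = {}
    for x, y in keypoints:
        if not (bx <= x < bx + bw and by <= y < by + bh):
            continue                      # point falls outside the target box
        r = min(int((y - by) * rows / bh), rows - 1)
        c = min(int((x - bx) * cols / bw), cols - 1)
        cells.setdefault((r, c), []).append((x, y))
    return cells
```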
During shooting, the processor can stop shooting at any time according to a user instruction, or automatically stop shooting based on the feature point recognition of the target subject in the target box.
In one example, the processor of the terminal device may stop shooting according to a user instruction and generate the target video. For example, when the user clicks the stop-shooting button, the processor stops acquiring video frames and synthesizes the target video.
In another example, the processor of the terminal device may also automatically stop shooting and generate the target video.
For example, when the processor finds that the number of feature points of the target subject in the target box always meets a preset number, indicating that the target subject is always within the viewing interface, the processor can control the camera assembly to keep shooting and collecting images.
For another example, when the processor finds that the number of feature points of the target subject in the target box is less than the preset number, indicating that the viewing interface can no longer capture the complete target subject, the processor can control the camera assembly to stop shooting and synthesize the target video. For example, if the target subject is a person, the feature points may be human-body key points or human-body contour points, and the preset number may be set to 50, for example.
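The automatic stopping rule then reduces to a per-frame threshold check, sketched below with the example threshold of 50 given above:

```python
MIN_FEATURE_POINTS = 50  # example threshold from the description above

def should_stop_shooting(feature_points):
    """Return True when too few subject feature points remain inside the
    target box, i.e. the complete subject has left the viewing interface."""
    return len(feature_points) < MIN_FEATURE_POINTS
```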
In an exemplary embodiment, as shown in fig. 4, the method of this embodiment further includes step S150, which specifically includes the following steps:
S1501: determining an inserted image frame in response to a non-uniform picture change between two adjacent target frames.
S1502: controlling the inserted image frame to be displayed between the two adjacent target frames.
After the target video is generated, the processor can check the target video automatically, or prompt the user to preview it and receive a corresponding feedback instruction after the preview.
In step S1501, the processor may automatically determine that the picture change in the target video is non-uniform, for example that the background change between two adjacent target frames is too abrupt; or the user may notice the non-uniformity during the preview and issue an instruction requesting frame insertion. The processor then performs frame interpolation: using an AI image algorithm, it determines an inserted image frame lying between the two adjacent target frames according to the information of those two frames.
In step S1502, the processor inserts the determined image frame between the two adjacent target frames, thereby obtaining a new target video with a uniform picture change.
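The disclosure leaves the interpolation algorithm open (an "AI image algorithm"). In the sketch below, a plain linear blend of the two neighbouring target frames stands in for it, and jump_threshold is an arbitrary placeholder for the non-uniformity test:

```python
import cv2

def insert_if_jumpy(frames, jump_threshold=30.0):
    """Insert a blended frame wherever the mean absolute difference between
    two adjacent target frames is too large (non-uniform picture change).
    Linear blending stands in for the AI interpolation in the disclosure."""
    out = [frames[0]]
    for prev, cur in zip(frames, frames[1:]):
        diff = cv2.absdiff(prev, cur).mean()
        if diff > jump_threshold:           # S1501: change is too abrupt
            mid = cv2.addWeighted(prev, 0.5, cur, 0.5, 0)
            out.append(mid)                 # S1502: show it between the two
        out.append(cur)
    return out
```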
In an exemplary embodiment, the method of this embodiment further includes the following step:
S160: determining an adjustment strategy according to the playing speed of the target video.
The adjustment strategy includes: inserting a preset frame between adjacent target frames to reduce the playing speed, or removing preset frames from the plurality of target frames to increase the playing speed.
After the target video is generated, the processor can check the target video automatically, or prompt the user to preview it and receive a corresponding feedback instruction after the preview.
In one example, when the processor determines that the playing speed of the target video is too fast (greater than a threshold), or the user indicates during the preview that playback is too fast, the processor can perform frame interpolation on the current target video to reduce its playing speed.
The processor determines where a preset frame needs to be inserted, determines the specific image content of the preset frame from the adjacent frames before and after that position using an image algorithm, and inserts it. It can be understood that there may be multiple preset frames; for example, a corresponding preset frame may be inserted between every two adjacent frames of the target video, slowing the whole video down.
In another example, when the processor determines that the playing speed of the target video is too slow (less than a threshold), or the user indicates during the preview that playback is too slow, the processor can extract frames from the current target video to increase its playing speed.
The processor determines the positions from which preset frames should be extracted: for example, it extracts preset frames whose background picture changes little compared with the adjacent target frames, or it extracts a number of preset frames evenly, at intervals, across the target frames of the target video, increasing the playing speed overall.
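Both strategies amount to resampling the list of target frames. In the sketch below a linear blend again stands in for generating the preset frame, and a uniform drop interval is one simple reading of the "evenly, at intervals" extraction:

```python
import cv2

def slow_down(frames):
    """Insert one blended preset frame between every two adjacent target
    frames, halving the playback speed at a fixed display frame rate."""
    out = []
    for prev, cur in zip(frames, frames[1:]):
        out += [prev, cv2.addWeighted(prev, 0.5, cur, 0.5, 0)]
    out.append(frames[-1])
    return out

def speed_up(frames, keep_every=2):
    """Drop frames at a uniform interval to raise the playback speed."""
    return frames[::keep_every]
```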
With the video processing method of this embodiment, the size of the target subject can be kept unchanged, and the target subject always kept at the center of the picture, during shooting. After shooting, once the target video is obtained, its playing speed and uniformity can be adjusted. The method thus not only produces a sliding zoom video with a stable picture, but can also achieve a sped-up or slowed-down playback effect.
In an exemplary embodiment, the present disclosure also provides a video processing apparatus. As shown in fig. 5, the apparatus of this embodiment includes: an acquisition module 110, a first determining module 120, a second determining module 130, and a generating module 140, and it is used to implement the method shown in fig. 1. The acquisition module 110 is configured to acquire a plurality of first video frames in response to a shooting instruction; wherein each first video frame includes a target subject and a background image. The first determining module 120 is configured to determine a zoom-adjusted second video frame each time a first video frame is acquired; the first video frames correspond to the second video frames one to one, and each second video frame includes a target subject satisfying a preset size. The second determining module 130 is configured to fuse the target subject in each second video frame with the corresponding first video frame and determine a plurality of target frames, wherein the target subject is located in a preset region of each target frame. The generating module 140 is configured to generate a sliding zoom target video according to the plurality of target frames.
In an exemplary embodiment, still referring to fig. 5, the acquisition module 110 of this embodiment is configured to: acquire the plurality of first video frames in response to a moving state along a preset direction. The first determining module 120 is configured to: acquire the target subject in each first video frame and adjust the target subject to the preset size to serve as the corresponding second video frame; or, after each first video frame is acquired, adjust to a preset zoom parameter, focus on the target subject, and acquire the zoom-adjusted second video frame.
In an exemplary embodiment, still referring to fig. 5, the second determining module 130 is specifically configured to: adjust or crop each first video frame so that the target subject is located in the preset region of the target frame, the preset region being the picture center of the target frame. The second determining module 130 is further specifically configured to: determine picture offset information of the background images in two adjacent first video frames; in response to the picture offset information being within a preset range, crop the one of the two adjacent first video frames that displays more of the background image; and in response to the picture offset information being outside the preset range, fill the one of the two adjacent first video frames that displays less of the background image.
In an exemplary embodiment, the apparatus of this embodiment further includes: a third determining module, configured to determine the target subject and a target box in the viewing interface in response to a touch operation, wherein the target subject is located in the target box; and a recognition module, configured to identify feature points of the target subject in the target box. In this embodiment, the target box includes a plurality of recognition units, and the recognition module is specifically configured to identify, in each first video frame or second video frame, the feature points of the target subject located in each recognition unit, and to determine the complete target subject in the target box according to the feature points in each recognition unit.
In an exemplary embodiment, still referring to fig. 5, the second determining module 130 is further configured to: determine an inserted image frame in response to a non-uniform picture change between two adjacent target frames, and control the inserted image frame to be displayed between the two adjacent target frames. In this embodiment, the second determining module 130 is further configured to: determine an adjustment strategy according to the playing speed of the target video, the adjustment strategy including: inserting a preset frame between adjacent target frames to reduce the playing speed, or removing preset frames from the plurality of target frames to increase the playing speed.
Fig. 6 is a block diagram of a terminal device. The present disclosure also provides a terminal device. For example, the device 500 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, and the like.
Device 500 may include one or more of the following components: a processing component 502, a memory 504, a power component 506, a multimedia component 508, an audio component 510, an input/output (I/O) interface 512, a sensor component 514, and a communication component 516.
The processing component 502 generally controls overall operation of the device 500, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing components 502 may include one or more processors 520 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 502 can include one or more modules that facilitate interaction between the processing component 502 and other components. For example, the processing component 502 can include a multimedia module to facilitate interaction between the multimedia component 508 and the processing component 502.
The memory 504 is configured to store various types of data to support operation at the device 500. Examples of such data include instructions for any application or method operating on device 500, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 504 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
Power component 506 provides power to the various components of device 500. The power components 506 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the apparatus 500.
The multimedia component 508 includes a screen that provides an output interface between the device 500 and the user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 508 includes a front facing camera and/or a rear facing camera. The front-facing camera and/or the rear-facing camera may receive external multimedia data when the device 500 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 510 is configured to output and/or input audio signals. For example, the audio component 510 includes a Microphone (MIC) configured to receive external audio signals when the device 500 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signal may further be stored in the memory 504 or transmitted via the communication component 516. In some embodiments, audio component 510 further includes a speaker for outputting audio signals.
The I/O interface 512 provides an interface between the processing component 502 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 514 includes one or more sensors for providing various aspects of state assessment for the device 500. For example, the sensor assembly 514 may detect the open/closed state of the device 500 and the relative positioning of components, such as the display and keypad of the device 500; it may also detect a change in the position of the device 500 or of a component of the device 500, the presence or absence of user contact with the device 500, the orientation or acceleration/deceleration of the device 500, and a change in the temperature of the device 500. The sensor assembly 514 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 514 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 514 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 516 is configured to facilitate communications between the device 500 and other devices in a wired or wireless manner. The device 500 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 516 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 516 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the device 500 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
A non-transitory computer readable storage medium, such as the memory 504 including instructions executable by the processor 520 of the device 500 to perform the method, is provided in another exemplary embodiment of the present disclosure. For example, the computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like. The instructions in the storage medium, when executed by a processor of the terminal device, enable the terminal device to perform the above-described method.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.

Claims (18)

1. A video processing method, comprising:
acquiring a plurality of first video frames in response to a shooting instruction; wherein each first video frame comprises a target subject and a background image;
determining a zoom-adjusted second video frame each time a first video frame is acquired; wherein the first video frames correspond to the second video frames one to one, and each second video frame comprises a target subject satisfying a preset size;
fusing the target subject in each second video frame with the corresponding first video frame, and determining a plurality of target frames, wherein the target subject is located in a preset region of each target frame;
and generating a sliding zoom target video according to the plurality of target frames.
2. The video processing method according to claim 1, wherein
the acquiring a plurality of first video frames comprises:
acquiring the plurality of first video frames in response to a moving state along a preset direction;
and the determining a zoom-adjusted second video frame each time a first video frame is acquired comprises:
acquiring the target subject in each first video frame and adjusting the target subject to the preset size to serve as the corresponding second video frame; or,
after each first video frame is acquired, adjusting to a preset zoom parameter, focusing on the target subject, and acquiring the zoom-adjusted second video frame.
3. The video processing method according to claim 1, wherein keeping the target subject located in the preset region of the target frame comprises:
adjusting or cropping each first video frame so that the target subject is located in the preset region of the target frame; wherein the preset region is the picture center of the target frame.
4. The video processing method according to claim 3, wherein the adjusting or cropping each of the first video frames comprises:
determining picture offset information of the background images in two adjacent first video frames;
in response to the picture offset information being within a preset range, cropping the one of the two adjacent first video frames that displays more of the background image;
and in response to the picture offset information being outside the preset range, filling the one of the two adjacent first video frames that displays less of the background image.
5. The video processing method according to claim 1, further comprising:
determining the target subject and a target box in a viewing interface in response to a touch operation; wherein the target subject is located in the target box;
and identifying feature points of the target subject in the target box.
6. The video processing method according to claim 5, wherein the target box comprises a plurality of recognition units, and the identifying feature points of the target subject in the target box comprises:
identifying, in each first video frame or second video frame, the feature points of the target subject located in each recognition unit;
and determining the complete target subject in the target box according to the feature points in each recognition unit.
7. The video processing method according to claim 1, further comprising:
determining an inserted image frame in response to a non-uniform picture change between two adjacent target frames;
and controlling the inserted image frame to be displayed between the two adjacent target frames.
8. The video processing method according to claim 1, further comprising:
determining an adjustment strategy according to a playing speed of the target video, wherein the adjustment strategy comprises: inserting a preset frame between adjacent target frames to reduce the playing speed, or removing preset frames from the plurality of target frames to increase the playing speed.
9. A video processing apparatus, comprising:
an acquisition module, configured to acquire a plurality of first video frames in response to a shooting instruction; wherein each first video frame comprises a target subject and a background image;
a first determining module, configured to determine a zoom-adjusted second video frame each time a first video frame is acquired; wherein the first video frames correspond to the second video frames one to one, and each second video frame comprises a target subject satisfying a preset size;
a second determining module, configured to fuse the target subject in each second video frame with the corresponding first video frame and determine a plurality of target frames, wherein the target subject is located in a preset region of each target frame;
and a generating module, configured to generate a sliding zoom target video according to the plurality of target frames.
10. The video processing apparatus according to claim 9,
the acquisition module is configured to:
responding to the moving state along the preset direction, and acquiring a plurality of first video frames;
the first determination module is to:
acquiring a target main body in each first video frame, and adjusting the target main body to a preset size to serve as the corresponding second video frame; or,
and after each first video frame is obtained, adjusting to preset zooming parameters and focusing with a target main body, and obtaining a second video frame after zooming adjustment.
11. The video processing apparatus according to claim 9, wherein the second determining module is specifically configured to:
adjusting or cropping each first video frame to enable the target main body to be located in a preset area of the target frame; and the preset area is the picture center of the target frame.
12. The video processing apparatus according to claim 11, wherein the second determining module is specifically configured to:
determining picture offset information of background images in two adjacent first video frames;
cutting the first video frame with more background images in two adjacent first video frames in response to the picture offset information being in a preset range;
and in response to the picture offset information being out of a preset range, filling the first video frames with less background image display in two adjacent first video frames.
13. The video processing apparatus of claim 9, wherein the apparatus further comprises:
the third determining module is used for responding to the touch operation and determining a target main body and a target frame in the viewing interface; wherein the target subject is located in a target frame;
and the identification module is used for identifying the characteristic points of the target main body in the target frame.
14. The video processing apparatus according to claim 13, wherein the target frame includes a plurality of recognition units, and the recognition module is specifically configured to:
identifying, in each of the first video frames or the second video frames, a feature point of the target subject located in each of the identification units; and determining a complete target main body in the target frame according to the characteristic points in each identification unit.
15. The video processing apparatus of claim 9, wherein the second determining module is further configured to:
determine an inserted image frame in response to an uneven picture change between two adjacent target frames;
and control the inserted image frame to be displayed between the two adjacent target frames.
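In the simplest case, the inserted image frame of claim 15 could be a weighted blend of the two adjacent target frames; production interpolators are usually motion-compensated, so treat this only as a sketch.

```python
import cv2

def insert_frame(prev_frame, next_frame, alpha=0.5):
    # Midpoint blend of two adjacent target frames; alpha weights the
    # later frame. Equal-size, equal-type images are assumed.
    return cv2.addWeighted(prev_frame, 1.0 - alpha, next_frame, alpha, 0.0)
```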
16. The video processing apparatus of claim 9, wherein the second determining module is further configured to:
determine an adjustment strategy according to the playing speed of the target video, wherein the adjustment strategy comprises: inserting preset frames between adjacent target frames to reduce the playing speed, or removing preset frames from the plurality of target frames to increase the playing speed.
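Claims 8 and 16 adjust playback speed purely by frame count. A naive in-memory version, with duplicated frames standing in for the "preset frames", could look like this (the integer rounding of `factor` is an illustrative simplification):

```python
def adjust_play_speed(frames, factor):
    # factor > 1: speed up by dropping frames; factor < 1: slow down by
    # inserting (here: repeating) frames between neighbors.
    if factor >= 1.0:
        return frames[::max(int(round(factor)), 1)]
    repeats = int(round(1.0 / factor))
    out = []
    for frame in frames:
        out.extend([frame] * repeats)
    return out
```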
17. A terminal device, comprising:
a processor;
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the video processing method of any of claims 1 to 8.
18. A non-transitory computer-readable storage medium, wherein instructions in the storage medium, when executed by a processor of a terminal device, enable the terminal device to perform the video processing method of any of claims 1 to 8.
CN202011587224.9A 2020-12-28 2020-12-28 Video processing method and device, terminal equipment and storage medium Pending CN114697517A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011587224.9A CN114697517A (en) 2020-12-28 2020-12-28 Video processing method and device, terminal equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114697517A true CN114697517A (en) 2022-07-01

Family

ID=82130653

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011587224.9A Pending CN114697517A (en) 2020-12-28 2020-12-28 Video processing method and device, terminal equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114697517A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117615173A (en) * 2023-09-28 2024-02-27 书行科技(北京)有限公司 Video effect processing method, device, computer equipment and storage medium

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106375772A (en) * 2016-08-29 2017-02-01 北京小米移动软件有限公司 Video playing method and device
JP2017143354A (en) * 2016-02-08 2017-08-17 キヤノン株式会社 Image processing apparatus and image processing method
WO2018093002A1 (en) * 2016-11-18 2018-05-24 엘지전자 주식회사 Mobile terminal and method for controlling same
CN108476289A (en) * 2017-07-31 2018-08-31 深圳市大疆创新科技有限公司 A kind of method for processing video frequency, equipment, aircraft and system
CN109379537A (en) * 2018-12-30 2019-02-22 北京旷视科技有限公司 Slide Zoom effect implementation method, device, electronic equipment and computer readable storage medium
CN111083380A (en) * 2019-12-31 2020-04-28 维沃移动通信有限公司 Video processing method, electronic equipment and storage medium
CN107392850B (en) * 2017-06-30 2020-08-25 联想(北京)有限公司 Image processing method and system
CN111756996A (en) * 2020-06-18 2020-10-09 影石创新科技股份有限公司 Video processing method, video processing apparatus, electronic device, and computer-readable storage medium
CN113949808A (en) * 2020-07-17 2022-01-18 北京字节跳动网络技术有限公司 Video generation method and device, readable medium and electronic equipment

Similar Documents

Publication Publication Date Title
CN107426502B (en) Shooting method and device, electronic equipment and storage medium
CN108108418B (en) Picture management method, device and storage medium
CN105282441B (en) Photographing method and device
CN108154465B (en) Image processing method and device
EP3057304A1 (en) Method and apparatus for generating image filter
CN107888984B (en) Short video playing method and device
CN108462833B (en) Photographing method, photographing device and computer-readable storage medium
CN111314617B (en) Video data processing method and device, electronic equipment and storage medium
CN111586296B (en) Image capturing method, image capturing apparatus, and storage medium
CN105120301B (en) Method for processing video frequency and device, smart machine
CN104869308A (en) Picture taking method and device
EP3945494A1 (en) Video processing method, apparatus and storage medium
CN104216525B (en) Method and device for mode control of camera application
CN106210495A (en) Image capturing method and device
CN107426489A (en) Processing method, device and terminal during shooting image
CN115134505B (en) Preview picture generation method and device, electronic equipment and storage medium
CN111614910B (en) File generation method and device, electronic equipment and storage medium
CN105808102B (en) Add the method and device of frame
CN114697517A (en) Video processing method and device, terminal equipment and storage medium
CN112188096A (en) Photographing method and device, terminal and storage medium
CN114079724B (en) Taking-off snapshot method, device and storage medium
CN114430457B (en) Shooting method, shooting device, electronic equipment and storage medium
CN106331463B (en) Camera focal length amplification method and device
CN114943791A (en) Animation playing method, device, equipment and storage medium
CN107295229B (en) The photographic method and device of mobile terminal

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination