CN108960130B - Intelligent video file processing method and device

Info

Publication number: CN108960130B
Application number: CN201810705480.XA
Authority: CN (China)
Prior art keywords: scene, image, video file, frame, frames
Legal status: Active (granted)
Other versions: CN108960130A (Chinese)
Inventor: 杨双新
Original and current assignee: Lenovo Beijing Ltd

Classifications

    • G06V20/41: Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • H04N21/4312: Generation of visual interfaces for content selection or interaction involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations
    • H04N21/44008: Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H04N21/4402: Processing of video elementary streams involving reformatting operations of video signals for household redistribution, storage or real-time display

Abstract

The present disclosure provides an intelligent video file processing method, including: acquiring a video file, wherein the video file comprises at least two frames of images; obtaining, based on intelligent analysis of the video file, scene data of at least one scene presented by the video file through the at least two frames of images; and processing a first set of image frames corresponding to a first scene of the at least one scene based on the scene data, so that the first scene presented when the video file is played highlights a region of interest, the region of interest being presented by the processed first set of image frames. The disclosure also provides an intelligent video file processing apparatus and an electronic device.

Description

Intelligent video file processing method and device
Technical Field
The present disclosure relates to an intelligent video file processing method and device.
Background
With the rapid development of internet technology, video files are applied more and more widely in people's work, life, entertainment and other fields because of the richness of the content they present. In some cases, the scene presented by a video file is complex and may contain multiple targets. When watching the video, a user has no quick and direct way to pick out, from the complex scene and the multiple targets, the object that the video file intends to draw attention to, which results in a poor presentation effect and a less than ideal viewing experience.
Disclosure of Invention
One aspect of the present disclosure provides a method for intelligently processing a video file, including: acquiring a video file, wherein the video file comprises at least two frames of images; obtaining, based on intelligent analysis of the video file, scene data of at least one scene presented by the video file through the at least two frames of images; and processing a first group of image frames corresponding to a first scene of the at least one scene based on the scene data, so that the first scene presented when the video file is played highlights a region of interest, the region of interest being presented by the processed first group of image frames.
Optionally, processing a first group of image frames corresponding to a first scene of the at least one scene based on the scene data includes: processing a group of image frames corresponding to each of the at least one scene based on the scene data, the first scene being one of the at least one scene.
Optionally, the method further includes: obtaining a first trigger operation, the first trigger operation being used to indicate that the video file is output in a normal mode; and/or obtaining a second trigger operation, the second trigger operation being used to indicate that the video file is output in an enhanced mode, wherein the video file output in the enhanced mode comprises at least the processed image frames.
Optionally, obtaining scene data of at least one scene presented by the video file through the at least two frames of images based on the intelligent analysis of the video file includes: analyzing the at least two frames of images, and grouping images that meet a predetermined condition into a group of image frames, wherein, for any frame of image, when the degree of association between that frame and the previous frame is higher than a preset threshold, that frame is assigned to the group to which the previous frame belongs. A group of image frames corresponds to one scene, and image information data of at least one frame in the group of image frames serves as the scene data of the corresponding scene.
Optionally, the method further includes: for any scene, determining, in at least one frame of the image frames corresponding to the scene, an object that satisfies a predetermined rule and is not the object of interest of the preceding scene as the object of interest of the scene, wherein the predetermined rule comprises at least one of the following: a predetermined depth of field range, and/or a predetermined target object identification parameter range.
Optionally, processing a first group of image frames corresponding to a first scene of the at least one scene based on the scene data includes: for any frame of image in the first group of image frames corresponding to the first scene, taking the region corresponding to the object of interest of the first scene as the region of interest, and processing that frame of image based on the region of interest.
Optionally, processing any frame of image based on the region of interest includes: acquiring the depth of field of the region of interest, and blurring objects in the image that lie outside the depth of field of the region of interest.
Another aspect of the present disclosure provides an intelligent video file processing apparatus, including: an acquisition module configured to acquire a video file, the video file comprising at least two frames of images; an analysis module configured to obtain, based on intelligent analysis of the video file, scene data of at least one scene presented by the video file through the at least two frames of images; and a processing module configured to process a first set of image frames corresponding to a first scene of the at least one scene based on the scene data, so that the first scene presented when the video file is played highlights a region of interest, the region of interest being presented by the processed first set of image frames.
Optionally, the analysis module processing a first group of image frames corresponding to a first scene of the at least one scene based on the scene data includes: the analysis module being configured to process a group of image frames corresponding to each of the at least one scene based on the scene data, wherein the first scene is one of the at least one scene.
Optionally, the apparatus further comprises: a trigger module configured to obtain a first trigger operation, the first trigger operation being used to indicate that the video file is output in a normal mode, and/or to obtain a second trigger operation, the second trigger operation being used to indicate that the video file is output in an enhanced mode, wherein the video file output in the enhanced mode comprises at least the processed image frames.
Optionally, the analysis module obtaining, based on the intelligent analysis of the video file, the scene data of at least one scene presented by the video file through the at least two frames of images includes: the analysis module being configured to analyze at least two frames of images in the video file and to group images that meet a predetermined condition into a group of image frames, wherein, for any frame of image, when the degree of association between that frame and the previous frame is higher than a preset threshold, that frame is assigned to the group to which the previous frame belongs. A group of image frames corresponds to one scene, and image information data of at least one frame in the group of image frames serves as the scene data of the corresponding scene.
Optionally, the apparatus further comprises: a preprocessing module configured to, for any scene, determine, in at least one frame of the image frames corresponding to the scene, an object that satisfies a predetermined rule and is not the object of interest of the preceding scene as the object of interest of the scene, wherein the predetermined rule comprises at least one of the following: a predetermined depth of field range, and/or a predetermined target object identification parameter range.
Optionally, the processing module processing a first group of image frames corresponding to a first scene of the at least one scene based on the scene data includes: the processing module being configured to, for any frame of image in the first group of image frames corresponding to the first scene, take the region corresponding to the object of interest of the first scene as the region of interest, and process that frame of image based on the region of interest.
Optionally, the processing module processing any frame of image based on the region of interest includes: the processing module being configured to acquire the depth of field of the region of interest and to blur objects in the image that lie outside the depth of field of the region of interest.
Another aspect of the present disclosure provides an electronic device comprising a processor, a memory, and a computer program stored on the memory and executable on the processor, which when executed by the processor implements the method as described above.
Another aspect of the disclosure provides a non-volatile storage medium storing computer-executable instructions for implementing the method as described above when executed.
Another aspect of the disclosure provides a computer program comprising computer executable instructions for implementing the method as described above when executed.
Drawings
For a more complete understanding of the present disclosure and the advantages thereof, reference is now made to the following descriptions taken in conjunction with the accompanying drawings, in which:
fig. 1 schematically illustrates an application scenario of a video file intelligent processing method and apparatus according to an embodiment of the present disclosure;
FIG. 2 schematically shows a flow diagram of a method of intelligent processing of video files according to an embodiment of the present disclosure;
FIG. 3 schematically shows a schematic diagram of a process of obtaining scene data of a scene presented by a video file, according to an embodiment of the disclosure;
FIGS. 4A-4C schematically illustrate three frames of images in a first scene of a video file, according to an embodiment of the present disclosure;
FIGS. 4D-4F schematically illustrate three frames of images in a second scene of a video file, according to an embodiment of the present disclosure;
FIG. 5 schematically shows a block diagram of a video file intelligent processing apparatus according to an embodiment of the present disclosure;
FIG. 6 schematically shows a block diagram of a video file intelligent processing apparatus according to another embodiment of the present disclosure; and
fig. 7 schematically shows a block diagram of an electronic device according to an embodiment of the disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is illustrative only and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.
Where a convention analogous to "at least one of A, B and C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B and C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B and C together, etc.). Where a convention analogous to "at least one of A, B or C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B or C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B and C together, etc.). It will be further understood by those within the art that virtually any disjunctive word and/or phrase presenting two or more alternative terms, whether in the description, claims, or drawings, should be understood to contemplate the possibilities of including one of the terms, either of the terms, or both terms. For example, the phrase "A or B" should be understood to include the possibility of "A", or "B", or "A and B".
Some block diagrams and/or flow diagrams are shown in the figures. It will be understood that some blocks of the block diagrams and/or flowchart illustrations, or combinations thereof, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the instructions, which execute via the processor, create means for implementing the functions/acts specified in the block diagrams and/or flowchart block or blocks.
Accordingly, the techniques of this disclosure may be implemented in hardware and/or software (including firmware, microcode, etc.). In addition, the techniques of this disclosure may take the form of a computer program product on a computer-readable medium having instructions stored thereon for use by or in connection with an instruction execution system. In the context of this disclosure, a computer-readable medium may be any medium that can contain, store, communicate, propagate, or transport the instructions. For example, the computer readable medium can include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, device, or propagation medium. Specific examples of the computer readable medium include: magnetic storage devices, such as magnetic tape or Hard Disk Drives (HDDs); optical storage devices, such as compact disks (CD-ROMs); a memory, such as a Random Access Memory (RAM) or a flash memory; and/or wired/wireless communication links.
The embodiments of the present disclosure provide an intelligent video file processing method and device. The method comprises an intelligent analysis process and an intelligent processing process. In the intelligent analysis process, scene data of each scene presented by the video file are obtained. In the intelligent processing process, the image frames corresponding to each scene are processed based on the scene data, so that each scene displayed when the video file is played highlights its corresponding region of interest.
Fig. 1 schematically shows an application scenario of a video file intelligent processing method and apparatus according to an embodiment of the present disclosure. It should be noted that fig. 1 is only an example of a scenario in which the embodiments of the present disclosure may be applied to help those skilled in the art understand the technical content of the present disclosure, but does not mean that the embodiments of the present disclosure may not be applied to other devices, systems, environments or scenarios.
As shown in fig. 1, the application scenario depicts a user watching a video on the electronic device 110. The electronic device 110 may be any of various electronic devices that have a display screen and support video playback, including but not limited to a smart phone, a tablet computer, a laptop portable computer, a desktop computer, and the like. The electronic device 110 may or may not support video/image capturing; when it does, the electronic device 110 may have one camera or two or more cameras.
The intelligent video file processing method and apparatus provided by the embodiments of the present disclosure can be applied to the electronic device shown in fig. 1, so as to intelligently process a video file and obtain a playing effect in which the object of interest of each stage is highlighted during playback.
Fig. 2 schematically shows a flow chart of a video file intelligent processing method according to an embodiment of the present disclosure.
As shown in fig. 2, the method includes operations S201 to S203.
In operation S201, a video file including at least two frames of images is acquired.
The video file acquired in this operation may be a portion of a video that is still being recorded, or a video whose recording has already been completed.
In operation S202, scene data of at least one scene presented by the video file through the at least two frames of images is obtained based on the intelligent analysis of the video file.
In this operation, since the video file can present one or more scenes through its at least two frames of images, scene data corresponding to each scene presented by the video file is obtained through intelligent analysis of the video file, and the scene data corresponding to different scenes are different. A scene is a video presentation scenario constructed from at least one frame of image. Within one scene, the object of interest of the content presented by the video file remains the same, whereas across different scenes the objects of interest differ; a scene change in the video therefore reflects a change of the object of interest.
In operation S203, a first set of image frames corresponding to a first scene of the at least one scene is processed based on the scene data, so that the first scene presented by the video file when played highlights a region of interest, which is presented by the processed first set of image frames.
The first scene represents any one of the one or more scenes presented by the video file. Since the first scene is a video presentation scenario constructed from at least one frame of image, one scene corresponds to one or more frames of images, and these one or more frames form the first group of image frames corresponding to the first scene. In each frame of the processed first group of image frames, the region corresponding to the object of interest of the first scene is highlighted, so that the region of interest of the first scene is highlighted when the video file is played. It should be noted that the object of interest is the same for every frame in the first group of image frames, but the region corresponding to that object may differ from frame to frame.
It can be seen that, in the method shown in fig. 2, the different scenes presented by a video file are distinguished and the corresponding scene data are obtained through intelligent analysis of the video file, and the group of image frames corresponding to each scene is then processed based on the scene data, so that each scene presented during playback highlights its corresponding region of interest. As a result, when watching the video, the user's attention is naturally drawn by the highlighted part to the object of interest of each scene, and shifts naturally and smoothly to the object of interest of the next scene as the scenes switch. In other words, intelligent playback is achieved through intelligent processing of the video file, satisfying the user's viewing needs.
In one embodiment of the present disclosure, the processing of the first group of image frames corresponding to the first scene of the at least one scene based on the scene data in operation S203 of the method shown in fig. 2 includes: processing a group of image frames corresponding to each of the at least one scene based on the scene data, the first scene being one of the at least one scene. As explained above, the first scene is any one of the one or more scenes presented by the video file, so operation S203 amounts to processing the group of image frames corresponding to every scene presented by the video file. In this way, whichever scene the video file is played to, the region of interest corresponding to that scene is highlighted, and the object of interest and its changes are presented intelligently.
As an optional embodiment, whether the processed image frames are displayed when the video file is played may be selected as needed, and the method shown in fig. 2 further includes: obtaining a first trigger operation, the first trigger operation being used to indicate that the video file is output in a normal mode; and/or obtaining a second trigger operation, the second trigger operation being used to indicate that the video file is output in an enhanced mode, wherein the video file output in the enhanced mode comprises at least the processed image frames.
Based on this embodiment, after the image frames corresponding to each scene in the video file have been processed in operation S203 of the method shown in fig. 2, the processing result of the video file may be stored in a preset storage area. When the user wishes to view the unprocessed original video, a first trigger operation may be performed, and in response the scheme outputs and plays the image frames of the unprocessed original video file in time order. When the user wishes to view the intelligently processed video, a second trigger operation may be performed, and in response the scheme outputs and plays the image frames of the video file processed in operation S203 in time order. The user can thus select a viewing mode as required and obtain the corresponding playing effect.
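As a non-limiting illustration, the following Python sketch shows one way such dual-mode output could be organized. The names PlaybackMode and play_video, and the idea of keeping the original and processed frame sequences side by side in memory, are assumptions made for the example and are not taken from the patent.

```python
from enum import Enum
from typing import Callable, Sequence

import numpy as np


class PlaybackMode(Enum):
    NORMAL = "normal"      # first trigger operation: play the unprocessed frames
    ENHANCED = "enhanced"  # second trigger operation: play the processed frames


def play_video(original_frames: Sequence[np.ndarray],
               processed_frames: Sequence[np.ndarray],
               mode: PlaybackMode,
               render: Callable[[np.ndarray], None]) -> None:
    # Output the frames of one video file in time order, in the requested mode.
    frames = original_frames if mode is PlaybackMode.NORMAL else processed_frames
    for frame in frames:
        render(frame)
```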
The intelligent analysis and processing of the video file in the method shown in fig. 2 are explained below.
In an embodiment of the present disclosure, obtaining scene data of at least one scene presented by the video file through the at least two frames of images based on the intelligent analysis of the video file in operation S202 of the method shown in fig. 2 includes: analyzing the at least two frames of images, and grouping images that meet a predetermined condition into a group of image frames, wherein, for any frame of image, when the degree of association between that frame and the previous frame is higher than a preset threshold, that frame is assigned to the group to which the previous frame belongs. In a video file, a group of image frames corresponds to one scene, and image information data of at least one frame in the group of image frames serves as the scene data of the corresponding scene. The image information data of a frame may be one or more kinds of information carried by the image, such as depth of field information of the image, intensity information of the image, and the like.
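The grouping rule described above can be sketched in Python as follows. This is a minimal illustration; the function name group_frames_into_scenes, the association callback, and the choice of returning lists of frame indices are assumptions made for the example rather than details from the patent.

```python
from typing import Callable, List

import numpy as np


def group_frames_into_scenes(frames: List[np.ndarray],
                             association: Callable[[np.ndarray, np.ndarray], float],
                             threshold: float) -> List[List[int]]:
    # Each group of frame indices corresponds to one scene: a frame joins the group
    # of the previous frame when its association degree with that frame exceeds the
    # preset threshold; otherwise it opens a new group (a scene change).
    groups: List[List[int]] = []
    for i in range(len(frames)):
        if i > 0 and association(frames[i - 1], frames[i]) > threshold:
            groups[-1].append(i)
        else:
            groups.append([i])
    return groups
```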
Fig. 3 schematically shows a schematic diagram of a process of obtaining scene data of a scene presented by a video file according to an embodiment of the present disclosure.
As shown in fig. 3, the video file includes 6 frames of images, which are, in time order from front to back: image 1, image 2, image 3, image 4, image 5 and image 6. Based on the intelligent analysis of the video file, the image frames meeting the predetermined condition are grouped into a group of image frames. Specifically, the images are analyzed one by one in time order. Image 1 is analyzed first; since it is the first frame of the video file, it is provisionally placed alone in a first group of image frames. Image 2 is analyzed next and its degree of association with image 1 is calculated: when the degree of association is higher than the preset threshold, image 2 is placed into the first group of image frames; when it is not higher than the preset threshold, image 2 is provisionally placed alone in a second group of image frames, and at that point the first group can be determined to contain only image 1. Image 3 is then analyzed and its degree of association with image 2 is calculated: when the degree of association is higher than the preset threshold, image 3 is placed into the group to which image 2 belongs; when it is not higher than the preset threshold, image 3 and image 2 do not belong to the same group, and image 3 is provisionally placed alone in a new group. By analogy, images 4 to 6 are analyzed in sequence and assigned to the corresponding groups according to the analysis results. In general, for each frame of the video file, when its degree of association with the previous frame is higher than the preset threshold, the frame is assigned to the group to which the previous frame belongs; otherwise it starts a new group. As can be seen from fig. 3, image 1 and image 2 are assigned to a first group of image frames, images 3 to 5 to a second group, and image 6 to a third group; the first group corresponds to scene 1, the second group to scene 2 and the third group to scene 3, that is, the intelligent analysis divides the video file into three presented scenes. The scene data of scene 1 includes the depth of field information and intensity information of image 1 and of image 2; the scene data of scene 2 includes the depth of field information and intensity information of images 3, 4 and 5; and the scene data of scene 3 includes the depth of field information and intensity information of image 6.
The degree of association between two adjacent frames can be calculated in various ways. Taking image 1 and image 2 as an example, the degree of similarity and/or the degree of change between them may be calculated from the intensity information of image 1 and the intensity information of image 2, and this degree of similarity and/or change used to represent their degree of association; the same may be done using the depth of field information of the two images. More precisely, the depth of field information and intensity information of image 1 can constitute a three-dimensional picture corresponding to image 1, the depth of field information and intensity information of image 2 can constitute a three-dimensional picture corresponding to image 2, and the degree of similarity and/or change between the two three-dimensional pictures can be calculated to represent the degree of association between image 1 and image 2. If the intelligent analysis is performed in real time while the video is being recorded, the degree of association between image 1 and image 2 can also be represented by the degree of change of physical characteristic data of the device recording the video, such as the pose data of the device and/or the focal length of its lens. For example, the pose data of the device at the moments image 1 and image 2 are recorded is collected by a pose sensor on the device: if the change of the pose data does not exceed a preset first value, the degree of association between image 1 and image 2 can be considered higher than the preset threshold; if the change exceeds the preset first value, the device has undergone a comparatively large rotation or movement during recording, the scene presented by the recorded video necessarily switches, and the degree of association between image 1 and image 2 is not higher than the preset threshold. Similarly, the focal length of the lens at the moments image 1 and image 2 are recorded is obtained from the driving motor of the lens: if the change of the focal length does not exceed a preset second value, the degree of association between image 1 and image 2 can be considered higher than the preset threshold; if it exceeds the preset second value, the device has zoomed in or out during recording, the scene presented by the recorded video necessarily switches, and the degree of association between image 1 and image 2 is not higher than the preset threshold.
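The following Python sketch gives one concrete, assumed realization of these two options: a crude intensity-based similarity for offline analysis, and a pose/focal-length check for analysis during recording. The normalization to [0, 1] and the use of a Euclidean norm on the pose vector are choices made for the example, not requirements of the patent.

```python
import numpy as np


def association_from_intensity(img_a: np.ndarray, img_b: np.ndarray) -> float:
    # Crude similarity of two grayscale frames in [0, 1]: one minus the mean
    # absolute intensity difference, normalized to the 8-bit intensity range.
    a = img_a.astype(np.float32) / 255.0
    b = img_b.astype(np.float32) / 255.0
    return float(1.0 - np.abs(a - b).mean())


def same_scene_by_recording_params(pose_a: np.ndarray, pose_b: np.ndarray,
                                   focal_a: float, focal_b: float,
                                   max_pose_change: float,
                                   max_focal_change: float) -> bool:
    # During live recording: treat two consecutive frames as the same scene only if
    # neither the device pose (from a pose sensor) nor the lens focal length (from
    # the lens driving motor) changed by more than its preset limit, i.e. the
    # "first value" and "second value" mentioned in the text.
    pose_change = float(np.linalg.norm(pose_b - pose_a))
    return pose_change <= max_pose_change and abs(focal_b - focal_a) <= max_focal_change
```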
Further, after the scenes have been divided out of the video file, the object of interest of each scene needs to be determined, and the method shown in fig. 2 further includes: for any scene, determining, in at least one frame of the image frames corresponding to the scene, an object that satisfies a predetermined rule and is not the object of interest of the preceding scene as the object of interest of the scene. The predetermined rule includes at least one of the following: a predetermined depth of field range, and/or a predetermined target object identification parameter range.
Following the example shown in fig. 3, the video file is divided into scene 1, scene 2 and scene 3; the switch from one scene to the next corresponds to a scene change in the video, and the objects of interest of adjacent scenes are different. For scene 1, the target with the smallest depth of field in images 1 and 2 may be determined as the object of interest of scene 1; alternatively, a predetermined target object identified in images 1 and 2 may be determined as the object of interest of scene 1, where the identification of the predetermined target object may be generic face recognition, face recognition of a specific person (such as a celebrity), or recognition of a specific object (such as an animal or a plant), without limitation here. Similarly, for scene 2, an object in images 3 to 5 that satisfies the predetermined rule and is not the object of interest of scene 1 may be determined as the object of interest of scene 2: for example, the target with the smallest depth of field in images 3 to 5 other than the object of interest of scene 1, or a predetermined target object identified in images 3 to 5 other than the object of interest of scene 1. The object of interest of scene 3 is determined in the same way: in image 6, an object that satisfies the predetermined rule and is not the object of interest of scene 2 is determined as the object of interest of scene 3, which will not be described again.
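A minimal Python sketch of the "smallest depth of field, excluding the previous scene's object" rule follows. The DetectedObject structure and the use of a text label to identify an object across scenes are assumptions made for the example, since the patent does not prescribe a data representation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DetectedObject:
    label: str    # identity from face/object recognition, e.g. "front person"
    depth: float  # depth of field of the object (smaller means closer to the lens)


def object_of_interest_for_scene(scene_objects: List[DetectedObject],
                                 previous_object: Optional[str]) -> Optional[str]:
    # Among the objects detected in the scene's frames, pick the one with the
    # smallest depth of field, excluding the object of interest of the previous
    # scene, so that attention shifts when the scene changes.
    candidates = [o for o in scene_objects if o.label != previous_object]
    if not candidates:
        return None
    return min(candidates, key=lambda o: o.depth).label


# Example: the front person wins in scene 1; in scene 2 the front person is
# excluded, so the person standing behind becomes the object of interest.
scene1 = [DetectedObject("front person", 1.2), DetectedObject("back person", 3.5)]
scene2 = [DetectedObject("front person", 1.3), DetectedObject("back person", 3.4)]
focus1 = object_of_interest_for_scene(scene1, previous_object=None)    # "front person"
focus2 = object_of_interest_for_scene(scene2, previous_object=focus1)  # "back person"
```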
On the basis that the foregoing embodiment determines the object of interest corresponding to each scene in the video file, as an alternative embodiment, processing the first group of image frames corresponding to the first scene of the at least one scene based on the scene data in operation S203 of the method shown in fig. 2 includes: for any frame of image in the first group of image frames corresponding to the first scene, taking the region corresponding to the object of interest of the first scene as the region of interest, and processing that frame of image based on the region of interest. The first scene represents any one of the one or more scenes presented by the video file; the embodiment is described in terms of the image processing of the first scene, and applies equally to the image processing of any scene in the video file.
Specifically, processing any frame of image based on the region of interest may include: acquiring the depth of field of the region of interest, and blurring the objects in the image that lie outside that depth of field.
Still following the example shown in fig. 3, the video file is divided into scene 1, scene 2 and scene 3, with images 1-2 corresponding to scene 1, images 3-5 corresponding to scene 2 and image 6 corresponding to scene 3. Assume it has been determined, as above, that the object of interest of scene 1 is a first object, the object of interest of scene 2 is a second object and the object of interest of scene 3 is a third object. For scene 1, in image 1 the region corresponding to the first object is the region of interest of image 1, and the region of image 1 outside the region of interest is blurred; specifically, the coordinate range of the region of interest can be known accurately from its depth of field, so the region outside that coordinate range can be blurred accurately. Likewise, in image 2 the region corresponding to the first object is the region of interest of image 2, and the region outside it is blurred. For scene 2, the regions corresponding to the second object in images 3, 4 and 5 are the regions of interest of those images, and the regions outside them are blurred. For scene 3, the region corresponding to the third object in image 6 is the region of interest of image 6, and the region of image 6 outside it is blurred.
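The per-frame blurring step can be illustrated with the short Python/OpenCV sketch below. The use of a per-pixel depth map, a Gaussian blur, and a simple near/far depth range for the region of interest are assumptions made for the example.

```python
import cv2
import numpy as np


def blur_outside_focus_depth(frame: np.ndarray,
                             depth_map: np.ndarray,
                             focus_depth_range: tuple,
                             ksize: int = 21) -> np.ndarray:
    # Keep pixels whose depth lies inside the region of interest's depth-of-field
    # range sharp, and replace all other pixels with a Gaussian-blurred copy.
    near, far = focus_depth_range
    in_focus = (depth_map >= near) & (depth_map <= far)   # H x W boolean mask
    blurred = cv2.GaussianBlur(frame, (ksize, ksize), 0)  # ksize must be odd
    mask = in_focus[..., None] if frame.ndim == 3 else in_focus
    return np.where(mask, frame, blurred)
```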
The method shown in fig. 2 is further described with reference to fig. 4A-4F in conjunction with specific embodiments.
A video file shows the dynamic process of a person standing in front (closer to the camera) turning his head to look back at a person standing behind (farther from the camera). The video file comprises one or more frames of images. The video file is first divided into scenes, that is, the one or more frames of images are divided into the corresponding groups of image frames, and the image frames corresponding to each scene are then processed according to the scene data of that scene, so that each scene highlights its corresponding region of interest when the video file is played.
Fig. 4A-4C schematically illustrate three frames of images in a first scene of a video file according to an embodiment of the present disclosure.
Fig. 4D to 4F schematically illustrate three frames of images in a second scene of a video file according to an embodiment of the present disclosure.
As can be seen from fig. 4A to 4C, the object of interest of the first scene is the person standing in front and turning back; as can be seen from fig. 4D to 4F, the object of interest of the second scene is the person standing behind. When the scenes are divided, whether each frame of image belongs to the group of image frames corresponding to the first scene can be determined according to the degree of change of the head of the person standing in front, where the degree of change can be represented by the change in the area of the face contour, and the face contour can be obtained by face recognition. For example, for a frame of image whose previous frame has been determined to belong to the first scene, when the degree of association between the head state of the person standing in front in this frame and that in the previous frame is higher than the preset threshold, this frame is determined to belong to the group of image frames corresponding to the first scene together with the previous frame; otherwise, this frame is determined to belong to the group of image frames corresponding to the second scene. After the scenes have been divided, for the first scene, the person with the smallest depth of field in at least one frame of image corresponding to the first scene is determined as the object of interest of the first scene, namely the person standing in front; for the second scene, in at least one frame of image corresponding to the second scene, the person with the smallest depth of field other than the object of interest of the first scene is determined as the object of interest of the second scene, namely the person standing behind.
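One possible way to quantify the "degree of change of the head" in Python/OpenCV is sketched below. The use of OpenCV's stock frontal-face Haar cascade, the bounding-box area as a stand-in for the face contour area, and the relative-change threshold of 0.3 are all illustrative assumptions, not details fixed by the patent.

```python
import cv2
import numpy as np

# OpenCV's stock frontal-face detector; any face detector would serve here.
_face_detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")


def face_area(frame_gray: np.ndarray) -> float:
    # Area of the largest detected face bounding box, used as a stand-in for the
    # area of the face contour of the person standing in front.
    faces = _face_detector.detectMultiScale(frame_gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return 0.0
    return float(max(w * h for (_x, _y, w, h) in faces))


def same_scene_by_head_change(prev_gray: np.ndarray, curr_gray: np.ndarray,
                              max_relative_change: float = 0.3) -> bool:
    # The frame stays in the first scene while the face-contour area of the front
    # person changes by no more than the preset relative threshold; a larger change
    # (e.g. the head turning away) marks the switch to the second scene.
    area_prev, area_curr = face_area(prev_gray), face_area(curr_gray)
    if area_prev == 0.0:
        return area_curr == 0.0
    return abs(area_curr - area_prev) / area_prev <= max_relative_change
```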
Next, the image frames corresponding to each scene in the video file are processed based on the scene data. For the first scene, in each frame of image corresponding to the first scene, the depth of field information corresponding to the person standing in front is acquired, the coordinate contour of that person is determined from the depth information, and the region outside the contour is blurred; the processing results are shown in fig. 4A to 4C. For the second scene, in each frame of image corresponding to the second scene, the depth of field information corresponding to the person standing behind is acquired, the coordinate contour of that person is determined from the depth information, and the region outside the contour is blurred; the processing results are shown in fig. 4D to 4F.
Fig. 5 schematically shows a block diagram of a video file intelligent processing device according to an embodiment of the present disclosure.
As shown in fig. 5, the intelligent video file processing apparatus 500 includes an acquisition module 510, an analysis module 520, and a processing module 530. The apparatus 500 can execute the method described above with reference to fig. 2 to 4F to realize intelligent processing of video files.
Specifically, the obtaining module 510 is configured to obtain a video file, where the video file includes at least two frames of images.
The analysis module 520 is configured to obtain scene data of at least one scene represented by the video file through the at least two frames of images based on the intelligent analysis of the video file.
The processing module 530 is configured to process a first set of image frames corresponding to a first scene of the at least one scene based on the scene data, so that the first scene presented by the video file when playing highlights a region of interest, which is presented by the processed first set of image frames.
It can be seen that the apparatus shown in fig. 5 distinguishes the different scenes presented by a video file and obtains the corresponding scene data through intelligent analysis of the video file, and then processes the group of image frames corresponding to each scene based on the scene data, so that each scene presented during playback highlights its corresponding region of interest. As a result, when watching the video, the user's attention is naturally drawn by the highlighted part to the object of interest of each scene, and shifts naturally and smoothly to the object of interest of the next scene as the scenes switch. In other words, intelligent playback is achieved through intelligent processing of the video file, satisfying the user's viewing needs.
In one embodiment of the present disclosure, the analysis module 520 processing a first set of image frames corresponding to a first scene of the at least one scene based on the scene data includes: the analysis module 520 being configured to process a set of image frames corresponding to each of the at least one scene based on the scene data, wherein the first scene is one of the at least one scene.
Fig. 6 schematically shows a block diagram of a video file intelligent processing device according to another embodiment of the present disclosure.
As shown in fig. 6, the intelligent processing device 600 for video files comprises an acquisition module 510, an analysis module 520, a processing module 530, a trigger module 540, and a pre-processing module 550.
The obtaining module 510, the analyzing module 520, and the processing module 530 are described above, and repeated descriptions are omitted.
The trigger module 540 is configured to obtain a first trigger operation, the first trigger operation being used to indicate that the video file is output in a normal mode, and/or to obtain a second trigger operation, the second trigger operation being used to indicate that the video file is output in an enhanced mode, wherein the video file output in the enhanced mode comprises at least the processed image frames.
In an embodiment of the disclosure, the analysis module 520 obtaining, based on the intelligent analysis of the video file, the scene data of at least one scene presented by the video file through the at least two frames of images includes: the analysis module 520 being configured to analyze at least two frames of images in the video file and to group images that meet a predetermined condition into a group of image frames, wherein, for any frame of image, when the degree of association between that frame and the previous frame is higher than a preset threshold, that frame is assigned to the group to which the previous frame belongs. A group of image frames corresponds to one scene, and image information data of at least one frame in the group of image frames serves as the scene data of the corresponding scene.
On this basis, as an optional embodiment, the preprocessing module 550 is configured to, for any scene, determine, in at least one frame of the image frames corresponding to the scene, an object that satisfies a predetermined rule and is not the object of interest of the preceding scene as the object of interest of the scene. The predetermined rule includes at least one of the following: a predetermined depth of field range, and/or a predetermined target object identification parameter range.
In an embodiment of the disclosure, the processing module 530 processing a first group of image frames corresponding to a first scene of the at least one scene based on the scene data includes: the processing module 530 being configured to, for any frame of image in the first group of image frames corresponding to the first scene, take the region corresponding to the object of interest of the first scene as the region of interest, and process that frame of image based on the region of interest.
Specifically, the processing module 530 processing any frame of image based on the region of interest includes: the processing module 530 being configured to acquire the depth of field of the region of interest and to blur objects in the image that lie outside the depth of field of the region of interest.
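For illustration only, the following Python sketch shows how the analysis and processing responsibilities described above could be composed into one pipeline. The class name VideoFileProcessor and the callback-based wiring are assumptions made for the example and do not correspond to modules named in the patent.

```python
from typing import Callable, List, Tuple

import numpy as np


class VideoFileProcessor:
    # Illustrative wiring of the analysis and processing steps: divide the frames
    # into scenes, pick each scene's depth-of-field range of interest, then blur
    # every frame of the scene outside that range.
    def __init__(self,
                 group_scenes: Callable[[List[np.ndarray]], List[List[int]]],
                 pick_focus_depth: Callable[[List[np.ndarray]], Tuple[float, float]],
                 blur_frame: Callable[[np.ndarray, Tuple[float, float]], np.ndarray]):
        self.group_scenes = group_scenes
        self.pick_focus_depth = pick_focus_depth
        self.blur_frame = blur_frame

    def process(self, frames: List[np.ndarray]) -> List[np.ndarray]:
        processed = list(frames)
        for scene in self.group_scenes(frames):                # analysis: scene division
            scene_frames = [frames[i] for i in scene]
            focus_range = self.pick_focus_depth(scene_frames)  # analysis: region of interest
            for i in scene:                                    # processing: per-frame blur
                processed[i] = self.blur_frame(frames[i], focus_range)
        return processed
```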
It should be noted that the implementation, solved technical problems, implemented functions, and achieved technical effects of each module/unit/subunit and the like in the apparatus part embodiment are respectively the same as or similar to the implementation, solved technical problems, implemented functions, and achieved technical effects of each corresponding step in the method part embodiment, and are not described herein again.
Any number of modules, sub-modules, units, sub-units, or at least part of the functionality of any number thereof according to embodiments of the present disclosure may be implemented in one module. Any one or more of the modules, sub-modules, units, and sub-units according to the embodiments of the present disclosure may be implemented by being split into a plurality of modules. Any one or more of the modules, sub-modules, units, sub-units according to embodiments of the present disclosure may be implemented at least in part as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented in any other reasonable manner of hardware or firmware by integrating or packaging a circuit, or in any one of or a suitable combination of software, hardware, and firmware implementations. Alternatively, one or more of the modules, sub-modules, units, sub-units according to embodiments of the disclosure may be at least partially implemented as a computer program module, which when executed may perform the corresponding functions.
For example, any of the obtaining module 510, the analyzing module 520, the processing module 530, the triggering module 540, and the preprocessing module 550 may be combined and implemented in one module, or any one of them may be split into multiple modules. Alternatively, at least part of the functionality of one or more of these modules may be combined with at least part of the functionality of the other modules and implemented in one module. According to an embodiment of the present disclosure, at least one of the obtaining module 510, the analyzing module 520, the processing module 530, the triggering module 540, and the preprocessing module 550 may be at least partially implemented as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented by hardware or firmware in any other reasonable manner of integrating or packaging a circuit, or implemented by any one of three implementations of software, hardware, and firmware, or implemented by a suitable combination of any several of them. Alternatively, at least one of the obtaining module 510, the analyzing module 520, the processing module 530, the triggering module 540, and the pre-processing module 550 may be at least partially implemented as a computer program module, which when executed, may perform a corresponding function.
Fig. 7 schematically shows a block diagram of an electronic device adapted to implement the above-described method according to an embodiment of the present disclosure. The electronic device shown in fig. 7 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present disclosure.
As shown in fig. 7, electronic device 700 includes a processor 710 and a computer-readable storage medium 720. The electronic device 700 may perform a method according to an embodiment of the present disclosure.
In particular, processor 710 may comprise, for example, a general purpose microprocessor, an instruction set processor and/or associated chipset, and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), and/or the like. The processor 710 may also include on-board memory for caching purposes. Processor 710 may be a single processing unit or a plurality of processing units for performing the different actions of the method flows according to embodiments of the present disclosure.
Computer-readable storage medium 720 may be, for example, any medium that can contain, store, communicate, propagate, or transport the instructions. For example, a readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, device, or propagation medium. Specific examples of the readable storage medium include: magnetic storage devices, such as magnetic tape or Hard Disk Drives (HDDs); optical storage devices, such as compact disks (CD-ROMs); a memory, such as a Random Access Memory (RAM) or a flash memory; and/or wired/wireless communication links.
The computer-readable storage medium 720 may include a computer program 721, which computer program 721 may include code/computer-executable instructions that, when executed by the processor 710, cause the processor 710 to perform a method according to an embodiment of the disclosure, or any variation thereof.
The computer program 721 may be configured with, for example, computer program code comprising computer program modules. For example, in an example embodiment, the code in the computer program 721 may include one or more program modules, for example including module 721A, module 721B, and so on. It should be noted that the division and number of modules are not fixed, and those skilled in the art may use suitable program modules or combinations of program modules according to the actual situation, so that when these program modules are executed by the processor 710, the processor 710 can execute the method according to the embodiments of the present disclosure or any variation thereof.
According to an embodiment of the present disclosure, at least one of the obtaining module 510, the analyzing module 520, the processing module 530, the triggering module 540, and the preprocessing module 550 may be implemented as a computer program module described with reference to fig. 7, which, when executed by the processor 710, may implement the respective operations described above.
The embodiments disclosed in the present application are directed to providing at least a normal mode and an enhanced mode for playing any video file. In the normal mode, the video file is decoded and output for display without further analysis. In the enhanced mode, further image processing is performed on each video frame after decoding and before output, and the processed image is output. For example, the enhanced mode may enhance an object of interest in the video file. That is, when the video file is output in the enhanced mode, at least in the image frames containing the object of interest, the object of interest is output with a higher definition relative to the other objects in the frame; whether the object of interest is located in the foreground or in the background of the frame, it appears clearer than the other objects. The object of interest may be an object in the video file (e.g. an actor, a cartoon character, a physical object, etc.) selected by the user on an interactive interface provided in the enhanced mode. Alternatively, the object of interest may be determined in the enhanced mode based on analysis of parameter information recorded with the video file, for example the object located in the foreground as determined from the depth of field in each scene, or the object located at the center position of each scene. In this way, during playback of the video file, the object of interest in the current scene is determined dynamically based on AI analysis of the scene, and the background other than the object of interest is blurred.
The present disclosure also provides a computer-readable medium, which may be embodied in the apparatus/device/system described in the above embodiments; or may exist separately and not be assembled into the device/apparatus/system. The computer readable medium carries one or more programs which, when executed, implement the method according to an embodiment of the disclosure.
According to embodiments of the present disclosure, a computer readable medium may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer-readable signal medium may include a propagated data signal with computer-readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wired, optical fiber cable, radio frequency signals, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Those skilled in the art will appreciate that various combinations of the features recited in the various embodiments and/or claims of the present disclosure can be made, even if such combinations are not expressly recited in the present disclosure. In particular, various combinations of the features recited in the various embodiments and/or claims of the present disclosure may be made without departing from the spirit or teaching of the present disclosure. All such combinations are within the scope of the present disclosure.
While the disclosure has been shown and described with reference to certain exemplary embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the disclosure as defined by the appended claims and their equivalents. Accordingly, the scope of the present disclosure should not be limited to the above-described embodiments, but should be defined not only by the appended claims, but also by equivalents thereof.

Claims (7)

1. An intelligent video file processing method comprises the following steps:
acquiring a video file, wherein the video file comprises at least two frames of images;
based on the intelligent analysis of the video file, obtaining scene data of at least one scene presented by the video file through the at least two frames of images;
processing a first set of image frames corresponding to a first scene of the at least one scene based on the scene data to cause the first scene presented by the video file when played to highlight a region of interest, the region of interest being presented by the processed first set of image frames;
wherein said processing a first set of image frames corresponding to a first scene of the at least one scene based on the scene data comprises:
processing a set of image frames corresponding to each of the at least one scene based on the scene data, the first scene being one of the at least one scene;
the obtaining scene data of at least one scene presented by the video file through the at least two frames of images based on the intelligent analysis of the video file comprises:
analyzing the at least two frames of images, and defining images that meet a preset condition as a group of image frames, wherein, for any frame of image, when the degree of association between the frame and the previous frame is higher than a preset threshold, the frame is assigned to the group to which the previous frame belongs;
a group of image frames corresponds to a scene, and the image information data of at least one frame of image in the group of image frames is the scene data of the corresponding scene;
in different scenes, the objects of interest of the content displayed by the video file are different, and the region corresponding to the object of interest is used as the region of interest.
2. The method of claim 1, further comprising:
obtaining a first trigger operation, wherein the first trigger operation is used for indicating that the video file is output based on a normal mode; and/or obtaining a second trigger operation, wherein the second trigger operation is used for indicating that the video file is output based on an enhanced mode, and the video file output in the enhanced mode at least comprises processed image frames.
3. The method of claim 1, further comprising:
for any scene, determining, in at least one frame of the image frames corresponding to the scene, an object that meets a predetermined rule and is other than the object of interest of the scene preceding that scene, as the object of interest of that scene;
wherein the predetermined rule includes at least one of: a predetermined depth of field range, and/or a predetermined target object identification parameter range.
4. The method of claim 3, wherein said processing a first set of image frames corresponding to a first scene of the at least one scene based on the scene data comprises:
for any frame of image in the first set of image frames corresponding to the first scene, using the region corresponding to the object of interest of the first scene as the region of interest, and processing the image based on the region of interest.
5. The method of claim 4, wherein the processing of any frame of image based on the region of interest comprises:
acquiring the depth of field of the region of interest;
and blurring objects in the image that lie outside the depth of field.
6. An intelligent video file processing device, comprising:
an acquisition module for acquiring a video file, wherein the video file comprises at least two frames of images;
an analysis module for obtaining scene data of at least one scene presented by the video file through the at least two frames of images based on intelligent analysis of the video file;
a processing module for processing a first set of image frames corresponding to a first scene of the at least one scene based on the scene data to cause the first scene presented by the video file when played to highlight a region of interest, the region of interest being presented by the processed first set of image frames;
wherein the processing module processing a first set of image frames corresponding to a first scene of the at least one scene based on the scene data comprises:
the processing module is configured to process a set of image frames corresponding to each of the at least one scene based on the scene data, the first scene being one of the at least one scene;
the analysis module obtaining scene data of at least one scene presented by the video file through the at least two frames of images based on intelligent analysis of the video file comprises: the analysis module is used for analyzing the at least two frames of images in the video file and defining images that meet a preset condition as a group of image frames, wherein, for any frame of image, when the degree of association between the frame and the previous frame is higher than a preset threshold, the frame is assigned to the group to which the previous frame belongs; a group of image frames corresponds to a scene, and the image information data of at least one frame of image in the group of image frames is the scene data of the corresponding scene;
in different scenes, the objects of interest of the content displayed by the video file are different, and the region corresponding to the object of interest is used as the region of interest.
7. An electronic device, comprising a processor, a memory, and a computer program stored in the memory and executable on the processor, wherein the processor implements the intelligent video file processing method according to any one of claims 1 to 5 when executing the program.
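For illustration only, the frame-grouping step recited in claims 1 and 6 could be sketched as follows. The claims do not fix any particular metric for the degree of association; a correlation of HSV colour histograms between consecutive frames is assumed here purely as a stand-in.

```python
# Illustrative sketch (assumptions: OpenCV available; HSV-histogram correlation
# stands in for the "degree of association" between consecutive frames).
import cv2
import numpy as np

def association_degree(frame_a: np.ndarray, frame_b: np.ndarray) -> float:
    """Hypothetical association measure: correlation of HSV colour histograms."""
    hists = []
    for f in (frame_a, frame_b):
        hsv = cv2.cvtColor(f, cv2.COLOR_BGR2HSV)
        h = cv2.calcHist([hsv], [0, 1], None, [50, 60], [0, 180, 0, 256])
        cv2.normalize(h, h)
        hists.append(h)
    return cv2.compareHist(hists[0], hists[1], cv2.HISTCMP_CORREL)

def group_frames_into_scenes(frames, threshold: float = 0.7):
    """Assign a frame to the previous frame's group when the association degree
    exceeds `threshold`; otherwise start a new group, i.e. a new scene."""
    scenes = []
    for frame in frames:
        if scenes and association_degree(scenes[-1][-1], frame) > threshold:
            scenes[-1].append(frame)   # same scene as the previous frame
        else:
            scenes.append([frame])     # start a new scene
    return scenes
```

Each resulting group of frames corresponds to one scene, and image information from at least one frame in the group (for example its depth data) can then serve as the scene data used to locate the region of interest.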
CN201810705480.XA 2018-06-29 2018-06-29 Intelligent video file processing method and device Active CN108960130B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810705480.XA CN108960130B (en) 2018-06-29 2018-06-29 Intelligent video file processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810705480.XA CN108960130B (en) 2018-06-29 2018-06-29 Intelligent video file processing method and device

Publications (2)

Publication Number Publication Date
CN108960130A CN108960130A (en) 2018-12-07
CN108960130B true CN108960130B (en) 2021-11-16

Family

ID=64484748

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810705480.XA Active CN108960130B (en) 2018-06-29 2018-06-29 Intelligent video file processing method and device

Country Status (1)

Country Link
CN (1) CN108960130B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110188218B (en) * 2019-06-28 2021-08-17 联想(北京)有限公司 Image processing method and device and electronic equipment
CN112215762A (en) * 2019-07-12 2021-01-12 阿里巴巴集团控股有限公司 Video image processing method and device and electronic equipment
CN111918025A (en) * 2020-06-29 2020-11-10 北京大学 Scene video processing method and device, storage medium and terminal

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104462099A (en) * 2013-09-16 2015-03-25 联想(北京)有限公司 Information processing method and electronic equipment
CN107046651A (en) * 2016-02-05 2017-08-15 百度在线网络技术(北京)有限公司 Method and apparatus for show object to be presented in video
CN107509043A (en) * 2017-09-11 2017-12-22 广东欧珀移动通信有限公司 Image processing method and device
CN107635093A (en) * 2017-09-18 2018-01-26 维沃移动通信有限公司 A kind of image processing method, mobile terminal and computer-readable recording medium
CN108076286A (en) * 2017-11-30 2018-05-25 广东欧珀移动通信有限公司 Image weakening method, device, mobile terminal and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0677842B2 (en) * 1993-10-29 2001-01-10 Kabushiki Kaisha Toshiba Multi-scene recording medium, reproduction method and reproduction apparatus
JP2012257022A (en) * 2011-06-08 2012-12-27 Sony Corp Image processing apparatus, method, and program
CN104581380B (en) * 2014-12-30 2018-08-31 联想(北京)有限公司 A kind of method and mobile terminal of information processing

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104462099A (en) * 2013-09-16 2015-03-25 联想(北京)有限公司 Information processing method and electronic equipment
CN107046651A (en) * 2016-02-05 2017-08-15 百度在线网络技术(北京)有限公司 Method and apparatus for show object to be presented in video
CN107509043A (en) * 2017-09-11 2017-12-22 广东欧珀移动通信有限公司 Image processing method and device
CN107635093A (en) * 2017-09-18 2018-01-26 维沃移动通信有限公司 A kind of image processing method, mobile terminal and computer-readable recording medium
CN108076286A (en) * 2017-11-30 2018-05-25 广东欧珀移动通信有限公司 Image weakening method, device, mobile terminal and storage medium

Also Published As

Publication number Publication date
CN108960130A (en) 2018-12-07

Similar Documents

Publication Publication Date Title
US20160198097A1 (en) System and method for inserting objects into an image or sequence of images
KR102356448B1 (en) Method for composing image and electronic device thereof
US11048913B2 (en) Focusing method, device and computer apparatus for realizing clear human face
US20110304774A1 (en) Contextual tagging of recorded data
US10769849B2 (en) Use of temporal motion vectors for 3D reconstruction
US10580143B2 (en) High-fidelity 3D reconstruction using facial features lookup and skeletal poses in voxel models
US10674066B2 (en) Method for processing image and electronic apparatus therefor
US20160321833A1 (en) Method and apparatus for generating moving photograph based on moving effect
CN108960130B (en) Intelligent video file processing method and device
US20180048810A1 (en) Image processing apparatus, image generation method, and non-transitory computer-readable storage medium
CN112182299B (en) Method, device, equipment and medium for acquiring highlight in video
CN107084740B (en) Navigation method and device
CN110062157B (en) Method and device for rendering image, electronic equipment and computer readable storage medium
CN110062165B (en) Video processing method and device of electronic equipment and electronic equipment
US20230153941A1 (en) Video generation method and apparatus, and readable medium and electronic device
JPWO2013150789A1 (en) Movie analysis apparatus, movie analysis method, program, and integrated circuit
CN108140401B (en) Accessing video clips
CN111415371B (en) Sparse optical flow determination method and device
CN108369640A (en) For control scene capture images image procossing to adjust the method, apparatus or computer program of capture images
CN110418059B (en) Image processing method and device applied to electronic equipment, electronic equipment and medium
CN109816791B (en) Method and apparatus for generating information
CN108985275B (en) Augmented reality equipment and display tracking method and device of electronic equipment
JP6963038B2 (en) Image processing device and image processing method
US11810336B2 (en) Object display method and apparatus, electronic device, and computer readable storage medium
CN115278355B (en) Video editing method, device, equipment, computer readable storage medium and product

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant