WO2022105740A1 - Video processing method and apparatus, readable medium, and electronic device - Google Patents

Video processing method and apparatus, readable medium, and electronic device

Info

Publication number
WO2022105740A1
WO2022105740A1 · PCT/CN2021/130875 · CN2021130875W
Authority
WO
WIPO (PCT)
Prior art keywords
image frame
target
cropping
target image
cropped
Prior art date
Application number
PCT/CN2021/130875
Other languages
English (en)
French (fr)
Inventor
吴昊
马云涛
王长虎
Original Assignee
北京有竹居网络技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京有竹居网络技术有限公司
Priority to US18/253,357 (granted as US11922597B2)
Publication of WO2022105740A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 Geometric image transformations in the plane of the image
    • G06T 3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N 21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N 21/234 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N 21/23418 Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 Geometric image transformations in the plane of the image
    • G06T 3/04 Context-preserving transformations, e.g. by using an importance map
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/70 Determining position or orientation of objects or cameras
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G06V 20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V 20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G06V 20/49 Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes
    • G06V 20/60 Type of objects
    • G06V 20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G06V 20/635 Overlay text, e.g. embedded captions in a TV program
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B 27/00 Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B 27/10 Indexing; Addressing; Timing or synchronising; Measuring tape travel
    • G11B 27/34 Indicating arrangements
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N 21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N 21/234 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N 21/2343 Processing of video elementary streams involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • H04N 21/234363 Reformatting operations by altering the spatial resolution, e.g. for clients with a lower screen resolution
    • H04N 21/234381 Reformatting operations by altering the temporal resolution, e.g. decreasing the frame rate by frame skipping
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N 21/44008 Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H04N 21/4402 Processing of video elementary streams involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N 21/440263 Reformatting operations by altering the spatial resolution, e.g. for displaying on a connected PDA
    • H04N 21/440281 Reformatting operations by altering the temporal resolution, e.g. by frame skipping
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10016 Video; Image sequence
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30196 Human being; Person
    • G06T 2207/30201 Face
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/161 Detection; Localisation; Normalisation

Definitions

  • The present application is based on, and claims priority to, CN application No. 202011298813.5, filed on Nov. 18, 2020.
  • The disclosure of that CN application is hereby incorporated into the present application in its entirety.
  • the present disclosure relates to the technical field of image processing, and in particular, to a video processing method, apparatus, readable medium, and electronic device.
  • Video advertisements displayed on an opening (splash) screen are usually divided into two parts.
  • The first part occupies the entire screen of the terminal device, that is, it is displayed in full-screen mode to highlight the information it contains, while the second part is displayed at the original size of the video advertisement. Since the first part is displayed in full-screen mode, the image frames in the original video need to be cropped in advance so that the cropped image frames conform to the display size of the terminal device.
  • In one aspect, a video processing method is provided, comprising: preprocessing a target video to obtain a plurality of target image frames in the target video; identifying the position of a specified object in each of the target image frames; and determining a reserved image frame among the plurality of target image frames according to the position of the specified object, wherein the reserved image frame is used to indicate that the image frames in the target video located before the reserved image frame are to be cropped.
  • a video processing apparatus comprising:
  • a preprocessing module for preprocessing the target video to obtain multiple target image frames in the target video
  • an identification module for identifying the position of the specified object in each of the target image frames
  • a first determining module configured to determine a reserved image frame among the plurality of target image frames according to the position of the designated object in each of the target image frames, wherein the reserved image frame is used to indicate that the image frames in the target video located before the reserved image frame are to be cropped.
  • a computer-readable storage medium on which a computer program is stored, and when the program is executed by a processing apparatus, implements the steps of the method described in the first aspect of the present disclosure.
  • an electronic device comprising:
  • one or more processors, and a memory for storing one or more programs;
  • wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement any one of the aforementioned video processing methods.
  • a computer program product comprising instructions that, when executed by a processor, cause the processor to perform any one of the aforementioned video processing methods.
  • FIG. 1 is a flowchart of a video processing method according to an exemplary embodiment
  • FIG. 2 is a flowchart of another video processing method according to an exemplary embodiment
  • FIG. 3 is a flowchart of another video processing method according to an exemplary embodiment
  • FIG. 4 is a flowchart of another video processing method according to an exemplary embodiment
  • FIG. 5 is a flowchart of another video processing method according to an exemplary embodiment
  • FIG. 6 is a flowchart of another video processing method according to an exemplary embodiment
  • FIG. 7 is a flowchart of another video processing method according to an exemplary embodiment
  • FIG. 8 is a schematic diagram of a display screen of a terminal device according to an exemplary embodiment
  • FIG. 9 is a flowchart of another video processing method according to an exemplary embodiment.
  • FIG. 10 is a schematic diagram of a display screen of a terminal device according to an exemplary embodiment
  • FIG. 11 is a block diagram of a video processing apparatus according to an exemplary embodiment
  • FIG. 12 is a block diagram of another video processing apparatus according to an exemplary embodiment
  • FIG. 13 is a block diagram of another video processing apparatus according to an exemplary embodiment
  • FIG. 14 is a block diagram of another video processing apparatus according to an exemplary embodiment
  • FIG. 15 is a block diagram of another video processing apparatus according to an exemplary embodiment
  • FIG. 16 is a block diagram of another video processing apparatus according to an exemplary embodiment
  • Fig. 17 is a block diagram of an electronic device according to an exemplary embodiment.
  • The term "including" and variations thereof denote open-ended inclusion, i.e., "including but not limited to".
  • the term “based on” is “based at least in part on.”
  • the term “one embodiment” means “at least one embodiment”; the term “another embodiment” means “at least one additional embodiment”; the term “some embodiments” means “at least some embodiments”. Relevant definitions of other terms will be given in the description below.
  • Some embodiments of the present disclosure provide a video processing method that reduces information loss when a target video is cropped.
  • Fig. 1 is a flowchart of a video processing method according to an exemplary embodiment. As shown in Fig. 1 , the method includes the following steps 101-103.
  • Step 101: Preprocess the target video to obtain multiple target image frames in the target video.
  • The subject executing the embodiments of the present disclosure may be a terminal device or a server; alternatively, some steps may be executed on the terminal device and the remaining steps on the server, which is not specifically limited in the present disclosure.
  • The target video can be a video shot by a user (for example, an advertiser or an individual user), a video selected by the user while browsing a multimedia resource library (which may be stored in the terminal device or in the server), or a video uploaded by the user to the Internet.
  • The target video can be obtained through its identification code or its URL (Uniform Resource Locator) address.
  • the target video may be preprocessed to obtain multiple target image frames included in the target video.
  • the preprocessing may be, for example, extracting the image frames included in the target video to obtain the target image frames.
  • the target video includes 1000 image frames, from which 200 image frames are extracted as target image frames.
  • the preprocessing may also be performing noise reduction processing on the image frames included in the target video to remove noise in the image frames to obtain the target image frames.
  • The preprocessing may also be performing border removal on the image frames included in the target video, so as to remove borders that convey no information, to obtain the target image frames.
  • The preprocessing may also first extract the image frames included in the target video and then perform noise reduction, border removal, and the like on the extracted image frames to obtain the target image frames.
  • the target image frame may be all image frames included in the target video, or may be part of the image frames included in the target video, which is not specifically limited in the present disclosure.
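The interval-based frame extraction described above (e.g. keeping 200 of 1000 frames) can be sketched as follows; the helper name and the list-of-strings stand-in for decoded frames are illustrative assumptions, not part of the disclosure.

```python
def extract_target_frames(frames, interval):
    """Keep one frame out of every `interval` frames, together with its
    1-based frame number in the original video."""
    return [(i + 1, f) for i, f in enumerate(frames) if i % interval == 0]

# A stand-in "video" of 1000 frames, matching the example above.
video = [f"frame-{i}" for i in range(1, 1001)]
targets = extract_target_frames(video, interval=5)
print(len(targets))                  # 200 target frames
print(targets[0][0], targets[1][0])  # frame numbers 1 and 6
```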
  • Step 102: Identify the position of the designated object in each target image frame.
  • Step 103: According to the position of the designated object in each target image frame, determine the reserved image frame among the plurality of target image frames, where the reserved image frame is used to indicate that the image frames located before it in the target video are to be cropped.
  • each target image frame may be identified according to a preset image recognition algorithm, so as to identify the position of the designated object in each target image frame.
  • The specified object can be understood as the main content to be presented in the target video, or the content to be highlighted; for example, it may include at least one of a face, text, a specified mark, or a salient object, where the specified mark may be, for example, a mark specified by the user.
  • A salient object can be understood as an object occupying a large proportion of the target image frame.
  • The reserved image frame can be understood as the position at which the target video is cropped, or as the demarcation point dividing the target video into a video displayed in full-screen mode (that is, the first video mentioned later) and a video displayed at the original size (that is, the second video mentioned later); the image frames in the target video before the reserved image frame are all suitable for cropping.
  • The method may further include: cropping the target video according to the reserved image frame, so as to split it into a full-screen video and an original video, and then controlling the full-screen video to be displayed in full-screen mode and the original video to be displayed at its original size.
  • The full-screen video is composed of the cropped versions of the image frames before the reserved image frame, and the original video is composed of the reserved image frame and the image frames after it.
  • The reserved image frame may be determined by, for example, sequentially comparing the area ratio occupied by the specified object in each target image frame with a preset area threshold. If the area ratio is greater than the area threshold, the target image frame has high importance (that is, it conveys more information) and is not suitable for cropping; if the area ratio is less than or equal to the area threshold, the target image frame has low importance (that is, it conveys less information) and is suitable for cropping. Then, the image frame with the smallest frame sequence number among the high-importance target image frames is taken as the reserved image frame, where the frame sequence number indicates the order of the target image frame in the target video.
  • Alternatively, the position of the specified object in each target image frame can be compared with a preset cropping position. If the specified object is located inside the cropping position, the specified object can still be displayed completely after the target image frame is cropped, so the frame is suitable for cropping; if the specified object is located outside the cropping position, the user would not be able to see the specified object after cropping, so the frame is not suitable for cropping. Then, the image frame with the smallest frame sequence number among the target image frames that are not suitable for cropping is taken as the reserved image frame.
  • For example, suppose the target video includes 500 image frames, and 250 image frames are extracted from it at an interval of one frame; after border removal, 250 target image frames are obtained, whose frame numbers in the target video may be (1, 3, 5, ..., 497, 499). Face recognition is then performed on the 250 target image frames to determine the position of the face in each of them. Finally, the area ratio of the face in each target image frame is determined in turn; if an area ratio is greater than 60%, the corresponding image frame has high importance and is not suitable for cropping.
  • If the first target image frame whose area ratio exceeds 60% is the one with frame number 15, that target image frame can be used as the reserved image frame.
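The area-ratio rule above reduces to a first-match scan over the sampled frames. A minimal sketch, with hypothetical ratios chosen so that the 60% threshold and frame number 15 match the example:

```python
def pick_reserved_frame(frame_ratios, threshold=0.6):
    """frame_ratios: (frame_number, area_ratio) pairs in playback order.
    Returns the number of the first frame whose specified-object area
    ratio exceeds the threshold, i.e. the reserved image frame; the
    frames before it are the ones suitable for cropping."""
    for frame_no, ratio in frame_ratios:
        if ratio > threshold:
            return frame_no
    return None  # no high-importance frame found

# Hypothetical ratios for target frames 1, 3, ..., 13 plus frames 15, 17.
ratios = [(n, 0.2) for n in range(1, 14, 2)] + [(15, 0.7), (17, 0.8)]
print(pick_reserved_frame(ratios))  # 15
```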
  • In summary, the present disclosure first preprocesses the target video to obtain multiple target image frames, then identifies each target image frame to obtain the position of the specified object in it, and finally, according to the position of the specified object in each target image frame, selects the reserved image frame among the multiple target image frames, the reserved image frame being used to indicate that the image frames located before it in the target video are to be cropped.
  • Because the reserved image frame is determined based on the position of the designated object in the target image frames, a reserved image frame suitable for each different target video can be determined, which reduces information loss when the target video is cropped.
  • Fig. 2 is a flowchart of another video processing method according to an exemplary embodiment. As shown in Fig. 2, in some embodiments, step 101 may be implemented as step 1011:
  • Step 1011: Extract image frames from the target video at a preset frame interval to obtain multiple target image frames.
  • step 102 may include:
  • Step 1021: Filter each target image frame to remove its border.
  • Step 1022: Use an image recognition algorithm to identify the filtered target image frame, so as to determine the position of the designated object in the target image frame; the image recognition algorithm includes at least one of a face recognition algorithm, a text recognition algorithm, a specified-mark recognition algorithm, or a saliency detection algorithm.
  • the preprocessing of the target video may be to extract image frames in the target video according to a preset frame interval.
  • For example, the frame interval may be 5; then one image frame is extracted from the target video every 5 frames.
  • Thereafter, each target image frame can first be filtered to remove any border that conveys no information, and the preset image recognition algorithm is then used to identify the filtered target image frame to obtain the position of the designated object in it.
  • The specified object may include at least one of a face, text, a specified mark, or a salient object; correspondingly, the image recognition algorithm may be at least one of a face recognition algorithm, a text recognition algorithm, a specified-mark recognition algorithm, or a saliency detection algorithm.
  • Fig. 3 is a flowchart of another video processing method according to an exemplary embodiment. As shown in Fig. 3 , after step 103, the method may further include steps 104-106.
  • Step 104: Determine the cropping size according to the original size of the target video and the display size of the terminal device, where the cropping size matches the display size.
  • Step 105: Determine a first number of cropping frames according to the cropping size and a preset step value, where each cropping frame has the cropping size and a different position on the image frames in the target video.
  • Step 106: Determine a total cropping path according to the position of the designated object in each target image frame to be cropped and the first number of cropping frames, where the target image frames to be cropped are the target image frames located before the reserved image frame in the target video.
  • The total cropping path includes a cropping frame corresponding to each image frame located before the reserved image frame in the target video.
  • After the reserved image frame is determined, a total cropping path suitable for the target video may further be determined.
  • The total cropping path includes multiple cropping frames; each cropping frame corresponds to one image frame located before the reserved image frame in the target video and indicates, when that image frame is cropped, which part to crop and which part to keep: the pixels outside the cropping frame are cropped away, and the pixels within it are kept. That is, the total cropping path indicates how to crop each image frame in the target video that precedes the reserved image frame.
  • A method of determining the total cropping path is described below by way of example.
  • the original size of the target video can be understood as the resolution of the target video.
  • the display size is, for example, the size of the display screen of the terminal device that needs to display the target video.
  • the crop size is, for example, a resolution matching the display size.
  • That the cropping size matches the display size can be understood to mean that the display screen of the terminal device can directly display an image whose resolution is the cropping size. For example, if the original size is 1280*720 and the display size has a 1:1 aspect ratio, then, following the rule of cropping only one of the two dimensions, the cropping size can be determined to be 720*720.
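The "crop only one side" rule can be sketched as follows; the function name and the integer-division rounding are illustrative assumptions on top of the 1280*720 example.

```python
def crop_size(orig_w, orig_h, ratio_w, ratio_h):
    """Crop along only one dimension: shrink whichever side is too long
    for the display aspect ratio, keeping the other side untouched."""
    target_w = orig_h * ratio_w // ratio_h
    if target_w <= orig_w:
        return target_w, orig_h                      # crop the width only
    return orig_w, orig_w * ratio_h // ratio_w       # crop the height only

print(crop_size(1280, 720, 1, 1))  # (720, 720), as in the example
print(crop_size(720, 1280, 1, 1))  # (720, 720)
```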
  • After that, the first number of cropping frames can be determined according to the determined cropping size and the preset step value, where each cropping frame has the cropping size and a different position on the image frame. This can be understood as follows: starting from one side of the image frame, the first cropping frame is obtained; the first cropping frame is then moved along a specified direction by one step value to obtain the second cropping frame, by two step values to obtain the third cropping frame, and so on. Taking an original size of 1280*720, a cropping size of 720*720, and a step value of 20 pixels as an example: starting from the left side of the image frame, the first cropping frame covers pixels with horizontal coordinates 1-720 and vertical coordinates 1-720; moving it 20 pixels to the right yields the second cropping frame, which covers horizontal coordinates 21-740 and vertical coordinates 1-720.
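The sliding enumeration of candidate cropping frames can be sketched as follows, using the same 1-based inclusive coordinate convention as the text; the function name is an illustrative assumption.

```python
def candidate_boxes(orig_w, crop_w, crop_h, step):
    """Slide a crop_w x crop_h box horizontally across an image of width
    orig_w in `step`-pixel increments; each box is (left, top, right,
    bottom), 1-based and inclusive, matching the ranges in the text."""
    boxes, x = [], 0
    while x + crop_w <= orig_w:
        boxes.append((x + 1, 1, x + crop_w, crop_h))
        x += step
    return boxes

boxes = candidate_boxes(1280, 720, 720, step=20)
print(len(boxes))   # 29 candidate boxes (the "first number")
print(boxes[0])     # (1, 1, 720, 720)
print(boxes[1])     # (21, 1, 740, 720)
```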
  • For each target image frame to be cropped, based on the position of the specified object in it, the first number of cropping frames are screened and the cropping frame causing the least information loss is selected as the cropping frame corresponding to that target image frame; the target image frames to be cropped are the target image frames located before the reserved image frame.
  • In this way, the cropping frame corresponding to each image frame located before the reserved image frame in the target video can be obtained, thereby obtaining the total cropping path. If the target image frames were extracted from the target video at a frame interval, the cropping frames corresponding to the target image frames to be cropped may be interpolated to obtain the total cropping path.
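A minimal sketch of turning per-frame detections into a total cropping path, under two assumptions the text does not spell out: "least information loss" is approximated by maximum horizontal overlap with the specified object, and skipped frames are filled by linear interpolation of the cropping frame's left edge.

```python
def best_box(candidates, obj):
    """candidates and obj are horizontal (left, right) pixel ranges;
    pick the crop box that overlaps the specified object the most."""
    def overlap(box):
        return max(0, min(box[1], obj[1]) - max(box[0], obj[0]))
    return max(candidates, key=overlap)

def fill_path(sampled):
    """sampled: sorted (frame_no, left_edge) pairs for the sampled target
    frames; linearly interpolate the left edge for the skipped frames."""
    full = {}
    for (f0, x0), (f1, x1) in zip(sampled, sampled[1:]):
        for f in range(f0, f1):
            full[f] = round(x0 + (x1 - x0) * (f - f0) / (f1 - f0))
    full[sampled[-1][0]] = sampled[-1][1]
    return full

candidates = [(x + 1, x + 720) for x in range(0, 561, 20)]
print(best_box(candidates, obj=(600, 900)))  # (181, 900)
print(fill_path([(1, 181), (3, 201)]))       # {1: 181, 2: 191, 3: 201}
```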
  • Fig. 4 is a flowchart showing another video processing method according to an exemplary embodiment. As shown in Fig. 4, after step 106, the method may further include:
  • Step 107: Crop each image frame located before the reserved image frame in the target video according to the total cropping path, so as to obtain a first video composed of the cropped image frames and a second video composed of the uncropped image frames.
  • the image frames in the target video can be divided into two parts by the reserved image frame: the first part consists of the image frames before the reserved image frame, and the second part consists of the reserved image frame and the image frames after it.
  • the image frames in the second part are composed into a second video, which includes the uncropped image frames.
  • the image frames in the first part are cropped according to the total cropping path; that is, each image frame before the reserved image frame is cropped according to its corresponding cropping frame included in the total cropping path, and the cropped image frames make up the first video.
  • the first video is suitable for displaying on the terminal device in full-screen mode
  • the second video is suitable for displaying on the terminal device in the original size.
  • FIG. 5 is a flowchart of another video processing method according to an exemplary embodiment. As shown in FIG. 5 , multiple target image frames are arranged in the order in the target video.
  • the implementation of step 103 can be:
  • Step 1031: For each target image frame, according to the position of the designated object in the target image frame, determine the target ratio corresponding to the target image frame and the target weight corresponding to the target image frame, where the target ratio indicates the proportion of the target image frame occupied by the designated object, and the target weight is determined according to the weight of each pixel in the target image frame;
  • Step 1032: Take the target image frame whose corresponding target ratio is greater than the preset ratio threshold and which is earliest in the order as the first target image frame;
  • Step 1033: Take the target image frame whose corresponding target weight is greater than the preset weight threshold and which is earliest in the order as the second target image frame;
  • Step 1034 Determine the reserved image frame according to the first target image frame and the second target image frame.
  • the multiple target image frames obtained in step 101 can be arranged in their order in the target video, which can be understood as arranging the target image frames by frame number. For example, if the target video includes 500 image frames and every other frame is extracted, 250 target image frames with frame numbers (1, 3, 5, ..., 497, 499) are obtained, and the target image frames can be arranged in ascending order.
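  • The frame-number example above can be reproduced in one line (illustrative only; the variable name `frame_numbers` is an assumption):

```python
# Sampling every other frame from a 500-frame video (frames numbered 1..500)
# yields 250 target frames with odd frame numbers, in ascending playback order.
frame_numbers = list(range(1, 500, 2))
```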
  • the target ratio can indicate the proportion of the designated object in the target image frame, which can be understood as the area ratio of the designated object in the target image frame.
  • a target image frame includes a total of 1000 pixels, of which the designated object occupies 550 pixels, then the target ratio is 55%.
  • the target ratio can also be understood as follows: binarize the target image frame according to whether each pixel belongs to the specified object to obtain one or more connected regions, and then take the ratio of the area of the circumscribed rectangle of each connected region to the area of the target image frame as the target ratio.
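  • The pixel-count variant of the target ratio described above can be sketched as follows (a minimal illustration, assuming the object is given as a 0/1 mask; the function name `target_ratio` is hypothetical):

```python
def target_ratio(mask):
    """Fraction of the frame covered by the specified object.

    `mask` is a 2D list of 0/1 flags; 1 marks a pixel belonging to the object.
    """
    total = sum(len(row) for row in mask)
    covered = sum(sum(row) for row in mask)
    return covered / total

# 1000-pixel frame in which the object covers 550 pixels -> ratio 0.55,
# matching the worked example in the text
ratio = target_ratio([[1] * 55 + [0] * 45 for _ in range(10)])
```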
  • the first target image frame may be determined according to the target ratio and a preset ratio threshold (for example, 60%): among the multiple target image frames, it is the earliest one whose target ratio is greater than the ratio threshold.
  • similarly, the second target image frame may be determined according to the target weight and a preset weight threshold (for example, 1.3): among the multiple target image frames, it is the earliest one (that is, the one with the smallest frame number) whose target weight is greater than the weight threshold.
  • the reserved image frame is determined according to the first target image frame and the second target image frame.
  • the manner of determining the reserved image frame in step 1034 can be divided into the following two types:
  • first, if the target image frame to be selected is located before the preset latest image frame in the target video, the target image frame to be selected is used as the reserved image frame, where the target image frame to be selected is the earlier of the first target image frame and the second target image frame.
  • second, otherwise, the latest image frame is used as the reserved image frame.
  • that is, first determine which of the first target image frame and the second target image frame has the smaller frame number, and use it as the target image frame to be selected. Then compare the order of the target image frame to be selected and the preset latest image frame in the target video, and use the earlier of the two as the reserved image frame.
  • the latest image frame can be understood as the pre-specified latest image frame.
  • the target video only needs to be displayed in full-screen mode for a period of time, and the latest image frame can be determined according to the longest display time in full-screen mode.
  • for example, if the maximum time displayed in full-screen mode is 10 s and the frame rate of the target video is 30 frames per second, the frame number of the latest image frame is 300.
  • the image frame with the smallest frame serial number among the target image frame to be selected and the latest image frame is used as the reserved image frame.
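  • The selection rule above reduces to taking a minimum over frame numbers. A sketch (the function name `reserved_frame` and its parameters are assumptions, not from the patent):

```python
def reserved_frame(first_target, second_target, fps, max_fullscreen_s):
    """Pick the reserved frame number.

    The candidate is the earlier of the two target frames; the preset latest
    frame is the frame rate times the longest full-screen display time.
    """
    latest = fps * max_fullscreen_s          # e.g. 30 fps * 10 s = frame 300
    candidate = min(first_target, second_target)
    return min(candidate, latest)

# candidate frame 251 comes before the latest frame 300, so it is reserved
r = reserved_frame(first_target=251, second_target=371, fps=30, max_fullscreen_s=10)
```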
  • Fig. 6 is a flowchart showing another video processing method according to an exemplary embodiment. As shown in Fig. 6, step 106 may include:
  • Step 1061 Determine a third number of cropping paths according to the first number of cropping frames and the second number of target image frames to be cropped, and each cropping path includes: among the second number of target image frames to be cropped, each The cropping frame corresponding to the target image frame;
  • Step 1062: Determine the total amount of cropping loss corresponding to each cropping path according to the position of the specified object in each target image frame to be cropped, where the total amount of cropping loss indicates the size of the loss caused by cropping the second number of target image frames to be cropped according to that cropping path;
  • Step 1063: Take the cropping path with the smallest total cropping loss as the target cropping path;
  • Step 1064 Interpolate the target clipping path to obtain a total clipping path.
  • a third number of cropping paths may be determined according to the first number of cropping frames and the second number of target image frames to be cropped.
  • Each cropping path includes a cropping frame corresponding to each target image frame to be cropped.
  • the third number can be understood as the number of ways of assigning one of the first number of cropping frames to each of the second number of target image frames to be cropped, that is, the first number raised to the power of the second number. Taking 5 cropping frames and 6 target image frames to be cropped as an example, 5^6 = 15625 (i.e., the third number) cropping paths can be obtained. After that, the total amount of cropping loss for each cropping path can be calculated separately.
  • the total amount of cropping loss is used to indicate the size of the information loss caused by cropping the second number of target image frames to be cropped according to the cropping path. The cropping path with the smallest total cropping loss is then used as the target cropping path. That is to say, the target cropping path is the cropping path with the least loss of information among the third number of cropping paths, so that as much information of the target video as possible is preserved on the premise of cropping the target video into the first video and the second video. Finally, if the target image frames were extracted from the target video at a frame interval, the target cropping path can be interpolated to obtain a total cropping path including the cropping frame corresponding to each image frame before the reserved image frame.
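  • An exhaustive search over the cropping paths described above can be sketched as follows (an illustration of the enumeration only, not the patent's implementation; the per-frame `loss` callback stands in for the importance/completeness/transfer scoring, and the function name `best_path` is hypothetical):

```python
from itertools import product

def best_path(boxes, n_frames, loss):
    """Enumerate every cropping path (one box index per frame to crop)
    and return the path with the smallest total loss.
    """
    best, best_total = None, float("inf")
    for path in product(range(len(boxes)), repeat=n_frames):
        total = sum(loss(f, boxes[b]) for f, b in enumerate(path))
        if total < best_total:
            best, best_total = path, total
    return best, best_total

# Toy example: 3 candidate x-offsets, 2 frames; the stand-in loss prefers
# boxes near offset 20, so the best path picks box index 1 for both frames.
path, total = best_path([0, 20, 40], 2, lambda f, x: abs(x - 20))
```

  • Note that the number of enumerated paths grows as first_number ** second_number (5 ** 6 = 15625 in the worked example), so exhaustive enumeration is only practical for small counts.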
  • the total amount of clipping loss in step 1062 can be obtained through the following steps:
  • Step 1): According to the position of the designated object in each target image frame to be cropped, determine the amount of cropping loss corresponding to that target image frame, where the amount of cropping loss indicates the size of the loss caused by cropping the target image frame according to its corresponding cropping frame included in the first cropping path, and the first cropping path is any one of the third number of cropping paths;
  • Step 2) Summing the amount of cropping loss corresponding to each target image frame to be cropped to obtain the total amount of cropping loss corresponding to the first cropping path.
  • the cropped image frame obtained by cropping the target image frame to be cropped according to the first cropping path may be determined first. Then, according to each target image frame to be cropped and the corresponding image frame after cropping, the amount of cropping loss corresponding to the target image frame to be cropped is obtained, wherein the amount of cropping loss is used to indicate the size of the information loss caused by the cropping.
  • the clipping loss can consist of three parts: importance score, completeness score, and transfer score.
  • the importance score may be determined as follows: according to the proportion of the cropped image frame occupied by the specified object and the proportion of the target image frame to be cropped occupied by the specified object, determine the importance score corresponding to the target image frame to be cropped.
  • the manner of determining the integrity score may be: determining the integrity score corresponding to the target image frame to be trimmed according to the integrity of the specified object in the image frame after trimming.
  • the degree of completeness can be understood as the overlap between the specified object and the cropping frame, for example, the ratio of the area of overlap between the specified object and the cropping frame to the area of the specified object.
  • the transfer score may be determined as follows: according to the distance between the first cropping frame and the second cropping frame, determine the transfer score corresponding to the target image frame to be cropped, where the first cropping frame is the cropping frame included in the first cropping path that corresponds to the target image frame to be cropped, and the second cropping frame is the cropping frame included in the first cropping path that corresponds to the previous target image frame to be cropped.
  • the amount of cropping loss corresponding to the target image frame to be cropped may be determined according to the importance score, integrity score and transition score corresponding to the target image frame to be cropped.
  • the formula for calculating the cropping loss (which can be expressed as Loss) can be: Loss = Σ_{i=1}^{N} Loss_i, where Loss_i = α1·(importance score)_i + α2·(completeness score)_i + α3·(transfer score)_i. Here, Loss_i represents the cropping loss corresponding to the i-th target image frame to be cropped, N represents the second quantity, α1 represents the weight corresponding to the importance score, α2 represents the weight corresponding to the completeness score, and α3 represents the weight corresponding to the transfer score.
  • after the cropping loss corresponding to each target image frame to be cropped is obtained, these losses are summed to obtain the total cropping loss corresponding to the first cropping path.
  • the target clipping path with the smallest total clipping loss is selected.
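  • The weighted per-frame loss and its sum over a path can be written as a short sketch (illustrative only; the weights default to 1.0 here, and the function names `frame_loss` / `path_loss` are assumptions):

```python
def frame_loss(importance, completeness, transfer, a1=1.0, a2=1.0, a3=1.0):
    """Loss_i: a weighted linear combination of the three per-frame scores."""
    return a1 * importance + a2 * completeness + a3 * transfer

def path_loss(scores, a1=1.0, a2=1.0, a3=1.0):
    """Total loss for one cropping path: sum of Loss_i over the N frames.

    `scores` is a list of (importance, completeness, transfer) triples,
    one per target image frame to be cropped.
    """
    return sum(frame_loss(i, c, t, a1, a2, a3) for i, c, t in scores)

# Two frames with stand-in scores; total = 0.3 + 0.4 = 0.7 with unit weights
total = path_loss([(0.2, 0.1, 0.0), (0.3, 0.0, 0.1)])
```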
  • Fig. 7 is a flowchart of another video processing method according to an exemplary embodiment. As shown in Fig. 7 , after step 107, the method further includes steps 108-110.
  • Step 108 controlling the first video to be played in a full-screen mode on the terminal device.
  • Step 109 when the zoomed image frame is played, the zoomed image frame is controlled to be zoomed out to a target position in the play area on the terminal device within a preset zooming duration.
  • the scaled image frame is the last image frame in the first video. The target position is the position of the cropping frame corresponding to the pre-scaling image frame included in the total cropping path, assuming the pre-scaling image frame were played in the playback area at its original size; the pre-scaling image frame is the uncropped image frame in the target video that corresponds to the scaled image frame.
  • Step 110 Control the second video to be played in the play area according to the original size.
  • the first video may be controlled to be played in a full-screen mode on the terminal device.
  • the information contained in the first video can be highlighted.
  • when the scaled image frame is played, the scaled image frame can be controlled to shrink, within a preset scaling duration (for example, 1 s), to the target position within the playback area on the terminal device.
  • the target position can be understood as follows: if the uncropped image frame (that is, the image frame before zooming) corresponding to the zoomed image frame in the target video were played in the playback area at its original size, the target position is the position where the cropping frame corresponding to that image frame would be displayed within the playback area.
  • the play area may be the middle area of the display screen of the terminal device, or may be other areas.
  • the second video is controlled to be played in the play area according to the original size.
  • in this way, the first video and the second video can be connected smoothly, reducing the abruptness of the image frames when the playback mode is switched.
  • for example, in FIG. 8, area 1 is the entire display screen (framed by a solid line), and the first video is displayed in area 1; area 2 is the playback area. If the image frame before scaling were played in the playback area at its original size, it would occupy area 2, and the position of its corresponding cropping frame included in the total cropping path would be area 3. Then the first video is first controlled to play in area 1; when the zoomed image frame is reached, the zoomed image frame is controlled to shrink from area 1 to area 3 within the scaling duration; finally, the second video is controlled to play in area 2.
  • Fig. 9 is a flowchart showing another video processing method according to an exemplary embodiment. As shown in Fig. 9, step 109 may include:
  • Step 1091: Determine a plurality of first vertices of the zoomed image, each first vertex corresponding to a second vertex at the target position;
  • Step 1092 according to the distance between each first vertex and the corresponding second vertex, and the scaling duration, determine the scaling speed of the first vertex;
  • Step 1093 Control the zoomed image to be reduced to the target position according to the zooming speed of each first vertex.
  • a plurality of first vertices of the zoomed image may be determined first, and each first vertex corresponds to a second vertex at the target position .
  • the four first vertices of the zoomed image are A, B, C, and D
  • the four second vertices of the target position are a, b, c, and d
  • the first and second vertices are in one-to-one correspondence.
  • the length of the line segment Aa is 100 pixels, divided by the scaling time 2s, that is, the scaling speed of the first vertex A is 50 pixels/second.
  • the zoomed image can be controlled to shrink to the target position according to the zooming speed of each first vertex. For example, starting from the current moment, the first vertex A of the zoomed image is located at the vertex of the display screen of the terminal device, then, after 10ms, the first vertex A moves forward 5 pixels along the line segment Aa, and after 20ms, the first vertex A A moves forward 10 pixels along the line segment Aa, where the first vertex A will pass through A', and so on.
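  • The per-vertex speed computation in the example above can be sketched as follows (a minimal illustration; the function name `vertex_speed` and the concrete coordinates are assumptions chosen to reproduce the 100-pixel, 2 s worked example):

```python
def vertex_speed(p, q, duration_s):
    """Zoom-out speed of one vertex, in pixels per second.

    `p` is the first vertex of the zoomed image, `q` the matching second
    vertex at the target position; speed = |pq| / scaling duration.
    """
    dx, dy = q[0] - p[0], q[1] - p[1]
    return (dx * dx + dy * dy) ** 0.5 / duration_s

# vertex A at (0, 0), target vertex a at (60, 80): |Aa| = 100 px, 2 s -> 50 px/s
speed = vertex_speed((0, 0), (60, 80), 2.0)
```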
  • during zooming, the area outside the zoomed image can be displayed as a preset fill color (for example, black), or the application interface of an application running in the background can be displayed; this is not specifically limited in this disclosure.
  • to sum up, the present disclosure first preprocesses the target video to obtain multiple target image frames in the target video, then identifies each target image frame to obtain the position of the specified object in each target image frame, and finally, according to the position of the specified object in each target image frame, selects the reserved image frame among the multiple target image frames, where the reserved image frame is used to indicate that the image frames before the reserved image frame in the target video are to be cropped.
  • the present disclosure determines the reserved image frame based on the position of the specified object in the target image frames, thereby indicating that the image frames before the reserved image frame are to be cropped; it can determine a reserved image frame suitable for different target videos and reduces the information loss incurred when cropping the target video.
  • FIG. 11 is a block diagram of a video processing apparatus according to an exemplary embodiment. As shown in FIG. 11 , the apparatus 200 includes:
  • the preprocessing module 201 is used for preprocessing the target video to obtain multiple target image frames in the target video;
  • the identification module 202 is used to identify the position of the specified object in each target image frame
  • the first determination module 203, which may also be referred to as a "reserved image frame determination module", is used to determine the reserved image frame among the plurality of target image frames according to the position of the specified object in each target image frame, where the reserved image frame is used to indicate that the image frames before the reserved image frame in the target video are to be cropped.
  • the specified object includes at least one of a face, a text, a specified logo, or a salient object.
  • the preprocessing module 201 is configured to: extract image frames in the target video according to a preset frame interval to obtain multiple target image frames.
  • the identification module 202 is used for: filtering each target image frame to remove the border in the target image frame.
  • the filtered target image frame is identified by an image recognition algorithm to determine the position of the designated object in the target image frame.
  • the image recognition algorithm includes: face recognition algorithm, text recognition algorithm, designated logo recognition algorithm, or saliency detection at least one of the algorithms.
  • FIG. 12 is a block diagram of another video processing apparatus according to an exemplary embodiment. As shown in FIG. 12 , the apparatus 200 further includes:
  • the second determination module 204, which may also be referred to as a "cropping size determination module", is used to, after the reserved image frame is determined among the multiple target image frames according to the position of the specified object in each target image frame, determine the cropping size according to the original size of the image frames of the target video and the display size of the terminal device, where the cropping size matches the display size;
  • the third determination module 205, which may also be referred to as a "cropping frame determination module", is further configured to determine a first number of cropping frames according to the cropping size and a preset step value, where each cropping frame occupies a different position on the image frames of the target video, and each cropping frame has the cropping size;
  • the fourth determination module 206, which may also be referred to as a "cropping path determination module", is used to determine the total cropping path according to the position of the specified object in each target image frame to be cropped and the first number of cropping frames, where a target image frame to be cropped is a target image frame located before the reserved image frame in the target video, and the total cropping path includes the cropping frame corresponding to each image frame located before the reserved image frame in the target video.
  • FIG. 13 is a block diagram of another video processing apparatus according to an exemplary embodiment. As shown in FIG. 13 , the apparatus 200 further includes:
  • the cropping module 207 is configured to, after the total cropping path is determined according to the position of the specified object in each target image frame to be cropped and the first number of cropping frames, crop each image frame in the target video before the reserved image frame according to the total cropping path, so as to obtain a first video composed of cropped image frames and a second video composed of uncropped image frames.
  • Fig. 14 is a block diagram of another video processing apparatus according to an exemplary embodiment. As shown in Fig. 14, multiple target image frames are arranged in the order in the target video, and the first determining module 203 may include:
  • the first determination sub-module 2031 is used for each target image frame, according to the position of the specified object in the target image frame, to determine the target ratio corresponding to the target image frame, and the target weight corresponding to the target image frame.
  • the target weight is determined according to the weight of each pixel in the target image frame;
  • the second determination sub-module 2032 is configured to use the target image frame whose corresponding target ratio is greater than the preset ratio threshold and which is earliest in the order as the first target image frame, and to use the target image frame whose corresponding target weight is greater than the preset weight threshold and which is earliest in the order as the second target image frame;
  • the third determination sub-module 2033 is configured to determine the reserved image frame according to the first target image frame and the second target image frame.
  • the manner in which the third determining sub-module 2033 determines the reserved image frame can be divided into the following two types:
  • first, if the target image frame to be selected is located before the preset latest image frame in the target video, the target image frame to be selected is used as the reserved image frame, where the target image frame to be selected is the earlier of the first target image frame and the second target image frame.
  • second, otherwise, the latest image frame is used as the reserved image frame.
  • FIG. 15 is a block diagram of another video processing apparatus according to an exemplary embodiment.
  • the fourth determining module 206 may include:
  • the fourth determination submodule 2061 is used to determine a third number of cropping paths according to the first number of cropping frames and the second number of target image frames to be cropped, and each cropping path includes: a second number of target image frames to be cropped , the cropping frame corresponding to each target image frame to be cropped;
  • the fifth determination sub-module 2062 is used to determine the total amount of cropping loss corresponding to each cropping path according to the position of the specified object in each target image frame to be cropped, where the total amount of cropping loss indicates the size of the loss caused by cropping the second number of target image frames to be cropped according to that cropping path;
  • the sixth determination sub-module 2063 is configured to use the cropping path with the smallest total cropping loss as the target cropping path, and to interpolate the target cropping path to obtain the total cropping path.
  • the fifth determination sub-module 2062 can be used to perform the following steps:
  • Step 1): According to the position of the designated object in each target image frame to be cropped, determine the amount of cropping loss corresponding to that target image frame, where the amount of cropping loss indicates the size of the loss caused by cropping the target image frame according to its corresponding cropping frame included in the first cropping path, and the first cropping path is any one of the third number of cropping paths;
  • Step 2) Summing the amount of cropping loss corresponding to each target image frame to be cropped to obtain the total amount of cropping loss corresponding to the first cropping path.
  • step 1) may include:
  • Step 1a): According to the proportion of the cropped image frame occupied by the designated object and the proportion of the target image frame to be cropped occupied by the designated object, determine the importance score corresponding to the target image frame to be cropped, where the cropped image frame is the image frame obtained by cropping the target image frame to be cropped according to its corresponding cropping frame included in the first cropping path;
  • Step 1b) according to the completeness of the designated object in the cropped image frame, determine the corresponding integrity score of the target image frame to be cropped;
  • Step 1c): According to the distance between the first cropping frame and the second cropping frame, determine the transfer score corresponding to the target image frame to be cropped, where the first cropping frame is the cropping frame included in the first cropping path that corresponds to the target image frame to be cropped, and the second cropping frame is the cropping frame included in the first cropping path that corresponds to the previous target image frame to be cropped;
  • Step 1d) According to the importance score, integrity score and transition score corresponding to the target image frame to be cropped, determine the amount of cropping loss corresponding to the target image frame to be cropped.
  • FIG. 16 is a block diagram of another video processing apparatus according to an exemplary embodiment. As shown in FIG. 16 , the apparatus 200 further includes:
  • the full-screen control module 208 is used for cropping each image frame located before the reserved image frame in the target video according to the total cropping path, so as to obtain a first video composed of cropped image frames, and an uncropped image. After the second video composed of frames, control the first video to be played in full-screen mode on the terminal device;
  • the zoom control module 209 is configured to, when the scaled image frame is played, control the scaled image frame to shrink within the preset scaling duration to the target position in the playback area on the terminal device, where the scaled image frame is the last image frame in the first video, the target position is the position of the cropping frame corresponding to the pre-scaling image frame included in the total cropping path if the pre-scaling image frame were played in the playback area at its original size, and the pre-scaling image frame is the uncropped image frame in the target video corresponding to the scaled image frame;
  • the original size control module 210 is configured to control the second video to be played according to the original size in the play area.
  • the zoom control module 209 may be used to:
  • a plurality of first vertices of the scaled image are determined, and each first vertex corresponds to a second vertex at the target location.
  • the zooming speed of each first vertex is determined according to the distance between each first vertex and the corresponding second vertex, and the zooming duration. According to the zoom speed of each first vertex, control the zoom image to shrink to the target position.
  • to sum up, the present disclosure first preprocesses the target video to obtain multiple target image frames in the target video, then identifies each target image frame to obtain the position of the specified object in each target image frame, and finally, according to the position of the specified object in each target image frame, selects the reserved image frame among the multiple target image frames, where the reserved image frame is used to indicate that the image frames before the reserved image frame in the target video are to be cropped.
  • the present disclosure determines the reserved image frame based on the position of the specified object in the target image frames, thereby indicating that the image frames before the reserved image frame are to be cropped; it can determine a reserved image frame suitable for different target videos and reduces the information loss incurred when cropping the target video.
  • FIG. 17 shows a schematic structural diagram of an electronic device (i.e., an execution body of the above-mentioned video processing method) 300 suitable for implementing an embodiment of the present disclosure.
  • the electronic device in the embodiment of the present disclosure may be a server, and the server may be, for example, a local server or a cloud server.
  • the electronic device can also be a terminal device, which can include, but is not limited to, mobile terminals such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), and a vehicle-mounted terminal (e.g., an in-vehicle navigation terminal), and stationary terminals such as a digital TV and a desktop computer.
  • the electronic device shown in FIG. 17 is only an example, and should not impose any limitation on the function and scope of use of the embodiments of the present disclosure.
  • as shown in FIG. 17, an electronic device 300 may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 301, which may execute various appropriate actions and processes according to a program stored in a read-only memory (ROM) 302 or a program loaded from a storage device 308 into a random access memory (RAM) 303.
  • the RAM 303 also stores various programs and data necessary for the operation of the electronic device 300.
  • the processing device 301, the ROM 302, and the RAM 303 are connected to each other through a bus 304.
  • An input/output (I/O) interface 305 is also connected to bus 304 .
  • the following devices may be connected to the I/O interface 305: input devices 306 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 307 including, for example, a liquid crystal display (LCD), speakers, a vibrator, etc.; storage devices 308 including, for example, a magnetic tape, a hard disk, etc.; and a communication device 309. The communication device 309 may allow the electronic device 300 to communicate wirelessly or by wire with other devices to exchange data. While FIG. 17 shows the electronic device 300 having various devices, it should be understood that not all of the illustrated devices are required to be implemented or provided; more or fewer devices may alternatively be implemented or provided.
  • embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program containing program code for performing the method illustrated in the flowchart.
  • the computer program may be downloaded and installed from the network via the communication device 309, or from the storage device 308, or from the ROM 302.
  • when the computer program is executed by the processing device 301, the above-mentioned functions defined in the methods of the embodiments of the present disclosure are executed.
  • the computer-readable medium mentioned above in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the above two.
  • the computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave with computer-readable program code embodied thereon. Such propagated data signals may take a variety of forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • a computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can transmit, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
  • Program code embodied on a computer readable medium may be transmitted using any suitable medium including, but not limited to, electrical wire, optical fiber cable, RF (radio frequency), etc., or any suitable combination of the foregoing.
  • terminal devices and servers can communicate using any currently known or future-developed network protocol, such as HTTP (HyperText Transfer Protocol), and can be interconnected with digital data communication in any form or medium (e.g., a communication network).
  • examples of communication networks include local area networks ("LAN"), wide area networks ("WAN"), internetworks (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future-developed network.
  • the above-mentioned computer-readable medium may be included in the above-mentioned electronic device; or may exist alone without being assembled into the electronic device.
  • the above-mentioned computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: preprocess a target video to obtain multiple target image frames in the target video; identify the position of a designated object in each of the target image frames; and determine a reserved image frame among the multiple target image frames according to the position of the designated object in each target image frame, the reserved image frame being used to indicate that the image frames located before the reserved image frame in the target video are to be cropped.
  • computer program code for performing the operations of the present disclosure may be written in one or more programming languages or a combination thereof, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server.
  • the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (e.g., through the Internet using an Internet service provider).
  • each block in the flowchart or block diagrams may represent a module, segment, or portion of code that contains one or more executable instructions for implementing the specified logical functions.
  • the functions noted in the blocks may occur out of the order noted in the figures; for example, two blocks shown in succession may in fact be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
  • each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by dedicated hardware-based systems that perform the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
  • the modules involved in the embodiments of the present disclosure may be implemented in software or hardware, where the name of a module does not, under certain circumstances, constitute a limitation on the module itself; for example, the preprocessing module may also be described as "a module for acquiring target image frames".
  • exemplary types of hardware logic components include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chips (SOCs), Complex Programmable Logical Devices (CPLDs) and more.
  • a machine-readable medium may be a tangible medium that may contain or store a program for use by or in connection with the instruction execution system, apparatus or device.
  • the machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • machine-readable media may include, but are not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the foregoing.
  • machine-readable storage media would include an electrical connection based on one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, compact disc read-only memory (CD-ROM), optical storage, magnetic storage, or any suitable combination of the foregoing.
  • a video processing method is provided, comprising: preprocessing a target video to obtain multiple target image frames in the target video; identifying the position of a designated object in each of the target image frames; and determining a reserved image frame among the multiple target image frames according to the position of the designated object in each target image frame, the reserved image frame being used to indicate that the image frames located before the reserved image frame in the target video are to be cropped.
  • the preprocessing of the target video to obtain multiple target image frames in the target video includes: extracting image frames from the target video at a preset frame interval to obtain the multiple target image frames.
  • the designated object includes at least one of a face, text, a designated logo, or a salient object.
  • the identifying of the position of the designated object in each target image frame includes: filtering the target image frame to remove its border, and recognizing the filtered target image frame with an image recognition algorithm to determine the position of the designated object, the image recognition algorithm including at least one of a face recognition algorithm, a text recognition algorithm, a designated-logo recognition algorithm, or a saliency detection algorithm.
  • the method further includes: after the reserved image frame is determined among the multiple target image frames according to the position of the designated object in each target image frame, determining a cropping size according to the original size of the target video and the display size of the terminal device, the cropping size matching the display size; determining a first number of kinds of cropping frames according to the cropping size and a preset step value, each kind of cropping frame having a different position on the image frames of the target video and each being of the cropping size; and determining a total cropping path according to the position of the designated object in each target image frame to be cropped and the first number of kinds of cropping frames, where a target image frame to be cropped is a target image frame located before the reserved image frame in the target video, and the total cropping path includes the cropping frame corresponding to each image frame located before the reserved image frame in the target video.
  • the method further includes: after the total cropping path is determined according to the position of the designated object in each target image frame to be cropped and the first number of kinds of cropping frames, cropping each image frame located before the reserved image frame in the target video according to the total cropping path, to obtain a first video consisting of cropped image frames and a second video consisting of uncropped image frames.
  • the multiple target image frames are arranged in their order in the target video, and the determining of the reserved image frame among the multiple target image frames according to the position of the designated object in each target image frame includes: for each target image frame, determining a target ratio and a target weight corresponding to the frame according to the position of the designated object in the frame, the target ratio indicating the proportion of the frame occupied by the designated object, and the target weight being determined from the weight of each pixel in the frame; taking the earliest target image frame whose target ratio is greater than a preset ratio threshold as a first target image frame; taking the earliest target image frame whose target weight is greater than a preset weight threshold as a second target image frame; and determining the reserved image frame according to the first target image frame and the second target image frame.
  • the determining of the reserved image frame according to the first target image frame and the second target image frame includes: if a candidate target image frame is located before a preset latest image frame in the target video, taking the candidate target image frame as the reserved image frame, the candidate target image frame being the earlier of the first target image frame and the second target image frame; and if the candidate target image frame is located after the latest image frame in the target video, or is the same as the latest image frame, taking the latest image frame as the reserved image frame.
  • the determining of the total cropping path includes: determining a third number of kinds of cropping paths according to the first number of kinds of cropping frames and a second number of target image frames to be cropped, each cropping path including the cropping frame corresponding to each of the second number of target image frames to be cropped; determining a total cropping loss corresponding to each cropping path according to the position of the designated object in each target image frame to be cropped, the total cropping loss indicating the amount of loss caused by cropping the second number of target image frames to be cropped according to that cropping path; taking the cropping path with the smallest total cropping loss as a target cropping path; and interpolating the target cropping path to obtain the total cropping path.
  • the determining of the total cropping loss corresponding to each cropping path according to the position of the designated object in each target image frame to be cropped includes: determining, according to the position of the designated object in each target image frame to be cropped, a cropping loss corresponding to that frame, the cropping loss indicating the loss caused by cropping the frame according to its corresponding cropping frame included in a first cropping path, the first cropping path being any one of the third number of kinds of cropping paths; and summing the cropping losses corresponding to the target image frames to be cropped to obtain the total cropping loss corresponding to the first cropping path.
  • the determining of the cropping loss corresponding to a target image frame to be cropped includes: determining an importance score corresponding to the frame according to the proportion of the cropped image frame occupied by the designated object and the proportion of the target image frame to be cropped occupied by the designated object, the cropped image frame being obtained by cropping the frame according to its corresponding cropping frame included in the first cropping path; determining a completeness score corresponding to the frame according to the degree of completeness of the designated object in the cropped image frame; determining a transition score corresponding to the frame according to the distance between a first cropping frame and a second cropping frame, the first cropping frame being the cropping frame corresponding to the frame included in the first cropping path, and the second cropping frame being the cropping frame, included in the first cropping path, corresponding to the previous target image frame to be cropped; and determining the cropping loss corresponding to the frame according to the importance score, the completeness score, and the transition score.
  • the method further includes: after each image frame located before the reserved image frame in the target video is cropped according to the total cropping path to obtain a first video consisting of cropped image frames and a second video consisting of uncropped image frames, controlling the first video to play in full-screen mode on the terminal device; when playback reaches a zoomed image frame, controlling the zoomed image frame to shrink, within a preset zoom duration, to a target position within the playback area on the terminal device, the zoomed image frame being the last image frame in the first video; where, if the pre-zoom image frame were played at the original size in the playback area, the target position is the position of the cropping frame, included in the total cropping path, corresponding to the pre-zoom image frame, and the pre-zoom image frame is the uncropped image frame in the target video corresponding to the zoomed image frame; and controlling the second video to play at the original size in the playback area.
  • the controlling of the zoomed image frame to shrink, within a preset zoom duration, to a target position within a playback area on the terminal device includes: determining multiple first vertices of the zoomed image and, for each first vertex, a corresponding second vertex at the target position; determining a zoom speed of each first vertex according to the distance between the first vertex and its corresponding second vertex and the zoom duration; and controlling the zoomed image to shrink to the target position at the zoom speed of each first vertex.
  • a video processing apparatus is provided, comprising: a preprocessing module configured to preprocess a target video to obtain multiple target image frames in the target video; an identification module configured to identify the position of a designated object in each of the target image frames; and a first determination module configured to determine a reserved image frame among the multiple target image frames according to the position of the designated object in each target image frame, the reserved image frame being used to indicate that the image frames located before the reserved image frame in the target video are to be cropped.
  • a computer-readable storage medium is provided, having stored thereon a computer program that, when executed by a processing apparatus, implements the steps of the methods described in Example 1 to Example 11.
  • an electronic device is provided, comprising: one or more processors; and a memory for storing one or more programs; where the one or more programs, when executed by the one or more processors, cause the one or more processors to implement any one of the aforementioned video processing methods.
  • a computer program is provided, comprising instructions that, when executed by a processor, cause the processor to perform any one of the aforementioned video processing methods.
  • a computer program product is provided, comprising instructions that, when executed by a processor, cause the processor to perform any one of the aforementioned video processing methods.
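Taken together, the examples above describe a three-step method: sample frames from the video, measure the designated object's presence in each sampled frame, and pick the first frame where that presence crosses a threshold as the reserved frame. The following is a minimal sketch of that pipeline; the function names (`sample_frames`, `reserved_frame`), the toy detector, and the 0.6 threshold are illustrative assumptions, not the patent's reference implementation.

```python
# Minimal sketch of the claimed pipeline: sample frames, evaluate the
# designated object's coverage in each, and pick the first sampled frame
# whose coverage exceeds a threshold as the "reserved" frame. Frames
# before the reserved frame are the ones the method crops.

def sample_frames(num_frames, interval):
    """Preprocessing: keep every `interval`-th frame index."""
    return list(range(0, num_frames, interval))

def reserved_frame(frame_indices, object_area_ratio, threshold=0.6):
    """Return the first sampled frame whose designated-object area ratio
    exceeds `threshold`; fall back to the last sampled frame."""
    for idx in frame_indices:
        if object_area_ratio(idx) > threshold:
            return idx
    return frame_indices[-1]

if __name__ == "__main__":
    frames = sample_frames(500, 2)      # indices 0, 2, 4, ..., 498
    # Toy detector: object coverage grows linearly with frame index.
    ratio = lambda i: i / 500.0
    print(reserved_frame(frames, ratio))  # → 302
```

A real detector would return per-frame coverage from face/text/logo/saliency recognition; only the selection logic is shown here.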


Abstract

The present disclosure relates to a video processing method and apparatus, a readable medium, and an electronic device, in the field of image processing technology. The method includes: preprocessing a target video to obtain multiple target image frames in the target video; identifying the position of a designated object in each target image frame; and determining a reserved image frame among the multiple target image frames according to the position of the designated object in each target image frame, the reserved image frame being used to indicate that the image frames located before the reserved image frame in the target video are to be cropped. By determining the reserved image frame based on the position of the designated object in the target image frames, thereby indicating that the image frames before the reserved image frame are to be cropped, the present disclosure can determine reserved image frames suited to different target videos and reduce the information loss caused by cropping the target video.

Description

Video Processing Method and Apparatus, Readable Medium, and Electronic Device

Cross-Reference to Related Applications

This application is based on, and claims priority to, CN application No. 202011298813.5, filed on November 18, 2020, the disclosure of which is incorporated into this application by reference in its entirety.

Technical Field

The present disclosure relates to the field of image processing technology, and in particular to a video processing method and apparatus, a readable medium, and an electronic device.

Background

With the continuous development of terminal technology and electronic information technology, terminal devices play an increasingly important role in people's daily lives. People obtain information through the various applications installed on their terminal devices, so advertisers often place video advertisements inside applications. Video advertisements can be presented in several ways; the splash (open-screen) presentation, in which the video advertisement plays when the user opens the application, is one of the more common ones.

A video advertisement presented as a splash screen is usually divided into two parts. The first part occupies the entire screen of the terminal device, i.e., it is displayed in full-screen mode to highlight the information it contains; the second part is displayed at the original size of the video advertisement. Because the first part is displayed in full-screen mode, the image frames of the original video must be cropped in advance so that the cropped frames fit the display size of the terminal device.
Summary

This summary is provided to introduce concepts in a brief form; the concepts are described in detail in the detailed description that follows. This summary is not intended to identify key or essential features of the claimed technical solution, nor is it intended to limit the scope of the claimed technical solution.

According to a first aspect of some embodiments of the present disclosure, a video processing method is provided, the method including:

preprocessing a target video to obtain multiple target image frames in the target video;

identifying the position of a designated object in each of the target image frames; and

determining a reserved image frame among the multiple target image frames according to the position of the designated object in each target image frame, the reserved image frame being used to indicate that the image frames located before the reserved image frame in the target video are to be cropped.

According to a second aspect of some embodiments of the present disclosure, a video processing apparatus is provided, the apparatus including:

a preprocessing module configured to preprocess a target video to obtain multiple target image frames in the target video;

an identification module configured to identify the position of a designated object in each of the target image frames; and

a first determination module configured to determine a reserved image frame among the multiple target image frames according to the position of the designated object in each target image frame, the reserved image frame being used to indicate that the image frames located before the reserved image frame in the target video are to be cropped.

According to a third aspect of some embodiments of the present disclosure, a computer-readable storage medium is provided, on which a computer program is stored; when executed by a processing apparatus, the program implements the steps of the method described in the first aspect of the present disclosure.

According to a fourth aspect of some embodiments of the present disclosure, an electronic device is provided, including:

one or more processors; and

a memory for storing one or more programs;

where the one or more programs, when executed by the one or more processors, cause the one or more processors to implement any one of the aforementioned video processing methods.

According to a fifth aspect of some embodiments of the present disclosure, a computer program is provided, including instructions that, when executed by a processor, cause the processor to perform any one of the aforementioned video processing methods.

According to a sixth aspect of some embodiments of the present disclosure, a computer program product is provided, including instructions that, when executed by a processor, cause the processor to perform any one of the aforementioned video processing methods.

Other features and advantages of the present disclosure will be described in detail in the detailed description that follows.
Brief Description of the Drawings

The above and other features, advantages, and aspects of embodiments of the present disclosure will become more apparent with reference to the following detailed description taken in conjunction with the accompanying drawings. Throughout the drawings, identical or similar reference numerals denote identical or similar elements. It should be understood that the drawings are schematic and that components and elements are not necessarily drawn to scale. In the drawings:

Fig. 1 is a flowchart of a video processing method according to an exemplary embodiment;

Fig. 2 is a flowchart of another video processing method according to an exemplary embodiment;

Fig. 3 is a flowchart of another video processing method according to an exemplary embodiment;

Fig. 4 is a flowchart of another video processing method according to an exemplary embodiment;

Fig. 5 is a flowchart of another video processing method according to an exemplary embodiment;

Fig. 6 is a flowchart of another video processing method according to an exemplary embodiment;

Fig. 7 is a flowchart of another video processing method according to an exemplary embodiment;

Fig. 8 is a schematic diagram of a display screen of a terminal device according to an exemplary embodiment;

Fig. 9 is a flowchart of another video processing method according to an exemplary embodiment;

Fig. 10 is a schematic diagram of a display screen of a terminal device according to an exemplary embodiment;

Fig. 11 is a block diagram of a video processing apparatus according to an exemplary embodiment;

Fig. 12 is a block diagram of another video processing apparatus according to an exemplary embodiment;

Fig. 13 is a block diagram of another video processing apparatus according to an exemplary embodiment;

Fig. 14 is a block diagram of another video processing apparatus according to an exemplary embodiment;

Fig. 15 is a block diagram of another video processing apparatus according to an exemplary embodiment;

Fig. 16 is a block diagram of another video processing apparatus according to an exemplary embodiment;

Fig. 17 is a block diagram of an electronic device according to an exemplary embodiment.
Detailed Description

Embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be implemented in various forms and should not be construed as limited to the embodiments set forth here; rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are for exemplary purposes only and are not intended to limit its scope of protection.

It should be understood that the steps described in the method embodiments of the present disclosure may be performed in different orders and/or in parallel. Furthermore, method embodiments may include additional steps and/or omit performing the steps shown. The scope of the present disclosure is not limited in this respect.

As used here, the term "include" and its variants are open-ended, i.e., "including but not limited to". The term "based on" means "at least partially based on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions of other terms will be given in the description below.

Note that the concepts "first", "second", etc. mentioned in the present disclosure are only used to distinguish different apparatuses, modules, or units, and are not used to limit the order or interdependence of the functions performed by these apparatuses, modules, or units.

Note that the modifiers "one" and "multiple" mentioned in the present disclosure are illustrative rather than restrictive; those skilled in the art should understand that, unless the context clearly indicates otherwise, they should be understood as "one or more".

The names of the messages or information exchanged between multiple apparatuses in the embodiments of the present disclosure are for illustrative purposes only and are not used to limit the scope of these messages or information.

Because a video advertisement may contain image frames that are unsuitable for cropping, cropping those frames may cause excessive information loss, so that no useful information is conveyed to the user. Some embodiments of the present invention provide a video processing method that reduces the information loss caused by cropping a target video.
Fig. 1 is a flowchart of a video processing method according to an exemplary embodiment. As shown in Fig. 1, the method includes the following steps 101-103.

Step 101: preprocess a target video to obtain multiple target image frames in the target video.

For example, the execution body of the embodiments in the present disclosure may be a terminal device or a server, or some steps may be performed on a terminal device while other steps are performed on a server; the present disclosure does not specifically limit this. First, the target video is determined. The target video may be a video shot by a user (for example, an advertiser or an individual user), a video selected by the user while browsing a multimedia resource library (which may be stored in the terminal device or in the server), or a video uploaded by the user to the Internet. The target video can be obtained through its identification code or its URL (Uniform Resource Locator) address. After the target video is obtained, it may be preprocessed to obtain the multiple target image frames it contains. The preprocessing may be, for example, extracting image frames from the target video to obtain the target image frames; e.g., the target video contains 1000 image frames, and 200 of them are extracted as target image frames. The preprocessing may also be denoising the image frames of the target video to remove noise and obtain the target image frames, or removing borders that are irrelevant to the conveyed information, or first extracting frames and then applying denoising, border removal, and so on to the extracted frames. Note that the target image frames may be all of the image frames in the target video or only part of them; the present disclosure does not specifically limit this.

Step 102: identify the position of a designated object in each target image frame.

Step 103: determine a reserved image frame among the multiple target image frames according to the position of the designated object in each target image frame, the reserved image frame being used to indicate that the image frames located before the reserved image frame in the target video are to be cropped.

For example, after the target image frames are obtained, each target image frame may be recognized with a preset image recognition algorithm to identify the position of the designated object in it. The designated object can be understood as the main content the target video intends to present, or the content it intends to highlight; it may include, for example, at least one of a face, text, a designated logo, or a salient object. The designated logo may be, for example, a watermark or trademark specified by the user; a salient object can be understood as an object occupying a relatively large proportion of the target image frame. Finally, the reserved image frame is selected among the multiple target image frames according to the position of the designated object in each of them, thereby indicating that the image frames located before the reserved image frame in the target video are to be cropped. The reserved image frame can be understood as the position at which the target video is split, i.e., the boundary between the video displayed in full-screen mode (the first video mentioned below) and the video displayed at the original size (the second video mentioned below); all image frames before the reserved image frame in the target video are suitable for cropping. Note that after the reserved image frame is determined in step 103, the method may further include: cropping the target video according to the reserved image frame to split it into a full-screen video and an original video, then displaying the full-screen video in full-screen mode and the original video at the original size. The full-screen video consists of the cropped versions of the image frames before the reserved image frame; the original video consists of the reserved image frame and the image frames after it.

In some embodiments, the reserved image frame may be determined, for example, by comparing, for each target image frame in turn, the area ratio occupied by the designated object with a preset area threshold. If the ratio is greater than the threshold, the frame is of high importance (it conveys much information) and is unsuitable for cropping; if the ratio is less than or equal to the threshold, the frame is of low importance (it conveys little information) and is suitable for cropping. The frame with the smallest frame number among the high-importance target image frames is then taken as the reserved image frame, where the frame number indicates the frame's order in the target video. Alternatively, the position of the designated object in each target image frame may be compared in turn with a preset cropping position: if the designated object lies inside the cropping position, the object will still be fully displayed after cropping, so the frame is suitable for cropping; if the object lies outside the cropping position, the user would not see the object after cropping, so the frame is unsuitable for cropping. The frame with the smallest frame number among the unsuitable target image frames is then taken as the reserved image frame.

Take a skincare-product advertisement as the target video, with a face as the designated object. The target video contains 500 image frames; 250 of them are extracted at an interval of one frame, and border removal is applied to these 250 frames to obtain 250 target image frames whose frame numbers in the target video may be (1, 3, 5, ..., 497, 499). Face recognition is then performed on the 250 target image frames to determine the position of the face in each one. Finally, the area ratio of the face in each target image frame is determined in turn: if the ratio is greater than 60%, the frame is of high importance and unsuitable for cropping; if it is less than or equal to 60%, the frame is of low importance and suitable for cropping. If, among the 250 target image frames, the frames with numbers 15, 19, 21, 23, 35, etc. are found unsuitable for cropping, the target image frame with number 15 may be taken as the reserved image frame.
In summary, the present disclosure first preprocesses the target video to obtain multiple target image frames, then recognizes each target image frame to obtain the position of the designated object, and finally selects the reserved image frame among the multiple target image frames according to the position of the designated object in each frame, the reserved image frame indicating that the image frames before it in the target video are to be cropped. By determining the reserved image frame based on the position of the designated object in the target image frames, the present disclosure can determine reserved image frames suited to different target videos and reduce the information loss caused by cropping the target video.
Fig. 2 is a flowchart of another video processing method according to an exemplary embodiment. As shown in Fig. 2, in some embodiments, step 101 may be implemented as step 1011:

Step 1011: extract image frames from the target video at a preset frame interval to obtain the multiple target image frames.

Correspondingly, step 102 may include:

Step 1021: filter each target image frame to remove its border.

Step 1022: recognize the filtered target image frame with an image recognition algorithm to determine the position of the designated object in the frame, the image recognition algorithm including at least one of a face recognition algorithm, a text recognition algorithm, a designated-logo recognition algorithm, or a saliency detection algorithm.

For example, the preprocessing of the target video may extract image frames at a preset frame interval: if the interval is 5, one image frame is taken from the target video every five frames as a target image frame. When identifying the position of the designated object in each target image frame, the frame may first be filtered to remove any border it contains that conveys no information, and then recognized with the preset image recognition algorithm to obtain the position of the designated object. The designated object may include at least one of a face, text, a designated logo, or a salient object; correspondingly, the image recognition algorithm may be at least one of a face recognition algorithm, a text recognition algorithm, a designated-logo recognition algorithm, or a saliency detection algorithm.
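The border-filtering step of step 1021 can be sketched as stripping rows and columns that carry no content. The sketch below assumes a grayscale frame represented as a 2-D list and treats exactly-zero (black) rows/columns as the border; a production system would tolerate noise and compression artifacts, and `strip_borders` is an illustrative name, not from the patent.

```python
# Sketch of the border-filtering step: drop rows and columns that are
# uniformly black (letterbox / pillarbox bars) before recognition runs.

def strip_borders(frame):
    """frame: 2-D list of grayscale pixel values. Returns the sub-image
    with all-zero border rows and columns removed."""
    rows = [r for r in frame if any(p != 0 for p in r)]   # keep non-black rows
    if not rows:
        return []
    cols = [c for c in range(len(rows[0]))                # keep non-black columns
            if any(r[c] != 0 for r in rows)]
    return [[r[c] for c in cols] for r in rows]

frame = [[0, 0, 0, 0],
         [0, 5, 6, 0],
         [0, 7, 8, 0],
         [0, 0, 0, 0]]
print(strip_borders(frame))  # → [[5, 6], [7, 8]]
```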
Fig. 3 is a flowchart of another video processing method according to an exemplary embodiment. As shown in Fig. 3, after step 103, the method may further include steps 104-106.

Step 104: determine a cropping size according to the original size of the target video and the display size of the terminal device, the cropping size matching the display size.

Step 105: determine a first number of kinds of cropping frames according to the cropping size and a preset step value, each kind of cropping frame having a different position on the image frames of the target video, and each being of the cropping size.

Step 106: determine a total cropping path according to the position of the designated object in each target image frame to be cropped and the first number of kinds of cropping frames, where a target image frame to be cropped is a target image frame located before the reserved image frame in the target video, and the total cropping path includes the cropping frame corresponding to each image frame located before the reserved image frame in the target video.

In one application scenario, after the reserved image frame is determined, a total cropping path suited to the target video can further be determined. This can be understood as follows: the total cropping path contains multiple cropping frames, each corresponding to one image frame located before the reserved image frame in the target video; the corresponding cropping frame indicates which part of that image frame is cropped away and which part is kept, i.e., pixels outside the cropping frame are cropped away and pixels inside it are kept. In other words, the total cropping path indicates how each image frame located before the reserved image frame in the target video should be cropped.

One method of determining the total cropping path is described below as an example. First, the cropping size is determined according to the original size of the target video and the display size of the terminal device. The original size can be understood as the resolution of the target video; the display size is, for example, the size of the display screen of the terminal device on which the target video is to be shown; the cropping size is, for example, a resolution matching the display size. "The cropping size matches the display size" can be understood to mean that the display screen of the terminal device can directly display an image whose resolution equals the cropping size. For example, if the original size is 1280*720 and the display size is 1:1, then, under the rule of cropping only one of the two sides (length or width), the cropping size can be determined as 720*720.

Then, the first number of kinds of cropping frames can be determined according to the determined cropping size and the preset step value, where every kind of cropping frame has the cropping size and each kind has a different position on the image frame. This can be understood as follows: starting from one side of the image frame, the first kind of cropping frame is obtained; moving it one step value along a designated direction gives the second kind; moving it two step values gives the third kind; and so on. For example, with an original image-frame size of 1280*720, a cropping size of 720*720, and a step value of 20 pixels, starting from the left side of the image frame gives the first kind of cropping frame, whose pixels span abscissas 1-720 and ordinates 1-720; moving it 20 pixels to the right gives the second kind, whose pixels span abscissas 21-740 and ordinates 1-720; and so on, yielding 29 kinds of cropping frames.

Finally, for each target image frame to be cropped, the first number of kinds of cropping frames can be screened based on the position of the designated object in that frame, and the kind causing the least information loss is selected as the cropping frame corresponding to that frame, thereby obtaining the cropping frame corresponding to each target image frame to be cropped. A target image frame to be cropped is a target image frame located before the reserved image frame. Further, the cropping frame corresponding to each image frame located before the reserved image frame in the target video can be obtained from the cropping frames of the target image frames to be cropped, yielding the total cropping path. If the target image frames were extracted from the target video at a frame interval, the cropping frames corresponding to the target image frames to be cropped can be interpolated to obtain the total cropping path.
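Steps 104 and 105 above can be sketched directly: derive the crop size from the original resolution and the display aspect ratio (cropping only one side), then slide a window of that size across the frame in step-value increments to enumerate the candidate cropping frames. The function names and the integer arithmetic are illustrative assumptions.

```python
# Sketch of steps 104-105: crop size from aspect ratio, then candidate
# cropping frames enumerated by sliding the crop window in `step` pixels.

def crop_size(orig_w, orig_h, aspect_w, aspect_h):
    """Fit the display aspect ratio inside the original frame,
    cropping only one of the two sides."""
    if orig_w * aspect_h > orig_h * aspect_w:    # frame wider than display
        return orig_h * aspect_w // aspect_h, orig_h
    return orig_w, orig_w * aspect_h // aspect_w  # frame taller than display

def candidate_boxes(orig_w, orig_h, crop_w, crop_h, step):
    """Top-left corners of every crop window position, `step` px apart."""
    return [(x, y)
            for x in range(0, orig_w - crop_w + 1, step)
            for y in range(0, orig_h - crop_h + 1, step)]

# The example from the text: 1280*720 frame, 1:1 display, 20-px step.
w, h = crop_size(1280, 720, 1, 1)
print(w, h)                                       # → 720 720
print(len(candidate_boxes(1280, 720, w, h, 20)))  # → 29
```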
Fig. 4 is a flowchart of another video processing method according to an exemplary embodiment. As shown in Fig. 4, after step 106, the method may further include:

Step 107: crop each image frame located before the reserved image frame in the target video according to the total cropping path, to obtain a first video consisting of cropped image frames and a second video consisting of uncropped image frames.

For example, after the total cropping path is determined, the image frames of the target video can be divided into two parts by the reserved image frame: the first part consists of the image frames located before the reserved image frame; the second part consists of the reserved image frame and the image frames located after it. The frames in the second part form the second video, which contains uncropped image frames. The frames in the first part are cropped according to the total cropping path, i.e., each frame is cropped according to the cropping frame corresponding to it included in the total cropping path, and the cropped image frames form the first video. The first video is suited to being displayed in full-screen mode on the terminal device; the second video is suited to being displayed at the original size on the terminal device.
Fig. 5 is a flowchart of another video processing method according to an exemplary embodiment. As shown in Fig. 5, the multiple target image frames are arranged in their order in the target video, and step 103 may be implemented as:

Step 1031: for each target image frame, determine a target ratio and a target weight corresponding to the frame according to the position of the designated object in the frame, the target ratio indicating the proportion of the frame occupied by the designated object, and the target weight being determined from the weight of each pixel in the frame;

Step 1032: take the earliest target image frame whose target ratio is greater than a preset ratio threshold as a first target image frame;

Step 1033: take the earliest target image frame whose target weight is greater than a preset weight threshold as a second target image frame;

Step 1034: determine the reserved image frame according to the first target image frame and the second target image frame.

For example, the multiple target image frames obtained in step 101 may be arranged in their order in the target video, i.e., by frame number: if the target video contains 500 image frames and 250 target image frames with frame numbers (1, 3, 5, ..., 497, 499) are extracted at an interval of one frame, the target image frames may be arranged in ascending order of frame number.

For each target image frame, the target ratio and the target weight corresponding to the frame are first determined according to the position of the designated object in the frame. The target ratio indicates the proportion of the frame occupied by the designated object and can be understood as the designated object's area ratio in the frame; for example, if a target image frame contains 1000 pixels in total, of which the designated object occupies 550, the target ratio is 55%. The target ratio can also be understood as binarizing the frame according to whether each pixel belongs to the designated object, finding one or more connected regions, and then taking as the target ratio the ratio of the area of the bounding rectangle of each connected region to the area of the frame. The target weight is determined from the weight of each pixel in the frame: each pixel has a weight, pixels belonging to the designated object have a high weight and pixels not belonging to it a low weight, and the target weight is obtained by summing the weights of all pixels in the frame and averaging. For example, if a target image frame has 500 pixels, the initial weight of each pixel is 1, and a pixel's weight is set to 2 when it belongs to the designated object, then with 300 of the 500 pixels at weight 2 and the rest at weight 1, the average of the 500 weights is 800/500 = 1.6.
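The target ratio and target weight just described can be computed from a per-pixel object mask. The sketch below uses a flat boolean mask and the 2-vs-1 weighting from the example above; the helper names and the weight values are illustrative assumptions.

```python
# Sketch of step 1031: target ratio (object area fraction) and target
# weight (mean per-pixel weight, object pixels weighted higher).

def target_ratio(pixel_is_object):
    """Fraction of pixels belonging to the designated object."""
    return sum(1 for p in pixel_is_object if p) / len(pixel_is_object)

def target_weight(pixel_is_object, obj_weight=2.0, bg_weight=1.0):
    """Average per-pixel weight; object pixels carry obj_weight."""
    weights = [obj_weight if p else bg_weight for p in pixel_is_object]
    return sum(weights) / len(weights)

# The example from the text: 500 pixels, 300 of them object pixels.
mask = [True] * 300 + [False] * 200
print(target_ratio(mask))   # → 0.6
print(target_weight(mask))  # → 1.6
```

A frame is then a candidate for the reserved frame once either quantity exceeds its threshold (60% for the ratio, 1.3 for the weight in the examples).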
Then, the first target image frame can be determined from the target ratio and a preset ratio threshold (for example, 60%): it is the earliest target image frame (the one with the smallest frame number) among the multiple target image frames whose corresponding target ratio is greater than the ratio threshold. Likewise, the second target image frame can be determined from the target weight and a preset weight threshold (for example, 1.3): it is the earliest target image frame whose corresponding target weight is greater than the weight threshold. Finally, the reserved image frame is determined according to the first target image frame and the second target image frame.

In one implementation scenario, step 1034 may determine the reserved image frame in either of two ways:

Way 1: if the candidate target image frame is located before a preset latest image frame in the target video, take the candidate target image frame as the reserved image frame, the candidate target image frame being the earlier of the first target image frame and the second target image frame.

Way 2: if the candidate target image frame is located after the latest image frame in the target video, or the candidate target image frame is the same as the latest image frame, take the latest image frame as the reserved image frame.

For example, the earlier of the first and second target image frames can first be selected, i.e., the one with the smaller frame number is determined and taken as the candidate target image frame. Then the order of the candidate target image frame and the preset latest image frame in the target video is compared, and the earlier one is taken as the reserved image frame. The latest image frame can be understood as a pre-designated latest allowable frame. Usually, only a limited period of the target video needs to be displayed in full-screen mode, so the latest image frame can be determined from the maximum duration of full-screen display: with a maximum duration of 10 s and a target-video frame rate of 30 frames per second, the frame number of the latest image frame is 300. The frame with the smaller frame number, among the candidate target image frame and the latest image frame, is taken as the reserved image frame.
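The two ways of step 1034 reduce to taking the earlier of the two candidate frames and capping the result at the preset latest frame. A minimal sketch, with illustrative names:

```python
# Sketch of step 1034: reserved frame = min(first candidate, second
# candidate), but never later than the preset latest frame.

def choose_reserved(first_idx, second_idx, latest_idx):
    """first_idx / second_idx: frame numbers of the first and second
    target image frames; latest_idx: preset latest image frame."""
    candidate = min(first_idx, second_idx)   # earlier of the two candidates
    return min(candidate, latest_idx)        # Way 1 vs. Way 2

print(choose_reserved(15, 23, 300))   # candidate before latest → 15
print(choose_reserved(400, 350, 300)) # candidate after latest  → 300
```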
Fig. 6 is a flowchart of another video processing method according to an exemplary embodiment. As shown in Fig. 6, step 106 may include:

Step 1061: determine a third number of kinds of cropping paths according to the first number of kinds of cropping frames and the second number of target image frames to be cropped, each cropping path including the cropping frame corresponding to each of the second number of target image frames to be cropped;

Step 1062: determine a total cropping loss corresponding to each cropping path according to the position of the designated object in each target image frame to be cropped, the total cropping loss indicating the amount of loss caused by cropping the second number of target image frames to be cropped according to that cropping path;

Step 1063: take the cropping path with the smallest total cropping loss as a target cropping path;

Step 1064: interpolate the target cropping path to obtain the total cropping path.

The method for obtaining the total cropping path is described in detail below. First, the third number of kinds of cropping paths can be determined according to the first number of kinds of cropping frames and the second number of target image frames to be cropped. Each kind of cropping path includes the cropping frame corresponding to each target image frame to be cropped. The third number can be understood as the number of arrangements of the first number of kinds of cropping frames over the second number of frames: with 5 kinds of cropping frames and 6 target image frames to be cropped, there are 5^6 = 15625 (the third number) kinds of cropping paths. Then the total cropping loss of each cropping path can be computed; it indicates the amount of information loss caused by cropping the second number of target image frames to be cropped according to that path. The cropping path with the smallest total cropping loss is taken as the target cropping path; that is, among the third number of kinds of cropping paths, the target cropping path is the one causing the least information loss, which preserves as much of the target video's information as possible under the premise of splitting the target video into the first and second videos. Finally, if the target image frames were extracted from the target video at a frame interval, the target cropping path can be interpolated to obtain a total cropping path that includes the cropping frame corresponding to each image frame located before the reserved image frame.
In one implementation, the total cropping loss in step 1062 can be obtained through the following steps:

Step 1): determine, according to the position of the designated object in each target image frame to be cropped, a cropping loss corresponding to that frame, the cropping loss indicating the loss caused by cropping the frame according to the cropping frame corresponding to it included in a first cropping path, the first cropping path being any one of the third number of kinds of cropping paths;

Step 2): sum the cropping losses corresponding to the target image frames to be cropped to obtain the total cropping loss corresponding to the first cropping path.

Taking the first cropping path as an example, the cropped image frame obtained by cropping each target image frame to be cropped according to the first cropping path can first be determined. The cropping loss of each frame is then obtained from the frame and its corresponding cropped image frame, where the cropping loss indicates the amount of information loss caused by the cropping. The cropping loss may include three parts: an importance score, a completeness score, and a transition score.

The importance score may be determined as follows: according to the proportion of the cropped image frame occupied by the designated object and the proportion of the target image frame to be cropped occupied by the designated object, determine the importance score corresponding to the frame. The importance score (denoted IMP) may be computed as IMP_i = 1 - [I(C_i)/I(O_i)], where IMP_i is the importance score of the i-th target image frame to be cropped, I(C_i) is the proportion of the corresponding cropped image frame occupied by the designated object, and I(O_i) is the proportion of the i-th target image frame to be cropped occupied by the designated object.

The completeness score may be determined as follows: according to the degree of completeness of the designated object in the cropped image frame, determine the completeness score corresponding to the frame. The degree of completeness can be understood as the coverage between the position of the designated object and the cropping frame, for example the ratio of the overlapping area between the designated object and the cropping frame to the area of the designated object. The completeness score (denoted COM) may be computed as:

[Formula image PCTCN2021130875-appb-000001]

where COM_i is the completeness score of the i-th target image frame to be cropped, x_{i,j} is the coverage between the position of the j-th designated object in the i-th frame and the cropping frame, and M is the number of designated objects in the i-th frame. Taking text as the designated object, if 10 text segments (i.e., M = 10) are recognized in the frame, the position of each segment can be taken as a text box, and the ratio of the overlapping area between each text box and the cropping frame to the area of that text box is x_{i,j}. For example, if a text box has an area of 100 pixels and its overlap with the cropping frame is 20 pixels, the coverage between that text box's position and the cropping frame is 20%.

The transition score may be determined as follows: according to the distance between a first cropping frame and a second cropping frame, determine the transition score corresponding to the frame, where the first cropping frame is the cropping frame corresponding to the frame included in the first cropping path, and the second cropping frame is the cropping frame, included in the first cropping path, corresponding to the previous target image frame to be cropped. The transition score (denoted TRA) may be computed as TRA_i = T(C_i) - T(C_{i-1}), where TRA_i is the transition score of the i-th target image frame to be cropped, T(C_i) is the coordinate of the first cropping frame, and T(C_{i-1}) is the coordinate of the second cropping frame.

Further, the cropping loss corresponding to the frame can be determined according to its importance score, completeness score, and transition score. The cropping loss (denoted Loss) may be computed as:

[Formula image PCTCN2021130875-appb-000002]

where Loss_i is the cropping loss of the i-th target image frame to be cropped, N is the second number, λ_1 is the weight corresponding to the importance score, λ_2 the weight corresponding to the completeness score, and λ_3 the weight corresponding to the transition score.

Finally, after the cropping loss of each target image frame to be cropped is obtained, the cropping losses are summed to obtain the total cropping loss corresponding to the first cropping path, and the target cropping path with the smallest total cropping loss is selected among the third number of kinds of cropping paths.
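The path selection above can be sketched as follows. For illustration, the importance and completeness terms are given as precomputed per-frame, per-box losses, crop boxes are reduced to one-dimensional x-offsets so the transition term is a simple shift distance, and the weights λ are arbitrary assumptions; the exhaustive enumeration mirrors the description and is only feasible for small inputs.

```python
from itertools import product

def frame_loss(imp, com, prev_box, box, l1=1.0, l2=1.0, l3=0.1):
    """Weighted per-frame loss: importance + completeness + transition.
    The transition term is the box shift from the previous frame."""
    tra = abs(box - prev_box) if prev_box is not None else 0
    return l1 * imp + l2 * com + l3 * tra

def best_path(imp, com, boxes):
    """Brute-force the cropping path with the smallest total loss.
    imp[i][b], com[i][b]: loss terms of box b on frame i;
    boxes: x-offset of each candidate cropping frame."""
    n = len(imp)
    best, best_cost = None, float("inf")
    for path in product(range(len(boxes)), repeat=n):   # every path
        cost, prev = 0.0, None
        for i, b in enumerate(path):
            cost += frame_loss(imp[i][b], com[i][b], prev, boxes[b])
            prev = boxes[b]
        if cost < best_cost:
            best, best_cost = path, cost
    return best, best_cost

# Two frames, two boxes at x = 0 and x = 20; box 1 loses less on both.
imp = [[0.5, 0.1], [0.5, 0.1]]
com = [[0.0, 0.0], [0.0, 0.0]]
print(best_path(imp, com, [0, 20]))  # → ((1, 1), 0.2)
```

Because the cost is additive over frames with a pairwise transition term, a dynamic-programming (Viterbi-style) search would find the same minimum without enumerating all paths; that optimization is not part of the description here.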
Fig. 7 is a flowchart of another video processing method according to an exemplary embodiment. As shown in Fig. 7, after step 107, the method further includes steps 108-110.

Step 108: control the first video to play in full-screen mode on the terminal device.

Step 109: when playback reaches a zoomed image frame, control the zoomed image frame to shrink, within a preset zoom duration, to a target position within the playback area on the terminal device. The zoomed image frame is the last image frame in the first video. If the pre-zoom image frame were played at the original size in the playback area, the target position is the position of the cropping frame, included in the total cropping path, corresponding to the pre-zoom image frame; the pre-zoom image frame is the uncropped image frame in the target video corresponding to the zoomed image frame.

Step 110: control the second video to play at the original size within the playback area.

For example, after the target video is cropped into the first and second videos, the first video can first be played in full-screen mode on the terminal device, which highlights the information the first video contains. Further, when playback reaches the last image frame of the first video (i.e., the zoomed image frame), the zoomed image frame can be shrunk, within a preset zoom duration (for example, 1 s), to the target position within the playback area on the terminal device. The target position can be understood as the position, within the playback area, at which the cropping frame corresponding to the pre-zoom image frame would be displayed if the pre-zoom image frame (the uncropped frame in the target video corresponding to the zoomed image frame) were played at the original size in the playback area. The playback area may be the central region of the terminal device's display screen, or another region.

After the zoomed image frame shrinks to the target position within the playback area, the second video is played at the original size in the playback area. This joins the first and second videos and reduces the abruptness caused by switching playback modes.

Taking the display screen shown in Fig. 8 as an example, region 1 is the whole display screen, outlined with a solid frame in Fig. 8, and the first video is displayed in region 1; region 2 is the playback area. If the pre-zoom image frame were played at the original size in the playback area, it would occupy region 2, and the position of the cropping frame, included in the total cropping path, corresponding to the pre-zoom image frame would be region 3. The first video is thus first played in region 1; when playback reaches the zoomed image frame, the zoomed image frame is shrunk from region 1 to region 3 within the zoom duration; finally, the second video is played in region 2.
Fig. 9 is a flowchart of another video processing method according to an exemplary embodiment. As shown in Fig. 9, step 109 may include:

Step 1091: determine multiple first vertices of the zoomed image and, for each first vertex, a corresponding second vertex at the target position;

Step 1092: determine a zoom speed of each first vertex according to the distance between the first vertex and its corresponding second vertex and the zoom duration;

Step 1093: control the zoomed image to shrink to the target position at the zoom speed of each first vertex.

In one application scenario, to control the zoomed image frame to shrink to the target position within the playback area on the terminal device, the multiple first vertices of the zoomed image and the second vertex corresponding to each first vertex at the target position can first be determined. As shown in Fig. 10, the four first vertices of the zoomed image are A, B, C, and D, the four second vertices of the target position are a, b, c, and d, and the first vertices correspond one to one with the second vertices. Then the distance between each first vertex and its corresponding second vertex is determined, and the zoom speed of that vertex is determined from the zoom duration. Taking first vertex A and its corresponding second vertex a as an example, dividing the length of segment Aa, 100 pixels, by the zoom duration of 2 s gives a zoom speed of 50 pixels per second for vertex A. Finally, the zoomed image can be controlled to shrink to the target position at the zoom speed of each first vertex. For example, starting from the current moment, first vertex A of the zoomed image is at a vertex of the terminal device's display screen; 100 ms later, vertex A has moved 5 pixels forward along segment Aa; 200 ms later, 10 pixels forward, passing through A'; and so on. Note that while the zoomed image shrinks to the target position, the region of the display screen outside the zoomed image may show a preset fill color (for example, black) or the interface of an application running in the background; the present disclosure does not specifically limit this.
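The vertex animation of steps 1091-1093 is a per-vertex linear interpolation: each corner travels toward its target corner at distance/duration. A minimal sketch (the helper names are illustrative):

```python
import math

def vertex_speed(p1, p2, duration_s):
    """Shrink speed of one vertex: distance to its target vertex
    divided by the zoom duration (pixels per second)."""
    return math.hypot(p2[0] - p1[0], p2[1] - p1[1]) / duration_s

def position_at(p1, p2, duration_s, t):
    """Linearly interpolated vertex position at time t, clamped to
    the animation's start and end."""
    f = min(max(t / duration_s, 0.0), 1.0)
    return (p1[0] + f * (p2[0] - p1[0]), p1[1] + f * (p2[1] - p1[1]))

# The example from the text: segment Aa is 100 px, zoom duration 2 s.
print(vertex_speed((0, 0), (100, 0), 2.0))     # → 50.0 px/s
print(position_at((0, 0), (100, 0), 2.0, 1.0)) # → (50.0, 0.0)
```

Applying `position_at` to all four vertices at each display refresh moves the frame from full screen to the target cropping-frame position over the zoom duration.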
综上所述，本公开首先对目标视频进行预处理，以获取目标视频中的多个目标图像帧，之后对每个目标图像帧进行识别，以得到指定对象在目标视频中的位置，最后根据每个目标图像帧中指定对象的位置，在多个目标图像帧中，选取保留图像帧，保留图像帧用于指示对目标视频中位于保留图像帧之前的图像帧进行裁剪。本公开以目标图像帧中指定对象的位置为依据，确定保留图像帧，从而指示对保留图像帧之前的图像帧进行裁剪，能够确定适用于不同目标视频的保留图像帧，减少在对目标视频进行裁剪过程中造成的信息损失。
图11是根据一示例性实施例示出的一种视频的处理装置的框图,如图11所示,该装置200包括:
预处理模块201,用于对目标视频进行预处理,以得到目标视频中的多个目标图像帧;
识别模块202,用于识别每个目标图像帧中指定对象的位置;
第一确定模块203,也可以称为“保留图像帧确定模块”,用于根据每个目标图像帧中指定对象的位置,在多个目标图像帧中确定保留图像帧,保留图像帧用于指示对目标视频中位于保留图像帧之前的图像帧进行裁剪。
在一种实现场景中,指定对象包括:人脸、文字、指定标识、或显著对象中的至少一种。预处理模块201用于:按照预设的帧间隔,对目标视频中的图像帧进行抽取,以得到多个目标图像帧。
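按照预设的帧间隔抽取目标图像帧的操作，可以用如下代码示意（以列表切片实现等间隔抽帧，帧序列的表示方式为本文假设）：

```python
def extract_target_frames(frames, interval):
    # frames：目标视频中按顺序排列的图像帧序列
    # interval：预设的帧间隔，即每 interval 帧抽取一帧
    return frames[::interval]
```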
识别模块202用于:对该目标图像帧进行过滤,以去除该目标图像帧中的边框。利用图像识别算法对过滤后的该目标图像帧进行识别,以确定该目标图像帧中指定对象的位置,图像识别算法包括:人脸识别算法、文字识别算法、指定标识识别算法、或显著性检测算法中的至少一种。
图12是根据一示例性实施例示出的另一种视频的处理装置的框图,如图12所示,该装置200还包括:
第二确定模块204,也可以称为“裁剪尺寸确定模块”,用于在根据每个目标图像帧中指定对象的位置,在多个目标图像帧中确定保留图像帧之后,根据目标视频的原始尺寸和终端设备的显示尺寸,确定裁剪尺寸,裁剪尺寸与显示尺寸匹配;
第三确定模块205,也可以称为“裁剪框确定模块”,还用于根据裁剪尺寸和预设的步进值,确定第一数量种裁剪框,每种裁剪框在目标视频中的图像帧上的位置不同,每种裁剪框均为裁剪尺寸;
第四确定模块206，也可以称为“裁剪路径确定模块”，用于根据每个待裁剪目标图像帧中指定对象的位置，和第一数量种裁剪框，确定总裁剪路径，待裁剪目标图像帧为目标视频中位于保留图像帧之前的目标图像帧，总裁剪路径包括：目标视频中位于保留图像帧之前的每个图像帧对应的裁剪框。
图13是根据一示例性实施例示出的另一种视频的处理装置的框图,如图13所示,该装置200还包括:
裁剪模块207,用于在根据每个待裁剪目标图像帧中指定对象的位置,和第一数量种裁剪框,确定总裁剪路径之后,按照总裁剪路径对目标视频中位于保留图像帧之前的每个图像帧进行裁剪,以得到由经过裁剪的图像帧组成的第一视频,和由未经过裁剪的图像帧组成的第二视频。
图14是根据一示例性实施例示出的另一种视频的处理装置的框图,如图14所示,多个目标图像帧按照在目标视频中的顺序排列,第一确定模块203可以包括:
第一确定子模块2031,用于针对每个目标图像帧,根据该目标图像帧中指定对象的位置,确定该目标图像帧对应的目标比例,和该目标图像帧对应的目标权重,目标比例用于指示指定对象占该目标图像帧的比例,目标权重为根据该目标图像帧中每个像素的权重确定的;
第二确定子模块2032,用于将对应的目标比例大于预设的比例阈值,且顺序最前的目标图像帧作为第一目标图像帧。将目标权重大于预设的权重阈值,且顺序最前的目标图像帧作为第二目标图像帧;
第三确定子模块2033,用于根据第一目标图像帧和第二目标图像帧,确定保留图像帧。
在一种应用场景中,第三确定子模块2033中确定保留图像帧的方式可以分为以下两种:
方式一,若目标视频中待选目标图像帧位于预设的最迟图像帧之前,将待选目标图像帧作为保留图像帧,待选目标图像帧为第一目标图像帧和第二目标图像帧中顺序在前的目标图像帧。
方式二,若目标视频中待选目标图像帧位于最迟图像帧之后,或者待选目标图像帧与最迟图像帧相同,将最迟图像帧作为保留图像帧。
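上述两种确定保留图像帧的方式可以合并为如下示意逻辑（以帧序号表示各图像帧在目标视频中的顺序，序号表示方式为本文假设）：

```python
def choose_retained_frame(first_idx, second_idx, latest_idx):
    # 待选目标图像帧为第一目标图像帧和第二目标图像帧中顺序在前者
    candidate = min(first_idx, second_idx)
    # 方式一：待选目标图像帧位于最迟图像帧之前，取待选目标图像帧
    # 方式二：待选目标图像帧不早于最迟图像帧，取最迟图像帧
    return candidate if candidate < latest_idx else latest_idx
```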
图15是根据一示例性实施例示出的另一种视频的处理装置的框图,如图15所示,第四确定模块206可以包括:
第四确定子模块2061,用于根据第一数量种裁剪框和待裁剪目标图像帧的第二数量,确定第三数量种裁剪路径,每种裁剪路径包括:第二数量个待裁剪目标图像帧中,每个待裁剪目标图像帧对应的裁剪框;
第五确定子模块2062,用于根据每个待裁剪目标图像帧中指定对象的位置,确定每种裁剪路径对应的裁剪损失总量,裁剪损失总量用于指示按照该种裁剪路径对第二数量个待裁剪目标图像帧进行裁剪所造成的损失大小;
第六确定子模块2063,用于将裁剪损失总量最小的一种裁剪路径作为目标裁剪路径。对目标裁剪路径进行插值,得到总裁剪路径。
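对目标裁剪路径进行插值得到总裁剪路径的一种可能做法，是在相邻目标图像帧的裁剪框位置之间做线性插值。以下实现与数据结构均为本文的示意性假设，并非原方案限定的插值方式：

```python
def interpolate_path(key_indices, key_boxes):
    # key_indices：各待裁剪目标图像帧在目标视频中的帧号（递增）
    # key_boxes：目标裁剪路径中对应的裁剪框左上角坐标 (x, y)
    # 在相邻目标图像帧之间线性插值，得到逐帧的裁剪框位置（总裁剪路径）
    path = []
    for k in range(len(key_indices) - 1):
        i0, i1 = key_indices[k], key_indices[k + 1]
        (x0, y0), (x1, y1) = key_boxes[k], key_boxes[k + 1]
        for f in range(i0, i1):
            r = (f - i0) / (i1 - i0)
            path.append((x0 + (x1 - x0) * r, y0 + (y1 - y0) * r))
    path.append(key_boxes[-1])
    return path
```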
在一种应用场景中,第五确定子模块2062可以用于执行以下步骤:
步骤1)根据每个待裁剪目标图像帧中指定对象的位置,确定该待裁剪目标图像帧对应的裁剪损失量,裁剪损失量用于指示按照第一裁剪路径中包括的该待裁剪目标图像帧对应的裁剪框,对该待裁剪目标图像帧进行裁剪所造成的损失大小,第一裁剪路径为第三数量种裁剪路径中的任一种裁剪路径;
步骤2)对每个待裁剪目标图像帧对应的裁剪损失量求和,得到第一裁剪路径对应的裁剪损失总量。
在一些实施例中,步骤1)可以包括:
步骤1a)根据裁剪后图像帧中指定对象占裁剪后图像帧的比例,与该待裁剪目标图像帧中指定对象占该待裁剪目标图像帧的比例,确定该待裁剪目标图像帧对应的重要性评分,裁剪后图像帧为按照第一裁剪路径中包括的该待裁剪目标图像帧对应的裁剪框,对该待裁剪目标图像帧进行裁剪后得到的图像帧;
步骤1b)根据裁剪后图像帧中指定对象的完整程度,确定该待裁剪目标图像帧对应的完整性评分;
步骤1c)根据第一裁剪框和第二裁剪框之间的距离,确定该待裁剪目标图像帧对应的转移评分,第一裁剪框为第一裁剪路径中包括的该待裁剪目标图像帧对应的裁剪框,第二裁剪框为第一裁剪路径中包括的,该待裁剪目标图像帧的上一待裁剪目标图像帧对应的裁剪框;
步骤1d)根据该待裁剪目标图像帧对应的重要性评分、完整性评分和转移评分,确定该待裁剪目标图像帧对应的裁剪损失量。
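步骤1a)至1d)中由三项评分得到单帧裁剪损失量的过程，可以用如下代码示意。此处以加权和作为组合方式，具体组合形式以原公式为准；权重w1、w2、w3对应文中的λ_1、λ_2、λ_3：

```python
def crop_loss(importance, integrity, transfer, w1, w2, w3):
    # importance：重要性评分；integrity：完整性评分；transfer：转移评分
    # 以三项评分的加权和作为该待裁剪目标图像帧对应的裁剪损失量（示意）
    return w1 * importance + w2 * integrity + w3 * transfer
```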
图16是根据一示例性实施例示出的另一种视频的处理装置的框图,如图16所示,该装置200还包括:
全屏控制模块208,用于在按照总裁剪路径对目标视频中位于保留图像帧之前的每个图像帧进行裁剪,以得到由经过裁剪的图像帧组成的第一视频,和由未经过裁剪的图像帧组成的第二视频之后,控制第一视频在终端设备上以全屏模式进行播放;
缩放控制模块209,用于在播放至缩放图像帧时,控制缩放图像帧在预设的缩放时长内,缩小至终端设备上的播放区域内的目标位置,缩放图像帧为第一视频中的最后一个图像帧,目标位置为若缩放前图像帧按照原始尺寸在播放区域内播放,总裁剪路径中包括的缩放前图像帧对应的裁剪框的位置,缩放前图像帧为缩放图像帧对应在目标视频中,未经过裁剪的图像帧;
原始尺寸控制模块210,用于控制第二视频在播放区域内,按照原始尺寸进行播放。
在一种实现方式中,缩放控制模块209可以用于:
确定缩放图像的多个第一顶点,和每个第一顶点对应在目标位置的第二顶点。根据每个第一顶点与对应的第二顶点之间的距离,和缩放时长,确定该第一顶点的缩放速度。按照每个第一顶点的缩放速度,控制缩放图像缩小至目标位置。
关于上述实施例中的装置,其中各个模块执行操作的具体方式已经在有关该方法的实施例中进行了详细描述,此处将不做详细阐述说明。
综上所述，本公开首先对目标视频进行预处理，以获取目标视频中的多个目标图像帧，之后对每个目标图像帧进行识别，以得到指定对象在目标视频中的位置，最后根据每个目标图像帧中指定对象的位置，在多个目标图像帧中，选取保留图像帧，保留图像帧用于指示对目标视频中位于保留图像帧之前的图像帧进行裁剪。本公开以目标图像帧中指定对象的位置为依据，确定保留图像帧，从而指示对保留图像帧之前的图像帧进行裁剪，能够确定适用于不同目标视频的保留图像帧，减少在对目标视频进行裁剪过程中造成的信息损失。
下面参考图17，其示出了适于用来实现本公开实施例的电子设备（即上述视频的处理方法的执行主体）300的结构示意图。本公开实施例中的电子设备可以是服务器，该服务器例如可以是本地服务器或者云服务器。电子设备也可以是终端设备，终端设备可以包括但不限于诸如移动电话、笔记本电脑、数字广播接收器、PDA（个人数字助理）、PAD（平板电脑）、PMP（便携式多媒体播放器）、车载终端（例如车载导航终端）等等的移动终端以及诸如数字TV、台式计算机等等的固定终端。图17示出的电子设备仅仅是一个示例，不应对本公开实施例的功能和使用范围带来任何限制。
如图17所示,电子设备300可以包括处理装置(例如中央处理器、图形处理器等)301,其可以根据存储在只读存储器(ROM)302中的程序或者从存储装置308加载到随机访问存储器(RAM)303中的程序而执行各种适当的动作和处理。在RAM 303中,还存储有电子设备300操作所需的各种程序和数据。处理装置301、ROM 302以及RAM 303通过总线304彼此相连。输入/输出(I/O)接口305也连接至总线304。
通常,以下装置可以连接至I/O接口305:包括例如触摸屏、触摸板、键盘、鼠标、摄像头、麦克风、加速度计、陀螺仪等的输入装置306;包括例如液晶显示器(LCD)、扬声器、振动器等的输出装置307;包括例如磁带、硬盘等的存储装置308;以及通信装置309。通信装置309可以允许电子设备300与其他设备进行无线或有线通信以交换数据。虽然图17示出了具有各种装置的电子设备300,但是应理解的是,并不要求实施或具备所有示出的装置。可以替代地实施或具备更多或更少的装置。
特别地,根据本公开的实施例,上文参考流程图描述的过程可以被实现为计算机软件程序。例如,本公开的实施例包括一种计算机程序产品,其包括承载在非暂态计算机可读介质上的计算机程序,该计算机程序包含用于执行流程图所示的方法的程序代码。在这样的实施例中,该计算机程序可以通过通信装置309从网络上被下载和安装,或者从存储装置308被安装,或者从ROM 302被安装。在该计算机程序被处理装置301执行时,执行本公开实施例的方法中限定的上述功能。
需要说明的是,本公开上述的计算机可读介质可以是计算机可读信号介质或者计算机可读存储介质或者是上述两者的任意组合。计算机可读存储介质例如可以是——但不限于——电、磁、光、电磁、红外线、或半导体的系统、装置或器件,或者任意以上的组合。计算机可读存储介质的更具体的例子可以包括但不限于:具有一个或多个导线的电连接、便携式计算机磁盘、硬盘、随机访问存储器(RAM)、只读存储器(ROM)、可擦式可编程只读存储器(EPROM或闪存)、光纤、便携式紧凑磁盘只读存储器(CD-ROM)、光存储器件、磁存储器件、或者上述的任意合适的组合。在本公开中,计算机可读存储介质可以是任何包含或存储程序的有形介质,该程序可以被指令执行系统、装置或者器件使用或者与其结合使用。而在本公开中,计算机可读信号介质可以包括在基带中或者作为载波一部分传播的数据信号,其中承载了计算机可读的程序代码。这种传播的数据信号可以采用多种形式,包括但不限于电磁信号、光信号或上述的任意合适的组合。计算机可读信号介质还可以是计算机可读存储介质以外的任何计算机可读介质,该计算机可读信号介质可以发送、传播或者传输用于由指令执行系统、装置或者器件使用或者与其结合使用的程序。计算机可读介质上包含的程序代码可以用任何适当的介质传输,包括但不限于:电线、光缆、RF(射频)等等,或者上述的任意合适的组合。
在一些实施方式中,终端设备、服务器可以利用诸如HTTP(HyperText Transfer Protocol,超文本传输协议)之类的任何当前已知或未来研发的网络协议进行通信,并且可以与任意形式或介质的数字数据通信(例如,通信网络)互连。通信网络的示例包括局域网(“LAN”),广域网(“WAN”),网际网(例如,互联网)以及端对端网络(例如,ad hoc端对端网络),以及任何当前已知或未来研发的网络。
上述计算机可读介质可以是上述电子设备中所包含的;也可以是单独存在,而未装配入该电子设备中。
上述计算机可读介质承载有一个或者多个程序,当上述一个或者多个程序被该电子设备执行时,使得该电子设备:对目标视频进行预处理,以得到所述目标视频中的多个目标图像帧;识别每个所述目标图像帧中指定对象的位置;根据每个所述目标图像帧中所述指定对象的位置,在所述多个目标图像帧中确定保留图像帧,所述保留图像帧用于指示对所述目标视频中位于所述保留图像帧之前的图像帧进行裁剪。
可以以一种或多种程序设计语言或其组合来编写用于执行本公开的操作的计算机程序代码,上述程序设计语言包括但不限于面向对象的程序设计语言—诸如Java、Smalltalk、C++,还包括常规的过程式程序设计语言——诸如“C”语言或类似的程序设计语言。程序代码可以完全地在用户计算机上执行、部分地在用户计算机上执行、作为一个独立的软件包执行、部分在用户计算机上部分在远程计算机上执行、或者完全在远程计算机或服务器上执行。在涉及远程计算机的情形中,远程计算机可以通过任意种类的网络——包括局域网(LAN)或广域网(WAN)——连接到用户计算机,或者,可以连接到外部计算机(例如利用因特网服务提供商来通过因特网连接)。
附图中的流程图和框图,图示了按照本公开各种实施例的系统、方法和计算机程序产品的可能实现的体系架构、功能和操作。在这点上,流程图或框图中的每个方框可以代表一个模块、程序段、或代码的一部分,该模块、程序段、或代码的一部分包含一个或多个用于实现规定的逻辑功能的可执行指令。也应当注意,在有些作为替换的实现中,方框中所标注的功能也可以以不同于附图中所标注的顺序发生。例如,两个接连地表示的方框实际上可以基本并行地执行,它们有时也可以按相反的顺序执行,这依所涉及的功能而定。也要注意的是,框图和/或流程图中的每个方框、以及框图和/或流程图中的方框的组合,可以用执行规定的功能或操作的专用的基于硬件的系统来实现,或者可以用专用硬件与计算机指令的组合来实现。
描述于本公开实施例中所涉及到的模块可以通过软件的方式实现，也可以通过硬件的方式来实现。其中，模块的名称在某种情况下并不构成对该模块本身的限定，例如，预处理模块还可以被描述为“获取目标图像帧的模块”。
本文中以上描述的功能可以至少部分地由一个或多个硬件逻辑部件来执行。例如,非限制性地,可以使用的示范类型的硬件逻辑部件包括:现场可编程门阵列(FPGA)、专用集成电路(ASIC)、专用标准产品(ASSP)、片上系统(SOC)、复杂可编程逻辑设备(CPLD)等等。
在本公开的上下文中,机器可读介质可以是有形的介质,其可以包含或存储以供指令执行系统、装置或设备使用或与指令执行系统、装置或设备结合地使用的程序。机器可读介质可以是机器可读信号介质或机器可读储存介质。机器可读介质可以包括但不限于电子的、磁性的、光学的、电磁的、红外的、或半导体系统、装置或设备,或者上述内容的任何合适组合。机器可读存储介质的更具体示例会包括基于一个或多个线的电气连接、便携式计算机盘、硬盘、随机存取存储器(RAM)、只读存储器(ROM)、可擦除可编程只读存储器(EPROM或快闪存储器)、光纤、便捷式紧凑盘只读存储器(CD-ROM)、光学储存设备、磁储存设备、或上述内容的任何合适组合。
根据本公开的一个或多个实施例,提供了一种视频的处理方法,包括:对目标视频进行预处理,以得到所述目标视频中的多个目标图像帧;识别每个所述目标图像帧中指定对象的位置;根据每个所述目标图像帧中所述指定对象的位置,在所述多个目标图像帧中确定保留图像帧,所述保留图像帧用于指示对所述目标视频中位于所述保留图像帧之前的图像帧进行裁剪。
根据本公开的一个或多个实施例,在本公开提供的视频的处理方法中,所述对目标视频进行预处理,以得到所述目标视频中的多个目标图像帧,包括:按照预设的帧间隔,对所述目标视频中的图像帧进行抽取,以得到所述多个目标图像帧;所述指定对象包括:人脸、文字、指定标识、或显著对象中的至少一种;所述识别每个所述目标图像帧中指定对象的位置,包括:对该目标图像帧进行过滤,以去除该目标图像帧中的边框;利用图像识别算法对过滤后的该目标图像帧进行识别,以确定该目标图像帧中所述指定对象的位置,所述图像识别算法包括:人脸识别算法、文字识别算法、指定标识识别算法、或显著性检测算法中的至少一种。
根据本公开的一个或多个实施例,在本公开提供的视频的处理方法中,所述方法还包括:在所述根据每个所述目标图像帧中所述指定对象的位置,在所述多个目标图像帧中确定保留图像帧之后,根据所述目标视频的原始尺寸和终端设备的显示尺寸, 确定裁剪尺寸,所述裁剪尺寸与所述显示尺寸匹配;根据所述裁剪尺寸和预设的步进值,确定第一数量种裁剪框,每种裁剪框在所述目标视频中的图像帧上的位置不同,每种裁剪框均为所述裁剪尺寸;根据每个待裁剪目标图像帧中所述指定对象的位置,和第一数量种裁剪框,确定总裁剪路径,所述待裁剪目标图像帧为所述目标视频中位于所述保留图像帧之前的所述目标图像帧,所述总裁剪路径包括:所述目标视频中位于所述保留图像帧之前的每个图像帧对应的裁剪框。
根据本公开的一个或多个实施例,在本公开提供的视频的处理方法中,所述方法还包括:在所述根据每个待裁剪目标图像帧中所述指定对象的位置,和第一数量种裁剪框,确定总裁剪路径之后,按照所述总裁剪路径对所述目标视频中位于所述保留图像帧之前的每个图像帧进行裁剪,以得到由经过裁剪的图像帧组成的第一视频,和由未经过裁剪的图像帧组成的第二视频。
根据本公开的一个或多个实施例,在本公开提供的视频的处理方法中,多个所述目标图像帧按照在所述目标视频中的顺序排列,所述根据每个所述目标图像帧中所述指定对象的位置,在所述多个目标图像帧中确定保留图像帧,包括:针对每个所述目标图像帧,根据该目标图像帧中所述指定对象的位置,确定该目标图像帧对应的目标比例,和该目标图像帧对应的目标权重,所述目标比例用于指示所述指定对象占该目标图像帧的比例,所述目标权重为根据该目标图像帧中每个像素的权重确定的;将对应的所述目标比例大于预设的比例阈值,且顺序最前的所述目标图像帧作为第一目标图像帧;将所述目标权重大于预设的权重阈值,且顺序最前的所述目标图像帧作为第二目标图像帧;根据所述第一目标图像帧和所述第二目标图像帧,确定所述保留图像帧。
根据本公开的一个或多个实施例,在本公开提供的视频的处理方法中,所述根据所述第一目标图像帧和所述第二目标图像帧,确定所述保留图像帧,包括:若所述目标视频中待选目标图像帧位于预设的最迟图像帧之前,将所述待选目标图像帧作为所述保留图像帧,所述待选目标图像帧为所述第一目标图像帧和所述第二目标图像帧中顺序在前的目标图像帧;若所述目标视频中所述待选目标图像帧位于所述最迟图像帧之后,或者所述待选目标图像帧与所述最迟图像帧相同,将所述最迟图像帧作为所述保留图像帧。
根据本公开的一个或多个实施例，在本公开提供的视频的处理方法中，所述根据每个待裁剪目标图像帧中所述指定对象的位置，和第一数量种裁剪框，确定总裁剪路径，包括：根据第一数量种裁剪框和所述待裁剪目标图像帧的第二数量，确定第三数量种裁剪路径，每种裁剪路径包括：第二数量个所述待裁剪目标图像帧中，每个所述待裁剪目标图像帧对应的裁剪框；根据每个所述待裁剪目标图像帧中所述指定对象的位置，确定每种裁剪路径对应的裁剪损失总量，所述裁剪损失总量用于指示按照该种裁剪路径对第二数量个所述待裁剪目标图像帧进行裁剪所造成的损失大小；将裁剪损失总量最小的一种裁剪路径作为目标裁剪路径；对所述目标裁剪路径进行插值，得到所述总裁剪路径。
根据本公开的一个或多个实施例,在本公开提供的视频的处理方法中,所述根据每个所述待裁剪目标图像帧中所述指定对象的位置,确定每种裁剪路径对应的裁剪损失总量,包括:根据每个所述待裁剪目标图像帧中所述指定对象的位置,确定该待裁剪目标图像帧对应的裁剪损失量,所述裁剪损失量用于指示按照第一裁剪路径中包括的该待裁剪目标图像帧对应的裁剪框,对该待裁剪目标图像帧进行裁剪所造成的损失大小,所述第一裁剪路径为第三数量种裁剪路径中的任一种裁剪路径;对每个所述待裁剪目标图像帧对应的裁剪损失量求和,得到所述第一裁剪路径对应的裁剪损失总量。
根据本公开的一个或多个实施例,在本公开提供的视频的处理方法中,所述根据每个所述待裁剪目标图像帧中所述指定对象的位置,确定该待裁剪目标图像帧对应的裁剪损失量,包括:根据裁剪后图像帧中所述指定对象占所述裁剪后图像帧的比例,与该待裁剪目标图像帧中所述指定对象占该待裁剪目标图像帧的比例,确定该待裁剪目标图像帧对应的重要性评分,其中,所述裁剪后图像帧为按照所述第一裁剪路径中包括的该待裁剪目标图像帧对应的裁剪框,对该待裁剪目标图像帧进行裁剪后得到的图像帧;根据所述裁剪后图像帧中所述指定对象的完整程度,确定该待裁剪目标图像帧对应的完整性评分;根据第一裁剪框和第二裁剪框之间的距离,确定该待裁剪目标图像帧对应的转移评分,所述第一裁剪框为所述第一裁剪路径中包括的该待裁剪目标图像帧对应的裁剪框,所述第二裁剪框为所述第一裁剪路径中包括的,该待裁剪目标图像帧的上一待裁剪目标图像帧对应的裁剪框;根据该待裁剪目标图像帧对应的重要性评分、完整性评分和转移评分,确定该待裁剪目标图像帧对应的裁剪损失量。
根据本公开的一个或多个实施例，在本公开提供的视频的处理方法中，所述方法还包括：在所述按照所述总裁剪路径对所述目标视频中位于所述保留图像帧之前的每个图像帧进行裁剪，以得到由经过裁剪的图像帧组成的第一视频，和由未经过裁剪的图像帧组成的第二视频之后，控制所述第一视频在所述终端设备上以全屏模式进行播放；在播放至缩放图像帧时，控制所述缩放图像帧在预设的缩放时长内，缩小至所述终端设备上的播放区域内的目标位置，其中，所述缩放图像帧为所述第一视频中的最后一个图像帧；若缩放前图像帧按照所述原始尺寸在所述播放区域内播放，所述目标位置为所述总裁剪路径中包括的所述缩放前图像帧对应的裁剪框的位置；所述缩放前图像帧为所述缩放图像帧对应在目标视频中，未经过裁剪的图像帧；控制所述第二视频在所述播放区域内，按照所述原始尺寸进行播放。
根据本公开的一个或多个实施例,在本公开提供的视频的处理方法中,所述控制所述缩放图像帧在预设的缩放时长内,缩小至所述终端设备上的播放区域内的目标位置,包括:确定所述缩放图像的多个第一顶点,和每个所述第一顶点对应在所述目标位置的第二顶点;根据每个所述第一顶点与对应的所述第二顶点之间的距离,和所述缩放时长,确定该第一顶点的缩放速度;按照每个所述第一顶点的缩放速度,控制所述缩放图像缩小至所述目标位置。
根据本公开的一个或多个实施例,提供了一种视频的处理装置,包括:预处理模块,用于对目标视频进行预处理,以得到所述目标视频中的多个目标图像帧;识别模块,用于识别每个所述目标图像帧中指定对象的位置;第一确定模块,用于根据每个所述目标图像帧中所述指定对象的位置,在所述多个目标图像帧中确定保留图像帧,所述保留图像帧用于指示对所述目标视频中位于所述保留图像帧之前的图像帧进行裁剪。
根据本公开的一个或多个实施例,提供了一种计算机可读存储介质,其上存储有计算机程序,该程序被处理装置执行时实现示例1至示例11中所述方法的步骤。
根据本公开的一个或多个实施例,提供了一种电子设备,包括:一个或多个处理器;和存储器,用于存储一个或多个程序;当所述一个或多个程序被所述一个或多个处理器执行,使得所述一个或多个处理器实现前述任意一种视频处理方法。
根据本公开的一个或多个实施例,提供了一种计算机程序,包括:指令,所述指令当由处理器执行时使所述处理器执行前述任意一种视频处理方法。
根据本公开的一个或多个实施例,提供了一种计算机程序产品,包括指令,所述指令当由处理器执行时使所述处理器执行前述任意一种视频处理方法。
以上描述仅为本公开的较佳实施例以及对所运用技术原理的说明。本领域技术人员应当理解，本公开中所涉及的公开范围，并不限于上述技术特征的特定组合而成的技术方案，同时也应涵盖在不脱离上述公开构思的情况下，由上述技术特征或其等同特征进行任意组合而形成的其它技术方案。例如上述特征与本公开中公开的（但不限于）具有类似功能的技术特征进行互相替换而形成的技术方案。
此外,虽然采用特定次序描绘了各操作,但是这不应当理解为要求这些操作以所示出的特定次序或以顺序次序执行来执行。在一定环境下,多任务和并行处理可能是有利的。同样地,虽然在上面论述中包含了若干具体实现细节,但是这些不应当被解释为对本公开的范围的限制。在单独的实施例的上下文中描述的某些特征还可以组合地实现在单个实施例中。相反地,在单个实施例的上下文中描述的各种特征也可以单独地或以任何合适的子组合的方式实现在多个实施例中。
尽管已经采用特定于结构特征和/或方法逻辑动作的语言描述了本主题，但是应当理解所附权利要求书中所限定的主题未必局限于上面描述的特定特征或动作。相反，上面所描述的特定特征和动作仅仅是实现权利要求书的示例形式。

Claims (16)

  1. 一种视频的处理方法,包括:
    对目标视频进行预处理,以得到所述目标视频中的多个目标图像帧;
    识别每个所述目标图像帧中指定对象的位置;
    根据每个所述目标图像帧中所述指定对象的位置,在所述多个目标图像帧中确定保留图像帧,所述保留图像帧用于指示对所述目标视频中位于所述保留图像帧之前的图像帧进行裁剪。
  2. 根据权利要求1所述的方法,其中,
    所述对目标视频进行预处理,以得到所述目标视频中的多个目标图像帧,包括:
    按照预设的帧间隔,对所述目标视频中的图像帧进行抽取,以得到所述多个目标图像帧;
    所述指定对象包括:人脸、文字、指定标识、或显著对象中的至少一种;
    所述识别每个所述目标图像帧中指定对象的位置,包括:
    对所述目标图像帧进行过滤,以去除所述目标图像帧中的边框;
    利用图像识别算法对过滤后的所述目标图像帧进行识别,以确定所述目标图像帧中所述指定对象的位置,所述图像识别算法包括:人脸识别算法、文字识别算法、指定标识识别算法、或显著性检测算法中的至少一种。
  3. 根据权利要求1所述的方法,还包括:
    在所述根据每个所述目标图像帧中所述指定对象的位置,在所述多个目标图像帧中确定保留图像帧之后,根据所述目标视频的原始尺寸和终端设备的显示尺寸,确定裁剪尺寸,所述裁剪尺寸与所述显示尺寸匹配;
    根据所述裁剪尺寸和预设的步进值,确定第一数量种裁剪框,每种裁剪框在所述目标视频中的图像帧上的位置不同,每种裁剪框均为所述裁剪尺寸;
    根据每个待裁剪目标图像帧中所述指定对象的位置,和第一数量种裁剪框,确定总裁剪路径,所述待裁剪目标图像帧为所述目标视频中位于所述保留图像帧之前的所述目标图像帧,所述总裁剪路径包括:所述目标视频中位于所述保留图像帧之前的每个图像帧对应的裁剪框。
  4. 根据权利要求3所述的方法,还包括:
    在所述根据每个待裁剪目标图像帧中所述指定对象的位置,和第一数量种裁剪框,确定总裁剪路径之后,按照所述总裁剪路径对所述目标视频中位于所述保留图像帧之前的每个图像帧进行裁剪,以得到由经过裁剪的图像帧组成的第一视频,和由未经过裁剪的图像帧组成的第二视频。
  5. 根据权利要求1-4中任一项所述的方法,其中,多个所述目标图像帧按照在所述目标视频中的顺序排列,所述根据每个所述目标图像帧中所述指定对象的位置,在所述多个目标图像帧中确定保留图像帧,包括:
    针对每个所述目标图像帧,根据所述目标图像帧中所述指定对象的位置,确定所述目标图像帧对应的目标比例,和所述目标图像帧对应的目标权重,所述目标比例用于指示所述指定对象占所述目标图像帧的比例,所述目标权重为根据所述目标图像帧中每个像素的权重确定的;
    将对应的所述目标比例大于预设的比例阈值,且顺序最前的所述目标图像帧作为第一目标图像帧;
    将所述目标权重大于预设的权重阈值,且顺序最前的所述目标图像帧作为第二目标图像帧;
    根据所述第一目标图像帧和所述第二目标图像帧,确定所述保留图像帧。
  6. 根据权利要求5所述的方法,其中,所述根据所述第一目标图像帧和所述第二目标图像帧,确定所述保留图像帧,包括:
    若所述目标视频中待选目标图像帧位于预设的最迟图像帧之前,将所述待选目标图像帧作为所述保留图像帧,所述待选目标图像帧为所述第一目标图像帧和所述第二目标图像帧中顺序在前的目标图像帧;
    若所述目标视频中所述待选目标图像帧位于所述最迟图像帧之后,或者所述待选目标图像帧与所述最迟图像帧相同,将所述最迟图像帧作为所述保留图像帧。
  7. 根据权利要求3或4所述的方法,其中,所述根据每个待裁剪目标图像帧中所述指定对象的位置,和第一数量种裁剪框,确定总裁剪路径,包括:
    根据第一数量种裁剪框和所述待裁剪目标图像帧的第二数量,确定第三数量种裁剪路径,每种裁剪路径包括:第二数量个所述待裁剪目标图像帧中,每个所述待裁剪目标图像帧对应的裁剪框;
    根据每个所述待裁剪目标图像帧中所述指定对象的位置,确定每种裁剪路径对应的裁剪损失总量,所述裁剪损失总量用于指示按照所述裁剪路径对第二数量个所述待裁剪目标图像帧进行裁剪所造成的损失大小;
    将裁剪损失总量最小的一种裁剪路径作为目标裁剪路径;
    对所述目标裁剪路径进行插值,得到所述总裁剪路径。
  8. 根据权利要求7所述的方法,其中,所述根据每个所述待裁剪目标图像帧中所述指定对象的位置,确定每种裁剪路径对应的裁剪损失总量,包括:
    根据每个所述待裁剪目标图像帧中所述指定对象的位置,确定所述待裁剪目标图像帧对应的裁剪损失量,所述裁剪损失量用于指示按照第一裁剪路径中包括的所述待裁剪目标图像帧对应的裁剪框,对所述待裁剪目标图像帧进行裁剪所造成的损失大小,所述第一裁剪路径为第三数量种裁剪路径中的任一种裁剪路径;
    对每个所述待裁剪目标图像帧对应的裁剪损失量求和,得到所述第一裁剪路径对应的裁剪损失总量。
  9. 根据权利要求8所述的方法,其中,所述根据每个所述待裁剪目标图像帧中所述指定对象的位置,确定所述待裁剪目标图像帧对应的裁剪损失量,包括:
    根据裁剪后图像帧中所述指定对象占所述裁剪后图像帧的比例,与所述待裁剪目标图像帧中所述指定对象占所述待裁剪目标图像帧的比例,确定所述待裁剪目标图像帧对应的重要性评分,其中,所述裁剪后图像帧为按照所述第一裁剪路径中包括的所述待裁剪目标图像帧对应的裁剪框,对所述待裁剪目标图像帧进行裁剪后得到的图像帧;
    根据所述裁剪后图像帧中所述指定对象的完整程度,确定所述待裁剪目标图像帧对应的完整性评分;
    根据第一裁剪框和第二裁剪框之间的距离，确定所述待裁剪目标图像帧对应的转移评分，所述第一裁剪框为所述第一裁剪路径中包括的所述待裁剪目标图像帧对应的裁剪框，所述第二裁剪框为所述第一裁剪路径中包括的，所述待裁剪目标图像帧的上一待裁剪目标图像帧对应的裁剪框；
    根据所述待裁剪目标图像帧对应的重要性评分、完整性评分和转移评分,确定所述待裁剪目标图像帧对应的裁剪损失量。
  10. 根据权利要求4所述的方法,还包括:
    在所述按照所述总裁剪路径对所述目标视频中位于所述保留图像帧之前的每个图像帧进行裁剪,以得到由经过裁剪的图像帧组成的第一视频,和由未经过裁剪的图像帧组成的第二视频之后,控制所述第一视频在所述终端设备上以全屏模式进行播放;
    在播放至缩放图像帧时,控制所述缩放图像帧在预设的缩放时长内,缩小至所述终端设备上的播放区域内的目标位置,其中,所述缩放图像帧为所述第一视频中的最后一个图像帧;若缩放前图像帧按照所述原始尺寸在所述播放区域内播放,所述目标位置为所述总裁剪路径中包括的所述缩放前图像帧对应的裁剪框的位置;所述缩放前图像帧为所述缩放图像帧对应在目标视频中,未经过裁剪的图像帧;
    控制所述第二视频在所述播放区域内,按照所述原始尺寸进行播放。
  11. 根据权利要求10所述的方法,其中,所述控制所述缩放图像帧在预设的缩放时长内,缩小至所述终端设备上的播放区域内的目标位置,包括:
    确定所述缩放图像的多个第一顶点,和每个所述第一顶点对应在所述目标位置的第二顶点;
    根据每个所述第一顶点与对应的所述第二顶点之间的距离,和所述缩放时长,确定所述第一顶点的缩放速度;
    按照每个所述第一顶点的缩放速度,控制所述缩放图像缩小至所述目标位置。
  12. 一种视频的处理装置,包括:
    预处理模块,用于对目标视频进行预处理,以得到所述目标视频中的多个目标图像帧;
    识别模块,用于识别每个所述目标图像帧中指定对象的位置;
    第一确定模块,用于根据每个所述目标图像帧中所述指定对象的位置,在所述多个目标图像帧中确定保留图像帧,所述保留图像帧用于指示对所述目标视频中位于所述保留图像帧之前的图像帧进行裁剪。
  13. 一种计算机可读存储介质,其上存储有计算机程序,该程序被处理装置执行时实现权利要求1-11中任一项所述方法的步骤。
  14. 一种电子设备,包括:
    一个或多个处理器;和
    存储器,用于存储一个或多个程序;
    当所述一个或多个程序被所述一个或多个处理器执行,使得所述一个或多个处理器实现如权利要求1-11中任一项所述的视频处理方法。
  15. 一种计算机程序,包括:
    指令,所述指令当由处理器执行时使所述处理器执行根据权利要求1-11中任一项所述的视频处理方法。
  16. 一种计算机程序产品,包括指令,所述指令当由处理器执行时使所述处理器执行根据权利要求1-11中任一项所述的视频处理方法。
PCT/CN2021/130875 2020-11-18 2021-11-16 视频的处理方法、装置、可读介质和电子设备 WO2022105740A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US18/253,357 US11922597B2 (en) 2020-11-18 2021-11-16 Video processing method and apparatus, readable medium, and electronic device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011298813.5A CN112423021B (zh) 2020-11-18 2020-11-18 视频的处理方法、装置、可读介质和电子设备
CN202011298813.5 2020-11-18

Publications (1)

Publication Number Publication Date
WO2022105740A1 true WO2022105740A1 (zh) 2022-05-27

Family

ID=74773539

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/130875 WO2022105740A1 (zh) 2020-11-18 2021-11-16 视频的处理方法、装置、可读介质和电子设备

Country Status (3)

Country Link
US (1) US11922597B2 (zh)
CN (1) CN112423021B (zh)
WO (1) WO2022105740A1 (zh)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112423021B (zh) * 2020-11-18 2022-12-06 北京有竹居网络技术有限公司 视频的处理方法、装置、可读介质和电子设备
CN113840159B (zh) * 2021-09-26 2024-07-16 北京沃东天骏信息技术有限公司 视频处理方法、装置、计算机系统及可读存储介质
CN113660516B (zh) * 2021-10-19 2022-01-25 北京易真学思教育科技有限公司 视频显示方法、装置、设备及介质
CN114627036B (zh) * 2022-03-14 2023-10-27 北京有竹居网络技术有限公司 多媒体资源的处理方法、装置、可读介质和电子设备
CN114584832B (zh) * 2022-03-16 2024-03-08 中信建投证券股份有限公司 视频自适应多尺寸动态播放方法及装置

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060257048A1 (en) * 2005-05-12 2006-11-16 Xiaofan Lin System and method for producing a page using frames of a video stream
US20090251594A1 (en) * 2008-04-02 2009-10-08 Microsoft Corporation Video retargeting
JP2014123908A (ja) * 2012-12-21 2014-07-03 Jvc Kenwood Corp 画像処理装置、画像切り出し方法、及びプログラム
CN110189378A (zh) * 2019-05-23 2019-08-30 北京奇艺世纪科技有限公司 一种视频处理方法、装置及电子设备
CN111881755A (zh) * 2020-06-28 2020-11-03 腾讯科技(深圳)有限公司 一种视频帧序列的裁剪方法及装置
CN112423021A (zh) * 2020-11-18 2021-02-26 北京有竹居网络技术有限公司 视频的处理方法、装置、可读介质和电子设备

Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2196930C (en) * 1997-02-06 2005-06-21 Nael Hirzalla Video sequence recognition
JP4226730B2 (ja) * 1999-01-28 2009-02-18 株式会社東芝 物体領域情報生成方法及び物体領域情報生成装置並びに映像情報処理方法及び情報処理装置
US8085302B2 (en) * 2005-11-21 2011-12-27 Microsoft Corporation Combined digital and mechanical tracking of a person or object using a single video camera
US8209733B2 (en) * 2008-05-28 2012-06-26 Broadcom Corporation Edge device that enables efficient delivery of video to handheld device
CN102541494B (zh) * 2010-12-30 2016-01-06 中国科学院声学研究所 一种面向显示终端的视频尺寸转换系统与方法
US8743222B2 (en) * 2012-02-14 2014-06-03 Nokia Corporation Method and apparatus for cropping and stabilization of video images
CN106797499A (zh) 2014-10-10 2017-05-31 索尼公司 编码装置和方法、再现装置和方法以及程序
US9973711B2 (en) * 2015-06-29 2018-05-15 Amazon Technologies, Inc. Content-based zooming and panning for video curation
CN106231399A (zh) 2016-08-01 2016-12-14 乐视控股(北京)有限公司 视频分割方法、设备以及系统
EP3482286A1 (en) * 2016-11-17 2019-05-15 Google LLC Media rendering with orientation metadata
EP3340104B1 (en) * 2016-12-21 2023-11-29 Axis AB A method for generating alerts in a video surveillance system
CN109068150A (zh) * 2018-08-07 2018-12-21 深圳市创梦天地科技有限公司 一种视频的精彩画面提取方法、终端及计算机可读介质
CN111010590B (zh) * 2018-10-08 2022-05-17 阿里巴巴(中国)有限公司 一种视频裁剪方法及装置
CN111652043A (zh) * 2020-04-15 2020-09-11 北京三快在线科技有限公司 对象状态识别方法、装置、图像采集设备及存储介质

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060257048A1 (en) * 2005-05-12 2006-11-16 Xiaofan Lin System and method for producing a page using frames of a video stream
US20090251594A1 (en) * 2008-04-02 2009-10-08 Microsoft Corporation Video retargeting
JP2014123908A (ja) * 2012-12-21 2014-07-03 Jvc Kenwood Corp 画像処理装置、画像切り出し方法、及びプログラム
CN110189378A (zh) * 2019-05-23 2019-08-30 北京奇艺世纪科技有限公司 一种视频处理方法、装置及电子设备
CN111881755A (zh) * 2020-06-28 2020-11-03 腾讯科技(深圳)有限公司 一种视频帧序列的裁剪方法及装置
CN112423021A (zh) * 2020-11-18 2021-02-26 北京有竹居网络技术有限公司 视频的处理方法、装置、可读介质和电子设备

Also Published As

Publication number Publication date
CN112423021A (zh) 2021-02-26
CN112423021B (zh) 2022-12-06
US11922597B2 (en) 2024-03-05
US20230394625A1 (en) 2023-12-07

Similar Documents

Publication Publication Date Title
WO2022105740A1 (zh) 视频的处理方法、装置、可读介质和电子设备
US11443438B2 (en) Network module and distribution method and apparatus, electronic device, and storage medium
CN112954450B (zh) 视频处理方法、装置、电子设备和存储介质
CN112101305B (zh) 多路图像处理方法、装置及电子设备
CN112561840B (zh) 视频裁剪方法、装置、存储介质及电子设备
CN107084740B (zh) 一种导航方法和装置
CN112182299B (zh) 一种视频中精彩片段的获取方法、装置、设备和介质
CN110781823B (zh) 录屏检测方法、装置、可读介质及电子设备
CN104918107A (zh) 视频文件的标识处理方法及装置
JP7331146B2 (ja) サブタイトルのクロスボーダーの処理方法、装置及び電子装置
CN112565890B (zh) 视频裁剪方法、装置、存储介质及电子设备
WO2022116772A1 (zh) 视频裁剪方法、装置、存储介质及电子设备
CN112101258A (zh) 图像处理方法、装置、电子设备和计算机可读介质
CN112085733B (zh) 图像处理方法、装置、电子设备和计算机可读介质
CN112598571B (zh) 一种图像缩放方法、装置、终端及存储介质
US11810336B2 (en) Object display method and apparatus, electronic device, and computer readable storage medium
US20220084314A1 (en) Method for obtaining multi-dimensional information by picture-based integration and related device
CN114399696A (zh) 一种目标检测方法、装置、存储介质及电子设备
CN111221455B (zh) 素材展示方法、装置、终端及存储介质
CN114863392A (zh) 车道线检测方法、装置、车辆及存储介质
CN111353929A (zh) 图像处理方法、装置和电子设备
EP4395355A1 (en) Video processing method and apparatus, and electronic device and storage medium
CN112651909B (zh) 图像合成方法、装置、电子设备及计算机可读存储介质
US20230376122A1 (en) Interface displaying method, apparatus, device and medium
CN116320617A (zh) 有效区域的确定方法及装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21893887

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21893887

Country of ref document: EP

Kind code of ref document: A1